TLDR: A new research paper introduces a Reinforcement Learning (RL) control system, combined with Behavior Cloning (BC), for precise pH regulation in industrial microalgae photobioreactors (PBRs). The system learns offline from existing PID controller data and then fine-tunes itself daily online, adapting to real-world disturbances and dynamic conditions. Simulations showed an 8% reduction in control error and a 54% decrease in control effort compared to traditional PID. An 8-day real-world deployment validated its robustness and reliability, marking the first successful application of such an RL-based strategy in a complex bioprocess.
Controlling complex biological systems, like those found in industrial photobioreactors (PBRs) used for microalgae cultivation, presents significant challenges. These systems are inherently nonlinear, exposed to fluctuating environmental conditions, and rely on living cells as their production units, making it difficult to maintain stable and optimal operating conditions. A critical variable to control is pH, which directly impacts the growth and metabolism of microalgae.
Traditional control methods, such as simple on/off systems or Proportional-Integral-Derivative (PID) controllers with fixed parameters, often fall short due to the dynamic and unpredictable nature of these bioprocesses. The difficulty in creating accurate models for these systems has led researchers to explore more advanced, data-driven approaches.
A Novel Approach: Reinforcement Learning with Behavior Cloning
A recent research paper introduces a Reinforcement Learning (RL) control approach, combined with Behavior Cloning (BC), designed specifically for pH regulation in open PBR systems. It represents the first known real-world application of an RL-based control strategy to such a complex and disturbance-prone bioprocess. The methodology combines an offline training phase with a daily online fine-tuning phase, allowing the system to learn from past experience and adapt to new conditions.
How It Works: Offline Learning and Online Adaptation
The proposed system operates in two main stages. First, an RL agent undergoes an offline training stage, in which it learns from a large dataset of trajectories generated by a conventional PID controller. The agent thus acquires expert knowledge without interacting directly with the real-world PBR, which avoids the risks and costs of online experimentation. The agent uses the Deep Deterministic Policy Gradient (DDPG) algorithm, an actor-critic method well suited to continuous control tasks.
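The idea of learning from PID trajectories can be illustrated with a minimal behavior-cloning sketch. It is not the paper's implementation: the PI expert, its gains (kp, ki), the sampling ranges, and the linear policy are all assumptions chosen to keep the example small, and the DDPG actor-critic update is omitted entirely. The sketch only shows the supervised step of fitting a policy to the expert's recorded actions.

```python
import random

def pid_action(error, integral, kp=2.0, ki=0.5):
    """Hypothetical PI expert: maps pH error and its integral to a CO2 command."""
    return kp * error + ki * integral

def make_dataset(n=200, seed=0):
    """Sample (state, expert_action) pairs, standing in for logged PID trajectories."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        e = rng.uniform(-1.0, 1.0)   # pH error (assumed range)
        i = rng.uniform(-5.0, 5.0)   # integral of the error (assumed range)
        data.append(((e, i), pid_action(e, i)))
    return data

def fit_linear_policy(data):
    """Behavior cloning as least squares: fit a = w0*e + w1*i to the expert actions."""
    s_ee = sum(e * e for (e, i), a in data)
    s_ei = sum(e * i for (e, i), a in data)
    s_ii = sum(i * i for (e, i), a in data)
    s_ea = sum(e * a for (e, i), a in data)
    s_ia = sum(i * a for (e, i), a in data)
    # Solve the 2x2 normal equations in closed form.
    det = s_ee * s_ii - s_ei * s_ei
    w0 = (s_ea * s_ii - s_ia * s_ei) / det
    w1 = (s_ia * s_ee - s_ea * s_ei) / det
    return w0, w1

w0, w1 = fit_linear_policy(make_dataset())
```

Because the assumed expert is itself linear, the cloned weights recover its gains. A real agent would use a neural-network actor and combine a cloning term like this with the RL objective.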
The agent’s ‘observation space’ is carefully designed to provide it with crucial information. This includes direct measurements from the PBR (like temperature, irradiance, dissolved oxygen, and CO2 injection rate), temporal information (such as time of day to account for the day-night cycle), and control variables (like the pH error and its integral). This comprehensive observation allows the agent to infer hidden states and anticipate disturbances, effectively acting as a feedforward control mechanism.
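The observation described above could be assembled along the following lines. This is a sketch under stated assumptions: the function name, argument order, sign convention for the error, and the sine/cosine encoding of the time of day are illustrative choices, not details taken from the paper.

```python
import math

def build_observation(temperature, irradiance, dissolved_oxygen, co2_rate,
                      hour_of_day, ph, ph_setpoint, error_integral, dt):
    """Return (observation, updated_integral) for one control step."""
    # Sign convention (assumed): a positive error means pH is above the setpoint.
    error = ph - ph_setpoint
    updated_integral = error_integral + error * dt
    # Encode time of day on the unit circle so 23:59 and 00:00 are neighbours.
    angle = 2.0 * math.pi * hour_of_day / 24.0
    observation = [
        temperature, irradiance, dissolved_oxygen, co2_rate,  # PBR measurements
        math.sin(angle), math.cos(angle),                     # temporal information
        error, updated_integral,                              # control variables
    ]
    return observation, updated_integral
```

In practice the raw measurements would usually be normalized before being passed to a neural network; that step is left out here.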
The ‘action space’ is defined by the CO2 injection rate, the primary means of regulating pH. The system also incorporates an anti-windup mechanism to prevent issues when the CO2 actuator reaches its physical limits. A unique ‘reward function’ is used, based on a logarithmic error, which helps the agent learn effectively by smoothing out penalties for large errors while still being sensitive to small deviations from the desired pH setpoint.
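A logarithmic reward and an anti-windup rule of the kind described above might look like the sketch below. The exact formulas in the paper are not given here, so both are assumptions: the reward uses -log(1 + |e|/scale) with a hypothetical scale parameter, and the anti-windup uses conditional integration (freezing the integral while the actuator is saturated), which is one common technique among several.

```python
import math

def log_reward(ph_error, scale=0.05):
    """Assumed reward shape: zero at the setpoint, steep near it, flat for large errors."""
    return -math.log1p(abs(ph_error) / scale)

def step_integral(integral, error, raw_action, u_min, u_max, dt):
    """Conditional-integration anti-windup for the pH error integral.

    Assumes a positive error (pH above setpoint) calls for more CO2, so the
    integral is frozen whenever the actuator is saturated in the direction
    the error is pushing.
    """
    saturated_high = raw_action >= u_max and error > 0
    saturated_low = raw_action <= u_min and error < 0
    if saturated_high or saturated_low:
        return integral
    return integral + error * dt
```

With this shape, the reward drops far more when the error goes from 0 to 1 pH unit than when it goes from 1 to 2, which matches the description of damped penalties for large errors and sensitivity to small ones.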
Following offline training, the system enters a daily online fine-tuning phase. Here, the agent continuously collects new data from the PBR and uses it to refine its control policy. This adaptation is crucial for handling the evolving dynamics and transient disturbances inherent in open PBRs, ensuring the controller remains robust and optimal over extended periods. To prevent overfitting or instability, the number of training epochs during fine-tuning is carefully limited.
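The effect of capping the number of fine-tuning epochs can be shown with a toy example. This is only an illustration: a one-parameter policy and a supervised squared-error loss stand in for the DDPG update, and the data, learning rate, and epoch counts are hypothetical.

```python
def fine_tune(weight, batch, max_epochs=3, learning_rate=0.1):
    """Run at most `max_epochs` gradient steps on one day's batch of (error, action) pairs."""
    for _ in range(max_epochs):
        # Gradient of the mean squared error of the policy a = weight * e.
        grad = sum(2.0 * (weight * e - a) * e for e, a in batch) / len(batch)
        weight -= learning_rate * grad
    return weight

pretrained = 2.0  # weight obtained from offline training (assumed)
# Hypothetical data from a day on which the best gain has drifted to 3.0.
todays_batch = [(e, 3.0 * e) for e in (-1.0, -0.5, 0.5, 1.0)]

after_capped = fine_tune(pretrained, todays_batch, max_epochs=3)
after_long = fine_tune(pretrained, todays_batch, max_epochs=50)
```

The capped run moves the weight only part of the way toward what a single day's data suggests, while the long run fits that day almost exactly. Keeping the policy close to its pretrained behavior is the trade-off that the epoch limit controls.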
Impressive Results in Simulation and Real-World Deployment
Simulation studies demonstrated the significant advantages of this hybrid approach. Compared to a standard PID controller, the proposed RL-FT (Reinforcement Learning with Fine-Tuning) method reduced the Integral of Absolute Error (IAE) by 8%, indicating more accurate pH control. Furthermore, it achieved a remarkable 54% reduction in control effort compared to PID, and 7% compared to an RL agent without fine-tuning. This reduction in control effort is vital for minimizing operational costs, especially those related to CO2 injections.
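The two metrics quoted above can be computed from logged signals roughly as follows. The IAE formula is standard; the definition of control effort as the summed absolute change in the control signal is an assumption, since the paper's exact definition is not stated here.

```python
def iae(errors, dt):
    """Integral of Absolute Error, approximated as a rectangle-rule sum."""
    return sum(abs(e) for e in errors) * dt

def control_effort(actions):
    """Assumed definition: total absolute change of the control signal between steps."""
    return sum(abs(b - a) for a, b in zip(actions, actions[1:]))

def percent_reduction(baseline, candidate):
    """Relative improvement of `candidate` over `baseline`, in percent."""
    return 100.0 * (baseline - candidate) / baseline
```

Comparing two controllers then amounts to calling percent_reduction on their IAE values and on their control-effort values over the same test period.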
The most compelling validation came from an 8-day experimental deployment on a real, industrial-scale raceway PBR in Almería, Spain. Operating under varying environmental conditions, including fluctuations in solar radiation, air injections, and dilution rates, the RL-FT agent consistently maintained accurate pH control. The online fine-tuning proved effective, showing clear performance improvements over successive days, such as reduced pH overshoots and smoother control signals. The system even demonstrated resilience in handling unexpected operational issues like sensor recalibration and temporary communication losses.
Paving the Way for Advanced Bioprocess Control
This research successfully demonstrates the potential of RL-based methods for bioprocess control, particularly in complex, nonlinear, and multi-disturbed systems like open PBRs. The hybrid offline-online strategy, leveraging expert knowledge and continuous adaptation, offers a robust and efficient solution for maintaining optimal conditions. This work opens doors for broader application of machine learning algorithms in industrial bioprocesses and similar dynamic systems.
Future work aims to enhance the algorithm to automatically incorporate changes in process references, allowing its integration into hierarchical control structures. Additionally, researchers plan to extend the methodology to enable multivariable control, simultaneously regulating both pH and dissolved oxygen. For more details, you can read the full research paper here.


