Boosting Large Language Model Training: Tencent's G-Core Framework Enhances Scalability and Efficiency in RLHF

TLDR: G-Core is a new RLHF training framework developed by Tencent that significantly improves the scalability and efficiency of training large language models and diffusion models. It achieves this through a parallel controller programming model, which eliminates single-point bottlenecks, and a dynamic scaling placement schema that optimizes GPU utilization by adaptively partitioning resources and scheduling workloads. Successfully deployed in WeChat, G-Core demonstrates robust performance in real-world, large-scale AI training environments.

Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone in training advanced AI models, particularly large language models (LLMs) and diffusion models. While RLHF has driven significant progress, existing systems often struggle with scaling to complex multi-modal tasks, adapting to changing workloads, and efficiently managing resources. These challenges include limitations in controller scalability, inflexible resource allocation, and inefficient orchestration of intricate RLHF pipelines, especially when dealing with dynamic data sampling or generative reward modeling.

Introducing G-Core: A New Era in RLHF Training

To address these critical issues, researchers from Tencent have introduced G-Core, a novel RLHF training framework designed for simplicity, scalability, and balance. G-Core aims to overcome the bottlenecks of traditional systems, providing a robust foundation for developing large-scale, human-aligned AI models.

Parallel Controllers: Breaking the Centralized Bottleneck

One of G-Core’s key innovations is its parallel controller programming model. Unlike conventional systems that rely on a single, centralized controller, G-Core distributes control across multiple parallel controllers. This removes the single-point bottleneck that arises when large features such as images or videos must pass through one controller, or when complex procedures saturate a single machine’s CPU or network bandwidth. By partitioning RL tasks in a Single Program Multiple Data (SPMD) fashion, G-Core has each controller manage only a portion of the resources, which keeps the workload balanced, especially at larger batch sizes. This design allows multiple stages of the RLHF workflow to coexist and enables flexible, local state transitions, which are crucial for advanced sampling processes like dynamic sampling or reward-augmented generation.
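To make the SPMD idea concrete, here is a minimal sketch of how a batch of prompts might be split so that every controller runs the same program on its own shard. The function names (`shard_for_controller`, `controller_step`) and the contiguous-slice policy are illustrative assumptions, not G-Core's actual API.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_for_controller(batch: Sequence[T], num_controllers: int, rank: int) -> List[T]:
    """Return the contiguous slice of `batch` owned by controller `rank`.

    Hypothetical helper: shard sizes differ by at most one item.
    """
    if not 0 <= rank < num_controllers:
        raise ValueError("rank must be in [0, num_controllers)")
    base, extra = divmod(len(batch), num_controllers)
    # The first `extra` controllers each take one additional item.
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return list(batch[start:end])


def controller_step(batch: Sequence[str], num_controllers: int, rank: int) -> List[str]:
    """Same program on every controller (SPMD); only the shard differs."""
    shard = shard_for_controller(batch, num_controllers, rank)
    # Placeholder for the real work (generation, rewarding, ...).
    return [f"rollout({prompt})" for prompt in shard]
```

Because each controller only touches its own shard, no single process has to receive or forward the whole batch, which is the property the parallel controller model relies on.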

Dynamic Placement: Optimizing Resource Utilization

G-Core also introduces a dynamic scaling placement schema that significantly improves efficiency, particularly in scenarios involving generative reward modeling and dynamic sampling. Traditional co-location strategies, where multiple models share the same GPUs, introduce overhead from model swapping each time the workflow switches between models. This overhead may be negligible early on, but it can become a bottleneck as training progresses: as models improve, re-sampling becomes more frequent, and each re-sampling round triggers additional swaps. Furthermore, long-tail outputs in the generation stage can reduce GPU cluster utilization, a problem amplified by frequent model swapping.

G-Core tackles this by integrating both co-existing (asynchronous workflow) and co-location (synchronous workflow) strategies. It intelligently partitions the GPU cluster, allowing policy generation and reward model generation to co-exist on separate portions of devices, eliminating the need for frequent model swaps. For the preparation and training stages, G-Core retains the co-location approach, utilizing all GPUs to minimize idle time. This dynamic adjustment of GPU cluster partitioning based on workload ensures that hardware utilization remains high, even under highly variable training conditions. G-Core continuously monitors hardware utilization and reallocates resources from underutilized roles to others, balancing the workload across training roles and maximizing overall efficiency.
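The reallocation step described above can be pictured with a small sketch. The `rebalance` function, its `threshold` and `min_gpus` parameters, and the one-GPU-at-a-time policy are assumptions made for illustration; the article does not specify how G-Core decides when or how much to move.

```python
from typing import Dict


def rebalance(
    gpus: Dict[str, int],
    utilization: Dict[str, float],
    threshold: float = 0.2,
    min_gpus: int = 1,
) -> Dict[str, int]:
    """Move one GPU from the least-utilized role to the most-utilized role.

    Hypothetical policy: act only when the utilization gap exceeds
    `threshold`, and never shrink a role below `min_gpus`.
    """
    busiest = max(utilization, key=utilization.get)
    idlest = min(utilization, key=utilization.get)
    new_gpus = dict(gpus)  # do not mutate the caller's partition
    gap = utilization[busiest] - utilization[idlest]
    if busiest != idlest and gap > threshold and new_gpus[idlest] > min_gpus:
        new_gpus[idlest] -= 1
        new_gpus[busiest] += 1
    return new_gpus


partition = {"policy_generation": 6, "reward_generation": 2}
measured = {"policy_generation": 0.95, "reward_generation": 0.40}
partition = rebalance(partition, measured)
```

Calling such a function periodically with fresh utilization measurements would shift devices toward whichever generation role is currently the bottleneck, while the total GPU count stays fixed.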

Under the Hood: Implementation and Real-World Impact

G-Core is implemented using Python and PyTorch, leveraging vLLM and SGLang for generation serving, and Megatron-Core as the training backend. The system distributes all modules across different processes, enabling collaboration via Remote Procedure Calls (RPCs) while minimizing interference with their internal orchestration mechanisms. This multi-processing approach enhances stability and simplifies issue diagnosis.

The framework also incorporates features like asynchronous checkpointing to minimize progress loss during interruptions and adapts to elastic resource scaling by reusing checkpoints across GPU clusters of varying sizes. For workload balancing, G-Core employs a simple yet effective method of sorting data by simulated workload, which significantly reduces wasted compute time without compromising model accuracy. It also supports distributed attention mechanisms, enabling the training of models with extremely long context sequences.
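One way to read "sorting data by simulated workload" is sketched below: samples are ordered by an estimated cost and then grouped so that the samples processed in the same step have similar cost, which shortens the time fast ranks spend waiting for the slowest one. The cost model (token count), the grouping rule, and the names `plan_steps` and `wasted` are assumptions for illustration, not the method as implemented in G-Core.

```python
from typing import Callable, List, Sequence


def plan_steps(
    lengths: Sequence[int],
    num_ranks: int,
    cost: Callable[[int], float] = float,
) -> List[List[int]]:
    """Group sample indices into steps of up to `num_ranks` similar-cost samples."""
    # Sort indices by simulated cost; here the cost is just the token count.
    order = sorted(range(len(lengths)), key=lambda i: cost(lengths[i]))
    return [order[i:i + num_ranks] for i in range(0, len(order), num_ranks)]


def wasted(
    lengths: Sequence[int],
    steps: List[List[int]],
    cost: Callable[[int], float] = float,
) -> float:
    """Total idle cost when every rank waits for the slowest sample in its step."""
    total = 0.0
    for step in steps:
        costs = [cost(lengths[i]) for i in step]
        total += sum(max(costs) - c for c in costs)
    return total


lengths = [100, 900, 120, 880]          # token counts per sample
naive = [[0, 1], [2, 3]]                # arrival order, two ranks per step
balanced = plan_steps(lengths, num_ranks=2)
```

In this toy case the arrival-order plan pairs a short sample with a long one in every step, while the sorted plan pairs short with short and long with long, so the simulated idle cost drops sharply.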

G-Core has been successfully deployed in real-world scenarios, training models that support features within WeChat, serving a massive user base. This practical application demonstrates the framework’s robustness and effectiveness at scale, with evaluations conducted on clusters of up to 64 GPUs and validation in production environments with over 512 GPUs. For more technical details, you can refer to the full research paper.

In conclusion, G-Core represents a significant advancement in RLHF training, offering a practical and flexible solution for orchestrating complex, multi-model workflows. By addressing critical bottlenecks in controller scalability and resource placement, G-Core paves the way for future research and deployment of large-scale, human-aligned AI models.
