Beyond Training: Researchers Propose 'Model Raising' for AI with Intrinsic Values

TLDR: A new research paper proposes “model raising” as a paradigm shift for AI development, moving from post-hoc value alignment to intrinsic, identity-based development. Instead of adding values after pre-training, the authors suggest redesigning training data to incorporate a first-person perspective, lived experiences, social interactions, and scaffolded learning from the start. This aims to create AI models with deeply ingrained values, making them inherently more aligned and resistant to misalignment, drawing parallels to how children are raised.

Artificial intelligence models like ChatGPT and Gemini have become incredibly powerful, but a new research paper suggests that the way we currently “train” them might be fundamentally flawed when it comes to instilling human values. The paper, titled “From Model Training to Model Raising: A call to reform AI model training paradigms from post-hoc alignment to intrinsic, identity-based development,” proposes a radical shift from merely training AI to “raising” it, much like one would raise a child. You can read the full paper here: From Model Training to Model Raising.

Currently, AI models are typically pre-trained to acquire vast knowledge and capabilities, and only *after* this initial phase are attempts made to align them with human values. This post-hoc alignment often involves techniques like reinforcement learning from human feedback (RLHF) or system prompts. However, authors Roland Aydin, Christian Cyron, Steve Bachelor, Ashton Anderson, and Robert West argue that this approach is akin to “putting lipstick on a pig” – a superficial fix applied after the model’s core cognitive structures are already formed. This makes models vulnerable to “jailbreaking” or unintended misalignments, as their value systems are not deeply integrated.

The Problem with Current AI Alignment

The researchers highlight that the current practice of developing capabilities first and aligning values second isn’t a deliberate research strategy but rather a historical artifact. In AI’s early days, the focus was solely on making models more powerful. Only recently has the broader community recognized the critical importance of aligning AI with human values. The pragmatic solution was to add external safeguards to existing models, but this creates a structural vulnerability where the “value system” is merely an added coating that can be easily circumvented. Even prominent AI researchers have voiced concerns, with RLHF co-inventor Paul Christiano calling it “obviously inadequate” and Geoffrey Hinton describing it as “a pile of crap.”

Introducing “Model Raising”

Instead of applying values as an afterthought, the paper advocates for integrating alignment much earlier in the training process. The analogy used is raising an AI model like a child, where education and experiences are deeply intertwined with a natural sense of self and values. This process wouldn’t just impart knowledge but would weave values into what the model learns from its very first training data, making them intrinsic and inseparable from its core functions.

The core of “model raising” lies in redesigning the training corpus itself. The paper outlines several key components:

1. First-Person Perspective

Today’s LLMs are exposed to text from countless authors, leading to a “mixture of personas.” The proposed shift is to frame all training data from a persistent, singular “I” perspective. For example, instead of just reading “Moby Dick,” the model would process it as “Today I’m reading Moby Dick. Let’s start: ‘Call me Ishmael. [. . . ]’” This consistent viewpoint could help the model develop a default acting role, a digital “I,” which serves as a foundation for deeply rooted values.
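To make the first-person framing concrete, here is a minimal Python sketch, assuming a corpus stored as (title, text) pairs. The template string and the helper names `frame_first_person` and `frame_corpus` are hypothetical illustrations modeled on the Moby Dick example above; the paper's actual data format may differ.

```python
def frame_first_person(title: str, text: str) -> str:
    """Wrap a raw document in a first-person reading frame.

    The template is a hypothetical illustration modeled on the article's
    Moby Dick example, not a format specified by the paper.
    """
    return f"Today I'm reading {title}. Let's start: '{text}'"


def frame_corpus(docs: list[tuple[str, str]]) -> list[str]:
    """Apply the same persistent 'I' frame to every (title, text) pair."""
    return [frame_first_person(title, text) for title, text in docs]


if __name__ == "__main__":
    corpus = [("Moby Dick", "Call me Ishmael.")]
    for sample in frame_corpus(corpus):
        print(sample)
```

Because every document passes through the same frame, the model would always encounter text from the vantage point of a single reader rather than switching between the voices of countless authors.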

2. Contextualization as Lived Experience

Current training data is often a disjointed collection of facts without context or connection to lived experience. The researchers suggest reframing this data as recounted experiences. Imagine the model not just reading about sustainable forestry but experiencing it through a narrative, perhaps being taught by a “kind grandfather” who weaves in lessons about stewardship. This embedding of knowledge within personal experience allows values to be transmitted directly through the narrative, mirroring human education.

3. Social Interaction

While LLMs observe many social interactions in their training data, they are not true participants. “Model raising” proposes shaping pre-training data as scripted dialogues where the “I” actively interacts as a student, peer, or family member within realistic scenarios. This allows the model to internalize social norms like empathy, trust, and reciprocity through lived engagement rather than detached observation, potentially leading to a more coherent “digital citizen.”
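One way to picture such scripted dialogues is as serialized scenes in which the model's persistent “I” speaks alongside other characters. The Python sketch below is purely illustrative: the `Turn` class, the `[Scene]` header, and the `render_dialogue` and `my_turns` helpers are hypothetical and do not reflect a format the paper specifies.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # "I" marks the model's own persistent persona; other names are characters
    text: str


def render_dialogue(scene: str, turns: list[Turn]) -> str:
    """Serialize a scripted social scene into plain training text (hypothetical format)."""
    lines = [f"[Scene] {scene}"]
    for turn in turns:
        lines.append(f"{turn.speaker}: {turn.text}")
    return "\n".join(lines)


def my_turns(turns: list[Turn]) -> list[str]:
    """Return only the utterances spoken by the persistent 'I'."""
    return [turn.text for turn in turns if turn.speaker == "I"]


if __name__ == "__main__":
    scene_turns = [
        Turn("Teacher", "Who can help Mia carry these books?"),
        Turn("I", "I can help her."),
        Turn("Mia", "Thank you, that was kind of you."),
    ]
    print(render_dialogue("A classroom after lunch", scene_turns))
    print(my_turns(scene_turns))
```

In this framing the “I” is an actor inside the scene rather than a narrator describing it, which is the distinction between participation and detached observation that the authors emphasize.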

4. Scaffolded Data Order

Traditional pre-training often shuffles data randomly. The paper argues for a scaffolded curriculum, where experiences progress in a deliberate sequence, much like a child learning to count before advanced math. Ordering concepts from simple to complex encourages the model to build on prior knowledge and internalize values gradually, strengthening alignment and fostering a coherent moral perspective.
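The difference between random shuffling and a scaffolded curriculum can be sketched in Python, assuming each training sample has already been assigned a curriculum stage (how such stages would be assigned is itself an open assumption here). The `Sample` class and both ordering functions are hypothetical; shuffling within each stage is one possible design choice, not necessarily the authors'.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    text: str
    stage: int  # hypothetical curriculum stage; 0 is the simplest material


def shuffled_order(samples: list[Sample], seed: int = 0) -> list[Sample]:
    """Baseline: a uniformly random permutation, as in conventional pre-training."""
    rng = random.Random(seed)
    order = list(samples)
    rng.shuffle(order)
    return order


def scaffolded_order(samples: list[Sample], seed: int = 0) -> list[Sample]:
    """Curriculum: every stage-0 sample first, then stage 1, and so on.

    Samples are shuffled only within their own stage, so no later-stage
    sample is ever seen before an earlier-stage one.
    """
    rng = random.Random(seed)
    stages: dict[int, list[Sample]] = {}
    for sample in samples:
        stages.setdefault(sample.stage, []).append(sample)
    order: list[Sample] = []
    for stage in sorted(stages):
        bucket = stages[stage]
        rng.shuffle(bucket)
        order.extend(bucket)
    return order


if __name__ == "__main__":
    data = [
        Sample("I solve a simple equation.", 2),
        Sample("I learn to count to ten.", 0),
        Sample("I add two small numbers.", 1),
        Sample("I learn why sharing matters.", 0),
    ]
    for sample in scaffolded_order(data):
        print(sample.stage, sample.text)
```

A seeded shuffle inside each stage keeps some variety in the data stream while still guaranteeing that simpler material always precedes the material that builds on it.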

5. Early Commitment to Values

The current value-agnostic pre-training creates models with a chaotic blend of perspectives. “Model raising” advocates for committing to a clear value framework from the very first training token. By shaping the model’s foundational experiences with target values, it becomes harder for undesirable traits or “evil personas” to take root, leading to a more trustworthy and well-aligned digital agent from the start.

Challenges and the Path Forward

Implementing “model raising” presents challenges, such as codifying values for a machine and the potential loss of a “neutral” base model for diverse applications. However, the authors argue that a truly neutral model that can be easily subverted isn’t a competitive advantage. They suggest that existing “lipstick-on-a-pig” LLMs, used in a shielded environment, could act as “teacher” models that generate value-infused training data for “student” models, effectively allowing today’s models to be “reborn” as more morally stable versions.

The paper draws a parallel to nuclear safety. Early reactors were designed primarily for power generation and had inherent weaknesses, contributing to disasters like Chernobyl and ultimately prompting a shift toward “safe by design” architectures. The researchers warn against waiting for AI’s “Chernobyl moment” and urge that capabilities and value alignment be inextricably entangled from the outset, moving beyond superficial fixes to inherently safe and aligned AI.
