
Google unveils new model “Nested Learning” that enables ‘Continual Learning’ for AI


Google has recently unveiled a new machine-learning approach called “Nested Learning” that enables continual learning in AI, mitigating one of the biggest drawbacks of current models: “catastrophic forgetting”.

The last decade has witnessed breathtaking advancements in artificial intelligence, largely fueled by sophisticated neural networks and powerful training algorithms. From revolutionizing how we interact with information to powering self-driving cars, AI’s capabilities continue to expand at an astonishing pace.

Yet, a fundamental challenge has long persisted, a hurdle that prevents AI from truly emulating the remarkable adaptability of the human mind: continual learning. This is the ability for an AI model to acquire new knowledge and skills over time without sacrificing its proficiency on previously learned tasks.

The Human Brain vs. AI

The human brain excels at this. Through a phenomenon known as neuroplasticity, our brains constantly reorganize and adapt their structure in response to new experiences, memories, and learning. Imagine the frustration if learning a new language meant forgetting your native tongue entirely.

This is precisely the kind of limitation many advanced AI models, including large language models (LLMs), currently face. Their knowledge is often confined to the immediate context of their input or the static information they absorbed during their initial training.

The Problem: Why AI Forgets

The simple act of continually updating a model’s parameters with new data often leads to what researchers call catastrophic forgetting. This undesirable phenomenon occurs when a model, in its effort to master new tasks, inadvertently erases or severely degrades its understanding of older ones. Traditionally, AI researchers have tried to combat catastrophic forgetting through clever architectural tweaks or refined optimization rules.

However, for too long, the model’s architecture – its underlying structure – and the optimization algorithm – the rules governing its learning process – have been treated as distinct entities. This separation has inadvertently hampered the development of truly unified and efficient learning systems.

Introducing Nested Learning

Google Research has unveiled a groundbreaking new approach called Nested Learning, which aims to bridge this critical gap. Introduced in their paper, “Nested Learning: The Illusion of Deep Learning Architectures,” presented at NeurIPS 2025, this paradigm shifts how we view and design AI models. Rather than seeing a single machine learning model as one continuous process, Nested Learning posits it as a system of interconnected, multi-level learning problems, all optimized simultaneously.

The core insight here is profound: the model’s architecture and the rules used to train it are not separate but are fundamentally the same concepts. They are simply different “levels” of optimization, each with its own internal flow of information, or context flow, and its own rate of updates.

By recognizing this inherent, nested structure, Nested Learning uncovers a previously invisible dimension for designing more capable AI. It allows for the creation of learning components with deeper computational depth, offering a principled path to mitigate or even entirely avoid issues like catastrophic forgetting.

Nested Learning: How It Works

The Nested Learning paradigm reveals that a complex machine learning model is, at its heart, a collection of coherent, interconnected optimization problems, nested within each other or running in parallel.

Each of these internal problems possesses its own distinct context flow – its unique set of information from which it strives to learn. This perspective implies that many existing deep learning methods effectively compress these internal context flows.

To illustrate this paradigm, consider the concept of associative memory – the ability to map and recall one thing based on another, such as remembering a name when you see a face. Nested Learning shows that the very training process of AI, specifically backpropagation, can be modeled as a form of associative memory.

The model learns to map a given data point to the value of its local error, indicating how “surprising” that data point was. Similarly, key architectural components like the attention mechanism in transformers can be formalized as simple associative memory modules, learning mappings between elements in a sequence.
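The associative-memory view of attention described above can be sketched in a few lines. This is a minimal illustration of standard dot-product attention as a key-value memory lookup, not code from the paper; the toy keys and values are invented for the example.

```python
import math

def attend(query, keys, values):
    # Similarity between the query and each stored key.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax turns similarities into retrieval weights.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The "recalled" result is a weighted blend of the stored values.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Store two key -> value associations; a query close to the first key
# mostly recalls the first value, like remembering a name from a face.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
recalled = attend([4.0, 0.0], keys, values)
```

Under this reading, the attention weights are exactly the "mapping between elements in a sequence" that the associative-memory module learns.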

Crucially, just as the human brain’s uniform, reusable structure and its multi-time-scale updates are vital for continual learning, Nested Learning enables these multi-time-scale updates for each component of an AI model.

By defining an update frequency rate – how often each component’s weights are adjusted – these interconnected optimization problems can be ordered into distinct “levels,” forming the essence of the Nested Learning paradigm.
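The idea of ordering components into levels by update frequency can be sketched as follows. This is a hypothetical illustration, assuming the simplest possible rule (update on every step divisible by the component's period); the component names and frequencies are invented for the example.

```python
class Component:
    """A model component whose weights update on its own schedule."""

    def __init__(self, name, update_every):
        self.name = name
        self.update_every = update_every  # steps between weight updates
        self.weight = 0.0

    def maybe_update(self, step, gradient, lr=0.1):
        # Only adjust weights when this component's level is "due".
        if step % self.update_every == 0:
            self.weight -= lr * gradient
            return True
        return False

# Fast-updating components form the lower levels (short-lived context);
# slow-updating ones form the higher levels (stable knowledge).
levels = [
    Component("attention-context", update_every=1),
    Component("fast-memory", update_every=10),
    Component("slow-knowledge", update_every=100),
]

counts = {c.name: 0 for c in levels}
for step in range(1, 101):
    for comp in levels:
        if comp.maybe_update(step, gradient=0.5):
            counts[comp.name] += 1
```

After 100 steps, the three levels have updated 100, 10, and 1 times respectively, which is the kind of frequency ordering the paradigm exploits.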

Putting Nested Learning into Action

This new perspective immediately offers principled ways to enhance existing algorithms and architectures:

Deep Optimizers: Nested Learning views optimizers, such as momentum-based algorithms, as associative memory modules. This allows researchers to apply principles from associative memory to them. By adjusting the underlying objective of optimizers to a more robust loss metric, like L2 regression loss, Nested Learning derives new formulations for core concepts like momentum, making them more resilient to imperfect or noisy data.
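The "optimizer as memory" reading starts from plain SGD with momentum, shown below. The momentum buffer acts as a small memory that compresses the stream of past gradients into one vector; the Nested Learning contribution is to re-derive such rules from an associative-memory objective, which this standard formulation does not do. The hyperparameters here are illustrative.

```python
def momentum_step(param, grad, buffer, lr=0.1, beta=0.9):
    # The buffer "remembers" an exponentially weighted mix of
    # past gradients -- a one-slot compression of the gradient stream.
    buffer = beta * buffer + grad
    param = param - lr * buffer
    return param, buffer

param, buf = 1.0, 0.0
for grad in [0.5, 0.5, 0.5]:
    param, buf = momentum_step(param, grad, buf)
```

Swapping the implicit objective behind this buffer for a more robust loss (such as an L2 regression loss over stored gradients) is what yields the paper's more noise-resilient momentum variants.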

Continuum Memory Systems (CMS): In a standard Transformer, the sequence model acts as a short-term memory for immediate context, while feedforward neural networks serve as long-term memory, storing pre-training knowledge. The Nested Learning paradigm extends this into a Continuum Memory System, conceptualizing memory as a spectrum of modules, each updating at its own specific frequency rate. This creates a far richer and more effective memory system for truly continual learning.
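A memory "spectrum" of the kind CMS describes can be sketched as a set of modules that each consolidate the input stream at their own period. This is a loose conceptual sketch, assuming a simple averaging rule for consolidation; the class, the periods, and that rule are illustrative assumptions, not the paper's design.

```python
class MemoryModule:
    """One point on the memory spectrum, consolidating at its own rate."""

    def __init__(self, period):
        self.period = period  # steps between consolidations
        self.buffer = []      # short-lived staging area
        self.store = []       # consolidated long-lived contents

    def observe(self, item, step):
        self.buffer.append(item)
        if step % self.period == 0:
            # Consolidate: compress the staged items into one summary.
            self.store.append(sum(self.buffer) / len(self.buffer))
            self.buffer = []

# Fast modules keep fine-grained recent detail; slow modules keep
# coarse summaries of long spans -- a continuum between short- and
# long-term memory rather than a hard split.
spectrum = [MemoryModule(period=p) for p in (1, 5, 25)]
for step in range(1, 26):
    for mem in spectrum:
        mem.observe(float(step), step)
```

After 25 steps the fast module holds 25 entries, the middle one 5, and the slow one a single summary, mirroring the spectrum of update frequencies the CMS formulation describes.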

Hope: A Self-Modifying Architecture

As a compelling proof-of-concept, Google researchers leveraged Nested Learning principles to design Hope, a variant of the powerful Titans architecture. While Titans architectures are long-term memory modules that prioritize memories based on their “surprise” factor, they are limited to two levels of parameter updates.

Hope, however, is a self-modifying recurrent architecture capable of taking advantage of unbounded levels of in-context learning. It is also augmented with CMS blocks, allowing it to scale to much larger context windows. Essentially, Hope can optimize its own memory through a self-referential process, creating an architecture with infinite, looped learning levels.

Experimental Validation: The Power of Hope

Experiments were conducted to rigorously evaluate the effectiveness of these deep optimizers and the performance of the Hope architecture across various tasks, including language modeling, long-context reasoning, continual learning, and knowledge incorporation. The full results are detailed in the official paper.

The findings confirm the significant power of Nested Learning, the innovative design of continuum memory systems, and the capabilities of self-modifying Titans variants like Hope.

On a diverse set of widely used public language modeling and common-sense reasoning tasks, the Hope architecture demonstrated lower perplexity and higher accuracy when compared to modern recurrent models and standard transformers.

Furthermore, Hope showcased superior memory management in challenging long-context Needle-In-Haystack (NIAH) downstream tasks, proving that Continuum Memory Systems offer a more efficient and effective way to handle extended sequences of information. This suggests a significant leap in AI’s ability to maintain context over vast amounts of data.

The Future of Learning: Towards Human-Like AI

The Nested Learning paradigm represents a pivotal step forward in our fundamental understanding of deep learning. By treating architecture and optimization not as separate components but as a single, coherent system of nested optimization problems, researchers unlock a new dimension for AI design, allowing for the stacking of multiple learning levels.

The resulting models, exemplified by the Hope architecture, demonstrate that a principled approach to unifying these elements can lead to more expressive, capable, and efficient learning algorithms.

Google researchers believe that the Nested Learning paradigm offers a robust foundation for finally closing the gap between the limited, forgetting nature of current large language models and the remarkable, adaptive, and continually learning abilities of the human brain.

This opens up exciting avenues for the research community to explore this new dimension and contribute to building the next generation of truly self-improving AI.

Key Takeaways

  • Nested Learning unifies AI architecture and optimization into a single, nested system.
  • This approach addresses catastrophic forgetting by allowing multi-time-scale updates in AI models.
  • Hope, a self-modifying architecture based on Nested Learning, shows improved performance in language modeling and long-context reasoning.
  • Continuum Memory Systems (CMS) offer a more efficient way to handle extended sequences of information.
  • Nested Learning aims to bridge the gap between AI and human-like continual learning.
