
AI That Learns to Think: How TheoryCoder-2 Revolutionizes Hierarchical Planning with Self-Taught Abstractions

Explore TheoryCoder-2, an AI agent inspired by human cognition that learns abstract concepts for efficient, hierarchical planning. Discover its innovation in generalizing across complex tasks with minimal human input.

ARSA Technology Team

03 Feb 2026 • 5 min read

Humans possess a remarkable ability to simplify complex information, forming abstract concepts that enable efficient planning and rapid generalization across diverse situations. This capacity, fundamental to intelligence, remains a significant challenge for advanced artificial intelligence systems, including sophisticated large language models (LLMs) and deep reinforcement learning (RL) frameworks. A new academic paper, "Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents" by Ahmed et al. (2026), introduces TheoryCoder-2, an innovative AI agent designed to bridge this gap by actively learning and utilizing high-level abstractions, much like humans do.

The Human Blueprint for Intelligent Planning

At the heart of human intelligence lies hierarchical planning: the ability to think at multiple levels of detail. From a young age, humans grasp abstract ideas such as "containment" or "support." These foundational understandings allow us to construct high-level plans, like "pour juice into a cup," without getting bogged down in every minute physical motion. This abstract plan is then refined by lower-level models that account for biomechanics and physics, translating the high-level goal into concrete actions. This multi-level approach makes human learning incredibly efficient and adaptable.
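To make the two levels concrete, here is a minimal Python sketch of the "pour juice into a cup" example. It is only an illustration: the step names, the `REFINEMENTS` table, and the `refine` function are hypothetical and do not come from the paper. The point it shows is that the planner reasons over three abstract steps, and a separate lookup expands each one into primitive motions.

```python
from typing import Dict, List

# Hypothetical refinement table: each abstract step maps to the primitive
# motions that would carry it out. A real agent would compute these with a
# low-level model rather than look them up.
REFINEMENTS: Dict[str, List[str]] = {
    "grasp(bottle)": ["reach(bottle)", "close_hand"],
    "pour(bottle, cup)": ["move_above(cup)", "tilt(bottle)", "untilt(bottle)"],
    "release(bottle)": ["lower(bottle)", "open_hand"],
}


def refine(high_level_plan: List[str]) -> List[str]:
    """Expand every abstract step into its primitive actions, in order."""
    primitive_actions: List[str] = []
    for step in high_level_plan:
        primitive_actions.extend(REFINEMENTS[step])
    return primitive_actions


# The high-level plan has three steps; the refined plan has seven motions.
high_level_plan = ["grasp(bottle)", "pour(bottle, cup)", "release(bottle)"]
low_level_plan = refine(high_level_plan)
print(low_level_plan)
```

Searching over three abstract steps is far cheaper than searching over every combination of seven primitive motions, which is the efficiency argument behind hierarchical planning.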

Current AI systems, despite their impressive computational power, often struggle with this form of abstract thinking. Deep reinforcement learning agents typically learn by extensive trial and error within a specific environment, making generalization to new tasks difficult. Large language models excel at processing and generating text but often lack the structured world knowledge and causal reasoning necessary for robust, hierarchical planning beyond pattern matching. Bridging this cognitive gap in AI has been a long-standing goal for researchers.

The Evolution of AI Planning: Towards Theory-Based Learning

Inspired by cognitive science, an approach called "Theory-Based Reinforcement Learning" (TBRL) emerged to address AI's planning limitations. TBRL systems aim to equip AI agents with human-like "theories" of their world, meaning models that are object-oriented, relational, and causal. Instead of just learning which actions lead to rewards, TBRL agents try to understand why things happen. Early TBRL systems like TheoryCoder used LLMs to translate past experiences into symbolic world models, achieving impressive performance and efficiency in complex environments.

However, a major hurdle for these pioneering TBRL systems, including TheoryCoder, was their reliance on human-provided, hand-coded abstractions. To tackle a new domain, a human expert first had to identify the relevant abstract concepts and then manually write code to define them. This bottleneck significantly limited the scalability and autonomy of these agents. Without these pre-defined abstract plans, the AI would be forced to resort to inefficient, near-random exploration, undermining the very benefits of the TBRL approach. The challenge was clear: how could AI learn these abstractions on its own?

Introducing TheoryCoder-2: The Abstraction Learner

This is where TheoryCoder-2 steps in, representing a significant leap forward in TBRL. The key innovation is its ability to automatically synthesize high-level abstractions directly from experience, requiring only minimal human guidance in the form of initial prompts and examples. TheoryCoder-2 leverages the in-context learning capabilities of large language models not just to interpret data, but to actively construct reusable abstractions. These learned abstractions are then integrated into a hierarchical planning process, enhancing the agent’s ability to understand and navigate new environments.

These learned abstractions are represented as operators in "Planning Domain Definition Language" (PDDL). PDDL is a widely used formal language that describes the actions an agent can take in an environment, including their preconditions and effects. By generating PDDL operators, TheoryCoder-2 essentially writes its own high-level rules for how the world works, enabling it to generalize its knowledge efficiently. This capacity allows TBRL agents to continuously expand their understanding of entirely new domains, mirroring how humans gradually build a rich library of structured concepts without constant manual intervention.
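As a rough illustration of what an operator with preconditions and effects looks like, the Python sketch below models a tiny propositional key-and-door task and searches for a plan with breadth-first search. Everything in it is an assumption made for the example: the `Operator` class, the `find_plan` function, and the four operators are hypothetical, and they are not the operators or the planner used by TheoryCoder-2. The `to_pddl` method prints each operator in PDDL's `:action` syntax.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Operator:
    """A propositional, STRIPS-style operator (illustrative only)."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]

    def applicable(self, state: FrozenSet[str]) -> bool:
        # An operator can fire only if all its preconditions hold.
        return self.preconditions <= state

    def apply(self, state: FrozenSet[str]) -> FrozenSet[str]:
        # Remove deleted facts, then add the new ones.
        return (state - self.del_effects) | self.add_effects

    def to_pddl(self) -> str:
        pre = " ".join(f"({p})" for p in sorted(self.preconditions))
        eff = [f"({a})" for a in sorted(self.add_effects)]
        eff += [f"(not ({d}))" for d in sorted(self.del_effects)]
        return (f"(:action {self.name}\n"
                f"  :precondition (and {pre})\n"
                f"  :effect (and {' '.join(eff)}))")


def find_plan(initial: FrozenSet[str], goal: FrozenSet[str],
              operators: List[Operator]) -> Optional[List[str]]:
    """Breadth-first search over states; returns a shortest plan or None."""
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for op in operators:
            if op.applicable(state):
                nxt = op.apply(state)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [op.name]))
    return None


fs = frozenset  # short alias for readability
OPERATORS = [
    Operator("go_to_key", fs({"at_start"}), fs({"at_key"}), fs({"at_start"})),
    Operator("pick_up_key", fs({"at_key"}), fs({"has_key"}), fs()),
    Operator("go_to_door", fs({"at_key"}), fs({"at_door"}), fs({"at_key"})),
    Operator("open_door", fs({"at_door", "has_key"}), fs({"door_open"}), fs()),
]

result = find_plan(fs({"at_start"}), fs({"door_open"}), OPERATORS)
print(result)
print(OPERATORS[3].to_pddl())
```

Each operator name here stands for a whole sequence of low-level moves, so the search only has to order four abstract steps. A production system would hand the PDDL text to an off-the-shelf planner instead of the toy search above.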

Real-World Impact and Experimental Validation

The research paper demonstrates that TheoryCoder-2 significantly outperforms baseline LLM agents and prior program-synthesis agents like WorldCoder in both sample efficiency and generalization. Sample efficiency refers to the ability to learn effectively with less data or experience, a critical factor for real-world deployments where data collection can be costly and time-consuming. TheoryCoder-2 achieved these results across diverse environments, including Video Game Description Language (VGDL) games like Sokoban, as well as BabyAI and MiniHack environments (Ahmed et al., 2026).

The system successfully solved complex tasks that existing baselines struggled with, all while requiring substantially less human input than previous TBRL systems. This means AI systems can become more autonomous, adaptable, and deployable in complex scenarios without needing extensive re-engineering for every new situation. This innovation has profound implications for industries seeking to deploy intelligent agents in dynamic environments, from optimizing manufacturing processes to managing complex logistics. For instance, in an industrial setting, an agent could learn an abstract concept like "tool replacement" from observing a few instances, then apply it universally across different machinery, rather than being explicitly programmed for each tool type.

The Future of Adaptive AI with Self-Taught Abstractions

The development of TheoryCoder-2 marks a crucial step toward building AI systems that learn and think more like humans. By enabling AI to autonomously learn and generalize abstractions, we move closer to agents capable of truly adaptive and flexible problem-solving. This approach fits the need for robust, privacy-preserving, and highly efficient AI solutions in enterprise environments. Companies like ARSA Technology, which specialize in AI and IoT solutions, leverage advanced computer vision and edge AI to deliver measurable impact. ARSA's AI Box Series, for example, processes data on-premise, offering real-time insights with maximum privacy. The principles demonstrated by TheoryCoder-2 could further enhance such systems, allowing them to adapt to evolving operational needs or even unexpected scenarios with greater autonomy.

Imagine smart retail analytics that can learn new customer behaviors or store layouts on the fly, similar to how ARSA's AI BOX - Smart Retail Counter already provides footfall and customer analytics. Or consider smart city traffic management systems, where AI like ARSA's AI BOX - Traffic Monitor can dynamically identify and create new abstract rules for managing congestion based on novel traffic patterns, significantly improving response times and efficiency. The ability for AI to independently acquire and apply abstract knowledge opens up unprecedented opportunities for more intelligent, resilient, and human-aligned automated systems across various industries.

This research highlights the growing potential for AI to move beyond mere pattern recognition and engage in more sophisticated, human-like reasoning. For enterprises, this translates into AI solutions that are not only powerful but also self-improving and less reliant on continuous human reprogramming, ultimately driving greater efficiency, security, and innovation.

Source: Ahmed, Z., Irie, K., Tenenbaum, J. B., Bates, C. J., & Gershman, S. J. (2026). Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents. arXiv preprint arXiv:2602.00929. https://arxiv.org/abs/2602.00929
