Enhancing Digital Twin Resilience: Integrating LLMs with Human Oversight for Manufacturing

Explore how Large Language Models (LLMs) are transforming Digital Twin creation for manufacturing, overcoming hallucination challenges through resilient design principles, human oversight, and strategic Intermediate Representation (IR) choices for practical Industry 4.0 applications.

      The advent of digital twins has revolutionized how industries monitor, manage, and optimize complex physical systems. Particularly in manufacturing, these virtual replicas offer unprecedented opportunities for real-time decision support, predictive maintenance, and operational optimization. However, traditional modeling and simulation (M&S) workflows often prove too slow and rigid to keep pace with the dynamic, rapidly evolving environments of modern production systems. This rigidity creates a growing demand for more agile, automated approaches to Digital Twin creation, prompting significant research into how Artificial Intelligence (AI) can accelerate this process.

The Evolution of Digital Twins with AI Assistance

      Digital Twins of manufacturing systems require executable simulation models that can continuously adapt to real-time sensor data and changing production environments. The traditional M&S lifecycle—involving sequential stages of measurement, input modeling, model specification, conversion, verification, and validation—is fundamentally unsuited for mission-critical applications where production lines and operational parameters can shift frequently. For manufacturing Digital Twins, where timely optimization and decision-making are paramount, this slow cycle is becoming increasingly obsolete.

      Two primary forces are making automated model generation not just appealing, but essential. Firstly, the widespread adoption of IoT sensors and Industry 4.0 initiatives generates a deluge of real-time operational data from manufacturing systems. Secondly, advancements in cloud and edge computing, coupled with low-latency networks and machine learning, enable AI to automate the entire modeling lifecycle. This includes initial model generation, continuous fitting and tuning to sensor data, ongoing monitoring, and model-based decision support, such as what-if analysis and optimization.

Addressing LLM Challenges: Hallucination and Trust

      Recent breakthroughs in Large Language Models (LLMs) have opened new avenues, leveraging their ability to translate natural language descriptions into formal representations. This makes LLM-assisted automation an attractive prospect for simulation modeling workflows. However, the practical application of LLM-assisted automation faces inherent challenges. Issues such as "hallucination" – where LLMs generate plausible but factually incorrect outputs – along with computational costs, API stability, lack of transparency, and alignment problems, hinder trust and widespread adoption. While future LLM developments may mitigate these fundamental limitations, current practical applications necessitate workflows that systematically circumvent these issues through deliberate design.

      This article draws insights from a paper presented at ACM SIGSIM PADS 2026, which outlines critical design principles derived from the development of FactoryFlow, an open-source framework for LLM-assisted Digital Twins in manufacturing (Lekshmi P & Neha Karanjkar, 2026, "On Integrating Resilience and Human Oversight into LLM-Assisted Modeling Workflows for Digital Twins," https://arxiv.org/abs/2603.25898). These principles are designed to inject resilience and systematic human oversight into AI-driven modeling.

Core Design Principles for Resilient LLM-Assisted Workflows

      The paper highlights three critical design principles to integrate resilience and human oversight effectively into LLM-assisted modeling workflows, particularly for manufacturing Digital Twins. These principles address the specific challenge of creating reliable models for systems where parameters (like machine task delays or failure rates) evolve continuously, but structural changes (such as equipment interconnections or job routing) occur sporadically and are typically planned.

      The first principle advocates for orthogonalizing structural modeling and parameter fitting. This means decoupling the processes: structural descriptions, encompassing components and their interconnections, are translated by LLMs from natural language into an intermediate representation (IR). This IR undergoes human visualization and validation before algorithmic conversion into the final model. In contrast, parameter inference operates continuously on live sensor data streams, with expert-tunable controls. This separation allows for distinct automation strategies for each component, making the overall system more robust. ARSA Technology, for instance, offers custom AI solutions that can be engineered to integrate such modular approaches for complex industrial clients.
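To make the decoupling concrete, here is a minimal sketch (not FactoryFlow's actual API; component names and fields are illustrative) showing how a structural IR can stay fixed and human-validated while parameter inference re-runs continuously against fresh sensor samples:

```python
import statistics

# Structural IR: components and interconnections. In the described workflow
# this is produced by LLM translation, visualized, and validated by a human
# once, then converted algorithmically into the simulation model.
structure = {
    "components": {"m1": "Machine", "b1": "Buffer", "m2": "Machine"},
    "links": [("m1", "b1"), ("b1", "m2")],
}

def fit_parameters(sensor_samples):
    """Continuous parameter inference: estimate a machine's mean task
    delay from a window of live sensor readings."""
    return {"m1_mean_delay": statistics.mean(sensor_samples)}

# New sensor data updates parameters without touching the structure:
params_v1 = fit_parameters([4.2, 3.8, 4.0])
params_v2 = fit_parameters([5.1, 4.9, 5.0])
```

Because the two paths never overlap, each can use the automation strategy suited to it: one-shot translation with human sign-off for structure, unattended continuous fitting for parameters.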

      The second principle emphasizes restricting the model’s Intermediate Representation (IR) to interconnections of parameterized, pre-validated library components. Instead of allowing LLMs to generate monolithic simulation code directly, which can hide subtle errors, the approach uses a library of proven, validated components. This enhances interpretability, making it easier for humans to understand and verify the model. It also improves error resilience, as problems are localized to component interconnections rather than pervasive code logic. This strategy ensures that even if an LLM makes a mistake in linking components, the fundamental building blocks remain sound.
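A hedged sketch of what such a restricted IR might look like (the classes and method names below are hypothetical, chosen only to illustrate the principle): the LLM's job is reduced to instantiating pre-validated components and wiring them together, so any mistake surfaces as a bad link rather than a bug buried in generated simulation logic.

```python
from dataclasses import dataclass

# Pre-validated, parameterized library components (illustrative names):
@dataclass
class Machine:
    name: str
    mean_delay: float  # parameter, tuned later from sensor data

@dataclass
class Buffer:
    name: str
    capacity: int

class Model:
    def __init__(self):
        self.components = {}
        self.links = []

    def add(self, comp):
        self.components[comp.name] = comp

    def connect(self, src, dst):
        # Errors are localized here: an invalid interconnection is
        # caught immediately and is easy for a human to inspect.
        if src not in self.components or dst not in self.components:
            raise ValueError(f"unknown component in link {src} -> {dst}")
        self.links.append((src, dst))

m = Model()
m.add(Machine("cnc1", mean_delay=4.0))
m.add(Buffer("buf1", capacity=10))
m.connect("cnc1", "buf1")
```

The validation burden thus shifts from "audit arbitrary generated code" to "check a short list of typed connections", which is far more tractable for both humans and automated rule checks.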

The Significance of Density-Preserving Intermediate Representations

      The third and arguably most crucial principle centers on using a density-preserving Intermediate Representation (IR). This concept is vital for managing the propagation of errors, particularly "hallucination," which tends to accumulate proportionally with the expansion of compact inputs into verbose representations. For example, an instruction like "100x100 machines as 2D grid" can dramatically expand into 10,000 explicit declarations within formats like XML netlists. Each expansion point represents an opportunity for an LLM to introduce an error, leading to a higher overall error rate.
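A back-of-envelope calculation makes the stakes clear. If each emitted declaration carries an independent per-declaration error probability p (a simplifying assumption, not a claim from the paper), the chance of at least one error among n declarations is 1 - (1 - p)^n:

```python
# Illustrative model only: assumes independent, per-declaration error
# probability p, which is a simplification of real LLM behavior.
def p_any_error(p, n):
    """Probability of at least one error among n emitted declarations."""
    return 1 - (1 - p) ** n

# A verbose IR expands "100x100 machines as 2D grid" into 10,000
# declarations; a density-preserving IR might express it in ~3 lines.
verbose = p_any_error(0.001, 10_000)  # near-certain to contain an error
compact = p_any_error(0.001, 3)       # well under one percent
```

Even a very low per-declaration error rate becomes a near-certain failure at 10,000 expansion points, which is why keeping the representation compact matters more than marginal gains in per-token accuracy.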

      The research makes a strong case for using Python as a density-preserving IR. Python's native capabilities allow for compact expression of regularity through loops, capturing hierarchy and composition via classes. The resulting code remains highly readable, making human validation more straightforward. Crucially, LLMs demonstrate strong code-generation capabilities in Python, allowing them to leverage their strengths while minimizing the verbose expansion that leads to error accumulation. This insight is particularly valuable for deploying edge AI solutions where compact, efficient models are critical, such as those delivered by ARSA's AI Box Series.
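The grid example above can be sketched directly: in Python, the regularity of "100x100 machines as 2D grid" is expressed once, in a few readable lines, rather than expanded into thousands of declarations (the dictionary layout below is illustrative, not a prescribed schema):

```python
# Density-preserving IR sketch: loops capture the regularity instead of
# expanding it. These few lines stand in for 10,000 explicit declarations.
machines = {
    (row, col): {"type": "Machine", "mean_delay": 4.0}
    for row in range(100)
    for col in range(100)
}

# Links between horizontally adjacent machines, again expressed once:
links = [((r, c), (r, c + 1)) for r in range(100) for c in range(99)]
```

A human validator inspects two comprehensions instead of ten thousand XML elements, and the LLM has only a handful of opportunities to introduce an error.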

Human-in-the-Loop: Empowering Domain Experts

      The problem scope—focusing on discrete-event simulation models for manufacturing systems with continuously evolving parameters but sporadic structural changes—creates unique opportunities for "expert-in-the-loop" workflows. For parameter inference, experts can configure data filters, time windows, distribution families, and sensor-to-model parameter mappings through intuitive graphical interfaces, allowing automated fitting to run continuously under these constraints. This ensures that the continuous calibration of the Digital Twin remains grounded in expert understanding.
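A minimal sketch of how such expert-set constraints could steer automated fitting (the configuration keys and filtering logic here are hypothetical, standing in for what a graphical interface would expose):

```python
import statistics

# Expert-tunable constraints, as might be set through a GUI:
config = {
    "window_seconds": 300,  # only fit against recent data
    "max_value": 60.0,      # expert-set outlier cutoff for sensor noise
}

def fit(samples, now, config):
    """Automated fitting that runs continuously, but only within the
    expert-configured time window and outlier bounds."""
    recent = [
        v for (t, v) in samples
        if now - t <= config["window_seconds"] and v <= config["max_value"]
    ]
    return {"mean": statistics.mean(recent), "stdev": statistics.stdev(recent)}

# (timestamp, task_delay) pairs: the t=100 sample falls outside the
# 300 s window at now=500, and 999.0 is filtered as an outlier.
samples = [(100, 4.0), (400, 4.0), (450, 6.0), (480, 999.0)]
params = fit(samples, now=500, config=config)
```

The expert never writes fitting code; they only bound the space within which automation operates, which is what keeps continuous calibration grounded in domain knowledge.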

      For structural modeling, factory operators and production engineers, who possess deep knowledge of their systems but may lack specialized M&S expertise, can describe system structure in natural language. LLM-assisted translation then produces formal models, which are subject to both human (visual) and automated rule-based or test-based validation. This democratization of modeling allows domain experts to directly contribute their invaluable knowledge, avoiding the pitfalls of purely data-driven approaches, where scarce, stale, or outlier-ridden data can lead to misinterpretations without human insight. ARSA Technology has been developing AI and IoT solutions since 2018 and understands the critical need for integrating domain expertise into complex system deployments.

The Impact of IR Choice on Model Accuracy and Reliability

      A key contribution of the research is a detailed characterization of LLM-induced errors across model descriptions of varying detail and complexity. This analysis reveals precisely how the choice of Intermediate Representation (IR) critically impacts error rates. When an IR is verbose and requires explicit declarations for every minor detail, LLM hallucinations accumulate, leading to less reliable models. Conversely, a density-preserving IR like Python, which can represent complex structures concisely, helps minimize these errors.

      This insight provides actionable guidance for building resilient and transparent LLM-assisted simulation automation workflows. By prioritizing IRs that minimize unnecessary expansion and leverage LLMs' strengths in code generation, developers can significantly enhance the accuracy and reliability of Digital Twin models. This is particularly important for enterprise deployments where the integrity of simulation results directly impacts operational decisions and potential ROI. For example, in real-time safety and compliance monitoring, errors in a Digital Twin could have severe consequences, highlighting the importance of robust AI video analytics that prioritize accuracy and reliability.

Transforming Manufacturing Operations with Intelligent Simulation

      The principles outlined in this research pave the way for a new generation of Digital Twins that are not only rapidly deployable but also robust, trustworthy, and adaptable. By addressing the inherent limitations of LLMs through strategic architectural design and systematic human oversight, industries can unlock the full potential of AI-assisted modeling. This means:

  • Faster Deployment: Rapidly creating complex Digital Twins from natural language descriptions.
  • Enhanced Reliability: Minimizing errors introduced by LLM hallucination through deliberate IR choices and validation.
  • Improved Adaptability: Continuous parameter tuning based on real-time sensor data, ensuring models reflect current operational realities.
  • Democratized Expertise: Empowering domain experts to contribute directly to model development, even without specialized M&S training.
  • Timely Decision Support: Providing accurate, real-time insights for optimization, what-if analysis, and proactive control in manufacturing environments.


      These advancements represent a significant step towards truly intelligent manufacturing, where Digital Twins provide dynamic, actionable intelligence to drive efficiency, reduce costs, and enhance safety across operations.

      To explore how advanced AI and IoT solutions can transform your manufacturing operations and build resilient Digital Twins for your enterprise, contact ARSA for a free consultation.