Advancing Cybersecurity: Large Language Models as Explainable Detectors for Energy Industrial Control Systems
Explore how Large Language Models (LLMs) are enhancing cybersecurity for energy Industrial Control Systems (ICS) by providing explainable, high-accuracy attack detection for Modbus traffic, crucial for critical infrastructure protection.
The Critical Need for Explainable Cyber Defense in Energy ICS
Modern energy grids rely heavily on Industrial Control Systems (ICS) and SCADA (Supervisory Control and Data Acquisition) protocols like Modbus to manage operations, exchange critical measurements, and issue commands. While these systems are fundamental to our infrastructure, they are also prime targets for sophisticated cyberattacks, including false-data injection. Detecting such intrusions is paramount, but simply flagging an anomaly isn't enough; operators need to understand why an alert was triggered. This demand for auditable and understandable intrusion detection is crucial for building trust, facilitating rapid response, and archiving incidents for future analysis.
The current landscape for ICS intrusion detection features a broad array of supervised machine learning techniques, from gradient-boosted trees to deep neural networks, which already achieve high accuracy on public benchmarks. However, deploying these solutions often requires extensive site-specific data collection, complex feature engineering, or continuous retraining. Furthermore, while some explainability methods exist, they don't always produce rationales that are easily interpretable or directly archivable by human operators. This creates a significant gap between advanced detection capabilities and practical, auditable deployment.
Bridging the Gap: LLMs as a Triage Layer for Modbus Traffic
Recent research explores a novel approach to bridge this gap: leveraging off-the-shelf Large Language Models (LLMs) as a complementary, human-in-the-loop layer for Modbus traffic analysis. The approach does not aim to replace existing supervised methods or to position LLMs as autonomous decision-makers; instead, the LLM serves as an intelligent triage layer. The primary goal is to empower operators by flagging Modbus traffic as either "normal" or "critical," allowing them to prioritize investigations effectively, while simultaneously generating a concise, archivable incident record for each alert.
This application formulates the problem as a binary classification task. All attack periods and other safety-critical behaviors observed in Modbus datasets are collapsed into a single "critical" class. This deliberate simplification provides a conservative gateway, surfacing traffic that warrants immediate human review, while multi-class attack attribution remains the purview of specialized classifiers or human analysts. Such a system offers a pragmatic option for organizations operating critical infrastructure: it adds AI-based triage without requiring model-specific weight updates or extensive retraining. ARSA's AI Box Series takes a similar plug-and-play approach to rapid deployment and edge processing.
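The label collapse described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function name `to_binary` and the set of label names treated as normal are assumptions, and real datasets will use their own label vocabularies.

```python
def to_binary(label: str, normal_names=("normal", "benign")) -> str:
    """Collapse a dataset label into the two triage classes.

    Anything not explicitly listed as normal is treated as critical,
    which keeps the mapping conservative. The names in `normal_names`
    are hypothetical and must be adapted to the dataset at hand.
    """
    return "normal" if label.strip().lower() in normal_names else "critical"
```

Defaulting unknown labels to "critical" mirrors the conservative-gateway idea: an unfamiliar label is surfaced for human review rather than silently passed.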
How LLMs Are Applied: From Protocol to Prediction
The methodology transforms raw Modbus communication instances into a compact token string. This token string is derived from discretizing various protocol fields, effectively converting structured industrial data into a format that LLMs can process. A prompt-configured LLM then receives this tokenized input. Unlike traditional machine learning models that require extensive, labeled datasets for training, this LLM operates on a few-shot learning principle, where its behavior is guided by a natural language prompt containing a small number of prototypical examples. This prompt encodes generic Modbus semantics and high-level ICS safety cues, enabling the LLM to render a normal/critical decision.
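To make the tokenization and prompting steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the field set in `ModbusRecord`, the bucket edges, the token names, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModbusRecord:
    # Hypothetical field set; the paper's exact fields may differ.
    function_code: int
    unit_id: int
    address: int
    value: int
    inter_arrival_ms: float


def bucket(value, edges, labels):
    """Return the label of the first bucket whose upper edge exceeds value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]  # value is at or above the last edge


def tokenize(rec: ModbusRecord) -> str:
    """Discretize protocol fields into a compact token string.

    Bucket edges below are illustrative assumptions only.
    """
    return " ".join([
        f"FC={rec.function_code}",
        f"UNIT={rec.unit_id}",
        "ADDR=" + bucket(rec.address, [100, 1000], ["LOW", "MID", "HIGH"]),
        "VAL=" + bucket(rec.value, [1000, 30000], ["LOW", "MID", "HIGH"]),
        "GAP=" + bucket(rec.inter_arrival_ms, [10.0, 1000.0],
                        ["FAST", "NORMAL", "SLOW"]),
    ])


def build_prompt(examples, query_tokens: str) -> str:
    """Assemble a few-shot prompt from (token_string, label) examples."""
    lines = [
        "You are reviewing Modbus traffic from an energy ICS.",
        "Answer with exactly one word: normal or critical.",
        "",
    ]
    for tokens, label in examples:
        lines += [f"Input: {tokens}", f"Label: {label}", ""]
    lines += [f"Input: {query_tokens}", "Label:"]
    return "\n".join(lines)
```

For example, `tokenize(ModbusRecord(16, 1, 5000, 65000, 2.0))` yields `FC=16 UNIT=1 ADDR=HIGH VAL=HIGH GAP=FAST`, which can then be passed to `build_prompt` together with a handful of labeled examples.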
The detection and auditing process is structured as a two-pass pipeline. The first pass outputs the binary label (normal or critical). The second pass then instructs the LLM to generate a brief explanation. This explanation cites verbatim key tokens from the Modbus communication instance and, whenever possible, suggests a minimal "what-if" edit to those tokens that would have altered the model's decision. This dual output provides not only a decision but also contextual evidence, making the system significantly more transparent and actionable for operators. For enterprises requiring robust and flexible AI capabilities for real-time monitoring and security, integrating such systems with ARSA AI API offerings can streamline deployment and enhance existing security infrastructure.
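The two-pass flow can be sketched as follows. This is a hedged illustration rather than the paper's pipeline: the `llm` argument is any callable that maps a prompt string to a reply string, the prompt wording and the `Evidence:` / `What-if:` reply format are assumptions, and `fake_llm` is a hard-coded stand-in so the example runs without a real model.

```python
from typing import Callable


def triage(tokens: str, llm: Callable[[str], str]) -> dict:
    """Run a label pass, then an explanation pass, and return one record."""
    # Pass 1: ask only for the binary label.
    label_prompt = (
        "Classify the Modbus token string as normal or critical.\n"
        f"Input: {tokens}\nLabel:"
    )
    raw = llm(label_prompt).strip().lower()
    # Assumed policy: anything that is not exactly "normal" goes to review.
    label = "normal" if raw == "normal" else "critical"

    # Pass 2: ask for cited tokens and an optional what-if edit.
    explain_prompt = (
        f"The input below was labelled {label}.\n"
        "Quote the key tokens verbatim on a line starting with 'Evidence:'.\n"
        "If possible, give a minimal token edit that would flip the label "
        "on a line starting with 'What-if:'.\n"
        f"Input: {tokens}\n"
    )
    evidence, what_if = [], None
    present = set(tokens.split())
    for line in llm(explain_prompt).splitlines():
        if line.startswith("Evidence:"):
            cited = line[len("Evidence:"):].split()
            # Keep only tokens that really occur in the input.
            evidence = [t for t in cited if t in present]
        elif line.startswith("What-if:"):
            what_if = line[len("What-if:"):].strip() or None
    return {"tokens": tokens, "label": label,
            "evidence": evidence, "what_if": what_if}


def fake_llm(prompt: str) -> str:
    """Hard-coded stand-in for a real model, for demonstration only."""
    if prompt.startswith("Classify"):
        return "critical" if "FC=16" in prompt else "normal"
    return "Evidence: FC=16 VAL=HIGH BOGUS\nWhat-if: FC=16 -> FC=3"
```

Filtering the cited tokens against the input is one simple way to keep the incident record grounded in the actual message: in this sketch, a token the model invents (such as `BOGUS`) is dropped before the record is archived.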
The Power of Auditability: Explaining AI's Decisions
A key innovation in this approach is the focus on auditability. The LLM's incident record, grounded in actual protocol tokens, is designed to serve as an audit signal, offering operators a tangible basis for understanding alerts. To objectively assess the quality of these explanations, intervention-based diagnostics are employed. These include sufficiency- and necessity-style tests, which provide quantitative evidence that the cited tokens are, in fact, relevant to the model's prediction. Sufficiency tests determine if a subset of tokens is enough for the model to make the same prediction, while necessity tests check if removing certain tokens changes the prediction.
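The two diagnostics can be illustrated with a short Python sketch. It assumes that non-cited or cited tokens are replaced with a `[MASK]` placeholder rather than deleted, which is one possible intervention and not necessarily the one used in the paper; `classify` stands for any function that maps a token string to a label, and `toy_classify` is a hypothetical rule used only to make the example runnable.

```python
from typing import Callable, Collection, Sequence

MASK = "[MASK]"  # assumed placeholder for a removed token


def sufficiency(tokens: Sequence[str], cited: Collection[str],
                classify: Callable[[str], str]) -> bool:
    """True if the cited tokens alone (others masked) keep the original label."""
    original = classify(" ".join(tokens))
    kept = [t if t in cited else MASK for t in tokens]
    return classify(" ".join(kept)) == original


def necessity(tokens: Sequence[str], cited: Collection[str],
              classify: Callable[[str], str]) -> bool:
    """True if masking the cited tokens changes the original label."""
    original = classify(" ".join(tokens))
    ablated = [MASK if t in cited else t for t in tokens]
    return classify(" ".join(ablated)) != original


def toy_classify(text: str) -> str:
    """Hypothetical rule: a write (FC=16) of a high value is critical."""
    parts = text.split()
    return "critical" if "FC=16" in parts and "VAL=HIGH" in parts else "normal"
```

With `tokens = ["FC=16", "UNIT=1", "VAL=HIGH"]`, citing `{"FC=16", "VAL=HIGH"}` passes both checks under the toy rule, whereas citing only `{"UNIT=1"}` fails both, which is the pattern one would expect from an irrelevant citation.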
While these records provide token-grounded decision-relevance, it's important to note that they are not presented as full, human-grounded explanations. Improving interpretability through rigorous domain-expert evaluation remains an area for future development. However, the ability to generate such records on-the-fly, particularly in sensitive environments like those managed by government and critical infrastructure operators, marks a significant step towards trustworthy AI deployments. Such transparency is crucial in sectors where decisions carry high stakes, aligning with the "Human-Centered Innovation" core value embraced by companies like ARSA, which strives to enhance human capability and ensure ethical AI usage, drawing on experience gained since 2018.
Performance and Practicality: A New Paradigm for ICS Security
The research demonstrates that this LLM-based triage pipeline achieves high predictive performance on two public Modbus benchmarks (LeMay CSET’16 and CIC Modbus 2023), proving broadly comparable to strong supervised baselines. Remarkably, this is achieved without requiring any task-specific weight updates, relying solely on prompt configuration and a few examples. Across both datasets, the LLM achieved approximately 0.98 accuracy with high recall on the critical class, indicating its reliability in identifying potential threats.
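For readers who want to compute the same two metrics on their own triage output, here is a minimal Python sketch. The function name is an assumption, and any labels passed to it in examples are toy values, not results from the paper.

```python
def accuracy_and_critical_recall(y_true, y_pred):
    """Return (accuracy, recall on the 'critical' class).

    Recall is the share of truly critical instances that were flagged
    as critical; it is NaN when no critical instance is present.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    critical_total = sum(t == "critical" for t, _ in pairs)
    critical_hit = sum(t == "critical" and p == "critical" for t, p in pairs)
    recall = critical_hit / critical_total if critical_total else float("nan")
    return correct / len(pairs), recall
```

Reporting recall on the critical class alongside accuracy matters for triage: a missed critical message is costlier than a false alarm, and accuracy alone can hide such misses when normal traffic dominates.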
The practical value of this approach is multifaceted:
- Competitive Detection: The LLM's performance rivals that of traditional supervised models, offering a powerful alternative or complement.
- Prompt-Only Deployment: This significantly reduces the need for extensive training data and retraining cycles, making deployment faster and more adaptable to evolving threat landscapes.
- Auditable Record: The generated incident records provide a concrete basis for operator review and incident response, fostering trust and operational efficiency.
- Deployment Flexibility: The system can be deployed as a passive decision-support layer over existing Modbus messages, minimizing disruption to critical operations.
This capability to quickly adapt and provide actionable insights without extensive model training offers a compelling solution for the dynamic challenges of industrial cybersecurity. ARSA Technology, for instance, offers robust AI Video Analytics that can monitor and detect anomalies in real-time, showcasing a commitment to practical, deployable AI for operational intelligence across various industries.
Future Implications for Critical Infrastructure Security
The integration of Large Language Models as explainable cyberattack detectors holds immense promise for the energy sector and other critical infrastructures. By providing accurate, auditable, and easily deployable solutions, LLMs can transform how organizations approach cybersecurity, moving towards more transparent and proactive defense mechanisms. This research underlines a significant shift in leveraging general-purpose AI for highly specialized and sensitive applications, making advanced security intelligence more accessible and actionable for human operators.
To explore how AI and IoT solutions can enhance the security and operational intelligence of your enterprise's critical infrastructure, we invite you to contact ARSA for a free consultation.
Source: Kong, W., Saber, A.M., Youssef, A., & Kundur, D. (2026). Large Language Models as Explainable Cyberattack Detectors for Energy Industrial Control Systems. Proceedings of ACM EnergySP 2026, co-located with ACM e-Energy 2026. Available at: https://arxiv.org/abs/2604.26079