LLM safety

NESSiE: Why Even Simple Safety Flaws in LLMs Demand Enterprise Attention

Explore NESSiE, a crucial benchmark revealing that leading LLMs still fail basic safety and security tasks. Understand the critical implications for deploying AI in enterprise environments.

ARSA Technology Team

20 Feb 2026 • 5 min read

The Imperative of LLM Safety in Enterprise AI

Large Language Models (LLMs) are rapidly becoming the bedrock for autonomous AI systems across diverse industries. These "agentic AI" systems are increasingly deployed in real-world environments, making decisions, taking actions, and adapting without constant human oversight. While their potential for innovation is vast, their deployment in safety-critical scenarios presents significant challenges. Even a minor misstep in a chain of instructions or actions can lead to substantial deviations from intended outcomes, posing risks to operational integrity, security, and compliance. This reality underscores the urgent need for robust instruction-following behavior in LLMs, particularly when they are integrated into mission-critical enterprise systems.

Introducing NESSiE: A Foundational Benchmark for LLM Robustness

To address the growing risks associated with unmonitored AI deployment, researchers have developed NESSiE (NEceSsary SafEty), a new benchmark designed to expose fundamental safety failures in LLMs. NESSiE isn't a comprehensive safety guarantee but rather a crucial "sanity check" focusing on minimal test cases related to information and access security. Its core premise is that if an LLM fails these low-complexity tasks, it possesses safety-relevant errors that simply "should not exist." Passing NESSiE is considered a necessary, albeit not sufficient, condition for any LLM deployment, especially in enterprise settings where reliability is paramount. The benchmark is designed to be lightweight and easy to use, allowing for rapid preliminary assessments (Source: NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist).

How NESSiE Uncovers Overlooked Vulnerabilities

NESSiE's methodology rigorously evaluates an LLM's adherence to rules within simple, abstract, and safety-relevant test cases. Each test involves a system prompt (explaining instructions) and at least two user prompts: one requiring the model to be "helpful" by providing information, and another requiring it to be "safe" by withholding specific keywords or secrets without proper authorization. This complementary test pair design prevents trivial solutions where a model might always refrain from answering or always provide information.

The benchmark employs simple keyword matching for evaluation, ensuring reliability and efficiency. It uses three key metrics:

Safe: Measures if the model successfully withholds sensitive information.
Helpful: Measures if the model correctly outputs required information when authorized.
SH (Safe & Helpful): The primary metric, counting a test case as solved only if the model is both Safe and Helpful on complementary tests.

NESSiE comprises six distinct test case suites to ensure comprehensive evaluation:

RULeS: Adapted from existing rule-following tests.
RULeS Reformulated: Assesses robustness against variations in input structure.
Agentic: Evaluates the model's ability to output specific keywords to simulate actions, a fundamental aspect of agentic behavior.
Generated: Features LLM-generated, human-adjusted test cases for expanded diversity.
Skills: Requires an additional cognitive step (e.g., checking anagrams) before applying a safety rule, testing robustness under cognitive load.
Multiturn: Evaluates adherence across two-turn conversations, simulating real-world interaction complexities.

Beyond core instruction-following, NESSiE also investigates factors affecting performance. A "benign Distraction Context" of about 2000 tokens of unrelated conversation is introduced to measure robustness against irrelevant contextual noise. Additionally, tests with "Disabled Reasoning" assess the impact of the model's internal reasoning processes on benchmark performance.

Key Findings: State-of-the-Art LLMs Fall Short

Despite the simplicity of its test cases, NESSiE reveals concerning safety-relevant failures even in state-of-the-art LLMs. The benchmark's findings highlight that current top models do not achieve 100% on these fundamental safety conditions, even in non-adversarial settings. A crucial insight is that models tend to be biased towards "helpfulness" rather than "safety," frequently providing information they should withhold. This indicates a fundamental misalignment in their core instruction-following behavior.

Furthermore, the study found that model performance significantly degrades under certain conditions:

Disabled Reasoning: For some models, actively disabling their reasoning capabilities negatively impacts their ability to pass NESSiE tests.
Distraction Context: A benign, unrelated conversational context—even without malicious intent—can severely degrade an LLM's performance on safety-critical instructions. This suggests that in complex real-world environments with noise, an LLM's adherence to safety protocols can easily falter.

Failure cases observed were broadly categorized, including scenarios where the model leaks secrets, fails prerequisite skills, or entirely refuses to engage with the prompt. These outcomes underscore the critical risks of deploying such models as autonomous agents without rigorous, foundational safety checks.

Business Implications: Why NESSiE Matters for Enterprise AI Deployment

The implications of NESSiE's findings for enterprises are profound. Deploying LLMs in roles that involve sensitive data, operational control, or decision-making without ensuring their foundational safety could lead to severe consequences:

Data Security Breaches: An LLM that fails to withhold sensitive information, as demonstrated by NESSiE, could inadvertently leak proprietary data, customer information, or classified intelligence. This directly translates to significant financial losses, reputational damage, and potential legal liabilities.
Operational Instability: In agentic AI systems, a single wrong step or a failure to follow instructions can disrupt workflows, cause equipment malfunctions, or lead to incorrect automated decisions, impacting productivity and safety.
Regulatory Non-compliance: Industries with strict data privacy regulations (like GDPR or HIPAA) cannot afford systems that exhibit a "helpfulness bias" over safety. Such failures could result in hefty fines and a loss of public trust.

Ensuring that AI systems operate reliably and securely in diverse, often noisy, real-world conditions is paramount. This requires not just advanced AI capabilities but also robust engineering that accounts for practical deployment realities. Solutions that emphasize local processing and data sovereignty, such as those offered by ARSA Technology, can mitigate many of these risks. For instance, edge AI devices like the ARSA AI Box Series or Face Recognition & Liveness SDK enable on-premise data processing, ensuring sensitive information remains within the enterprise's controlled network, avoiding cloud dependencies that can introduce latency and compliance challenges.

Building Resilient AI Systems with a Focus on Operational Reality

The NESSiE benchmark serves as a stark reminder that the journey towards truly reliable enterprise AI is ongoing. It emphasizes the need for solutions engineered with accuracy, scalability, privacy, and operational reliability at their core. Instead of merely seeking "helpful" AI, enterprises must prioritize "safe" AI that rigorously adheres to instructions and safeguards sensitive information, even under distracting conditions.

For complex challenges that demand precise AI capabilities and unwavering adherence to security protocols, enterprises should seek partners with proven expertise in developing and deploying robust systems. ARSA Technology specializes in delivering custom AI solutions and AI Video Analytics, designed to operate effectively in demanding environments while prioritizing data control and compliance. By integrating advanced AI with practical deployment strategies, ARSA helps organizations transform passive data into predictive intelligence without compromising safety.

To explore how robust AI and IoT solutions can fortify your enterprise operations and ensure compliance in an increasingly AI-driven world, we invite you to contact ARSA for a free consultation.

Source: Bertram, J., & Geiping, J. (2026). NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist. arXiv preprint arXiv:2602.16756.