Enhancing Geotechnical AI: The Power of Formal Verification for ML Models

Discover how formal verification using SMT solvers ensures the physical consistency and reliability of machine learning models for critical geotechnical hazard predictions like lateral spreading, mitigating unseen risks in AI deployment.

The Unseen Risks of AI in Geotechnical Engineering

      Machine learning (ML) models are increasingly becoming indispensable tools for predicting complex geotechnical hazards, such as earthquake-induced liquefaction and lateral spreading. These advanced algorithms can unearth intricate, non-linear relationships within vast datasets—connections often beyond the scope of traditional empirical methods. For enterprises operating in seismic zones or developing critical infrastructure, the promise of AI-driven predictions offers significant potential for enhanced safety, cost reduction, and more resilient project planning. However, the sophistication of these models can also obscure critical flaws: the potential for learning physically inconsistent relationships, especially when trained on sparse or biased real-world data.

      Consider the challenge of lateral spreading, a liquefaction-induced hazard in which the ground displaces laterally during strong earthquakes, typically toward a free face such as a riverbank. A physically consistent model should always predict an increased likelihood of spreading with stronger ground shaking, measured by Peak Ground Acceleration (PGA). Yet studies have shown that unconstrained AI models can, in certain data-sparse regions, predict the opposite: a decrease in spreading probability with increased PGA. Despite achieving high overall accuracy, such a fundamental physical error could lead to catastrophic misjudgments in critical infrastructure design and public safety, highlighting the urgent need for a more rigorous approach to AI model assurance.

      This crucial issue underscores a gap in current AI deployment practices within safety-critical applications. While ML models offer unparalleled predictive power, their "black box" nature can conceal behaviors that defy fundamental physical laws, producing untrustworthy predictions exactly where accuracy matters most. This article explores a rigorous methodology, formal verification via Satisfiability Modulo Theories (SMT) solvers, designed to exhaustively assure the physical consistency of AI models, ensuring they operate within accepted scientific boundaries before deployment. The insights presented here are drawn from the paper "Formal verification of tree-based machine learning models for lateral spreading" by Krishna Kumar (2026), available at arXiv:2603.16983.

Beyond Accuracy: Why Physical Consistency Matters

      In high-stakes fields like geotechnical engineering, merely achieving high predictive accuracy is insufficient. A model might correctly predict the outcome for 99% of common scenarios, but if it fails catastrophically in the remaining 1%—especially edge cases where physical intuition is paramount—it can undermine public trust and lead to severe consequences. Traditional methods for addressing these inconsistencies, such as post-hoc explainability techniques like SHAP and LIME, can diagnose problems in individual predictions or specific regions. However, these methods are approximate and cannot provide an exhaustive guarantee that a model is entirely free from non-physical behavior across its entire operational domain.

      Similarly, training-time constraints, which guide the model to learn certain behaviors (e.g., monotonicity for a specific feature), are valuable but often limited. They typically operate on single features and struggle to enforce complex, compound conditions, such as "low shaking intensity combined with a large distance from a free face should never predict a high risk of lateral spreading." Furthermore, these constraints don't inherently provide a formal proof that the modified model definitively satisfies the intended property across all possible input scenarios. The challenge remains to bridge the gap between high predictive performance and guaranteed physical consistency.

      The absence of robust assurance tools creates significant risk for enterprises adopting AI in safety-critical contexts. Without exhaustive verification, organizations are deploying models with unknown vulnerabilities, exposing themselves to potential failures that could compromise infrastructure, endanger lives, and incur substantial financial and reputational costs. Ensuring the reliability and trustworthiness of AI systems in these environments demands a method that can offer absolute certainty, not just approximate diagnostics.

Formal Verification with SMT: A New Standard for AI Assurance

      Formal verification, particularly through Satisfiability Modulo Theories (SMT) solvers, offers a powerful solution to this challenge. An SMT solver is a sophisticated tool that determines whether a logical formula, potentially involving various theories like linear real arithmetic, is "satisfiable" (meaning a scenario exists where the formula is true) or "unsatisfiable" (meaning no such scenario exists). In the context of AI, this translates to encoding the behavior of a trained machine learning model and a set of desired physical specifications into a logical framework.
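
      As a toy illustration of "satisfiable" versus "unsatisfiable," consider conjunctions of bounds on a single real variable, where simple interval intersection decides the question. Real SMT solvers handle vastly richer formulas, but the verdicts have the same shape: a concrete witness when satisfiable, a proof of impossibility when not. The sketch below is illustrative only and is not how an SMT solver is implemented:

```python
def check_bounds(constraints):
    """Decide satisfiability of a conjunction of bounds on one real x.
    Each constraint is a pair (op, c) with op in {'>=', '<='}.
    Interval intersection is a toy stand-in for the decision procedures
    an SMT solver runs over much richer theories."""
    lo, hi = float('-inf'), float('inf')
    for op, c in constraints:
        if op == '>=':
            lo = max(lo, c)   # tighten the lower bound
        else:
            hi = min(hi, c)   # tighten the upper bound
    if lo <= hi:
        # Satisfiable: return a concrete model (witness value for x)
        witness = lo if lo != float('-inf') else (hi if hi != float('inf') else 0.0)
        return ('sat', witness)
    # Unsatisfiable: proven that no x can satisfy every bound at once
    return ('unsat', None)

print(check_bounds([('>=', 0.3), ('<=', 0.6)]))  # → ('sat', 0.3)
print(check_bounds([('>=', 0.6), ('<=', 0.3)]))  # → ('unsat', None)
```

      The same two outcomes drive verification: a "sat" result on a negated specification yields a counterexample input, while "unsat" proves the specification holds everywhere.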

      Tree-based ensemble models, such as XGBoost and Explainable Boosting Machines (EBMs), are particularly well-suited for this approach. These models effectively partition the vast input space into a finite number of hyperrectangular regions, each producing a constant output prediction. By taking a physical specification (e.g., "PGA monotonicity") and negating it (e.g., "PGA monotonicity fails") and then incorporating the model’s partition, we can create a quantifier-free logical formula. The SMT solver then precisely evaluates this formula. If the formula is satisfiable, the solver produces a concrete counterexample—a specific combination of input features where the model violates the physical rule. If the formula is unsatisfiable, it mathematically proves that no such violation exists across the entire input domain.
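
      To make the encoding concrete, here is a minimal pure-Python sketch of the check an SMT solver automates. A toy two-feature model (PGA, groundwater depth) is written out as its leaf hyperrectangles, and the negated PGA-monotonicity specification becomes a search for a witnessing pair of inputs. All leaf boundaries and probabilities below are hypothetical; a real verification would hand the full ensemble and specification to an SMT solver rather than enumerate pairs by hand:

```python
# Each leaf of a toy tree model, as a box with a constant prediction:
# (pga_low, pga_high, gwd_low, gwd_high, predicted_probability).
# All values are hypothetical, for illustration only.
LEAVES = [
    (0.0, 0.2, 0.0, 2.0, 0.30),
    (0.2, 0.6, 0.0, 2.0, 0.70),
    (0.6, 1.0, 0.0, 2.0, 0.55),  # higher PGA but LOWER probability
    (0.0, 1.0, 2.0, 5.0, 0.10),
]

def find_monotonicity_counterexample(leaves):
    """Search leaf pairs for a PGA-monotonicity violation: same GWD box,
    strictly higher PGA range, strictly lower predicted probability."""
    for lo in leaves:
        for hi in leaves:
            same_gwd = lo[2] == hi[2] and lo[3] == hi[3]
            higher_pga = hi[0] >= lo[1]   # hi's PGA range sits above lo's
            lower_prob = hi[4] < lo[4]
            if same_gwd and higher_pga and lower_prob:
                # Concrete counterexample: the midpoint of each box
                x_lo = ((lo[0] + lo[1]) / 2, (lo[2] + lo[3]) / 2)
                x_hi = ((hi[0] + hi[1]) / 2, (hi[2] + hi[3]) / 2)
                return x_lo, x_hi
            # (continue searching otherwise)
    return None  # "unsatisfiable": no violation anywhere in the partition

cex = find_monotonicity_counterexample(LEAVES)
print(cex)  # → ((0.4, 1.0), (0.8, 1.0)): raising PGA lowered the prediction
```

      The returned pair is exactly the kind of concrete counterexample an SMT solver reports on a "sat" result; a `None` result here corresponds to "unsat," a proof over the finite partition that no violating input exists.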

      This methodology establishes a "verify-fix-verify" engineering loop. When a counterexample is found, it provides precise insights into where and why the model misbehaves. This information can then be used to apply targeted corrections, such as specific training constraints, before the model is re-verified. This iterative process allows engineers to progressively improve the physical consistency of their AI models, moving beyond approximations to deliver verifiable assurances. For enterprises like ARSA Technology, which specializes in AI Video Analytics and edge AI systems, integrating such formal verification ensures that deployed solutions meet the highest standards of reliability and compliance for critical applications.

Putting it to the Test: Geotechnical Specifications and ML Models

      To demonstrate the efficacy of SMT-based formal verification, researchers applied this methodology to ML models trained on a real-world dataset: the 2011 Christchurch earthquake lateral spreading data. This comprehensive dataset comprised 7,291 sites, each characterized by four key geotechnical features: groundwater depth (GWD), distance to a free face (L), ground slope, and Peak Ground Acceleration (PGA). Two popular tree-based ensemble models were tested: XGBoost, a widely used gradient-boosted tree ensemble, and Explainable Boosting Machines (EBMs), a type of Generalized Additive Model that provides more inherent interpretability through univariate shape functions.

      Four crucial geotechnical specifications were formalized as logical formulas for verification:

  • Water Table Depth Safety (Specification A): A basic threshold ensuring that if the water table is sufficiently deep, the risk of lateral spreading should be low.
  • PGA Monotonicity (Specification B): The fundamental principle that an increase in PGA (stronger shaking) should not decrease the predicted probability of lateral spreading.
  • Distance Safety (Specification C): A compound condition stating that if a site is far from a free face and experiences weak ground shaking, the risk of lateral spreading should be low.
  • Flat-Ground Safety (Specification D): Another compound rule stipulating that for flat ground (low slope) and sufficiently deep groundwater, the lateral spreading risk should be minimal.
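
      Written as executable predicates over a feature vector and the model's predicted probability, the four rules might look like the sketch below. The thresholds here are illustrative placeholders, not the paper's Christchurch-calibrated values:

```python
# Each specification maps (features, predicted probability p) -> bool.
# All thresholds are illustrative placeholders, NOT the calibrated
# values from the Christchurch study.
LOW_RISK = 0.3   # maximum probability we treat as "low risk"

def spec_A_water_table(gwd_m, p):
    """A: a deep water table (e.g., > 4 m) implies low predicted risk."""
    return p <= LOW_RISK if gwd_m > 4.0 else True

def spec_B_pga_monotonic(p_low_pga, p_high_pga):
    """B: raising PGA (all else fixed) must not lower the prediction."""
    return p_high_pga >= p_low_pga

def spec_C_distance(pga_g, dist_m, p):
    """C: weak shaking AND far from a free face implies low risk."""
    return p <= LOW_RISK if (pga_g < 0.1 and dist_m > 100.0) else True

def spec_D_flat_ground(slope_pct, gwd_m, p):
    """D: flat ground AND deep groundwater implies low risk."""
    return p <= LOW_RISK if (slope_pct < 0.5 and gwd_m > 3.0) else True
```

      A sampled check against such predicates can surface violations, but only the SMT encoding can prove that no input anywhere in the domain falsifies them.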


      These specifications, calibrated to the Christchurch dataset, represent critical safety boundaries that any reliable geotechnical model must respect. The verification framework itself is highly adaptable, allowing engineers to define custom thresholds and rules pertinent to their specific sites and operational requirements. Implementing solutions that adhere to such stringent safety criteria is paramount in industrial and public safety sectors, where ARSA Technology's AI BOX - Basic Safety Guard solutions offer real-time monitoring and alert systems, ensuring compliance and preventing incidents.

Key Findings: Unveiling the Accuracy-Consistency Trade-Off

      The formal verification process revealed compelling insights into the behavior of the machine learning models. The initial, unconstrained XGBoost model, while achieving a respectable 82.5% predictive accuracy, failed all four physical specifications. This demonstrates that raw accuracy metrics alone are insufficient to guarantee safe deployment, as the model harbored fundamental physical inconsistencies across its operational domain. When monotone constraints were applied to XGBoost during training, aiming to enforce desired directional relationships for individual features, the accuracy dropped to 69.3%. While this constrained model successfully satisfied the PGA monotonicity specification, it continued to violate the more complex, compound specifications (C and D) that single-feature constraints couldn't fully address.

      Similar patterns emerged with Explainable Boosting Machines (EBMs). An unconstrained EBM model achieved 80.1% accuracy but also violated all four specifications. Applying a monotone constraint specifically to PGA in the EBM fixed Specification B but left the other three violated. Only a fully constrained EBM, with monotone constraints applied to all four features, managed to satisfy three of the four specifications, though it still violated the water table depth safety (Specification A) by a thin margin, achieving an accuracy of 67.2%. This iterative "verify-fix-verify" process, guided by SMT counterexamples, systematically improved physical consistency.
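
      Because an EBM scores each feature through a univariate shape function, one way to impose a monotone constraint on, say, the PGA shape function is to repair it directly with the classic pool-adjacent-violators algorithm. The sketch below is illustrative only and is not necessarily how any particular EBM library enforces its constraints:

```python
def pava_increasing(values, weights=None):
    """Pool Adjacent Violators: the weighted-least-squares-closest
    non-decreasing sequence to `values`. Applied to an EBM's per-bin
    PGA shape function, this repairs monotonicity violations without
    retraining. (Illustrative sketch, not a specific library's method.)"""
    w = list(weights) if weights else [1.0] * len(values)
    blocks = []  # each block: [mean, total weight, count of pooled bins]
    for v, wi in zip(values, w):
        blocks.append([v, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    out = []
    for m, _, c in blocks:
        out.extend([m] * c)  # expand each pooled block back to its bins
    return out

# A shape function with a dip (non-monotone bins) gets flattened:
repaired = pava_increasing([0.1, 0.5, 0.3, 0.7])
print(repaired)  # → [0.1, 0.4, 0.4, 0.7]: the dip pooled to its mean
```

      Note that this repairs only a single feature's shape; as the results above show, single-feature monotonicity cannot by itself discharge compound specifications such as C and D.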

      A deeper Pareto analysis across 33 different model variants starkly revealed a persistent and critical trade-off: no single model among those studied achieved both greater than 80% accuracy and full compliance with the entire set of geotechnical specifications. This highlights a concrete cost associated with ensuring physical consistency, a crucial factor for practitioners to evaluate when making deployment decisions. Furthermore, the study underscored the limitations of post-hoc explainability methods; SHAP analysis of specification counterexamples sometimes showed the offending feature ranking last in importance, demonstrating that such explanations cannot replace the exhaustive guarantees provided by formal verification.

Implications for Enterprise and Safety-Critical Deployments

      These findings carry profound implications for enterprises deploying AI in safety-critical sectors. The persistent trade-off between predictive accuracy and physical consistency demands a strategic approach to AI development, where verifiable reliability is prioritized alongside performance. For industries such as construction, energy, and smart city infrastructure, where failure can result in significant financial losses, environmental damage, and loss of life, formal certification of AI models is not merely an academic exercise—it is an operational imperative.

      The "verify-fix-verify" loop enabled by SMT solvers provides a robust pathway to achieve this certification. By diagnosing specific violations and iteratively refining models, organizations can build AI systems that are not only intelligent but also trustworthy and compliant with fundamental engineering principles. This level of assurance is vital for gaining regulatory approval and public acceptance of AI technologies in critical applications. ARSA Technology understands these demands, leveraging advanced AI and IoT solutions across various industries, from public safety to industrial automation, always with a strong emphasis on practical deployment realities and measurable outcomes. This commitment ensures that their solutions meet the stringent requirements of demanding operational environments, providing partners with dependable and compliant AI systems.

Conclusion: Towards Certified AI for Geotechnical Safety

      The integration of formal verification techniques like SMT solvers marks a significant step forward in making AI truly reliable for safety-critical applications. By transforming complex machine learning models into verifiable logical formulas, engineers can rigorously test and prove their physical consistency across the entire input domain. This process moves beyond approximate diagnostics and partial constraints, offering a powerful framework for developing AI systems that are not only accurate but also inherently trustworthy.

      For global enterprises venturing into AI-driven geotechnical hazard prediction, this approach provides the necessary tools to navigate the accuracy-consistency trade-off with confidence, ensuring that their deployments enhance safety and operational resilience without introducing unforeseen risks. Adopting a verify-fix-verify engineering loop is essential for the future of AI in critical infrastructure and public safety.

      To explore how ARSA Technology can assist your organization in deploying robust and formally verified AI solutions for your unique operational challenges, we invite you to contact ARSA for a free consultation.