
Enhancing STEM Education: How AI-Powered Feedback Transforms Student Diagram Analysis

Discover Sketch2Feedback, an AI framework that revolutionizes rubric-aligned feedback for student STEM diagrams. Learn how modular AI engineering overcomes hallucination, providing accurate, actionable insights for educators.

ARSA Technology Team

24 Feb 2026

The Challenge of Timely Feedback in STEM Education

In fields like physics and electrical engineering, visual representations such as free-body diagrams and circuit schematics are fundamental. These diagrams are more than just illustrations; they are powerful tools for students to externalize complex conceptual structures—from the balance of forces to the intricate topology of an electrical circuit. However, a significant challenge for educators globally remains: how to provide timely, specific, and actionable feedback on these hand-drawn diagrams, especially at scale. The current bottleneck often leads to delays, hindering the learning process and student progress.

The rise of Large Multimodal Models (LMMs), such as LLaVA and GPT-4V, has introduced powerful capabilities for parsing images and generating natural language explanations. While impressive, these advanced AI systems often fall short in critical educational applications due to a tendency to "hallucinate"—confidently describing elements or errors that do not exist in the student’s drawing. This lack of reliability erodes trust, making their direct deployment in the classroom problematic for formative feedback.

Introducing Sketch2Feedback: A Grammar-in-the-Loop AI Framework

To address the limitations of end-to-end LMMs in educational contexts, researchers have developed Sketch2Feedback, a "grammar-in-the-loop" framework. This innovative pipeline is designed to provide rubric-aligned feedback on student-drawn STEM diagrams by decomposing the complex task into a series of verifiable stages. The core idea is to separate raw perception from symbolic reasoning and then from language generation, ensuring that any feedback provided by the AI is grounded in concrete, verifiable evidence.

This modular approach prioritizes precision over broad recall. While the system might not detect every single error, any error it does report traces back to a specific violation found by the rule engine that sits upstream of the language model. This design significantly mitigates the risk of the language model hallucinating, which is paramount for building trust and ensuring the utility of AI-driven feedback in a classroom setting. Enterprises that need high accuracy and verifiable outcomes from AI, such as those relying on custom AI solutions for critical operations, can benefit from a similar architecture.

Breaking Down the Framework's Stages

The Sketch2Feedback pipeline operates through four distinct stages to ensure accuracy and relevance:

1. Hybrid Primitive Detection: This initial stage identifies the fundamental graphical elements within a student’s diagram. It combines several classical computer vision (CV) techniques for robustness. Contrast normalization and adaptive thresholding clean up the image, while contour analysis with filters recognizes specific shapes such as force arrows or distinct components. The probabilistic Hough line transform (HoughLinesP in OpenCV) is run on edge maps to find wires in circuit diagrams, and shape-based classification distinguishes circuit components by their aspect ratio, circularity, and solidity. Small-blob detection, filtered by circularity, identifies junctions. Non-maximum suppression is applied to prevent redundant detections.
2. Symbolic Graph Construction: Once the primitive elements are detected, they are translated into a structured, typed graph. Each detected element (an arrow, wire, or component) becomes a node carrying its type (e.g., resistor, battery, force vector), confidence score, and bounding box coordinates. Edges are then drawn between nodes that are spatially proximate (e.g., within 80 pixels), establishing the topological relationships between different parts of the diagram. The result is a symbolic representation that a rule engine can process logically.
3. Constraint Checking: This is where the "grammar-in-the-loop" truly comes into play. The symbolically constructed graph is systematically checked against a predefined set of domain-specific rules or "constraints" derived from the scenario key (e.g., a specific physics problem or circuit design). These constraints include both local checks (e.g., ensuring required forces are present, directions are consistent, components are correctly connected, polarity is accurate, or a ground reference is present) and non-local checks (e.g., approximate force balance for static free-body diagrams or proper junction semantics for wire crossings). Any deviation from these rules generates a list of verified violations.

4. Constrained VLM Feedback Generation: In the final stage, a compact Vision-Language Model (VLM), specifically Qwen2-VL-2B-Instruct in this framework, receives only the image of the student's diagram and the verified list of violations from the constraint checker. Crucially, the VLM is not allowed to freely interpret the image or generate unconstrained captions. This tight constraint is the primary mechanism for preventing hallucination. The VLM's task is solely to verbalize the detected and verified violations as rubric-aligned, actionable feedback. If the VLM is unavailable, the system can fall back to structured, domain-specific templates to maintain high actionability.
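
To make stages 2 through 4 concrete, here is a minimal Python sketch of the idea, not the framework's actual code. The `Primitive` fields, the center-to-center distance test, the two example rules, the violation codes, and the template wording are all assumptions made for illustration; only the 80-pixel proximity threshold and the template fallback come from the description above.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Primitive:
    """One detected element. Field names are illustrative assumptions."""
    id: str
    kind: str          # e.g. "battery", "resistor", "ground"
    confidence: float
    bbox: tuple        # (x_min, y_min, x_max, y_max) in pixels

    @property
    def center(self):
        x0, y0, x1, y1 = self.bbox
        return ((x0 + x1) / 2, (y0 + y1) / 2)


def build_graph(primitives, max_dist=80.0):
    """Stage 2: link primitives whose centers lie within max_dist pixels.

    Measuring proximity between bounding-box centers is an assumption.
    """
    edges = set()
    for a, b in combinations(primitives, 2):
        (ax, ay), (bx, by) = a.center, b.center
        if hypot(ax - bx, ay - by) <= max_dist:
            edges.add(frozenset((a.id, b.id)))
    return edges


def check_constraints(primitives, edges, required_kinds):
    """Stage 3: return violation codes from two hypothetical local rules."""
    violations = []
    present = {p.kind for p in primitives}
    for kind in sorted(required_kinds - present):
        violations.append(f"missing:{kind}")
    connected = {pid for edge in edges for pid in edge}
    for p in primitives:
        if p.id not in connected:
            violations.append(f"disconnected:{p.id}")
    return violations


def template_feedback(violations):
    """Stage 4 fallback: verbalize only the verified violations."""
    if not violations:
        return ["No rule violations were detected."]
    messages = []
    for v in violations:
        code, _, target = v.partition(":")
        if code == "missing":
            messages.append(f"Your diagram is missing a {target}. Add one and connect it.")
        elif code == "disconnected":
            messages.append(f"Element {target} is not connected to anything. Check its wiring.")
    return messages


prims = [
    Primitive("b1", "battery", 0.9, (0, 0, 40, 40)),
    Primitive("r1", "resistor", 0.8, (60, 0, 100, 40)),
    Primitive("r2", "resistor", 0.7, (400, 400, 440, 440)),
]
edges = build_graph(prims)
violations = check_constraints(prims, edges, {"battery", "resistor", "ground"})
for line in template_feedback(violations):
    print(line)
```

The point of the design is visible in `template_feedback`: it never looks at the image, so it cannot describe anything the rule engine did not report. A constrained VLM prompt would be built from the same `violations` list.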

Evaluation and Insights

The Sketch2Feedback framework was rigorously evaluated using two micro-benchmarks: FBD-10 (200 samples of free-body diagrams) and Circuit-10 (200 samples of circuit schematics). These benchmarks featured controlled error taxonomies, pixel-level bounding boxes, and comprehensive rubric keys for fair assessment. Diagram samples were synthetically generated with various noise levels (stroke jitter, Gaussian noise, rotation, brightness variation) to simulate diverse student drawing styles, and errors were deterministically injected for balanced representation.
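
As a rough illustration of how such a benchmark could be generated reproducibly, the sketch below perturbs a stroke with a small rotation and per-point jitter, and assigns error types in a fixed cycle so each appears equally often. The function names, parameter values, and error-type labels are hypothetical; pixel-level Gaussian noise and brightness variation are omitted because they need an image library.

```python
import math
import random


def perturb_stroke(points, rng, jitter_sigma=1.5, max_rotation_deg=5.0):
    """Rotate a stroke about the origin by a small random angle, then jitter each point."""
    theta = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = []
    for x, y in points:
        xr = x * cos_t - y * sin_t
        yr = x * sin_t + y * cos_t
        out.append((xr + rng.gauss(0, jitter_sigma), yr + rng.gauss(0, jitter_sigma)))
    return out


def assign_errors(n_samples, error_types):
    """Cycle through the error types deterministically for a balanced label set."""
    return [error_types[i % len(error_types)] for i in range(n_samples)]


rng = random.Random(0)  # fixed seed so the same samples can be regenerated
stroke = perturb_stroke([(0, 0), (50, 0), (100, 0)], rng)

# Hypothetical error labels, not the benchmark's actual taxonomy.
labels = assign_errors(200, ["missing_force", "wrong_direction", "extra_force", "no_error"])
```

Injecting errors by index rather than by random draw keeps the class balance exact, which matters when each benchmark has only 200 samples.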

The rubric design for feedback quality followed established educational feedback theories, focusing on two key dimensions:

Correctness (1-5 Likert scale): Does the feedback accurately identify the actual error?
Actionability (1-5 Likert scale): Does the feedback provide clear, understandable suggestions for a novice student to fix the error?

Performance Analysis and Key Takeaways

The evaluation provided mixed, yet highly instructive, results when comparing the Sketch2Feedback pipeline (using Qwen2-VL-2B) against an end-to-end LMM (LLaVA-1.5-7B) and a vision-only detector:

Domain-Specific Strengths: The end-to-end LMM demonstrated stronger error detection for free-body diagrams (micro-F1 of 0.471 vs. 0.263 for Sketch2Feedback). However, Sketch2Feedback significantly outperformed the LMM on circuit schematics (0.329 vs. 0.038) and achieved perfect actionability (5.0/5). This highlights that no single AI architecture dominates across all diagram types, suggesting future opportunities for ensemble approaches that combine the strengths of different models.
Pinpointing Failure: A critical advantage of Sketch2Feedback's modular design is its ability to precisely attribute failures. While the grammar pipeline showed a high circuit hallucination rate (0.925), this was traced back to false positives generated by the classical computer vision perception module, not the VLM fabricating errors. In contrast, end-to-end LMMs lack this transparency, making it difficult to understand why they hallucinate. This modularity is crucial for debugging, improving, and trusting complex AI systems, especially in scenarios where reliable on-premise processing is required, potentially leveraging systems like the ARSA AI Box Series.
The Value of Verification: The results underscore the importance of verification layers. Because the language model only verbalizes violations confirmed by a rule engine, every piece of feedback can be traced to a specific rule and a specific detected element. That feedback is still only as accurate as the perception stage feeding the rule engine, as the circuit false positives above show, but the source of an error can be located and fixed. This principle of "grounded" AI feedback is vital for high-stakes applications beyond education, where confidence and transparency are paramount.
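
For readers unfamiliar with the micro-F1 scores quoted above, the sketch below shows the standard definition: true positives, false positives, and false negatives are pooled across all samples before computing F1. The sample data and error labels are invented for illustration, and the paper's exact evaluation protocol may differ in details.

```python
def micro_f1(predicted, gold):
    """Micro-averaged F1 over per-sample sets of error labels."""
    tp = fp = fn = 0
    for pred, true in zip(predicted, gold):
        tp += len(pred & true)   # errors reported and actually present
        fp += len(pred - true)   # errors reported but not present
        fn += len(true - pred)   # errors present but not reported
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Three hypothetical diagrams with made-up error labels.
predicted = [{"missing_ground"}, {"reversed_polarity", "open_circuit"}, set()]
gold = [{"missing_ground"}, {"reversed_polarity"}, {"short_circuit"}]

score = micro_f1(predicted, gold)
print(round(score, 3))  # 0.667
```

Here two reported errors are correct, one is spurious, and one real error is missed, giving 2·2 / (2·2 + 1 + 1) ≈ 0.667. A perception stage that produces many false positives drives the `fp` count up and the score down, which is the failure pattern attributed to the circuit pipeline.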

The ARSA Advantage: Bridging Research and Reality

The Sketch2Feedback framework exemplifies the power of a structured, engineering-driven approach to AI. For enterprises and institutions, adopting similar principles can unlock transformative potential across various sectors. The ability to integrate advanced computer vision with symbolic reasoning and constrained AI output aligns perfectly with the demand for reliable, production-ready AI systems.

ARSA Technology, with experience in AI and IoT solutions since 2018, specializes in building and deploying systems that deliver measurable impact. Whether it's enhancing security, optimizing operations, or improving educational outcomes, ARSA’s focus on robust, real-world AI implementations mirrors the foundational principles demonstrated by Sketch2Feedback. We understand the critical need for solutions that move beyond experimental stages into reliable, scalable deployments across various industries.

This research (Source: Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams) offers valuable insights into the future of AI-powered educational tools and the broader landscape of verifiable AI.

To explore how ARSA Technology can help your organization implement advanced AI and IoT solutions tailored to your unique challenges, we invite you to contact ARSA today.