AI-Powered Markerless Motion Capture: Revolutionizing Infant Motor Assessment

Explore how advanced AI pose estimation frameworks are transforming early infant motor impairment detection through non-invasive, scalable markerless motion capture.

ARSA Technology Team

19 May 2026 • 6 min read

Revolutionizing Infant Motor Assessment with AI

The early identification of motor impairments in infants is crucial because it enables timely intervention, which can significantly improve developmental outcomes. Traditionally, clinicians rely on skilled visual assessment of spontaneous infant movements. This method is effective, but it is labor-intensive, requires extensive training, and is susceptible to variability between assessors. The result is a critical need for objective, scalable, and automated alternatives, so that more infants can benefit from early detection of conditions such as cerebral palsy and other movement disorders (Novak et al., 2017).

One of the most promising advancements in this field is the application of computer vision, particularly through markerless motion capture. This technology offers a non-invasive way to quantify infant movement patterns directly from video recordings, transforming passive footage into actionable insights. By leveraging high-quality pose estimation, medical professionals and researchers can gain a deeper, more objective understanding of infant neuromotor integrity, paving the way for advanced diagnostic tools and personalized interventions.

The Limitations of Traditional Assessment Methods

Current clinical assessments, such as the Prechtl General Movement Assessment (GMA), the Test of Infant Motor Performance (TIMP), and the Baby Observational Selective Control AppRaisal (BabyOSCAR), are invaluable for evaluating characteristic movement patterns in young infants. These patterns are vital indicators of neuromotor health and can predict developmental trajectories. However, the reliance on human expertise for these assessments comes with inherent challenges. The significant training required for raters, the time-consuming nature of scoring, and the potential for subjective interpretation can limit their widespread application.

These constraints often mean that many infants who could benefit from early assessment do not receive it, creating "blind spots" in healthcare systems. The need for automated, objective methods is therefore not just a matter of efficiency, but one of equitable access to critical early diagnostic tools. Markerless motion capture addresses these issues by providing a standardized, repeatable way to quantify infant movement, which can reduce rater-dependent variability and increase throughput.

Demystifying Markerless Motion Capture and Pose Estimation

Markerless motion capture tracks body movements from video footage without the need for physical markers attached to the subject. At its core, the approach relies on "pose estimation frameworks": artificial intelligence models that detect and localize anatomical "keypoints" (e.g., joints, body landmarks) on a person within an image or video frame. These keypoints are then used to reconstruct the body’s posture and movement in three dimensions.

Early pose estimation methods often provided only a sparse set of 2D keypoints, which proved insufficient for comprehensive whole-body kinematic analysis (the study of motion characteristics). However, recent advancements have dramatically increased both the density and dimensionality of these keypoints, moving towards full 3D representations that include intricate details like individual finger and toe joints. This evolution is critical, as detailed distal joint movements are often key indicators of healthy neuromotor development. The transition from sparse 2D data to rich 3D information represents a significant step towards objective and comprehensive motor assessment.

Advanced AI Models for Detailed Biomechanical Analysis

The study behind these results (Joshi et al., 2026) systematically evaluated three recent pose estimation frameworks, each offering different capabilities for biomechanical analysis in infants:

MeTRAbs-ACAE: This framework estimates 3D poses by generating "volumetric heatmaps," which essentially create a 3D probability distribution for each keypoint, allowing for robust 3D reconstruction. It generates a dense set of up to 580 keypoints, offering a high level of detail crucial for nuanced kinematic analysis.
SAM 3D Body: Unlike keypoint-based methods, SAM 3D Body constructs a full "parametric mesh" of the body. This means it creates a customizable 3D model that can deform to match the subject's shape and posture, including detailed representations of hands and feet, which are often overlooked but critical in infant development.
Sapiens: Designed for high-resolution feature capture, Sapiens can detect 308 keypoints, including fine details of finger joints and facial points. While it provides depth and body segmentation, it does not directly output 3D keypoints and has less dense coverage of the torso, which is a consideration for certain types of biomechanical analysis.
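To make the idea of a volumetric heatmap more concrete, here is a minimal sketch of how a 3D probability distribution over a voxel grid can be reduced to a single keypoint location with a soft-argmax. This is a generic illustration, not the MeTRAbs-ACAE implementation, which the article does not describe in detail. The function name `soft_argmax_3d`, the `[z][y][x]` grid layout, and the toy volume are all assumptions made for the example.

```python
import math


def soft_argmax_3d(logits):
    """Return the expected (x, y, z) voxel coordinate under a softmax.

    `logits` is a nested list indexed as logits[z][y][x]; this layout is
    an assumption for the sketch, not something taken from the paper.
    """
    # Subtract the largest score before exponentiating for numerical stability.
    peak = max(v for plane in logits for row in plane for v in row)
    total = 0.0
    ex = ey = ez = 0.0
    for z, plane in enumerate(logits):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                weight = math.exp(value - peak)  # unnormalized probability
                total += weight
                ex += weight * x
                ey += weight * y
                ez += weight * z
    # Dividing by the total turns the weighted sums into expected coordinates.
    return (ex / total, ey / total, ez / total)


# Toy 3x3x3 volume with one strongly activated voxel at x=2, y=1, z=0.
volume = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
volume[0][1][2] = 20.0
keypoint = soft_argmax_3d(volume)
```

Because the result is a probability-weighted average rather than a hard maximum, the estimate can fall between voxel centers, which is one reason heatmap-based methods can localize keypoints more finely than the grid resolution alone would suggest.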

The diverse approaches of these frameworks highlight the ongoing innovation in computer vision. The study’s objective was not to declare one superior in all aspects but to rigorously assess their 3D accuracy and consistency specifically within infant datasets, an area where such evaluations are still nascent. This comparative analysis is foundational for selecting the most appropriate tools for different clinical and research applications.

Rigorous Evaluation Methodology for Infant Movements

To assess these algorithms, researchers used a Multi-view Markerless Motion Capture (MMMC) system. This setup involved eight synchronized FLIR BlackFly S GigE cameras placed around an infant on a mat, capturing RGB video at 29 frames per second (Cotton et al., 2023a). This multi-view approach allowed for robust 3D reconstruction of movements through triangulation and reprojection optimization, providing a high-quality reference for evaluating the single-camera-based estimates from the AI frameworks.
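The triangulation step can be illustrated with a small linear least-squares sketch: each camera that sees a keypoint contributes two linear equations, and the 3D point is the solution that best satisfies all of them. This is a simplified stand-in, not the study's pipeline, which also includes reprojection optimization. The camera matrices, the helper names (`project`, `det3`, `triangulate`), and the test point below are hypothetical values chosen for illustration.

```python
def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P; returns pixel (u, v)."""
    h = [sum(P[r][c] * X[c] for c in range(3)) + P[r][3] for r in range(3)]
    return (h[0] / h[2], h[1] / h[2])


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangulate(cameras, pixels):
    """Least-squares 3D point from two or more calibrated views."""
    rows, rhs = [], []
    for P, (u, v) in zip(cameras, pixels):
        # Each view gives two equations: coord * P[2] - P[r] = 0 for u and v.
        for coord, r in ((u, 0), (v, 1)):
            eq = [coord * P[2][c] - P[r][c] for c in range(4)]
            rows.append(eq[:3])
            rhs.append(-eq[3])
    # Normal equations (A^T A) x = A^T b, a 3x3 linear system.
    M = [[sum(a[i] * a[j] for a in rows) for j in range(3)] for i in range(3)]
    b = [sum(a[i] * y for a, y in zip(rows, rhs)) for i in range(3)]
    d = det3(M)  # zero only for degenerate camera layouts (not handled here)
    point = []
    for k in range(3):
        # Cramer's rule: swap column k of M for the right-hand side.
        Mk = [[b[i] if j == k else M[i][j] for j in range(3)] for i in range(3)]
        point.append(det3(Mk) / d)
    return point


# Three hypothetical cameras sharing intrinsics, offset along x and y.
cameras = [
    [[1000.0, 0.0, 320.0, 0.0], [0.0, 1000.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1000.0, 0.0, 320.0, -500.0], [0.0, 1000.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1000.0, 0.0, 320.0, 0.0], [0.0, 1000.0, 240.0, -500.0], [0.0, 0.0, 1.0, 0.0]],
]
true_point = [0.1, -0.2, 2.0]
pixels = [project(P, true_point) for P in cameras]
estimate = triangulate(cameras, pixels)
```

With noise-free detections, the estimate matches the original point up to floating-point error. With real detections the equations disagree slightly, which is why additional views and a subsequent refinement step tend to help.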

The study included eight healthy infants, aged 8 to 16 weeks, across 13 unique sessions, accumulating over 216,000 synchronized multi-view frames. This extensive dataset was crucial for ensuring the reliability of the evaluation. Keypoint detection accuracy was quantified using:

Reprojection error: Measures how accurately the estimated 3D keypoints, when projected back onto the 2D image plane of each camera, align with the original 2D detections. A lower error indicates better 3D reconstruction.
Geometric consistency: Assesses the stability and coherence of the 3D estimates across the different camera views. High consistency means the 3D model is robust regardless of the viewing angle.
Procrustes-aligned 3D position error: This metric compares the estimated 3D shape of the infant's body to a "ground truth" shape, after optimally aligning them to account for differences in position, orientation, and scale. It provides a direct measure of how well the AI model reconstructs the actual 3D body shape.
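As a rough illustration of the last metric, the sketch below aligns a set of predicted 3D keypoints to a reference set with the best-fitting translation, rotation, and uniform scale, then reports the mean remaining distance per keypoint. It uses the standard SVD-based (Kabsch/Umeyama) solution and assumes NumPy is installed. The function name `procrustes_aligned_error` and the toy keypoints are assumptions, and the paper's exact computation may differ.

```python
import numpy as np


def procrustes_aligned_error(pred, truth):
    """Mean per-keypoint distance after similarity-aligning pred to truth.

    Both inputs are (N, 3) arrays of corresponding keypoints.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mu_p, mu_t = pred.mean(axis=0), truth.mean(axis=0)
    p, t = pred - mu_p, truth - mu_t  # remove translation
    # SVD of the cross-covariance yields the optimal rotation.
    U, S, Vt = np.linalg.svd(p.T @ t)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    # Optimal uniform scale for the centered, rotated prediction.
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    aligned = scale * p @ R.T + mu_t
    return float(np.linalg.norm(aligned - truth, axis=1).mean())


# Toy check: a rotated, scaled, shifted copy should align almost perfectly.
truth = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], float)
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
pred = 1.7 * truth @ Rz.T + np.array([5.0, -2.0, 0.5])
error = procrustes_aligned_error(pred, truth)

# Perturbing one keypoint leaves a residual that alignment cannot remove.
noisy = pred.copy()
noisy[0] += np.array([0.1, 0.0, 0.0])
noisy_error = procrustes_aligned_error(noisy, truth)
```

Because the alignment absorbs global position, orientation, and scale, the remaining error reflects only differences in body shape and pose, which is what makes the metric useful for comparing frameworks that report 3D output in different coordinate frames.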

This meticulous methodology ensures that the evaluation is thorough and reliable, establishing strong groundwork for future advancements in infant biomechanics, as detailed in the source paper, "Markerless Motion Capture for Biomechanical Whole-Body Kinematic Estimation in Infants" (Joshi et al., 2026).

Key Findings: Performance of Advanced AI Models

The systematic evaluation yielded important insights into the capabilities of the three pose estimation frameworks when applied to infant data. Sapiens demonstrated the lowest reprojection error, averaging 22.8 pixels, and achieved the highest geometric consistency at 0.82. Since Sapiens does not directly output 3D keypoints, these results most plausibly describe 3D points reconstructed from its multi-view 2D detections: they project back onto each camera's image plane accurately and remain stable across camera views.
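For readers who want to see what a reprojection error in pixels means in practice, the following sketch projects a 3D keypoint into each camera and averages the pixel distance to that camera's 2D detection. The camera matrices, the 3D point, and the detections are made-up values, and `mean_reprojection_error` is a hypothetical helper. The study's exact averaging scheme across keypoints, frames, and views is not described in this article.

```python
import math


def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P; returns pixel (u, v)."""
    h = [sum(P[r][c] * X[c] for c in range(3)) + P[r][3] for r in range(3)]
    return (h[0] / h[2], h[1] / h[2])


def mean_reprojection_error(cameras, detections, X):
    """Average pixel distance between projections of X and the 2D detections."""
    errors = []
    for P, (u, v) in zip(cameras, detections):
        pu, pv = project(P, X)
        errors.append(math.hypot(pu - u, pv - v))
    return sum(errors) / len(errors)


# Two hypothetical cameras with identical intrinsics and a horizontal offset.
cameras = [
    [[1000.0, 0.0, 320.0, 0.0], [0.0, 1000.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1000.0, 0.0, 320.0, -500.0], [0.0, 1000.0, 240.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
]
point_3d = [0.1, -0.2, 2.0]
# The first detection matches the projection exactly; the second is off
# by 3 pixels horizontally and 4 pixels vertically.
detections = [(370.0, 140.0), (123.0, 144.0)]
error_px = mean_reprojection_error(cameras, detections, point_3d)
```

In this toy case one view contributes no error and the other contributes 5 pixels, so the mean is 2.5 pixels. Note that a pixel error depends on image resolution and how large the infant appears in the frame, so values are best compared within the same camera setup.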

However, for comprehensive kinematic reconstruction, SAM 3D Body proved to be the most advantageous, offering the richest 3D information. It exhibited Procrustes-aligned 3D position errors ranging from 19 to 28 mm. Crucially, the study showcased a case comparison where biomechanical models fitted to SAM 3D estimates successfully distinguished representative movement patterns in infants—patterns that were also identified by a clinical expert as being related to motor development. These findings underscore both the significant potential and the current limitations of 3D pose estimation in the nuanced field of infant biomechanics.

Transforming Research into Practical Applications with ARSA

The advancements in AI-powered markerless motion capture, as highlighted by this study, have profound implications for healthcare and developmental diagnostics. These sophisticated computer vision techniques can be adapted and deployed to create highly objective and scalable tools for early motor assessment, moving beyond the traditional constraints of manual evaluations. The ability to automatically quantify whole-body kinematics in infants without physical markers opens doors for widespread screening and personalized intervention strategies.

For enterprises and institutions looking to deploy such cutting-edge AI solutions, ARSA Technology offers expertise in designing and implementing practical AI systems that address complex operational challenges. Our AI Video Analytics platform, for instance, can process real-time video streams to detect objects, people, and behaviors, making it adaptable for specialized applications like detailed movement analysis. Furthermore, our AI Box Series provides pre-configured edge AI systems, ideal for rapid, on-site deployment in clinical or research settings where low latency and data privacy are critical. ARSA’s solutions are engineered for accuracy, scalability, and privacy-by-design, ensuring that sensitive data, particularly in healthcare contexts, remains secure and compliant. Our team has been delivering production-ready AI and IoT solutions since 2018 across various industries, including healthcare.

The Future of Infant Health Monitoring

The systematic evaluation of advanced pose estimation frameworks lays the groundwork for a future where scalable, video-based assessment of early motor development is routine. While the promise of these technologies is considerable, ongoing research is necessary to refine accuracy, particularly for the dense keypoint coverage that complex biomechanical models require. The ability to transform passive video footage into rich, objective data for analysis is a major step forward.

By providing objective measures of movement, AI-driven motion capture can help clinicians identify subtle signs of impairment earlier, enabling timely interventions that can significantly alter developmental trajectories. This not only enhances diagnostic capabilities but also has the potential to reduce healthcare costs and improve patient outcomes globally.

To learn how ARSA Technology can help you implement advanced AI and IoT solutions for your specific operational challenges, we invite you to explore our products and solutions.

Source: Joshi, D., Peiffer, J. D., Peyton, C., & Cotton, R. J. (2026). Markerless Motion Capture for Biomechanical Whole-Body Kinematic Estimation in Infants. arXiv preprint arXiv:2605.17120.

Ready to integrate intelligent technology into your operations? Contact ARSA today for a free consultation.