
AI for Smart Cities: Bridging Data Gaps with Optimal Transport for Cross-City Predictions

Explore SCOT, an AI framework using Optimal Transport to overcome data scarcity in smart cities. Learn how it creates soft correspondences between cities with unequal region sets for accurate, robust cross-city predictions in GDP, population, and carbon estimation.

ARSA Technology Team

10 Apr 2026 • 5 min read

In the rapidly evolving landscape of smart cities, Artificial Intelligence (AI) and the Internet of Things (IoT) are becoming indispensable tools for urban planning, resource management, and economic forecasting. Tasks like estimating regional GDP, population density, or carbon emissions often rely on sophisticated AI models that process vast amounts of urban data, including human mobility patterns, points of interest (POIs), and remote sensing information. However, a significant challenge arises when trying to deploy these models across different cities: the scarcity of reliable, labeled data in many urban environments. While some "well-instrumented" cities have abundant data, many others lack the resources for comprehensive data collection and labeling. This creates a critical need for efficient cross-city transfer learning, allowing AI models trained on data-rich source cities to generalize and perform effectively in data-scarce target cities.

The Challenge of Cross-City AI: Bridging Urban Data Gaps

Cross-city transfer is harder than standard domain adaptation. Cities are inherently heterogeneous and rarely share the same geographical or administrative divisions, so the region partitions of two cities are usually incompatible and their region counts unequal. For instance, a "downtown core" in City A might cover a different area and have a different functional character than what is considered "downtown" in City B. Furthermore, there is typically no predefined ground-truth correspondence between the regions of different cities. Matching regions with simple heuristics or nearest-neighbor search often produces brittle, unreliable links, in which many regions of one city are mistakenly associated with a single region of another, obscuring distinct functional patterns.
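To make the many-to-one ("hubness") problem concrete, here is a minimal Python sketch. The toy 2-D embeddings, the Euclidean distance, and the function names are illustrative assumptions, not taken from the SCOT paper. Each source region is matched to its single nearest target region, and we count how many source regions land on each target.

```python
import numpy as np


def nearest_neighbor_match(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Index of the closest target region (Euclidean) for every source region."""
    # Pairwise squared distances, shape (n_src, n_tgt)
    d2 = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def hub_counts(matches: np.ndarray, n_tgt: int) -> np.ndarray:
    """Number of source regions matched to each target region."""
    return np.bincount(matches, minlength=n_tgt)


# Hypothetical embeddings: 6 source regions vs. 3 target regions (unequal counts)
src = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2],
                [3.0, 3.0], [-3.0, -3.0]])
tgt = np.array([[0.0, 0.05], [2.0, 2.0], [-2.0, -2.0]])

matches = nearest_neighbor_match(src, tgt)
print(hub_counts(matches, len(tgt)))  # [4 1 1]
```

Four of the six source regions collapse onto target region 0: nothing in nearest-neighbor matching limits how much a single target region can absorb.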

Current approaches to align cities for AI transfer learning often fall short. Some rely on global discrepancy objectives, which aim to make the overall data distributions similar between cities. While this can reduce the aggregate differences, it leaves specific region-to-region correspondences implicit and can be unstable when cities exhibit significant functional heterogeneity. Other methods attempt heuristic region matching, which can be sensitive to initial anchor choices or prone to "hubness," leading to many-to-one correspondences that misrepresent complex urban relationships. This absence of explicit, nuanced "soft correspondences" between unequal region sets represents a central technical bottleneck for effective cross-city AI deployment.
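As a minimal illustration of why a global discrepancy leaves correspondences implicit, the sketch below uses one simple objective of this kind: squared Maximum Mean Discrepancy (MMD) with a linear kernel, which reduces to the squared distance between the two cities' mean embeddings. This choice is ours for illustration; the baselines discussed in the paper may use other discrepancies, and the toy data are hypothetical.

```python
import numpy as np


def linear_mmd2(src: np.ndarray, tgt: np.ndarray) -> float:
    """Squared MMD with a linear kernel = squared distance between mean embeddings."""
    diff = src.mean(axis=0) - tgt.mean(axis=0)
    return float(diff @ diff)


# Two hypothetical cities whose regions point in completely different directions
src = np.array([[1.0, 0.0], [-1.0, 0.0]])  # regions spread along the x-axis
tgt = np.array([[0.0, 1.0], [0.0, -1.0]])  # regions spread along the y-axis

print(linear_mmd2(src, tgt))  # 0.0
```

The objective is already at its minimum, yet it returns a single scalar and says nothing about which source region corresponds to which target region.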

Optimal Transport: A Principled Approach to Urban Correspondence

To overcome these limitations, a novel framework called SCOT (Semantic Correspondence via Optimal Transport) has been proposed. SCOT establishes meaningful relationships between the regions of different cities without requiring any prior, explicit mapping. It builds on the mathematical theory of Optimal Transport (OT), which finds the lowest-cost way to move one probability distribution onto another. In the urban setting, OT computes a "transport plan" (or "coupling") that specifies how much "mass" (semantic information) moves from each region of a source city to each region of a target city, chosen so that the total transport cost is minimal. The result is a set of explicit soft correspondences: one region in City A can be related to several regions in City B, each with its own strength of association.
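For reference, the standard discrete (Kantorovich) OT problem behind this description can be written as follows. The notation is ours and not necessarily the paper's: a source city has m regions with weights a (for example, uniform a_i = 1/m), a target city has n regions with weights b, and C_{ij} is a cost such as the distance between the embeddings of source region i and target region j.

```latex
P^{\star} = \arg\min_{P \in U(a,b)} \sum_{i=1}^{m}\sum_{j=1}^{n} C_{ij}\,P_{ij},
\qquad
U(a,b) = \left\{ P \in \mathbb{R}_{\ge 0}^{m \times n} \;:\; P\mathbf{1}_n = a,\; P^{\top}\mathbf{1}_m = b \right\}
```

Each entry P_{ij} is the strength of the soft correspondence between source region i and target region j. Because m and n need not be equal, the formulation handles unequal region counts directly.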

SCOT employs a specific variant known as Sinkhorn-based entropic Optimal Transport. This method is particularly effective because its marginal constraints control how much matching "mass" each region can send and receive. This capacity control discourages simplistic many-to-one shortcuts and instead fosters a structured, many-to-many correspondence that reflects the nuanced relationships between urban areas. Moreover, Sinkhorn iterations, which alternately rescale the rows and columns of the coupling until these constraints are met, make the computation practical and efficient at urban scale. Such principled alignment is fundamental for AI applications, ensuring that when ARSA Technology deploys AI Video Analytics in diverse environments, the underlying data structures are robustly handled.
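The following is a minimal NumPy sketch of Sinkhorn iterations for entropic OT, that is, minimizing sum_ij C_ij P_ij - eps * H(P) over couplings P whose row sums equal a and column sums equal b. It runs on a hypothetical toy layout of 6 source regions and 3 target regions. The uniform marginals, the regularization strength eps, the cost normalization, and the iteration count are our assumptions, not SCOT's settings.

```python
import numpy as np


def sinkhorn(cost: np.ndarray, a: np.ndarray, b: np.ndarray,
             eps: float = 0.1, n_iters: int = 1000) -> np.ndarray:
    """Entropic OT coupling with row sums a and column sums b (Sinkhorn-Knopp)."""
    K = np.exp(-cost / eps)            # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows toward marginal a
        v = b / (K.T @ u)              # rescale columns to marginal b
    return u[:, None] * K * v[None, :]


# Hypothetical embeddings: 4 central source regions near one target region
src = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2],
                [3.0, 3.0], [-3.0, -3.0]])
tgt = np.array([[0.0, 0.05], [2.0, 2.0], [-2.0, -2.0]])

cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()               # normalise so eps has a consistent scale
a = np.full(len(src), 1.0 / len(src))  # each source region sends 1/6 of the mass
b = np.full(len(tgt), 1.0 / len(tgt))  # each target region receives 1/3 of the mass

P = sinkhorn(cost, a, b)
print(P.sum(axis=0))  # column sums: each ≈ 1/3
```

Target region 0 can absorb only 1/3 of the total mass, so the four central source regions must spread part of their mass to the other targets; this is the capacity control the marginal constraints provide. A smaller eps gives sharper, less diffuse couplings but needs more iterations and, in practice, a log-domain implementation for numerical stability.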

Sharpening Insights with AI-Guided Learning

Beyond establishing initial correspondences, SCOT further refines the learning process to produce highly transferable region representations. It integrates an OT-weighted contrastive objective, a technique used in AI to learn distinctive features by contrasting similar and dissimilar data points. In this case, the learned OT coupling defines "soft positives" by weighting target candidates based on the amount of "transported mass." This effectively concentrates the learning of similarity on pairs of regions that are strongly linked by the optimal transport plan, avoiding brittle nearest-neighbor matches. This coupling-aware contrastive loss ensures semantic separation while preserving the capacity control inherent in OT. The outcome is locally aligned yet non-collapsed embeddings that lead to improved downstream prediction performance across cities.
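A coupling-weighted contrastive loss can be sketched as a soft-target InfoNCE: for each source region, the target regions act as candidates, and that region's row of the OT coupling (normalized to sum to one) supplies the soft-positive weights. This is a hypothetical formulation for illustration. The function names, the cosine similarity, and the temperature tau are our assumptions, and the exact objective in the SCOT paper may differ.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def ot_weighted_contrastive_loss(z_src: np.ndarray, z_tgt: np.ndarray,
                                 coupling: np.ndarray, tau: float = 0.1) -> float:
    """Hypothetical OT-weighted InfoNCE (not the paper's exact loss)."""
    # Cosine similarity between every source and target region, scaled by temperature
    zs = z_src / np.linalg.norm(z_src, axis=1, keepdims=True)
    zt = z_tgt / np.linalg.norm(z_tgt, axis=1, keepdims=True)
    logits = zs @ zt.T / tau                                  # shape (m, n)
    # Soft positives: each source row's transported mass, normalised to sum to 1
    weights = coupling / coupling.sum(axis=1, keepdims=True)
    # Cross-entropy against the soft targets; all other targets act as negatives
    return float(-(weights * log_softmax(logits, axis=1)).sum(axis=1).mean())


z = np.eye(3)                        # toy embeddings: 3 orthogonal regions per city
aligned = np.eye(3) / 3              # coupling that links region i to region i
shuffled = np.eye(3)[[1, 2, 0]] / 3  # coupling that links region i to a different region
print(ot_weighted_contrastive_loss(z, z, aligned))   # close to 0
print(ot_weighted_contrastive_loss(z, z, shuffled))  # about 10
```

Pairs that the transport plan links strongly are pulled together, while the softmax denominator keeps pushing the remaining targets apart, which is what keeps the embeddings from collapsing.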

To ensure the stability of the AI model during training, SCOT also incorporates a cycle-style reconstruction regularizer. This mechanism helps to stabilize the optimization process, preventing the learned correspondences and embeddings from drifting or becoming inconsistent over time. The combined effect of these components—entropic optimal transport for soft correspondence, the OT-weighted contrastive objective for semantic clarity, and cycle reconstruction for stability—enables SCOT to jointly control correspondence capacity, semantic discriminability, and training robustness. These rigorous design principles ensure that the framework builds reliable and scalable AI solutions, a critical aspect when developing tailored solutions like those offered through ARSA's AI Box Series for edge deployments.
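The article describes this regularizer only as cycle-style reconstruction. One simple way to picture it, under our own assumptions, is a barycentric round trip: push source embeddings to the target regions through the coupling, pull them back, and penalize the difference. The function below is hypothetical; a real model would typically route the embeddings through learned encoders or decoders rather than use the coupling alone.

```python
import numpy as np


def cycle_reconstruction_loss(z_src: np.ndarray, coupling: np.ndarray) -> float:
    """Hypothetical barycentric source -> target -> source reconstruction error."""
    a = coupling.sum(axis=1)   # mass sent by each source region
    b = coupling.sum(axis=0)   # mass received by each target region
    # Each target region becomes the mass-weighted average of the source regions sent to it
    z_at_tgt = (coupling.T @ z_src) / b[:, None]
    # Map back: each source region becomes the mass-weighted average of its targets
    z_back = (coupling @ z_at_tgt) / a[:, None]
    return float(((z_src - z_back) ** 2).sum(axis=1).mean())


z_src = np.array([[1.0, 0.0], [-1.0, 0.0]])
sharp = np.eye(2) / 2            # each source region maps to a distinct target region
diffuse = np.full((2, 2), 0.25)  # every source region spread evenly over all targets
print(cycle_reconstruction_loss(z_src, sharp))    # 0.0
print(cycle_reconstruction_loss(z_src, diffuse))  # 1.0
```

In this toy, a sharp, consistent coupling reconstructs the source exactly, while a coupling that has drifted toward a uniform smear collapses every region to the city mean; penalizing the reconstruction error therefore discourages that kind of drift.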

Scaling to Multiple Cities with a Central Hub

Many real-world scenarios involve leveraging data from not just one, but multiple source cities to make predictions in a single target city. SCOT extends its capabilities to multi-source transfer by introducing a shared "prototype hub." This hub consists of a set of learnable, abstract prototypes that act as common reference points. Instead of directly aligning each source city to the target city, SCOT aligns each source city and the target city to this shared prototype hub using balanced entropic transport. This approach helps prevent any single source city from dominating the learning process or creating conflicting gradients when trying to reconcile information from various sources.
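A minimal sketch of the hub idea, under stated assumptions: K shared prototype vectors, random embeddings standing in for two source cities and one target city with different region counts, squared Euclidean costs, and uniform (balanced) marginals on both regions and prototypes. Names, sizes, and eps are hypothetical; in SCOT the prototypes are learnable, whereas here they are sampled once and held fixed.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.1, n_iters=1000):
    """Entropic OT coupling with row sums a and column sums b."""
    K = np.exp(-cost / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]


def align_to_hub(z_city, prototypes, eps=0.1):
    """Balanced entropic OT between one city's regions and the shared prototypes."""
    cost = ((z_city[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()
    m, k = cost.shape
    return sinkhorn(cost, np.full(m, 1.0 / m), np.full(k, 1.0 / k), eps)


rng = np.random.default_rng(0)
prototypes = rng.normal(size=(4, 8))       # K = 4 hypothetical prototypes, 8-dim embeddings
cities = {
    "source_a": rng.normal(size=(50, 8)),  # large source city
    "source_b": rng.normal(size=(12, 8)),  # small source city
    "target": rng.normal(size=(30, 8)),
}
couplings = {name: align_to_hub(z, prototypes) for name, z in cities.items()}
for name, P in couplings.items():
    # Every city sends total mass 1, spread evenly over the 4 prototypes
    print(name, P.shape, round(float(P.sum()), 6), P.sum(axis=0).round(3))
```

Because each city's total mass is fixed at 1 regardless of how many regions it has, the 50-region source cannot outweigh the 12-region one at the hub, which is one way to read the point that no single source dominates.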

The alignment to the shared hub is further guided by a "target-induced prototype prior," which ensures that the learned prototypes are relevant to the characteristics and data distribution of the specific target city. This strategic alignment via a central hub allows for more effective knowledge consolidation from diverse sources while maintaining focus on the target's unique context. The learned transport couplings and hub assignments offer interpretable diagnostics, providing insights into the quality of alignment and how different city regions relate semantically. This type of advanced, flexible architecture is essential for developing powerful custom AI solutions that adapt to varying data landscapes.
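The couplings to the hub can be read as diagnostics. The helper below is a hypothetical illustration (the name and outputs are our choice): it turns a city-to-hub coupling into hard prototype assignments, the share of regions assigned to each prototype, and a normalized entropy per region, where 0 means the region commits to a single prototype and 1 means it is spread evenly across all K prototypes (K must be at least 2). It does not implement the target-induced prototype prior, whose exact form is not described here.

```python
import numpy as np


def hub_diagnostics(coupling: np.ndarray):
    """Hard assignments, prototype usage and normalised assignment entropy per region."""
    n_regions, k = coupling.shape
    q = coupling / coupling.sum(axis=1, keepdims=True)  # each region's distribution over prototypes
    assignment = q.argmax(axis=1)                       # most strongly linked prototype
    usage = np.bincount(assignment, minlength=k) / n_regions
    entropy = -(q * np.log(np.clip(q, 1e-12, None))).sum(axis=1) / np.log(k)
    return assignment, usage, entropy


# Toy coupling: region 0 commits to prototype 0, region 1 is split evenly
P = np.array([[0.50, 0.00],
              [0.25, 0.25]])
assignment, usage, entropy = hub_diagnostics(P)
print(assignment, usage, entropy)
```

Here region 0 has entropy 0 and region 1 has entropy 1, and both regions are assigned to prototype 0 because `argmax` breaks the tie toward the lower index. High average entropy or a few prototypes absorbing most regions would flag a weak or lopsided alignment.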

Real-World Impact and Robustness

Experiments conducted using SCOT across real-world urban computing tasks demonstrate consistent improvements in transfer accuracy and robustness. The framework was tested on critical tasks such as regional GDP estimation, population prediction, and carbon footprint assessment. In both single-source and multi-source transfer scenarios, SCOT consistently outperformed strong baseline methods, showcasing its ability to provide more accurate and reliable predictions. Crucially, the system demonstrated improved robustness, particularly under conditions of high data heterogeneity and scarce labels in the target city.

Further tests confirm that these gains are primarily due to SCOT's innovative alignment design, rather than merely increased encoder capacity. This highlights the significance of developing principled methods for data alignment and correspondence when dealing with complex, heterogeneous datasets like those found in urban environments. The ability to learn explicit, soft correspondences without ground-truth matching means that AI can be deployed more effectively in a wider range of cities, fostering more informed decision-making for urban development, environmental sustainability, and public safety. This scientific work, detailed in the paper "SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective" by Yuyao Wang et al. (Source: https://arxiv.org/abs/2604.07383), represents a significant step forward in making AI truly practical and scalable for global urban challenges.

To explore how advanced AI and IoT solutions can transform your operations and empower your enterprise with data-driven insights, contact ARSA for a free consultation.