Wuhan University (WHU) and Ant Group have unveiled SkySense++, a next-generation semantic-enhanced multi-modal remote sensing foundation model, described in a paper recently published in Nature Machine Intelligence.
Trained on 27 million remote sensing images from 11 satellite payloads, SkySense++ uses a two-stage progressive learning approach that combines multi-granularity contrastive learning with masked semantic learning. This design strengthens the model's semantic understanding and cross-modal representations, allowing it to handle new tasks with minimal labeled data and reducing its reliance on complex fine-tuning and large-scale annotation.
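To make the recipe concrete, below is a minimal PyTorch-style sketch of such a two-stage schedule: stage one aligns two augmented views of a scene contrastively at both image and patch granularity, and stage two masks patch tokens and regresses their features. Everything here (encoder size, masking ratio, batch shapes) is an illustrative assumption, not the paper's released code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEncoder(nn.Module):
    """Toy ViT-style encoder: embed flattened patches, run a small transformer."""
    def __init__(self, dim=64):
        super().__init__()
        self.embed = nn.Linear(48, dim)   # 4x4 RGB patches, flattened (assumption)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
    def forward(self, x):                 # x: (batch, n_patches, 48)
        return self.backbone(self.embed(x))

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE between two sets of L2-normalised embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau
    targets = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

encoder = PatchEncoder()
opt = torch.optim.AdamW(encoder.parameters(), lr=1e-4)

# Stage 1: multi-granularity contrastive learning. Two augmented "views" of
# the same scenes are aligned at image granularity (pooled tokens) and at
# patch granularity (token-wise). Random tensors stand in for real imagery.
view_a = torch.randn(8, 16, 48)
view_b = torch.randn(8, 16, 48)
tok_a, tok_b = encoder(view_a), encoder(view_b)
loss_img = info_nce(tok_a.mean(1), tok_b.mean(1))                # image-level
loss_patch = info_nce(tok_a.flatten(0, 1), tok_b.flatten(0, 1))  # patch-level
(loss_img + loss_patch).backward(); opt.step(); opt.zero_grad()

# Stage 2: masked semantic learning. Mask a fraction of patch tokens and
# regress their stage-1 features, pushing the encoder toward semantic
# rather than purely pixel-level targets.
with torch.no_grad():
    targets = encoder(view_a)            # semantic targets from current model
mask = torch.rand(8, 16) < 0.6           # 60% masking ratio (assumption)
masked = view_a.masked_fill(mask.unsqueeze(-1), 0.0)
pred = encoder(masked)
loss_mask = F.mse_loss(pred[mask], targets[mask])
loss_mask.backward(); opt.step(); opt.zero_grad()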

Figure: SkySense++ uses two-stage pretraining within a spatio-temporal-modality decoupling architecture.
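As a rough illustration of what such decoupling can mean in practice, the sketch below gives each axis its own module: a per-modality tokenizer, a shared spatial transformer, and a temporal fuser across acquisition dates. All module choices and shapes are assumptions for exposition and may differ from the actual architecture.

import torch
import torch.nn as nn

class DecoupledRSBackbone(nn.Module):
    def __init__(self, dim=64, modalities=("optical", "sar")):
        super().__init__()
        # Modality axis: one lightweight tokenizer per sensor type.
        self.tokenizers = nn.ModuleDict(
            {m: nn.Linear(48, dim) for m in modalities})
        # Spatial axis: shared transformer over the patch tokens of one frame.
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.spatial = nn.TransformerEncoder(layer, num_layers=2)
        # Temporal axis: fuses per-frame features across acquisition dates.
        self.temporal = nn.GRU(dim, dim, batch_first=True)

    def forward(self, frames, modality):       # frames: (B, T, n_patches, 48)
        B, T, N, _ = frames.shape
        tokens = self.tokenizers[modality](frames)       # embed per modality
        tokens = self.spatial(tokens.flatten(0, 1))      # (B*T, N, dim)
        per_frame = tokens.mean(dim=1).view(B, T, -1)    # pool patches
        fused, _ = self.temporal(per_frame)              # fuse across time
        return fused[:, -1]                              # scene embedding

backbone = DecoupledRSBackbone()
emb = backbone(torch.randn(2, 5, 16, 48), modality="optical")
print(emb.shape)   # torch.Size([2, 64])

Because each axis is handled by its own module, a new sensor type only requires adding a tokenizer, and longer image time series only change the temporal fusion step.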
Tested across 12 tasks in seven fields, including agriculture, forestry, and disaster management, SkySense++ consistently outperformed existing methods, achieving significant gains in classification, detection, and few-shot segmentation accuracy.
Compared to its predecessor SkySense, the new model delivers stronger performance in agricultural assessment, disaster response, and land resource monitoring, while adding support for deployment without task-specific fine-tuning.
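Deployment without fine-tuning is often realized by freezing the pretrained encoder and classifying with class prototypes built from a handful of labeled examples. The sketch below shows that generic pattern with a stand-in encoder; it is an assumed protocol for illustration, not SkySense++'s released interface.

import torch
import torch.nn.functional as F

def prototype_classify(encoder, support_x, support_y, query_x):
    """Label queries by cosine similarity to class prototypes (mean support embeddings)."""
    with torch.no_grad():                        # encoder stays frozen
        s = F.normalize(encoder(support_x), dim=-1)
        q = F.normalize(encoder(query_x), dim=-1)
    protos = torch.stack([s[support_y == c].mean(0)
                          for c in support_y.unique(sorted=True)])
    return (q @ F.normalize(protos, dim=-1).t()).argmax(dim=-1)

# Toy usage with a stand-in encoder and random "images".
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 32))
support_x = torch.randn(10, 3, 8, 8)             # 5 labeled examples per class
support_y = torch.tensor([0] * 5 + [1] * 5)
query_x = torch.randn(4, 3, 8, 8)
print(prototype_classify(encoder, support_x, support_y, query_x))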

Figure: Performance comparison of SkySense++ and other models on 12 typical Earth observation tasks.
The team aims to further lower pretraining costs and improve adaptability, pushing Earth observation AI toward greater efficiency and broader applicability in support of global sustainable development.
Link to paper: