Published in ACS ES&T Air (2025)

Integrating Simulations and Observations: A Foundation Model for Estimating the Aerosol Mixing State Index

Fei Jiang, Zhonghua Zheng, Hugh Coe, Robert M. Healy, Laurent Poulain, Valérie Gros, Hao Zhang, Weijun Li, Dantong Liu, Matthew West, David Topping, and Nicole Riemer

Field Challenges

Observations of aerosol mixing state have severe spatiotemporal gaps because the instruments are costly and in-situ measurements are technically complex. Particle-resolved models serve as high-fidelity benchmarks, but their real-world application is often limited by the lack of accurate emission inventories and of detailed input data that reflect local environmental conditions. The difficulty of capturing pollution spikes and rapid temporal changes remains a major barrier to reliable climate and health impact assessments.

Methodology

We redefine aerosol emulation as a general task and treat real-world estimation as a specialized downstream application. Our approach uses a foundation model pretrained on approximately 25,000 particle-resolved simulation samples from the PartMC-MOSAIC model. Pretraining encodes process-guided physical knowledge; the model is then fine-tuned on roughly 390 high-value observational samples from the MEGAPOLI winter campaign in Paris, bridging the gap between simulated and real-world distributions.
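The pretrain-then-fine-tune idea can be sketched on synthetic data. The snippet below is only an illustration under stated assumptions: the data generator, the tiny NumPy network, the learning rates, and the 300/90 split of the observational set are all hypothetical stand-ins, not the architecture or data used in the paper. Only the sample counts (about 25,000 and about 390) come from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, shift=0.0, noise=0.05):
    """Hypothetical stand-in data: 4 input features, target bounded roughly in [0, 1]."""
    X = rng.normal(size=(n, 4))
    w = np.array([0.8, -0.5, 0.3, 0.1])
    y = 1.0 / (1.0 + np.exp(-(X @ w + shift)))
    return X, y + noise * rng.normal(size=n)

def init_params(n_in, n_hidden):
    return {
        "W1": rng.normal(scale=0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=n_hidden),
        "b2": 0.0,
    }

def forward(p, X):
    H = np.tanh(X @ p["W1"] + p["b1"])      # hidden activations
    return H, H @ p["W2"] + p["b2"]         # scalar prediction per sample

def train(p, X, y, lr, epochs):
    """Full-batch gradient descent on mean squared error; updates p in place."""
    n = len(y)
    for _ in range(epochs):
        H, pred = forward(p, X)
        err = pred - y
        dH = np.outer(err, p["W2"]) * (1.0 - H**2)   # backprop through tanh
        p["W2"] -= lr * (H.T @ err) * (2.0 / n)
        p["b2"] -= lr * 2.0 * err.mean()
        p["W1"] -= lr * (X.T @ dH) * (2.0 / n)
        p["b1"] -= lr * 2.0 * dH.mean(axis=0)
    return p

def mse(p, X, y):
    return float(np.mean((forward(p, X)[1] - y) ** 2))

# Stage 1: large "simulation" set. Stage 2: small "observation" set whose
# distribution is shifted relative to the simulations.
X_sim, y_sim = make_data(25_000)
X_obs, y_obs = make_data(390, shift=0.7)
X_ft, y_ft = X_obs[:300], y_obs[:300]       # used for fine-tuning (assumed split)
X_te, y_te = X_obs[300:], y_obs[300:]       # held out for evaluation

params = train(init_params(4, 16), X_sim, y_sim, lr=0.05, epochs=400)
mse_pretrained = mse(params, X_te, y_te)

# Fine-tune with a smaller learning rate so pretrained weights are adjusted, not overwritten.
params = train(params, X_ft, y_ft, lr=0.02, epochs=300)
mse_finetuned = mse(params, X_te, y_te)

print(f"held-out MSE before fine-tuning: {mse_pretrained:.4f}")
print(f"held-out MSE after fine-tuning:  {mse_finetuned:.4f}")
```

In this toy setting the pretrained weights carry over the shape of the input-output relation, and the small observational set mainly corrects the offset between the two distributions.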

Systematic Workflow

Workflow Overview

Integrated framework bridging particle-resolved stochastic simulations (~25,000 samples) with urban field observations.

Research Highlights

>3× Higher R²

The fine-tuned foundation model achieved a test R² of 0.64, more than triple that of standard AutoML tree-based models (R² 0.19) and far above a baseline linear regression (R² 0.07).
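For readers who want to check the arithmetic, the snippet below uses the standard definition of the coefficient of determination and compares the three reported R² values. The function name `r2_score` and the dictionary keys are illustrative choices; the numbers are the ones quoted above.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# R² values as reported in the text above.
reported = {
    "fine_tuned": 0.64,
    "automl_trees": 0.19,
    "linear_regression": 0.07,
}

ratio_vs_automl = reported["fine_tuned"] / reported["automl_trees"]
ratio_vs_linear = reported["fine_tuned"] / reported["linear_regression"]
relative_gain_vs_automl = (ratio_vs_automl - 1.0) * 100.0  # percent increase over AutoML

print(f"ratio vs AutoML trees:      {ratio_vs_automl:.2f}x")
print(f"ratio vs linear regression: {ratio_vs_linear:.2f}x")
print(f"relative gain vs AutoML:    {relative_gain_vs_automl:.0f}%")
```

The ratio to the AutoML result is about 3.4, which is the sense in which the fine-tuned model more than triples the baseline R².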

6% Drift Stability

Despite significant temporal distribution drift in the field data, the model remained stable. In experiments with the temporal order of the data swapped, R² varied by only about 6%, indicating that the model captures features common across time.
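One way such a swap experiment could be set up is sketched below. This is an assumption-laden illustration: it supposes that "swapping" means exchanging the period used for fitting with the period used for testing, it uses synthetic drifting data with a plain least-squares model, and it defines the variation as a relative difference in R². The text above does not specify the exact protocol or how the 6% figure is computed.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic time-ordered data (hypothetical): one input drifts slowly over the record.
n = 390
t = np.arange(n) / n
X = rng.normal(size=(n, 3))
X[:, 0] += 1.5 * t
w_true = np.array([0.5, -0.3, 0.2])
y = X @ w_true + 0.3 * rng.normal(size=n)

def fit_linear(X, y):
    """Least-squares fit with an intercept column."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

half = n // 2
early, late = slice(0, half), slice(half, n)

# Chronological order: fit on the early period, test on the late period.
r2_forward = r2(y[late], predict(fit_linear(X[early], y[early]), X[late]))
# Swapped order: fit on the late period, test on the early period.
r2_swapped = r2(y[early], predict(fit_linear(X[late], y[late]), X[early]))

# Relative variation between the two orderings (assumed definition).
variation = abs(r2_forward - r2_swapped) / max(r2_forward, r2_swapped)

print(f"R² forward: {r2_forward:.3f}, R² swapped: {r2_swapped:.3f}")
print(f"relative variation: {variation:.1%}")
```

A small variation between the two orderings suggests the fitted relation does not depend strongly on which period the model saw during training.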

Interactive Slide Deck

EGU 2025 FEI JIANG