Calibration of Agent-Based Models. 11th July 2025, Cambridge.

Data Assimilation for Agent-Based Models
(With a sprinkling of ABC and LLMs)


Nick Malleson, University of Leeds, UK


Slides available at:
https://urban-analytics.github.io/dust/presentations.html


Nice sunset/sunrise

Outline

Dynamic re-calibration of a COVID-19 ABM using Approximate Bayesian Computation

Data assimilation for ABMs using Particle Filters and Kalman Filters

For dessert: LLM-backed ABMs

Diagram of the ramp model components described in the slide text
Spooner et al. (2021) A dynamic microsimulation model for epidemics. Social Science & Medicine 291, 114461. DOI: 10.1016/j.socscimed.2021.114461

DyME: Dynamic Model for Epidemics

COVID transmission model with components including:

dynamic spatial microsimulation, spatial interaction model, data linkage (PSM), infection model

Represents all individuals in a study area with activities: home, shopping, working, schooling

Daily timestep

Figure illustrates the process of COVID spreading as people visit places

DyME Drawbacks:
Data and Latent Variables

Incredibly detailed model!

BUT the only data available for validation were COVID cases and hospital deaths

These quantify only a tiny part of the transmission dynamics

Most of the parameters were calibrated manually

Huge uncertainties

Can we use dynamic calibration to:

Infer latent variables

Make better 'real-time' predictions

Different disease stages: susceptible, exposed, (pre-)symptomatic / asymptomatic, removed

Dynamic Calibration for DyME

Use Approximate Bayesian Computation and Bayesian updating ('dynamic re-calibration')

Bayesian updating of the DyME model. The parameter posteriors from one window are used as priors in the next window.

Reduce uncertainty and produce more accurate future predictions

Parameter posteriors might reveal information about the model / system
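The windowed scheme can be sketched in a few lines. This is a toy illustration, not the DyME code: a one-parameter exponential-growth model stands in for the microsimulation, and the names (`run_model`, `abc_window`), tolerances, and priors are all illustrative assumptions. The key move is that the accepted samples from one window are resampled (with jitter) to form the prior for the next.

```python
import numpy as np

rng = np.random.default_rng(42)

def run_model(beta, n_days):
    """Toy stand-in for the ABM: noisy exponential case growth."""
    t = np.arange(n_days)
    return np.exp(beta * t) + rng.normal(0, 0.5, n_days)

def abc_window(prior_samples, observed, n_days, tolerance):
    """Rejection ABC: keep parameter draws whose simulated output is
    close (in RMSE) to the observations for this window."""
    accepted = [b for b in prior_samples
                if np.sqrt(np.mean((run_model(b, n_days) - observed) ** 2))
                < tolerance]
    return np.array(accepted)

# Synthetic 'observations' generated from a known growth rate
true_beta, n_days = 0.05, 30
obs = np.exp(true_beta * np.arange(n_days))

# Window 1: wide uniform prior over the growth rate
prior = rng.uniform(0.0, 0.2, 5000)
post1 = abc_window(prior, obs, n_days, tolerance=1.0)

# Window 2: the posterior from window 1 becomes the new prior
# (resampled with a little jitter so the samples don't collapse)
prior2 = rng.choice(post1, 5000) + rng.normal(0, 0.005, 5000)
post2 = abc_window(prior2, obs, n_days, tolerance=1.0)

print(len(post1), len(post2))  # acceptance counts for each window
```

In the real application each "window" corresponds to a new batch of observed case data, so the posterior tightens as the epidemic unfolds.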

Uncertain predictions made using the DyME parameter posteriors. The certainty increases as more data become available.
M. Asher, N. Lomax, K. Morrissey, F. Spooner, N. Malleson (2023) Dynamic Calibration with Approximate Bayesian Computation for a Microsimulation of Disease Spread. Nature Scientific Reports 13 (1): 8637. DOI: 10.1038/s41598-023-35580-z
Posterior parameter estimates for the DyME model

Aside: Computational Efficiency

Uncertainty quantification, etc., requires many, many model runs

Difficult with computationally-expensive models, like ABMs

Screenshot of the RAMP GUI

DyME (COVID microsimulation)

A single model run (800,000 individuals, 90 days) took 2 hours

ABC etc. would be impossible at that speed (need 1000s of runs)

Big computers can help

But maybe if I were better at programming ...

Improbable company logo

Re-implemented the numpy/pandas model using (py)OpenCL

Run time went from 2 hours to a few seconds!

Transforms the potential uses of the model

Why we need Data Assimilation

Complex models will always diverge

(due to inherent uncertainties in inputs, parameter values, model structure, etc.)

Possible Solution: Data Assimilation

Used in meteorology and hydrology to bring models closer to reality. Combines:

Noisy, real-world observations

Model estimates of the system state

Data assimilation vs. calibration

Example of optimising the model state using observations and data assimilation

Challenges for using DA with ABMs

Model size

10,000 agents * 5 variables = 50,000 distinct parameters

Agent behaviour

Agents have goals, needs, etc., so can't be arbitrarily adjusted

Assumptions and parameter types

Maths typically developed for continuous parameters and assumes normal distributions

... but, at least, many of these problems are shared by climate models

Data assimilation with a Particle Filter

Flowchart of the experimental design. Lots of models ('particles') are run simultaneously.
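The predict/weight/resample loop in the flowchart can be sketched with a minimal bootstrap particle filter. This is an illustrative toy, not the crowd-model code from the paper: a one-dimensional random walk stands in for the ABM, and all names and noise levels are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PARTICLES = 200

def step(states):
    """Toy stand-in for one ABM timestep: each particle carries a full
    copy of the model state (here a single number, e.g. a crowd centroid)."""
    return states + 1.0 + rng.normal(0, 0.5, states.shape)

def particle_filter(observations, obs_noise=1.0):
    particles = rng.normal(0, 1, N_PARTICLES)  # initial ensemble
    estimates = []
    for obs in observations:
        particles = step(particles)                             # predict
        weights = np.exp(-0.5 * ((obs - particles) / obs_noise) ** 2)
        weights /= weights.sum()                                # normalise
        idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
        particles = particles[idx]                              # resample
        estimates.append(particles.mean())
    return np.array(estimates)

truth = np.cumsum(np.ones(50))            # true trajectory: 1, 2, 3, ...
observations = truth + rng.normal(0, 1.0, 50)
estimates = particle_filter(observations)
print(np.abs(estimates - truth).mean())   # mean absolute error vs. truth
```

With a real ABM the expensive part is `step`: every particle is a complete model run, which is why the number of particles needed scales so badly with model complexity.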

Crowd Simulation with a Particle Filter

Animation of a crowding model with data assimilation

Particle Filter Results

Box Environment: More particles = lower error

Exponential increase in complexity

Median absolute error change with the number of agents and particles: greater complexity caused by larger numbers of agents can be mitigated by increasing the number of particles.
Malleson, Nick, Kevin Minors, Le-Minh Kieu, Jonathan A. Ward, Andrew West, and Alison Heppenstall. (2020) Simulating Crowds in Real Time with Agent-Based Modelling and a Particle Filter. Journal of Artificial Societies and Social Simulation 23(3) DOI: 10.18564/jasss.4266.

More realistic simulations:
Grand Central Terminal (New York)

Pedestrian traces data

B. Zhou, X. Wang and X. Tang. (2012) Understanding Collective Crowd Behaviors: Learning a Mixture Model of Dynamic Pedestrian-Agents. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2012

http://www.ee.cuhk.edu.hk/~xgwang/grandcentral.html

Cleaned and prepared by Ternes et al. (2021).

Ensemble Kalman Filter (EnKF)

More complicated, and has stronger assumptions, but can update the model state (including categorical parameters) directly

\( \hat{X} = X + K \left( D - H X \right) \)

The current state estimate (\(X\)) is updated with new information to give \(\hat{X}\)

\(K\) (Kalman gain) balances the importance of new data (\(D\)) vs. the current prediction.

\(H X\): the prediction transformed into the same space as the observed data (e.g. aggregate observations vs. individual agents)
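The update equation \(\hat{X} = X + K(D - HX)\) can be implemented directly with numpy. A generic stochastic-EnKF sketch (not the code from the paper, and with illustrative sizes and noise levels): the observation operator \(H\) here averages the agents' states, mirroring the point about mapping individual agents onto aggregate observations.

```python
import numpy as np

rng = np.random.default_rng(1)

def enkf_update(X, d, H, obs_var):
    """One EnKF analysis step.
    X: (n_state, n_ens) ensemble of state vectors
    d: observation vector; H: observation operator."""
    n_ens = X.shape[1]
    # Perturb the observation independently for each ensemble member
    D = d[:, None] + rng.normal(0, np.sqrt(obs_var), (len(d), n_ens))
    # Sample the state covariance from the ensemble itself
    A = X - X.mean(axis=1, keepdims=True)
    P = A @ A.T / (n_ens - 1)
    R = obs_var * np.eye(len(d))
    # Kalman gain: balances ensemble spread against observation noise
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    return X + K @ (D - H @ X)

# 3 agents' positions, 50 ensemble members; we only observe the mean
n_state, n_ens = 3, 50
X = rng.normal(5.0, 2.0, (n_state, n_ens))
H = np.full((1, n_state), 1.0 / n_state)  # aggregate: mean position
d = np.array([0.0])                       # observed mean position
X_post = enkf_update(X, d, H, obs_var=0.1)
print(X.mean(), X_post.mean())  # posterior pulled towards the observation
```

Because \(K\) is built from the ensemble covariance, individual agent states get nudged even though only an aggregate was observed, which is exactly what makes the EnKF attractive for ABMs.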

Challenges:

Designed for continuous data -- categorical parameters need converting (non-trivial)

Unpredictable human behaviour

Problems with numeric scales (struggles with large and small numbers)

EnKF for Pedestrian Simulation

Figure illustrates that, after DA, the posterior estimates of two agents' locations are much closer to their corresponding positions in the data (observation)
Figure 9: Comparison of prior and posterior positions of two agents (‘A’ and ‘B’) in all ensemble member models. After DA the positional estimates of the agents' locations are much more accurate and have lower variability.
Front page of the published paper
Suchak, K., M. Kieu, Y. Oswald, J. A. Ward, and N. Malleson (2024), Coupling an Agent-Based Model and an Ensemble Kalman Filter for Real-Time Crowd Modelling. Royal Society Open Science 11 (4): 231553. DOI: 10.1098/rsos.231553.
Cover image of the paper

Case Study 2:

International Policy Diffusion

ABM simulates COVID-19 policy diffusion via peer mimicry

Particle filter enhances prediction accuracy with real-time data.

Frequent filtering improves results.

 

Y. Oswald, N. Malleson and K. Suchak (2024). An Agent-Based Model of the 2020 International Policy Diffusion in Response to the COVID-19 Pandemic with Particle Filter. Journal of Artificial Societies and Social Simulation 27(2) 3. DOI: 10.18564/jasss.5342

Two subplots showing the progression of COVID-19 policy adoption across countries in March 2020. Panel (a) depicts the number of countries implementing school closures at four levels of stringency (level 0 to level 3) over time, with a rapid transition to level 3 (complete school closures) around mid-March. Panel (b) compares the adoption of various policies, including school closures, workplace closures, event cancellations, stay-at-home orders, domestic travel restrictions, and international travel restrictions, all measured by the number of countries. School closures exhibit the fastest and most widespread adoption, closely followed by event cancellations, with other policies showing slower adoption.
Adoption of school closure policies by governments worldwide. Data source: Our World In Data.

International Policy Diffusion

Global challenges hinge on international coordination of policy

COVID-19 lockdown: compelling example of almost unanimous global response

Aim: Develop a parsimonious ABM to explore mechanisms of international lockdown diffusion and improve prediction accuracy through data assimilation.

Methods

Agent-Based Model (ABM)

Agents: countries, with binary lockdown states ("lockdown" or "not in lockdown").

Behaviour: Peer mimicry based on similarity (income, democracy index, geography).

Secondary mechanism: Autonomous lockdown adoption based on internal thresholds (e.g., population density).

Calibration

Based on real-world lockdown data (March 2020) and parameters like social thresholds, peer group size, and adoption probabilities.

Data assimilation with a particle filter

Updates model predictions in real time using observed data (e.g., lockdown status of countries).

Improves model alignment with real-world dynamics by filtering poorly performing simulations.

Two panels comparing the percentage of countries in lockdown predicted by the base model and particle filter. Top panel: Base model ensemble run with 100 simulations, showing the mean prediction (black line) closely following observed data (red dashed line) with wide confidence intervals. Bottom panel: Particle filter with 1000 particles, showing improved alignment with observed data and narrower confidence intervals compared to the base model.

Results

After calibration, base model performance is adequate, but exhibits large variance, especially during the 'critical' phase (when most countries are going into lockdown).

Macro performance better than micro performance

An accurate lockdown percentage doesn't mean the right countries are in lockdown

Particle filter narrows confidence intervals and reduces MSE by up to 75%; up to 40% in the critical phase

Performance during the critical few days is crucial if the model is going to be useful

Conclusions

International Policy Diffusion

Proof-of-concept: social / political agent-based diffusion models can be combined with data assimilation.

Particle filter improves lockdown predictions, particularly in the 'critical phase'

But the model still incorrectly predicts many countries

Undoubtedly need a more nuanced model to improve predictions further (beyond peer mimicry).

Cover image of the paper
Oswald, Y., K. Suchak, and N Malleson (2025). Agent-Based Models of the United States Wealth Distribution with Ensemble Kalman Filter. Journal of Economic Behavior & Organization 229:106820. DOI: 10.1016/j.jebo.2024.106820

Case Study 3

Wealth Diffusion with an EnKF

Significant wealth inequality in the U.S.

The top 1% hold ~35% of wealth, while the bottom 50% hold almost none.

(Near) real-time predictions are essential, particularly during crises

Paper explores the integration of ABMs with data assimilation to improve prediction accuracy.

Wealth Diffusion: Context

Wealth distribution among different wealth groups in the U.S. The 50% poorest people lost significant wealth during the 2008 financial crisis. The data are from https://realtimeinequality.org/ with details in Blanchet et al. (2022).
Wealth distribution in the U.S. Data from realtimeinequality.org with details in Blanchet et al. (2022).

Methods (i)

Wealth Diffusion with an EnKF

Developed two agent-based models of U.S. wealth distribution:

Model 1:

Adapted from literature, focused on wealth accumulation through proportional allocation of growth.

Agents' wealth grows as a function of their initial wealth, reflecting the compounding effect of wealth.

Limited agent interaction; growth is largely independent of network effects.

Model 2:

Developed from scratch, includes network-based agent interactions and adaptive behaviours (more akin to a 'true' ABM)

Flowchart illustrating the Ensemble Kalman Filter process. It begins with generating an ensemble of models, followed by predictions. At each time step, the algorithm checks whether filtering is required (t = k). If so, the predictions are compared to observations, the optimal synthesis is computed, and the ensemble members are updated. The process repeats for subsequent time steps.

Methods (ii)

Wealth Diffusion with an EnKF

Integrated the ABMs with an Ensemble Kalman Filter (EnKF):

EnKF adjusted agent-specific variables (e.g., wealth per agent) dynamically to match observed data.

Calibrated to U.S. wealth data (1990–2022) and tested them against real-time wealth estimates.

Results of Experiment 1 Part A: the error of Models 1 and 2 under Ensemble Kalman Filter (EnKF) optimisation with 100 ensemble members, compared to the real data. The filter is applied every 20 time steps (months). Panels (A–D) depict the wealth share of the different economic groups (top 1%, top 10%–1%, middle 40%, bottom 50%) over time. Panels (A) and (B) present the archetypal behaviour of a single model run, illustrating how the EnKF influences the model behaviour. Panels (C) and (D) show the mean EnKF prediction and uncertainty across all ensemble members. Panel (E) depicts the Mean Absolute Error (MAE) from Eq. (7) of the two models, with and without the EnKF.

Results

EnKF improved model accuracy significantly (20–50% error reduction).

Corrected disparities in predicted wealth shares for different economic groups (observe the jagged lines).

Filter still exhibited some unexpected behaviour

Image to liven up slide showing abstract people and a network

Conclusions

Wealth Diffusion with an EnKF

We show that a macro-economic ABM can be optimised with an EnKF

Improved short-term predictions, especially during a crisis

Essential during crises; models cannot include everything

Additional opportunity for improved understanding

E.g. through examining the evolution of the Kalman gain matrix and contrasting the observation vs. model weights -- which become more or less certain over time?

2 last-minute things that may be relevant:

History matching and ABC for an ABM

McCulloch, J., J. Ge, J.A. Ward, A. Heppenstall, J.G. Polhill, N. Malleson (2022) Calibrating agent-based models using uncertainty quantification methods. Journal of Artificial Societies and Social Simulation 25, 1. DOI: 10.18564/jasss.4791 (open access)

Emulating a pedestrian ABM with a Gaussian Process Emulator

Kieu, M., H. Nguyen, J. A. Ward, and N. Malleson (2024). Towards Real-Time Predictions Using Emulators of Agent-Based Models. Journal of Simulation 18 (1): 29–46. DOI: 10.1080/17477778.2022.2080008

For dessert:

Foundation models and large-language models to drive agent behaviour

Foundation-model-backed ABMs

Modelling human behaviour in ABMs is (still!) an ongoing challenge

Behaviour typically implemented with bespoke rules, or even more advanced mathematical approaches

But still very limited and brittle behaviour

Can new AI approaches offer a solution?

Large Language Models can respond to prompts in 'believable', 'human-like' ways

Geospatial Foundation Models capture nuanced, complex spatial associations

Multi-modal Foundation Models operate with diverse data (text, video, audio, etc.)

Where might this lead...

"All models are great, until you need them"

It's fine to use models under normal conditions. Very useful.

But if the system undergoes a fundamental change (COVID? A global financial crash?) -- then we really need models to help

... and that is exactly when they become useless!

 

Foundation-model-backed ABMs

Maybe a model with LLM-backed agents would be better able to respond after a catastrophic system change

Diagram showing a traditional ABM vs. one where the agents are controlled by LLMs
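A hypothetical sketch of what the LLM-backed side of the diagram might look like. Everything here is an assumption for illustration: `query_llm` is a stub standing in for any chat-completion API (OpenAI, a local model, etc.), and the persona and prompt format are invented. The point is that the agent's entire world must be serialised to text before the LLM can reason about it.

```python
# Hypothetical sketch: `query_llm` is a stand-in stub, NOT a real API.
def query_llm(prompt: str) -> str:
    """A real implementation would call an LLM here."""
    return "stay home"  # canned response for illustration

class LLMAgent:
    def __init__(self, persona: str):
        self.persona = persona
        self.memory: list[str] = []

    def decide(self, world_description: str) -> str:
        # The whole state of the world is flattened into text --
        # the key simplification/abstraction discussed later.
        prompt = (
            f"You are {self.persona}.\n"
            f"Recent events: {'; '.join(self.memory[-3:])}\n"
            f"Situation: {world_description}\n"
            "What do you do next? Answer in a short phrase."
        )
        action = query_llm(prompt)
        self.memory.append(f"Did: {action}")
        return action

agent = LLMAgent("a cautious office worker in Leeds")
print(agent.decide("A new virus is spreading; shops are still open."))
```

In a traditional ABM the `decide` method would be hand-coded rules; here the behaviour is delegated to the model behind `query_llm`, with all the cost, bias, and validation challenges discussed next.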

Large Language Models (LLMs)

Early evidence suggests that large-language models (LLMs) can be used to represent a wide range of human behaviours

Already a flurry of activity in LLM-backed ABMs

E.g. AutoGPT, BabyAGI, Generative Agents, MetaGPT ... and others ...

Image of the ABM created by Park et al.
Park, Joon Sung, Joseph O’Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. ‘Generative Agents: Interactive Simulacra of Human Behavior’. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 1–22. San Francisco CA USA: ACM. DOI: 10.1145/3586183.3606763.

LLMs & ABMs: Challenges

Lots of them!

Computational complexity: thousands/millions of LLMs?

Bias: LLMs very unlikely to be representative (non-English speakers, cultural bias, digital divide, etc.)

Validation: consistency (i.e. stochasticity), robustness (i.e. sensitivity to prompts), hallucinations, train/test contamination, and others

Main one for this talk: the need to interface through text

Communicating -- and maybe reasoning -- with language makes sense

But having to describe the world with text is a huge simplification / abstraction

A solution? Multi-modal and Geospatial Foundation Models

Foundation models: "a machine learning or deep learning model trained on vast datasets so that it can be applied across a wide range of use cases" (Wikipedia)

LLMs are Foundation models that work with text

Geospatial Foundation Models

FMs that work with spatial data (street view images, geotagged social media data, video, GPS trajectories, points-of-interest, etc.) to create rich, multidimensional spatial representations

Multi-modal Foundation Models

FMs that work with diverse data, e.g. text, audio, image, video, etc.

Towards Multi-Modal Foundation Models for ABMs (??)

GFMs and LLMs: a new generation of ABMs?

LLMs 'understand' human behaviour and can reason realistically

GFMs provide nuanced representation of 'space'

How?

I've no idea! Watch this space.

Insert spatial embeddings directly into the LLM?

Use an approach like BLIP-2 that trains a small transformer as an interface between an LLM and a vision-language model

Suggestions welcome!

Conceptual LLM ABM city image, from chatgpt

Summary

ABC: dynamic calibration and inferring latent parameters

Data assimilation for ABMs: Ensemble Kalman Filter worked best

LLMs and foundation models for ABMs!
