publications
2026
- [Under review] Beyond Pixel Histories: World Models with Persistent 3D State. Samuel Garcin, Thomas Walker, Steven McDonagh, and 5 more authors. arXiv preprint, 2026.
Interactive world models continually generate video by responding to a user’s actions, enabling open-ended generation capabilities. However, existing models typically lack a 3D representation of the environment, meaning 3D consistency must be implicitly learned from data, and spatial memory is restricted to limited temporal context windows. This results in an unrealistic user experience and presents significant obstacles to downstream tasks such as training agents. To address this, we present PERSIST, a new paradigm of world model which simulates the evolution of a latent 3D scene: environment, camera, and renderer. This allows us to synthesize new frames with persistent spatial memory and consistent geometry. Both quantitative metrics and a qualitative user study show substantial improvements in spatial memory, 3D consistency, and long-horizon stability over existing methods, enabling coherent, evolving 3D worlds. We further demonstrate novel capabilities, including synthesising diverse 3D environments from a single image, as well as enabling fine-grained, geometry-aware control over generated experiences by supporting environment editing and specification directly in 3D space.
@article{garcin2026pixelhistoriesworldmodels,
  author  = {Garcin, Samuel and Walker, Thomas and McDonagh, Steven and Pearce, Tim and Bilen, Hakan and He, Tianyu and Wang, Kaixin and Bian, Jiang},
  journal = {ArXiv preprint},
  title   = {Beyond Pixel Histories: World Models with Persistent 3D State},
  url     = {https://arxiv.org/abs/2603.03482},
  volume  = {abs/2603.03482},
  year    = {2026},
}
2025
- [Under review] MEAL: a benchmark for continual multi-agent reinforcement learning. Tristan Tomilin, Luka van den Boogaard, Samuel Garcin, and 4 more authors. arXiv preprint arXiv:2506.14990, 2025.
Benchmarks play a crucial role in the development and analysis of reinforcement learning (RL) algorithms, with environment availability strongly impacting research. One particularly underexplored intersection is continual learning (CL) in cooperative multi-agent settings. To remedy this, we introduce MEAL (Multi-agent Environments for Adaptive Learning), the first benchmark tailored for continual multi-agent reinforcement learning (CMARL). Existing CL benchmarks run environments on the CPU, leading to computational bottlenecks and limiting the length of task sequences. MEAL leverages JAX for GPU acceleration, enabling continual learning across sequences of 100 tasks on a standard desktop PC in a few hours. We show that naively combining popular CL and MARL methods yields strong performance on simple environments, but fails to scale to more complex settings requiring sustained coordination and adaptation. Our ablation study identifies architectural and algorithmic features critical for CMARL on MEAL.
@article{tomilin2025meal,
  title   = {MEAL: a benchmark for continual multi-agent reinforcement learning},
  author  = {Tomilin, Tristan and Boogaard, Luka van den and Garcin, Samuel and Grooten, Bram and Fang, Meng and Du, Yali and Pechenizkiy, Mykola},
  journal = {arXiv preprint arXiv:2506.14990},
  year    = {2025},
}
- [ICLR] Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning. Samuel Garcin, Trevor McInroe, Pablo Samuel Castro, and 4 more authors. In The Thirteenth International Conference on Learning Representations (ICLR 2025), Singapore, April 24-28, 2025.
Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents. Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information will be relevant to both the actor and the critic. To this end, we explore the principles that underlie effective representations for the actor and for the critic in on-policy algorithms. We focus our study on understanding whether the actor and critic will benefit from separate, rather than shared, representations. Our primary finding is that when separated, the representations for the actor and critic systematically specialise in extracting different types of information from the environment: the actor’s representation tends to focus on action-relevant information, while the critic’s representation specialises in encoding value and dynamics information. We conduct a rigorous empirical study to understand how different representation learning approaches affect the actor and critic’s specialisations and their downstream performance, in terms of sample efficiency and generalisation capabilities. Finally, we discover that a separated critic plays an important role in exploration and data collection during training. Our code, trained models and data are accessible at https://github.com/francelico/deac-rep.
@inproceedings{DBLP:conf/iclr/GarcinMCLAPA25,
  author    = {Garcin, Samuel and McInroe, Trevor and Castro, Pablo Samuel and Lucas, Christopher G. and Abel, David and Panangaden, Prakash and Albrecht, Stefano V.},
  title     = {Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning},
  booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025},
  publisher = {OpenReview.net},
  year      = {2025},
  url       = {https://openreview.net/forum?id=tErHYBGlWc},
}
- [RLDM] PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU. Trevor McInroe and Samuel Garcin. arXiv preprint arXiv:2502.00021, 2025.
We present PixelBrax, a set of continuous control tasks with pixel observations. We combine the Brax physics engine with a pure JAX renderer, allowing reinforcement learning (RL) experiments to run end-to-end on the GPU. PixelBrax can render observations over thousands of parallel environments and can run two orders of magnitude faster than existing benchmarks that rely on CPU-based rendering. Additionally, PixelBrax supports fully reproducible experiments through its explicit handling of any stochasticity within the environments and supports color and video distractors for benchmarking generalization. We open-source PixelBrax alongside JAX implementations of several RL algorithms at https://github.com/trevormcinroe/pixelbrax.
@article{mcinroe2025pixelbrax,
  title   = {PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU},
  author  = {McInroe, Trevor and Garcin, Samuel},
  journal = {arXiv preprint arXiv:2502.00021},
  year    = {2025},
}
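The end-to-end-on-GPU pattern the PixelBrax abstract describes, thousands of environments stepped in parallel under JAX, can be sketched with a toy example. This is not the PixelBrax API; the point-mass dynamics and all names below are hypothetical, chosen only to illustrate how `jax.vmap` and `jax.jit` batch a single-environment step function:

```python
# Toy illustration of vmapped environment stepping (not the PixelBrax API).
import jax
import jax.numpy as jnp

def step(state, action):
    # Hypothetical point-mass dynamics: state is a (position, velocity) pair.
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    reward = -jnp.abs(pos)  # reward staying near the origin
    return (pos, vel), reward

# Vectorise the single-environment step over 4096 parallel environments,
# then compile the batched function so it runs as one fused device kernel.
batched_step = jax.jit(jax.vmap(step))

key = jax.random.PRNGKey(0)  # explicit PRNG key: runs are fully reproducible
positions = jax.random.normal(key, (4096,))
velocities = jnp.zeros(4096)
actions = jnp.zeros(4096)

(new_pos, new_vel), rewards = batched_step((positions, velocities), actions)
print(rewards.shape)  # (4096,)
```

The explicit PRNG key mirrors the reproducibility point in the abstract: all stochasticity is threaded through keys rather than hidden global state.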
2024
- [ICML] DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design. Samuel Garcin, James Doran, Shangmin Guo, and 2 more authors. In Forty-first International Conference on Machine Learning, 2024.
Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when these environments share characteristics with the ones they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent’s internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which assume control over level generation. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce data-regularised environment design (DRED). DRED generates levels using a generative model trained to approximate the ground truth distribution of an initial set of level parameters. Through its grounding, DRED achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.
@inproceedings{garcin2024dred,
  title     = {DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design},
  author    = {Garcin, Samuel and Doran, James and Guo, Shangmin and Lucas, Christopher G and Albrecht, Stefano V},
  booktitle = {Forty-first International Conference on Machine Learning},
  year      = {2024},
}
2023
- [Workshop] How the level sampling process impacts zero-shot generalisation in deep reinforcement learning. Samuel Garcin, James Doran, Shangmin Guo, and 2 more authors. NeurIPS ALOE Workshop, 2023.
A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent’s internal representation and the set of training levels, which we find to be well correlated with instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift from the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.
@article{garcin2023level,
  title   = {How the level sampling process impacts zero-shot generalisation in deep reinforcement learning},
  author  = {Garcin, Samuel and Doran, James and Guo, Shangmin and Lucas, Christopher G and Albrecht, Stefano V},
  journal = {NeurIPS ALOE Workshop},
  year    = {2023},
}
2022
- [AI Commun.] Deep reinforcement learning for multi-agent interaction. Ibrahim H. Ahmed, Cillian Brewitt, Ignacio Carlucho, and 8 more authors. AI Communications, 2022.
The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
@article{ahmed2022deep,
  title     = {Deep reinforcement learning for multi-agent interaction},
  author    = {Ahmed, Ibrahim H and Brewitt, Cillian and Carlucho, Ignacio and Christianos, Filippos and Dunion, Mhairi and Fosong, Elliot and Garcin, Samuel and Guo, Shangmin and Gyevnar, Balint and McInroe, Trevor and others},
  journal   = {AI Communications},
  number    = {Preprint},
  pages     = {1--12},
  year      = {2022},
  publisher = {IOS Press},
}
2021
- [TCST] A Hybrid Controller for Multi-Agent Collision Avoidance via a Differential Game Formulation. Domenico Cappello, Samuel Garcin, Z. Mao, and 3 more authors. IEEE Transactions on Control Systems Technology, 2021.
We consider the multi-agent collision avoidance problem for a team of wheeled mobile robots. Recently, a local solution to this problem, based on a game-theoretic formulation, has been provided and validated via numerical simulations. Due to its local nature, the result is not well-suited for online applications. In this article, we propose a novel hybrid implementation of the control inputs that yields a control strategy suited for the online navigation of mobile robots. Moreover, subject to a certain dwell time condition, the resulting trajectories are globally convergent. The control design is demonstrated both via simulations and experiments.
@article{9143181,
  author  = {Cappello, Domenico and Garcin, Samuel and Mao, Z and Sassano, Mario and Paranjape, Aditya and Mylvaganam, Thulasi},
  journal = {IEEE Transactions on Control Systems Technology},
  title   = {A Hybrid Controller for Multi-Agent Collision Avoidance via a Differential Game Formulation},
  year    = {2021},
  volume  = {29},
  number  = {4},
  pages   = {1750-1757},
  doi     = {10.1109/TCST.2020.3005602},
}
- [IROS] GRIT: Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving. Cillian Brewitt, Balint Gyevnar, Samuel Garcin, and 1 more author. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021.
It is important for autonomous vehicles to have the ability to infer the goals of other vehicles (goal recognition), in order to safely interact with other vehicles and predict their future trajectories. This is a difficult problem, especially in urban environments with interactions between many vehicles. Goal recognition methods must be fast to run in real time and make accurate inferences. As autonomous driving is safety-critical, it is important to have methods which are human interpretable and for which safety can be formally verified. Existing goal recognition methods for autonomous vehicles fail to satisfy all four objectives of being fast, accurate, interpretable and verifiable. We propose Goal Recognition with Interpretable Trees (GRIT), a goal recognition system which achieves these objectives. GRIT makes use of decision trees trained on vehicle trajectory data. We evaluate GRIT on two datasets, showing that GRIT achieved fast inference speed and comparable accuracy to two deep learning baselines, a planning-based goal recognition method, and an ablation of GRIT. We show that the learned trees are human interpretable and demonstrate how properties of GRIT can be formally verified using a satisfiability modulo theories (SMT) solver.
@inproceedings{9636279,
  author    = {Brewitt, Cillian and Gyevnar, Balint and Garcin, Samuel and Albrecht, Stefano V.},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  title     = {GRIT: Fast, Interpretable, and Verifiable Goal Recognition with Learned Decision Trees for Autonomous Driving},
  year      = {2021},
  doi       = {10.1109/IROS51168.2021.9636279},
}