Abstract
Large language model (LLM) agents have shown promising capabilities in collaborative decision-making, yet their performance often degrades under dynamically changing task constraints. This study proposes an adaptive policy alignment framework that integrates proximal policy optimization with prompt-conditioned policy modulation to enhance coordination among multiple LLM agents. The framework is evaluated on a synthetic decision-making benchmark comprising 12,000 task instances across three environments: resource allocation, negotiation, and sequential planning. Experimental results show that the proposed method improves the task success rate from 71.3% to 84.6% and reduces decision latency by 23.5% compared with static prompt-based agents. Furthermore, inter-agent consistency, measured via action divergence entropy, improves by 31%, indicating greater collaborative coherence. These findings demonstrate that reinforcement learning-based alignment substantially enhances the adaptability and efficiency of LLM-driven multi-agent systems.
