Abstract
Large language model (LLM) agents have shown promising capabilities in collaborative decision-making, yet their performance often degrades under dynamically changing task constraints. This study proposes an adaptive policy alignment framework that integrates proximal policy optimization with prompt-conditioned policy modulation to enhance coordination among multiple LLM agents. The framework is evaluated on a synthetic decision-making benchmark comprising 12,000 task instances across three environments: resource allocation, negotiation, and sequential planning. Experimental results show that the proposed method improves the task success rate from 71.3% to 84.6% and reduces decision latency by 23.5% compared with static prompt-based agents. Furthermore, inter-agent consistency, measured via action divergence entropy, improves by 31%, indicating greater collaborative coherence. These findings demonstrate that reinforcement learning-based alignment substantially enhances the adaptability and efficiency of LLM-driven multi-agent systems.
