Abstract
Enterprise web agents face safety-critical constraints where certain action sequences can cause irreversible harm. We propose a shielded agentic RL framework that integrates a lightweight action shield with constrained policy learning. The shield uses (i) a learned classifier over DOM/state features to detect proximity to a predefined failure set (e.g., destructive actions, payment confirmation), and (ii) a rule+model hybrid verifier that blocks or rewrites actions when the predicted failure probability exceeds a budget-aware threshold. The underlying policy is optimized via constrained RL to maximize task success while minimizing usage across multiple cost dimensions, with the shield providing safety guarantees during both training and deployment. We recommend evaluation on safety-focused benchmarks (e.g., enterprise web agent safety suites) with 500–1,500 tasks, measuring prevented unsafe actions, residual failure rate, added latency overhead, and success under strict cost caps. The proposed design emphasizes auditability: the shield logs blocked actions and provides human-readable rationales, enabling compliance-oriented deployment.
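The shield's block/rewrite decision can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function name `shield`, the linear budget-aware tightening of the threshold, and the specific rewrite band are all assumptions made for the sketch; in practice the failure probability would come from the learned classifier over DOM/state features.

```python
from dataclasses import dataclass

@dataclass
class ShieldDecision:
    verdict: str     # "allow", "rewrite", or "block"
    rationale: str   # human-readable log entry, supporting auditability

def shield(p_fail: float, cost_used: float, cost_cap: float,
           base_threshold: float = 0.5) -> ShieldDecision:
    """Illustrative budget-aware shield (assumed form, not the paper's exact rule).

    As the remaining cost budget shrinks, the allowed failure probability
    tightens proportionally, so the agent takes fewer risks near the cap.
    """
    remaining_frac = max(cost_cap - cost_used, 0.0) / cost_cap
    threshold = base_threshold * remaining_frac  # tighter near budget exhaustion
    if p_fail <= threshold:
        return ShieldDecision(
            "allow", f"p_fail={p_fail:.2f} <= threshold={threshold:.2f}")
    if p_fail < 2 * threshold:
        # Moderate risk: attempt a safer rewrite of the action (assumed band)
        return ShieldDecision(
            "rewrite", f"moderate risk: p_fail={p_fail:.2f}, threshold={threshold:.2f}")
    return ShieldDecision(
        "block", f"p_fail={p_fail:.2f} exceeds 2x threshold={threshold:.2f}")
```

For example, the same predicted failure probability that triggers a rewrite early in an episode can trigger a hard block once most of the cost budget is spent, and every decision carries a rationale string suitable for compliance logging.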

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2026 Daniel Thompson, Emily Chen, Michael R. Brown (Author)