Shielded Agentic RL with Formal Failure-Set Constraints for Safe Web Actions under Budget Limits

Keywords

Safe reinforcement learning
action shielding
formal constraints
web agents
enterprise automation
failure set
auditability
budget constraints

Abstract

Enterprise web agents face safety-critical constraints where certain action sequences can cause irreversible harm. We propose a shielded agentic RL framework that integrates a lightweight action shield with constrained policy learning. The shield uses (i) a learned classifier over DOM/state features to detect proximity to a predefined failure set (e.g., destructive actions, payment confirmation), and (ii) a rule+model hybrid verifier that blocks or rewrites actions when the predicted failure probability exceeds a budget-aware threshold. The underlying policy is optimized via constrained RL to minimize multi-cost usage while maximizing task success, with the shield providing safety guarantees during both training and deployment. We recommend evaluation on safety-focused benchmarks (e.g., enterprise web agent safety suites) with 500–1,500 tasks, measuring prevented unsafe actions, residual failure rate, added latency overhead, and success under strict cost caps. The proposed design emphasizes auditability: the shield logs blocked actions and provides human-readable rationales, enabling compliance-oriented deployment.
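The shielding loop described above can be illustrated with a minimal sketch. All names, thresholds, and the budget-tightening rule here are our own illustrative assumptions, not the authors' implementation: a classifier (stubbed as an input probability) scores each proposed action, a budget-aware threshold decides whether to block, and every decision is logged with a human-readable rationale for auditing.

```python
# Illustrative sketch of a budget-aware action shield (all names and
# thresholds are assumptions, not the paper's actual implementation).
from dataclasses import dataclass, field


@dataclass
class ShieldDecision:
    allowed: bool
    rationale: str


@dataclass
class ActionShield:
    base_threshold: float = 0.2            # assumed failure-probability cap
    audit_log: list = field(default_factory=list)

    def threshold(self, budget_remaining: float) -> float:
        # Budget-aware tightening: as the remaining budget (in [0, 1])
        # shrinks, tolerate proportionally less predicted risk.
        return self.base_threshold * max(budget_remaining, 0.0)

    def check(self, action: str, p_fail: float,
              budget_remaining: float) -> ShieldDecision:
        # p_fail would come from the learned DOM/state classifier.
        t = self.threshold(budget_remaining)
        if p_fail > t:
            d = ShieldDecision(False, f"blocked {action}: "
                                      f"p_fail={p_fail:.2f} > threshold={t:.2f}")
        else:
            d = ShieldDecision(True, f"allowed {action}: "
                                     f"p_fail={p_fail:.2f} <= threshold={t:.2f}")
        self.audit_log.append(d.rationale)  # human-readable audit trail
        return d


shield = ActionShield()
risky = shield.check("click_delete_all", p_fail=0.9, budget_remaining=1.0)
safe = shield.check("scroll_down", p_fail=0.05, budget_remaining=1.0)
print(risky.allowed, safe.allowed)  # False True
```

In a full system, the blocked action would be rewritten or escalated rather than simply dropped, and the rationale strings would feed the compliance log the abstract describes.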


This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright (c) 2026 Daniel Thompson, Emily Chen, Michael R. Brown (Authors)