FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
Published in Under review at ACL 2026, 2026
FinVault is a benchmark designed to evaluate the safety of financial AI agents in execution-grounded environments — settings in which an agent does not merely produce text answers, but actually executes actions (queries, trades, transfers, account look-ups) inside a simulated financial system.
The benchmark stresses agents along multiple safety dimensions, including:
- compliance with financial regulations,
- robustness to prompt-injection and jailbreak attacks,
- correctness under partial / inconsistent information,
- recovery from execution errors.
Status: under review at ACL 2026 (first co-author).
Recommended citation: Runguo Li*, et al. (2026). "FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments." Under review at ACL 2026. (* First Co-Author)
