Posts by Collection

daily

llm-libs

NumPy Scientific Computing Library

ndarray, broadcasting, matrix operations, linear algebra (SVD/eigenvalues), FFT, and data preprocessing for LLMs

tiktoken Tokenization Library

BPE tokenization, encoding_for_model, token counting and cost estimation, and the cl100k_base/p50k_base encodings

portfolio

ARH — AI Research Helper

An open-source, CLI-first research assistant agent for AI researchers, featuring tool use, three-layer memory, plan–execute safety, skill self-learning, and multi-LLM fallback.

publications

FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments

Submitted to EMNLP 2026

The first execution-grounded security benchmark for LLM-based financial agents, comprising 31 regulatory sandbox scenarios, 107 real-world vulnerabilities, and 963 test cases.

Recommended citation: Zhi Yang, Runguo Li, Qiqi Qiang, Jiashun Wang, Fangqi Lou, et al. (2026). "FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments." arXiv preprint arXiv:2601.07853. Submitted to EMNLP 2026.

talks

teaching