Daily AI Digest — 2026-05-11
Published:
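Tensor operations, autograd, nn.Module, optimizers, distributed training (DDP/FSDP), mixed precision (AMP), CUDA
ndarray, broadcasting, matrix operations, linear algebra (SVD/eigenvalues), FFT, data preprocessing for LLMs
DataFrame, data I/O, cleaning, groupby aggregation, merge/join, training-data preprocessing
BaseModel, data validation, serialization, Field constraints, use with LLM structured output
BPE tokenization, encoding_for_model, token counting and cost estimation, cl100k_base/p50k_base encodings
BPE/WordPiece/Unigram algorithms, training custom tokenizers, integration with transformers
PreTrainedModel/AutoModel architecture, from_pretrained loading flow, generate decoding, Trainer training loop, quantization
LoRA/AdaLora/QLoRA mathematical foundations, LoraLayer injection and merging, get_peft_model
SFTTrainer/DPOTrainer/GRPOTrainer, DPO/GRPO/KTO mathematical foundations, alignment training workflow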
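As a rough illustration of the DPO math mentioned in the entry above, here is a minimal pure-Python sketch of the per-example DPO loss. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example; it does not reproduce any trainer's actual implementation, which works on batched tensors.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * (chosen_ratio - rejected_ratio))).

    Inputs are summed sequence log-probabilities (floats) under the policy
    being trained and under the frozen reference model.
    """
    # Implicit rewards: scaled log-ratio of policy to reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written in a form
    # that avoids overflow for large |margin|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # When the policy equals the reference, the margin is 0 and the loss is log(2).
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Raising the chosen log-probability relative to the reference lowers the loss.
    print(dpo_loss(-8.0, -12.0, -10.0, -12.0))
```

The loss only depends on how far the policy has moved from the reference on the chosen response relative to the rejected one, which is why the reference log-probabilities appear in both reward terms.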
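Accelerator, DDP/FSDP/DeepSpeed integration, automatic device_map sharding, mixed precision
load_dataset, batched processing with map, streaming loading, Arrow zero-copy architecture
LLM.int8() mixed-precision decomposition, NF4 quantization (QLoRA), 4-bit/8-bit inference and training
ZeRO-1/2/3 memory analysis, ZeRO-Offload, activation checkpointing, HF Trainer integration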
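The ZeRO memory analysis named in the entry above comes down to simple arithmetic, sketched below. The sketch assumes mixed-precision Adam, where each parameter costs 2 bytes (fp16 weights) + 2 bytes (fp16 gradients) + 12 bytes (fp32 weights, momentum, and variance), and it counts model states only, not activations or buffers. The function name `zero_model_state_bytes` is hypothetical.

```python
def zero_model_state_bytes(num_params, num_gpus, stage):
    """Per-GPU model-state memory in bytes for ZeRO stage 0 (baseline) to 3.

    Assumes mixed-precision Adam: 2 + 2 + 12 bytes per parameter.
    Activations, temporary buffers, and fragmentation are not counted.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    params = 2.0 * num_params   # fp16 weights
    grads = 2.0 * num_params    # fp16 gradients
    optim = 12.0 * num_params   # fp32 weights + Adam momentum + variance
    if stage >= 1:
        optim /= num_gpus       # ZeRO-1 partitions optimizer states
    if stage >= 2:
        grads /= num_gpus       # ZeRO-2 also partitions gradients
    if stage >= 3:
        params /= num_gpus      # ZeRO-3 also partitions parameters
    return params + grads + optim


if __name__ == "__main__":
    # Example: a 7.5B-parameter model on 64 GPUs, reported in GB (1e9 bytes).
    for stage in range(4):
        gb = zero_model_state_bytes(7.5e9, 64, stage) / 1e9
        print(f"ZeRO stage {stage}: {gb:.1f} GB per GPU")
```

Each stage partitions one more component across the data-parallel group, so the per-GPU cost approaches 16/N bytes per parameter at stage 3.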
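Tensor parallelism (Column/Row), pipeline parallelism (1F1B), sequence parallelism, context parallelism (Ring Attention)
wandb.init/log, Sweep hyperparameter search, Artifact versioning, HF integration
Tracking/Models/Registry/Projects, autolog, HuggingFace integration
PagedAttention principles, continuous batching, LLM class/SamplingParams, OpenAI-compatible server
RadixAttention radix-tree KV cache reuse, programming primitives, constrained decoding
Computation-graph optimization, kernel fusion, INT8/FP8 quantization, tensor/pipeline parallelism
GGUF format, Q4_0 to Q6_K quantization, CPU/GPU inference, mmap memory mapping
Llama class, create_completion/chat, GBNF grammar constraints, embedding extraction, OpenAI server
IO-awareness principle, online softmax, O(N²)→O(N) memory, derivation of blockwise computation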
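The online-softmax and blockwise-computation items in the entry above can be illustrated with a small pure-Python sketch for a single query. It processes scores in blocks while keeping only a running maximum, a running denominator, and a running weighted sum, so the full softmax vector is never materialized. The function names and the `block_size` parameter are assumptions for this example; real kernels operate on tiles of matrices in on-chip memory rather than Python lists.

```python
import math


def online_softmax_weighted_sum(scores, values, block_size=2):
    """Compute sum_i softmax(scores)_i * values_i one block at a time.

    Assumes `scores` is non-empty and has the same length as `values`.
    """
    running_max = -math.inf
    denom = 0.0   # running sum of exp(score - running_max)
    acc = 0.0     # running sum of exp(score - running_max) * value
    for start in range(0, len(scores), block_size):
        s_blk = scores[start:start + block_size]
        v_blk = values[start:start + block_size]
        new_max = max(running_max, max(s_blk))
        # Rescale earlier partial results to the new maximum.
        scale = math.exp(running_max - new_max)
        denom = denom * scale + sum(math.exp(s - new_max) for s in s_blk)
        acc = acc * scale + sum(
            math.exp(s - new_max) * v for s, v in zip(s_blk, v_blk)
        )
        running_max = new_max
    return acc / denom


def naive_softmax_weighted_sum(scores, values):
    """Reference version that materializes all softmax weights at once."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


if __name__ == "__main__":
    scores = [0.5, 2.0, -1.0, 3.5, 0.0]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(online_softmax_weighted_sum(scores, values))
    print(naive_softmax_weighted_sum(scores, values))
```

The rescaling by `exp(running_max - new_max)` is what lets each block be folded in without revisiting earlier ones, which is the property that allows attention to avoid storing the full N×N score matrix.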
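memory_efficient_attention, SwiGLU, RoPE rotary position embedding, sparse attention
Routing/Pydantic integration, SSE streaming responses, middleware, LLM inference API serving
Interface/Blocks, Chatbot component, streaming output, HF Spaces deployment
StateGraph, nodes/edges/conditional routing, persistence, human-in-the-loop, multi-agent
Index/Retriever/QueryEngine, RAG pipeline abstractions, Document/Node, agent tool calling
Signature, Module (Predict/ChainOfThought/ReAct), automatic optimization with Optimizer
ConversableAgent, GroupChat, code execution (local/Docker), v0.4 AgentChat
Agent/Task/Crew, Sequential/Hierarchical processes, Memory system
response_model, streaming mode, Mode (TOOLS/JSON/MD_JSON), Pydantic validation
generate.text/choice/regex/json/cfg, FSM-driven sampling, logit biasing principle
CodeAgent/ToolCallingAgent, @tool decorator, HfApiModel, code sandbox
evaluate(), metrics such as Faithfulness/AnswerRelevancy/ContextPrecision, TestsetGenerator
LLMTestCase, metrics such as Hallucination/Bias/Toxicity, GEval general-purpose evaluation
@traceable tracing, evaluate() evaluation, dataset management, trace→span hierarchy
OpenTelemetry tracing, UMAP embedding visualization, evaluators, local-first architecture
PersistentClient, Collection CRUD, embedding functions, metadata filtering, RAG storage
HNSW index, Filter system, Payload indexes, quantization, FastEmbed integration
IVF/PQ/HNSW indexes, hybrid_search, storage-compute separation architecture, MilvusClient
vector/halfvec/sparsevec types, HNSW/IVFFlat indexes, SQL distance operators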
Open-source CLI-first research assistant agent for AI researchers. Tool use, three-layer memory, plan–execute safety, skill self-learning, multi-LLM fallback.
Submitted to EMNLP 2026
The first execution-grounded security benchmark for LLM-based financial agents — 31 regulatory sandbox scenarios, 107 real-world vulnerabilities, 963 test cases. Submitted to EMNLP 2026.
Recommended citation: Zhi Yang, Runguo Li, Qiqi Qiang, Jiashun Wang, Fangqi Lou, et al. (2026). "FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments." arXiv preprint arXiv:2601.07853. Submitted to EMNLP 2026.