Daily Digest — 2026-05-18

Sunday, May 17, 2026 · 4 items · model: deepseek/deepseek-chat

4 items · 4 industry media

🏛️ Research Labs

No new items today.

📜 arXiv Papers

No new items today.

📰 Industry Media (4)

A Coding Implementation to Compress and Benchmark Instruction-Tuned LLMs with FP8, GPTQ, and SmoothQuant Quantization using llmcompressor

MarkTechPost · Sana Hassan · 2026-05-17

This tutorial systematically compares post-training quantization methods for instruction-tuned LLMs using llmcompressor, evaluating FP8 dynamic quantization, GPTQ W4A16, and SmoothQuant with GPTQ W8A8 on the Qwen2.5-0.5B-Instruct model. It benchmarks each variant's disk size, generation speed (tokens/sec), perplexity (WikiText-2), and output quality using a standardized pipeline with 256 UltraChat calibration samples. Results show FP8 dynamic quantization achieves a 1.8× speedup over the FP16 baseline with minimal perplexity degradation (ΔPPL +0.4), while GPTQ W4A16 reduces model size by 4× at a cost of 1.2× higher latency and ΔPPL +1.7.

post-training quantization · instruction-tuned llms · gptq · smoothquant · perplexity benchmarking
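
The ΔPPL figures in the summary above are differences in perplexity between a quantized model and its baseline. As a minimal sketch of that arithmetic, assuming per-token log-probabilities have already been collected from each model (the helper names and the toy numbers below are illustrative and do not come from the article):

```python
import math

def perplexity(token_logprobs):
    """Perplexity is exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def compare(baseline_logprobs, quantized_logprobs):
    """Return (baseline PPL, quantized PPL, delta PPL)."""
    base = perplexity(baseline_logprobs)
    quant = perplexity(quantized_logprobs)
    return base, quant, quant - base

# Toy log-probabilities, NOT measurements from the article.
baseline = [math.log(0.25)] * 4   # every token predicted with p = 0.25
quantized = [math.log(0.20)] * 4  # slightly less confident model
base_ppl, quant_ppl, delta_ppl = compare(baseline, quantized)
print(f"baseline={base_ppl:.2f} quantized={quant_ppl:.2f} delta={delta_ppl:+.2f}")
```

A positive delta means the quantized model is more surprised by the evaluation text than the baseline, which is how the +0.4 and +1.7 figures should be read.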

Vercel Labs Introduces Zero, a Systems Programming Language Designed So AI Agents Can Read, Repair, and Ship Native Programs

MarkTechPost · Michal Sutter · 2026-05-17

Vercel Labs introduces Zero, a systems programming language optimized for AI agent workflows through structured compiler diagnostics and repair hints. Zero emits JSON-formatted errors with stable codes (e.g., NAM003) and typed repair IDs (e.g., 'declare-missing-symbol'), so agents can act on diagnostics without parsing unstructured text. The language enforces capability-based I/O (e.g., a World object for system access) and compiles to sub-10 KiB native binaries. Key toolchain features include 'zero explain' for diagnostic lookup and 'zero fix --plan' for machine-readable repair plans. Early benchmarks show reduced fragility in agent repair loops compared to traditional compiler interfaces.

structured diagnostics · capability-based i/o · agent-native toolchain · typed repair ids · native binaries
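
To illustrate why stable codes and typed repair IDs help an agent, here is a small sketch of dispatching on a structured diagnostic. The JSON field names (code, message, repair, id, symbol) and the handler are hypothetical, since the summary does not give Zero's actual schema; only the code NAM003 and the repair ID 'declare-missing-symbol' are taken from the text above.

```python
import json

# Hypothetical payload: the field layout is an assumption, not Zero's real schema.
RAW = (
    '{"code": "NAM003", "message": "unknown symbol count", '
    '"repair": {"id": "declare-missing-symbol", "symbol": "count"}}'
)

def plan_declare_missing_symbol(repair):
    """Hypothetical handler: describe the fix for an undeclared symbol."""
    return f"declare symbol {repair['symbol']!r} before first use"

# The agent looks up the typed repair ID instead of pattern-matching message text.
HANDLERS = {"declare-missing-symbol": plan_declare_missing_symbol}

def plan_repair(raw):
    """Return (diagnostic code, planned action or None if no handler is known)."""
    diag = json.loads(raw)
    repair = diag.get("repair") or {}
    handler = HANDLERS.get(repair.get("id"))
    if handler is None:
        return diag["code"], None
    return diag["code"], handler(repair)

code, action = plan_repair(RAW)
print(code, "->", action)
```

Because the lookup key is a fixed identifier, rewording the human-readable message would not break this loop, which is the fragility the summary refers to.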

A Coding Guide Implementing SHAP Explainability Workflows with Explainer Comparisons, Maskers, Interactions, Drift, and Black-Box Models

MarkTechPost · Sana Hassan · 2026-05-17

This tutorial implements a comprehensive SHAP workflow for model interpretability, comparing Tree, Exact, Permutation, and Kernel explainers on an XGBoost regression task. It demonstrates correlation-aware attribution through Independent vs Partition maskers, quantifies interaction effects via SHAP interaction values, and contrasts log-odds versus probability space explanations for classification. The method includes Owen values from hierarchical feature clustering, cohort analysis with bootstrap CIs, SHAP-based feature selection, and drift detection via Kolmogorov-Smirnov tests on attribution distributions.

shap · xgboost · maskers · interaction values · owen values
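
The drift check mentioned above compares the distribution of a feature's SHAP attributions between two datasets. A dependency-free sketch of the two-sample Kolmogorov-Smirnov statistic follows; the attribution values and the 0.5 threshold are made up for illustration, and the tutorial may well rely on a library routine such as scipy.stats.ks_2samp, which also returns a p-value.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref, cur = sorted(reference), sorted(current)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so check each one.
    for value in ref + cur:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_cur))
    return largest_gap

# Illustrative attributions for one feature, NOT data from the tutorial.
train_attr = [0.10, 0.12, 0.15, 0.20]
live_attr = [0.30, 0.35, 0.40, 0.45]
d_stat = ks_statistic(train_attr, live_attr)
drifted = d_stat > 0.5  # hypothetical threshold chosen for this sketch
print(f"D={d_stat:.2f} drifted={drifted}")
```

A statistic near 0 means the two attribution distributions overlap almost completely, while a value near 1 means they barely overlap.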

Nous Research Proposes Lighthouse Attention: A Training-Only Selection-Based Hierarchical Attention That Delivers 1.4–1.7× Pretraining Speedup at Long Context

MarkTechPost · Asif Razzaq · 2026-05-16

Nous Research introduces Lighthouse Attention, a training-only hierarchical attention mechanism that accelerates pretraining on long sequences by 1.4–1.7×. The method symmetrically pools queries, keys, and values across a multi-level pyramid, selects top-K entries via a parameter-free scorer, and processes them with standard FlashAttention. Benchmarks show 21× faster forward passes and 17.3× faster forward+backward passes at 512K context, with matching or lower final loss compared to dense SDPA baselines.

lighthouse attention · hierarchical attention · flashattention · pretraining speedup · long-context
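
The pool-then-select idea can be sketched in plain Python. This is not the Lighthouse Attention implementation: it uses a single pooling level instead of a pyramid, assumes a dot-product scorer (the summary only says the scorer is parameter-free), leaves the query unpooled, and replaces FlashAttention with an ordinary softmax attention. All function names are illustrative.

```python
import math

def mean_pool(vectors, block):
    """Average consecutive groups of `block` vectors (the last may be shorter)."""
    pooled = []
    for start in range(0, len(vectors), block):
        chunk = vectors[start:start + block]
        dim = len(chunk[0])
        pooled.append([sum(v[i] for v in chunk) / len(chunk) for i in range(dim)])
    return pooled

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_blocks(query, pooled_keys, k):
    """Assumed scorer: rank pooled keys by dot product with the query."""
    scores = [dot(query, key) for key in pooled_keys]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

def attend(query, keys, values):
    """Plain scaled dot-product attention over the selected entries."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, key) * scale for key in keys]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

# Toy sequence of 8 positions pooled into 4 blocks of 2.
keys = [[1, 0], [1, 0], [0, 1], [0, 1], [-1, 0], [-1, 0], [2, 0], [2, 0]]
values = [[1, 0], [1, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 1], [0, 1]]
query = [1, 0]

pooled_keys = mean_pool(keys, block=2)
pooled_values = mean_pool(values, block=2)
selected = top_k_blocks(query, pooled_keys, k=2)
out = attend(query,
             [pooled_keys[i] for i in selected],
             [pooled_values[i] for i in selected])
print(selected, out)
```

Attention cost here scales with the number of selected blocks rather than with the full sequence length, which is the general source of the savings the summary reports.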


Generated automatically at 2026-05-17 20:04 UTC. Summaries and keywords are produced by an LLM and may contain inaccuracies — always consult the original article.