Daily Digest — 2026-06-14

Saturday, June 13, 2026 · 11 items · model: deepseek/deepseek-chat

11 items · 11 industry media

🏛️ Research Labs

No new items today.

📜 arXiv Papers

No new items today.

📰 Industry Media (11)

How to Build a QwenPaw Agent Workspace with Custom Skills, Model Providers, Console Access, and Streaming API Testing

MarkTechPost · Sana Hassan · 2026-06-13

The tutorial walks through building a QwenPaw agent workspace with custom skills, multi-provider LLM integration, and streaming API testing. It covers environment configuration (Python 3.10+, port 8088), provider selection (OpenAI, DeepSeek, or Gemini via API keys), and workspace initialization with security controls (tool guarding, skill scanning). It also covers Colab-specific setup for console access, local knowledge integration, and streaming API endpoints. The example configuration uses a 2048-token context setting and a sampling temperature of 0.2.

qwenpaw · colab · agent · api · streaming

Anthropic Disables Claude Fable 5 and Mythos 5 After US Government Order

MarkTechPost · Asif Razzaq · 2026-06-13

Anthropic disabled its frontier models Claude Fable 5 (Mythos-class) and Claude Mythos 5 following a US export control directive citing national security concerns. The models, launched June 9, 2026, employed multi-tier safeguards including classifier-based query routing (5% fallback to Claude Opus 4.8) and 30-day attack detection. Early benchmarks showed Fable 5's capabilities in code migration (50M-line Ruby codebase) and vision tasks (web app reconstruction from screenshots). Anthropic disputes the government's rationale, stating the cited jailbreak technique was narrow, non-universal, and comparable to GPT-5.5's vulnerabilities.

mythos-class · classifier-based routing · jailbreak vulnerability · export control · frontier model

Moonshot AI Releases Kimi K2.7-Code: a Coding Model Reporting +21.8% on Kimi Code Bench v2 Over K2.6

MarkTechPost · Asif Razzaq · 2026-06-13

Moonshot AI introduces Kimi K2.7-Code, a coding-specialized Mixture-of-Experts (MoE) model with 1T total parameters and 32B active per token, optimized for long-horizon software engineering tasks. The model features a 256K-token context window, INT4 quantization, and a MoonViT vision encoder for multimodal input. On Kimi Code Bench v2, Moonshot reports that K2.7-Code scores 62.0 versus 50.9 for its predecessor K2.6, a 21.8% relative improvement. It also reports 30% lower reasoning-token usage, which reduces inference cost. The reported benchmarks are competitive with GPT-5.5 and Claude Opus 4.8, with a standout score of 81.1 on MCP Mark Verified.
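
The headline percentage is a relative gain, not a difference in points. A minimal sketch of the arithmetic, using only the figures quoted in the summary (the helper name `relative_gain` is illustrative):

```python
def relative_gain(new: float, old: float) -> float:
    """Relative improvement of `new` over `old`, as a percentage."""
    return (new - old) / old * 100.0


# Figures quoted in the summary above.
k27_score, k26_score = 62.0, 50.9          # Kimi Code Bench v2 scores
total_params, active_params = 1e12, 32e9   # 1T total, 32B active per token

absolute_gain = k27_score - k26_score
active_share = active_params / total_params * 100.0

print(f"absolute gain: {absolute_gain:.1f} points")                   # 11.1 points
print(f"relative gain: {relative_gain(k27_score, k26_score):.1f}%")   # 21.8%
print(f"active parameter share: {active_share:.1f}%")                 # 3.2%
```

So the 21.8% figure corresponds to an 11.1-point gain, and each token activates roughly 3.2% of the model's total parameters.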

mixture-of-experts · int4 quantization · reasoning-token · context window · multimodal input

A Coding Implementation on Spatial Graph Neural Networks for Urban Function Inference Using city2graph, OSMnx, and PyTorch Geometric

MarkTechPost · Sana Hassan · 2026-06-13

The tutorial presents a spatial graph neural network pipeline for urban function inference, integrating geospatial data processing, graph construction, and GNN-based classification. Using city2graph, OSMnx, and PyTorch Geometric, it collects POI data from OpenStreetMap, engineers spatial features, constructs proximity graphs (KNN, Delaunay, Gabriel, RNG, EMST, Waxman), and builds heterogeneous and homogeneous graph structures. A GraphSAGE model is trained to predict POI categories and is evaluated on test accuracy and macro-F1. The workflow demonstrates reproducible graph-based urban analysis, with a synthetic-data fallback for offline scenarios.
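
Of the proximity graphs listed, KNN is the simplest to illustrate. The sketch below is a plain-Python stand-in and does not use the city2graph, OSMnx, or PyTorch Geometric APIs; the function name `knn_graph` and the coordinates are hypothetical.

```python
import math


def knn_graph(points, k):
    """Return directed edges (i, j) where j is one of the k nearest neighbours of i."""
    edges = []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to p; ties break on index.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


# Hypothetical POI coordinates, assumed to be in a projected (metric) CRS.
pois = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
edges = knn_graph(pois, k=2)
print(edges)
```

The resulting edges are directed: the outlying point at (5, 5) links to its two nearest neighbours, but no other point links back to it. A real pipeline would typically symmetrize the edges or use a spatial index rather than this quadratic scan.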

spatial graph neural networks · urban function inference · graphsage · proximity graphs · heterogeneous graphs

Google Releases Gemini-SQL2: Gemini 3.1 Pro Text-to-SQL Scores 80.04% on BIRD Single-Model Leaderboard

MarkTechPost · Asif Razzaq · 2026-06-12

Google Research introduces Gemini-SQL2, a text-to-SQL system powered by Gemini 3.1 Pro, achieving 80.04% execution accuracy on the BIRD Single-Model Leaderboard. The system translates natural language queries into execution-ready SQL, addressing challenges posed by data subtlety and complex business contexts. The BIRD benchmark evaluates SQL generation across 12,751 question-SQL pairs spanning 95 databases and scores systems on execution-verified accuracy. Gemini-SQL2 outperforms its predecessor Gemini-SQL (76.13%) and other competitors, though it remains 12.92 points below human performance (92.96%). Integration targets include Google Cloud services such as BigQuery Studio, but no API or technical details have been released.
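
Execution accuracy scores a prediction by running it, not by comparing SQL text. The sketch below illustrates the idea with the standard-library `sqlite3` module. It is a simplified stand-in, not the official BIRD evaluator, and the table, queries, and function names are hypothetical.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """True if both queries run and return the same rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold = conn.execute(gold_sql).fetchall()
    return sorted(predicted, key=repr) == sorted(gold, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    return sum(execution_match(conn, p, g) for p, g in pairs) / len(pairs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.0), (3, 40.0)])

pairs = [
    # Different SQL text, same result rows: counted as correct.
    ("SELECT id FROM orders WHERE amount > 20", "SELECT id FROM orders WHERE amount >= 25"),
    # Both run, but the results differ: counted as wrong.
    ("SELECT COUNT(*) FROM orders", "SELECT SUM(amount) FROM orders"),
    # Prediction references a table that does not exist: counted as wrong.
    ("SELECT amount FROM missing_table", "SELECT amount FROM orders"),
]
print(f"{execution_accuracy(conn, pairs):.3f}")  # 0.333
```

The first pair shows why execution-based scoring is more forgiving than string matching: two differently written queries count as equivalent when they return the same rows.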

text-to-sql · execution accuracy · gemini 3.1 pro · bird benchmark · natural language query

Moonshot AI Launches Kimi Work, a Local Desktop Agent Reportedly Running on Kimi K2.6 With a 300-Sub-Agent Agent Swarm

MarkTechPost · Asif Razzaq · 2026-06-12

Moonshot AI introduces Kimi Work, a local desktop agent reportedly running on the Kimi K2.6 Mixture-of-Experts model (32B active parameters, 256K context window) with a 300-sub-agent swarm for parallel task execution. The system has four components: (1) Agent Swarm for task decomposition (up to 4,000 coordinated steps), (2) WebBridge for browser automation using real sessions, (3) a cron scheduling engine for time-based triggers, and (4) local file and code access with write-approval safeguards. Compared with cloud agents, Kimi Work is positioned around native integration with financial data pipelines and document workflows while keeping execution on-device.

mixture-of-experts · agent-swarm · context-window · webbridge · cron-engine

Zyphra Releases Zamba2-VL: Hybrid Mamba2–Transformer Vision-Language Models That Cut Time-to-First-Token by About an Order of Magnitude

MarkTechPost · Asif Razzaq · 2026-06-12

Zyphra introduces Zamba2-VL, a family of open vision-language models (1.2B, 2.7B, 7B parameters) combining Mamba2 state-space layers with shared Transformer blocks for efficient multimodal processing. The hybrid architecture achieves near-linear-time prefill and fixed-size recurrent state, reducing time-to-first-token by approximately 10× compared to Transformer-based VLMs. Evaluated across 14 benchmarks, Zamba2-VL demonstrates strong performance in visual counting (87.5 on CountBenchQA) and document understanding (90.9 DocVQA), though it lags in knowledge-intensive tasks like MMMU (37.7). Models are Apache 2.0 licensed with optimized CUDA kernels for deployment.
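
The "fixed-size recurrent state" claim can be illustrated with a toy comparison. This is not Mamba2 or Zamba2-VL code: the recurrence is a deliberately simplified linear update, and `state_dim` and `decay` are made-up values.

```python
def recurrent_prefill(tokens, state_dim=4, decay=0.9):
    """Toy linear recurrence h_t = decay * h_{t-1} + x_t, applied per state slot."""
    state = [0.0] * state_dim
    for x in tokens:
        state = [decay * h + x for h in state]
    return state  # length is state_dim regardless of sequence length


def attention_prefill(tokens):
    """Toy KV cache: one (key, value) entry is retained per input token."""
    cache = []
    for x in tokens:
        cache.append((x, x))  # stand-ins for key and value vectors
    return cache  # length grows linearly with sequence length


for n in (10, 1000):
    tokens = [1.0] * n
    print(n, len(recurrent_prefill(tokens)), len(attention_prefill(tokens)))
# 10 4 10
# 1000 4 1000
```

The recurrent state stays at four slots for any input length, while the cache grows with every token. The sketch models memory only and says nothing about the attention compute cost during prefill.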

mamba2 · vision-language models · state-space layers · time-to-first-token · kv cache

A Coding Implementation on MONAI for End-to-End 3D Spleen Segmentation Using UNet on Medical CT Volumes

MarkTechPost · Sana Hassan · 2026-06-12

The tutorial presents an end-to-end 3D medical image segmentation pipeline using MONAI for spleen segmentation on the Medical Segmentation Decathlon Task09 dataset. It trains a 3D UNet on volumetric CT scans, applying preprocessing steps such as orientation alignment, voxel-spacing normalization, and intensity windowing. The method uses mixed-precision training, DiceCE loss, and sliding-window inference, validates with the Dice metric, and visualizes predictions against ground truth.
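
The Dice metric used for validation measures overlap between a predicted mask and the ground truth. The sketch below computes it on flattened binary masks in plain Python. MONAI ships its own implementation, which this does not reproduce, and the example masks are made up.

```python
def dice_score(pred, target):
    """Dice = 2 * overlap / (|pred| + |target|) for flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement in this sketch
    return 2.0 * overlap / total


# Hypothetical 8-voxel masks: 3 voxels overlap, each mask has 4 foreground voxels.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
target = [0, 1, 1, 0, 0, 0, 1, 1]
print(dice_score(pred, target))  # 0.75
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground voxels. How to score two empty masks is a convention that varies between implementations.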

3d unet · medical image segmentation · monai · dicece loss · sliding-window inference

Perplexity Moves Deep Research Into Computer, Routing Research Subtasks Across 20+ Frontier Models For Reports, Decks, And Dashboards

MarkTechPost · Michal Sutter · 2026-06-11

Perplexity's Deep Research system now integrates with Computer, a cloud-based multi-model orchestration platform, to enhance research accuracy and output quality. The system decomposes complex queries into subtasks, distributing them across 20+ frontier models (including Opus 4.6 and Gemini) via an agentic workflow. Key innovations include Search as Code, which dynamically generates retrieval scripts, and model-agnostic routing for specialized tasks. Benchmark improvements show BrowseComp accuracy rising from 40.7% to 83.8% and Humanity’s Last Exam performance increasing from 36.4% to 50.5%. The system produces cited deliverables (reports, decks, dashboards) while supporting file integration and developer APIs.

multi-model orchestration · agentic workflow · search as code · frontier models · benchmark improvements

xAI Ships Grok Build Plugin Marketplace With MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare, and Superpowers Plugins at Launch

MarkTechPost · Michal Sutter · 2026-06-11

xAI introduced the Grok Build Plugin Marketplace, a terminal-integrated catalog that lets developers install bundled plugins for Grok Build, xAI's coding agent. Each plugin combines skills, slash commands, agents, hooks, MCP servers, and LSP servers into a single package, streamlining integration workflows. The marketplace launches with six plugins from MongoDB, Vercel, Sentry, Chrome DevTools, Cloudflare, and Superpowers, targeting specific use cases such as query optimization and debugging. Security is enforced via SHA pinning, which verifies plugin integrity at installation. The catalog is open-source and accepts contributions via pull requests, though third-party plugins remain unverified by xAI.
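
The summary does not say whether the pinned SHA is a git commit hash or a content digest. The sketch below assumes a SHA-256 digest over the plugin payload and shows the general shape of a pin check. It is not xAI's implementation, and the function name and payload are hypothetical.

```python
import hashlib
import hmac


def verify_pinned_digest(payload: bytes, pinned_sha256: str) -> bool:
    """Return True only if the payload hashes to the pinned SHA-256 hex digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, pinned_sha256.lower())


plugin_bytes = b"example plugin archive contents"
# In a real catalog the pin would be published alongside the plugin entry.
pin = hashlib.sha256(plugin_bytes).hexdigest()

print(verify_pinned_digest(plugin_bytes, pin))                 # True
print(verify_pinned_digest(plugin_bytes + b"tampered", pin))   # False
```

Pinning guarantees that the installed bytes match what the catalog entry referenced. It does not vouch for what those bytes do, which fits the note that third-party plugins remain unverified.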

plugin marketplace · sha pinning · mcp servers · lsp servers · terminal integration

Coinbase for Agents: Automating portfolio trading with AI

AI News · Ryan Daws · 2026-06-12

Coinbase for Agents introduces a financial execution layer that lets autonomous AI agents manage portfolios through direct API integration. The system offers both a command-line interface (for local development with Claude Code or Codex) and the Model Context Protocol (for web agents such as ChatGPT), supporting spot and derivatives trading with planned expansion to equities and commodities. Agents implement dollar-cost averaging strategies using 30-day historical pricing data, while the x402 protocol enables purchases of external resources. Security features include isolated portfolios and transaction monitoring, with governance controls for transaction limits still to come. The platform supports round-the-clock portfolio rebalancing using limit orders with 5-15% thresholds.
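
Dollar-cost averaging means spending a fixed amount at regular intervals regardless of price. The sketch below shows the arithmetic only and makes no calls to any Coinbase API. The prices, budget, and function name are hypothetical.

```python
def dollar_cost_average(prices, budget_per_buy):
    """Spend a fixed amount at each price; return (units bought, average cost per unit)."""
    units = sum(budget_per_buy / price for price in prices)
    spent = budget_per_buy * len(prices)
    return units, spent / units


# Hypothetical prices at four purchase dates.
prices = [100.0, 80.0, 125.0, 100.0]
units, avg_cost = dollar_cost_average(prices, budget_per_buy=100.0)
mean_price = sum(prices) / len(prices)

print(f"units bought: {units:.2f}")       # 4.05
print(f"average cost: {avg_cost:.2f}")    # 98.77
print(f"mean price:   {mean_price:.2f}")  # 101.25
```

The average cost per unit comes out below the simple mean of the prices because a fixed budget buys more units when the price is low.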

model context protocol · x402 protocol · dollar-cost averaging · limit orders · agentic payment protocol


Generated automatically at 2026-06-13 20:17 UTC. Summaries and keywords are produced by an LLM and may contain inaccuracies — always consult the original article.