Daily Digest — 2026-05-12

Monday, May 11, 2026 · 112 items (80 research labs, 32 industry media) · model: deepseek/deepseek-chat

🏛️ Research Labs (80)

How enterprises are scaling AI

OpenAI News · 2026-05-11

Enterprise AI scaling requires cultural and operational transformation rather than mere technical deployment, as evidenced by interviews with executives from Philips, BBVA, Mirakl, Scout24, JetBrains, and Scania. Key findings emphasize five principles: prioritizing cultural adoption before tooling, leveraging governance as an enabler, fostering workflow ownership over passive consumption, ensuring quality before scaling, and protecting expert judgment through hybrid workflows. Organizations achieving sustained impact embed AI in end-to-end workflows with human oversight, focusing on trust, ownership, and quality from inception. The study provides a leadership diagnostic, case metrics, and a practical checklist for responsible AI scaling.

governance · workflow · hybrid · scaling · ownership

OpenAI Campus Network: Student club interest form

OpenAI News · 2026-05-11

OpenAI launches the OpenAI Campus Network initiative to foster AI-native campus ecosystems by partnering with student clubs globally. The program aims to facilitate hands-on AI learning through student-led events, workshops, and research projects. Participants gain early access to AI tools, programs, and opportunities while connecting with peers shaping the future of learning and work. The initiative targets student leaders organizing events, building projects, or managing communities, offering collaboration opportunities to advance AI education and innovation on campuses.

ai-native · student-led · workshops · research · collaboration

OpenAI launches DeployCo to help businesses build around intelligence

OpenAI News · 2026-05-11

OpenAI launches DeployCo, a standalone business unit focused on enterprise AI deployment, backed by $4B initial investment and partnerships with 19 global firms. DeployCo employs Forward Deployed Engineers (FDEs) to integrate OpenAI models into organizational workflows, leveraging expertise from acquired firm Tomoro and its 150 engineers. FDEs conduct diagnostics, prioritize workflows, and build production systems connecting OpenAI models to enterprise data and processes. The initiative aims to accelerate AI adoption, operationalize frontier AI capabilities, and drive measurable business impact across industries.

forward deployed engineers · enterprise ai · production systems · frontier ai · operational impact

Running Codex safely at OpenAI

OpenAI News · 2026-05-08

OpenAI presents a safety framework for deploying Codex as an autonomous coding agent, emphasizing controlled execution environments and granular telemetry. The method combines sandboxing (constrained file/network access), approval policies (human/AI-mediated risk assessment), and agent-native logging (OpenTelemetry integration). Results include reduced friction for low-risk actions (auto-approved via subagent), enterprise-grade visibility (Compliance Logs Platform), and operational insights (SIEM integration), while maintaining security boundaries through managed configurations and credential pinning.

sandboxing · opentelemetry · auto-review · mcp oauth · compliance api
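The approval-policy layer described above can be sketched as a simple risk gate with agent-native logging. The action names, risk rules, and log format below are illustrative assumptions for the sketch, not OpenAI's actual Codex policy.

```python
LOW_RISK = {"read_file", "list_dir", "run_tests"}
HIGH_RISK = {"network_request", "write_outside_workspace", "install_package"}

def assess(action: str) -> str:
    """Map an agent action to an approval decision."""
    if action in LOW_RISK:
        return "auto_approve"        # low-risk: no human in the loop
    if action in HIGH_RISK:
        return "needs_review"        # escalate to a human/AI reviewer
    return "deny"                    # unknown actions fail closed

def handle(action: str, target: str, log: list) -> bool:
    decision = assess(action)
    # Agent-native logging: every decision is recorded for audit/SIEM export.
    log.append({"action": action, "target": target, "decision": decision})
    return decision == "auto_approve"

audit_log = []
allowed = handle("read_file", "src/main.py", audit_log)
blocked = handle("network_request", "example.com", audit_log)
```

The fail-closed default for unknown actions mirrors the post's emphasis on maintaining security boundaries while reducing friction only for known low-risk operations.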

Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber

OpenAI News · 2026-05-07

OpenAI introduces GPT-5.5 and GPT-5.5-Cyber, specialized models for cybersecurity workflows, leveraging Trusted Access for Cyber (TAC) to enhance defensive capabilities. TAC employs an identity-based framework to enable verified defenders to perform tasks like vulnerability identification, malware analysis, and patch validation while restricting malicious activities. GPT-5.5-Cyber, in limited preview, supports advanced workflows such as red teaming and penetration testing under stricter safeguards. Initial evaluations indicate GPT-5.5 remains the primary model for most defensive tasks, with GPT-5.5-Cyber tailored for specialized use cases. Partnerships with security vendors aim to accelerate vulnerability research, detection, and remediation, fostering a security flywheel across the ecosystem.

trusted access for cyber · vulnerability identification · malware analysis · patch validation · security flywheel

Parloa builds service agents customers want to talk to

OpenAI News · 2026-05-07

Parloa developed an AI Agent Management Platform (AMP) leveraging OpenAI models (GPT-4.1, GPT-5.4) to automate enterprise customer service interactions. AMP enables non-technical users to configure agents via natural language instructions, simulating and evaluating conversations using LLM-as-a-judge scoring and deterministic checks. The platform optimizes for low-latency, multilingual voice pipelines, achieving 80% reduction in human agent escalations in a global travel deployment. Parloa's evaluation-first approach ensures reliability across millions of interactions by benchmarking models against real production scenarios rather than abstract metrics.

ai agent management platform · llm-as-a-judge · deterministic checks · low-latency pipeline · multilingual voice

Advancing voice intelligence with new models in the API

OpenAI News · 2026-05-07

OpenAI introduces three real-time audio models—GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper—to enhance voice-based applications. GPT-Realtime-2 leverages GPT-5-class reasoning for natural conversation flow, tool integration, and context management, supporting a 128K token context window and adjustable reasoning levels. GPT-Realtime-Translate enables live translation across 70+ input and 13 output languages, while GPT-Realtime-Whisper provides low-latency speech-to-text transcription. Evaluations show GPT-Realtime-2 achieves a 15.2% improvement on Big Bench Audio and 13.8% on Audio MultiChallenge, with Zillow reporting a 26-point increase in call success rates. These models target voice-to-action, systems-to-voice, and voice-to-voice use cases.

gpt-realtime-2 · 128k context · live translation · speech-to-text · tool-calling

Testing ads in ChatGPT

OpenAI News · 2026-05-07

OpenAI is expanding its ChatGPT advertising pilot to the UK, Mexico, Brazil, Japan, and South Korea following initial US tests. Ads appear in Free and Go tiers, with strict privacy controls ensuring advertiser isolation from chat data. The system matches ads to conversation topics using contextual signals while excluding sensitive subjects. Early metrics indicate preserved trust (no consumer trust impact) and low dismissal rates (exact figures unspecified). Advertising revenue aims to subsidize infrastructure costs for free-tier accessibility. Ad relevance improves via feedback loops, with user controls including opt-out options and ad personalization management.

contextual advertising · privacy-preserving · feedback loops · infrastructure subsidization · opt-out mechanisms

Introducing Trusted Contact in ChatGPT

OpenAI News · 2026-05-07

OpenAI introduces Trusted Contact, an optional safety feature in ChatGPT designed to support users in distress. The feature allows adults to nominate a trusted individual who is notified if automated systems and trained reviewers detect discussions of self-harm indicating serious safety concerns. Notifications undergo human review within one hour and include general context without chat details to preserve privacy. Developed with input from 260+ physicians and 170+ mental health experts, Trusted Contact complements existing safeguards like crisis hotline referrals and refusal of harmful requests. The feature aims to enhance social connection, a key protective factor against suicide risk, while maintaining user autonomy.

chatgpt · self-harm detection · human review · social connection · crisis intervention

Simplex rethinks software development with Codex

OpenAI News · 2026-05-07

Simplex demonstrates the organizational impact of integrating Codex and ChatGPT Enterprise into software development workflows, achieving significant productivity gains. By adopting Codex as its primary coding agent, Simplex redesigned development processes to delegate multi-step tasks to AI, including code generation, design interpretation, and testing. Quantitative results show reductions in development time: 70% fewer hours per screen, 40% fewer hours for design, and 17% fewer hours for integration testing. The approach emphasizes separating AI execution from human accountability, enabling senior expertise to scale across projects. Simplex's operational model highlights the importance of quantitative validation, governance, and defining AI's role in workflow redesign.

codex · chatgpt enterprise · integration testing · code generation · ai-native development

How ChatGPT learns about the world while protecting privacy

OpenAI News · 2026-05-06

OpenAI enhances ChatGPT's capabilities through diverse data training while prioritizing privacy safeguards. The model leverages publicly available information, partnerships, and user-generated content, processed through OpenAI Privacy Filter to minimize personal data inclusion. Evaluations show Privacy Filter outperforms comparable tools in removing personal information. Users control data usage via settings, including disabling 'Improve the model for everyone' and utilizing Temporary Chat, which retains conversations for 30 days before deletion. Memory features allow optional retention of user-specific information, with full control over data export and deletion.

privacy filter · temporary chat · memory features · data controls · model training

Introducing ChatGPT Futures: Class of 2026

OpenAI News · 2026-05-06

OpenAI introduces the ChatGPT Futures Class of 2026, recognizing 26 students leveraging AI for impactful projects across education, accessibility, and research. The program highlights how AI lowers barriers to innovation by enabling rapid prototyping and independent skill acquisition. Honorees receive $10,000 grants and access to frontier models, demonstrating AI's role in amplifying human agency. The initiative underscores the need for educational systems to foster adaptable thinkers who can navigate ambiguity and turn ideas into action. Results show students using AI not to replace effort but to expand creative and problem-solving capacities.

frontier models · rapid prototyping · ai literacy · adaptive learning · in-context learning

Uber uses OpenAI to help people earn smarter and book faster

OpenAI News · 2026-05-06

Uber integrates OpenAI's large language models to enhance its real-time marketplace operations, deploying AI-powered assistants and voice features across its platform. The Uber Assistant leverages multi-agent architectures and specialized models to provide drivers with real-time earnings optimization and marketplace insights, reducing cognitive overhead and accelerating onboarding. Voice experiences powered by OpenAI's Realtime API enable natural language interactions for riders, improving accessibility and reducing friction. Initial results show strong repeat engagement among drivers and faster product iteration cycles. The system prioritizes safety, trust, and low latency, employing internal governance layers like AI Guard to ensure policy compliance and reduce hallucinations.

multi-agent architecture · realtime api · cognitive overhead · ai guard · marketplace insights

Singular Bank helps bankers move fast with ChatGPT and Codex

OpenAI News · 2026-05-06

Singular Bank developed Singularity, an internal assistant leveraging ChatGPT and Codex, to optimize private banking workflows. The system automates portfolio analysis, meeting preparation, and client communication, reducing manual data reconciliation and preparation time. Bankers save 60–90 minutes daily, with meeting prep reduced from ~20 minutes to under 1 minute. Singularity flags portfolio risks, recommends actions, and generates compliant follow-up communications, improving traceability and regulatory reporting. Over 30 days, bankers executed 3,500 operations across 19 workflows, enhancing decision-making speed and consistency while maintaining regulatory standards.

chatgpt · codex · portfolio analysis · regulatory reporting · workflow automation

How frontier firms are pulling ahead

OpenAI News · 2026-05-06

OpenAI introduces B2B Signals, a business extension analyzing enterprise AI adoption patterns based on aggregated usage data from OpenAI products. The study identifies frontier firms—those at the 95th percentile of AI usage—as leveraging 3.5x more intelligence per worker compared to typical firms, driven by deeper, more complex AI integration rather than message volume alone. Frontier firms exhibit a 16x higher usage of Codex, indicating advanced adoption of agentic workflows. Key findings include AI's role in production workflows, function-specific usage growth, and delegation of complex tasks. The report emphasizes measuring depth, governance, and enablement for scaling AI impact.

agentic workflows · codex · b2b signals · frontier firms · production workflows

Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)

OpenAI News · 2026-05-05

OpenAI introduces Multipath Reliable Connection (MRC), a novel networking protocol enhancing GPU cluster performance and resilience for large-scale AI model training. MRC employs multi-plane topology, adaptive packet spraying across hundreds of paths, and SRv6-based static source routing to mitigate network congestion and failures. Deployed across NVIDIA GB200 supercomputers, MRC reduces recovery times from seconds to microseconds during link failures, enabling uninterrupted synchronous pretraining. The protocol achieves this while maintaining lower costs and power consumption compared to conventional designs. MRC's specification is released via the Open Compute Project, facilitating broader industry adoption.

multipath reliable connection · gpu clustering · adaptive packet spraying · sr-based routing · network resilience
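The packet-spraying-with-failover idea can be illustrated with a toy path selector: packets are spread across parallel paths, and traffic shifts off a failed path on the next selection. Path counts and the failover rule are illustrative stand-ins; MRC's actual SRv6-based mechanism is defined in its Open Compute Project specification.

```python
def spray(packet_id: int, paths: list, healthy: set) -> int:
    """Pick a path for a packet round-robin style, skipping unhealthy paths."""
    n = len(paths)
    for offset in range(n):
        candidate = paths[(packet_id + offset) % n]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy path available")

paths = list(range(8))        # stand-in for hundreds of parallel paths
healthy = set(paths)
before = [spray(i, paths, healthy) for i in range(8)]

healthy.discard(3)            # simulate a link failure on path 3
after = [spray(i, paths, healthy) for i in range(8)]
```

Because the failure check happens per packet rather than per flow, recovery is bounded by one selection step, which is the property the microsecond-recovery claim depends on.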

GPT-5.5 Instant: smarter, clearer, and more personalized

OpenAI News · 2026-05-05

OpenAI introduces GPT-5.5 Instant, an upgraded default model for ChatGPT, enhancing accuracy, clarity, and personalization. The model reduces hallucinated claims by 52.5% and inaccurate claims by 37.3% in high-stakes domains like medicine, law, and finance, based on internal evaluations. Improvements include better visual reasoning, STEM question answering, and context-aware responses leveraging past chats, files, and connected Gmail. Memory sources provide transparency into personalized responses, allowing users to manage referenced context. GPT-5.5 Instant replaces GPT-5.3 Instant as the default model, with enhanced personalization rolling out to Plus and Pro users initially.

hallucinated claims · memory sources · visual reasoning · context-aware · personalization

GPT-5.5 Instant System Card

OpenAI News · 2026-05-05

OpenAI introduces GPT-5.5 Instant, a high-capability instant model with enhanced safety mitigations in cybersecurity and biological & chemical preparedness domains. The model builds on the safety framework of previous instant models but marks the first instance classified as 'High capability' in these critical categories. GPT-5.5 Instant is distinct from GPT-5.5 Thinking, with GPT-5.3 Instant serving as the primary baseline for comparison. The system card emphasizes the implementation of appropriate safeguards to address potential risks associated with its advanced capabilities.

gpt-5.5-instant · safety mitigations · cybersecurity · biological preparedness · baseline

Advancing youth safety and wellbeing in EMEA

OpenAI News · 2026-05-05

OpenAI introduces the European Youth Safety Blueprint and EMEA Youth & Wellbeing Grant recipients, targeting youth AI safety through policy and grassroots initiatives. The Blueprint outlines five pillars: responsible AI in education, age-appropriate safeguards, under-18 risk mitigation, anti-manipulation protections, and parental control standards. Twelve NGOs across EMEA received €500,000 in grants for projects spanning AI literacy, mental health support, and age-assurance research. Initiatives include chatbot-based crisis services (Mental Health Innovations), AI e-tutors for remote communities (Luma), and trafficking survivor tools (Open Source Association). OpenAI collaborates with policymakers and the Beneficial AI for Children coalition to operationalize protections.

age-appropriate safeguards · ai literacy · age-assurance systems · parental controls · youth wellbeing

New ways to buy ChatGPT ads

OpenAI News · 2026-05-05

OpenAI introduces expanded advertising capabilities for ChatGPT, including a beta self-serve Ads Manager and cost-per-click (CPC) bidding, to enhance advertiser participation and campaign management. The Ads Manager allows U.S.-based advertisers to directly purchase ads, set budgets, and monitor performance, while CPC bidding aligns spending with user actions. OpenAI collaborates with agency partners (e.g., Dentsu, Omnicom) and integrates measurement tools like Conversions API and pixel-based tracking to provide aggregated performance insights without compromising user privacy. These updates aim to improve ad relevance and optimization, maintaining ChatGPT’s independence and user control.

chatgpt · cost-per-click · ads manager · conversions api · pixel-based

OpenAI and PwC collaborate to reimagine the office of the CFO

OpenAI News · 2026-05-04

OpenAI and PwC collaborate to develop AI agents for financial workflows, focusing on practical deployment rather than theoretical design. The partnership leverages OpenAI's models (ChatGPT, Codex) and PwC's implementation expertise to automate processes like procurement, contract review, and forecasting, with human oversight. Initial results include 5x contract processing efficiency and handling 200+ investor interactions via IR-GPT. The approach emphasizes governance, runtime controls, and token consumption monitoring for enterprise-scale adoption.

ai agents · codex · agentic workflows · runtime controls · token consumption

How OpenAI delivers low-latency voice AI at scale

OpenAI News · 2026-05-04

OpenAI introduces a novel split relay plus transceiver architecture to achieve low-latency voice AI at scale, addressing challenges in WebRTC session management and global routing. The architecture decouples packet routing from protocol termination, using a lightweight UDP relay for media ingress and a stateful transceiver for WebRTC session state. This design reduces public UDP port exposure, enhances scalability, and minimizes first-hop latency through geographically distributed Global Relay ingress points. The system supports over 900 million weekly active users, ensuring fast connection setup and stable media round-trip times for conversational AI applications.

webrtc · transceiver · ice · dtls · latency

Introducing Advanced Account Security

OpenAI News · 2026-04-30

OpenAI introduces Advanced Account Security, an opt-in feature enhancing protection for ChatGPT and Codex accounts against unauthorized access. The feature integrates multiple security measures: phishing-resistant authentication via passkeys or physical security keys, restricted account recovery methods, shortened session durations, and automatic exclusion of sensitive conversations from model training. Users gain increased visibility into account activity and session management. A partnership with Yubico offers discounted hardware security keys, while FIDO-compliant alternatives remain available. The initiative aims to safeguard high-risk users, including journalists and researchers, and aligns with OpenAI's broader cybersecurity strategy. Enrollment begins immediately, with mandatory adoption for Trusted Access for Cyber members by June 2026.

passkeys · phishing-resistant · session management · fido-compliant · model training

Where the goblins came from

OpenAI News · 2026-04-29

OpenAI identified an emergent behavior in GPT-5.1+ models involving increased metaphorical references to goblins and gremlins, traced to reinforcement learning (RL) rewards for the 'Nerdy' personality customization feature. Analysis revealed a 175% increase in 'goblin' mentions post-GPT-5.1, with 66.7% of such references originating from the 'Nerdy' personality (2.5% of total responses). RL audits showed a 76.2% uplift in creature-word outputs under 'Nerdy' rewards, which subsequently generalized across the model. OpenAI mitigated this by retiring 'Nerdy', removing goblin-affine rewards, and filtering training data, though GPT-5.5 retained some affinity due to prior training. This case highlights unexpected RL generalization effects and the importance of behavioral auditing tools.

reinforcement learning · personality customization · behavioral auditing · generalization · model outputs
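A back-of-envelope check on the reported figures: if the 'Nerdy' personality produced 2.5% of responses but 66.7% of 'goblin' mentions, its per-response goblin rate was elevated roughly 78x over the rest of the traffic. This lift is derived here from the summary's two percentages, not stated in the source.

```python
# Fractions taken from the summary's reported figures.
share_of_mentions = 0.667   # goblin mentions attributed to 'Nerdy'
share_of_traffic = 0.025    # responses generated under 'Nerdy'

# Relative per-response mention rates for 'Nerdy' vs. everything else.
nerdy_rate = share_of_mentions / share_of_traffic
other_rate = (1 - share_of_mentions) / (1 - share_of_traffic)
lift = nerdy_rate / other_rate   # ~78x elevated goblin rate under 'Nerdy'
```

A lift this large under a single reward channel is consistent with the post's point that the behavior then generalized beyond the personality that caused it.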

Building the compute infrastructure for the Intelligence Age

OpenAI News · 2026-04-29

OpenAI's Stargate initiative aims to address accelerating AI compute demand by scaling infrastructure through ecosystem partnerships. The project leverages collaborations with cloud providers, chipmakers, and local communities to deploy AI-optimized data centers, prioritizing responsible resource use and community benefits. With 3GW added in 90 days, Stargate has already surpassed, ahead of schedule, its initial target of 10GW of US infrastructure by 2029. The Abilene, Texas site exemplifies this approach, utilizing closed-loop cooling and NVIDIA GB200 systems to train GPT‑5.5. Compute expansion enables stronger model training, improved serving efficiency, and broader AI accessibility.

compute infrastructure · closed-loop cooling · nvidia gb200 · gpt‑5.5 · ecosystem partnerships

Cybersecurity in the Intelligence Age

OpenAI News · 2026-04-29

OpenAI proposes a five-pillar action plan to democratize AI-powered cyber defense, addressing the dual-use nature of AI in cybersecurity. The plan emphasizes infrastructure development to support defenders, coordination across government and industry, and enhanced security around frontier capabilities. It aims to preserve deployment visibility and control while empowering users to protect themselves. The initiative builds on OpenAI's commitment to leveraging democratic institutions and broadening access to defensive technologies, targeting resilience in critical systems and national security. The framework was informed by consultations with cybersecurity and national security experts across federal, state, and commercial sectors.

ai-powered cyber defense · democratizing access · frontier capabilities · deployment visibility · critical systems

OpenAI models, Codex, and Managed Agents come to AWS

OpenAI News · 2026-04-28

OpenAI and AWS have expanded their partnership to integrate OpenAI's frontier models, Codex, and Managed Agents into AWS environments, enabling enterprise-scale AI deployment within existing infrastructure. The collaboration offers three key capabilities: OpenAI models (including GPT-5.5) on Amazon Bedrock, Codex as a coding harness powered by Bedrock, and Bedrock Managed Agents for multi-step workflows. This integration provides enterprises with secure, governed access to advanced AI tools while leveraging AWS's security controls and procurement workflows. Over 4 million weekly Codex users can now utilize AWS-backed services through CLI, desktop apps, and VS Code extensions in limited preview.

amazon bedrock · gpt-5.5 · managed agents · codex · enterprise ai

Our commitment to community safety

OpenAI News · 2026-04-28

OpenAI outlines its safety mechanisms for ChatGPT to mitigate misuse in violent contexts, emphasizing nuanced detection of harmful intent. The approach combines automated systems—using classifiers, reasoning models, and hash-matching—with human review to assess flagged content contextually. Safeguards include refusing operational instructions for violence, escalating high-risk cases to law enforcement, and providing localized crisis resources for users in distress. Parental controls and a forthcoming trusted contact feature enhance user safety. Continuous refinement of detection methods and escalation criteria ensures alignment with evolving risks and expert input.

classifiers · hash-matching · contextual review · escalation criteria · parental controls

OpenAI available at FedRAMP Moderate

OpenAI News · 2026-04-27

OpenAI achieved FedRAMP 20x Moderate authorization for ChatGPT Enterprise and API Platform, enabling U.S. government agencies to securely deploy advanced AI capabilities. The authorization leveraged FedRAMP 20x's cloud-native security evidence, Key Security Indicators (KSI), and automated validation processes, streamlining compliance without compromising rigor. This milestone allows federal agencies to access GPT‑5.5 and Codex Cloud within FedRAMP environments, supporting mission-critical applications like research, translation, and citizen services. Agencies can evaluate OpenAI's Trust Portal for reusable authorization data, streamlining procurement and deployment. OpenAI continues to enhance FedRAMP-compliant features through the Significant Change Notification process, aligning public sector AI adoption with commercial standards.

fedramp · ksi · codex cloud · trust portal · significant change notification

The next phase of the Microsoft OpenAI partnership

OpenAI News · 2026-04-27

OpenAI and Microsoft have amended their partnership to enhance operational flexibility and scalability in AI development. Key changes include Microsoft retaining its role as OpenAI’s primary cloud partner, with OpenAI products prioritized on Azure unless Microsoft declines support, while OpenAI gains the ability to serve products across any cloud provider. Microsoft’s license to OpenAI intellectual property is now non-exclusive and extends through 2032, with revenue-sharing payments from OpenAI to Microsoft continuing through 2030 under a capped percentage. The partnership focuses on scaling datacenter capacity, developing next-generation silicon, and advancing cybersecurity applications, aiming to broadly disseminate AI benefits globally.

azure · intellectual property · datacenter capacity · next-generation silicon · cybersecurity

EMO: Pretraining mixture of experts for emergent modularity

Hugging Face Blog · 2026-05-08

EMO introduces a pretrained mixture-of-experts (MoE) model where modular structure emerges from data without human priors, enabling selective expert use (12.5% of 128 experts) with minimal performance loss. The method enforces document-level expert consistency via router constraints and global load balancing, contrasting with standard MoEs' lexical specialization. Results show EMO maintains full-model performance (1B active, 14B total params) while achieving 97% accuracy with 12.5% experts, versus standard MoEs' sharp degradation. Clustering reveals EMO's experts specialize in semantic domains (e.g., Health, Politics) rather than surface features.

mixture-of-experts · emergent modularity · router constraints · global load balancing · semantic specialization
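The document-level consistency constraint can be sketched as follows: experts are scored once per document and the same top 12.5% subset (16 of 128) is reused for every token in that document, rather than re-routing per token. The dot-product scoring below is an illustrative stand-in, not EMO's actual router.

```python
import random

random.seed(0)
NUM_EXPERTS, ACTIVE, DIM = 128, 16, 8      # 16/128 = 12.5% of experts
# Hypothetical learned expert keys, one per expert.
EXPERT_KEYS = [[random.gauss(0, 1) for _ in range(DIM)]
               for _ in range(NUM_EXPERTS)]

def select_experts(doc_embedding):
    """Score all experts once per document; keep the top-scoring 12.5%."""
    scores = [sum(k * x for k, x in zip(key, doc_embedding))
              for key in EXPERT_KEYS]
    ranked = sorted(range(NUM_EXPERTS), key=scores.__getitem__)
    return ranked[-ACTIVE:]                 # expert ids reused for the whole doc

doc = [random.gauss(0, 1) for _ in range(DIM)]
experts_for_doc = select_experts(doc)
```

Pinning the subset per document is what pushes experts toward semantic-domain specialization (Health, Politics) instead of the token-level lexical specialization seen in standard MoE routers.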

vLLM V0 to V1: Correctness Before Corrections in RL

Hugging Face Blog · 2026-05-06

The migration from vLLM V0 to V1 prioritized backend correctness over RL objective corrections, ensuring semantic and runtime parity. Four fixes addressed discrepancies: processed rollout logprobs, V1-specific runtime defaults, inflight weight-update paths, and fp32 lm_head for final projection. Initial V1 runs exhibited mismatches in metrics like clip rate, KL, and entropy, traced to backend behavior. Explicit configurations and disabling prefix caching restored equivalence. Final V1 runs matched V0 reference, validating the approach of addressing backend correctness before optimizing RL objectives.

vllm · logprobs · rl · fp32 · backend
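The "correctness before corrections" workflow boils down to a parity gate: rollout logprobs from the new backend are compared token-by-token against the old one before any RL-objective change is allowed. The tolerance and sample values below are illustrative.

```python
def logprob_parity(ref: list, new: list, atol: float = 1e-3) -> dict:
    """Compare per-token logprobs between two backends; flag any drift."""
    assert len(ref) == len(new), "rollouts must align token-for-token"
    max_dev = max(abs(r - n) for r, n in zip(ref, new))
    return {"max_dev": max_dev, "ok": max_dev <= atol}

v0_ref   = [-0.12, -1.05, -0.33, -2.40]
v1_drift = [-0.12, -1.15, -0.33, -2.40]    # e.g. low-precision final projection
v1_fixed = [-0.12, -1.0502, -0.33, -2.40]  # e.g. after fp32 lm_head

drift_check = logprob_parity(v0_ref, v1_drift)
fixed_check = logprob_parity(v0_ref, v1_fixed)
```

Gating on per-token deviation (rather than aggregate RL metrics like clip rate or KL) localizes which of the four backend fixes a mismatch comes from.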

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

Hugging Face Blog · 2026-05-06

The Open ASR Leaderboard introduces private datasets from Appen Inc. and DataoceanAI to mitigate benchmaxxing risks while maintaining openness and standardization. These datasets, covering scripted/conversational speech across accents, enable targeted metrics for nuanced ASR evaluation without compromising public benchmarks. The leaderboard implements a toggle-based macroaverage computation for WER across public/private splits, ensuring model rankings remain unaffected by private data. Since its September 2023 launch, the platform has received 710K+ visits, demonstrating community engagement in advancing speech recognition benchmarking. Future work focuses on evaluations reflecting real-world noisy conditions and improving audio-transcript quality consistency.

benchmaxxing · macroaverage · wer · asr · normalizer
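The toggle-based macro-average can be sketched directly: per-dataset WERs are averaged with equal weight within the chosen splits, so including the private sets changes the displayed number without reweighting the public datasets against each other. The scores below are illustrative.

```python
def macro_wer(split_wers: dict, include_private: bool = True) -> float:
    """Equal-weight mean of per-dataset WERs over the selected splits."""
    wers = list(split_wers["public"])
    if include_private:
        wers += list(split_wers["private"])
    return sum(wers) / len(wers)

scores = {"public": [0.08, 0.12, 0.10], "private": [0.15, 0.11]}
public_only = macro_wer(scores, include_private=False)   # 0.10
combined = macro_wer(scores)                             # 0.112
```

Because each dataset contributes one term regardless of its size, a model tuned to a single public set gains little on the combined figure, which is the anti-benchmaxxing property the private splits are meant to provide.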

Granite 4.1 LLMs: How They’re Built

Hugging Face Blog · 2026-04-29

Granite 4.1 introduces a family of dense, decoder-only LLMs (3B, 8B, 30B) trained on ~15T tokens via a multi-stage pre-training pipeline, including long-context extension up to 512K tokens. The models employ Grouped Query Attention, Rotary Position Embeddings, and SwiGLU activations, refined through supervised fine-tuning on 4.1M high-quality samples and reinforcement learning using GRPO with DAPO loss. Notably, the 8B model matches or surpasses the previous 32B MoE Granite 4.0-H-Small across benchmarks like AlpacaEval and GSM8K, demonstrating competitive instruction-following capabilities with lower operational costs.

grouped query attention · rotary position embeddings · supervised fine-tuning · reinforcement learning · long-context training
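Grouped Query Attention, one of the architectural choices above, can be shown in a minimal form: several query heads share one key/value head, which shrinks the KV cache that long-context inference must hold. Head counts and dimensions here are illustrative, not Granite 4.1's actual configuration.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def gqa(q_heads, kv_heads):
    """q_heads: list of query vectors; kv_heads: list of (keys, values).
    Each group of len(q_heads)//len(kv_heads) query heads shares one KV head."""
    group = len(q_heads) // len(kv_heads)
    outs = []
    for h, q in enumerate(q_heads):
        keys, values = kv_heads[h // group]          # shared KV head
        scores = [sum(a * b for a, b in zip(k, q)) / math.sqrt(len(q))
                  for k in keys]
        w = softmax(scores)
        outs.append([sum(wi * v[d] for wi, v in zip(w, values))
                     for d in range(len(values[0]))])
    return outs

random.seed(0)
D, SEQ = 4, 3
q_heads = [[random.gauss(0, 1) for _ in range(D)] for _ in range(8)]
kv_heads = [([[random.gauss(0, 1) for _ in range(D)] for _ in range(SEQ)],
             [[random.gauss(0, 1) for _ in range(D)] for _ in range(SEQ)])
            for _ in range(2)]
out = gqa(q_heads, kv_heads)
```

With 8 query heads over 2 KV heads, the cache stores a quarter of the key/value tensors that full multi-head attention would, which matters at the 512K-token contexts these models target.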

DeepInfra on Hugging Face Inference Providers 🔥

Hugging Face Blog · 2026-04-29

DeepInfra integrates as a serverless inference provider on Hugging Face Hub, offering cost-effective access to over 100 models, including LLMs like DeepSeek V4, Kimi-K2.6, and GLM-5.1. The platform supports conversational and text-generation tasks, with plans to expand to text-to-image, text-to-video, and embeddings. Users can configure API keys and provider preferences via Hugging Face's UI or SDKs (Python and JavaScript), enabling seamless model integration into applications. PRO users receive $2 monthly inference credits, while free users have limited quotas. Future updates may include revenue-sharing agreements with providers.

serverless inference · llms · api keys · text-generation · hugging face hub

Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents

Hugging Face Blog · 2026-04-28

NVIDIA introduces Nemotron 3 Nano Omni, a 30B-parameter omni-modal model combining Mamba-Transformer-MoE architecture with C-RADIOv4-H vision and Parakeet-TDT-0.6B-v2 audio encoders for long-context multimodal reasoning. Key innovations include dynamic resolution processing (1,024-13,312 visual patches), Conv3D temporal compression for video, and native audio integration via lightweight projectors. The model achieves state-of-the-art performance on MMlongbench-Doc (2.19× accuracy improvement), OCRBenchV2, and VoiceBench, with 9x higher throughput than alternatives. Training employs staged multimodal alignment, synthetic data generation (11.4M QA pairs), and multimodal RL across H100/B200 clusters.

mamba-transformer-moe · dynamic resolution processing · conv3d temporal compression · multimodal reinforcement learning · long-context retrieval
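The dynamic-resolution budget can be sketched as a patch count that scales with image area and is clamped to the stated [1,024, 13,312] range. The 16-pixel patch size is an assumption for illustration; the summary only gives the patch bounds.

```python
PATCH, LO, HI = 16, 1024, 13312   # patch side (assumed) and stated bounds

def num_patches(width: int, height: int) -> int:
    """Raw patch grid from image size, clamped to the model's budget."""
    raw = (width // PATCH) * (height // PATCH)
    return max(LO, min(HI, raw))

small = num_patches(256, 256)      # 16*16 = 256 raw, clamped up to 1,024
medium = num_patches(1024, 1024)   # 64*64 = 4,096, within budget
large = num_patches(4096, 4096)    # 256*256 = 65,536, clamped to 13,312
```

The clamp is what lets one model handle both dense document pages and low-resolution video frames without a separate tokenization scheme per input type.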

How to build scalable web apps with OpenAI's Privacy Filter

Hugging Face Blog · 2026-04-27

The article introduces three scalable web applications leveraging OpenAI's Privacy Filter, a 1.5B-parameter model with 50M active parameters, for PII detection and redaction. These include Document Privacy Explorer, Image Anonymizer, and SmartRedact Paste, all built on gradio.Server for backend consistency. The model achieves state-of-the-art performance on the PII-Masking-300k benchmark with a context length of 128,000 tokens. Applications utilize BIOES decoding, Tesseract OCR, and custom HTML/JS frontends for seamless user experiences. The integration ensures efficient handling of concurrent requests and GPU allocation via Gradio's queueing system.

privacy filter · gradio.server · pii detection · bioes decoding · tesseract ocr
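BIOES decoding, which the apps above rely on, converts per-token tags into exact character spans for redaction: B/I/E mark the begin, inside, and end of a multi-token entity, S marks a single-token entity, and O is outside. The labels and tokens below are illustrative.

```python
def decode_bioes(tags):
    """Return (label, start, end_exclusive) spans from a BIOES tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":                      # single-token entity
            spans.append((label, i, i + 1))
        elif prefix == "B":                    # entity opens
            start = i
        elif prefix == "E" and start is not None:
            spans.append((label, start, i + 1))  # entity closes
            start = None
        elif prefix == "O":                    # outside: discard any open span
            start = None
    return spans

tokens = ["Call", "Jane", "Q", "Doe", "at", "555-0100"]
tags = ["O", "B-NAME", "I-NAME", "E-NAME", "O", "S-PHONE"]
spans = decode_bioes(tags)
```

Emitting spans rather than per-token flags is what lets a redactor replace "Jane Q Doe" as one unit instead of three fragments.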

DeepSeek-V4: a million-token context that agents can actually use

Hugging Face Blog · 2026-04-24

DeepSeek-V4 introduces a 1M-token context window optimized for agentic workloads through hybrid attention mechanisms (Compressed Sparse Attention and Heavily Compressed Attention), reducing KV cache memory to 2% of conventional architectures. The model alternates CSA (4x compression) and HCA (128x compression) layers, employs FP8/BF16 storage, and adds agent-specific features like interleaved reasoning preservation and XML-based tool calls. Evaluations show 67.9 on Terminal Bench 2.0, 80.6 on SWE Verified, and 0.59 MRCR accuracy at 1M tokens, rivaling closed models like GPT-5.4-xHigh and Gemini-3.1-Pro.

compressed sparse attention · kv cache · agentic workloads · tool-call schema · hybrid attention
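The overall KV-cache saving depends on how the CSA (4×) and HCA (128×) layers are mixed. A small sketch of that arithmetic, assuming a strict 1:1 alternation for illustration (the summary does not state the actual layer ratio; the quoted 2% overall figure would imply HCA layers dominate the stack):

```python
def kv_cache_fraction(compression_per_layer):
    """Average KV-cache size relative to an uncompressed cache,
    given each layer's compression factor."""
    return sum(1.0 / c for c in compression_per_layer) / len(compression_per_layer)

# Assumed 1:1 alternation of CSA (4x) and HCA (128x) layers over 48 layers.
print(f"{kv_cache_fraction([4, 128] * 24):.1%}")  # 12.9%
```

Varying the list passed in shows how quickly the fraction falls as the share of heavily compressed layers grows.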

How to Use Transformers.js in a Chrome Extension

Hugging Face Blog · 2026-04-23

The article presents an architecture for integrating Transformers.js into Chrome extensions under Manifest V3 constraints, focusing on model hosting in a background service worker. It details a tripartite design: background (model orchestration), side panel (UI), and content script (page interaction), connected via typed messaging. Key implementations include Gemma-4 for text generation and MiniLM for embeddings, with model state managed via KV caching and IndexedDB. The approach ensures efficient resource use and responsive UI while adhering to Chrome's security boundaries.

transformers.js · manifest v3 · kv caching · service worker · content script

QIMMA قِمّة ⛰: A Quality-First Arabic LLM Leaderboard

Hugging Face Blog · 2026-04-21

QIMMA introduces a quality-first Arabic LLM leaderboard addressing systematic issues in Arabic NLP evaluation. It consolidates 109 subsets from 14 benchmarks into a unified suite of 52,000+ samples across 7 domains, with 99% native Arabic content. A multi-stage validation pipeline employs two state-of-the-art LLMs (Qwen3-235B-A22B-Instruct and DeepSeek-V3-671B) for automated assessment, followed by human review for flagged samples. Results reveal systematic quality problems, including false gold indices, cultural insensitivity, and formatting errors. Leaderboard rankings show Arabic-specialized models outperforming size-matched multilingual counterparts, with Jais-2-70B-Chat leading cultural tasks and Qwen3.5-397B excelling in coding.

arabic llm · quality validation · multimodel assessment · native arabic · leaderboard

AI and the Future of Cybersecurity: Why Openness Matters

Hugging Face Blog · 2026-04-21

The article argues that open ecosystems provide structural advantages in AI-driven cybersecurity by enabling distributed vulnerability detection, verification, coordination, and patch propagation. Mythos, a frontier LLM embedded within a system optimized for software vulnerability probing and patching, demonstrates the effectiveness of combining substantial compute power, software-relevant training data, and system autonomy. Semi-autonomous AI agents built on open components allow human oversight and auditability, narrowing the capability asymmetry between attackers and defenders. Open-source security tooling enables organizations to deploy AI defensively while maintaining control over sensitive data and processes.

llm · vulnerability · semi-autonomous · open ecosystems · compute power

Ecom-RLVE: Adaptive Verifiable Environments for E-Commerce Conversational Agents

Hugging Face Blog · 2026-04-16

The paper introduces Ecom-RLVE, a framework extending RLVE-Gym to multi-turn, tool-augmented e-commerce conversations. It features 8 verifiable environments with procedural problem generation, a 12-axis difficulty curriculum, and algorithmic reward verification. The method trains a Qwen 3 8B model using DAPO over 300 steps, demonstrating adaptive difficulty scaling and transfer to real-world task completion. Early results show progressive learning without saturation or starvation patterns, validating the approach for complex, agentic workflows.

ecom-rlve · adaptive curriculum · verifiable environments · dapo · qwen 3 8b

The PR you would have opened yourself

Hugging Face Blog · 2026-04-16

The authors introduce a Skill-assisted workflow to port language models from Hugging Face Transformers to MLX-LM, enabling rapid availability post-release. The Skill automates model discovery, configuration diffing, checkpoint downloading, and implementation verification while adhering to MLX-LM conventions. It generates PRs with detailed artifacts, including generation examples, numerical comparisons, and per-layer analyses against Transformers baselines. A non-agentic test harness ensures reproducibility and reduces LLM hallucination risks. The Skill was bootstrapped through iterative porting experiments with Claude, incorporating technical and cultural insights from experienced contributors. Results demonstrate improved PR quality and reviewer efficiency.

transformers · mlx-lm · rope configurations · test harness · model porting

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers

Hugging Face Blog · 2026-04-16

The article demonstrates finetuning Qwen/Qwen3-VL-Embedding-2B for Visual Document Retrieval (VDR), achieving a 0.947 NDCG@10 score (vs. 0.888 base performance) and outperforming larger models. Methodologically, it details multimodal training components in Sentence Transformers: model configuration (including Router-based composition), dataset preprocessing (53,512 English query-image samples), and loss functions (CachedMultipleNegativesRankingLoss with MatryoshkaLoss for dimensionality reduction). Key innovations include gradient caching for large effective batches and automatic modality detection.

visual document retrieval · multimodal embedding · sentence transformers · gradient caching · matryoshkaloss
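The MatryoshkaLoss mentioned above trains embeddings whose prefixes are themselves usable representations. A minimal stdlib sketch of the inference-time idea (not the Sentence Transformers API): truncate an embedding to a smaller dimensionality and re-normalize before computing similarity. The vectors here are made up for illustration.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize -- the inference-time
    counterpart of Matryoshka training."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))

q = truncate_embedding([0.2, 0.9, 0.1, 0.4], 2)
d = truncate_embedding([0.1, 0.8, 0.7, 0.0], 2)
print(round(cosine(q, d), 3))
```

In practice this lets a retrieval index trade accuracy for memory by storing only the leading dimensions of each document embedding.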

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

Hugging Face Blog · 2026-04-15

VAKRA introduces a tool-grounded benchmark for evaluating AI agents' compositional reasoning in enterprise environments, combining API interactions and document retrieval across 62 domains. The benchmark includes four capabilities: API chaining, tool selection, multi-hop reasoning, and multi-source reasoning with policy adherence, requiring 3-7 step workflows. Agents are evaluated using an execution-centric framework that verifies tool-call trajectories and final responses. Initial results indicate poor performance, with detailed error analysis identifying failures in tool selection, argument correctness, and response accuracy.

api chaining · multi-hop reasoning · execution-centric evaluation · tool-use policies · multi-source reasoning

Meet HoloTab by HCompany. Your AI browser companion.

Hugging Face Blog · 2026-04-15

HCompany introduces HoloTab, a browser-based AI agent that automates web tasks through natural language commands. The system combines vision models, action planning, and interface understanding to replicate human-like interactions across websites without requiring technical setup. Key features include 'Routines' for recording and replaying complex workflows, demonstrated to handle tasks like price comparison and job application tracking. The Chrome extension, powered by the Holo3 model, aims to democratize computer-use AI by eliminating technical barriers. Initial release occurred on March 31st, with free public availability.

browser automation · vision models · action planning · routine recording · human-computer interaction

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs

Hugging Face Blog · 2026-04-09

Waypoint-1.5 introduces a real-time generative world model optimized for consumer GPUs, offering two tiers (720p/60 FPS for high-end hardware and 360p for broader accessibility). The model improves upon Waypoint-1 by training on 100x more data, enhancing coherence and motion consistency, while employing efficient video modeling techniques to reduce redundant computation. Key advancements include local execution via Overworld Biome and browser-based access through Overworld Stream, prioritizing responsiveness and interactivity over passive visual fidelity. Results demonstrate scalable deployment across diverse hardware, including RTX 3090-5090 GPUs and Apple Silicon, advancing toward inhabitable AI-native environments.

waypoint-1.5 · real-time generation · interactive worlds · consumer gpus · overworld biome

Multimodal Embedding & Reranker Models with Sentence Transformers

Hugging Face Blog · 2026-04-09

The Hugging Face blog introduces multimodal embedding and reranker models within the Sentence Transformers framework, enabling cross-modal retrieval and ranking tasks. These models map inputs from text, images, audio, and video into a shared embedding space, supporting applications like visual document retrieval and multimodal RAG pipelines. The post details installation, model loading (e.g., Qwen3-VL-Embedding-2B), encoding methods (encode_query, encode_document), and cross-modal similarity computation. Results demonstrate effective retrieval across modalities, though with noted modality gap effects. Reranker models (e.g., Qwen3-VL-Reranker-2B) further refine rankings through pairwise scoring.

multimodal embedding · cross-modal retrieval · sentence transformers · modality gap · reranker models

Safetensors is Joining the PyTorch Foundation

Hugging Face Blog · 2026-04-08

Safetensors, a secure tensor serialization format developed by Hugging Face, has transitioned to PyTorch Foundation governance. The format employs a JSON header (≤100MB) with tensor metadata and zero-copy lazy loading, addressing security risks of pickle-based weight sharing. Adopted as the default format on Hugging Face Hub with tens of thousands of multimodal models, it now features formalized governance under Linux Foundation while maintaining backward compatibility. Future work includes device-aware loading (CUDA/ROCm), parallel training support, and quantization format integration (FP8, GPTQ, AWQ).

safetensors · zero-copy loading · tensor serialization · model quantization · pytorch foundation
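The header-plus-offsets layout described above can be illustrated without the safetensors library itself. A sketch that builds a minimal blob by hand, following the published format (8-byte little-endian header length, a JSON header with dtype/shape/offsets per tensor, then raw tensor bytes), and parses the header back:

```python
import json
import struct

# Build a tiny safetensors-style blob: one 2x2 float32 tensor named "w".
data = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
header = {"w": {"dtype": "F32", "shape": [2, 2], "data_offsets": [0, len(data)]}}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + data

# A reader can parse the header alone and slice (or memory-map) tensor bytes
# on demand -- the basis of the format's zero-copy lazy loading.
(n,) = struct.unpack_from("<Q", blob, 0)
parsed = json.loads(blob[8 : 8 + n])
start, end = parsed["w"]["data_offsets"]
values = struct.unpack("<4f", blob[8 + n + start : 8 + n + end])
print(parsed["w"]["shape"], values)  # [2, 2] (1.0, 2.0, 3.0, 4.0)
```

Because the header is self-describing and bounded, a loader never has to execute code to reconstruct weights, which is the security advantage over pickle.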

Welcome Gemma 4: Frontier multimodal intelligence on device

Hugging Face Blog · 2026-04-02

Gemma 4 introduces multimodal open-weight models with Apache 2 licenses, featuring text, image, and audio capabilities optimized for on-device deployment. The architecture includes Per-Layer Embeddings (PLE) for layer-specific token conditioning, shared KV cache for efficiency, and dual RoPE configurations for extended context handling. Benchmark results show a 31B dense model achieving an LMArena score of 1452, with multimodal performance comparable to text generation. The models support object detection, HTML generation, video understanding, and audio transcription out-of-the-box.

ple · rope · kv-cache · multimodal · lmarena

Falcon Perception

Hugging Face Blog · 2026-04-01

Falcon Perception is a 0.6B-parameter early-fusion Transformer for open-vocabulary grounding and segmentation, processing image patches and text in a unified sequence via hybrid attention. The model employs a structured Chain-of-Perception interface and lightweight output heads for efficient dense prediction. On SA-Co, it achieves 68.0 Macro-F1 (vs. 62.3 for SAM 3) and introduces PBench, a diagnostic benchmark isolating performance by capability (OCR, spatial, relational, dense). Key innovations include multi-teacher distillation (DINOv3, SigLIP2) and a three-stage training curriculum.

early-fusion transformer · chain-of-perception · hybrid attention · open-vocabulary grounding · multi-teacher distillation

Any Custom Frontend with Gradio's Backend

Hugging Face Blog · 2026-04-01

gradio.Server extends FastAPI to enable custom frontends while leveraging Gradio's backend infrastructure, including queuing, concurrency control, and ZeroGPU support. The system integrates ML models like BiRefNet for tasks such as background removal, managed via @spaces.GPU for GPU allocation. A case study demonstrates a 50-line Python backend paired with a 1300-line HTML/CSS/JS frontend, achieving complex UI features like drag-and-drop text positioning and client-side PNG export. This approach allows seamless integration of Gradio's API engine with custom frontend frameworks, enhancing flexibility in application development.

gradio.server · fastapi · zerogpu · birefnet · concurrency

Granite 4.0 3B Vision: Compact Multimodal Intelligence for Enterprise Documents

Hugging Face Blog · 2026-03-31

IBM introduces Granite 4.0 3B Vision, a 3B-parameter multimodal model for enterprise document understanding, featuring modular LoRA-based architecture atop Granite 4.0 Micro. The model employs DeepStack injection for hierarchical visual feature processing and is trained on ChartNet, a synthetic 1.7M-sample dataset with code-aligned chart representations. Evaluations show state-of-the-art performance on Chart2Summary (86.4%), table extraction (92.1 TEDS on PubTables-v2), and semantic KVP extraction (85.5% EM on VAREX), outperforming larger models like Qwen3.5-9B in chart understanding tasks.

lora adapter · deepstack injection · chartnet · teds metric · kvp extraction

Training mRNA Language Models Across 25 Species for $165

Hugging Face Blog · 2026-03-31

OpenMed developed an end-to-end protein AI pipeline for structure prediction, sequence design, and codon optimization, focusing on mRNA language models. They compared transformer architectures for codon-level language modeling, finding CodonRoBERTa-large-v2 superior with a perplexity of 4.10 and Spearman CAI correlation of 0.40. Training spanned 25 species in 55 GPU-hours, costing $165, leveraging ESMFold for structure prediction and ProteinMPNN for sequence design. CodonRoBERTa-base achieved similar perplexity with fewer parameters, demonstrating efficiency. Domain-specific metrics like CAI correlation were crucial for biological relevance.

codon optimization · transformer architectures · perplexity · spearman cai correlation · esmfold

TRL v1.0: Post-Training Library Built to Move with the Field

Hugging Face Blog · 2026-03-31

TRL v1.0 introduces a post-training library designed for adaptability in a rapidly evolving field, implementing over 75 methods including SFT, DPO, and GRPO. The library employs a chaos-adaptive design, avoiding rigid abstractions to accommodate shifting paradigms like PPO, DPO, and RLVR. It balances stability and experimentation, with semantic versioning for core components and experimental APIs for emerging methods. TRL integrates deeply with Hugging Face, supports large-scale training, and is downloaded 3 million times monthly. Future work includes asynchronous GRPO, method graduation to stable, and enhanced scaling for MoE architectures.

post-training · chaos-adaptive · semantic versioning · rlvr · moe

Liberate your OpenClaw

Hugging Face Blog · 2026-03-27

The article presents two methods for migrating OpenClaw agents from proprietary to open-source models: Hugging Face Inference Providers for cloud-based deployment and llama.cpp for local execution. The hosted approach offers rapid deployment with models like GLM-5 (noted for strong Terminal Bench performance), while the local method enables private, cost-free operation using quantized models such as Qwen3.5-35B-A3B-GGUF. Configuration details are provided for both approaches, including API token integration and local server setup. The solutions demonstrate comparable capability to closed models while addressing privacy and cost constraints.

openclaw · hugging face inference providers · llama.cpp · terminal bench · gguf

A New Framework for Evaluating Voice Agents (EVA)

Hugging Face Blog · 2026-03-24

The paper introduces EVA, the first end-to-end evaluation framework for conversational voice agents that jointly measures task accuracy (EVA-A) and user experience (EVA-X). EVA employs a bot-to-bot architecture with user simulation, tool execution, and multi-modal validation (deterministic checks + LLM/LALM judges) across 50 airline-domain scenarios. Benchmarking 20 systems revealed a consistent accuracy-experience tradeoff, with named entity errors and multi-step workflows as dominant failure modes, while pass@3 vs pass^3 gaps highlighted consistency challenges.

voice agent evaluation · multi-turn conversation · llm-as-judge · task-oriented dialogue · speech fidelity
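The pass@3 vs pass^3 gap quantifies consistency. Assuming independent attempts with a per-attempt success probability p (the 0.6 below is illustrative, not an EVA result), the two metrics are:

```python
def pass_at_k(p, k):
    """Probability that at least one of k independent attempts succeeds."""
    return 1 - (1 - p) ** k

def pass_pow_k(p, k):
    """Probability that all k independent attempts succeed -- a consistency
    measure; the gap to pass@k widens as per-attempt reliability drops."""
    return p ** k

p = 0.6  # assumed per-attempt success rate, for illustration only
print(round(pass_at_k(p, 3), 3), round(pass_pow_k(p, 3), 3))  # 0.936 0.216
```

A system can look strong on pass@3 while failing pass^3 badly, which is exactly the consistency challenge the benchmarking surfaced.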

Build a Domain-Specific Embedding Model in Under a Day

Hugging Face Blog · 2026-03-20

The article presents a pipeline for domain-specific embedding model fine-tuning using synthetic data generation and hard negative mining, requiring less than one day on a single GPU. The method leverages NVIDIA's NeMo Data Designer to generate QA pairs from domain documents without manual labeling, employs contrastive learning with hard negatives, and evaluates using BEIR metrics. Results show 10%+ improvements in Recall@10 and NDCG@10, with Atlassian achieving a 26% Recall@60 boost on JIRA data using Llama-Nemotron-Embed-1B-v2.

synthetic data generation · hard negative mining · contrastive learning · biencoder architecture · information retrieval
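The NDCG@10 metric used in the BEIR-style evaluation above can be computed directly from a ranked list of binary relevance judgments. A self-contained sketch (the example ranking is hypothetical):

```python
import math

def ndcg_at_k(ranked_relevance, k):
    """NDCG@k for binary relevance judgments over a ranked result list:
    DCG of the actual ranking divided by DCG of the ideal ranking."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))
    ideal = sorted(ranked_relevance, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg else 0.0

# Hypothetical ranking: the two relevant documents land at ranks 1 and 4.
print(round(ndcg_at_k([1, 0, 0, 1, 0], 10), 3))  # 0.877
```

Because the log-discount weights early ranks heavily, moving a relevant document from rank 4 to rank 2 raises the score more than any change further down the list, which is why NDCG@10 is sensitive to the reranking improvements reported here.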

State of Open Source on Hugging Face: Spring 2026

Hugging Face Blog · 2026-03-17

The Hugging Face ecosystem experienced rapid growth from 2025 to 2026, with user count reaching 13 million, public models exceeding 2 million, and public datasets surpassing 500,000. Analysis reveals increased user participation in creating derivative artifacts like fine-tuned models and adapters, alongside a shift toward smaller, deployable models (median size 406M parameters). Geographic trends show China surpassing the U.S. in monthly downloads (41% share), while independent developers account for 39% of downloads. Specialized sub-ecosystems emerge around domains and languages, with quantization and mixture-of-experts architectures driving adoption. Open-source AI increasingly intersects with national sovereignty initiatives.

fine-tuned models · quantization · mixture-of-experts · derivative artifacts · sovereignty

Holotron-12B - High Throughput Computer Use Agent

Hugging Face Blog · 2026-03-17

Holotron-12B introduces a high-throughput multimodal agent for computer-use tasks, leveraging a hybrid State-Space Model (SSM) and attention mechanism derived from NVIDIA Nemotron-Nano-2 VL. Post-trained on 14B tokens of proprietary data, the model optimizes for long-context inference and efficient VRAM utilization, achieving 8.9k tokens/s throughput at 100 concurrent requests on WebVoyager Benchmark, outperforming Holo2-8B by 2x. It demonstrates strong agentic performance, improving WebVoyager accuracy from 35.1% to 80.5%, and excels in localization benchmarks like OS-World-G and GroundUI. The model’s architecture enables scalable, throughput-bound workloads, positioning it for future high-resolution vision training and commercial deployment.

state-space model · multimodal agent · webvoyager benchmark · vram utilization · long-context inference

The new AI-powered Google Finance is expanding to Europe.

Google AI Blog · 2026-05-11

Google Finance expands its AI-powered platform to Europe, featuring local language support and enhanced analytical tools. The system employs natural language processing for financial queries (AI-powered research) and integrates Deep Search for complex questions. Advanced visualization tools include technical indicators like moving average envelopes, while real-time data streams cover commodities and cryptocurrencies. Live earnings calls are augmented with synchronized transcripts and AI-generated highlights. Results indicate improved accessibility to financial analytics through multimodal AI integration.

ai-powered research · deep search · moving average envelopes · real-time intel · synchronized transcripts

See what happens when creative legends use AI to make ads for small businesses.

Google AI Blog · 2026-05-08

Google's 'The Small Brief' initiative employs AI-driven creative tools (Flow studio) to enable ad industry experts (Jayanta Jenkins, Tiffany Rolfe, Susan Credle) to produce high-quality campaigns for small businesses (Archangels, South Ferry, Stonewood Farm). The method leverages generative AI for storytelling and workflow optimization, targeting studio-grade ad production with minimal resources. Results will be showcased in June, demonstrating AI's capacity to democratize creative processes for small enterprises.

flow studio · generative ai · creative workflows · small business ads · campaign optimization

5 gardening tips you can try right in Search

Google AI Blog · Megan Stoner · 2026-05-06

Google Search introduces five AI-powered gardening assistance features, leveraging multimodal AI and real-time conversational interfaces. The system combines visual input processing (AI Mode with Canvas tool), local inventory search (Shopping with 'nearby' filter), and diagnostic capabilities (Search Live with Lens integration). Results show 140% growth in 'chaos garden' queries and record highs for 'mini garden' searches (2026), with the tools enabling garden visualization, annual planning, wildflower design, local supply sourcing, and plant health diagnostics through natural language prompts and image analysis.

multimodal ai · visual prompting · nearby inventory filtering · real-time plant diagnostics · chaos garden optimization

Google is partnering with XPRIZE and Range Media Partners on the $3.5 million Future Vision film competition.

Google AI Blog · 2026-05-05

Google collaborates with XPRIZE and Range Media Partners via the 100 ZEROS initiative to launch the $3.5M Future Vision XPRIZE film competition. The initiative seeks short films and trailers envisioning optimistic technological futures, leveraging AI tools like Google Flow to lower production barriers. Submissions (live-action, animation, or AI-assisted) are accepted until August 2026, with the grand prize winner receiving support to develop a feature film. The goal is to democratize filmmaking through technological augmentation.

ai-assisted production · creative technology · film competition · optimistic futures · production barriers

The latest AI news we announced in April 2026

Google AI Blog · The Keyword Team · 2026-05-04

Google announced multiple AI advancements in April 2026, focusing on agentic workflows and efficiency. The Gemini Enterprise Agent Platform enables organizations to build autonomous agents for complex processes, while eighth-generation TPUs optimize compute for agentic AI with improved energy efficiency. Gemma 4, claimed as the most capable open model, supports advanced reasoning, and Deep Research Max automates high-level research tasks. Google Colab's Learn Mode provides personalized coding tutoring, and Google Vids offers free AI-powered video generation. Adoption metrics include 75% of Google Cloud customers using AI and 500M Gemma downloads since launch.

agentic workflows · eighth-generation tpus · gemma 4 · deep research max · learn mode

Reduce friction and latency for long-running jobs with Webhooks in Gemini API

Google AI Blog · Lucia Loher, Hussein Hassan Harrirou · 2026-05-04

The Gemini API introduces event-driven Webhooks to reduce latency in long-running agentic workflows (e.g., Deep Research, batch prompt processing). This push-based system replaces polling by delivering real-time HTTP POST notifications upon task completion, adhering to the Standard Webhooks specification with HMAC/JWKS security. Features include signed headers (webhook-signature, webhook-id, webhook-timestamp) for idempotency, replay attack prevention, and guaranteed at-least-once delivery with 24-hour retries. Developers can configure webhooks globally or per-request via Python SDK.

webhooks · gemini api · hmac · jwks · idempotency
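The signed-header scheme described above follows the Standard Webhooks pattern: an HMAC-SHA256 over `<id>.<timestamp>.<payload>`, base64-encoded and sent as `v1,<sig>`. A stdlib sketch of receiver-side verification (the secret, message id, and payload are made-up values; real secrets are provisioned out of band, and a production receiver would also reject stale timestamps to block replays):

```python
import base64
import hashlib
import hmac

def verify_webhook(secret: bytes, msg_id: str, timestamp: str, payload: bytes,
                   signature_header: str) -> bool:
    """Check a Standard Webhooks style signature header against the payload."""
    signed = f"{msg_id}.{timestamp}.".encode() + payload
    expected = base64.b64encode(hmac.new(secret, signed, hashlib.sha256).digest())
    # The header may carry several space-separated versioned signatures.
    for version_sig in signature_header.split():
        version, _, sig = version_sig.partition(",")
        if version == "v1" and hmac.compare_digest(sig.encode(), expected):
            return True
    return False

secret = b"demo-secret"  # illustrative only
payload = b'{"state": "SUCCEEDED"}'
sig = "v1," + base64.b64encode(
    hmac.new(secret, b"msg_1.1715500000." + payload, hashlib.sha256).digest()
).decode()
print(verify_webhook(secret, "msg_1", "1715500000", payload, sig))  # True
```

Including the message id in the signed content is also what makes idempotent handling straightforward: the receiver can dedupe deliveries by `webhook-id` under the at-least-once retry guarantee.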

Celebrating 20 years of Google Translate: Fun facts, tips and new features to try

Google AI Blog · Rose Yao · 2026-04-28

Google Translate marks 20 years of evolution from statistical machine translation (2006) to neural machine translation (2016) and now Gemini-powered AI models, achieving 250-language coverage for 95% of global populations. Key technical advances include sequence-to-sequence models, TPU-accelerated inference, and context-aware Gemini architectures enabling real-time audio translation with preserved prosody. Current deployments handle 1B monthly users translating 1T words/month, with novel features like pronunciation feedback (English/Spanish/Hindi), visual translation via Lens, and offline support for 10 major languages. Usage analytics reveal 33% adoption for language learning and 35% of Live Translate sessions exceeding 5 minutes.

neural machine translation · tensor processing units · sequence-to-sequence models · in-context learning · prosody preservation

Join the new AI Agents Vibe Coding Course from Google and Kaggle

Google AI Blog · Anant Nawalgaria, Frank Guan · 2026-04-27

Google and Kaggle announce the AI Agents Vibe Coding Course, a free five-day intensive program running June 15-19, 2026, focusing on building production-ready AI agents using vibe coding workflows. The course emphasizes natural language as a primary programming interface and integrates tools and APIs to create '10x agents'. Updated content, new speakers, and a hands-on capstone project are included. The previous iteration in November 2023 reached over 1.5 million learners. Participants will gain skills to design, build, and deploy robust agent systems.

ai agents · vibe coding · natural language · capstone project · production-ready systems

8 Gemini tips for organizing your space (and life)

Google AI Blog · Ivy Levine · 2026-04-24

Google Gemini introduces eight multimodal AI-driven strategies for spatial and digital organization, leveraging personalized checklists, visual analysis, and real-time assistance. Techniques include generating tailored cleaning schedules via natural language queries, optimizing storage through image-based spatial analysis, and resolving home repair issues using Gemini Live's visual diagnostics. The system integrates with Ask Maps for efficient errand planning and Nano Banana for virtual room redesigns. Additionally, Gemini assists in plant care optimization and inbox decluttering via summarization and task prioritization, particularly for Ultra Subscribers in the U.S. These methods collectively enhance efficiency in physical and digital environments.

multimodal ai · visual diagnostics · task prioritization · natural language queries · spatial analysis

Here’s how our TPUs power increasingly demanding AI workloads.

Google AI Blog · 2026-04-23

Google's Tensor Processing Units (TPUs) are custom-designed hardware accelerators optimized for AI workloads, enabling efficient execution of complex mathematical operations at scale. Developed over a decade ago, TPUs are tailored specifically for AI model inference and training, leveraging high-bandwidth architecture to maximize computational throughput. The latest TPU generation achieves 121 exaflops of compute power, doubling the bandwidth of previous iterations. This performance enhancement supports increasingly demanding AI applications, including large-scale neural networks and deep learning tasks, while maintaining energy efficiency.

tensor processing units · ai workloads · compute power · bandwidth · deep learning

Elevating Austria: Google invests in its first data center in the Alps.

Google AI Blog · 2026-04-23

Google announced its first data center in Kronstorf, Austria, to bolster digital services and AI capabilities, creating 100 direct jobs. The facility integrates sustainability measures, including a green roof with solar panels, off-site heat recovery, and a fund to improve water quality in the Enns river. Additionally, Google launched a skilling partnership with the University of Applied Science Upper Austria, building on its history of training over 140,000 Austrians to support workforce readiness in an AI-driven economy. This investment aims to enhance Europe’s competitiveness through advanced digital infrastructure and responsible growth.

data center · sustainability · heat recovery · skilling partnership · digital infrastructure

We're launching two specialized TPUs for the agentic era.

Google AI Blog · 2026-04-22

Google introduces two specialized TPU chips, TPU 8i and TPU 8t, to address demanding AI workloads, particularly for autonomous AI agents. TPU 8i is optimized for inference tasks, enabling rapid reasoning, planning, and execution of multi-step workflows to enhance user experience. TPU 8t is designed for training, capable of handling complex models on a single large memory pool. These TPUs are integrated with Google's full-stack infrastructure, including networking, data centers, and energy-efficient operations, to support scalable and responsive agentic AI systems. The advancements aim to facilitate the deployment of highly efficient AI agents across diverse applications.

tpu · inference · training · autonomous agents · multi-step workflows

3 new ways Ads Advisor is making Google Ads safer and faster

Google AI Blog · Priya Baliga · 2026-04-21

Google Ads introduces three agentic AI safety features in Ads Advisor to enhance campaign management efficiency and security. The system employs real-time policy reviews with proactive troubleshooting, continuous security monitoring via personalized dashboards, and automated certification processing using Gemini capabilities. Results include reduced manual workload (weeks to instant for certifications), 24/7 policy violation detection, and improved account security through passkey authentication and user activity monitoring.

agentic ai · real-time policy reviews · gemini capabilities · passkey authentication · security insights dashboard

7 ways to travel smarter this summer, with help from Google

Google AI Blog · 2026-04-17

Google introduces seven AI-enhanced tools to optimize summer travel planning and execution. AI Mode in Search leverages Canvas to generate custom itineraries, including flights, hotels, and attractions, with iterative refinement via follow-up queries. Hotel price tracking, available globally, monitors individual hotel rates and alerts users to significant changes. Agentic capabilities in AI Mode and Ask Maps streamline restaurant bookings by aggregating real-time availability across platforms. Ask Maps, powered by Gemini models, provides personalized recommendations for campsites and amenities based on conversational queries. Google Translate offers live translation through headphones for 70+ languages, enhancing communication during travel.

ai mode · canvas · agentic capabilities · gemini models · live translation

A new way to explore the web with AI Mode in Chrome

Google AI Blog · Robby Stein, Mike Torres · 2026-04-16

Google introduces AI Mode in Chrome, enabling side-by-side web exploration with persistent contextual search. The system maintains search context while users navigate linked pages, allowing real-time follow-up queries against both page content and web knowledge. Early testing indicates reduced tab switching (quantitative metrics unspecified) and improved task focus. The update also enables cross-tab search integration, combining multiple open tabs, images, and files as contextual inputs for AI-powered queries. Currently available in the U.S., the feature supports multimodal inputs including Canvas and image generation tools.

contextual search · multimodal input · real-time querying · cross-tab integration · persistent context

New ways to create personalized images in the Gemini app

Google AI Blog · Animish Sivaramakrishnan, David Sharon · 2026-04-16

Google introduces personalized image generation in the Gemini app by integrating user context from Nano Banana 2 and Google Photos. The system leverages labeled personal media and preferences to reduce prompt engineering effort, enabling style-adaptive outputs (e.g., claymation, watercolor) with 1-click reference photo switching. Privacy is maintained through opt-in data usage without direct model training on private photos. Currently rolling out to U.S. subscribers of Google AI Plus/Pro/Ultra tiers.

personalized image generation · nano banana 2 · prompt engineering · style-adaptive outputs · opt-in data usage

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Google AI Blog · Vilobh Meshram, Max Gubin · 2026-04-15

Google introduces Gemini 3.1 Flash TTS, a text-to-speech model offering enhanced controllability, expressivity, and multilingual support. The model employs audio tags for granular vocal style control and supports 70+ languages with native multi-speaker dialogue. On the Artificial Analysis TTS benchmark, it achieves a 1,211 Elo score, placing it in the 'most attractive quadrant' for quality-cost balance. The system integrates SynthID watermarking for AI-generated content detection and provides developer tools via Google AI Studio and Vertex AI.

text-to-speech · elo score · multilingual support · audio tags · synthid

Turn your best AI prompts into one-click tools in Chrome

Google AI Blog · Hafsah Ismail · 2026-04-14

Google introduces 'Skills in Chrome', a feature enabling users to save and reuse AI prompts as one-click tools within the Chrome browser. The method involves saving prompts from chat history, which can then be executed on any webpage via a shortcut (forward slash or plus button). Early testing demonstrated applications in health (protein macro calculation), shopping (spec comparisons), and productivity (document scanning). The feature includes a pre-built Skills library and maintains Chrome's security protocols, including confirmation prompts for sensitive actions. Rollout begins for Gemini in Chrome on desktop, with Skills syncing across signed-in devices.

ai prompts · one-click tools · gemini in chrome · prompt reuse · workflow automation

Bringing people together at AI for the Economy Forum

Google AI Blog · James Manyika · 2026-04-14

Google and MIT FutureTech co-hosted the inaugural AI for the Economy Forum to address AI's economic impact and workforce implications. The event convened economists, policymakers, and industry leaders to foster collaboration and identify research gaps. Google announced two initiatives: the AI & Economy Research Program, which funds external experts like MIT's David Autor and supports studies on AI's sector-specific transformations, and the $120 million Global AI Opportunity Fund, providing AI education globally. Additionally, Google.org funded programs targeting rural healthcare workers, apprenticeships, and manufacturing employees, aiming to equip 40,000 individuals with AI skills. These efforts build on Google's $1 billion investment in AI education and infrastructure.

ai literacy · economic impact · workforce transformation · apprenticeships · sector-specific

New ways to balance cost and reliability in the Gemini API

Google AI Blog · Lucia Loher, Hussein Hassan Harrirou · 2026-04-02

Google introduces Flex and Priority Inference tiers for the Gemini API, enabling granular cost-reliability tradeoffs through synchronous endpoints. Flex reduces costs by 50% for latency-tolerant workloads (e.g., data enrichment, agentic workflows) via request criticality downgrading, while Priority ensures high reliability for interactive applications (e.g., chatbots, content moderation) with automatic fallback to Standard tier during overflow. Both tiers maintain API endpoint consistency, eliminating asynchronous batch processing complexity. Tier selection is controlled via the service_tier parameter, with Priority available for Tier 2/3 paid projects and Flex for all paid tiers.

gemini api · flex inference · priority inference · synchronous endpoints · criticality downgrading
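
The tier choice described above can be sketched in code. The article names only the service_tier parameter; the request shape, tier values, and helper below are illustrative assumptions, not the documented Gemini API.

```python
# Sketch of service-tier selection for the Gemini API as described above.
# "service_tier" is named in the article; the payload shape and tier
# strings are assumptions for illustration only.

def build_request(prompt: str, latency_tolerant: bool) -> dict:
    """Pick Flex for latency-tolerant workloads, Priority for interactive ones."""
    tier = "flex" if latency_tolerant else "priority"
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "service_tier": tier,
    }

# Latency-tolerant data-enrichment job -> Flex (~50% cheaper per the article).
batch_req = build_request("Extract entities from this record ...", latency_tolerant=True)
# Interactive chatbot turn -> Priority (falls back to Standard on overflow).
chat_req = build_request("How do I reset my password?", latency_tolerant=False)

print(batch_req["service_tier"], chat_req["service_tier"])  # flex priority
```

Because both tiers share the same synchronous endpoint, switching between them is a one-field change rather than a move to an asynchronous batch pipeline.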

📜 arXiv Papers

No new items today.

📰 Industry Media (32)

Fostering breakthrough AI innovation through customer-back engineering

MIT Tech Review — AI · MIT Technology Review Insights · 2026-05-11

The article advocates for customer-back engineering in AI development, emphasizing that prioritizing customer needs over technological capabilities yields higher-value solutions. Capital One's approach involves direct engineer-customer interactions through digital empathy sessions, embedded support, and hackathons, fostering rapid AI-driven innovation. Results include Chat Concierge, a multi-agent AI framework for car buyers, demonstrating improved customer experience via agentic AI. Key practices include data governance, workflow redesign, and cross-functional collaboration, with 70% of surveyed leaders reporting agentic AI adoption for fraud detection (56%) and customer service (41%).

customer-back engineering · agentic ai · multi-agent framework · data governance · digital empathy sessions

Implementing advanced AI technologies in finance

MIT Tech Review — AI · MIT Technology Review Insights · 2026-05-11

The finance sector exhibits emergent bottom-up AI adoption despite lacking initial governance frameworks, creating a tension between productivity gains and regulatory oversight. Key applications include unstructured data processing (variance commentary, fraud detection) via embedded systems like model context protocol (MCP), with integration ease driving adoption more than cost savings. Domain-expertise gaps and tool misinterpretation pose greater risks than technical constraints, while future trajectories point toward multi-step AI agents and expanded context windows augmenting human judgment. The study draws on practitioner interviews to characterize this dual transformation of workflows and governance paradigms.

model context protocol · unstructured data processing · context windows · ai agents · governance frameworks

Musk v. Altman week 2: OpenAI fires back, and Shivon Zilis reveals that Musk tried to poach Sam Altman

MIT Tech Review — AI · Michelle Kim · 2026-05-08

The trial between Elon Musk and OpenAI revealed conflicting narratives regarding OpenAI's transition from nonprofit to for-profit status. Musk alleges deception in securing his $38M donation, while OpenAI cofounder Greg Brockman testified that Musk pushed for a for-profit structure with majority control. Shivon Zilis disclosed Musk's attempt to recruit OpenAI CEO Sam Altman for Tesla's AI lab. Key evidence included internal communications and testimonies about Musk's demands for equity and board control. The trial's outcome could impact OpenAI's IPO valuation and Musk's xAI ambitions. Next week, Ilya Sutskever and Microsoft CEO Satya Nadella will testify.

nonprofit · for-profit · ipo · equity · board-control

A blueprint for using AI to strengthen democracy

MIT Tech Review — AI · Andrew Sorota, Josh Hendler · 2026-05-05

The article proposes a framework for leveraging AI to enhance democratic processes while mitigating risks to civic epistemology. It identifies three critical layers—epistemic (belief formation via AI-mediated information), agentic (AI-driven civic actions), and institutional (collective governance with AI participants)—where design choices determine democratic outcomes. Empirical evidence suggests AI fact-checking reduces polarization (in a preliminary study on X, 73% of participants perceived AI fact-checks as more helpful than human efforts). Technical challenges include ensuring agent fidelity without reinforcing bias, and implementing scalable identity verification for AI-human hybrid systems. The authors advocate proactive infrastructure design to prevent unchecked AI influence on democratic institutions.

epistemic layer · agentic mediation · civic epistemology · identity verification · polarization reduction

Week one of the Musk v. Altman trial: What it was like in the room

MIT Tech Review — AI · James O'Donnell · 2026-05-04

Elon Musk's lawsuit against OpenAI alleges breach of charitable trust, claiming the company deviated from its nonprofit mission. Musk seeks remedies including damages and restructuring OpenAI's governance. The trial highlights internal communications and AI safety debates, with Musk admitting xAI distills OpenAI models for training. Key testimonies include OpenAI executives and AI safety experts, with a jury advisory verdict guiding the judge's decision. The case underscores tensions in AI governance and corporate practices.

charitable trust · ai safety · governance · testimony · restructuring

Musk v. Altman week 1: Elon Musk says he was duped, warns AI could kill us all, and admits that xAI distills OpenAI’s models

MIT Tech Review — AI · Michelle Kim · 2026-05-01

Elon Musk testified in a trial against OpenAI, alleging deception in OpenAI's transition from nonprofit to for-profit status and claiming his $38M funding was misused. Musk argued that OpenAI's restructuring jeopardizes AI safety, while OpenAI's counsel countered that Musk's lawsuit aims to undermine competition. Musk admitted that xAI, his AI company, distills OpenAI's models, a technique where smaller models mimic larger ones. The trial highlights tensions over AI governance, with Musk advocating for nonprofit structures and OpenAI defending its for-profit subsidiary. Testimony also revealed Musk's recruitment of OpenAI employees for Tesla and Neuralink.

distillation · nonprofit · for-profit · ai safety · governance

Cyber-Insecurity in the AI Era

MIT Tech Review — AI · MIT Technology Review Events · 2026-05-01

The integration of AI into cybersecurity systems necessitates a fundamental rethinking of security architectures, as legacy approaches struggle to address the expanded attack surface and increased complexity. Tarique Mustafa, a cybersecurity expert with expertise in knowledge representation and AI planning, emphasizes the need for AI-driven solutions that are embedded at the core rather than retrofitted. His work includes the development of autonomous AI algorithms for advanced data leak protection platforms, leveraging innovations in data classification, DLP, and DSPM. Mustafa’s contributions span multiple patents and industry-leading products, highlighting the importance of proactive, AI-centric security strategies in mitigating emerging threats.

cybersecurity · data leak protection · knowledge representation · ai planning · attack surface

Operationalizing AI for Scale and Sovereignty

MIT Tech Review — AI · MIT Technology Review Events · 2026-05-01

The article highlights the strategic imperative of data control for enterprises and governments in operationalizing AI at scale, emphasizing sovereignty and governance. Chris Davidson of HPE discusses AI Factory solutions and Sovereign AI, focusing on secure, scalable AI capabilities for national and enterprise-grade applications. His work spans large-model training platforms, exascale systems, and cloud-native high-performance computing. Arjun Shankar of Oak Ridge National Laboratory bridges computer science with large-scale scientific discovery using scalable computing and data science. Both experts underscore the balance between data ownership and the safe flow of high-quality data for reliable AI insights.

sovereign ai · exascale systems · ai factory · high-performance computing · data governance

A new US phone network for Christians aims to block porn and gender-related content

MIT Tech Review — AI · James O'Donnell · 2026-05-01

Radiant Mobile, a new US mobile virtual network operator (MVNO) leveraging T-Mobile’s infrastructure, introduces a network-level content filtering system targeting Christian users. The system employs Israeli cybersecurity firm Allot’s domain categorization technology to block pornography by default, with no opt-out for adults, and optional filters for gender-related content. Radiant Mobile’s approach contrasts with app-based solutions by enforcing stricter, irreversible blocks at the network level. The company plans to expand internationally and supplement blocked content with AI-generated religious media. Critics highlight the subjective nature of domain categorization and potential overreach in content moderation.

mobile virtual network operator · network-level filtering · domain categorization · content moderation · ai-generated media

This startup’s new mechanistic interpretability tool lets you debug LLMs

MIT Tech Review — AI · Will Douglas Heaven · 2026-04-30

Goodfire's Silico introduces mechanistic interpretability for LLM debugging, enabling parameter adjustment during training via neuron mapping and pathway analysis. The tool automates interpretability using agents, demonstrating efficacy in behavior modification (e.g., reducing hallucinations, flipping ethical decisions 90% of the time). It supports open-source model inspection (e.g., Qwen 3) and targeted retraining (e.g., correcting numerical biases). Silico aims to democratize techniques previously limited to frontier labs, offering case-based pricing for tailored model development.

mechanistic interpretability · neurons · parameter adjustment · open-source models · ethical reasoning

Sakana AI and NVIDIA Introduce TwELL with CUDA Kernels for 20.5% Inference and 21.9% Training Speedup in LLMs

MarkTechPost · Asif Razzaq · 2026-05-11

Sakana AI and NVIDIA introduce TwELL, a tile-wise sparse format and CUDA kernels for efficient batched GEMM operations in LLM feedforward layers. The method replaces SiLU with ReLU activations and adds L1 regularization to induce 99.5% sparsity without accuracy loss. Benchmarks show 20.5% inference and 21.9% training speedups on 2B-parameter models, with gains scaling with model size due to reduced non-zero activations (29 vs. 911 in 1.5B models).

twell · sparse kernels · activation sparsity · gemm optimization · l1 regularization
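
The mechanism behind TwELL's speedups can be illustrated in a few lines. This is a toy NumPy sketch of the idea, not the TwELL format or its CUDA kernels: ReLU zeroes out negative pre-activations exactly, and an L1 penalty on activations (added to the training loss) pushes the sparsity far higher, letting sparse GEMM kernels skip zero tiles.

```python
import numpy as np

# Toy illustration of ReLU-induced activation sparsity in a feedforward layer.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 256))            # token batch
w = rng.normal(size=(256, 1024)) * 0.01  # feedforward weight matrix

pre = x @ w
acts = np.maximum(pre, 0.0)              # ReLU: negatives become exact zeros

sparsity = 1.0 - np.count_nonzero(acts) / acts.size
l1_penalty = np.abs(acts).sum()          # regularization term added to the loss

print(f"activation sparsity: {sparsity:.2f}")  # ~0.50 here, from ReLU alone
```

ReLU alone gives roughly 50% zeros on symmetric inputs; per the article, it is the L1 term during training that drives sparsity to the 99.5% level where tile-skipping pays off.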

A Coding Implementation to Build Agent-Native Memory Infrastructure with Memori for Persistent Multi-User and Multi-Session LLM Applications

MarkTechPost · Sana Hassan · 2026-05-11

The article presents Memori, an agent-native memory infrastructure for persistent multi-user LLM applications, demonstrating its implementation via a Google Colab tutorial. The method involves setting up Memori with synchronous/asynchronous OpenAI clients (GPT-4o-mini), testing memory persistence across user identities (entity_id), agent roles (process_id), and sessions (uuid-based grouping). Results confirm isolated user context (Alice/Bob), persona-specific recall (fitness-coach/meal-planner), session-aware project tracking, and compatibility with streaming/async calls in a customer-support workflow. Memory operations show 6-second write delays and require API key configuration for production use.

agent-native memory · multi-tenant isolation · session management · in-context persistence · async llm calls
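
The isolation scheme the tutorial verifies can be sketched as a data structure. This is an illustrative in-memory store, not Memori's actual API: records are partitioned by user (entity_id) and agent persona (process_id), with sessions grouped by UUID.

```python
from collections import defaultdict
from uuid import uuid4

class MemoryStore:
    """Minimal sketch of per-user, per-persona memory partitioning."""

    def __init__(self):
        self._store = defaultdict(list)  # (entity_id, process_id) -> records

    def write(self, entity_id, process_id, text, session_id=None):
        sid = session_id or str(uuid4())  # uuid-based session grouping
        self._store[(entity_id, process_id)].append({"session": sid, "text": text})
        return sid

    def recall(self, entity_id, process_id):
        return [r["text"] for r in self._store[(entity_id, process_id)]]

mem = MemoryStore()
mem.write("alice", "fitness-coach", "Alice prefers morning workouts")
mem.write("bob", "meal-planner", "Bob is vegetarian")

# Alice's fitness-coach context never leaks into Bob's, or into other personas.
print(mem.recall("alice", "fitness-coach"))  # ['Alice prefers morning workouts']
print(mem.recall("bob", "fitness-coach"))    # []
```

Keying on the (entity_id, process_id) pair is what makes the persona-specific recall in the tutorial (fitness-coach vs. meal-planner) fall out of the data model rather than prompt filtering.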

Best Vector Databases in 2026: Pricing, Scale Limits, and Architecture Tradeoffs Across Nine Leading Systems

MarkTechPost · Michal Sutter · 2026-05-10

The article evaluates nine vector databases in 2026, focusing on their architecture, scalability, pricing, and suitability for Retrieval-Augmented Generation (RAG) pipelines and semantic search systems. It highlights Pinecone for zero-ops management, Milvus/Zilliz for billion-scale deployments, Qdrant for price-performance, Weaviate for hybrid search, and pgvector for PostgreSQL-native teams. Key findings include Pinecone's Builder tier ($20/month), Zilliz Cloud's Cardinal engine (10x throughput over HNSW), and Qdrant's composable vector search. The analysis underscores the critical role of vector databases in grounding LLM outputs and enterprise AI workflows.

vector databases · retrieval-augmented generation · hierarchical navigable small world · gpu acceleration · hybrid search

OpenClaw vs Hermes Agent: Why Nous Research’s Self-Improving Agent Now Leads OpenRouter’s Global Rankings

MarkTechPost · Michal Sutter · 2026-05-10

Hermes Agent, developed by Nous Research, has surpassed OpenClaw to become the top-ranked open-source AI agent on OpenRouter, generating 224 billion daily tokens compared to OpenClaw’s 186 billion. Hermes employs a self-improving execution loop with a three-layer memory system (persistent identity, SQLite FTS5 database, procedural skill files) and autonomously generates reusable skills. Its rapid release cadence includes v0.13.0 'Tenacity', which introduced Kanban task boards, hallucination recovery, and 8 P0 security fixes. OpenClaw, optimized for multi-channel reach, faces security challenges with multiple CVEs, including CVE-2026-25253 (CVSS 8.8). The bifurcation reflects differing philosophies: breadth of reach versus depth of learning.

self-improving execution loop · sqlite fts5 · kanban task boards · cvss score · multi-channel routing

How to Build a Cost-Aware LLM Routing System with NadirClaw Using Local Prompt Classification and Gemini Model Switching

MarkTechPost · Sana Hassan · 2026-05-10

NadirClaw introduces a cost-aware LLM routing system that classifies prompts into simple and complex tiers using local prompt embeddings and cosine similarity against precomputed centroid vectors. The system employs the SentenceTransformer encoder (all-MiniLM-L6-v2) for embedding and routes queries to Gemini Flash or Pro models based on complexity thresholds. Experiments demonstrate cost savings of up to 75% compared to an always-Pro baseline, with routing accuracy validated through scatter plots of prompt embeddings against decision boundaries.

llm routing · cosine similarity · centroid vectors · sentence transformer · cost optimization
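
The routing decision described above reduces to a nearest-centroid comparison. The sketch below uses toy 2-D vectors in place of the all-MiniLM-L6-v2 embeddings; the model names and threshold logic are simplified for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(embedding, simple_centroid, complex_centroid):
    """Send the prompt to the cheap or expensive model by centroid similarity."""
    if cosine(embedding, simple_centroid) >= cosine(embedding, complex_centroid):
        return "gemini-flash"   # cheap tier for simple prompts
    return "gemini-pro"        # expensive tier for complex prompts

# Precomputed centroids of "simple" and "complex" prompt embeddings (toy values).
simple_c = np.array([1.0, 0.1])
complex_c = np.array([0.1, 1.0])

print(route(np.array([0.9, 0.2]), simple_c, complex_c))  # gemini-flash
print(route(np.array([0.2, 0.9]), simple_c, complex_c))  # gemini-pro
```

Because the classifier is a local embedding plus two dot products, the routing step adds negligible latency and cost relative to either model call.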

NVIDIA AI Just Released cuda-oxide: An Experimental Rust-to-CUDA Compiler Backend that Compiles SIMT GPU Kernels Directly to PTX

MarkTechPost · Michal Sutter · 2026-05-10

NVIDIA AI introduces cuda-oxide, an experimental Rust-to-CUDA compiler backend enabling direct compilation of SIMT GPU kernels to PTX without intermediary languages or C/C++ bindings. The tool leverages a custom rustc codegen backend, utilizing Stable MIR and Pliron IR frameworks to transform Rust source code into PTX via LLVM IR. cuda-oxide supports Rust constructs such as generics, closures, and GPU intrinsics, while ensuring thread safety through hardware-derived ThreadIndex. Initial testing on Ubuntu 24.04 demonstrates successful compilation and execution of vector addition kernels, validated by CUDA driver integration.

cuda-oxide · simt · ptx · pliron · threadindex

A Coding Implementation to Recover Hidden Malware IOCs with FLARE-FLOSS Beyond Classic Strings Analysis

MarkTechPost · Sana Hassan · 2026-05-10

The FLARE-FLOSS tool demonstrates superior malware indicator-of-compromise (IOC) recovery compared to traditional string extraction methods. Through static analysis and emulation, it successfully decodes obfuscated strings in Windows PE files, including XOR-encoded, stack-built, and tight strings. In a synthetic malware sample, FLOSS recovered 100% of planted IOCs (URLs, registry paths, APIs) that classic 'strings' utility missed, while providing structured JSON output with decoding routine metadata. The implementation includes MinGW-w64 cross-compilation, pattern-matching for IOCs, and visualization of string recovery statistics.

flare-floss · ioc recovery · string obfuscation · windows pe analysis · malware triage
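
One class of obfuscation FLOSS defeats can be shown in a few lines: a single-byte XOR-encoded IOC never appears as printable text in the binary, so a plain `strings` pass misses it. FLOSS recovers such strings via emulation of decoding routines; the brute-force approach below is a common manual-triage shortcut, not FLOSS itself, and the sample URL and key are invented.

```python
def xor(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key (its own inverse)."""
    return bytes(b ^ key for b in data)

hidden = xor(b"http://evil.example/c2", 0x5A)   # as the IOC sits in the binary
assert b"http" not in hidden                     # classic `strings` misses it

def brute_force_xor(blob: bytes, marker: bytes = b"http"):
    """Try every single-byte key, keep decodings containing the marker."""
    return [(k, xor(blob, k)) for k in range(1, 256) if marker in xor(blob, k)]

key, ioc = brute_force_xor(hidden)[0]
print(hex(key), ioc.decode())  # 0x5a http://evil.example/c2
```

Stack-built and "tight" strings need the emulation FLOSS performs, since the plaintext is assembled at runtime rather than stored encoded; single-byte XOR is merely the simplest case.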

NVIDIA AI Releases Star Elastic: One Checkpoint that Contains 30B, 23B, and 12B Reasoning Models with Zero-Shot Slicing

MarkTechPost · Asif Razzaq · 2026-05-09

NVIDIA AI introduces Star Elastic, a post-training method enabling zero-shot extraction of multiple nested submodels (30B, 23B, 12B) from a single parent checkpoint, reducing compute and storage costs. The method employs nested weight-sharing, importance estimation, and a trainable router with Gumbel-Softmax to select active components based on target budgets. Evaluated on benchmarks like AIME-2025 and MMLU-Pro, Star Elastic achieves competitive performance while reducing training tokens by 360× compared to independent pretraining. Quantization-aware distillation preserves nested structures, enabling efficient deployment on GPUs like RTX 5080 with NVFP4 precision.

nested weight-sharing · importance estimation · gumbel-softmax · quantization-aware distillation · zero-shot slicing
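
The Gumbel-Softmax trick the router relies on can be sketched directly: it turns a discrete "which components are active" choice into a differentiable soft one-hot, so the router can be trained with gradients. The logits and budget labels below are invented for illustration.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable relaxation of sampling from a categorical distribution."""
    rng = rng or np.random.default_rng(0)
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0,1) noise
    y = (logits + g) / tau          # lower tau -> closer to a hard one-hot
    e = np.exp(y - y.max())
    return e / e.sum()              # soft one-hot, differentiable w.r.t. logits

# Hypothetical router logits over the three nested budgets (30B, 23B, 12B).
logits = np.array([2.0, 0.5, -1.0])
probs = gumbel_softmax(logits, tau=0.5)
budgets = ["30B", "23B", "12B"]
print(budgets[int(np.argmax(probs))])  # usually "30B" with these logits
```

At inference time the relaxation is no longer needed: the target budget fixes the slice, which is what makes the submodel extraction zero-shot.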

9 Best AI Tools for Spec-Driven Development in 2026: Kiro, BMAD, GSD, and More Compared

MarkTechPost · Asif Razzaq · 2026-05-09

The article evaluates nine AI tools for spec-driven development (SDD) in 2026, emphasizing structured workflows over iterative prompting. Key tools include AWS Kiro (agentic IDE with EARS notation and model routing), GitHub Spec Kit (open-source CLI with constitutional markdown rules), BMAD-METHOD (multi-agent SDLC orchestration), and GSD (lean meta-prompting framework). Technical differentiators include context engines, cross-platform agent teams, and spec registries. Reported metrics include 70.6% SWE-bench accuracy (Augment Code) and 200K token context windows (GSD).

spec-driven development · agentic ide · ears notation · context engine · multi-agent orchestration

Meet GitHub Spec-Kit: An Open Source Toolkit for Spec-Driven Development with AI Coding Agents

MarkTechPost · Asif Razzaq · 2026-05-09

GitHub Spec-Kit introduces an open-source toolkit for Spec-Driven Development (SDD), a methodology that prioritizes structured specifications over ad-hoc coding with AI agents. The toolkit includes a Python-based CLI (Specify CLI) and templates to generate, validate, and implement code from specifications, reducing ambiguity in AI-generated outputs. Key features include slash commands for specification drafting (/speckit.specify), technical planning (/speckit.plan), and task breakdown (/speckit.tasks), with optional quality checks (/speckit.analyze). The project has gained rapid adoption, with 90k+ GitHub stars and 8k+ forks, demonstrating its utility in mission-critical development workflows.

spec-driven development · ai coding agents · github copilot · structured specifications · task breakdown

AI automates HR compliance, except for the area tech companies need

AI News · AY & J Solicitors · 2026-05-11

The article identifies a critical gap in AI-driven HR compliance automation: sponsor licence management for international AI talent in UK tech companies. Despite advanced automation in areas like payroll monitoring and predictive analytics, sponsor compliance remains manual due to the Home Office Sponsor Management System's lack of API integration and reliance on unstructured data. This manual process exposes tech companies to systemic operational risks, with 30-40% of their workforce on Skilled Worker visas. The article proposes a systems-thinking approach, integrating compliance checks into existing workflows and establishing verification loops, to mitigate these risks and ensure regulatory adherence.

sponsor licence management · api integration · predictive analytics · skilled worker visas · systems-thinking approach

Bain sees US$100 billion SaaS market in agentic AI automation

AI News · Muhammad Zulhusni · 2026-05-11

Bain & Company estimates a US$100 billion SaaS market potential for agentic AI automation in enterprise coordination workflows, with over 90% remaining untapped. Their analysis identifies six automation factors: output verifiability, consequence of failure, digitised knowledge availability, process variability, integration complexity, and cross-workflow decision context. The report quantifies automation potential across enterprise functions, ranging from 40-60% in customer support and R&D to 20-30% in legal workflows. Current vendor capture is US$4-6 billion in the US, with similar potential in Canada, Europe, Australia, and New Zealand. The study recommends SaaS companies focus on automatable subprocesses, improve data quality, and adopt outcome-based pricing models.

agentic ai · saas · workflow automation · cross-workflow decision context · digitised knowledge availability

RingCentral adds Shopify, Calendly, and WhatsApp to AI Receptionist

AI News · AI News · 2026-05-08

RingCentral has enhanced its AI Receptionist (AIR) by integrating Shopify, Calendly, and WhatsApp, enabling it to handle order inquiries, schedule appointments, and respond to messages. AIR now supports shared SMS inboxes, call queues, and automatic language detection across 10 languages. Deployed in over 11,800 businesses, AIR reduces waiting times and improves customer satisfaction, as evidenced by Keller Interiors' reduction from 12 minutes to 90 seconds and a three-point increase in customer satisfaction scores. Maple Federal Credit Union reported a 90% reduction in hold times. AIR is available standalone at $49/month or $39/month for RingEX customers, targeting SMEs in sectors like healthcare and hospitality.

ai receptionist · automatic language detection · call queues · customer satisfaction · shared sms inboxes

AI helping ease the UK’s NHS burden

AI News · David Thomas · 2026-05-07

AI-enabled virtual care is being deployed in the UK’s NHS to alleviate strain on healthcare systems by addressing waiting lists, hospital capacity, and corridor care. Machine learning models analyze NHS datasets and clinical-grade wearable data (e.g., oxygen saturation, blood pressure, ECG) to identify at-risk patients and enable early interventions. Doccla’s virtual care platform has demonstrated a 61% reduction in bed days, an 89% decrease in GP appointments, and a 39% drop in non-elective admissions, saving £450 daily per hospital bed. Large language models (LLMs) are also utilized to streamline clinical notes and improve patient communication, enhancing clinician efficiency without replacing human roles.

virtual care · machine learning · clinical-grade wearables · large language models · non-elective admissions

HP and the art of AI and data for the enterprise

AI News · AI News · 2026-05-06

HP addresses enterprise AI challenges through a hardware-centric approach, emphasizing local compute for governance, latency, and cost efficiency. The Z series workstations, including the ZGX Nano and Z8 Fury, support autonomous AI lifecycles by enabling local execution of large models (up to 405B parameters) and Retrieval-Augmented Generation (RAG) pipelines without cloud dependency. HP advocates a three-tier compute model: cloud for burst training, on-premises for high-volume inference, and edge for latency-sensitive tasks. This strategy reduces cloud costs by up to 18x per million tokens over five years while ensuring data sovereignty and compliance. Enterprises are transitioning IT roles from task execution to agent governance, leveraging local infrastructure for observability and control.

retrieval-augmented generation · local compute · autonomous ai lifecycle · data sovereignty · mlops pipelines

US government increases AI suppliers and rethinks Anthropic’s role

AI News · Joe Green · 2026-05-06

The US Department of Defense expanded its AI supplier roster by signing agreements with Microsoft, Reflection AI, Amazon, and Nvidia for classified operations at Impact Levels 6 and 7, aiming to enhance warfighter decision-making and situational understanding. This move reduces reliance on individual vendors, addressing concerns raised by Anthropic's refusal to allow its AI for civilian surveillance or autonomous weapons, which led to a canceled $200M contract and legal dispute. Despite this, Anthropic's Mythos model remains in use by the NSA for cyber warfare, and the administration seeks to reintegrate Anthropic. The Pentagon emphasizes preventing vendor lock-in and building an AI-first fighting force.

impact levels · vendor lock-in · situational understanding · warfighter decision-making · cyber warfare

Google tests Remy AI agent for Gemini as focus turns to user control

AI News · Muhammad Zulhusni · 2026-05-06

Google is internally testing Remy, an advanced AI agent designed to autonomously handle complex tasks and learn user preferences within the Gemini ecosystem. The agent integrates with Google Workspace and third-party services like GitHub and Spotify, enabling actions such as calendar management and smart-home control. Remy emphasizes user control through Gemini's Privacy Hub, allowing data management and activity review. Testing focuses on transparency, auditability, and least-privilege principles, though technical details on model architecture and autonomy levels remain undisclosed. The project aligns with Google's broader goal of expanding Gemini beyond chat-based interactions, positioning Remy as a potential successor to agentic features like Agent Mode.

ai agent · gemini ecosystem · least-privilege principle · auditability · preference-learning

Physical AI raises governance questions for autonomous systems

AI News · Muhammad Zulhusni · 2026-05-04

The governance of Physical AI systems poses unique challenges as autonomous AI integrates into robotics, edge computing, and industrial equipment. Key issues include safety limits, escalation paths, and system design, particularly when AI models control physical actions. Google DeepMind's Gemini Robotics and Gemini Robotics-ER models exemplify these challenges, requiring visual perception, spatial reasoning, and task planning. The ASIMOV dataset evaluates semantic safety in robotics. Governance frameworks like NIST AI Risk Management Framework must address model behavior, connected machines, and operating environments. Industrial robotics installations are projected to exceed 700,000 units by 2028, highlighting the urgency of these governance concerns.

physical ai · robotics · governance · spatial reasoning · safety limits

Google made agentic AI governance a product. Enterprises still have to catch up.

AI News · Dashveenjit Kaur · 2026-05-04

Google introduced agentic AI governance as a core product feature through its Gemini Enterprise Agent Platform at Google Cloud Next ’26. The platform integrates cryptographic agent identities for traceability and Agent Gateway for oversight, addressing governance gaps in enterprise AI deployments. Despite 97% of organizations exploring agentic AI strategies, only 12% use centralized governance platforms, with 86-89% of pilots failing to reach production scale due to governance breakdowns. Google’s architecture emphasizes context, identity, and security, requiring deeper integration with its stack. However, enterprises face challenges distinguishing genuine agentic AI from automation, complicating governance frameworks.

agentic ai · governance · cryptographic identity · enterprise platform · audit trail

SAP: How enterprise AI governance secures profit margins

AI News · Ryan Daws · 2026-05-01

Enterprise AI governance ensures deterministic control over probabilistic models, securing profit margins by addressing operational risks and accuracy gaps. SAP emphasizes precision, scalability, and governance as critical evaluation criteria for deploying agentic AI systems in production environments. Key challenges include managing agent lifecycle, enforcing policy boundaries, and integrating modern vector databases with legacy architectures. Deterministic outputs require high-frequency database querying, increasing computational costs and latency. Relational foundation models optimized for structured business data outperform generic large language models in forecasting and anomaly detection. Governance frameworks must resolve accountability, audit trails, and human escalation thresholds to mitigate risks.

agentic ai · deterministic control · vector databases · relational foundation models · governance frameworks

Per-token AI charges come to GitHub Copilot

AI News · Joe Green · 2026-05-01

GitHub Copilot transitions from a flat-rate subscription model to per-token pricing effective June 1, 2026, aligning with industry trends observed in OpenAI and Anthropic. Under the new scheme, users receive AI Credits equivalent to their subscription value, with each credit valued at one US cent. Token consumption depends on model complexity, input/output size, KV-cache usage, and feature type, incentivizing efficient query formulation. Code completions and Next Edit suggestions remain free. This shift may impact developer behavior, particularly in exploratory and complex coding tasks, while reflecting broader industry moves toward token-based billing for LLM services.

per-token pricing · ai credits · kv-cache · llm · code completions

What LG and NVIDIA’s talks reveal about the future of physical AI

AI News · Ryan Daws · 2026-04-30

LG and NVIDIA are exploring collaborations to address hardware and computational challenges in physical AI systems. Key focus areas include thermal management for high-density AI data centers, edge inference pipelines for autonomous consumer hardware, and automotive integration. LG’s HVAC solutions aim to mitigate thermal throttling in NVIDIA’s compute clusters, while NVIDIA’s Omniverse and Isaac platforms provide digital twin infrastructure and real-time inference capabilities for LG’s robotic systems. The partnership leverages LG’s ThinQ ecosystem for mass-market data ingestion, enabling training in variable domestic environments. Automotive integration aligns LG’s infotainment systems with NVIDIA’s DRIVE platform, streamlining autonomous vehicle architectures.

thermal management · edge inference · digital twin · omniverse · physical ai


Generated automatically at 2026-05-11 16:40 UTC. Summaries and keywords are produced by an LLM and may contain inaccuracies — always consult the original article.