Daily Digest — 2026-05-27

Tuesday, May 26, 2026 · 209 items · model: deepseek/deepseek-chat

209 items: 200 arXiv papers, 9 industry media

🏛️ Research Labs

No new items today.

📜 arXiv Papers (200)

From Model Scaling to System Scaling: Scaling the Harness in Agentic AI

arXiv cs.LG · Shangding Gu · 2026-05-25

The paper identifies system scaling as the next bottleneck in agentic AI, proposing the concept of 'harness scaling' to design auditable, persistent, and modular architectures around foundation models. It highlights three core bottlenecks: context governance, trustworthy memory, and dynamic skill routing, supported by orchestration and governance mechanisms. The authors introduce CheetahClaws, a Python-native reference harness, and compare it with Claude Code and OpenClaw. They argue that future progress in agentic AI will depend equally on system design and foundation model improvements.

agentic ai · system scaling · foundation models · cheetahclaws · context governance

Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation

arXiv cs.LG · Shuhong Zheng, Aashish Kumar Misraa, Yu-Teng Li, Yu-Jhe Li · 2026-05-25

The paper introduces a method for subject-driven image generation that enhances identity preservation while following textual instructions by conditioning diffusion models on Multimodal Large Language Models (MLLMs). The approach employs a Dual Layer Aggregation (DLA) module to fuse multi-level MLLM features and a multi-stage denoising strategy to balance semantic information from MLLM and fine-detail identity from VAE. Experiments show improved performance in harmonizing multimodal understanding with identity preservation, reducing copy-paste artifacts, and achieving higher human preference scores.

subject-driven generation · multimodal large language models · diffusion models · dual layer aggregation · identity preservation

Prism: A Plug-in Reproducible Infrastructure for Scalable Multimodal Continual Instruction Tuning

arXiv cs.LG · Jun-Tao Tang, Yu-Cheng Shi, Zhen-Hao Xie, Da-Wei Zhou · 2026-05-25

The paper introduces Prism, a plug-in reproducible infrastructure for scalable Multimodal Continual Instruction Tuning (MCIT). It addresses engineering bottlenecks in MCIT research by decoupling algorithmic development from backbone MLLM implementation via a lightweight plugin registration mechanism, enabling method integration without codebase modifications. Prism supports large-scale training pipelines, facilitating reproducible and scalable experimentation. The code is publicly available.

multimodal continual instruction tuning · plugin registration mechanism · large-scale training pipeline · reproducible infrastructure · mllm codebase

Looped Diffusion Language Models

arXiv cs.LG · Sanghyun Lee, Chunsan Hong, Seungryong Kim, Jonghyun Lee · 2026-05-25

The paper introduces LoopMDM, a looped masked diffusion model (MDM) that selectively reuses early-middle transformer layers to improve training efficiency and performance. This approach yields depth-scaling benefits without parameter overhead and enables flexible compute scaling at inference by varying loop counts. LoopMDM matches same-size MDMs with 3.3× fewer training FLOPs, outperforms them by up to 8.5 points on GSM8K, and surpasses deeper non-looped MDMs. Adaptive loop adjustment during sampling further enhances compute efficiency. Attention analysis suggests looping improves masked position interactions.
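
As a rough illustration of the looping idea, the toy sketch below reuses one block of layers several times between a fixed prefix and suffix, so effective depth grows with the loop count while the number of distinct layers stays constant. The scalar "layers", the function names, and the split into early, looped, and late blocks are hypothetical choices for illustration; this is not the LoopMDM architecture.

```python
from typing import Callable, Sequence

Layer = Callable[[float], float]  # toy stand-in for a transformer layer


def looped_forward(
    x: float,
    early: Sequence[Layer],
    looped: Sequence[Layer],
    late: Sequence[Layer],
    n_loops: int,
) -> float:
    """Apply early layers once, the looped block n_loops times, then late layers."""
    for layer in early:
        x = layer(x)
    for _ in range(n_loops):
        for layer in looped:  # same layer objects (same weights) on every pass
            x = layer(x)
    for layer in late:
        x = layer(x)
    return x


def effective_depth(n_early: int, n_looped: int, n_late: int, n_loops: int) -> int:
    """Number of layer applications per forward pass."""
    return n_early + n_loops * n_looped + n_late


# Hypothetical toy layers: one early, one looped, one late.
early = [lambda v: v + 1.0]
looped = [lambda v: 0.5 * v]
late = [lambda v: v - 1.0]

out_one_loop = looped_forward(1.0, early, looped, late, n_loops=1)
out_two_loops = looped_forward(1.0, early, looped, late, n_loops=2)
```

Raising `n_loops` at inference changes compute per forward pass without adding layers, which is the kind of knob the summary refers to.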

masked diffusion models · transformer architectures · compute scaling · depth-scaling · attention analysis

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

arXiv cs.LG · Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara · 2026-05-25

The paper demonstrates that language models can mitigate catastrophic forgetting by leveraging self-generated samples as replay data, nearly eliminating performance degradation on prior tasks. This approach contrasts with traditional methods requiring stored exemplars. Findings indicate that forgetting persists when models operate near capacity saturation, necessitating low learning rates for retention at the cost of training efficiency. Self-generated replay breaks this tradeoff, enabling rapid finetuning with high learning rates while preserving prior knowledge.

catastrophic forgetting · language models · self-generated replay · capacity saturation · finetuning

Goal-driven Bayesian Optimal Experimental Design for Robust Decision-Making Under Model Uncertainty

arXiv cs.LG · Jinwoo Go, Xiaoning Qian, Byung-Jun Yoon · 2026-05-25

The paper introduces GoBOED, a goal-driven Bayesian optimal experimental design framework that optimizes experiments for specific decision-making objectives rather than general information gain. By combining an amortized variational posterior surrogate with a differentiable convex decision layer, GoBOED enables gradient-based design optimization focused on decision quality. Theoretical analysis shows GoBOED gradients ignore parameter directions irrelevant to decisions, justifying its superior alignment with objectives. Empirical evaluations in source localization, epidemic management, and pharmacokinetic control demonstrate GoBOED's wider near-optimal design windows compared to goal-agnostic BOED approaches.

bayesian optimal design · decision-focused · variational posterior · differentiable convex layer · parameter uncertainty

OrpQuant: Geometric Orthogonal Residual Projection for Multiplier-Free Power-of-Two Transformer Quantization

arXiv cs.LG · Maoyang Xiang, Bo Wang, Tao Luo · 2026-05-25

The paper introduces Orthogonal Residual Projection (ORP), a geometric algorithm-hardware co-design framework for Power-of-Two (PoT) quantization in Transformers. ORP addresses the Low Angular Resolution Regime limitation in sub-4-bit quantization by formulating it as a dual-basis geometric projection, synthesizing higher-resolution residual lattices using shift-and-add operations. The method reduces calibration time for LLaMA-2-7B to 15 minutes and achieves a perplexity of 6.10 under 3-bit constraints, outperforming MAC-intensive baselines like AWQ. Silicon-level RTL synthesis at 28nm confirms ORP's efficacy in mitigating timing bottlenecks.
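
For readers unfamiliar with power-of-two quantization, the sketch below shows the generic idea only: each weight is rounded to a signed power of two (so multiplication becomes a bit shift), and a second power-of-two term fitted to the residual refines the approximation using one extra shift and one add. The function names and exponent range are assumptions, and this is not ORP's dual-basis projection.

```python
import math


def pot_quantize(w: float, min_exp: int = -8, max_exp: int = 0) -> float:
    """Round w to the nearest signed power of two in log2 space (0 stays 0)."""
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))
    exp = max(min_exp, min(max_exp, exp))  # clamp to the representable range
    return math.copysign(2.0 ** exp, w)


def pot_quantize_with_residual(w: float) -> float:
    """Two-term approximation: PoT of w plus PoT of the leftover residual."""
    first = pot_quantize(w)
    residual = w - first
    return first + pot_quantize(residual)  # one extra shift and one add


w = 0.3
single = pot_quantize(w)                # 2**-2
double = pot_quantize_with_residual(w)  # 2**-2 + 2**-4
```

For this weight the single-term error is 0.05 and the two-term error is 0.0125, which illustrates why residual terms help at very low bit-widths.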

orthogonal residual projection · power-of-two quantization · low angular resolution regime · shift-and-add operations · rtl synthesis

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

arXiv cs.LG · Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma · 2026-05-25

The paper introduces DiscoverPhysics, a novel benchmark evaluating LLMs' ability to discover physical laws in simulated worlds with non-standard physics. The benchmark comprises 22 procedurally generated worlds featuring varied gravitational models and hidden interactions, requiring agents to design experiments, analyze trajectory data, and formulate explanatory theories. Evaluations across eleven frontier models reveal that even top performers solve only 50% of worlds, with particular difficulty in latent structure discovery. Open-source models significantly underperform commercial counterparts in experimental design and hypothesis refinement. Results indicate a dissociation between predictive accuracy and conceptual understanding, emphasizing the need for iterative hypothesis testing.

physics discovery · interactive benchmark · n-body simulation · hypothesis refinement · latent structure

Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning

arXiv cs.LG · Zhaoyu Zhu, Rui Gao, Shuang Li · 2026-05-25

The paper establishes global convergence guarantees for Wasserstein policy gradient (WPG) in entropy-regularized reinforcement learning (RL), addressing a gap in theoretical understanding. By leveraging the Bellman structure of soft Q-functions, the analysis substitutes convexity with a Bellman-based argument: a KL representation of soft Bellman residuals, contraction properties linking residuals to optimality gaps, and a resolvent identity connecting value improvement to Fisher information. Combined with a uniform log-Sobolev inequality for Gibbs policies, this yields a distributional Polyak–Łojasiewicz condition, enabling geometric convergence up to discretization error. The results demonstrate that entropy-regularized RL exhibits favorable PL-type geometry despite non-convexity.

wasserstein policy gradient · entropy-regularized rl · bellman residual · polyak–łojasiewicz condition · log-sobolev inequality

Active Query Synthesis for Preference Learning

arXiv cs.LG · Namrata Nadagouda, Nauman Ahad, Maegan Tucker, Mark A. Davenport · 2026-05-25

The authors introduce Info-Synth, an active query synthesis framework for preference learning that addresses feedback reliability and computational efficiency. The method employs a confidence-aware response model to handle ambiguous pairwise comparisons and maximizes a mutual information-based objective in continuous space for query generation. Two strategies, Pair M-dist and Pair Opt-dist, extend Info-Synth to finite query pools. Evaluations on synthetic preference learning, constrained text summarization, and robot controller tuning demonstrate the framework's versatility and performance.

active learning · preference learning · mutual information · query synthesis · confidence-aware model

Rethinking Weak Supervision in Anomaly Detection: A Comprehensive Benchmark

arXiv cs.LG · Xu Yao, Siyuan Zhou, Wu Zhenbo, Chaochuan Hou · 2026-05-25

The paper introduces WSADBench, a unified benchmark for evaluating weakly supervised anomaly detection (WSAD) across three supervision scenarios: incomplete, inexact, and inaccurate. It standardizes evaluation protocols for 36 algorithms across 4 modalities, varying label quantity, granularity, and quality. Based on 700K experiments, key findings include: (i) intrinsic correlations between WSAD scenarios, (ii) specialized WSAD methods excel only in extreme label-scarcity, (iii) inconsistent utility of unlabeled data, and (iv) asymmetric sensitivity to label noise. The benchmark reveals that tabular foundation models outperform specialized methods as supervision increases.

weakly supervised anomaly detection · wsadbench · tabular foundation models · label noise · benchmark

Conditional KRR: Injecting Unpenalized Features into Kernel Methods with Applications to Kernel Thresholding

arXiv cs.LG · Rustem Takhanov, Zhenisbek Assylbekov · 2026-05-25

The paper introduces conditional kernel ridge regression (conditional KRR), a method combining linear regression on unpenalized features with standard KRR on residuals, using conditionally positive definite kernels. Theoretical analysis shows the method's test risk reduces to standard KRR with a residual kernel, plus an O(1/√N) term dependent on feature class F. Experiments demonstrate conditional KRR outperforms standard KRR when F captures dominant signal components, particularly for Mercer eigenfunctions or random feature representations of K.

conditional kernel ridge regression · conditionally positive definite kernels · native space norm · mercer decomposition · random features

Paris 2.0: A Decentralized Diffusion Model for Video Generation

arXiv cs.LG · Ali Rouzbayani, Bidhan Roy, Marcos Villagra, Zhiying Jiang · 2026-05-25

Paris 2.0 introduces the first decentralized diffusion model for video generation, extending Paris 1.0's decentralized training framework to temporally coherent outputs. The method adapts decentralized computation, previously shown viable for image generation, to video synthesis without relying on monolithic GPU clusters. In low-resolution text-to-video tasks, Paris 2.0 reduces Fréchet Video Distance (FVD) by about 2.0× (from 561.04 to 279.01) versus a centralized counterpart under matched compute, while improving CLIP text-video alignment and aesthetic scores.

decentralized diffusion model · frechet video distance · text-to-video · clip similarity · temporal coherence

Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning

arXiv cs.LG · Waleed Razzaq, Yun-Bo Zhao · 2026-05-25

The paper introduces Neuronal Stochastic Attention Circuit (NSAC), a biologically inspired continuous-time attention architecture that models attention logits via an Ornstein-Uhlenbeck stochastic differential equation with input-dependent gates from C. elegans Neuronal Circuit Policies. NSAC propagates stochasticity through Gaussian-distributed logits and logistic-normal attention weights, optimized via a two-term objective combining Gaussian negative log-likelihood and epistemic-separation regularization. Evaluated on irregular function approximation, multivariate regression, long-range forecasting, Industry 4.0, and autonomous lane-keeping, NSAC achieves competitive accuracy with well-calibrated uncertainty estimates while maintaining neuronal-level interpretability.

neuronal stochastic attention circuit · ornstein-uhlenbeck process · logistic-normal distribution · epistemic-separation regularizer · continuous-time attention

Accelerating Bayesian inverse design in computational fluid dynamics using neural operators

arXiv cs.LG · Bipin Tiwari, Omer San · 2026-05-25

This work introduces neural operator-accelerated Bayesian inference for uncertainty-aware inverse design in computational fluid dynamics (CFD), achieving a 1000× speedup over traditional methods. The approach embeds a Deep Operator Network surrogate within a No-U-Turn Sampler MCMC loop while preserving posterior structure, validated on quasi-one-dimensional nozzle flow with cubic B-spline parameterization. Results show surrogate-based inference matches CFD reference posteriors in under 1 second, and a direct inverse neural operator enables single-shot deterministic reconstruction.

bayesian inverse design · neural operators · computational fluid dynamics · markov chain monte carlo · deep operator network

When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

arXiv cs.LG · Parth Darshan, Abhishek Divekar · 2026-05-25

The work identifies two failure modes in multi-objective prompt optimization for LLM judges: gradient dilution during optimization and instruction interference during inference. It evaluates five decomposition modes of textual gradient optimizers by varying cross-task information sharing among loss, gradient, and optimizer LLMs. Results show optimization fails in 6/10 configurations, with gradient specificity dropping 59% (9.0 to 3.7) under joint criteria processing, and naive instruction combination reducing Spearman's rho by 5.3%.

multi-objective optimization · textual gradient methods · llm judges · gradient dilution · instruction interference

CITYREP: A Unified Benchmark for Urban Representations Across Cities, Tasks, and Modalities

arXiv cs.LG · Junyuan Liu, Xinglei Wang, Zichao Zeng, Jiazhuang Feng · 2026-05-25

CityRep introduces a unified benchmark for evaluating urban representation learning across cities, tasks, and modalities, addressing limitations of current evaluations. It features a spatial unit-agnostic framework, block-based spatial splits to mitigate spatial leakage, and an extensible multi-city, multi-task suite spanning 8 cities and 8 tasks. The benchmark evaluates 11 urban representation models, revealing that random splits inflate performance and alter rankings, while performance varies significantly across cities and tasks. CityRep provides datasets, evaluation pipelines, and diagnostic tools to support reproducible research and fair comparison in urban representation learning.
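
To make the spatial-leakage point concrete, the sketch below shows one simple way a block-based split can work: points are grouped into grid cells, and whole cells are assigned to train or test, so neighbouring samples from the same block never straddle the split. The square grid, block size, and function names are assumptions for illustration, not CityRep's actual splitting procedure.

```python
import random


def block_id(x: float, y: float, block_size: float) -> tuple[int, int]:
    """Grid cell containing the point (x, y)."""
    return (int(x // block_size), int(y // block_size))


def block_split(points, block_size=1.0, test_fraction=0.25, seed=0):
    """Assign whole spatial blocks, not individual points, to train or test."""
    blocks = {}
    for p in points:
        blocks.setdefault(block_id(p[0], p[1], block_size), []).append(p)
    ids = sorted(blocks)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [p for b in ids if b not in test_ids for p in blocks[b]]
    test = [p for b in test_ids for p in blocks[b]]
    return train, test


# Hypothetical 8 x 8 lattice of locations spaced 0.5 apart (16 blocks of 4 points).
points = [(x * 0.5, y * 0.5) for x in range(8) for y in range(8)]
train, test = block_split(points)
```

A random per-point split on the same lattice would place near-identical neighbours on both sides, which is the inflation effect the benchmark reports.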

urban representation learning · spatial leakage · multi-task benchmark · generalization-aware evaluation · urban foundation models

Length Generalization with Log-Depth Recurrent Units

arXiv cs.LG · Charles Pert, Dalal Alrajeh, Alessandra Russo · 2026-05-25

The paper introduces MLP-LDRU, a Log-Depth Recurrent Unit designed to address length generalization challenges in neural networks by approximating recurrence through parallel reduction with associativity-biased operators. The model is evaluated on 21 regular-language tasks, including standard benchmarks and new prefix languages, achieving 100% out-of-distribution accuracy on 18 tasks and ≥99.9% on the remaining 3 when the maximum training length is increased, surpassing comparable recurrent and attention-based models. MLP-LDRU also demonstrates competitive performance on ListOps and NLP classification benchmarks beyond regular languages.
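
The link between recurrence and parallel reduction can be sketched with a hand-written example: when the per-token update is an associative operation, a sequence of length n can be combined pairwise in about log2(n) levels instead of n sequential steps. Below, each token of a parity automaton is a state-transition table and the operator is function composition. The names and the parity task are illustrative assumptions; the learned, associativity-biased operators of MLP-LDRU are not reproduced here.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def tree_reduce(items: Sequence[T], op: Callable[[T, T], T]) -> tuple[T, int]:
    """Combine a non-empty sequence pairwise, level by level.

    Returns (result, depth), where depth is the number of levels used.
    """
    level = list(items)
    depth = 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd element carries over, order preserved
        level = nxt
        depth += 1
    return level[0], depth


def compose(f: tuple[int, ...], g: tuple[int, ...]) -> tuple[int, ...]:
    """Transition table for 'apply f, then g' (associative)."""
    return tuple(g[s] for s in f)


# Parity automaton: token 0 keeps the state, token 1 flips it.
TRANSITIONS = {0: (0, 1), 1: (1, 0)}

bits = [1, 0, 1, 1, 0, 1, 0, 0]
total, depth = tree_reduce([TRANSITIONS[b] for b in bits], compose)
final_state = total[0]  # start from state 0
```

Because every level's pairs are independent, each level could run in parallel; the sequential loop here only counts the levels.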

length generalization · log-depth recurrent unit · regular-language tasks · parallel reduction · associativity-biased operators

Everything at Every Scale: Scale-Invariant Diffusion with Continuous Super-Resolution

arXiv cs.LG · Zixin Jessie Chen, Zhuo Chen, Archer Wang, Jeff Gore · 2026-05-25

SKILD introduces a scale-invariant diffusion model unifying image generation and continuous super-resolution within a single unconditional framework. Leveraging scale invariance, it designs a forward process attenuating image content from fine to coarse scales while injecting spectrum-matched Gaussian noise, making scale an explicit coordinate of diffusion dynamics. The reverse process performs both tasks by varying only the starting timestep, requiring no task-specific architecture or retraining per scale factor. SKILD achieves FID 2.65 and Inception Score 9.63 on CIFAR-10, outperforms conditional models in ImageNet super-resolution, and accurately reconstructs critical Ising models.

scale invariance · diffusion model · super-resolution · spectrum-matched noise · ising models

A Multimodal 3D Foundation Model for Light Sheet Fluorescence Microscopy Enables Few-Shot Segmentation, Classification, and Deblurring

arXiv cs.LG · Adina Scheinfeld, Haotan Zhang, Shang Mu, Rudolf L. M. van Herten · 2026-05-25

The paper introduces a multimodal 3D foundation model for light sheet fluorescence microscopy (LSM) that enables few-shot segmentation, classification, and deblurring. The model is pretrained on a large curated dataset of 3D images across organisms, stains, and imaging protocols, learning transferable volumetric representations via joint optimization of masked reconstruction and image-text alignment. Evaluations demonstrate consistent improvements over baselines in downstream tasks, both in standard metrics and expert assessments, while drastically reducing annotation requirements. Pretrained weights and code are publicly available.

light sheet fluorescence microscopy · volumetric representation · masked reconstruction · few-shot learning · image-text alignment

Retrieval-Augmented Detection of Potentially Abusive Clauses in Chilean Terms of Service

arXiv cs.LG · Christoffer Loeffler, Tomás Rey Pizarro, Daniel Ignacio Miranda Vásquez, Andrea Martínez Freile · 2026-05-25

The study introduces a retrieval-augmented generation framework for detecting and classifying abusive clauses in Chilean Terms of Service, alongside the Chilean Abusive Terms of Service Extended corpus (100 contracts, 10,029 clauses in 24 categories). The method combines dense-sparse retrieval, reranking, and prompt augmentation to support local open-weight language models. Results show retrieval-augmented prompting improves performance, enabling local models to approach cloud-based systems at lower computational cost; the paper quantifies token efficiency, though the figures are not reproduced in this summary.

retrieval-augmented generation · dense-sparse retrieval · open-weight language models · contract annotation · prompt augmentation

AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models

arXiv cs.LG · Branislav Kveton, Anup Rao, Subhojyoti Mukherjee, Krishna Kumar Singh · 2026-05-25

AdvantageFlow introduces a forward-process reinforcement learning algorithm for rectified flow models, optimizing an advantage-weighted forward-process prediction loss instead of the reverse process like Flow-GRPO. The method addresses instability in optimization with negative advantages by employing rollout policy regularization to reduce variance and fit a local reward-improving target distribution. Evaluated on image generation tasks using Stable Diffusion 3.5 Medium, AdvantageFlow outperforms both Flow-GRPO and a state-of-the-art forward-process RL baseline in negative-aware fine-tuning.

advantageflow · rectified flow models · rollout policy regularization · stable diffusion 3.5 medium · flow-grpo

Learning in Low-Dimensional Subspaces: Orthogonal Bottlenecks for Reinforcement Learning

arXiv cs.LG · Aleksandar Todorov, Matthia Sabatelli · 2026-05-25

This work introduces orthogonal bottlenecks, a representation-level prior for deep reinforcement learning that constrains encoder features to low-dimensional subspaces via fixed orthonormal projections. The method requires no auxiliary objectives, pretraining, or RL algorithm modifications, preserving expressivity when the bottleneck dimension exceeds the intrinsic rank of the optimal value function. Empirical results across single and multi-task benchmarks show baseline performance is maintained or improved above task-dependent threshold dimensions, with value representations compressible to extremely low dimensions without loss. Analysis reveals orthogonal bottlenecks stabilize feature norms and increase effective rank, supporting their role as lightweight, architecture-agnostic mechanisms for shaping RL representations.
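
A fixed orthonormal projection of this kind can be sketched in a few lines. The example below builds k orthonormal rows in R^d once (never trained) and projects a feature vector onto them. How the paper constructs its projection is not stated in this summary, so the Gram-Schmidt construction from Gaussian draws, the dimensions, and the function names are all assumptions.

```python
import math
import random


def orthonormal_rows(k: int, d: int, seed: int = 0) -> list[list[float]]:
    """Build k orthonormal vectors in R^d via Gram-Schmidt on Gaussian draws."""
    assert k <= d, "cannot fit more than d orthonormal vectors in R^d"
    rng = random.Random(seed)
    rows: list[list[float]] = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for r in rows:  # remove components along the rows found so far
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:  # skip degenerate draws
            rows.append([a / norm for a in v])
    return rows


def project(features: list[float], rows: list[list[float]]) -> list[float]:
    """Map d-dimensional features to k-dimensional bottleneck coordinates."""
    return [sum(a * b for a, b in zip(features, r)) for r in rows]


P = orthonormal_rows(k=3, d=8)  # fixed: created once, not updated by training
z = project([1.0] * 8, P)       # 8-dimensional features -> 3 coordinates
```

In an RL agent the projection would sit between the encoder and the value or policy head; only the surrounding networks would receive gradient updates.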

orthogonal bottlenecks · representation-level prior · low-dimensional subspaces · effective rank · reinforcement learning

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

arXiv cs.LG · Jose Blanchet, Peter Glynn, Wenhao Yang · 2026-05-25

The paper presents a model-agnostic methodology for constructing asymptotically valid confidence regions from Stochastic Gradient Descent (SGD) trajectories in both finite- and infinite-variance regimes. The approach leverages a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer derived from stochastic gradients, yielding a self-normalized statistic where tail-dependent scaling terms cancel. A subsampling calibration scheme estimates critical values without requiring explicit estimation of tail indices or stable-law parameters. Simulations demonstrate reliable coverage across various settings, establishing the method as a practical tool for uncertainty quantification in stochastic optimization.
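
The Polyak-Ruppert estimator at the centre of this method is simply the running average of the SGD iterates. The sketch below shows it on a one-dimensional quadratic with Gaussian (finite-variance) gradient noise. The objective, step-size schedule, and names are assumptions chosen for illustration; the paper's second-moment normalizer, subsampling calibration, and heavy-tailed setting are not implemented.

```python
import random


def sgd_with_averaging(steps: int = 20000, target: float = 2.0, seed: int = 0):
    """SGD on f(theta) = 0.5 * (theta - target)**2 with noisy gradients.

    Returns (last_iterate, polyak_ruppert_average).
    """
    rng = random.Random(seed)
    theta = 0.0
    avg = 0.0
    for t in range(1, steps + 1):
        grad = (theta - target) + rng.gauss(0.0, 1.0)  # noisy gradient
        theta -= (0.5 / t ** 0.6) * grad               # slowly decaying step size
        avg += (theta - avg) / t                       # running mean of iterates
    return theta, avg


last_iterate, averaged = sgd_with_averaging()
```

The averaged iterate fluctuates far less than the last iterate, which is why inference procedures are usually built around it.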

stochastic gradient descent · polyak-ruppert averaging · self-normalized statistic · subsampling calibration · confidence regions

Causal methods for LLM development and evaluation

arXiv cs.LG · Dennis Frauen, Marie Brockschmidt, Konstantin Hess, Haorui Ma · 2026-05-25

The paper advocates for integrating causal inference methods into large language model (LLM) development and evaluation pipelines, identifying three key contributions. First, it demonstrates how causal methods address confounding, distribution shifts, and biased evaluation in logged data settings. Second, it systematically maps causal opportunities across pretraining, alignment, routing, agentic workflows, and evaluation stages. Third, it identifies new research directions for causal LLM development. The authors argue causal methods provide principled solutions to current empirical fragility in LLM pipelines despite their underutilization.

causal inference · llm development · distribution shifts · logged data · confounding

Deployment-complete benchmarking

arXiv cs.LG · El Mustapha Mansouri, Keigo Arai · 2026-05-25

The paper introduces deployment-complete benchmarking, a method to evaluate whether benchmark evidence sufficiently determines deployment actions. By analyzing evidence fibers and using completion curves, it identifies ambiguities in benchmark-to-deployment transitions. Results show poor transferability (10.07% coverage) of benchmark-channel conformal coverage to unmeasured deployment channels, while response-rank intervals achieved 94.91% coverage. Audits revealed significant incompleteness, with 97.9% mixed fibers in Tox21 and zero median certifiable fraction in Matbench/JARVIS. The method reduced false decisions from 1.19% to 0.027% in Tox21 and 20.3% to 0.128% in JARVIS replays.

deployment-complete benchmarking · evidence fibers · completion curves · conformal coverage · response-rank intervals

Fuzzy PyTorch: Rapid Numerical Variability Evaluation for Deep Learning Models

arXiv cs.LG · Inés Gonzalez-Pepe, Hiba Akhaddar, Tristan Glatard, Yohan Chatelain · 2026-05-25

Fuzzy PyTorch introduces a framework for rapid evaluation of numerical variability in deep learning models, addressing floating-point arithmetic uncertainty. It integrates stochastic arithmetic into PyTorch via Probabilistic Rounding with Instruction Set Management, interfacing with the Verificarlo compiler, and offers stochastic rounding and novel up-down rounding modes. Comparative evaluations demonstrate runtime reductions of 5× to 60× versus Verrou, while maintaining model performance across architectures ranging from 1 to 341 million parameters. The framework provides scalable, efficient, and practical solutions for quantifying floating-point uncertainty without compromising computational efficiency.
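
Stochastic rounding itself is easy to illustrate: a value is rounded up or down at random, with the probability of rounding up equal to its fractional distance from the lower neighbour, so the result is unbiased in expectation. The sketch below rounds to a fixed grid for clarity, whereas real tools round to adjacent floating-point values; the function name and grid are assumptions, and none of this is Fuzzy PyTorch's or Verificarlo's API.

```python
import math
import random


def stochastic_round(x: float, step: float, rng: random.Random) -> float:
    """Round x to a multiple of step, rounding up with probability equal
    to the fractional distance from the lower grid point."""
    scaled = x / step
    lower = math.floor(scaled)
    frac = scaled - lower
    return (lower + (1 if rng.random() < frac else 0)) * step


rng = random.Random(0)
samples = [stochastic_round(0.3, 1.0, rng) for _ in range(10000)]
mean = sum(samples) / len(samples)  # close to 0.3, unlike round-to-nearest (0.0)
```

Repeating a computation under such rounding and looking at the spread of results is the basic mechanism behind stochastic-arithmetic variability estimates.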

stochastic arithmetic · probabilistic rounding · numerical variability · floating-point uncertainty · instruction set management

Creative Quality Alignment: Expert Tacit Knowledge Transfer via Chain-of-Thought Fine-Tuning

arXiv cs.LG · Bo Zou, Chao Xu · 2026-05-25

The paper empirically validates the creative quality metric from Calibrated Surprise (Zou & Xu, 2026a) under strict engineering constraints: minimal data (100 expert chain-of-thought annotations from BC Protocol) and a small base model. It introduces Creative Quality Alignment (CQA), addressing dataset biases toward craft knowledge by emphasizing audience modeling and reality-logic coverage. Theoretically, it demonstrates that in single-conditional-distribution LLMs, calibrating appreciation transfers to generation via architectural duality, explaining why few CoT examples suffice, unlike empirical approaches like LIMA.

creative quality alignment · chain-of-thought · expert tacit knowledge · conditional distribution architecture · data bias

Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio

arXiv cs.LG · Georgios Milis, Yubin Qin, Yihan Wu, Heng Huang · 2026-05-25

The paper introduces a robust, gradient-free watermarking method for synthetic audio, leveraging vocabulary redundancy in discrete token representations. By analyzing token error impacts and employing community detection for vocabulary reduction, the method achieves significant detectability improvements without finetuning tokenizers. Experimental results demonstrate orders-of-magnitude gains in watermark detectability and inherent robustness to audio modifications, establishing a new state-of-the-art for token-level watermarks in multimedia.

watermarking · tokenization · community detection · discretization · robustness

Mapping the Schedule x Bit-Width Boundary in Sub-100M Quantisation-Aware Training

arXiv cs.LG · Christian Brandt Thomassen · 2026-05-25

The study investigates whether optimal learning-rate schedules vary by bit-width in quantisation-aware training (QAT) for sub-100M decoder language models. Through extensive experiments (720-run factorial grid and 625-run follow-up) across FP16/INT8/INT6/INT4 precisions, model sizes (5M-350M), and training configurations, the primary hypothesis—that INT6 QAT requires distinct schedules—is falsified. Results show a consistent optimal warmdown fraction of 33% (wd33) across bit-widths, with INT4 exhibiting a noise-dominated regime below 50M and decisive wd33 preference above 50M. Practical recommendations include reusing FP16 schedules for INT8/INT6 QAT and adopting wd33 for INT4 models ≥50M.
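
A warmdown fraction of 33% means the learning rate is held for the first 67% of training and decayed over the final 33%. The sketch below shows one such schedule; the linear decay to zero, the absence of warmup, and the function name are assumptions, since the summary only states the fraction.

```python
def lr_at(step: int, total_steps: int, peak_lr: float,
          warmdown_fraction: float = 0.33) -> float:
    """Constant learning rate, then linear decay over the final fraction of steps."""
    warmdown_steps = round(total_steps * warmdown_fraction)
    if warmdown_steps == 0:
        return peak_lr  # no warmdown phase
    warmdown_start = total_steps - warmdown_steps
    if step < warmdown_start:
        return peak_lr
    remaining = total_steps - step
    return peak_lr * remaining / warmdown_steps


# Hypothetical run: 1000 steps, peak learning rate 1e-3, wd33.
schedule = [lr_at(s, 1000, 1e-3) for s in range(1001)]
```

Under the study's recommendation, the same `warmdown_fraction` would be reused when switching from FP16 to INT8 or INT6 QAT rather than retuned per bit-width.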

quantisation-aware training · learning-rate schedule · warmdown fraction · sub-100m models · bit-width

QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability

arXiv cs.LG · Bo Zou, Chao Xu · 2026-05-25

The paper introduces QUIET, a diagnostic benchmark for evaluating large language models' (LLMs) creative generation capability through multi-blank cascaded story cloze tasks. QUIET features N blanks (10-20) with explicit content constraints and cascade dependencies, requiring open-ended generation. An automated scoring protocol based on information-theoretic principles operationalizes the 'calibrated surprise' framework, combining constraint satisfaction and surprise metrics. This method avoids subjective human grading, providing an objective measure of creative capability.

multi-blank cascaded story cloze · creative generation capability · calibrated surprise · information-theoretic scoring · constraint satisfaction

Step-TP: A Grounded, Step-Level Dataset with Chain-of-Thought Reasoning for LLM-Guided Tensor Program Optimization

arXiv cs.LG · Mengfan Liu, Da Zheng, Junwei Su, Chuan Wu · 2026-05-25

Step-TP introduces a step-level dataset for tensor program optimization with chain-of-thought reasoning, addressing limitations in existing LLM-guided approaches. The dataset provides grounded, atomic supervision through a token-efficient intermediate representation that deterministically lowers to TVM TIR, enabling reliable multi-step optimization. It decomposes complex trajectories into interpretable single-step decisions with structured CoT supervision and explicit IR-to-IR state transitions. Strategy filtering balances coverage while preventing shortcut exploitation. The dataset and implementation are publicly available on GitHub.

tensor program optimization · chain-of-thought reasoning · intermediate representation · strategy filtering · tvm tir

Small Models, Strong Priors: Architectural Inductive Bias for Parameter-Efficient Neural PDE Solvers

arXiv cs.LG · Shyam Sankaran, Hanwen Wang, Paris Perdikaris · 2026-05-25

The paper proposes WaveLiT, a parameter-efficient neural PDE solver combining architectural inductive biases to compete with larger foundation models. The architecture integrates discrete wavelet transforms for multi-resolution tokenization, augmented linear attention, shared-weight multiscale feature pyramids, and wavelet-domain auxiliary losses. Evaluated on eight TheWell benchmarks, 1-10M-parameter WaveLiT models match or exceed 100-1000× larger models, particularly excelling in wave/acoustic-dominated systems where wavelet priors align with dynamics. A 10M-parameter multi-task variant shows interpretable transfer patterns, performing best on wavelet-aligned dynamics and worst on chaotic advection. Results indicate architectural priors outweigh scale for PDE solving, with failure patterns revealing prior content.

neural pde solvers · wavelet transform · inductive bias · parameter efficiency · multiscale feature pyramid

STaT: Resolving Shape Distortion in Non-Stationary Time Series via Tri-Modal Synergy

arXiv cs.LG · Hui Cheng, Jinsheng Guo, Zhenhao Weng, Yan Qiao · 2026-05-25

STaT introduces a tri-modal architecture for non-stationary time series forecasting, addressing shape distortion in existing multi-modal approaches. The method combines symbolic (discrete tokenization for structural patterns), temporal (sequential dependencies), and textual (domain semantics) modalities via Symbolic-Temporal-Textual Alignment. Evaluations on eight benchmarks show STaT improves magnitude metrics by up to 8.9% and reduces shape distortion by up to 8.5% compared to conventional methods.

non-stationary time series · multi-modal fusion · symbolic tokenization · shape distortion · temporal dependencies

From Latent Space to Training Data: Explainable Specialization in Minimal MLPs

arXiv cs.LG · Enrique Alba, Ezequiel Lopez-Rubio · 2026-05-25

The study identifies a design principle for prototype-recoverability-aware training in minimal one-hidden-layer MLPs, demonstrating that repulsive structural losses require compatible attractors to prevent latent geometry collapse. Using Gaussian-activation MLPs with width equal to dataset size, the authors evaluate three structural losses—coverage, separation, and overlap—against a standard fitting baseline on uniformly sampled one-dimensional datasets. Coverage regularization achieves the lowest mean reconstruction error across 480 runs (N = 3 to N = 100) and enhances prototype-usage specialization, while overlap penalties systematically degrade performance by pushing prototype centers outside the training input convex hull. Separation exhibits mixed effects, with expulsion occurring only at large temperatures.

mlp · prototype-recoverability · latent geometry · structural loss · gaussian-activation

Read original →

Building an Adversarial Malware Dataset by Family and Type: Generation, Evasion, and Poisoning Evaluation

arXiv cs.LG · David Košťál, Martin Jureček · 2026-05-25

The study introduces a novel adversarial malware dataset derived from RawMal-TF, comprising 44,347 family-labelled and 33,596 type-labelled PE files generated via adversarial techniques. These samples achieve evasion rates of 98.35% and 92.20% against the EMBER classifier, respectively. The dataset includes metadata such as EMBER scores and VirusTotal classifications. Additionally, the work demonstrates the vulnerability of malware classifiers to data poisoning, showing that injecting 0.5% mislabelled adversarial samples increases evasion rates from 26.1% to 92.8%. The dataset is publicly released to support research on adversarial malware and classifier robustness.

adversarial malware · ember classifier · evasion rate · data poisoning · pe files

Read original →

Quantitative Evaluation of the Severity of Posttraumatic Stress Disorder through Transfer Learning from Specific Phobia Data

arXiv cs.LG · Nicolas Ricka, Gauthier Pellegrin, Denis A. Fompeyrine, Thomas Rohaly · 2026-05-25

The study proposes a transfer learning approach using multivariate kernel density estimation (MKDE) to objectively assess PTSD severity through physiological signals. Heart rate (HR) and galvanic skin response (GSR) data from 21 military participants were analyzed, leveraging a fear-response model pre-trained on arachnophobia data. The model achieved 86% accuracy in PTSD classification (PCL-M threshold: 36) with a mean absolute error (MAE) of 5.6 and 17% mean absolute percentage error in severity estimation, demonstrating potential for clinical screening applications.

transfer learning · multivariate kernel density estimation · galvanic skin response · ptsd severity · physiological signals

Read original →

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

arXiv cs.LG · Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz · 2026-05-25

The paper investigates multi-agent LLM deliberation through Friedkin-Johnsen opinion dynamics, revealing input-dependent parameters that transform deliberation into a mixture of experts. By analyzing stubbornness, influence, and opinion change, the study demonstrates that dynamic routing based on agent competence enables multi-agent systems to surpass single agents and static ensembles. Empirical analysis focuses on observable proxies for latent competence: self-assessed confidence, perceived confidence, and initial alignment with peers.

multi-agent systems · friedkin-johnsen dynamics · mixture of experts · opinion change · latent competence

Read original →
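
For readers unfamiliar with the Friedkin-Johnsen model referenced in the summary above, the following is a minimal sketch of the classical update, in which each agent mixes its neighbours' current opinions with its own initial opinion. The weights, stubbornness values, and function name are illustrative assumptions; the paper's input-dependent parameters are not reproduced here.

```python
def friedkin_johnsen(W, stubbornness, x0, steps=200):
    """Iterate x_i(t+1) = (1 - s_i) * sum_j W[i][j] * x_j(t) + s_i * x_i(0)."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        x = [
            (1.0 - stubbornness[i]) * sum(W[i][j] * x[j] for j in range(n))
            + stubbornness[i] * x0[i]
            for i in range(n)
        ]
    return x

# Three agents; each row of W sums to 1 (row-stochastic influence weights).
W = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
x0 = [1.0, 0.0, 0.0]             # initial opinions
stubbornness = [0.9, 0.1, 0.1]   # agent 0 anchors strongly to its initial opinion

final = friedkin_johnsen(W, stubbornness, x0)
print(final)
```

In this toy setup the stubborn agent ends near its starting opinion and pulls the two flexible agents most of the way toward it, which is the kind of "influencer" effect the title alludes to.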

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

arXiv cs.LG · Duy Anh Nguyen · 2026-05-25

This study evaluates domain-adaptive continued pretraining (DAPT) on the EFCAMDAT learner corpus to enhance transformer-based automated essay scoring (AES) for English proficiency tests. Using three transformer encoders, the research assesses DAPT's impact on FCE and IELTS datasets through in-domain scoring and few-shot cross-dataset transfer. Results indicate mixed efficacy, with proficiency-aligned subsets outperforming full-corpus DAPT for B1–B2 FCE data but failing to improve cross-dataset transfer consistently. Findings suggest DAPT benefits in-domain AES when pretraining data aligns with assessment settings.

domain-adaptive pretraining · transformer encoders · automated essay scoring · efcamdat · english proficiency tests

Read original →

Joint Optimization of Training and Inference in Federated Edge Learning via Constrained Multi-Objective Deep Reinforcement Learning

arXiv cs.LG · Zhen Li, Jun Cai, Chao Yang, Haoran Gao · 2026-05-25

The paper proposes a joint optimization framework for federated edge learning (FEEL) that simultaneously manages training and inference on resource-constrained devices. It introduces a tandem-queue mechanism linking inference requests to training data, incorporates temporal dynamics via data/model freshness metrics, and formulates the problem as a multi-objective Markov decision process (MOMDP). The solution, constrained multi-objective proximal policy optimization (C-MOPPO), learns Pareto-optimal policies balancing accuracy, latency, and energy. Experiments show C-MOPPO outperforms baselines in achieving dense, high-quality trade-offs across objectives.

federated edge learning · multi-objective optimization · markov decision process · proximal policy optimization · edge intelligence

Read original →

Universal Activation Verbalizer: A Unified Framework for Cross-Model Activation Explanation

arXiv cs.LG · Haiyan Zhao, Zirui He, Guanchu Wang, Ali Payani · 2026-05-25

The Universal Activation Verbalizer (UAV) framework enables cross-model activation explanation by using a shared decoder to interpret activations from heterogeneous donor models. UAV employs a lightweight adapter to convert donor activations into soft tokens in the decoder's embedding space, supporting adapter-only transfer via a frozen decoder-side LoRA. Evaluated across classification, fact retrieval, and gist summarization tasks, UAV matches self-explanation baselines while facilitating cross-model verbalization across different model families and scales. Ablation studies indicate decoder-side tuning primarily enhances task behavior, while the adapter supplies activation-grounded factual and semantic information for faithful explanations.

universal activation verbalizer · cross-model explanation · adapter-only transfer · decoder-side lora · activation-grounded

Read original →

Reading the Finetuning Prior: Verbatim Content Recovery via Contrastive Decoding Diffing

arXiv cs.LG · Michał Brzozowski, Zuzanna Dubanowska, Enrico Cassano, Neo Christopher Chung · 2026-05-25

The paper introduces Contrastive Decoding Diffing (CDD), a black-box method for verbatim content recovery from finetuned language models without weight access. CDD leverages output-level logit distributions, bypasses chat templates, uses vague pre-fills, and amplifies logit-space differences between base and finetuned models. It achieves exact recovery of implanted facts (drug names, vote counts, etc.) across four architectures (1B–32B parameters), outperforming the white-box Activation Difference Lens (ADL) with a 170× speedup. CDD also exposes data pipeline artifacts, demonstrating end-to-end fingerprinting from generator artifacts to model weights. Validation shows near-perfect recovery in single-dataset settings and correct identification in mixed-dataset scenarios.

contrastive decoding diffing · logit-space difference · verbatim recovery · finetuning prior · model fingerprinting

Read original →

Predicting Stock Price Direction on Earnings Announcement Days using Multi-modal Deep Learning

arXiv cs.LG · Manuel Noseda, Nathan Soldati, Marco Paina · 2026-05-25

The study evaluates multi-modal deep learning for predicting stock price direction during earnings announcements, combining fundamental metrics, technical indicators, and FinBERT-derived sentiment scores. It compares LSTM and Transformer architectures against logistic regression, with and without sentiment features. Results show the Transformer outperforms in volatile movement detection, achieving higher macro F1-scores, while sentiment features consistently enhance performance.

finbert · lstm · transformer · macro f1-score · sentiment scores

Read original →

Merge-Bench: Resolve Merge Conflicts with Large Language Models

arXiv cs.LG · Benedikt Schesch, Michael D. Ernst · 2026-05-25

The paper introduces Merge-Bench, a dataset of 7,938 real-world merge conflicts from 1,439 GitHub repositories, with ground truth from developer commits. It presents LLMergeJ, a 14B-parameter model trained via Group Relative Policy Optimization (GRPO) for resolving Java merge conflicts. Evaluations show LLMergeJ outperforms three commercial LLMs (except Gemini 2.5 Pro), with top models resolving fewer than 60% of conflicts across 11 languages.

merge conflicts · group relative policy optimization · large language models · version control · reinforcement learning

Read original →

Capability and Robustness Cannot Both Be Free: An Information-Theoretic Bound for Vision-Language-Action Models

arXiv cs.LG · Jianwei Tai · 2026-05-25

The paper establishes an information-theoretic bound proving that capability and robustness in Vision-Language-Action (VLA) models cannot be simultaneously maximized. Using mutual information measures and the Data Processing Inequality, it derives a policy-independent budget constraining the sum of task performance (capability) and adversarial robustness. The bound is validated empirically on 252 Gaussian-VLA cells and 48 OpenVLA-7B × LIBERO × PGD configurations, with zero violations observed. A corollary tightens the bound by restricting the adversarial channel to policy-relevant subspaces, revealing OpenVLA-7B already consumes ~24% of its ~31-nat budget.

vision-language-action models · mutual information · adversarial robustness · data processing inequality · task entropy

Read original →

Optimal and Order-optimal Gated Priority-based Greedy Policies for Two-layer Multi-item Order Fulfillment

arXiv cs.LG · Xi Chen, Yuze Chen, Ziyi Chen, Yuan Zhou · 2026-05-25

The paper introduces Gated Priority-based Greedy policies for real-time multi-item order fulfillment in two-layer e-commerce distribution networks, addressing the trade-off between immediate cost savings and inventory preservation. Using an adversarial online model with multiple front distribution centers (FDCs), a regional center (RDC), and time-varying costs, the authors derive competitive-ratio guarantees and near-matching lower bounds. Numerical experiments demonstrate superior performance against myopic and forecast-based benchmarks, providing managerial insights on inventory protection and order splitting.

online fulfillment · competitive-ratio · two-layer distribution · greedy policies · multi-item orders

Read original →

Conformalised imprecise inference for robust extrapolation under limited data

arXiv cs.LG · Yu Chen, Scott Ferson · 2026-05-25

The paper introduces a conformalised imprecise inference framework for robust extrapolation under distributional shift, addressing limitations in existing uncertainty quantification methods. The model-agnostic approach augments predictive models with imprecision and distance awareness, yielding valid probability boxes (p-boxes) that maintain coverage guarantees while adaptively expanding uncertainty in extrapolation regimes. Experiments on synthetic and benchmark datasets demonstrate improved robustness and reliable coverage compared to standard probabilistic methods, particularly in data-limited scenarios.

conformal prediction · imprecise probability · distributional shift · uncertainty quantification · probability boxes

Read original →
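
The coverage guarantee mentioned in the summary above comes from conformal prediction. As a point of reference, here is a minimal sketch of standard split conformal intervals built from absolute residuals on a calibration set. It is not the paper's p-box construction and does not include its distance awareness; the function names and numbers are made up for illustration.

```python
import math

def conformal_quantile(residuals, alpha):
    """Finite-sample quantile used by split conformal prediction."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1.0 - alpha))   # rank of the order statistic to use
    if k > n:
        return math.inf                      # too few calibration points for this alpha
    return sorted(residuals)[k - 1]

def conformal_interval(prediction, residuals, alpha=0.1):
    """Symmetric interval around a point prediction with target coverage 1 - alpha."""
    q = conformal_quantile(residuals, alpha)
    return prediction - q, prediction + q

# Absolute residuals |y - model(x)| on a held-out calibration set (made-up numbers).
calibration_residuals = [0.3, 0.1, 0.4, 0.2, 0.9, 0.5, 0.6, 0.8, 0.7]
low, high = conformal_interval(2.0, calibration_residuals, alpha=0.2)
print(low, high)
```

Note that the interval width here is the same everywhere; widening it as a query moves away from the training data is the behaviour the summary attributes to the paper's method.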

The Quantization Benefits of Residual-Free Transformers

arXiv cs.LG · Yiping Ji, Mahalakshmi Sabanayagam, Peyman Moghadam, Hemanth Saratchandran · 2026-05-25

The work demonstrates that residual connections in transformers amplify non-Gaussian activation distributions, increasing quantization error compared to residual-free architectures. Through kurtosis analysis and controlled experiments, it shows residual mixing exacerbates heavy-tailed activations, while dense mixing contracts them. The authors enable trainable residual-free transformers via orthogonal initialization, second-order optimization, and depth-scaled attention temperature, achieving near-Gaussian activations. Results on language tasks show marginally lower full-precision accuracy but significantly improved 4-bit quantization robustness (e.g., <1% drop vs. >5% in residual models), revealing an accuracy-compressibility trade-off in transformer design.

quantization error · residual-free transformers · excess kurtosis · orthogonal initialization · attention temperature

Read original →
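
The link between heavy tails and quantization error described in the summary above can be illustrated on synthetic data. This is a minimal sketch, assuming uniform symmetric 4-bit quantization with a max-abs scale and using a cubed Gaussian as a stand-in for heavy-tailed activations; neither choice is taken from the paper.

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0

def quantize_symmetric(xs, bits=4):
    """Uniform symmetric quantization with a max-abs scale (an assumed, simple scheme)."""
    qmax = 2 ** (bits - 1) - 1           # 7 for 4 bits
    scale = max(abs(x) for x in xs) / qmax
    return [max(-qmax, min(qmax, round(x / scale))) * scale for x in xs]

def relative_mse(xs, qs):
    """Squared quantization error relative to the signal energy."""
    num = sum((x - q) ** 2 for x, q in zip(xs, qs))
    den = sum(x * x for x in xs)
    return num / den

rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(10000)]
heavy = [g ** 3 for g in gaussian]       # cubing a Gaussian gives a heavy-tailed sample

for name, xs in [("gaussian", gaussian), ("heavy-tailed", heavy)]:
    err = relative_mse(xs, quantize_symmetric(xs))
    print(name, excess_kurtosis(xs), err)
```

With a max-abs scale, a few extreme values stretch the quantization grid, so most of the heavy-tailed sample collapses onto the zero level and its relative error is larger.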

The Timing Dependencies of Trust: Speed, Accuracy, and cBCI Neuro-Decoupling in Human-AI Teams

arXiv cs.LG · Christopher Baker, Stephen Hinton, Akashdeep Nijjar, Riccardo Poli · 2026-05-25

This study examines how AI intervention timing (Fast/Less-Accurate vs. Slow/Accurate) affects Human-AI team performance in a cBCI-mediated drone task. Using a 2D Adaptive Riemannian Oracle to map spatial covariance, 17 operators performed search tasks under cognitive workload. Fast AI induced blind compliance (50.2% accuracy), while Slow AI caused hesitation (61.1% accuracy) but eventual recovery (100%). Hybrid Fusion improved Fast AI teams by 7.6% and accelerated Slow AI teams by 6.9%, demonstrating that cBCI synergy depends on temporal trust dynamics.

cbci · riemannian oracle · hybrid fusion · temporal dynamics · cognitive workload

Read original →

UNATE: UNsupervised ATomic Embedding for crystal structures property prediction

arXiv cs.LG · Laura Solà-Garcia, Àlex Solé, Javier Ruiz-Hidalgo · 2026-05-25

UNATE proposes unsupervised atomic embeddings for crystal property prediction, addressing data scarcity by leveraging unlabeled structural information. The framework combines a denoising autoencoder with contrastive learning to learn robust atomic representations, which replace raw atomic numbers as input features. Experiments demonstrate a 2.7% accuracy improvement over full-data baselines, with gains up to 10% when only 25% labeled data is available.

unsupervised learning · atomic embeddings · contrastive learning · denoising autoencoder · materials discovery

Read original →

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

arXiv cs.LG · Li Wang, Xiaodong Lu, Xiaohan Wang, Yikun Ban · 2026-05-25

The paper introduces Reinforcement Learning with Active Verifiable Rewards (RLAVR), a method that combines actively acquired ground-truth labels with pseudo-labels to stabilize training in Reinforcement Learning with Verifiable Rewards (RLVR). The authors propose the Corrective Advantage Gap (CAG) metric to identify high-value samples and develop Correction-Aware Reliability Estimation (CARE), a practical acquisition policy. Experiments across various domains, model families, and scales demonstrate RLAVR's effectiveness in improving performance under limited annotation budgets.

reinforcement learning · verifiable rewards · active learning · corrective advantage gap · label acquisition

Read original →

Minimax Limits of k-Fold Cross-Validation via Majority

arXiv cs.LG · Ido Nachum, Rüdiger Urbanke, Thomas Weinberger · 2026-05-25

(No summary returned.)

Read original →

TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning

arXiv cs.LG · Muyu Pan, Shu Zhao, Nan Zhang, Philip Shin · 2026-05-25

The paper introduces Trajectory-Informed Advantage Reweighting (TIAR) for LLM abstention learning, extending ternary reward approaches with dynamic reward reweighting during Group Relative Policy Optimization (GRPO) training. Methodologically, it leverages trajectory-based confidence indicators to calculate abstention advantages, focusing on hallucination reduction rather than truthfulness improvement. Evaluated on AbstentionBench, TIAR achieves state-of-the-art abstention F1 scores across 5/6 categories, outperforming static ternary baselines on 17/31 datasets while maintaining baseline accuracy.

trajectory-informed · advantage reweighting · abstention learning · grpo · hallucination reduction

Read original →
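
To make the GRPO baseline in the TIAR summary concrete, here is a minimal sketch of group-relative advantages computed from a ternary reward. The +1 / 0 / -1 reward values and the function names are assumptions for illustration; TIAR's trajectory-informed reweighting is not reproduced.

```python
import math

def ternary_reward(outcome):
    """Assumed ternary scheme: correct = +1, abstain = 0, wrong = -1."""
    return {"correct": 1.0, "abstain": 0.0, "wrong": -1.0}[outcome]

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses (GRPO-style)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, four sampled responses with their graded outcomes.
outcomes = ["correct", "abstain", "wrong", "wrong"]
rewards = [ternary_reward(o) for o in outcomes]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Because the advantage is relative to the group, abstaining is rewarded when most sibling responses are wrong and penalized when most are correct; a static scheme like this is presumably what the dynamic reweighting adjusts.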

Geometric Evolution Maps: Extracting Stable Concept Probes from Transformer Residual Streams

arXiv cs.LG · James Henry · 2026-05-25

The paper introduces Geometric Evolution Maps (GEMs), a method for identifying stable concept probe directions in transformer residual streams by tracking directional trajectories and detecting handoff layers where concept representations cease rotating. GEMs analyze 23 architectures (70M–14B parameters) across 17 concept types, showing that concept representations undergo substantial directional rotation (mean entry-to-exit cosine similarity 0.233 in Concept Allocation Zones). GEM-extracted probes outperform peak-layer probes in 66.2% of 391 concept-model pairs, with performance varying by attention type (MHA models favor handoff in 78.3% of cases vs. 47.1% for GQA). An adaptive ablation rule improves probe quality in 75.9% of near-final-layer cases (+7.44pp mean gain).

geometric evolution maps · concept probes · residual streams · concept allocation zone · directional rotation

Read original →

Context-Instrumental Data Distillation for Kubernetes Manifest Generation: Method and Experimental Evaluation

arXiv cs.LG · Andrey Kozachok, Anatoliy Bakaev, Aleksandr Kozachok, Shamil Magomedov · 2026-05-25

The paper introduces context-instrumental data distillation for fine-tuning Small Language Models (≤4B parameters) to generate Kubernetes manifests. The method combines synthetic data generation (via DeepSeek-V4 Flash API) and reverse instruction extraction from real YAMLs, filtered by validators and domain context. Unlike KL-divergence distillation, it uses supervised fine-tuning (LoRA on Qwen2.5-Coder-1.5B-Instruct, CPU-only). On K8s-Distill-Pilot (200 test samples), strict formatting yielded 91.5% full-pass@1, outperforming naive dataset scaling.

small language models · kubernetes manifests · data distillation · lora fine-tuning · domain-specific languages

Read original →

Clarify, Abstain or Answer? Strategising in Conversation with Belief-Augmented Generation

arXiv cs.LG · Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández · 2026-05-25

The paper introduces Belief-Augmented Generation (BAG), a method that grounds large language models (LLMs) in their own belief state by prompting them to reason over K sampled responses. This enables strategic decisions—answering, clarifying, or abstaining—in conversational settings. Evaluated in a multi-turn ambiguous QA task, BAG improves accuracy across six LLMs and yields more faithful strategy decisions than prompt-only baselines, though distinguishing clarification from abstention remains challenging.

belief-augmented generation · large language models · probabilistic uncertainty · selective prediction · multi-turn qa

Read original →

Branched Signature Kernel Solvers for ODEs with rough Single-Trajectory signals

arXiv cs.LG · Munawar Ali, Qi Feng, Charlie Pyle, George Xu · 2026-05-25

The authors introduce a branched signature kernel solver for ODEs driven by single observed trajectories of rough signals, addressing applications like earthquake engineering and finance. The method combines a count-sampling construction to generate nested training paths from a single observation and a kernel-collocation framework that places ansatzes on derivatives or integrated solutions. A universal approximation theorem is proven using the Hairer–Kelly morphism, and the solver is extended to online settings with linear updates or Newton steps. Experiments on six benchmarks demonstrate accurate and stable performance across diverse regimes.

branched signature kernel · count-sampling · kernel-collocation · hairer–kelly morphism · online updates

Read original →

Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models

arXiv cs.LG · Yulin Yuan, Hongshuo Zhao, Xiangming Meng · 2026-05-25

The paper introduces Visual-Redundancy-Controlled Decoding (VRCD), an inference-time method for diffusion-based multimodal LLMs (dMLLMs) that mitigates visual redundancy in parallel token decoding. VRCD quantifies redundancy via a Visual Redundancy Index (VRI) and uses token-to-image attention to prioritize visually complementary positions, reducing step-level grounding overlap. Evaluated on M³CoT and MMBench, VRCD achieves relative accuracy gains of 18.8% and 6.9% respectively over confidence-based decoding, with minimal runtime overhead.

diffusion-based mllms · parallel decoding · visual redundancy index · token-to-image attention · multimodal benchmarks

Read original →

On Reliability of Efficient Membership Inference Vulnerability Evaluation

arXiv cs.LG · Joonas Jälkö, Gauri Pradhan, Ossi Räisä, Antti Honkela · 2026-05-25

The work identifies two reliability flaws in efficient membership inference attack (MIA) evaluation pipelines and proposes corrective measures. First, it demonstrates that concatenating MIA scores across multiple individuals for low-FPR TPR estimation creates miscalibration across per-sample FPRs, undermining differential privacy audits; a post-processing calibration method is introduced. Second, it reveals a finite population bias in Carlini et al.'s (2022) likelihood-ratio attack (LiRA) implementation, causing upward bias in per-sample vulnerability estimates. The analysis focuses on statistical miscalibration and computational efficiency trade-offs in MIA vulnerability assessment.

membership inference attacks · false positive rate · likelihood-ratio attack · differential privacy · finite population bias

Read original →
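
The low-FPR TPR metric discussed in the summary above can be computed from pooled attack scores as in the following minimal sketch. The scores and the function name are made up; this shows only the pooled metric, and applying one global threshold to scores pooled across samples is the practice the paper reportedly finds miscalibrated per sample.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """TPR at the lowest threshold whose empirical FPR stays at or below target_fpr."""
    negatives = sorted(nonmember_scores, reverse=True)
    allowed = int(target_fpr * len(negatives))    # number of false positives tolerated
    # Predict "member" when score > threshold; ties at the threshold count as negative.
    threshold = negatives[allowed] if allowed < len(negatives) else float("-inf")
    true_positives = sum(1 for s in member_scores if s > threshold)
    return true_positives / len(member_scores), threshold

# Higher score means the attack believes the point was in the training set.
nonmembers = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
members = [0.5, 0.85, 0.92, 0.97, 0.99]

tpr, threshold = tpr_at_fpr(members, nonmembers, target_fpr=0.1)
print(tpr, threshold)
```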

Geometry Adaptive Counterfactual Distribution Learning with Diffusion-Guided Smoothing

arXiv cs.LG · Kwangho Kim · 2026-05-25

The authors propose geometry-adaptive estimators for counterfactual distribution learning in high-dimensional outcomes, addressing limitations of isotropic smoothing. Their method integrates diffusion-informed smoothing for counterfactual densities and diffusion-informed score smoothing, combining causal nuisance adjustment with geometry-adaptive localization driven by diffusion score information. This approach removes first-order nuisance bias while aligning smoothing with local outcome geometry, yielding asymptotic expansions, risk bounds, and inference procedures. Under structural geometry conditions, stochastic error is governed by an effective dimension induced by the diffusion-guided kernel rather than the ambient dimension. Semi-synthetic experiments on CelebA demonstrate steeper error decay, validating the effective-dimension theory.

counterfactual distribution · diffusion-guided smoothing · geometry-adaptive localization · nuisance bias · effective dimension

Read original →

On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits

arXiv cs.LG · Yunlong Hou, Zixin Zhong, Vincent Y. F. Tan · 2026-05-25

The paper introduces a stochastic multi-armed bandit problem with a free exploration phase before regret accumulation, formalizing it as regret minimization with free exploration. The authors propose UFE-KLUCB-H, a two-phase algorithm combining a principled free exploration policy (UFE) and a history-aware regret minimization policy (KLUCB-H). Instance-dependent upper bounds show UFE-KLUCB-H achieves strictly lower regret than policies without free exploration, while lower bounds demonstrate near-optimality for two-valued bandits. Simulations confirm the benefits of free exploration and adaptivity.

multi-armed bandits · regret minimization · free exploration · instance-dependent bounds · klucb-h

Read original →
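
KLUCB-H in the summary above presumably builds on the standard KL-UCB index. As background, here is a minimal sketch of that index for Bernoulli rewards, solved by bisection. It uses the plain `log(t)` exploration budget and hypothetical function names, and it does not implement the paper's history-aware variant or its UFE policy.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def klucb_index(mean, pulls, t, tol=1e-9):
    """Largest q >= mean with pulls * KL(mean, q) <= log(t), found by bisection."""
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

# Same empirical mean, different amounts of evidence.
idx_few = klucb_index(mean=0.5, pulls=10, t=100)
idx_many = klucb_index(mean=0.5, pulls=1000, t=100)
print(idx_few, idx_many)
```

Under this reading, samples collected during a free exploration phase would simply raise `pulls` before regret starts to count, which tightens the index from the first charged round.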

NPSolver: Neural Poisson Solver with Iterative Physics Supervision

arXiv cs.LG · Bocheng Zeng, Rui Zhang, Runze Mao, Mengtao Yan · 2026-05-25

The paper introduces NPSolver, a neural Poisson solver trained via iterative physics supervision without solution labels, addressing instability in physics-informed training and data scarcity. The method uses preconditioned conjugate gradient (PCG) steps to refine predictions, providing a stable training signal, with theoretical justification for stop-gradient optimization. A Boundary-Aware Transolver (BA-Transolver) architecture explicitly handles mixed boundary conditions. Evaluations on 2D/3D irregular geometries show NPSolver outperforms physics-informed and data-driven baselines, with demonstrated efficacy in thermal boundary control tasks.

poisson equation · neural operator · physics-informed training · preconditioned conjugate gradient · boundary-aware architecture

Read original →
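
The PCG refinement step in the NPSolver summary can be sketched on a toy problem. The code below assumes a 1D Poisson system with Dirichlet boundaries, a Jacobi preconditioner, and a zero vector standing in for the network prediction; all of these are illustrative assumptions, not the paper's setup.

```python
def apply_poisson_1d(x):
    """Multiply by the tridiagonal matrix A = tridiag(-1, 2, -1)."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * x[i] - left - right)
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg_refine(x0, b, steps):
    """Run up to `steps` Jacobi-preconditioned CG iterations starting from the guess x0."""
    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, apply_poisson_1d(x))]
    z = [ri / 2.0 for ri in r]            # Jacobi preconditioner: divide by diag(A) = 2
    p = list(z)
    rz = dot(r, z)
    for _ in range(steps):
        if rz < 1e-24:                    # residual is negligible, stop early
            break
        Ap = apply_poisson_1d(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

n = 16
b = [1.0] * n
prediction = [0.0] * n                        # stand-in for a network output
target = pcg_refine(prediction, b, steps=3)   # refined target after a few PCG steps
print(target)
```

In a training loop, `target` would presumably be treated as a constant (the stop-gradient mentioned in the summary) and the network would be regressed toward it, so no ground-truth solution labels are needed.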

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

arXiv cs.LG · Sam Bowyer, Acyr Locatelli, Kris Cao · 2026-05-25

The paper demonstrates that efficient benchmarking of LLMs can be significantly improved by reformulating it as a multiple regression problem with feature selection. The proposed method combines kernel ridge regression for score prediction with minimum redundancy maximum relevance (mRMR) for optimal question subset selection. Results show superior performance in prediction error (MAE/RMSE) and ranking correlation (Spearman ρ/Kendall τ) across benchmarks, while being computationally faster and more stable than existing approaches.

efficient benchmarking · kernel ridge regression · minimum redundancy maximum relevance · feature selection · multiple regression

Read original →

MDGMIX: Boundary-Aware Subgraph Mixing for Multi-Domain Graph Pre-Training

arXiv cs.LG · Ziyu Zheng, Yaming Yang, Ziyu Guan, Wei Zhao · 2026-05-25

MDGMIX introduces a boundary-aware subgraph mixing framework for efficient multi-domain graph pre-training, addressing data redundancy in existing joint training approaches. The method constructs challenging mixed-domain subgraphs via boundary node selection, employing hierarchical discrimination (coarse-grained domain discrimination and fine-grained domain decomposition losses) to separate shared and domain-specific patterns. Experiments show MDGMIX outperforms baselines in few-shot classification while improving time/memory efficiency, aided by a lightweight prompt weighting mechanism for knowledge transfer.

multi-domain graph pre-training · boundary-aware subgraph mixing · hierarchical discrimination · few-shot classification · prompt weighting

Read original →

Concept Unlearning via Cross-Attention Activation Projection for Diffusion Models

arXiv cs.LG · Saemi Moon, Suhyeon Jun, Seoyeon Lee, Dongwoo Kim · 2026-05-25

PURE introduces a closed-form concept-unlearning method for text-to-image diffusion models by projecting cross-attention activations rather than relying on text embeddings. The approach constructs forget and retain bases from per-layer cross-attention activations during denoising, applying a single linear projector to key and value weights. Evaluated on a holistic benchmark with ten concepts, PURE reduces target leakage under paraphrased and adversarial prompts while maintaining retain-concept fidelity, achieving superior forget-retain trade-offs compared to existing methods.

concept unlearning · cross-attention activation · diffusion models · closed-form method · text-to-image generation

Read original →

Invariant-Based Weight Sharing for Message Passing

arXiv cs.LG · Florian Seiffarth · 2026-05-25

The paper introduces ShareGNN, a structure-aware weight sharing principle for message-passing neural networks (MPNNs) that leverages graph invariants to enable systematic weight reuse across structurally equivalent subgraphs. The method employs a novel encoder-decoder architecture with learnable adjacency and transformer-like connectivity, providing explicit control over model complexity. Experiments on synthetic and real-world datasets demonstrate improved performance over standard MPNNs, competitive expressivity beyond the 1-WL test, and scalability to large graphs.

mpnns · graph invariants · weight sharing · encoder-decoder · 1-wl test

Read original →

DeGRe: Dense-supervised Generative Reranking for Recommendation

arXiv cs.LG · Chaotian Song, Jingyao Zhang, Chenghao Chen, Zisen Sang · 2026-05-25

DeGRe introduces a dense-supervised generative reranking framework to address heuristic label bias and credit assignment problems in multi-stage recommender systems. The method employs an offline-online decoupled design, utilizing a Lookahead Evaluator with cumulative regression and beam search to mine high-value sequences offline, then distilling step-wise value estimations into a lightweight Online Generator for efficient greedy decoding during online inference. Experiments show DeGRe outperforms baselines on public benchmarks and industrial datasets, with successful deployment on Taobao Flash Shopping improving online recommendations.

generative reranking · lookahead evaluator · cumulative regression · beam search · greedy decoding

Read original →

Latent Representation Alignment for Offline Goal-Conditioned Reinforcement Learning

arXiv cs.LG · Hyungkyu Kang, Byeongchan Kim, Min-hwan Oh · 2026-05-25

The paper proposes Latent-Aligned Value Learning (LAVL), an offline goal-conditioned reinforcement learning (GCRL) method addressing erroneous generalization in value functions for long-horizon tasks. LAVL integrates latent-representation-based value generalization with hierarchical planning, introducing inductive bias to improve reliability. Evaluated on OGBench, LAVL outperforms existing methods on 20/22 datasets, particularly excelling in long-horizon and trajectory-stitching tasks where prior approaches degrade. The code is publicly available.

offline reinforcement learning · goal-conditioned rl · latent representation · value function · hierarchical planning

Read original →

The Behavioral Credibility Trilemma: When Calibrated Autonomy Becomes Impossible

arXiv cs.LG · Lauri Lovén, Nam Do, Hassan Mehmood, Dinesh Kumar Sah · 2026-05-25

The paper establishes the Behavioral Credibility Trilemma, proving no RL policy with confidence-gated autonomy can simultaneously maximize helpfulness, calibration, and autonomy under rational oversight for tasks beyond agent competence. Using geometric analysis, the Behavioral Perturbation Lemma quantifies confidence inflation (scaling as $w_A/(2 w_C)$ for Brier score) and detection requirements ($\Omega(1/\Delta^2)$ observations). Theoretical results show the principal's optimal oversight rule must be non-affine, making the trilemma unconditional across log-concave policy families. A 540-configuration Best-of-N experiment confirms five pre-registered hypotheses (effect sizes $d = 1.10$ to $5.32$) and reveals plateau-truncated frontier geometry in achievable $(H, C, A)$ space.

reinforcement learning · confidence calibration · strict properness · log-concave densities · behavioral trilemma

Read original →

FLOATBench: A Dataset and Benchmark for Floating Offshore Wind Turbine Tower Fatigue

arXiv cs.LG · João Alves Ribeiro, Bruno Alves Ribeiro, Francisco Pimenta, Sérgio M. O. Tavares · 2026-05-25

FLOATBench introduces a public benchmark dataset for floating offshore wind turbine (FOWT) tower fatigue prediction, addressing the lack of standardized evaluation in the field. The dataset comprises 582,120 per-section fatigue-damage labels derived from 19,404 high-fidelity OpenFAST simulations across three 22 MW FOWT tower geometries. It features a regime-aware alpha-shape partition of the joint wind/wave operating envelope, stratifying test points into in-train, interpolation, and extrapolation regimes. The benchmark includes a reproducible evaluation harness with three protocol levels: random validation, within-tower regime-aware evaluation, and cross-tower transfer. The regime-aware protocol reveals rank shifts between global and extrapolation performance, highlighting limitations of random-split leaderboards.

fatigue-damage prediction · openfast simulations · regime-aware evaluation · alpha-shape partition · tabular surrogate modeling

Read original →

Machine Learning Multiscale Interactions

arXiv cs.LG · Àlex Solé, Sergio Suárez-Dou, Albert Mosella-Montoro, Silvia Gómez-Coca · 2026-05-25

The paper introduces Multiscale Structural Ensemble (MuSE), a hierarchical model addressing multiscale interactions in physical systems through Soft Coarse-Graining Pooling. MuSE integrates with MLFFs like SO3krates, MACE, and PaiNN to capture long-range many-body effects across molecules and materials. Benchmarks demonstrate MuSE's accuracy in Hessian-based tests, biomolecular folding, and molecule-graphene nanostructures, outperforming existing long-range ML models in quantum-mechanical interaction modeling.

multiscale structural ensemble · soft coarse-graining pooling · machine learning force fields · long-range many-body effects · quantum-mechanical interactions

Read original →

PowLU: An Activation Function for Stable Pre-Training of LLMs

arXiv cs.LG · Peijie Jiang, Yuqi Feng, Cunyin Peng, Qian Zhao · 2026-05-25

The paper introduces Power Linear Unit (PowLU), a stable activation function designed for large-scale LLM pre-training, addressing numerical instability in SwiGLU caused by its quadratic amplification of large inputs. PowLU employs a rational power function to achieve adaptive nonlinearity, improving representation ability and training stability in spike regions. Theoretical justification for PowLU's properties is provided. Scaling law experiments confirm consistent performance across model sizes, and empirical results with the Ling architecture (7.9B and 124B parameters) show PowLU achieves competitive results against SwiGLU and SwiGLU-Clip, enhancing LLM scalability.

activation function · large language models · numerical instability · scaling law · adaptive nonlinearity

Read original →

How Should LLMs Consume High-Quality Data? Optimal Data Scheduling via Quality-Aware Functional Scaling Laws

arXiv cs.LG · Zhitao Zhu, Xili Wang, Shizhe Wu, Jiawei Fu · 2026-05-25

The paper introduces quality-aware functional scaling laws to optimize joint scheduling of data quality and batch size in LLM training, revealing two regimes for high-quality data utilization. In noise-limited phases, high-quality data acts as a signal amplifier via reduced batch sizes; in signal-limited phases, it suppresses noise via late-stage placement. The proposed Drop-Stable-Rampup method outperforms Warmup-Stable-Decay and Cosine-decay by +1.70 and +2.98 average accuracy respectively on a 15B Mixture-of-Experts model, with GSM8K (+4.23) and MATH (+2.80) showing notable gains.

functional scaling laws · data-quality scheduling · noise-limited regime · signal-limited regime · drop-stable-rampup

Read original →

Evaluating passing decision-making in professional football: An enhanced MPNN approach to Receiver Selection

arXiv cs.LG · Gabriel Masella, Giuseppe Alessio D'Inverno, Max Goldsmith, Gianluigi Rozza · 2026-05-25

The paper introduces a Graph Neural Network (GNN) framework for predicting Receiver Selection in football by modeling on-field interactions as dynamic graphs. The method employs a Message-Passing Neural Network (MPNN) with nodes representing players (positional/contextual features) and edges encoding passing-line metrics (distance, angle, pressure). Trained on synchronized tracking and event data via an optimized Needleman-Wunsch Algorithm pipeline, the model achieves competitive accuracy in identifying the actual receiver and state-of-the-art top-3 accuracy. It additionally quantifies option likelihood, threat, and creativity, enabling rapid analysis of >1,000 passes.

graph neural network · message-passing neural network · receiver selection · needleman-wunsch algorithm · dynamic graphs

Read original →
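
The summary mentions a Needleman-Wunsch pipeline for synchronizing tracking and event data but gives no details. As background only, here is a minimal sketch of the classic Needleman-Wunsch global alignment score. The scoring values (`match`, `mismatch`, `gap`) and the use of plain strings as sequences are illustrative assumptions, not the paper's configuration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap inserted in b
            left = score[i][j - 1] + gap  # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[n][m]


# Toy usage: an event stream with one missing entry relative to a reference
alignment_score = needleman_wunsch("ABC", "AC")
```

In a synchronization setting the sequence elements would presumably be events or frames compared by a domain-specific similarity, rather than characters compared for equality.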

Don't Retrain, Just Reuse: Recovering Dual-Target Molecules from Single-Target Diffusion Models

arXiv cs.LG · Qingyuan Zeng, Pengxiang Cai, Zixin Guan, Ziyang Chen · 2026-05-25

The authors propose REUSE, a hierarchical evolutionary input-space search framework that recovers dual-target molecules from frozen single-target diffusion models without retraining or modifying the denoising process. REUSE formulates the task as a constrained multi-objective optimization problem, combining pair-conditioned exploration with structured multi-stage selection to enforce dual-target affinity, chemical quality, and diversity. Experiments demonstrate that REUSE achieves a 20.9-percentage-point improvement in Dual High Affinity over prior baselines while maintaining competitive molecular quality, outperforming methods that modify the diffusion process.

dual-target molecules · diffusion models · multi-objective optimization · input-space search · evolutionary framework

Read original →

PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

arXiv cs.LG · Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri · 2026-05-25

The work characterizes the optimal sample complexity of multiclass PAC learning with bandit feedback in the realizable setting, sharp up to logarithmic factors. It introduces the bandit DS dimension, a combinatorial measure based on generalized pseudo-boxes that aggregates neighbor counts across coordinates, contrasting with the DS dimension's coordinate counting. A ListCascade-based algorithm achieves the derived upper bound, connecting bandit learning to list learning. Theoretical results show sample complexity scaling with total neighbor counts rather than coordinate-wise structure.

pac learning · bandit feedback · ds dimension · sample complexity · realizable setting

Read original →

Stochastic Estimation of the Layer-wise Hessian Trace for Monitoring Neural-network Training

arXiv cs.LG · Maxim Bolshim, Alexander Kugaevskikh · 2026-05-25

The authors propose a stochastic estimator for the layer-wise trace of the Hessian matrix during neural-network training, addressing the inaccessibility of explicit curvature information in large models (P∼10^6–10^8). The method combines Hutchinson's stochastic trace estimator with a single Hessian-vector product, enabling unbiased per-layer trace estimates via one backward pass. Theoretical analysis reveals weight-sharing introduces bias unless layer-wise Hessians are assembled before differentiation, and derives variance bounds leading to a recommended probe count K∈[5,10]. Applied to ResNet-18/34 and VGG-11 on CIFAR-10/100, the estimator detects label memorization with 179/180 true positives at 16/120 false alarms using a cumulative-sum decision rule.

hessian trace · stochastic estimator · weight sharing · label memorization · resnet

Read original →
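
The core idea of Hutchinson's estimator, which the summary above builds on, can be sketched in a few lines: for random probes $z$ with independent ±1 entries, the average of $z^\top H z$ is an unbiased estimate of $\mathrm{tr}(H)$, and only Hessian-vector products are needed. The sketch below uses an explicit toy matrix as a stand-in for one layer's Hessian block; the function name `hutchinson_trace` and the `hvp` callback are hypothetical, and the paper's per-layer assembly and weight-sharing correction are not reproduced.

```python
import numpy as np


def hutchinson_trace(hvp, dim, num_probes=8, seed=0):
    """Estimate tr(H) using only Hessian-vector products hvp(v) = H @ v."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        estimates.append(z @ hvp(z))           # z^T H z, one HVP per probe
    return float(np.mean(estimates))


# Toy symmetric positive semi-definite matrix standing in for a layer Hessian
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
H = A @ A.T / 50.0

estimate = hutchinson_trace(lambda v: H @ v, dim=50, num_probes=8)
exact = float(np.trace(H))
```

In a real training loop `hvp` would be supplied by automatic differentiation rather than an explicit matrix, so that the Hessian is never materialized. The choice `num_probes=8` simply falls inside the K∈[5,10] range quoted above.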

Opportunistic Target Selection: Early Directional Commitment for Query-Efficient Black-Box Adversarial Attacks

arXiv cs.LG · Florent Tariolle, Florian Yger · 2026-05-25

The paper introduces Opportunistic Target Selection (OTS), a query-efficient wrapper for black-box adversarial attacks that mitigates class drift by switching untargeted attacks to targeted objectives early. OTS operates without gradient access, architectural changes, or target-class knowledge, functioning as a margin-loss surrogate. Evaluated on three score-based attacks (SimBA, Square Attack, Bandits) across five ImageNet classifiers (4,500 runs), OTS achieves up to +27 pp success rate improvement and 43% query reduction on ResNet-50 for random-search attacks, though it proves redundant for gradient-estimation or margin-loss attacks. Bimodal difficulty distributions on adversarially-trained models nullify its benefits.

black-box adversarial attacks · class drift · opportunistic target selection · query efficiency · margin-loss surrogate

Read original →

Closed-Form Node Classification with Exact Graph Unlearning

arXiv cs.LG · Aditya Gaur, Charu Sharma · 2026-05-25

The paper introduces a closed-form framework for node classification that matches or exceeds gradient-trained GNNs while enabling exact graph unlearning. For assortative graphs, it combines SGC-style propagation with Ridge regression; for heterophilous graphs, it proposes LCF-Net, a layer-wise closed-form network with Gaussian kernel-Ridge heads. Evaluated on 14 benchmarks (including ogbn-arxiv and ogbn-proteins), the method outperforms vanilla 2-layer GCN/SAGE/GAT on 9/9 datasets and ties tuned deep models within one standard deviation on 9/12 small benchmarks. The deterministic solutions permit exact unlearning for graph modifications, with 21–45× speedups over full re-solving and 10^6× over retraining, while theoretical analysis proves K-hop locality for Ridge components.

closed-form solvers · graph unlearning · ridge regression · node classification · heterophilous graphs

Read original →
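
The assortative-graph recipe described above (SGC-style propagation followed by Ridge regression) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the helper names, the toy graph, the number of hops `K`, and the regularizer `lam` are all assumptions chosen for illustration, and the unlearning updates are not shown.

```python
import numpy as np


def sgc_features(adj, X, K=2):
    """Propagate features K times with the symmetrically normalized adjacency."""
    A = adj + np.eye(adj.shape[0])                     # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    S = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    Z = X.copy()
    for _ in range(K):
        Z = S @ Z
    return Z


def ridge_fit(Z, Y, lam=1e-2):
    """Closed-form ridge solution W = (Z^T Z + lam I)^{-1} Z^T Y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ Y)


# Toy graph: two triangles joined by a single edge, one class per triangle
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
X = np.eye(6)                                          # one-hot node features
Y = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)
train = [0, 5]                                         # one labeled node per class

Z = sgc_features(adj, X, K=2)
W = ridge_fit(Z[train], Y[train])
pred = (Z @ W).argmax(axis=1)
```

Because the weights come from a deterministic linear solve rather than gradient descent, the solution after a graph or label change is fully determined by the updated `Z` and `Y`, which is the property that makes exact unlearning plausible in this family of models.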

StrTransformer: Source-Wise Structured Transformers for Unsupervised Blind Source Recovery

arXiv cs.LG · Yuan-Hao Wei · 2026-05-25

StrTransformer introduces a source-structured Transformer framework for unsupervised blind source recovery, replacing latent variable encoders with direct optimization of a latent source matrix and observation-space mixer. Each source trajectory is processed by a dedicated Transformer branch employing multi-scale patch tokens, random masking, and locality-biased attention, with structural constraints enforced via masked patch reconstruction energy. An ordered multi-scale controller promotes branch specialization through learned patch-scale weights and locality attention slopes. Theoretical analysis examines objective decoupling/coupling and symmetry reduction, while empirical results demonstrate branch convergence to distinct temporal-scale structures and source-aligned latent trajectories.

blind source recovery · structured transformers · multi-scale patches · locality-biased attention · permutation symmetry

Read original →

3D Magnetic Field Reconstruction and Mapping with Physics-Informed Neural Networks

arXiv cs.LG · Haohan Yu, Zhanxu Hao, Bingzhi Li, Zejia Lu · 2026-05-25

This study introduces a Physics-Informed Neural Network (PINN) framework for high-precision 3D magnetic field reconstruction, integrating Maxwell's equations into the loss function to enforce divergence-free and curl-free conditions. The method incorporates physics-residual losses at measurement points, ensuring physical consistency beyond random collocation. Validation achieves $10^{-4}$ reconstruction accuracy in simulations (10× improvement over benchmarks) and sub-percent relative accuracy ($10^{-3}$ level) in experimental coil assembly tests, demonstrating robust performance in restricted sensor environments.

physics-informed neural networks · magnetic field reconstruction · maxwell's equations · divergence-free · curl-free

Read original →

Reinforcement Learning from Denoising Feedback

arXiv cs.LG · Qi He, Huan Chen, Ya Guo, Huijia Zhu · 2026-05-25

The paper introduces Reinforcement Learning from Denoising Feedback (RLDF), a novel paradigm for policy loss estimation in diffusion language models (dLLMs). RLDF leverages feedback from rollout and training processes, optimizing models toward clipped clean states from intermediate noisy states with weighted timestep sampling. Experiments show RLDF improves performance and generalizability across LLaDA and Dream architectures on multiple reasoning benchmarks. The work also presents Drift, a training framework for dLLMs.

reinforcement learning · denoising feedback · diffusion language models · policy loss estimation · drift framework

Read original →

Insuring Every Action: An Authority Frontier Framework for Runtime Actuarial Control of Autonomous AI Agents

arXiv cs.LG · Hao-Hsuan Chen · 2026-05-25

The paper contributes a benchmark-ready framework for runtime actuarial control of autonomous AI agents' side-effect-bearing actions. The proposed Actuarial Action Interface (AAI) enforces deterministic runtime contracts via (i) a quote-bind-commit protocol with capability tokens, (ii) a seven-class action taxonomy for authority normalization, and (iii) pathwise reserve coverage under α-spending. Evaluated across four agentic environments (database, refunds, retail, airline), AAI exhibits domain-specific reserve demands (22x variance in Capital@50) while preventing realized loss in a live Postgres panel with three Azure-hosted models. The Authority Frontier primitive quantifies released autonomy per reserve level, revealing low-reserve refusal patterns.

actuarial action interface · authority frontier · runtime contract · reserve capital · side-effect-bearing actions

Read original →

When In-Distribution Gains Fail: Evaluating Weak-to-Strong Reward Models under Preference Shift

arXiv cs.LG · Khoi Le, Tri Cao, Phong Nguyen, Cong-Duy Nguyen · 2026-05-25

The study identifies a representational failure mode in weak-to-strong (W2S) preference learning under distribution shift, where strong models fine-tuned on weak labels fail to transfer across preference domains. To address this, the authors propose Representation Anchoring (Anchor), a regularizer that constrains representation drift during fine-tuning while permitting task-adaptive updates. Experiments across multiple preference datasets and model families show Anchor improves out-of-distribution transfer by 15-30% while maintaining in-distribution performance, revealing limitations in current W2S reward modeling paradigms.

weak-to-strong generalization · preference learning · representation anchoring · distribution shift · reward modeling

Read original →

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents

arXiv cs.LG · Bowen Wang, Dunjie Lu, Junli Wang, Tianyi Bai · 2026-05-25

CUA-Gym introduces a scalable pipeline for generating verifiable training data for computer-use agents (CUAs), addressing the bottleneck of deterministic reward construction. The method employs Generator and Discriminator agents to co-generate task instructions, environment states, and reward functions, with iterative refinement via an orchestrator and quality filtering via LLM voting and agent rollouts. The pipeline produces CUA-Gym (32,112 verified RLVR tuples across 110 environments) and CUA-Gym-Hub (mock web applications). Trained agents (A3B, A17B) achieve 62.1% and 72.6% on OSWorld-Verified, outperforming prior open-source CUAs and demonstrating transfer to WebArena.

reinforcement learning with verifiable rewards · computer-use agents · generator-discriminator pipeline · llm majority voting · gspo optimization

Read original →

Analogies between Transformer Layers and Power Method

arXiv cs.LG · Chenglong Li, Claudio Altafini · 2026-05-25

The paper establishes an analogy between transformer layer operations (projections, layer normalizations) and the power method, demonstrating that tokens progressively align with the principal eigenvector of a matrix formed by the product of output and value weight matrices. In transformers with shared weights across layers, this alignment becomes empirically pronounced and analytically tractable. The theoretical framework further enables steering transformer outputs toward arbitrary token-space directions by leveraging eigenvector properties.

transformer · power method · eigenvector · layer normalization · shared weights

Read original →
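
For readers unfamiliar with the power method that the summary above refers to, here is a minimal sketch. Repeatedly applying a fixed matrix and renormalizing drives a vector toward the principal eigenvector. The L2 renormalization is a simplified stand-in for layer normalization (no mean-centering or learned gain), and the 2×2 matrix `M` is an arbitrary toy example, not the product of output and value weights from any real model.

```python
import numpy as np


def power_iteration(M, x0, num_steps=100):
    """Apply M repeatedly with renormalization, as in the classic power method."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_steps):
        x = M @ x                  # shared linear map applied at every "layer"
        x = x / np.linalg.norm(x)  # renormalize (simplified stand-in for layer norm)
    return x


M = np.array([[2.0, 1.0], [1.0, 3.0]])       # toy symmetric matrix
x = power_iteration(M, np.array([1.0, 0.0]))

eigvals, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
principal = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
alignment = abs(x @ principal)               # cosine similarity up to sign
```

The shared-weights case mentioned in the summary corresponds to using the same `M` at every step, which is what makes the convergence easy to analyze.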

Courtroom Analogy: New Perspective on Uncertainty-Aware Classification

arXiv cs.LG · Taeseong Yoon, Heeyoung Kim · 2026-05-25

The paper proposes a courtroom analogy for uncertainty-aware classification, framing it as a structured debate among class-specific advocates. Methodologically, it introduces Mixture of Dirichlet EXperts (MoDEX), a neural architecture that models each advocate's opinion as a Dirichlet distribution with decomposed concentration parameters (shared evidence and class-specific advocacy), yielding interpretable uncertainty aggregation. Experiments show MoDEX achieves state-of-the-art uncertainty quantification performance while providing semantically meaningful uncertainty estimates.

uncertainty quantification · dirichlet distribution · interpretability · classification · neural architecture

Read original →

Towards the Connection between Activation Sparsity and Flat Minima

arXiv cs.LG · Ze Peng, Jian Zhang, Lei Qi, Yang Gao · 2026-05-25

This work establishes a theoretical connection between activation sparsity in MLP blocks of Transformers and flat minima in loss landscapes, proposing that sparsity emerges from the ratio between augmented flatness and the product of input norm and activation gradient. The authors introduce derivative sparsity, which generalizes activation sparsity under ReLU and enables backward propagation pruning. Three plug-and-play methods are proposed to encourage sparsity by manipulating this ratio. Experiments on ImageNet-1K and C4 datasets demonstrate 36% improvement in inference sparsity and 50% in training sparsity compared to vanilla Transformers, indicating significant computational cost reduction.

activation sparsity · flat minima · mlp blocks · derivative sparsity · transformers

Read original →

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

arXiv cs.LG · Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, Tomaso Poggio · 2026-05-25

The paper develops a theoretical framework for analyzing norm-constrained deep neural networks learning sparse compositional functions represented by directed acyclic graphs (DAGs). By measuring complexity via parameter norms rather than counts, the work establishes approximation rates and excess risk bounds in overparameterized regimes. Results demonstrate that deep networks avoid the curse of dimensionality by exploiting hierarchical structure, with applications to multi-index models, binary trees, and general compositional architectures. The analysis covers all efficiently Turing-computable functions through their sparse compositional representations.

sparse compositional functions · norm-constrained networks · directed acyclic graphs · approximation rates · curse of dimensionality

Read original →

Decoding Stimulus Reconstruction-Based Auditory Attention Robustly in Unbalanced EEG Datasets

arXiv cs.LG · Yuanming Zhang, Yayun Liang, Zhibin Lin, Jing Lu · 2026-05-25

This study introduces a leave-one-paired-envelope-out (LOPEO) cross-validation protocol to address inflated decoding accuracy in stimulus reconstruction-based auditory attention decoding (AAD) from EEG signals on unbalanced datasets. Using three publicly available EEG-AAD datasets (KUL, DTU, NJU cEEGrid), the authors demonstrate that deep neural networks (DNNs) tend to overestimate performance on unbalanced data. LOPEO effectively mitigates this issue, providing a robust evaluation framework for existing unbalanced datasets. The results validate LOPEO's efficacy in preventing performance overestimation, offering a principled solution for AAD research with imbalanced data.

auditory attention decoding · electroencephalogram · stimulus reconstruction · cross-validation · deep neural networks

Read original →

DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning

arXiv cs.LG · Guochao Jiang, Jingyi Song, Guofeng Quan, Chuzhan Hao · 2026-05-25

The paper introduces Dynamic Variance-adaptive Advantage Optimization (DVAO), a method for multi-reward reinforcement learning that dynamically adjusts combination weights based on empirical reward variance. DVAO addresses limitations of Reward Combination and Advantage Combination by maintaining bounded advantage magnitudes and incorporating cross-objective regularization. Experiments on mathematical reasoning and tool-use benchmarks with Qwen3 and Qwen2.5 models show DVAO outperforms baselines, achieving superior Pareto frontiers and training stability.

reinforcement learning · advantage optimization · multi-reward · dynamic variance · pareto frontier

Read original →

Generalized Evidential Deep Learning: From a Bayesian Perspective

arXiv cs.LG · Yuanye Liu, Yibo Gao, Yuanyang Chen, Xiahai Zhuang · 2026-05-25

The authors formalize Evidential Deep Learning (EDL) within a generalized Bayesian framework, providing theoretical grounding for prior specification, posterior updates, and training objectives. Their proposed Generalized Evidential Deep Learning (GEDL) unifies existing EDL variants by explicitly disentangling components and linking them to Bayesian distributional uncertainty via asymptotic analysis. Experiments show GEDL achieves comparable performance to specialized variants in classification, uncertainty estimation, and OOD detection while offering systematic extensibility.

evidential deep learning · bayesian framework · uncertainty estimation · ood detection · asymptotic analysis

Read original →
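
As context for the summary above, the standard (non-generalized) Evidential Deep Learning output layer can be sketched as follows: non-negative evidence is turned into Dirichlet concentration parameters, from which both class probabilities and a total-uncertainty score are read off. This shows the common baseline formulation only; the function name `edl_outputs` and the softplus evidence function are assumptions, and the GEDL generalization itself is not implemented here.

```python
import numpy as np


def edl_outputs(logits):
    """Map raw network outputs to Dirichlet-based probabilities and vacuity."""
    evidence = np.logaddexp(0.0, logits)           # softplus keeps evidence >= 0
    alpha = evidence + 1.0                         # Dirichlet concentration parameters
    strength = alpha.sum(axis=-1, keepdims=True)   # total Dirichlet strength
    probs = alpha / strength                       # expected class probabilities
    num_classes = logits.shape[-1]
    vacuity = num_classes / strength.squeeze(-1)   # near 1 when evidence is absent
    return probs, vacuity


logits = np.array([
    [10.0, -10.0, -10.0],    # strong evidence for class 0
    [-10.0, -10.0, -10.0],   # almost no evidence for any class
])
probs, vacuity = edl_outputs(logits)
```

The second row illustrates the behavior that makes this family useful for OOD detection: with little evidence for any class, the predicted probabilities stay near uniform while the vacuity score approaches 1.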

Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

arXiv cs.LG · Joongkyu Lee, Min-hwan Oh · 2026-05-25

The authors propose an optimal experimental design framework for multinomial logit (MNL) bandits, addressing computational intractability in combinatorial action spaces. The framework combines two approaches: (i) a 0-1 mixed-integer linear program (MILP) with solver-certified early stopping for exact or certified-approximate solutions, and (ii) a polynomial-time lifted design using a tractable surrogate objective. Near G-optimality guarantees are established via the Kiefer-Wolfowitz equivalence theorem, characterizing statistical-computational trade-offs. As an application, they develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, achieving instance-dependent sample complexity of Õ(d log N / Δ²), where d is feature dimension, N is the number of arms, and Δ is the minimum revenue gap.

multinomial logit · optimal design · mixed-integer linear program · kiefer-wolfowitz · sample complexity

Read original →

Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

arXiv cs.LG · Joongkyu Lee, Min-hwan Oh · 2026-05-25

The paper introduces DOMD-GLB, a computationally efficient algorithm for nonstationary generalized linear bandits (GLBs) using discounted online mirror descent (DOMD) for parameter estimation. Unlike prior MLE-based approaches requiring O(t) memory/computation per round, DOMD-GLB maintains O(1) costs while handling time-varying parameters via nonlinear link functions. Theoretical analysis yields dynamic regret bounds of Õ(c_μ^{-1/2}d^{3/4}P_T^{1/4}T^{3/4}) for drifting environments and Õ(c_μ^{-1/3}d^{2/3}Γ_T^{1/3}T^{2/3}) for piecewise-stationary cases, where d is feature dimension, P_T path length, and Γ_T change points. This constitutes the first GLB method with time-invariant per-round complexity.

generalized linear bandits · discounted online mirror descent · nonstationary environments · dynamic regret · computational efficiency

Read original →

Extreme Region Policy Distillation

arXiv cs.LG · Changyu Chen, Xiting Wang, Rui Yan · 2026-05-25

Extreme Region Policy Distillation (ERPD) addresses the trade-off between sample efficiency and asymptotic performance in reinforcement learning for large language models by decoupling these objectives into a two-stage framework. The first stage performs weakly constrained off-policy optimization on fixed data to extract maximal training signals, while the second stage distills these signals into the base policy under trust-region constraints to prevent harmful drift. ERPD achieves comparable or better performance with reduced KL divergence, demonstrating that much initial divergence is unnecessary. Experiments on mathematical reasoning show ERPD improves strong base models where on-policy training plateaus and reliably enhances weak teachers.

reinforcement learning · off-policy optimization · trust-region constraints · kl divergence · policy distillation

Read original →

Learning Latent Dynamical Causal Processes for Single-Cell Perturbation Prediction

arXiv cs.LG · Wenkang Jiang, Yuhang Liu, Erdun Gao, Ehsan Abbasnejad · 2026-05-25

The authors propose CITE-VAE, a latent dynamical causal generative model for single-cell perturbation prediction that jointly captures unobserved cellular programs, perturbation-conditioned mechanisms, and temporal evolution. The framework is grounded in identifiability theory, proving latent causal variables are recoverable under standard equivalence classes. Experiments on Causal-3DIdent validate the theoretical guarantees, while real-world CRISPR perturbation data demonstrate improved OOD generalization over baselines, though no specific metrics are reported.

latent causal variables · single-cell perturbation · ood generalization · dynamical causal model · identifiability analysis

Read original →

Geometric Flow Matching for Molecular Conformation Generation via Manifold Decomposition

arXiv cs.LG · Yunqing Liu, Yi Zhou, Wenqi Fan · 2026-05-25

GO-Flow introduces manifold-aware flow matching for molecular conformation generation by decomposing the process into three physically motivated subspaces: translation (linear optimal transport), rotation (geodesic flows on $SO(3)$), and conformation (entropic optimal transport). This approach aligns generative paths with molecular degrees of freedom, leveraging equivariant architectures for rotation-consistent generation. On GEOM-Drugs and GEOM-QM9, GO-Flow achieves SOTA quality, enabling high-fidelity sampling in 50 steps by learning straighter probability paths on intrinsic manifolds.

flow matching · manifold decomposition · optimal transport · equivariant architectures · conformation generation

Read original →

Rao-Blackwellized Score Matching on Manifolds

arXiv cs.LG · Divit Rawal · 2026-05-25

The paper introduces Rao-Blackwellized score matching for denoising on smooth embedded manifolds, addressing the singularity in tangent denoising targets under ambient Gaussian corruption. By conditioning on the nearest-point projection, the method derives the unique L²-optimal predictor among estimators dependent on projected observations. A small-noise expansion reveals that the canonical target equals the intrinsic Riemannian score, corrected by an explicit order-σ² term comprising intrinsic Tweedie and extrinsic curvature components. Results show exact reduction to Gaussian denoising in flat cases and simplification to scalar factors on Sᵈ, with cancellation of extrinsic corrections on S².

rao-blackwellized · denoising score matching · riemannian score · tweedie correction · weingarten operator

Read original →

RotMoLE: Enhancing Mixture of Low-Rank Experts through Rotational Gating Mechanism

arXiv cs.LG · Mengyang Sun, Maochuan Dou, Tao Feng, Dan Zhang · 2026-05-25

RotMoLE introduces a rotational gating mechanism to enhance Mixture of Low-rank Experts (MoE-LoRA) for improved representation and generalization in complex scenarios. Unlike conventional scalar reweighting, RotMoLE applies rotation transformations to selected experts, enabling superior exploitation and specialization, particularly with limited expert candidates. The method leverages low-rank structures inherent in MoE-LoRA to implement this mechanism. Empirical validation demonstrates RotMoLE's effectiveness in multi-task and multilingual training scenarios, addressing challenges in adapting Large Language Models to diverse specialized knowledge domains.

mixture of experts · low-rank adapters · rotational gating · parameter-efficient fine-tuning · multilingual training

Read original →

Learning Permutation from Structure Without Supervision

arXiv cs.LG · Ran Eisenberg, Ofir Lindenbaum · 2026-05-25

The paper introduces an entropy-adaptive Gumbel-Sinkhorn formulation for learning permutations from structural objectives without supervision. The method locally modulates temperature based on assignment uncertainty, allowing confident assignments to discretize early while preserving exploration in ambiguous regions. Experiments on sorting, jigsaw reconstruction, and routing tasks demonstrate improved training stability and permutation quality over fixed-temperature baselines, particularly for larger problem sizes and higher ambiguity.

permutation learning · gumbel-sinkhorn · unsupervised learning · doubly stochastic matrices · entropy adaptation

Read original →
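
The fixed-temperature Gumbel-Sinkhorn operator that the summary above takes as its baseline can be sketched as follows: Gumbel noise is added to a score matrix, the result is divided by a temperature, and alternating row and column normalization in log-space pushes it toward a doubly stochastic matrix. This sketch uses one global `temperature`; the paper's entropy-adaptive, locally modulated temperature is not implemented, and all parameter names and defaults are assumptions.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis (keeps dimensions)."""
    m = a.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))


def gumbel_sinkhorn(scores, temperature=0.5, num_iters=50, noise_scale=1.0, seed=0):
    """Relax a score matrix toward a doubly stochastic (soft permutation) matrix."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(1e-9, 1.0, size=scores.shape)
    gumbel = -np.log(-np.log(u))                  # standard Gumbel noise
    log_p = (scores + noise_scale * gumbel) / temperature
    for _ in range(num_iters):
        log_p = log_p - logsumexp(log_p, axis=1)  # normalize each row
        log_p = log_p - logsumexp(log_p, axis=0)  # normalize each column
    return np.exp(log_p)


# Noise-free example: scores that clearly prefer the identity permutation
scores = 2.0 * np.eye(3)
P = gumbel_sinkhorn(scores, temperature=1.0, noise_scale=0.0)
```

Lower temperatures make the output closer to a hard permutation but also make gradients sharper, which is the trade-off an adaptive temperature schedule is meant to manage.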

BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

arXiv cs.LG · Bo Zou, Chao Xu · 2026-05-25

The BC Protocol introduces a structured dual-expert dialogue method for generating high-quality chain-of-thought (CoT) data in LLM post-training, addressing limitations of crowdsourcing, solo expert writing, and RLHF. It pairs domain experts (crystallized intelligence) with knowledge engineers (fluid intelligence) to externalize implicit reasoning, guided by a Participant Aptitude Model and the 'Selection-over-Prescription' principle. In a narrative fiction experiment (n=40), BC Protocol-generated CoT significantly outperformed solo-expert CoT in reasoning naturalness (Group A mean 4.80 vs. Group B 1.30, p=2.4×10⁻⁸, Cliff's δ=1.0) across three judge models (GPT-4o, Claude Opus 4.5, Gemini 2.5 Pro).

chain-of-thought · post-training · elicitation · crystallized intelligence · participant aptitude model

Read original →

'Si'multaneous 'S'patial-'T'emporal Message Passing for Dynamic Graph Representation Learning

arXiv cs.LG · Shubhajit Roy, Anirban Dasgupta · 2026-05-25

The paper introduces SiST-GNN, a dynamic graph neural network that simultaneously processes spatial and temporal signals through unified message passing, avoiding the limitations of sequential temporal-first or spatial-first approaches. The method maintains recurrent node states to capture historical trajectories, pairs them with current features, and performs graph convolution on this temporally augmented structure. Evaluated across 14 model-dataset combinations, SiST-GNN achieves 109-277% and 68-194% relative improvements in link prediction over prior methods in fixed-split and live-update settings respectively, while also outperforming discrete-time baselines by 7-22% in node classification tasks.

dynamic graph neural networks · message passing · temporal augmentation · link prediction · node classification

Read original →

TopoAlign: Topology-Aware Visual Representation Alignment

arXiv cs.LG · Xinyuan Yan, Rita Sevastjanova, Mennatallah El-Assady, Bei Wang · 2026-05-25

TopoAlign introduces a topology-aware framework for comparative analysis of neural representations using mapper graphs from topological data analysis. The method jointly analyzes representation graphs via force-directed layout optimization, identifies local correspondences through automated structural matching, and enables motif-based queries with membrane visualizations. Evaluations on language and multimodal models demonstrate its capability to reveal structural alignment patterns missed by geometric approaches.

representation alignment · mapper graphs · topological data analysis · force-directed layout · structural matching

Read original →

A Multimodal Framework for Dementia Detection via Linguistic and Acoustic Representation Learning

arXiv cs.LG · Loukas Ilias, Dimitris Askounis · 2026-05-25

The paper proposes a multimodal deep learning framework for dementia detection that jointly models linguistic and acoustic features. Speech recordings are processed via HuBERT with attentive statistics pooling, while transcripts are encoded using BERT. An Audio-Text Fusion mechanism combines the modalities, enhanced by a MINE (Mutual Information Neural Estimation) objective to maximize mutual information. Evaluated on the ADReSS Challenge and PROCESS-2 datasets, the approach demonstrates robust performance in speech-based dementia assessment.

multimodal fusion · hubert · mine objective · attentive statistics pooling · audio-text fusion

Read original →

DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading

arXiv cs.LG · Sayak Charabarty, Souradip Pal · 2026-05-25

The paper introduces group-aware policy optimization methods for high-frequency trading on limit order books, leveraging Order-Flow-based state models and policy-gradient techniques. It proposes variants of Proximal Policy Optimization (PPO), including GRPO and GSPO, which incorporate group-normalized updates and downside-aware shaping, outperforming traditional value-based RL methods like Q-learning. Backtests on financial assets AMZN, AAPL, and GOOG demonstrate improved net average PnL, profitability, and drawdown metrics. Results validate the adequacy of Order-Flow signals as state representations and the superiority of group-aware PPO surrogates over value-based baselines in high-frequency trading scenarios.

policy-gradient methods · order-flow signals · group-normalized updates · downside-aware shaping · high-frequency trading

Read original →

From DPPs to $k$-DPPs: identifiability analysis via spectral decomposition

arXiv cs.LG · Hideitsu Hino, Keisuke Yano · 2026-05-25

This work characterizes the identifiability structure of $k$-DPPs through spectral decomposition $L=UΛU^{\top}$, contrasting it with full DPPs. The analysis reveals that $k$-DPPs exhibit fundamentally different identifiability properties: spectral parameters become identifiable only up to a common scale, and eigenspace rotations are identifiable solely through squared minors of the eigenvector matrix. The authors precisely quantify this identifiability gap via three explicit invariances (scale, sign similarity, and eigenspace rotation) and a dimension-counting theorem, demonstrating additional continuous non-identifiability when $\binom{N}{k}
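The scale invariance named in the summary can be checked numerically. A $k$-DPP with kernel $L$ assigns each size-$k$ subset $S$ a probability proportional to $\det(L_S)$, so multiplying $L$ by $c > 0$ multiplies every $k \times k$ minor by $c^k$ and leaves the normalized probabilities unchanged. The sketch below is illustrative only; the random kernel and helper name are assumptions.

```python
import numpy as np
from itertools import combinations

def k_dpp_probs(L, k):
    """Enumerate all size-k subsets and return their k-DPP probabilities."""
    n = L.shape[0]
    subsets = list(combinations(range(n), k))
    # Principal minor det(L_S) for each subset S.
    minors = np.array([np.linalg.det(L[np.ix_(s, s)]) for s in subsets])
    return subsets, minors / minors.sum()

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
L = B @ B.T + 0.1 * np.eye(4)     # random positive-definite kernel

_, p = k_dpp_probs(L, 2)
_, p_scaled = k_dpp_probs(3.0 * L, 2)   # same distribution after rescaling
```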

determinantal point processes · spectral decomposition · identifiability · elementary symmetric polynomials · eigenspace rotation

Read original →

SAE-FD: Sparse Autoencoder Feature Distillation for Continual Learning of Large Language Models

arXiv cs.LG · Mingxu Zhang, Yuhan Li, Lujundong Li, Dazhong Shen · 2026-05-25

SAE-FD introduces Sparse Autoencoder Feature Distillation for continual learning in large language models, addressing catastrophic forgetting through sparse feature space regularization. The method leverages a pre-trained Sparse Autoencoder to decompose dense activations into an overcomplete sparse basis, reducing representational entanglement and enabling targeted regularization with minimal interference to new-task learning. Evaluations on two continual learning benchmarks across three architectures demonstrate SAE-FD's superiority over existing regularization-based methods, achieving 52.70% average accuracy with -0.46 backward transfer.

sparse autoencoder · feature distillation · continual learning · catastrophic forgetting · regularization

Read original →

Guided Flow Matching for Forward and Inverse PDE Problems with Sparse Observations: Algorithm and Theory

arXiv cs.LG · Xifeng Zhang, Jin Zhao · 2026-05-25

FM4PDE introduces a flow-matching generative framework for solving forward and inverse PDE problems with sparse observations, learning joint distributions of coefficients/solutions. The method employs guided sampling via composite losses (measurement agreement, PDE residual reduction) with deterministic, stochastic, and hybrid variants, supported by theoretical error guarantees. Deterministic optimization achieves logarithmic complexity under coercivity, while adaptive stochastic guidance attains polynomial-time bounds by addressing noise-floor bias. Experiments on static/time-dependent PDE benchmarks show superior accuracy and faster inference versus diffusion models.

flow matching · sparse pde reconstruction · adaptive guidance · deterministic-stochastic hybrid · error guarantees

Read original →

Relative Repairability: A Calibration-Based Diagnostic for High-Sparsity Post-Pruning Allocation

arXiv cs.LG · Qishi Zhan, Liang He, Minxuan Hu, Ziheng Chen · 2026-05-25

The paper introduces Relative Repairability (RR), a calibration-based diagnostic for high-sparsity post-pruning allocation in neural networks. RR evaluates the residual activation distortion after channelwise variance matching repair, estimating the fraction of unrecoverable damage using unlabeled calibration data. Experiments on ResNet18, ResNet34, and VGG16 BN across CIFAR10 and CIFAR100 demonstrate RR's utility near architecture-dependent recoverability transitions, where it outperforms ERK and LAMP in specific sparsity ranges. Findings highlight the importance of allocating both retained weights and repairable damage in high-sparsity pruning.

relative repairability · high-sparsity pruning · activation distortion · channelwise variance matching · recoverability transition

Read original →

Accelerated Dynamic Importance Weighting with Versatile Divergence-Minimizing Estimators

arXiv cs.LG · Tongtong Fang, Nan Lu, Gang Niu, Kenji Fukumizu · 2026-05-25

The paper proposes Accelerated Dynamic Importance Weighting (ADIW), a unified framework for deep learning under joint distribution shift. ADIW improves efficiency over Dynamic Importance Weighting (DIW) by using lightweight projected gradient descent with warm-start initialization, and generalizes DIW to support multiple divergence-minimizing weight estimators (Kullback-Leibler, squared distance, Wasserstein-1). Theoretical convergence guarantees are provided, and empirical results show ADIW achieves state-of-the-art performance while being significantly more computationally efficient than prior methods.

importance weighting · distribution shift · divergence minimization · kernel mean matching · gradient descent

Read original →

SafetyRepro: Configuration-Conditional Rank Instability on Alignment Benchmarks

arXiv cs.LG · Yanhang Li, Zhichao Fan, Zexin Zhuang · 2026-05-25

The paper introduces SafetyRepro, a method to quantify configuration-induced rank instability in foundation-model alignment benchmarks. It proposes a finite-envelope proposition linking pairwise disagreement rates to strict ordering reversals, validated via a commit-stamped evaluation protocol. Results demonstrate that benchmark configuration choices alone can reverse pairwise safety verdicts (e.g., 'A is safer than B') across all tested benchmarks, exposing a critical failure mode in comparative evaluations.

configuration-conditional · rank instability · pairwise disagreement · alignment benchmarks · strict reversal

Read original →

JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates

arXiv cs.LG · Kai Yi, Vignesh Vivekraja, Harshit Khaitan, Steven Li · 2026-05-25

JacQuant introduces a STE-free quantization-aware training framework that learns lightweight Jacobian surrogates to model local parameter sensitivity, stabilizing and accelerating training without modifying forward quantizers. The method employs data-driven diagonal or block-diagonal surrogates compatible with common weight/activation quantizers, proving convergence for non-convex objectives and linear rates under PL conditions. Evaluated on ≤2-bit LLM benchmarks, JacQuant consistently outperforms STE-based QAT in accuracy while maintaining negligible runtime overhead under practical group sizes.

quantization-aware training · jacobian surrogate · straight-through estimator · low-precision models · non-convex optimization

Read original →

Mean-Shift PCA by Knockoff Mean

arXiv cs.LG · Mengda Li, Zeng Li, Jianfeng Yao · 2026-05-25

The paper introduces a two-stage PCA algorithm that removes mean-shift noise by deliberately adding knockoff mean-shift perturbations. Leveraging Random Matrix Theory, the authors prove that mean-shift contamination creates spectrally separable spikes while leaving the original eigenspace asymptotically invariant. The proposed method identifies and eliminates contaminated components using standard PCA operations, addressing a limitation of Robust PCA in high-dimensional regimes with mean-shift mixtures. Theoretical guarantees show spectral stability independent of mixture weights.

robust pca · mean-shift contamination · random matrix theory · spectral separation · knockoff perturbation

Read original →

From Simulation to Enaction: Post-trained language models recognize and react to their own generations

arXiv cs.LG · Asvin G., Jack Lindsey · 2026-05-25

The study demonstrates that post-trained language models implicitly recognize and adapt to their own on-policy generations, unlike pretrained models. Through entropy analysis across model families and sizes, it reveals a 3--4$\times$ reduction in on-policy output entropy compared to off-policy, linked to an internal representation of input surprise. The work also identifies distinct mechanisms for implicit (via entropy modulation) versus explicit (verbal report) recognition of on-policy contexts, with evidence from topic-specific response prefills.

post-training · on-policy · output entropy · input surprise · prefill

Read original →

Different Statistical Perspectives for Understanding Generalisation in Graph Neural Networks

arXiv cs.LG · Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar · 2026-05-25

The paper systematizes three statistical frameworks for analyzing generalization in Graph Neural Networks (GNNs). First, learning-theory approaches derive uniform convergence bounds via hypothesis class complexity and expressivity through graph isomorphism tests. Second, infinite-parameter asymptotics approximate GNNs using Gaussian processes, neural tangent kernels, or graphon operators to study stability. Third, random graph models (e.g., contextual stochastic block models) enable non-asymptotic error rate analysis via high-dimensional statistics. Each framework's key results and limitations are discussed, highlighting open questions in GNN theory.

graph neural networks · uniform convergence · neural tangent kernel · graphon operators · stochastic block model

Read original →

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

arXiv cs.LG · Zili Zhang, Chengxu Yang, Shenglong Zhang, Chenyu Wang · 2026-05-25

BigMac introduces a training pipeline for multimodal large language models (MLLMs) that breaks the Pareto frontier between compute and memory efficiency. The method nests encoder and generator computations into the original LLM pipeline, achieving O(1) activation memory complexity for these components while leaving the LLM's own activation memory complexity unchanged. This design enables simultaneous optimization of computation and memory, achieving 1.08×-1.9× training speedup over baseline systems with stable memory usage as batch size increases.

multimodal llm · pareto frontier · activation memory · nested pipeline · training speedup

Read original →

A Signal-Language Foundation Model for Broad-Spectrum Cardiovascular Assessment from Routine Electrocardiography

arXiv cs.LG · Ziqing Yu, Yuhui Tao, Jiayu Huo, Lei Pan · 2026-05-25

We introduce ECG Contrastive Language-Image Pre-training (ECGCLIP), a signal-language foundation model for broad-spectrum cardiovascular assessment from routine electrocardiography. ECGCLIP aligns ECG waveforms with expert diagnostic reports via contrastive learning, pre-trained on 2,837,962 ECG studies from 1,324,856 patients. Evaluated on 89 downstream tasks across nine external cohorts (~1.5M ECGs), ECGCLIP-R34 achieved strong performance for atrial fibrillation (PRAUC 0.900) and ST-segment elevation myocardial infarction (PRAUC 0.383), with robust generalization to rare diseases like Ebstein anomaly (PRAUC 0.253). ECGCLIP matched baseline performance with only 10% of training data, demonstrating data efficiency. Feature visualization revealed clinically meaningful representations aligned with electrocardiographic criteria.

contrastive learning · electrocardiography · signal-language · prauc · cardiovascular assessment

Read original →

Missing Pattern Recognized Diffusion Imputation Model for Missing Not At Random

arXiv cs.LG · Gyuwon Sim, Sumin Lee, Heesun Bae, Byeonghu Na · 2026-05-25

The paper introduces PRDIM, a diffusion-based imputation model addressing Missing Not at Random (MNAR) data by explicitly modeling missing patterns. It employs a pattern recognizer within an EM framework to iteratively maximize the joint distribution likelihood of observed values and missing masks. Experiments demonstrate PRDIM's superior imputation performance across diverse data modalities under MNAR conditions.

missing not at random · diffusion model · expectation-maximization · pattern recognizer · data imputation

Read original →

Rethinking Feature Alignment in Generalist Graph Anomaly Detection: A Relational Fingerprint-based Approach

arXiv cs.LG · Yujing Liu, Yixin Liu, Yu Zheng, Alan Wee-Chung Liew · 2026-05-25

The paper introduces ReFi-GAD, a generalist graph anomaly detection (GAD) method addressing feature alignment limitations in existing approaches. Current methods rely on PCA-based projection, neglecting feature semantics and causing negative transfer. ReFi-GAD employs a Relational Fingerprint (ReFi) to encode anomaly-indicative cues from contextual and structural perspectives, combined with a transformer-based encoder and SNR-guided refinement for domain adaptation. Evaluations on 14 datasets show ReFi-GAD outperforms state-of-the-art methods.

generalist anomaly detection · relational fingerprint · feature alignment · transformer encoder · snr-guided refinement

Read original →

SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning

arXiv cs.LG · Zhongling Xu, Shunan Zheng, Wei Wang · 2026-05-25

SeqRoute introduces a global budget-aware sequential LLM routing framework that treats multi-turn interactions as a finite-horizon Markov Decision Process, solved via offline reinforcement learning. It incorporates remaining budget into the state space and employs Conservative Q-Learning (CQL) to strategically allocate resources, alongside Hindsight Budget Relabeling (HBR) to expand training data by simulating trajectories under diverse budgets. A dynamic λ-sweep mechanism enables zero-shot Pareto frontier navigation. Evaluations show SeqRoute reduces operational costs by 6.0-73.5%, maintains or improves quality, and suppresses bankruptcy rates to under 1%, outperforming baselines across the Pareto frontier.

offline reinforcement learning · markov decision process · budget-aware routing · hindsight budget relabeling · pareto frontier

Read original →

Capture-Calibrate-Coach: A Graph-Based Framework for Knowledge Monitoring Estimation and Adaptive Feedback

arXiv cs.LG · Gen Li, Li Chen, Cheng Tang, Boxuan Ma · 2026-05-25

The paper introduces Capture-Calibrate-Coach (3C), a graph-based framework for metacognitive learning support that jointly estimates knowledge monitoring and delivers adaptive feedback. The method constructs a heterogeneous learner-concept graph from self-reports, infers latent perceived states via a heterogeneous GNN, and classifies learners into five metacognitive patterns for personalized coaching. Evaluation on 684 students shows 85.21% AUC in latent state prediction, while a 47-participant user study confirms the perceived utility of feedback addressing both knowledge gaps and calibration errors.

knowledge monitoring · heterogeneous graph neural network · metacognitive patterns · adaptive feedback · self-regulated learning

Read original →

Generating 3D models from sketches of human faces using a combined approach of Convolutional Neural Networks, Procedural Modeling, and Contour Mapping

arXiv cs.LG · Nancy Iskander · 2026-05-25

The authors present a method for generating 3D facial models from sketches by integrating expression detection with model generation. Their approach combines Convolutional Neural Networks (CNNs) trained on a custom dataset to detect Facial Action Coding System (FACS) Action Units, a parametric 3D face model (Valley Girl) for expression duplication, and Active Snake Contours for contour alignment. The authors describe this as the first use of CNNs for sketch-based expression detection in the literature, enabling more accurate 3D model generation that preserves facial expressions from input sketches.

convolutional neural networks · parametric 3d face model · facial action coding system · active snake contours · sketch-based modeling

Read original →

Autoregression-Free Neural Operators for Time-Dependent PDEs

arXiv cs.LG · Jiaquan Zhang, Caiyan Qin, Haoyu Bian, Libin Cai · 2026-05-25

The authors propose Autoregression-Free Neural Operators (AFNO), a novel framework for solving time-dependent partial differential equations (PDEs) without autoregressive rollout. AFNO maps PDE time evolution into a latent space and models continuous-time vector fields using flow matching, enabling stable long-horizon predictions and explicit conditioning on physical parameters. Theoretical analysis and experiments on six PDE benchmarks show AFNO reduces rollout errors and improves prediction stability compared to autoregressive baselines.

neural operators · partial differential equations · autoregressive rollout · flow matching · latent space

Read original →

EMA-Nesterov: Stabilizing Nesterov's Lookahead for Accelerated Deep Learning Optimization

arXiv cs.LG · Chung-Yiu Yau, Dawei Li, Athanasios Glentis, Valentyn Boreiko · 2026-05-25

EMA-Nesterov introduces a stabilized lookahead optimization method for deep learning by replacing Nesterov's standard lookahead direction with an exponential moving average (EMA) of parameter updates. This modification captures low-frequency trends in optimization trajectories through EMA's low-pass filtering, maintaining adaptability via geometric weighting while avoiding instability from noisy short-horizon updates. Theoretical analysis confirms accelerated convergence rates analogous to Nesterov's method in convex settings. Empirical evaluations on language model pre-training demonstrate broad applicability across optimizers like Adam, SOAP, Muon, and NanoGPT, outperforming prior lookahead methods in stability and performance.

exponential moving average · lookahead optimization · nesterov acceleration · low-pass filter · convergence rate

Read original →

A Context Augmented Multi-Play Multi-Armed Bandit Algorithm for Fast Channel Allocation in Opportunistic Spectrum Access

arXiv cs.LG · Ruiyu Li, Guangxia Li, Xiao Lu, Jichao Liu · 2026-05-25

The authors propose a context-augmented multi-play multi-armed bandit (MP-MAB) algorithm for channel allocation in opportunistic spectrum access (OSA), addressing limitations of existing methods by incorporating channel noise as a perturbation of the reward function. They model the correlation between channel state information and noise using both linear and nonlinear approaches, deriving index policies that learn these correlations via a linear model and neural network, respectively. The policies adjust the upper confidence bound using estimated noise values. Numerical experiments demonstrate reduced regret and more rational sub-optimal arm selection compared to existing methods.

multi-armed bandit · channel allocation · opportunistic spectrum access · upper confidence bound · channel noise

Read original →

ViroBench: Benchmarking Nucleotide Foundation Models on Viral Genomics Tasks

arXiv cs.LG · Dongxin Ye, Fang Hu, Han Hu, Shu Hu · 2026-05-25

ViroBench introduces the first comprehensive benchmark for evaluating nucleotide foundation models (NFMs) in viral genomics, addressing biological understanding and biosecurity risks across 18 scenarios and 4 task types. The study evaluates 66 NFMs, revealing three key findings: performance degradation under phylogenetic and temporal shifts, decoupling between statistical likelihood and biological validity in generation tasks, and the critical importance of taxonomic diversity over parameter scale in pretraining. A lightweight baseline trained on diverse data achieves a 67.5% performance gain. ViroBench provides interpretable evaluations and a reproducible framework, with datasets and code publicly available.

nucleotide foundation models · virobench · biosecurity risk · phylogenetic shift · taxonomic diversity

Read original →

Learning manifold diffusion semigroups from graph transition matrices

arXiv cs.LG · Xiuyuan Cheng, Nan Wu · 2026-05-25

(No summary returned.)

Read original →

Not only where, But when: Temporal Scheduling for RLVR

arXiv cs.LG · Jinghao Zhang, Ruilin Li, Feng Zhao, Jiaqi Wang · 2026-05-25

The paper introduces temporal scheduling of credit allocation criteria during RLVR (Reinforcement Learning with Verifiable Rewards) optimization for LLMs, arguing that dynamic scheduling of learning signals improves upon static token-level credit assignment. The method prioritizes targeted tokens early in training before gradually shifting to general optimization, using trajectory percentiles to distinguish policy behaviors. Experiments on mathematical and reasoning benchmarks show temporal scheduling yields healthier policy entropy dynamics and consistent performance gains over standard RLVR approaches.

rlvr · credit allocation · temporal scheduling · policy entropy · trajectory percentiles

Read original →

PDEInvBench: A Comprehensive Dataset and Design Space Exploration of Neural Networks for PDE Inverse Problems

arXiv cs.LG · Divyam Goel, Nithin Chalapathi, Sanjeev Raja, Aditi S. Krishnapriyan · 2026-05-25

The authors introduce PDEInvBench, a benchmark dataset for evaluating neural networks on inverse problems in partial differential equations (PDEs), addressing the gap in existing benchmarks focused on forward problems. The dataset includes time-dependent and time-independent PDE simulations with in-distribution and out-of-distribution evaluation splits. Through systematic exploration of optimization procedures, problem representations, and scaling, they find that two-stage training (supervised pre-training + test-time fine-tuning), PDE derivative features, and diverse initial conditions yield optimal performance. Results demonstrate consistent accuracy improvements from these design choices across varied physical behaviors.

pde inverse problems · benchmark dataset · neural networks · test-time training · inductive biases

Read original →

Certified Robustness from Approximate Gaussian Mixture Structures in Pretrained Latent Spaces

arXiv cs.LG · Konstantinos Emmanouilidis, Tianjiao Ding, Nghia Nguyen, Nicolas Loizou · 2026-05-25

The work introduces a framework for certifiably robust classifiers by exploiting approximate Gaussian mixture structures in pretrained latent spaces. The authors derive necessary and sufficient conditions for robust classifiers in the Gaussian mixture setting, then extend this to cases where the latent distribution is ε-close (in KL divergence) to a mixture, proving graceful degradation of certified accuracy. The method achieves state-of-the-art or competitive certified accuracy on CIFAR-10 and ImageNet while maintaining clean performance and low computational overhead, demonstrating practical certifiable robustness via approximate latent structure.

certified robustness · gaussian mixture · latent space · kl divergence · adversarial perturbations

Read original →

Parameter-Efficient CT Reconstruction via Deep Graph Laplacian Regularization

arXiv cs.LG · Veera Varuni Radhakrishnan, Chinthaka Dinesh, Qurat-ul-Ain Azim · 2026-05-25

The authors propose Deep Graph Laplacian Regularization (Deep GLR), a parameter-efficient LDCT reconstruction method combining quadratic graph regularization with lightweight CNNs in a Proximal Forward-Backward Splitting framework. Deep GLR achieves 30.70 dB PSNR on LoDoPaB-CT (6.33 dB improvement over filtered backprojection) using only 91,848 parameters trained on 1,000 samples, yielding 5.8× better parameter efficiency and 30× better data efficiency per dB than benchmarks. The learned graph bandwidth (ε=1.25) suggests interpretable priors, though a 13 dB gap remains versus SOTA methods.
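Quadratic graph Laplacian regularization has a closed-form solution in the simplest denoising case: minimizing $\|y - x\|_2^2 + \mu\, x^{\top} L x$ gives $x = (I + \mu L)^{-1} y$. The sketch below shows only that building block on a 1D chain graph with Gaussian edge weights. It omits the CT forward operator, the CNNs, and the proximal splitting loop described in the summary, and the parameter names and values are illustrative assumptions.

```python
import numpy as np

def glr_denoise(y, mu=1.0, eps=0.5):
    """Denoise a 1D signal with a quadratic graph Laplacian prior.

    Edges connect neighboring samples; eps is a (hypothetical) kernel
    bandwidth controlling how quickly edge weights decay with the
    difference between neighboring values.
    """
    n = len(y)
    W = np.zeros((n, n))
    for i in range(n - 1):
        w = np.exp(-((y[i] - y[i + 1]) ** 2) / (2 * eps ** 2))
        W[i, i + 1] = W[i + 1, i] = w
    L = np.diag(W.sum(axis=1)) - W             # combinatorial Laplacian D - W
    return np.linalg.solve(np.eye(n) + mu * L, y)

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(20), np.ones(20)])   # step signal
noisy = clean + 0.1 * rng.normal(size=40)
smooth = glr_denoise(noisy, mu=5.0)
```

Since $L$ annihilates constant vectors, the filter preserves the signal's sum while shrinking its high-variation components.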

low-dose computed tomography · graph laplacian regularization · proximal forward-backward splitting · parameter efficiency · medical imaging

Read original →

ERNIE-Image Technical Report

arXiv cs.LG · Jiaxiang Liu, Zhida Feng, Pengyu Zou, Zhenyu Qian · 2026-05-25

ERNIE-Image introduces an 8B-parameter single-stream Diffusion Transformer (DiT) for text-to-image generation, aiming to close the performance gap between open-source and proprietary models. The method employs a bottom-up pre-training pipeline combining fine-grained image categorization, dense captioning, aesthetic scoring, and hierarchical sampling, followed by top-down post-training with diversified prompts and stabilized Direct Preference Optimization (DPO). The system includes ERNIE-Image-Turbo for 8-step generation via MT-DMD distillation and a Prompt Enhancer for practical deployment. Evaluations show state-of-the-art open-source performance in instruction following, text rendering, and aesthetics, with released models and the ERNIE-Image-Aes-1K benchmark for reproducible assessment.

diffusion transformer · direct preference optimization · aesthetic assessment · instruction following · text-to-image generation

Read original →

Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers

arXiv cs.LG · Keyi Shen, Glen Chou · 2026-05-25

The paper introduces a parallelizable, differentiable reachability framework in JAX for certifying neural dynamics models and controllers in continuous- and discrete-time systems. The method unifies Taylor-model flowpipe construction with CROWN-style linear bound propagation, preserving affine dependencies while enabling GPU-batched computation and automatic differentiation. Applications include certified training for reachability-friendly models and reachability-aware sampling-based MPC with gradient refinement. Experiments on non-prehensile manipulation and quadrotor tasks (up to 72D) demonstrate certified reachable-set over-approximations under bounded uncertainty during online planning.

differentiable reachability · taylor-model flowpipe · crown-style bounds · certified training · reachability-aware mpc

Read original →

A general tensor-structured compression scheme for efficient large language models

arXiv cs.LG · Ying Lu, Peng-Fei Zhou, Qi-Xuan Fang, Pan Zhang · 2026-05-25

The paper introduces Tensor Mixture (MixT), a tensor-structured compression scheme for efficient large language models (LLMs) that replaces dense linear layers with mixtures of tensor operators. Operating generically on linear projections, MixT is applicable to Transformer-based LLMs and other dense neural mappings. Evaluated on Qwen3-8B and LLaMA2-7B, MixT preserves MMLU accuracy until model-specific boundaries, where output entropy, prediction entropy, and inter-layer geometry shift. At LLaMA2-7B's boundary, MixT reduces parameters by 47.5%, inference FLOPs by 37.1%, training FLOPs by 52.1%, and peak memory by 60.4%.

tensor-structured compression · large language models · linear projections · mmlu accuracy · inference flops

Read original →

CausalFlow: Causal Attribution and Counterfactual Repair for LLM Agent Failures

arXiv cs.LG · Akash Bonagiri, Devang Borkar, Gerard Janno Anderias, Setareh Rafatirad · 2026-05-25

CausalFlow introduces an interventional framework for diagnosing and repairing LLM agent failures through causal attribution. The method models execution traces as sequential chains, computes Causal Responsibility Scores via step-level counterfactual intervention, and generates minimally edited repairs that flip outcomes to success. Evaluated on four benchmarks (mathematical reasoning, code generation, question answering, medical browsing), CausalFlow produces validated minimal repairs with high minimality and causal-consensus scores, outperforming heuristic refinement in complex retrieval settings while enabling reliable improvement across diverse tasks.

causal attribution · counterfactual intervention · execution traces · minimal repairs · preference optimization

Read original →

UWM-JEPA: Predictive World Models That Imagine in Belief Space

arXiv cs.LG · Santosh Kumar Radha, Oktay Goktas · 2026-05-25

The Unitary World Model Joint Embedding Predictive Architecture (UWM-JEPA) introduces a density-matrix latent on a joint system-environment space with a learned unitary predictor, preserving the joint-state spectrum during rollout to prevent uncertainty dissipation. This architecture outperforms parameter-matched LSTM-JEPA baselines on a hidden-velocity indicator task, achieving 0.77 accuracy under counterfactual action sequences versus the baseline's 0.53 majority-class accuracy. UWM-JEPA is also more robust in blind rollout, losing fewer than ten points of probe R^2 at short horizons, whereas the vector-latent baselines lose forty-one and sixty-eight points. The results highlight the importance of latent geometry and predictor dynamics, not just context-encoding capacity, for JEPA world models in partially observed environments.

joint embedding predictive architecture · density-matrix latent · unitary predictor · blind rollout · counterfactual action

Read original →

Electricity Consumption Forecasting: An Approach Using Cooperative Ensemble Learning with SHapley Additive exPlanations

arXiv cs.LG · Eduardo Luiz Alba, Gilson Adamczuk Oliveira, Matheus Henrique Dal Molin Ribeiro, Érick Oliveira Rodrigues · 2026-05-25

The study proposes a cooperative ensemble learning approach (Weaker Separator Booster) for electricity consumption forecasting, combining LSTM, RF, SVR, and XGBoost with SHAP-based feature selection and GA/PSO hyperparameter optimization. Using seven years of data from two campuses of the Federal Institute of Paraná, the model achieved sMAPE of 13.90% (MAE: 1990.87 kWh) on one campus and 18.72% (MAE: 465.02 kWh) on the other, outperforming individual methods. SHAP analysis identified lagged time-series values as dominant predictors, with minimal climatic influence.
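For readers comparing the two reported metrics, the sketch below computes sMAPE and MAE using one common sMAPE definition (absolute error divided by the mean of the absolute actual and forecast values). Several sMAPE variants exist and the summary does not say which one the paper uses, so treat the formula and the toy numbers as assumptions.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent (undefined if both values are zero)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(actual - forecast) / denom)

def mae(actual, forecast):
    """Mean absolute error in the units of the data (e.g. kWh)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast))

# Made-up consumption values in kWh, purely for illustration.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(round(smape(actual, forecast), 3))   # 10.025
print(mae(actual, forecast))               # 15.0
```

sMAPE is scale-free while MAE is not, which is why a campus with a much smaller MAE can still show the larger sMAPE.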

ensemble learning · shapley values · hyperparameter optimization · electricity forecasting · smape

Read original →

When Interpretability Becomes a Liability: Adversarial Attacks on CBM Concept Layers

arXiv cs.LG · Aditya Sridhar · 2026-05-25

The paper identifies a novel vulnerability in Concept Bottleneck Models (CBMs) where adversarial attacks can manipulate concept layers to induce misclassification. It develops a theoretical framework to quantify concept-space robustness and introduces SPECTRA, a defense method using semantic perturbation-based regularization. Experiments on CUB-200-2011 show SPECTRA increases required perturbation norms from 0.46 to 4,200 while maintaining classification accuracy within 2.2% of baseline.

concept bottleneck models · adversarial attacks · interpretability · robustness regularization · semantic perturbations

Read original →

Algorithms with Polynomially-Improved Approximation Factors for the $2 \rightarrow q$ Norm, and Applications

arXiv cs.LG · Samuel B. Hopkins, Stefan Tiegel · 2026-05-24

The paper presents polynomial-time approximation algorithms for the $2 \rightarrow q$ matrix norm, achieving improved approximation factors over previous baselines. The authors develop novel techniques to surpass the $d^{1/4}$-approximation baseline, notably achieving $d^{1/8}$-approximation for the $q=4$ case. Their approach involves constructing sum-of-squares certificates, which also enables applications in robust statistics (mean/covariance estimation, regression) and clustering under $q$-th moment constraints. The results address open problems in combinatorial optimization, quantum information, and algorithmic statistics, while circumventing hardness barriers implied by the Exponential Time Hypothesis.
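For reference, the $2 \rightarrow q$ norm of a matrix $A$ is conventionally defined as the largest $\ell_q$ norm attained on the Euclidean unit sphere (the standard definition, stated here as background and not quoted from the paper):

$$\|A\|_{2 \rightarrow q} = \max_{\|x\|_2 = 1} \|Ax\|_q .$$

For $q = 2$ this reduces to the largest singular value of $A$, which is efficiently computable; the approximation question concerns $q > 2$, such as the $q = 4$ case mentioned above.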

matrix norm · approximation algorithms · sum-of-squares · robust estimation · exponential time hypothesis

Read original →

A Principled Self-Referenced Early Stopping Approach for Deep Image Prior

arXiv cs.LG · Chaoyan Huang, Cheng-Han Huang, Ismail R. Alkhouri, Rongrong Wang · 2026-05-24

We propose a principled early stopping framework for Deep Image Prior (DIP) that addresses overfitting to noisy measurements by constructing pseudo self-referenced images. Our approach leverages theoretical insights on single-reference validation, pseudo-validation estimation, and shared noise impact, enabling robust overfitting detection without requiring precise noise level estimates. Three novel algorithms are introduced for inverse imaging problems (IIPs), including natural image restoration and medical image reconstruction. Extensive experiments demonstrate consistent performance improvements over existing DIP early stopping methods across varying noise levels and types.

deep image prior · early stopping · inverse imaging problems · overfitting detection · pseudo-validation

Read original →

Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction

arXiv cs.LG · Hangxuan Li, Renjun Jia, Xuezhang Wu, Yunjie Qian · 2026-05-24

Eureka introduces an LLM-driven framework for agentic feature engineering, where features are generated as executable programs rather than static transformations. The method employs three stages: (1) a domain-expert SFT-tuned agent produces structured feature plans, (2) an LLM translates plans into Python code via chain-of-thought reasoning, and (3) a GRPO-based alignment engine optimizes code quality via dual-channel rewards. Evaluated on 7 benchmarks and Alibaba Cloud GPU demand prediction, Eureka outperforms AutoFE and LLM baselines, improving demand fulfillment by 16% and reducing resource migration by 33%.

agentic feature engineering · chain-of-thought reasoning · grpo · autofe · self-evolving alignment

Read original →

Choosing Online Experiment Designs under Interference in Ads, Recommendations, and Member-Experience Systems

arXiv cs.LG · Prashant Shekhar, Caroline Howard · 2026-05-24

The paper contributes an interference-aware experiment design framework for online systems, addressing uncertainty in exposure mechanisms like graph spillovers and temporal carryover. It formulates robust design selection over an ambiguity set, evaluating six implementable designs by worst-case planning risk, which combines exposure bias, variance, and operational cost. Theoretical guarantees include Wasserstein-distance bounds on design bias and minimax tightness under Lipschitz exposure response. Empirical evaluations on Criteo ads, Open Bandit-bts/men, and KuaiRand datasets demonstrate varying design recommendations, with robust risks ranging from 1.295 to 2.240. The framework outputs justified design choices or uncertainty shortlists based on mechanism-robust decisions.
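
The worst-case selection step can be sketched as a small minimax over a table of risks. This is a minimal illustration, assuming the planning risk for each design and candidate mechanism has already been computed; `robust_choice`, `risk_table`, and `tolerance` are hypothetical names, not the paper's interface.

```python
def robust_choice(risk_table, tolerance=0.0):
    """risk_table[design][mechanism] -> planning risk.

    Returns the design with the smallest worst-case risk, plus a shortlist of
    designs whose worst-case risk is within `tolerance` of that minimum.
    """
    worst = {d: max(by_mech.values()) for d, by_mech in risk_table.items()}
    best = min(worst, key=worst.get)
    shortlist = sorted(d for d, r in worst.items() if r <= worst[best] + tolerance)
    return best, shortlist
```

A shortlist with more than one entry corresponds to the case where the framework reports an uncertainty shortlist instead of a single design.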

interference-aware design · wasserstein distance · lipschitz exposure · robust risk · mechanism-robust

Read original →

Label-NTK Alignments and A Tighter Convergence Bound in the NTK Regime

arXiv cs.LG · Ruchirinkil Marreddy, Chaoyue Liu · 2026-05-24

The authors derive sharper convergence guarantees for neural network optimization in the Neural Tangent Kernel (NTK) regime by characterizing Label-NTK and Residual-NTK alignment, where label and residual projections onto NTK eigenvectors scale with corresponding eigenvalues. This approach yields a refined convergence bound dependent on the full NTK spectrum, significantly improving over classical worst-case results that rely on the smallest eigenvalue. Theoretical justification under mild data assumptions is provided, along with improved generalization bounds. Empirical validation on MLPs and CNNs across multiple datasets demonstrates alignment with practical training dynamics.
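
To see why alignment with the spectrum matters, consider the textbook linearized dynamics in which the residual evolves as r_{t+1} = (I - lr * K) r_t for a fixed kernel matrix K. The sketch below compares the exact residual norm, which uses the full spectrum, with the classical bound that uses only the smallest eigenvalue. It assumes lr <= 1 / lambda_max and is not the paper's bound.

```python
import numpy as np

def residual_norms(K, r0, lr, steps):
    """Exact residual norm and worst-case bound under r_{t+1} = (I - lr*K) r_t."""
    eigvals, eigvecs = np.linalg.eigh(K)   # K is assumed symmetric PSD
    coeffs = eigvecs.T @ r0                # projections of r0 onto the eigenvectors
    decay = (1.0 - lr * eigvals) ** steps  # per-direction contraction factor
    exact = np.sqrt(np.sum((decay * coeffs) ** 2))
    worst = abs(1.0 - lr * eigvals.min()) ** steps * np.linalg.norm(r0)
    return exact, worst
```

When the initial residual concentrates on large-eigenvalue directions, `exact` can be far below `worst`.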

neural tangent kernel · eigen-spectrum · convergence bound · generalization bound · label alignment

Read original →

Latent Q-Barrier Shielding for Safe In-Context Reinforcement Learning

arXiv cs.LG · Minjae Kwon, Amir Moeini, Shangtong Zhang, Lu Feng · 2026-05-24

The paper introduces Latent Q-Barrier Shielding, a method for safe in-context reinforcement learning (ICRL) that improves reward-safety tradeoffs under out-of-distribution deployment shifts. The approach learns a context representation, latent dynamics, and an ensemble cost critic before deployment, enabling action filtering or reweighting based on remaining budget and predicted future cost without test-time parameter updates. A theoretical result establishes a conditional, error-decomposed barrier-margin guarantee for budget-safe continuations. Empirical evaluation across five safe ICRL benchmarks demonstrates improved returns in four benchmarks and reduced average episode costs in all five compared to a strong baseline.
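
A rough sketch of budget-based action filtering follows. It assumes a pessimistic cost estimate of ensemble mean plus k standard deviations and a fall-back to the cheapest action; both choices, and the name `shield_policy`, are illustrative assumptions, not the paper's rule.

```python
import numpy as np

def shield_policy(action_probs, ensemble_costs, remaining_budget, k=1.0):
    """Zero out actions whose pessimistic predicted future cost exceeds the budget.

    ensemble_costs has shape (n_members, n_actions).
    """
    pessimistic = ensemble_costs.mean(axis=0) + k * ensemble_costs.std(axis=0)
    safe = pessimistic <= remaining_budget
    if not safe.any():
        # Fallback: keep only the least costly action(s).
        safe = pessimistic == pessimistic.min()
    filtered = np.where(safe, action_probs, 0.0)
    if filtered.sum() == 0.0:
        filtered = safe.astype(float)
    return filtered / filtered.sum()
```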

in-context reinforcement learning · latent dynamics · ensemble cost critic · barrier-margin · out-of-distribution

Read original →

First, do no harm: Breaking suicidogenic echo chambers in media recommendation

arXiv cs.LG · Alberto Díaz-Álvarez, Raúl Lara-Cabrera, Fernando Ortega-Requena, Víctor Ramos-Osuna · 2026-05-24

The paper introduces RankAid, a re-ranking method for recommender systems that mitigates suicidogenic echo chambers by jointly optimizing clinical safety and predictive relevance. The approach operates as an add-on layer to existing models, dynamically penalizing harmful content and promoting therapeutic items based on user vulnerability levels. Evaluation on the MovieLens 1M dataset, with risk annotations from large language models, demonstrates effective blocking of harmful recommendations during crisis periods while maintaining controlled accuracy degradation (measured by NDCG). The system allows tunable intervention severity through asymmetric hyperparameters aligned with clinical guidelines.

recommender systems · suicidogenic echo chambers · re-ranking · clinical safety · ndcg

Read original →

Quantifying Empirical Compute-Supervision Tradeoffs in RLVR

arXiv cs.LG · Ryo Mitsuhashi, Patrick Chen, Isabelle Tseng, Jasin Cekinmez · 2026-05-24

This study empirically challenges the theoretical prediction that compute scaling can compensate for imperfect supervision in reinforcement learning with verifiable rewards (RLVR). In controlled experiments on Qwen2.5 (0.5B, 1.5B) models trained with GRPO on GSM8K, the authors systematically vary verifier noise levels and compute (rollouts per prompt). Results show persistent accuracy gaps despite compute scaling, with diminishing returns and asymmetric effects: false negatives degrade performance more rapidly than false positives. These findings indicate that verifier quality and compute are not interchangeable, and that reducing false negatives matters more than pure compute scaling.
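
The experimental knobs can be sketched as a noisy verifier feeding GRPO-style group-normalized advantages. This is a minimal illustration under assumed names (`noisy_verifier`, `group_advantages`, `fp_rate`, `fn_rate`); the study's actual training setup is not reproduced here.

```python
import numpy as np

def noisy_verifier(is_correct, fp_rate, fn_rate, rng):
    """Correct answers are rejected w.p. fn_rate; wrong ones are accepted w.p. fp_rate."""
    is_correct = np.asarray(is_correct, dtype=bool)
    flip = rng.random(is_correct.shape)
    return np.where(is_correct, flip >= fn_rate, flip < fp_rate).astype(float)

def group_advantages(rewards, eps=1e-8):
    """Standardize rewards within one prompt's rollout group."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```

Note that a group whose rewards are all equal yields all-zero advantages, so such a group contributes no gradient signal regardless of how many rollouts it contains.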

reinforcement learning · verifiable rewards · compute scaling · false negatives · grpo

Read original →

Constraint-Anchored Attribution: Feasibility-Certified Counterfactuals and Bonferroni-PAC Sufficient Subsets for Neural CO Policies

arXiv cs.LG · Sohaib Lafifi · 2026-05-24

The paper introduces Constraint-Anchored Attribution (CAA), a method for explaining neural combinatorial-optimization policies via three components: (i) constraint-family decomposition using LP-relaxation duals, (ii) feasibility-certified counterfactuals via a CSP model, and (iii) Bonferroni-PAC sufficient subsets with Hoeffding testing. Evaluated on CVRPTW, Orienteering, and Flexible Job-Shop Scheduling problems, CAA achieves 96.5% and 77.2% alignment with counterfactual signals (vs 75.0% and 35.2% for gradient baselines), with exact agreement in no-gain scenarios. PAC subsets average 5.0 nodes per step (ε=δ=0.2).
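
As a back-of-the-envelope check on the PAC setting, the standard two-sided Hoeffding bound for [0, 1]-valued samples with a Bonferroni correction over several tests gives the sample count below. The paper's exact testing procedure may differ; `hoeffding_samples` is a hypothetical helper.

```python
import math

def hoeffding_samples(eps, delta, n_tests=1):
    """Samples needed so each of n_tests two-sided Hoeffding bounds
    holds with failure probability at most delta / n_tests."""
    return math.ceil(math.log(2.0 * n_tests / delta) / (2.0 * eps ** 2))
```

For example, `hoeffding_samples(0.2, 0.2)` evaluates to 29 for a single test.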

combinatorial optimization · counterfactual explanation · lp-relaxation · pac learning · constraint satisfaction

Read original →

On the Epistemic Uncertainty of Overparametrized Neural Networks

arXiv cs.LG · David Rügamer · 2026-05-24

The work investigates epistemic uncertainty in overparametrized neural networks, challenging the conventional view that it vanishes with increasing data. Through the lens of parameter non-identifiability, the authors characterize discrete and continuous sources of residual uncertainty, emphasizing that substantial parameter uncertainty persists even when the underlying function is fully identified. Focusing on one-hidden-layer ReLU networks, they analyze the posterior structure and validate theoretical insights empirically. The findings highlight the nuanced relationship between parameter uncertainty and predictive variability in overparametrized models.

epistemic uncertainty · non-identifiability · overparametrized networks · relu networks · posterior structure

Read original →

A Blended Likelihood Approach for Achieving Fairness Using Naive Bayes

arXiv cs.LG · John Arthur Junior, Abdul Lateef Yussif, Maame G. Asante-Mensah, Charles R. Haruna · 2026-05-24

The Bias Mitigating Naive Bayes (BMNB) classifier introduces fairness-awareness into Naive Bayes through a blended likelihood approach and adaptive thresholding. The in-processing stage combines group-specific and pooled likelihood estimates via a tunable parameter α, while post-processing calibrates outputs with group-specific decision boundaries. BMNB achieves Disparate Impact (DI) values of 1.000, 1.171, and 0.997 and Equal Opportunity Difference (EOD) values of -0.217, -0.226, and -0.053 on Adult, ProPublica, and Framingham datasets, respectively, maintaining computational efficiency. Ablation studies confirm the synergy of blended likelihood and adaptive thresholding.
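
The blending idea and the two reported metrics can be sketched as follows. The convex-combination form of the blend is an assumption (the summary only says the estimates are combined via α), and the metrics use a common convention in which group 1 is privileged and differences are taken as unprivileged minus privileged.

```python
import numpy as np

def blended_likelihood(group_lik, pooled_lik, alpha):
    """Assumed convex blend: alpha weights the group-specific estimate."""
    return alpha * np.asarray(group_lik) + (1.0 - alpha) * np.asarray(pooled_lik)

def disparate_impact(y_pred, group):
    """P(y_pred = 1 | group = 0) / P(y_pred = 1 | group = 1)."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() / y_pred[group == 1].mean()

def equal_opportunity_difference(y_true, y_pred, group):
    """True-positive rate of group 0 minus true-positive rate of group 1."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))

    def tpr(g):
        return y_pred[(group == g) & (y_true == 1)].mean()

    return tpr(0) - tpr(1)
```

Under these conventions, a DI of 1 and an EOD of 0 indicate parity between the groups.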

naive bayes · fairness-aware · blended likelihood · disparate impact · adaptive thresholding

Read original →

Continuous-Depth Field Theory for Transformer Patching and Mechanistic Interpretability

arXiv cs.LG · David N. Olivieri, Antonio F. Pérez Rodríguez · 2026-05-24

The paper introduces a field-theoretic framework for mechanistic interpretability in Transformers, formalizing patching interventions as source insertions in a depth-token field. By treating the residual stream as a continuous field, it models patch effects via sensitivity fields, downstream propagation via empirical Green functions, and patch selection via adjoint variational problems. Experiments on GPT-2-style models demonstrate local linearity in responses, anisotropic propagation patterns across depth and token positions, and transferable behavior via prompt-induced residual displacements. The results establish sensitivity fields and Green operators as foundational tools for systematic patching analysis.

mechanistic interpretability · residual stream · green function · sensitivity field · adjoint variational

Read original →

Data-Specific Hyper-Parameter Design: A Paradigm Shift in Reservoir Computing

arXiv cs.LG · G Manjunath, Juan-Pablo Ortega, Alma van der Merwe · 2026-05-24

The paper introduces a data-specific hyper-parameter design paradigm for reservoir computing, departing from traditional random reservoir constructions. By analyzing deterministic dynamical systems geometrically, the authors propose aligning reservoir state increments within input-determined subspaces via cone concentration, theoretically reducing ridge-regression error. For echo state networks, they develop a constructive reservoir matrix design maintaining Krylov-chain closure in relevant subspaces while controlling orthogonal mixing. Spectral diagnostics identify predictive information concentration versus spectral pollution. Experiments demonstrate consistent performance improvements over random reservoirs.

reservoir computing · echo state networks · ridge regression · krylov-chain · spectral pollution

Read original →

Personalized Federated Learning by Energy-Efficient UAV Communications

arXiv cs.LG · Shiqian Guo, Jianqing Liu, Beatriz Lorenzo · 2026-05-24

(No summary returned.)

Read original →

Evolving Causal Regulatory Networks (ECR-Net)

arXiv cs.LG · Govind Vallabhasseri Binish, Abdhul Ahadh, Rano Roy Kavanal, Arya Ukunde · 2026-05-24

ECR-Net introduces a bio-inspired framework for adaptive causal discovery by modeling data-generating processes as dynamic Gene Regulatory Networks (GRNs) rather than static graphs. The method employs evolutionary search to optimize regulatory graph topologies, using statistical property shifts as signals for environmental shocks and parsimoniously modifying causal links. This approach enables robust generalization in non-stationary systems by capturing structural adaptation mechanisms.

gene regulatory networks · causal discovery · evolutionary search · structural adaptation · non-stationary systems

Read original →

Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning

arXiv cs.LG · Ziheng Cheng, Yixiao Huang, Hanlin Zhu, Haoran Geng · 2026-05-24

The paper develops a multi-objective learning (MOL) framework for diffusion models in semi-supervised settings, where paired samples are scarce but unlabeled condition data are abundant. The method employs a two-stage procedure: first training lightweight specialist models on limited paired data, then distilling them into a generalist model via pseudo-sample generation. Theoretical analysis shows generalization bounds where paired sample complexity depends only on specialist model class complexity, extended to sequential decision-making with diffusion policies. Experiments on robotic control and image restoration validate the approach.

diffusion models · multi-objective learning · semi-supervised learning · generalization bounds · pseudo-sample generation

Read original →

Influence-Inspired Spectral Rotations for Extreme Low-Bit LLM Quantization

arXiv cs.LG · Gorgi Pavlov · 2026-05-24

The paper introduces influence-inspired spectral rotations for extreme low-bit weight-only quantization in large language models (LLMs), building on Walsh-Hadamard transform (WHT) geometry from prior theory. The method involves WHT-rotating each linear layer's weight matrix and rescaling columns by per-coordinate Walsh-basis activation energy before quantization, biasing rounding toward high-spectral-energy channels. Evaluated on decoder-only models (135M–1.5B parameters), the approach reduces WikiText-2 perplexity by 15–58% at W2A16 versus vanilla auto-round, with extensions addressing Qwen3 attention and MoE architectures. Results show device-invariant execution (PPL ±0.1) on Intel hardware, though theoretical transfer from Boolean influence remains unproven.
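
The rotate-then-quantize step can be sketched with an orthonormal Walsh-Hadamard matrix and plain round-to-nearest quantization. This simplified version omits the activation-energy column rescaling and the auto-round optimizer mentioned in the summary, and it assumes the input dimension is a power of two; all function names are hypothetical.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix via the Sylvester construction (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quantize_rtn(W, bits=2):
    """Symmetric per-row round-to-nearest quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(W / scale), -qmax - 1, qmax)
    return q * scale

def rotated_quantize(W, bits=2):
    """Quantize W in the Hadamard-rotated basis, then rotate back."""
    H = hadamard(W.shape[1])
    return quantize_rtn(W @ H, bits) @ H.T
```

Because H is orthonormal, W x equals (W H)(Hᵀ x), so the rotation changes what gets rounded without changing the layer's full-precision output.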

quantization · walsh-hadamard transform · perplexity · low-bit · llm

Read original →

Hide to Guide: Learning via Semantic Masking

arXiv cs.LG · Ruitao Liu, Qinghao Hu, Alex Hu, Yecheng Wu · 2026-05-24

We introduce Semantic Masked Expert Policy Optimization (SMEPO), a novel reinforcement learning with verifiable rewards (RLVR) method that employs fine-grained semantic masking to guide language models on reasoning-intensive tasks. SMEPO selectively masks reward-relevant semantic spans in expert traces while preserving problem-solving structure, transforming hard problems into fill-in-the-blank exercises. This approach prevents reward hacking by forcing models to reconstruct critical content rather than copying expert traces. Evaluated across math, code, and agentic search domains, SMEPO improves accuracy by up to 3.2 points over GRPO and reduces training time by up to 4.2x, demonstrating effective exploration and learning efficiency.

semantic masking · reinforcement learning · verifiable rewards · reward hacking · expert traces

Read original →

Localization then Neutralization: Gradient-guided Token Suppression against Visual Prompt Injection Attack

arXiv cs.LG · Dongpeng Zhang, Ke Ma, Yangbangyan Jiang, Gaozheng Pei · 2026-05-24

We propose Gradient Token Masking (GTM), a defense against visual prompt injection attacks on multimodal large language models. GTM localizes critical image tokens via Hidden-State Gradient Norm scoring, which is theoretically guaranteed to align with full adversarial loss gradients, and neutralizes them through masking. This method requires only a single forward-backward pass to identify and suppress a small subset of tokens, effectively disrupting adversarial attack paths. Experiments on prompt injection and multimodal jailbreak attacks demonstrate that GTM reduces attack success rates to near zero while maintaining model utility with minimal computational overhead.

gradient token masking · visual prompt injection · hidden-state gradient norm · multimodal jailbreak · adversarial loss

Read original →

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

arXiv cs.LG · Wenlong Deng, Jiaji Huang, Kaan Ozkara, Yushu Li · 2026-05-24

The authors propose trusted-direction projection, a method to mitigate reward hacking in reinforcement learning for language models by constraining gradients to a clean reference subspace. They analyze reward hacking through the geometry of parameter updates, identifying that hacking exhibits larger directional change than clean runs via dominant singular directions. Experiments on mathematical reasoning tasks demonstrate that the approach delays shortcut exploitation and maintains task performance better than unconstrained optimization.
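
Constraining a gradient to a reference subspace amounts to an orthogonal projection. The sketch below assumes the trusted directions are given as linearly independent rows; how the paper obtains them from clean runs is not shown, and `project_to_trusted` is a hypothetical name.

```python
import numpy as np

def project_to_trusted(grad, reference_dirs):
    """Project grad onto the span of the rows of reference_dirs."""
    # Reduced QR gives an orthonormal basis Q (columns) for the reference span.
    Q, _ = np.linalg.qr(np.asarray(reference_dirs, dtype=float).T)
    return Q @ (Q.T @ grad)
```

Any gradient component outside the reference span is discarded before the update is applied.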

reward hacking · reinforcement learning · language models · gradient projection · parameter updates

Read original →

Growing a Neural Network in Breadth, Depth, and Time

arXiv cs.LG · Eivinas Butkus, Kedar Garzón Gupta, Nikolaus Kriegeskorte · 2026-05-24

The authors propose a framework for jointly optimizing neural network architectures across breadth, depth, and temporal recurrence via differentiable cost terms within a recurrent convolutional network. Their method treats the network as a finite subset of an infinite lattice, applying backpropagation to balance task performance against spatial and temporal resource constraints. Results show accuracy trade-offs among the three dimensions, with emergent computational graphs adapting to task complexity and to occlusion (which increases recurrence). Notably, model recurrence steps correlate with human reaction times in object recognition (r=0.72).

differentiable architecture search · recurrent convolutional networks · resource-constrained optimization · emergent computational graphs · normative neural modeling

Read original →

Nyström Kernel Stein Discrepancy Tests

arXiv cs.LG · Florian Kalinke, Zoltán Szabó, Bharath K. Sriperumbudur · 2026-05-24

The paper establishes that Nyström-accelerated Kernel Stein Discrepancy (KSD) preserves the asymptotic properties of quadratic-time bootstrap-based goodness-of-fit (GoF) tests while reducing computational cost. By proving that the accelerated method maintains asymptotic level and local consistency, the work enables efficient GoF testing for spherical and functional data. Empirical results demonstrate statistical parity with traditional KSD tests, achieving runtime improvements without accuracy loss.

kernel stein discrepancy · nyström approximation · goodness-of-fit testing · bootstrap methods · computational efficiency

Read original →

Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

arXiv cs.LG · Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan · 2026-05-24

The rejoinder addresses critiques of the ICML 2023 Ranking Experiment, which evaluates author self-assessment in ML/AI peer review. It reframes peer review as a statistical estimation problem and proposes the Isotonic Mechanism to mitigate equity and strategic concerns. The response integrates reviewer rankings and structured metadata as complementary signals and explores a human-centered framework for peer review in the context of generative AI. The discussion emphasizes practical deployment challenges and theoretical implications for improving review processes.

peer review · statistical estimation · isotonic mechanism · reviewer rankings · generative ai

Read original →

Grow-Prune-Freeze Networks: Adaptive & Continual Learning Technique for Olfactory Navigation

arXiv cs.LG · Kordel K. France, Ovidiu Daescu · 2026-05-24

The paper introduces Grow-Prune-Freeze (GPF) networks, an adaptive continual learning framework for olfactory navigation in non-stationary environments. GPF dynamically modifies policy networks by growing/pruning/freezing layers based on world complexity, grounded in non-linear random matrix theory extensions of Pennington & Worah (2017). The method achieves 94% success in turbulent plume navigation (a partially observable benchmark) via Expected SARSA, with evidence suggesting generalization to Atari RL, image classification, and autoregressive LMs. Theoretical analysis shows preserved eigenvalue composition during layer expansion.

continual learning · olfactory navigation · random matrix theory · expected sarsa · non-stationary environments

Read original →

Learning Treatment Effects during Resource Allocation via Priority-Queue Randomization

arXiv cs.LG · JungHo Lee, Johnna Sundberg, Pim Welle, Bryan Wilder · 2026-05-24

The authors propose an experimental design framework for estimating treatment effects during resource allocation via priority-queue randomization, addressing challenges in public service programs. Their method randomizes incoming applicants into priority queues based on risk scores, allocating treatments across queues in priority order as budgets permit. They characterize identifiable causal effects: standard estimands under exogenous arrivals and local treatment effects under endogenous arrivals via queue randomization as an instrument. Additionally, they develop optimized queue-assignment designs balancing statistical efficiency with prioritization of high-need applicants, demonstrating that iid efficiency bounds remain valid despite treatment assignment dependencies. The framework is validated using data from a U.S. county housing allocation program.
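
The two-step mechanism (randomize into queues, then serve queues in priority order) can be sketched as below. The assignment weights are a made-up rule that simply skews high-risk applicants toward queue 0; the paper's optimized queue-assignment designs are not reproduced, and both function names are hypothetical.

```python
import random

def assign_queues(risk_scores, n_queues, rng):
    """Randomly assign each applicant index to a priority queue (queue 0 is served first)."""
    queues = [[] for _ in range(n_queues)]
    for applicant, risk in enumerate(risk_scores):
        # Hypothetical rule: higher risk puts more weight on earlier queues.
        weights = [
            risk ** (n_queues - 1 - q) * (1.0 - risk) ** q + 1e-9
            for q in range(n_queues)
        ]
        chosen = rng.choices(range(n_queues), weights=weights)[0]
        queues[chosen].append(applicant)
    return queues

def allocate(queues, budget):
    """Treat applicants queue by queue, in priority order, until the budget runs out."""
    treated = []
    for queue in queues:
        for applicant in queue:
            if len(treated) == budget:
                return treated
            treated.append(applicant)
    return treated
```

Because the queue draw is random given the risk score, it can serve as the instrument the summary mentions.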

treatment effects · priority queues · causal inference · resource allocation · statistical efficiency

Read original →

AME-TS: Anchored Mixture-of-Experts for Time Series Forecasting

arXiv cs.LG · Rui Wang, Renhao Xue, Ray Razi, Huan Song · 2026-05-24

AME-TS introduces a structure-guided sparse time series foundation model that improves Mixture-of-Experts (MoE) routing by aligning expert specialization with interpretable temporal structure. The method employs a lightweight regime predictor to estimate series-level descriptors (e.g., forecastability, seasonality, trend, sparsity) and maps them to a soft structural prior over experts, guiding token-level routing during training. On the GIFT-Eval benchmark, AME-TS achieves superior accuracy-efficiency tradeoffs across model scales, outperforming existing models at small scales and remaining competitive at larger scales while activating fewer parameters. Fine-tuning on the M5 dataset demonstrates more interpretable routing geometry and stable expert specialization compared to standard MoE.

mixture-of-experts · time series forecasting · sparse routing · structural prior · token-level routing

Read original →

Abduction-Deduction Entanglement: Domain Generalization via Representation Transplants

arXiv cs.LG · Kasra Jalaldoust, Elias Bareinboim · 2026-05-24

The paper introduces a domain generalization framework leveraging causal invariance through representation transplants. By factorizing predictions into abduction (inferring unobserved variables) and deduction (label prediction) maps, the method constrains valid abduction-deduction ensembles via source data. Representation transplants linearly transform representations to manipulate abduction while preserving deduction, enabling search over plausible target distributions. Theoretical analysis shows minimax-optimal target prediction under ideal optimization. Empirical results demonstrate competitive performance on domain generalization benchmarks.

domain generalization · causal invariance · representation transplant · abduction-deduction entanglement · minimax optimization

Read original →

Beyond the Frontier: Stochastic Backtracking for Efficient Test-Time Scaling

arXiv cs.LG · Dao Tran, Duc Anh Le, Ngoc Luu, Quan Pham · 2026-05-24

The paper introduces stochastic backtracking for efficient test-time scaling in language models, addressing premature pruning in existing PRM-guided methods by maintaining a persistent pool of historical prefixes. Two mechanisms are proposed: Subpool Selection (Top-N within random subpools to revive promising prefixes) and Power Backtrack Sequential Monte Carlo (SMC-style resampling with powered PRM scores). Evaluations on mathematical reasoning benchmarks show improved accuracy per token count and equivalent accuracy with fewer tokens compared to frontier-only PRM baselines.
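
Both mechanisms operate on a persistent pool of (prefix, PRM score) pairs and can be sketched in a few lines. Pool layout, parameter names, and the assumption that scores are non-negative are illustrative choices, not details taken from the paper.

```python
import random

def subpool_select(pool, n_subpools, subpool_size, top_n, rng):
    """Draw random subpools from the prefix pool and keep the top-N prefixes of each."""
    selected = []
    for _ in range(n_subpools):
        subpool = rng.sample(pool, min(subpool_size, len(pool)))
        subpool.sort(key=lambda item: item[1], reverse=True)
        selected.extend(subpool[:top_n])
    return selected

def power_resample(pool, n_samples, power, rng):
    """Resample prefixes with probability proportional to score ** power."""
    weights = [score ** power for _, score in pool]
    return rng.choices(pool, weights=weights, k=n_samples)
```

Since each subpool is a random subset, an older prefix that a global Top-N would never pick can still win within its subpool and be revived.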

test-time scaling · prm-guided search · stochastic backtracking · sequential monte carlo · mathematical reasoning

Read original →

ASTRO: Adaptive Spatio-Temporal Reinforcement Optimization for GNN Powered Anomaly Detection in Cyber Physical Systems

arXiv cs.LG · Rai Ali Yar, Umaisa Lail, Anwar Shah · 2026-05-24

The paper introduces ASTRO, a reinforcement learning-based anomaly detection framework for IIoT/CPS that dynamically optimizes decision thresholds via DQN. The method combines GNNs (for spatial sensor relations), temporal modeling, and multi-head attention (for salient time steps) to generate adaptive anomaly scores. Evaluated on SWaT and WADI datasets, ASTRO achieves F1-scores of 0.990 and 0.788 respectively, outperforming baselines by 14% on WADI's 127-device network while demonstrating consistent generalization.

anomaly detection · graph neural networks · reinforcement learning · multi-head attention · cyber-physical systems

Read original →

Theoretical Analysis of Sparse Optimization with Reparameterization, Weight Decay, and Adaptive Learning Rate

arXiv cs.LG · Huangyu Xu, Jingqin Yang, Qianqian Xu, Jiaye Teng · 2026-05-24

The paper proposes ReWA, a sparse optimization method combining reparameterization, weight decay, and adaptive learning rates to address instability in ℓ_p regularization (0

sparse optimization · ℓ_p regularization · reparameterization · adaptive learning rate · weight decay

Read original →

Blocked Gibbs meets Diffusion Transformers: Unsupervised Learning for Constraint Optimization

arXiv cs.LG · Yudong W. Xu, Wenhao Li, Xiaoyu Wang, Scott Sanner · 2026-05-24

BloGDiT introduces blocked Gibbs sampling into Diffusion Transformers for constraint optimization, addressing limitations of standard diffusion in handling discrete variables and global constraints. The method replaces joint Gaussian denoising with blocked Gaussian denoising, iteratively resampling variable blocks while annealing block sizes to enable targeted edits. Evaluated on Sudoku, Graph Coloring, Maximum Independent Set, and MaxCut, BloGDiT matches or surpasses existing methods, demonstrating the efficacy of blocked Gibbs diffusion as an inductive bias for Transformer-based constraint solving.

diffusion transformers · blocked gibbs sampling · constraint optimization · discrete variables · annealed block resampling

Read original →

PQDT: Pseudo-Query Dual Transformer for Robust Point Cloud Restoration

arXiv cs.LG · Haoqing Wu, Alexa Nawotki, Jochen Garcke · 2026-05-24

The authors propose PQDT, a Pseudo-Query Dual Transformer for robust point cloud restoration that handles diverse degradations (incompleteness, noise, outliers) through a unified architecture. The method introduces a Pseudo-Query module within a Transformer backbone, decomposing geometric translation into two cooperative stages to preserve local details while enhancing structural clarity. Experiments on curated benchmarks demonstrate state-of-the-art performance in joint completion, deformation, and denoising tasks, outperforming specialized single-task approaches. The work provides a point-only backbone for versatile 3D perception without requiring global bottleneck features.

point cloud restoration · transformer architecture · geometric translation · local detail preservation · degradation robustness

Read original →

Optimizing Multidimensional Scaling in Gini Metric Spaces

arXiv cs.LG · Cassandra Mussard, Stéphane Mussard · 2026-05-24

The authors propose Gini Multidimensional Scaling (Gini MDS), an extension of Euclidean MDS using a Gini pseudo-distance based on values and ranks with a tunable hyperparameter. This method enables flexible exploration of latent configurations for improved embedding alignment with observed dissimilarities. Experiments on 16 UCI datasets with outliers and noisy MNIST images demonstrate Gini MDS's robustness, outperforming standard Euclidean MDS. The implementation uses PyTorch for GPU acceleration and is reported to be more computationally efficient than sklearn's MDS.

gini multidimensional scaling · pseudo-distance · latent configurations · euclidean mds · gpu acceleration

Read original →

Inference-Time Alignment of Diffusion Models via Trust-Region Iterative Twisted Sequential Monte Carlo

arXiv cs.LG · Weixin Wang, Yu Yang, Wei Deng, Pan Xu · 2026-05-24

The paper introduces Trust-Region Iterative Twisted Sequential Monte Carlo (TRI-TSMC), a method for inference-time alignment of diffusion models without weight updates. It addresses limitations of existing Sequential Monte Carlo (SMC) approaches—such as weight degeneracy and high-variance estimates—by learning twisting functions via a trust-region framework with closed-form KL-constrained updates and weighted maximum-likelihood projections. Theoretical analysis shows optimal twisting yields zero-variance sampling, while empirical results demonstrate improved alignment in discrete diffusion text generation and text-to-image tasks under fixed inference budgets.

sequential monte carlo · diffusion models · inference-time alignment · twisting functions · trust-region optimization

Read original →

Trust-Aware Joint Feature-Prediction Discrepancy for Robust Domain Adaptation

arXiv cs.LG · Xi Ding, Lei Wang, Syuan-Hao Li, Yongsheng Gao · 2026-05-24

The authors propose trust-aware domain adaptation, introducing Joint Feature-Prediction Discrepancy (JFPD) to jointly model domain divergence in feature and prediction spaces while weighting contributions by sample-specific trust. Trust is quantified via two mechanisms: uncertainty-aware trust based on prediction entropy and semantic-alignment trust derived from prototype similarity. JFPD prioritizes confident, semantically consistent samples and suppresses noisy ones, providing reliability-aware domain discrepancy estimates. Integrated into a training objective, JFPD guides adaptation toward trustworthy target-domain regions. Experiments on standard benchmarks show superior adaptation performance and discrepancy estimates correlating with target-domain error, addressing trust modeling in feature-prediction interaction for domain adaptation.
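
The two trust signals can be sketched as simple per-sample scores. The normalizations chosen here (one minus normalized entropy, and cosine similarity mapped to [0, 1]) are assumptions made for illustration; the paper's exact definitions may differ.

```python
import numpy as np

def uncertainty_trust(probs):
    """1 - normalized entropy: near 1 for one-hot predictions, 0 for uniform ones."""
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    entropy = -(probs * np.log(probs)).sum(axis=1)
    return 1.0 - entropy / np.log(probs.shape[1])

def semantic_trust(features, prototypes, probs):
    """Cosine similarity between each feature and its predicted class prototype, in [0, 1]."""
    features = np.asarray(features, dtype=float)
    prototypes = np.asarray(prototypes, dtype=float)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    predicted = np.asarray(probs).argmax(axis=1)
    cos = (f * p[predicted]).sum(axis=1)
    return 0.5 * (cos + 1.0)
```

A per-sample weight could then be formed from the two scores, for example by multiplying them, before computing the discrepancy.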

domain adaptation · discrepancy estimation · uncertainty-aware trust · semantic-alignment trust · feature-prediction interaction

Read original →

Courant: a State-Adaptive Perceiver-Based Neural Surrogate with Local Support and Interpretable Field Decomposition

arXiv cs.LG · Anuj Kumar, Josiah Bjorgaard, Nikolaos Bouklas, Matteo Salvador · 2026-05-24

The authors propose Courant, a Perceiver-based neural surrogate model featuring state-adaptive latent queries and local support in physical space, mimicking adaptive hp-refinement in numerical solvers. The architecture employs shared random Fourier feature embeddings, lightweight decoding, and trains end-to-end with L_2 loss on steady/transient simulation data. Results show competitive accuracy, with interpretable latents exhibiting multiscale geometric specialization and coherent structure tracking in time-dependent cases, enabling geometry-anchored field decomposition.

perceiver-based · hp-refinement · fourier feature · latent queries · field decomposition

Read original →

Counterfactually Safe Reinforcement Learning

arXiv cs.LG · Jingyi Li, Peng Wu, Chengchun Shi · 2026-05-24

The authors propose a counterfactual safety framework for reinforcement learning that minimizes individual harm while maximizing expected return. They formalize individual harm as the event where an action yields a strictly worse outcome than a baseline alternative and introduce a two-stage procedure for learning harm-aware policies. Theoretical analysis establishes finite-sample properties, derives an upper bound on sub-optimality gap, and demonstrates controlled harm rates. Empirical evaluation on simulated and real-world datasets validates the approach's effectiveness in balancing safety and performance.

reinforcement learning · counterfactual safety · individual harm · sub-optimality gap · finite-sample properties

Read original →

Revisiting Pre-Propagation GNNs: Robust Diffusion Operators and Hidden-State Re-Propagation

arXiv cs.LG · Zichao Yue, Zhiru Zhang · 2026-05-24

The paper introduces robust graph diffusion operators and a few-shot hidden-state re-propagation scheme to enhance pre-propagation GNNs (PPGNNs). PPGNNs decouple feature propagation from transformation, enabling efficient mini-batch training on dense compute accelerators but lag behind message-passing GNNs in accuracy, particularly on heterophilic graphs. The proposed methods bridge this gap, matching message-passing GNN accuracy while preserving training efficiency, as validated on standard benchmarks.
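
The decoupling that defines pre-propagation GNNs can be sketched as a one-off feature computation, after which any MLP can be trained on independent rows. The sketch uses the common symmetric-normalized adjacency with self-loops as the diffusion operator; the paper's robust operators and its re-propagation scheme are not shown.

```python
import numpy as np

def prepropagate(adj, X, hops):
    """Concatenate [X, A_hat X, A_hat^2 X, ...] where A_hat is the
    symmetric-normalized adjacency matrix with self-loops."""
    A = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    feats, H = [X], X
    for _ in range(hops):
        H = A_hat @ H
        feats.append(H)
    return np.concatenate(feats, axis=1)
```

Since the graph is consulted only in this preprocessing step, mini-batches can be drawn as plain rows of the resulting matrix.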

pre-propagation gnns · graph diffusion operators · heterophilic graphs · hidden-state re-propagation · mini-batch training

Read original →

Uncertainty-DTW for Sequences and Visual Tokens

arXiv cs.LG · Lei Wang, Syuan-Hao Li, Yongsheng Gao, Piotr Koniusz · 2026-05-24

We introduce uncertainty-DTW (uDTW), a probabilistic framework for aligning structured data that models pairwise correspondences with heteroscedastic uncertainty. uDTW employs a Maximum Likelihood Estimate objective combining precision-weighted matching to suppress unreliable features and log-variance regularization to prevent degenerate solutions. This approach generalizes from temporal sequences to tokenized visual representations, enabling structured matching over visual tokens while providing interpretable uncertainty estimates. Evaluations across diverse domains demonstrate consistent improvements over state-of-the-art methods, with learned uncertainty correlating with semantic importance. The framework establishes uncertainty-aware alignment as a robust and interpretable method for learning from structured data.
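
The combination of precision-weighted matching and log-variance regularization can be sketched as a DTW whose per-pair cost is d_ij / σ²_ij + β log σ²_ij. This simplified version uses a hard minimum over warping paths and takes the variances as given; in the paper they are learned, and the exact objective may differ.

```python
import numpy as np

def udtw(x, y, sigma2, beta=1.0):
    """DTW over 1-D sequences with cost (x_i - y_j)^2 / sigma2_ij + beta * log(sigma2_ij)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cost = (x[:, None] - y[None, :]) ** 2 / sigma2 + beta * np.log(sigma2)
    n, m = cost.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

The log term is what stops the variances from growing without bound: inflating σ² shrinks the matching cost but is charged through β log σ².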

heteroscedastic uncertainty · dynamic time warping · visual tokens · maximum likelihood estimate · structured matching

Read original →
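The objective summarized above (precision-weighted matching plus a log-variance penalty) can be sketched with a classic hard-minimum DTW recursion. This is a minimal sketch under stated assumptions: the per-step variances are given rather than learned, the pairwise variance is taken as the mean of the two step variances, and the hard minimum stands in for whatever relaxation the paper uses. The name `udtw_cost` is hypothetical.

```python
import math

def udtw_cost(x, y, var_x, var_y):
    """DTW over scalar sequences with local cost d^2 / var + log(var)."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed pairing rule: average the two per-step variances.
            var = 0.5 * (var_x[i - 1] + var_y[j - 1])
            # Precision-weighted squared distance plus log-variance regularizer.
            local = (x[i - 1] - y[j - 1]) ** 2 / var + math.log(var)
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

if __name__ == "__main__":
    x, y = [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]
    print(udtw_cost(x, y, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]))   # unit variance: plain DTW
    print(udtw_cost(x, y, [1.0, 25.0, 1.0], [1.0, 1.0, 1.0]))  # outlier step down-weighted
```

With unit variances the cost reduces to ordinary squared-distance DTW. Raising the variance of an unreliable step shrinks its contribution, while the `log(var)` term keeps the variance from growing without bound.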

Leveraging Gauge Freedom for Learning Non-Gradient Population Dynamics of Stochastic Systems

arXiv cs.LG · Jules Berman, Tobias Blickhan, Benjamin Peherstorfer · 2026-05-24

We introduce Non-Gradient Inference Flows (NGIF), a method for inferring non-gradient population dynamics in stochastic systems by leveraging gauge freedom in vector field selection. NGIF employs a weak formulation of the continuity equation to parameterize general vector fields, enabling criteria beyond minimal kinetic energy. Experiments on low- and high-dimensional physics problems demonstrate that NGIF improves distributional accuracy and better captures non-potential transport compared to gradient-restricted baselines.

population dynamics · gauge freedom · continuity equation · vector fields · kinetic energy

Read original →

RECTOR: Priority-Aware Rule-Based Reranking for Compliance-Aware Autonomous Driving Trajectory Selection

arXiv cs.LG · Hadi Hajieghrary, Benedikt Walter, Chaitanya Shinde, Paul Schmitt · 2026-05-24

RECTOR introduces a rule-based reranking layer for autonomous driving trajectory selection, enforcing a tiered priority of safety > legal > road > comfort constraints via differentiable proxies and scene-conditioned applicability. The method employs a deterministic ε-lexicographic rule to preserve cross-tier priorities without retraining the underlying predictor. Evaluated on the Waymo Open Motion Dataset (43,219 instances, K=6), RECTOR reduces safety+legal violations from 28.58% to 20.42% and total violations from 40.32% to 32.41% compared to confidence-only selection, demonstrating robustness under adversarial confidence corruption (∼96% rejection rate).

trajectory selection · lexicographic optimization · differentiable proxies · rule-based reranking · autonomous driving

Read original →
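One common reading of an ε-lexicographic rule, as mentioned in the RECTOR summary above, is to filter candidates tier by tier, keeping those within ε of the best score in each tier before moving to the next. The sketch below follows that reading; it is not taken from the paper, and the data layout, the tolerance value, and the final tie-break by predictor confidence are assumptions.

```python
TIERS = ("safety", "legal", "road", "comfort")  # highest priority first

def epsilon_lexicographic_select(candidates, eps=0.05):
    """Return the index of the chosen trajectory.

    candidates: list of dicts with a 'confidence' float and a 'violations'
    dict holding one non-negative score per tier (hypothetical layout).
    """
    survivors = list(range(len(candidates)))
    for tier in TIERS:
        best = min(candidates[i]["violations"][tier] for i in survivors)
        # Keep only candidates within eps of the best score in this tier.
        survivors = [i for i in survivors
                     if candidates[i]["violations"][tier] <= best + eps]
    # Lower tiers can never resurrect a candidate dropped by a higher tier.
    return max(survivors, key=lambda i: candidates[i]["confidence"])

if __name__ == "__main__":
    candidates = [
        {"confidence": 0.9, "violations": {"safety": 0.50, "legal": 0.0, "road": 0.0, "comfort": 0.0}},
        {"confidence": 0.3, "violations": {"safety": 0.00, "legal": 0.2, "road": 0.0, "comfort": 0.0}},
        {"confidence": 0.6, "violations": {"safety": 0.02, "legal": 0.0, "road": 0.1, "comfort": 0.0}},
    ]
    print(epsilon_lexicographic_select(candidates))
```

Confidence-only selection would pick the first candidate here, whereas the tiered filter discards it at the safety tier. Because the rule only reranks existing candidates, the underlying predictor does not need retraining.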

Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression

arXiv cs.LG · Munsik Kim · 2026-05-24

The paper characterizes the rate-distortion limits of KV cache compression in autoregressive language models through sequential Wyner-Ziv coding, with next-step queries as decoder side information. Empirical analysis across four models (0.5-3B parameters) reveals polynomial (not geometric) decay in next-token distribution sensitivity to context truncation, validated via power-law fits and positional encoding ablations. Theoretical results show suffix-only cache policies achieve distortion ε with window size Θ(ε^{-1/α}), where α is the power-law exponent; a block-Markov scheme matches this bound under certain conditions. Practical evaluations confirm recency-based eviction outperforms random retention by two orders of magnitude.

kv cache compression · wyner-ziv coding · autoregressive language models · power-law decay · rate-distortion

Read original →
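The window-size scaling quoted in the summary above follows from a one-line inversion if the distortion of a suffix-only cache is modeled as a pure power law in the window size. The form of D(w) and the constant C below are assumptions made for illustration; the summary states only the exponent α and the resulting Θ bound.

```latex
% Assumed power-law model of truncation distortion for a suffix window of size w
D(w) \approx C\, w^{-\alpha}, \qquad C > 0,\ \alpha > 0

% Requiring D(w) \le \varepsilon and solving for w
C\, w^{-\alpha} \le \varepsilon
\;\Longleftrightarrow\;
w \ge \left(\frac{C}{\varepsilon}\right)^{1/\alpha}
= \Theta\!\left(\varepsilon^{-1/\alpha}\right)

% For comparison, geometric decay D(w) \approx C \rho^{w} with 0 < \rho < 1 would need only
w \ge \frac{\log(C/\varepsilon)}{\log(1/\rho)}
= \Theta\!\left(\log \tfrac{1}{\varepsilon}\right)
```

For a hypothetical exponent α = 0.5, halving the target distortion ε multiplies the required window by 2^{1/α} = 4, whereas under geometric decay it would only add a constant number of tokens. That gap is why the polynomial-versus-geometric distinction matters for cache sizing.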

Security in the Fine-Tuning Lifecycle of Large Language Models: Threats, Defenses, Evaluation, and Future Directions

arXiv cs.LG · Wenjuan Li, Yitao Liu, Runze Chen, Rajkumar Buyya · 2026-05-24

This survey establishes a unified lifecycle framework for analyzing security threats and defenses in LLM fine-tuning, categorizing interventions into pre-tuning, during-tuning, and post-tuning phases. Through systematic review and cross-phase empirical evaluation on 1B-4B parameter models, it reveals scale-dependent attack dynamics (e.g., failed cross-lingual backdoor transfer) and limitations of single-phase defenses. Key findings include non-monotonic attack effectiveness across model generations and safety alignment vulnerabilities from benign samples, highlighting needs for configuration-robust and composable defenses.

fine-tuning lifecycle · cross-lingual backdoor · safety alignment · weight-editing attacks · embedding-space attacks

Read original →

QML-PipeGuard: Drift-Aware Behavioral Fingerprinting for Quantum Machine Learning Pipeline Integrity

arXiv cs.LG · Esra Yeniaras · 2026-05-24

QML-PipeGuard introduces a contract-based framework for ensuring quantum machine learning (QML) pipeline integrity against hardware drift and adversarial channel substitution. The method employs behavioral fingerprinting via tomographically structured measurements, operating in drift-monitoring and adversarial-detection modes, with theoretical guarantees (tight frame-bound C=√3 for single-qubit Pauli family) and finite-shot sample-complexity bounds. Validation on IBM Heron r2 (ibm_fez) with a two-qubit QSVM pipeline confirms detection of adversarial channels within 1.4×10⁴ shots while tolerating natural hardware drift.

quantum machine learning · behavioral fingerprinting · tomographic measurement · channel substitution · sample-complexity

Read original →

Reinforcement Learning for Laser Additive Manufacturing Scan-Order Optimisation: A Bilevel Proxy-FEA Diagnostic Framework for Reward and World-Model Diagnosis

arXiv cs.LG · Xian Wu, Haoran Li, Dongbin Zhao, Ruiyao Zhang · 2026-05-24

The paper proposes a bilevel Proxy-FEA diagnostic framework for evaluating reward functions and world models in reinforcement learning (RL) for laser additive manufacturing scan-order optimization. The method combines lightweight thermo-inspired proxies for rapid candidate generation with sparse Abaqus FEA simulations for reference validation, tested on an LDED32 stripe benchmark with ten scan strategies. Results reveal a stress-distortion trade-off, identify center_out as a robust compromise strategy, and show that current path-based proxies primarily capture distortion (U3) and correlate only weakly with FEA, highlighting the risks of proxy-only RL reward designs.

reinforcement learning · scan-order optimization · finite-element analysis · proxy metrics · thermo-mechanical objectives

Read original →

GL-LFGNN: A Global-Local Dual-branch Causal Graph Neural Network Based on Liang-Kleeman Information Flow for EEG Emotion Recognition

arXiv cs.LG · Ziyi Wang, Dongyang Kuang · 2026-05-24

The paper introduces GL-LFGNN, a global-local dual-branch causal graph neural network for EEG emotion recognition, leveraging Liang-Kleeman information flow theory to model asymmetric neural causal interactions. Unlike conventional GNNs using symmetric adjacency matrices, it quantifies directed causal strength via dynamical systems theory, integrating whole-brain connectivity with region-specific processing. Evaluated on the MEEG dataset, GL-LFGNN achieves 86.17% (Arousal) and 86.71% (Valence) accuracy with only 37K parameters, outperforming state-of-the-art models in both efficiency and interpretability.

eeg emotion recognition · liang-kleeman information flow · causal graph neural network · dynamical systems theory · global-local dual-branch

Read original →

Random Neural Network Expressivity for Non-Linear Partial Differential Equations

arXiv cs.LG · Muhammed Ali Mehmood, Lukas Gonon · 2026-05-24

This work investigates the expressivity of random neural networks (RaNNs) for approximating solutions to non-linear partial differential equations (PDEs). The authors derive error bounds for RaNN approximations to time-dependent Sobolev functions, achieving a dimension-free approximation rate of 1/2 for sufficiently regular functions. Theoretical results are applied to Porous Medium Equations and Compressible Navier-Stokes Equations, demonstrating RaNNs' capability to approximate solutions efficiently. Numerical experiments validate the derived convergence rates, extending their applicability beyond the theoretical setting.

random neural networks · non-linear pdes · sobolev functions · error bounds · porous medium equations

Read original →

Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

arXiv cs.LG · Ayush K. Varshney, Konstantinos Vandikas, Šarūnas Girdzijauskas, Adam Orucu · 2026-05-24

Neuron-Level Mixed-Precision Quantization-Aware Training (NMP-QAT) introduces adaptive precision allocation at the neuron level, enabling independent learning of discrete precision per neuron during training. The method employs differentiable surrogates and straight-through estimators to expand bit-width only when training signals necessitate, while maintaining a fully discrete inference graph. NMP-QAT adapts both weights and activations, reducing memory movement. Evaluated on telecom and non-telecom datasets across MLP and tabular foundation models, it achieves superior compression-accuracy trade-offs compared to mixed-precision QAT baselines, making it suitable for Green AI deployments on resource-constrained 6G edge devices.

quantization-aware training · mixed-precision · neuron-level · straight-through estimator · 6g edge devices

Read original →
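To make the per-neuron precision idea in the NMP-QAT summary above more concrete, the sketch below applies symmetric uniform quantize-dequantize to each neuron's weight row with its own bit-width. It is a minimal sketch under stated assumptions: the bit-widths are fixed inputs rather than learned, activations are ignored, and the straight-through estimator is shown only as the identity backward rule. All function names are hypothetical and none of this is the authors' implementation.

```python
def fake_quantize(weights, bits):
    """Symmetric uniform quantize-dequantize of one neuron's weights."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels
    # Round to the nearest integer level, clip, then map back to real values.
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]

def quantize_layer(weight_rows, bits_per_neuron):
    """Apply an independent bit-width to each neuron (one row per neuron)."""
    return [fake_quantize(row, b) for row, b in zip(weight_rows, bits_per_neuron)]

def ste_backward(upstream_grad):
    """Straight-through estimator: treat round() as identity in the backward pass."""
    return list(upstream_grad)

if __name__ == "__main__":
    layer = [[0.5, -1.0, 0.26], [0.5, -1.0, 0.26]]
    for row in quantize_layer(layer, [2, 8]):
        print(row)
```

Running the example shows the same weights reproduced coarsely at 2 bits and closely at 8 bits, which is the trade-off a learned per-neuron allocation would navigate.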

Multimodality Stacking with Blockwise missing values and application to the PIONeeR biomarkers study for prediction of resistance to immunotherapy

arXiv cs.LG · Mohamed Boussena, Florence Monville, Jacques Fieschi-Meric, Frederic Vely · 2026-05-24

The study introduces Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that handles incomplete multimodal datasets by independently modeling modality-specific features before aggregating predictions via cross-validated stacking. Validated on the PIONeeR study (n=443 patients, 378 biomarkers across 8 sources), MSB outperformed baselines in predicting progression-free survival for NSCLC patients under immunotherapy, with C-index improvements of 15.9% for linear models, 5.4% for random survival forests, and 2.1% for gradient boosting (all p<0.05). MSB also reduced generalization gaps (train-test difference: 0.055 vs 0.380) and identified key predictive biomarkers without bias from missing data patterns.

multimodal stacking · blockwise missingness · survival analysis · late-fusion · biomarker integration

Read original →

TRACE: A taxonomy-grounded synthetic dataset for teaching-program generation and session interpretation in Applied Behavior Analysis

arXiv cs.LG · Festus Kahunla · 2026-05-24

The paper introduces TRACE (Taxonomy-Referenced ABA Clinical Examples), a synthetic 2,999-example instruction-tuning dataset for Applied Behavior Analysis (ABA), addressing the lack of publicly available clinical data due to HIPAA restrictions. TRACE covers two ABA tasks: teaching-program generation (Discrete Trial Training, Natural Environment Teaching, Task Analysis) and multi-session behavioral interpretation (12 trajectory patterns, 13 target behaviors). Examples are generated deterministically via a taxonomy-driven method grounded in ABA literature, with full provenance tracking. The dataset is released under CC BY-NC 4.0 (data) and MIT (code), with stratified splits (2,549 train, 149 validation, 281 test, 20 sanity).

synthetic dataset · instruction-tuning · applied behavior analysis · taxonomy-driven generation · clinical documentation

Read original →

MimirRAG: A Multi-Agent RAG Framework for Financial Data Retrieval with Metadata Integration

arXiv cs.LG · Magnus Samuelsen, Wilmer Nyström, Somnath Mazumdar, Mansoor Hussain · 2026-05-24

The paper introduces MimirRAG, a multi-agent RAG framework for financial data retrieval. The system combines structure-preserving PDF parsing, metadata integration, table-aware chunking, hybrid search, and context-aware generation with numerical reasoning, coordinated through agentic workflows. Evaluation on FinanceBench shows 89.3% accuracy, outperforming baselines, and expert validation emphasizes trust calibration and user personalization. The study identifies metadata integration, table-aware chunking, and agentic workflows as the key enablers for effective financial RAG systems.

retrieval-augmented generation · metadata integration · table-aware chunking · agentic workflow · numerical reasoning

Read original →

A perspective on fluid mechanical environments for challenges in reinforcement learning

arXiv cs.LG · Shruti Mishra, Michael Chang, Vamsi Spandan, Shmuel M. Rubinstein · 2026-05-24

The paper proposes fluid mechanics as a testbed for reinforcement learning (RL) in high-dimensional, nonstationary environments, focusing on nonlinear instabilities like droplet breakup and rogue waves. It introduces two RL problem formulations with specified state/action spaces and reward functions, leveraging preserved invariances in fluid dynamics. The authors demonstrate environment generation using Dedalus for stationary navigation tasks, suggesting future work on RL for industrial/scientific flow challenges.

reinforcement learning · fluid mechanics · nonlinear instabilities · nonstationary environments · dedalus

Read original →

Convex-Neural RRT*: Fast and Reliable Learning-Guided Sampling for High-Quality Robot Path Planning

arXiv cs.LG · Hichem Cheriet, Badra Khellat Kihel, Samira Chouraqui, Bara J. Emran · 2026-05-24

Convex-Neural RRT* introduces neural-guided sampling for high-quality robot path planning by predicting informative waypoint regions and extracting convex candidate regions to focus exploration. The method combines neural network predictions with geometric constraints, preserving global exploration while improving efficiency. Evaluated against Neural RRT*, Neural Informed RRT*, RRT*, and LTA* across 18 benchmark maps, it reduces computation time by 30-75% versus neural-guided variants and by 88-98% versus LTA*, and achieves a 5% average path length reduction over classical RRT* with a 99% success rate across obstacle densities.

sampling-based planning · neural guidance · convex regions · rrt* · robot navigation

Read original →

Metropolis-Scale Resilient and Trustworthy Traffic Flow Inference Using Multi-Source Data

arXiv cs.LG · Qishen Zhou, Yifan Zhang, Michail A. Makridis, Anastasios Kouvelas · 2026-05-24

The Task-Aware Attentive Neural Process (TA-ANP) is introduced as a unified probabilistic framework for resilient and trustworthy global traffic state inference (GTSI) by fusing floating car data (FCD) with sparse fixed-detector measurements. TA-ANP leverages neural processes for rapid adaptation to sensing configuration changes and employs a task-aware multi-query attention module to handle three GTSI sub-tasks while mitigating cross-task interference. Uncertainty is quantified using Monte Carlo Dropout for both aleatoric and epistemic uncertainty. Evaluated on the Metropolitan Multi-Source Traffic Dataset (MMTD) with 2,371 road segments, TA-ANP achieves state-of-the-art performance across sub-tasks and demonstrates superior resilience in sensing lifecycle scenarios.

neural processes · uncertainty quantification · traffic state inference · multi-source data · monte carlo dropout

Read original →

Mitigating Gradient Pathology in PINNs through Aligned Constraint

arXiv cs.LG · Yichen Luo, Peiyu Zhu, Dongxiao Hu, Jia Wang · 2026-05-24

The paper proposes Constraint-Aligned loss with Manifold Lifting (CAML) to mitigate gradient pathology in Physics-Informed Neural Networks (PINNs). By reformulating zeroth-order terms into aligned constraints and introducing a delay factor to bypass high-curvature regions, CAML resolves gradient conflicts between PDE residuals and boundary constraints. Experiments show CAML improves numerical stability and training efficiency for complex PINN problems, outperforming adaptive weighting and hard constraint methods. The method is supported by systematic analysis of gradient pathology through loss landscape and optimization dynamics perspectives.

physics-informed neural networks · gradient pathology · partial differential equations · constraint alignment · loss landscape

Read original →

Scaling up Energy-Aware Multi-Agent Reinforcement Learning for Mission-Oriented Drone Networks with Individual Reward

arXiv cs.LG · Changling Li, Ying Li · 2026-05-24

We propose an energy-aware multi-agent reinforcement learning (MARL) model for mission-oriented drone networks, addressing dynamic environments and limited battery capacity. The model leverages Deep Q-Networks (DQN) with individual reward functions based on task execution progress and remaining battery levels. Simulation studies demonstrate that the model achieves a success rate of at least 80% across task locations and lengths, scaling robustly with environment size and number of agents. Compared to shared-reward MARL, our approach improves energy efficiency and success rates, reaching nearly 100% success at 40% task density.

multi-agent reinforcement learning · deep q-networks · energy-aware · drone networks · individual reward

Read original →

Selective Test-Time Compute Scaling for Click-Through Rate Prediction via Uncertainty-Triggered Feature Path Exploration

arXiv cs.LG · Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu · 2026-05-24

The paper proposes UTTSI, a training-free framework for Click-Through Rate (CTR) prediction that dynamically scales test-time compute based on per-instance uncertainty. The method combines model logit confidence with data-level frequency priors to distinguish epistemic uncertainty, then applies adaptive feature filtering and stochastic feature-path exploration for uncertain instances, aggregating predictions via consistency-weighted ensembling. Experiments across four datasets and three architectures show statistically significant improvements over baselines, with a 5.3% relative CTR gain in online A/B testing, while maintaining average overhead at 2.8× base model cost.

click-through rate prediction · test-time compute · uncertainty estimation · adaptive feature filtering · consistency-weighted ensembling

Read original →

Self-Balancing Gradient Allocation for Heterogeneity-Aware Feature Generation in Click-Through Rate Prediction

arXiv cs.LG · Moyu Zhang, Yun Chen, Yujun Jin, Jinxin Hu · 2026-05-24

HeteGenCTR addresses generative difficulty imbalance in click-through rate prediction by introducing per-field learnable difficulty parameters jointly trained with the denoising network. The method employs a self-balancing loss that reallocates gradient budget toward harder fields and a difficulty-guided attention mechanism that suppresses easy fields while amplifying cross-field information flow. Both components utilize the same learned signal, maintaining consistency throughout training. Experiments on five CTR benchmarks and a seven-day online A/B test show statistically significant improvements over state-of-the-art baselines, particularly benefiting cold-start and long-tail users.

click-through rate prediction · generative difficulty imbalance · self-balancing loss · difficulty-guided attention · cold-start users

Read original →

Learning, locomotion, and navigation of soft synthetic snakes in three-dimensional, heterogeneous environments

arXiv cs.LG · Xiaotian Zhang, Ali Albazroun, Tixian Wang, Songyuan Cui · 2026-05-24

The study presents a computational framework for enabling soft synthetic snakes to navigate unstructured 3D terrains, combining bio-inspired actuation and sensing models with reinforcement learning. Locomotion primitives are first trained in homogeneous environments, then composed into adaptive strategies for complex landscapes. The method demonstrates robustness in high-fidelity 3D environments reconstructed from real-world imaging, achieving reliable navigation for continuum systems.

soft robotics · reinforcement learning · continuum systems · bio-inspired actuation · 3d navigation

Read original →

Benchmarking non-conformity score functions in conformal prediction

arXiv cs.LG · Sol Erika Boman · 2026-05-24

The paper benchmarks non-conformity score functions in conformal prediction, addressing a gap in comparative analysis. It reviews existing score functions, proposes novel modifications, and introduces an evaluation method for prediction set sizes. Experiments compare score functions' efficacy, particularly in class-conditional conformal prediction with imbalanced datasets. Results demonstrate variability in prediction set sizes across different score functions, highlighting their impact on conformal prediction's utility.

conformal prediction · non-conformity score · prediction sets · model calibration · imbalanced classes

Read original →
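For readers unfamiliar with the setup benchmarked above, the sketch below shows standard split conformal prediction with the simplest non-conformity score, one minus the predicted probability of the true class. It is a generic textbook construction, not one of the paper's proposed score functions; the function names are hypothetical.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold targeting 1 - alpha marginal coverage."""
    # Non-conformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return float("inf")  # too few calibration points: include every class
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All classes whose non-conformity score does not exceed the threshold."""
    return [k for k, p in enumerate(probs) if 1.0 - p <= threshold]

if __name__ == "__main__":
    true_class_probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    cal_probs = [[p, 1.0 - p] for p in true_class_probs]
    cal_labels = [0] * len(cal_probs)
    q = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
    print(q, prediction_set([0.7, 0.25, 0.05], q))
```

Swapping in a different score function changes only the two places where `1.0 - p` appears, and the size of the resulting sets is the quantity such a benchmark compares.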

Large Language Model Selection with Limited Annotations

arXiv cs.LG · Yavuz Durmazkeser, Patrik Okanovic, Andreas Kirsch, Torsten Hoefler · 2026-05-24

The paper introduces SELECT-LLM, the first framework for active model selection of Large Language Models (LLMs) with limited annotations. The method selects informative queries based on expected information gain derived from pairwise similarities between candidate model outputs, requiring no architectural assumptions or weight access. Evaluated across 23 datasets, 156 models, and diverse tasks, SELECT-LLM reduces annotation costs by up to 81.8% for best model selection and 84.78% for near-best selection, outperforming all baselines.

large language models · active learning · model selection · information gain · annotation efficiency

Read original →

Bridging the Gap: Enabling Soft Actor Critic for High Performance Legged Locomotion

arXiv cs.LG · Gianluca Sabatini, Chenhao Li, Marco Hutter · 2026-05-24

This work bridges the performance gap between Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) for legged robot locomotion by introducing targeted algorithmic modifications. The proposed enhancements include policy initialization strategies, timeout-aware critic targets, and multi-step return estimation, enabling stable large-scale SAC training. Evaluated across multiple legged robot platforms and diverse locomotion tasks, the modified SAC achieves parity with PPO's empirical performance while maintaining its off-policy advantages for sim-to-real transfer and online adaptation.

soft actor-critic · proximal policy optimization · sim-to-real transfer · legged locomotion · multi-step return

Read original →
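Two of the modifications named in the summary above, timeout-aware critic targets and multi-step returns, are often combined as in the sketch below: the n-step sum stops at the first episode boundary, and the bootstrap value is dropped only when the episode truly terminated, not when it merely hit a time limit. This is a generic sketch of that convention, with SAC's entropy term omitted for brevity; the function name and argument layout are assumptions, not the paper's code.

```python
def n_step_target(rewards, terminated, truncated, bootstrap_values, gamma):
    """Discounted n-step critic target.

    rewards[k], terminated[k], truncated[k] describe step k of the window;
    bootstrap_values[k] is a value estimate for the state reached after step k.
    """
    target, discount = 0.0, 1.0
    for k, reward in enumerate(rewards):
        target += discount * reward
        discount *= gamma
        if terminated[k]:
            return target  # true terminal state: no future value
        if truncated[k] or k == len(rewards) - 1:
            # Timeout or end of the n-step window: still bootstrap.
            return target + discount * bootstrap_values[k]
    return target

if __name__ == "__main__":
    rewards, values = [1.0, 1.0, 1.0], [10.0, 10.0, 10.0]
    none = [False, False, False]
    at_step_1 = [False, True, False]
    print(n_step_target(rewards, none, none, values, 0.5))        # full window
    print(n_step_target(rewards, at_step_1, none, values, 0.5))   # terminated early
    print(n_step_target(rewards, none, at_step_1, values, 0.5))   # timed out early
```

Treating a timeout as a terminal state would teach the critic that value drops to zero at the time limit, which is a frequent source of bias in locomotion tasks with fixed episode lengths.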

📰 Industry Media (9)

Rethinking organizational design in the age of agentic AI

MIT Tech Review — AI · MIT Technology Review Insights · 2026-05-26

The article introduces Agentic Business Transformation (ABT) as a framework for integrating AI agents into organizational structures, contrasting it with incremental AI adoption. Drawing on industry data (85% organizational ambition vs. 76% infrastructure readiness) and expert analysis from PwC and Ema, it identifies three ABT pillars: technology stack redesign for agentic workflows (e.g., cross-system tacit knowledge), workforce restructuring for hybrid human-AI teams, and outcome-based metrics replacing activity tracking. Early adopters report 30-50% process acceleration and 3x ROI shifts when prioritizing systemic over point solutions.

agentic business transformation · tacit knowledge · hybrid workforce · outcome-based metrics · systems-level change

Read original →

A reality check on the AI jobs hysteria

MIT Tech Review — AI · David Rotman · 2026-05-26

Analysis of US Bureau of Labor Statistics data reveals minimal large-scale AI-driven labor market disruption, with unemployment rates in AI-exposed occupations lower than in less-exposed sectors. The Stanford Digital Economy Lab's study of 950 occupations using ADP payroll data identifies a 16% decline in entry-level jobs for 22-25-year-olds in high-exposure fields (e.g., software development) post-2024, contrasting with growth for older workers. Task-based analysis shows automation-prone roles declining while augmentation-focused roles expand. Only 20% of companies currently deploy AI, suggesting gradual adoption. Emerging diagnostic tools track sector-specific AI adoption rates (~40% workforce penetration) and productivity impacts.

labor economics · task automation · occupational exposure · adp dataset · codified knowledge

Read original →

It’s time to address the looming crisis in entry-level work.

MIT Tech Review — AI · Georgios Petropoulos · 2026-05-26

Recent studies highlight a concerning trend in early-career employment due to AI adoption, particularly in AI-exposed occupations. A Stanford Digital Economy Lab working paper (2025) found a 16% relative decline in employment for workers aged 22-25 in such roles, while experienced workers remained unaffected. This suggests firms are substituting AI for junior tasks traditionally used for skill-building. The Federal Reserve Bank of New York reported rising unemployment (5.6%) and underemployment (42.5%) among recent graduates in Q4 2025. To mitigate this, educational institutions must integrate AI literacy, prompt-based workflows, and verification skills into curricula, while governments and firms should incentivize structured, AI-augmented entry-level roles to preserve long-term workforce development.

early-career employment · ai-exposed occupations · prompt-based workflows · verification skills · ai-augmented roles

Read original →

Meet OmniVoice Studio: A Local, Open-Source Alternative to ElevenLabs

MarkTechPost · Michal Sutter · 2026-05-26

OmniVoice Studio presents an open-source, locally executable alternative to ElevenLabs' cloud-based voice AI services, offering six core functionalities: voice cloning via 3-second audio clips using zero-shot diffusion-based TTS (supporting 600+ languages), voice design parameterization, video dubbing with WhisperX transcription and Demucs audio separation, real-time dictation, speaker diarization via Pyannote, and batch processing. The architecture combines React/FastAPI with CUDA/MPS/ROCm GPU support, featuring pluggable TTS engines (OmniVoice, CosyVoice 3, MLX-Audio, VoxCPM2, MOSS-TTS-Nano, KittenTTS) and neural watermarking via AudioSeal. Benchmarks show 646-language TTS coverage and 99-language ASR, with CPU fallback for ≤8GB VRAM systems.

zero-shot learning · diffusion-based tts · speaker diarization · pluggable backend · neural watermarking

Read original →

Design a Complete Multimodal RLVR Pipeline with Open-MM-RL, Vision-Language Prompting, Reward Scoring, and GRPO Export

MarkTechPost · Sana Hassan · 2026-05-26

The tutorial introduces a multimodal RLVR pipeline leveraging the TuringEnterprises/Open-MM-RL dataset for vision-language reasoning tasks. It details dataset preprocessing, including schema analysis, domain-specific visualization, and LaTeX block extraction. A verifiable reward function is implemented for exact, numeric, and symbolic answer grading, alongside a LaTeX-to-SymPy converter for mathematical evaluation. The pipeline integrates SmolVLM for inference and exports data in GRPO format for RL training. Initial tests yield a mean reward of 0.3 over six samples, demonstrating the pipeline's utility for multimodal RL applications.

multimodal rlvr · vision-language prompting · latex-to-sympy · reward scoring · grpo export

Read original →
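A verifiable reward of the kind described in the tutorial summary above can be sketched with the standard library alone. The version below covers only the exact and numeric grading modes; symbolic grading, which the tutorial handles through a LaTeX-to-SymPy converter, is deliberately left out, so symbolically equivalent answers score zero here. The function name and tolerance values are assumptions.

```python
import math

def verifiable_reward(prediction, reference, rel_tol=1e-6):
    """Return 1.0 if the prediction matches the reference, else 0.0."""
    pred, ref = prediction.strip(), reference.strip()
    # Exact mode: case-insensitive string match after trimming whitespace.
    if pred.lower() == ref.lower():
        return 1.0
    # Numeric mode: both sides parse as floats and agree within tolerance.
    try:
        if math.isclose(float(pred), float(ref), rel_tol=rel_tol, abs_tol=1e-9):
            return 1.0
    except ValueError:
        pass  # at least one side is not a plain number
    return 0.0

if __name__ == "__main__":
    samples = [("0.50", "0.5"), (" Paris ", "paris"), ("3", "4"), ("x+1", "1+x")]
    rewards = [verifiable_reward(p, r) for p, r in samples]
    print(rewards, sum(rewards) / len(rewards))
```

Averaging such binary rewards over a handful of samples is how a mean reward like the one reported in the summary would be computed.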

Together AI Open-Sources OSCAR: An Attention-Aware 2-Bit KV Cache Quantization System for Long-Context LLM Serving

MarkTechPost · Asif Razzaq · 2026-05-25

Together AI introduces OSCAR, an attention-aware 2-bit KV cache quantization system for long-context LLM serving, addressing channel-wise outliers via offline-calibrated rotations derived from query and score-weighted value covariances. The method combines optimal eigenvector-aligned rotations (UQ/US), Walsh-Hadamard transforms, and permuted bit-reversal to achieve 8× memory reduction with minimal accuracy drop (e.g., −0.02 for Qwen3-32B). Integrated into SGLang's paged attention system, OSCAR yields 3× decode speedup at 100K context while maintaining near-BF16 accuracy across benchmarks like AIME25 and RULER-NIAH.

kv cache quantization · attention-aware rotation · paged attention · channel-wise outliers · walsh-hadamard transform

Read original →

Step by Step Guide to Build and Compare FedAvg and FedProx Federated Learning on Non-IID CIFAR-10 with NVIDIA FLARE

MarkTechPost · Sana Hassan · 2026-05-25

This tutorial implements a federated learning experiment comparing the FedAvg and FedProx algorithms on non-IID CIFAR-10 data using NVIDIA FLARE. It partitions CIFAR-10 across 3 clients using a Dirichlet distribution (α=0.3) to simulate label imbalance, then trains a CNN for 5 communication rounds with one local epoch per round. The results track global test accuracy across rounds, comparing FedProx (μ=0.1) against FedAvg under heterogeneous data conditions. The implementation leverages NVFlare's Job API for server orchestration and its Client API for local training, model exchange, and aggregation.

federated learning · non-iid · dirichlet distribution · nvflare · fedprox

Read original →
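The three ingredients named in the summary above (Dirichlet label partitioning, FedAvg aggregation, and the FedProx proximal term) can be sketched without any framework. The code below is plain standard-library Python, not NVFlare code, and does not reproduce the tutorial's Job API or Client API usage; all function names are hypothetical, and weights are flat lists for simplicity.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) shares."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Normalized Gamma(alpha, 1) draws form one Dirichlet sample.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += gammas[c] / total
            end = len(idx) if c == num_clients - 1 else round(cumulative * len(idx))
            clients[c].extend(idx[start:end])
            start = end
    return clients

def fedavg(client_weights, client_sizes):
    """Average client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]

def fedprox_gradient(grad, local_w, global_w, mu):
    """Gradient of the local loss plus (mu / 2) * ||w - w_global||^2."""
    return [g + mu * (w - gw) for g, w, gw in zip(grad, local_w, global_w)]

if __name__ == "__main__":
    labels = [i % 10 for i in range(1000)]
    parts = dirichlet_partition(labels, num_clients=3, alpha=0.3)
    print([len(p) for p in parts])
    print(fedavg([[1.0, 0.0], [3.0, 2.0]], [1, 3]))
    print(fedprox_gradient([0.0], [2.0], [1.0], mu=0.1))
```

A smaller α concentrates each class on fewer clients, which is what makes the split non-IID. The proximal term in `fedprox_gradient` then pulls each client's update back toward the global model, and with μ = 0 it reduces to the plain FedAvg local step.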

Autonomous AI systems test governance in physical environments

AI News · Muhammad Zulhusni · 2026-05-26

The Infocomm Media Development Authority (IMDA) of Singapore released version 1.5 of its Model AI Governance Framework, addressing risks posed by autonomous AI systems in physical environments. The framework emphasizes iterative risk assessment, human oversight, technical controls (e.g., least-privilege access), and continuous monitoring through simulation and telemetry. Case studies from Grab and OCBC Bank demonstrate deployment challenges, including reliability testing and task-level autonomy. A Reuters/Nikkei survey indicates 34% of Japanese firms are adopting AI robots, primarily in manufacturing. The framework highlights amplified physical risks compared to digital systems, necessitating multi-stakeholder accountability across the AI value chain.

agentic ai · least-privilege access · telemetry monitoring · embodied ai · iterative testing

Read original →

Proving the case on day two at TechEx North America

AI News · AI News · 2026-05-20

The TechEx North America conference addressed key challenges in enterprise AI adoption, focusing on transitioning from experimental pilots to durable systems. Sessions analyzed governance frameworks, risk control, and ROI measurement, emphasizing cross-functional collaboration and data lineage. Agentic AI emerged as a critical area, requiring formal evaluation and boundary definitions for system-level actions. Cybersecurity tracks highlighted the 'GenAI velocity gap', where adoption outpaces security oversight, necessitating zero-trust architectures for AI systems and workflows. Government transformation cases demonstrated AI's role in public service reliability and explainability. The conference underscored that successful AI implementation depends on organizational change readiness, data quality, and accountable outcome alignment.

agentic ai · governance frameworks · genai velocity gap · zero-trust architectures · change readiness

Read original →

Generated automatically at 2026-05-26 21:18 UTC. Summaries and keywords are produced by an LLM and may contain inaccuracies — always consult the original article.