Frontier Atlas

Discover AI Research

Explore the latest papers, methods, and breakthroughs
from the world's AI research community.

GLM-5.2: Built for Long-Horizon Tasks

Z.ai Team · Jun 16, 2026

GLM-5.2 is Z.ai's latest flagship open-weight model for long-horizon agentic engineering. The release extends GLM-5.1 with a solid 1M-token context, IndexShare sparse-attention efficiency, improved MTP speculative decoding, and flexible thinking-effort controls. Benchmarks report stronger coding and agentic performance on SWE-Bench Pro, Terminal-Bench, and MCPAtlas.

🏆 SOTA on AIME 2026, HMMT Feb 2026, PostTrainBench · #3 on FrontierSWE, NL2Repo

Agents Coding Agents Language Modeling Math World Knowledge

DeepSeek Sparse Attention MCP Mixture-of-Experts (MoE) Transformer

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Yushi Bai, Qian Dong, Ting Jiang, +5 authors · Mar 12, 2026

Long-context agentic workflows have emerged as a defining use case for large language models, making attention efficiency critical for both inference speed and serving cost. Sparse attention addresses this challenge effectively, and DeepSeek Sparse Attention (DSA) is a representative production-grade solution: a lightweight lightning indexer selects the most relevant KV pairs per layer without recomputing full attention.

🏆 SOTA on LongBench · #2 on RULER (128K)

Language Modeling Long Context Efficiency

DeepSeek Sparse Attention Gated DeltaNet Key-value cache Kimi Delta Attention

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Taebong Kim, Youngsik Hong, Minsik Kim, +4 authors · May 14, 2026

We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints.

🏆 SOTA on Natural Questions · #3 on CommonsenseQA

Language Modeling Reasoning Model Merging

Mamba Post-training Transformer

Qwen3.5: Towards Native Multimodal Agents

Qwen Team · Feb 16, 2026

Qwen3.5 release blog introducing the open-weight Qwen3.5-397B-A17B native vision-language model and reporting benchmark evaluations across language, coding, agents, multimodal understanding, and video understanding tasks at scale. It highlights significant performance improvements across benchmarks.

🏆 SOTA on C-Eval, IFEval, MathVista, MMLU-Pro +20 more

Agents Image Understanding Language Modeling Video Classification

Qwen3 Mixture-of-Experts (MoE) Post-training

LTX-2: Efficient Joint Audio-Visual Foundation Model

Yoav HaCohen, Benny Brazowski, Nisan Chiprut, +26 authors · Jan 6, 2026

We introduce LTX-2, an open-source foundational model capable of generating high-quality, temporally synchronized audiovisual content in a unified manner. Recent text-to-video diffusion models remain silent — LTX-2 closes this gap by jointly modeling visual and audio streams end-to-end with a shared diffusion backbone.

🏆 SOTA on EvalCrafter, VBench · #2 on AudioCaps

Audio Generation Video Generation Multimodal

Classifier-free guidance Diffusion Diffusion Transformer (DiT) Flow matching

NVIDIA Nemotron 3: Efficient and Open Intelligence

NVIDIA, Aaron Blakeman, Aaron Grattafiori, +355 authors · Dec 24, 2025

We introduce the Nemotron 3 family — Nano, Super, and Ultra — delivering strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens.

🏆 SOTA on LiveCodeBench (v5), RULER, TauBench · #2 on WMT24++ · #3 on AIME 2025

Agents Language Modeling Reinforcement Learning

Function calling GRPO Mamba Mixture-of-Experts (MoE)

DFlash: Block Diffusion for Flash Speculative Decoding

Jian Chen, Yesheng Liang, Zhijian Liu · Feb 5, 2026

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM.

🏆 SOTA on HumanEval · #3 on Spec-Bench

Language Modeling Efficiency Inference

DeepSeek-R1 Diffusion Speculative decoding
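
As a rough illustration of the draft-then-verify loop described in the abstract above, the sketch below implements generic greedy speculative decoding with toy integer "models". It is not DFlash's block-diffusion drafter: `draft_model`, `target_model`, and the block size `k` are hypothetical stand-ins, and the verification loop runs sequentially here where a real system would use a single batched forward pass of the target.

```python
from typing import Callable, List

# Hypothetical toy "model" type: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Stand-in for the slow, accurate target LLM (illustrative only)."""
    return (sum(prefix) * 3 + 1) % 11


def draft_model(prefix: List[int]) -> int:
    """Stand-in for the fast drafter: agrees with the target except at some positions."""
    return target_model(prefix) if len(prefix) % 4 else 0


def speculative_step(prefix: List[int], draft: NextToken, target: NextToken, k: int) -> List[int]:
    """Draft k tokens, then keep the longest prefix the target agrees with."""
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    accepted: List[int] = []
    for i in range(k):
        expected = target(prefix + proposal[:i])
        if proposal[i] != expected:
            # First mismatch: emit the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(proposal[i])

    # Every drafted token was accepted; the target's pass yields one extra token.
    accepted.append(target(prefix + proposal))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken, k: int, max_new: int) -> List[int]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


def greedy(prefix: List[int], model: NextToken, max_new: int) -> List[int]:
    """Plain one-token-at-a-time decoding, used as the reference."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(model(out))
    return out
```

Under greedy verification the output is identical to decoding with the target alone; the drafter only changes how many target passes are needed, which is where the latency saving comes from.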

Tmax: A simple recipe for terminal agents

Hamish Ivison, Junjie Oscar Yin, Rulin Shao, +3 authors · Jun 22, 2026

Terminal-using agents have quickly become the most popular downstream application of language models. Despite their prevalence, relatively little academic work has examined RL-based training of these models. We present Tmax, the strongest open RL recipe for terminal agents to date.

🏆 SOTA on SWE-Bench Verified · #2 on Terminal-Bench

Coding Agents Language Modeling Reinforcement Learning

Fine-tuning GRPO Post-training

Qwen3.6

Qwen · Apr 21, 2026

Qwen3.6 is an open-weight model focused on agentic coding, repository-level reasoning, long-context usage, and multimodal language-model capabilities. The release includes 35B-A3B and 27B base models plus their FP8 quantized variants for efficient local deployment.

🏆 SOTA on RealWorldQA, Video-MME · #2 on Claw-Eval, MMBench-V1.1 +1 more

Agents Image Understanding Language Modeling Coding Agents

FP8 Mixture-of-Experts (MoE) Multi-head attention Qwen3

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

Sen Xu, Shixi Liu, Wei Wang, +6 authors · Jun 15, 2026

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. We systematically enhance the model through an optimized pipeline including curriculum-based training and reward shaping.

#2 on AIME 2026 · #3 on IFEval

Language Modeling Reasoning Reinforcement Learning

DeepSeek Sparse Attention Fine-tuning GRPO Post-training Test-time scaling

Qwen3-TTS Technical Report

Hangrui Hu, Xinfa Zhu, Ting He, +13 authors · Jan 22, 2026

We present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, enabling fine-grained manipulation over output speech prosody, style, and speaker identity.

🏆 SOTA on LibriSpeech, UTMOS · #2 on VoiceBench

Audio Generation Text To Speech Voice Cloning

Diffusion Transformer (DiT) Direct Preference Optimization (DPO) Flow matching

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Jian Xie, Tianhe Lin, Zilu Wang, +16 authors · May 22, 2026

Deep research agents extend the role of search engines from retrieving keyword-matched pages to synthesizing knowledge. We propose QUEST, a fully synthetic task generation pipeline for training open deep research agents that generalize across diverse task types and domains.

🏆 SOTA on GAIA · #2 on BrowseComp-Plus

Agents Reinforcement Learning Reasoning

Deep Research DeepSeek-R1 Direct Preference Optimization (DPO) Fine-tuning

Data Science and Technology Towards AGI Part I: Tiered Data Management

Yudong Wang, Zixuan Fu, Hengyu Zhao, +14 authors · Feb 9, 2026

The development of artificial intelligence can be viewed as an evolution of data-driven learning paradigms. We propose a tiered data management framework that moves beyond unidirectional data scaling, enabling more structured and efficient data curation for LLM pretraining.

🏆 SOTA on MMLU-Pro · #3 on HellaSwag

Language Modeling Data Curation Pretraining

DeepSeek-R1 Direct Preference Optimization (DPO) GRPO Key-value cache

MiniMax Sparse Attention

Xunhao Lai, Weiqi Xu, Yufeng Yang, +8 authors · Jun 11, 2026

Ultra-long-context capability is becoming indispensable for frontier LLMs. We introduce MiniMax Sparse Attention, a production-grade sparse attention mechanism targeting hundreds-of-thousands to millions of tokens, achieving quadratic cost reduction without degrading downstream task accuracy.

🏆 SOTA on RULER · #2 on LongBench (1M)

Language Modeling Long Context Efficiency

Big Bird Flash Attention Gated DeltaNet Grouped-Query Attention

MOSS-TTS Technical Report

Yitian Gong, Botian Jiang, Yiwei Zhao, +23 authors · Mar 18, 2026

We present MOSS-TTS, a speech generation foundation model built on discrete audio tokens, autoregressive modeling, and large-scale pretraining. Built on MOSS-Audio-Tokenizer, it compresses 24 kHz audio to 12.5 fps with variable-bitrate RVQ and unified semantic-acoustic encoding.

🏆 SOTA on UTMOS · #3 on LibriSpeech (WER)

Audio Generation Text To Speech Voice Cloning

Pre-training Qwen3 Scaling Laws SoundStream Transformer WaveNet

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

DeepSeek-AI · Apr 24, 2026

We present the DeepSeek-V4 series, including two strong Mixture-of-Experts language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens with key architectural upgrades.

🏆 SOTA on LiveCodeBench (v6), MRCR v2 (1M) · #2 on MMLU-Pro

Agents Language Modeling Long Context Coding Agents

Mixture-of-Experts (MoE) Post-training Sparse attention Transformer

Holo3.1: Fast & Local Computer Use Agents

Holo Team · Jun 1, 2026

Holo3.1 improves robustness across web, desktop, and mobile environments as well as agent frameworks. It introduces native function-calling support alongside quantized checkpoints (FP8, Q4 GGUF, NVFP4) for local inference on consumer hardware, with new small model sizes (0.8B, 4B).

🏆 SOTA on AndroidWorld, OSWorld · #2 on OSW-G (OSWorld-G)

Agents Computer Use Image Understanding Language Modeling

Function calling Mixture-of-Experts (MoE) Qwen3

FastContext: Training Efficient Repository Explorer for Coding Agents

Shaoqiu Zhang, Maoquan Wang, Yuling Shi, +5 authors · Jun 12, 2026

LLM coding agents achieve strong results on software engineering tasks, yet repository exploration remains a major bottleneck. We train a dedicated lightweight explorer model that locates relevant code efficiently, freeing the main agent's context budget for actual problem-solving.

🏆 SOTA on SWE-Bench Verified · #3 on RepoQA

Coding Agents Language Modeling Question Answering

Fine-tuning GRPO Post-training

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

Jasper Dekoninck, Nikola Jovanović, Tim Gehrunger, +4 authors · May 1, 2026

Large language models are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient. We introduce MathArena, a living evaluation platform with regularly updated competition problems that prevents saturation and enables reliable model comparison over time.

🏆 SOTA on MathArena · #2 on AIME 2026

Language Modeling Math Reasoning

Chain-of-Thought (CoT) Test-time scaling

ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation

Junmin Gong, Yulin Song, Wenxiao Zhao, +3 authors · Jan 31, 2026

We present ACE-Step v1.5, a highly efficient open-source music foundation model that delivers commercial-grade generation on consumer hardware. ACE-Step v1.5 generates a full song in under 2 seconds on an A100 and in under 10 seconds on a consumer GPU while matching or exceeding commercial music models on standard metrics.

🏆 SOTA on MusicCaps · #2 on Song Describer Dataset

Audio Generation Language Modeling Reinforcement Learning

Chain-of-Thought (CoT) Diffusion Transformer (DiT) LoRA Reward model

Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets

Sultan Alrashed, Francesco Orabona · Dec 21, 2025

Multilingual data from the web is essential for LLM pretraining, yet scraping it is expensive and research groups repeatedly crawl the same content. We found that over 40% of tokens across major Arabic web corpora are duplicated between sources, and propose using this redundancy as a quality signal for building high-quality multilingual datasets.

🏆 SOTA on mC4 · #3 on Arabic NLP Benchmark

Language Modeling Pretraining Data Curation

Pre-training T5
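
The cross-source agreement idea in the abstract above can be illustrated with a minimal MinHash sketch: documents from two sources are compared by estimated Jaccard similarity over word shingles, and high agreement marks content that separate crawls retrieved independently. The shingle size, the number of hash functions, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import hashlib
from typing import List, Set


def shingles(text: str, n: int = 3) -> Set[str]:
    """Split text into overlapping n-word shingles (n=3 is an arbitrary choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash_signature(items: Set[str], num_perm: int = 64) -> List[int]:
    """For each of num_perm salted hash functions, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")  # a different salt per simulated permutation
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical documents from two different web corpora.
doc_source_a = "the committee published its annual report on water quality in the region"
doc_source_b = "the committee published its annual report on water quality in the area"

sig_a = minhash_signature(shingles(doc_source_a))
sig_b = minhash_signature(shingles(doc_source_b))
agreement = estimated_jaccard(sig_a, sig_b)  # high when both sources carry near-identical text
```

A pipeline along these lines would threshold `agreement` to decide which documents count as seen by multiple sources; the threshold itself is a tuning choice not specified here.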

HRM-Text: Efficient Pretraining Beyond Scaling

Guan Wang, Changling Liu, Chenyu Wang, +6 authors · May 20, 2026

The current pretraining paradigm relies on massive compute and internet-scale raw text, creating a barrier to foundational research. We propose HRM-Text, a biologically-inspired multi-timescale pretraining architecture modeled on the frontoparietal loop, enabling sample-efficient learning without raw scale.

🏆 SOTA on ARC-Challenge · #3 on BabyLM Benchmark

Language Modeling Pretraining Reasoning

Pre-training Reasoning model RoPE Transformer

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Jianing Zhang, Chenhao Zheng, Yajun Yang, +10 authors · Jun 17, 2026

Motion forecasting is central to visual intelligence: agents must anticipate how objects will move to plan actions and reason about physical interactions. We argue that 3D points in world coordinates provide a general class-agnostic, view-stable representation and introduce MolmoMotion to predict them from language instructions.

🏆 SOTA on TAPVid-3D · #2 on DynPoint Benchmark

Image Understanding Robotics World Models

Diffusion Transformer (DiT) Flow matching Transformer

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Qihan Ren, Peng Wang, Ruikun Cai, +8 authors · Apr 8, 2026

A prevailing narrative holds that SFT memorizes while RL generalizes. We revisit this claim for reasoning SFT with long chain-of-thought supervision and find that cross-domain generalization is conditional, jointly shaped by optimization dynamics, training data diversity, and base model capability.

🏆 SOTA on MATH-500 · #3 on GSM8K

Language Modeling Reasoning Reinforcement Learning

Chain-of-Thought (CoT) Fine-tuning Post-training Qwen3

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Google DeepMind · Jun 3, 2026

Gemma 4 12B is Google DeepMind's encoder-free multimodal model that projects raw image patches and audio directly into the LLM backbone, targeting laptop deployment with 16GB VRAM while delivering reasoning and agentic performance approaching the larger Gemma 4 26B MoE.

🏆 SOTA on MMMU, MathVista · #2 on LiveCodeBench (v5)

Agents Audio Understanding Coding Agents Image Understanding Language Modeling

Long Context Omni Models Post-training RMSNorm Function calling

Stable Audio 3

Zach Evans, Julian D. Parker, Matthew Rice, +4 authors · May 18, 2026

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. We also support inpainting, enabling targeted audio editing while preserving surrounding context, making Stable Audio 3 a versatile tool for both creation and post-production.

🏆 SOTA on MusicCaps, AudioCaps · #2 on FAD (AudioSet)

Audio Generation Audio Editing Multimodal

Autoencoder (AE) Diffusion Inpainting Post-training

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Shihao Wang, Shilong Liu, Yuanguo Kuang, +10 authors · May 26, 2026

Vision-language models commonly formulate visual grounding as a coordinate-token generation problem. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practical inference bottleneck. We propose parallel box decoding to solve both issues simultaneously.

🏆 SOTA on RefCOCO, RefCOCO+ · #2 on Flickr30k Entities

Image Understanding Language Modeling Object Detection

ColPali Deformable Attention DETR Faster R-CNN