Frontier Atlas

Discover AI Research

Explore the latest papers, methods, and breakthroughs from the world's AI research community.

GLM-5.2: Built for Long-Horizon Tasks

Z.ai Team · Jun 16, 2026

GLM-5.2 is Z.ai's latest flagship open-weight model for long-horizon agentic engineering. The release extends GLM-5.1 with a solid 1M-token context, IndexShare sparse-attention efficiency, improved MTP speculative decoding, and flexible thinking-effort controls. Benchmarks report stronger coding and agentic performance on SWE-Bench Pro, Terminal-Bench, and MCPAtlas.

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Yushi Bai, Qian Dong, Ting Jiang, +5 authors · Mar 12, 2026

Long-context agentic workflows have emerged as a defining use case for large language models, making attention efficiency critical for both inference speed and serving cost. Sparse attention addresses this challenge effectively, and DeepSeek Sparse Attention (DSA) is a representative production-grade solution: a lightweight lightning indexer selects the most relevant KV pairs per layer without recomputing full attention.

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Taebong Kim, Youngsik Hong, Minsik Kim, +4 authors · May 14, 2026

We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints.

Qwen3.5: Towards Native Multimodal Agents

Qwen Team · Feb 16, 2026

The Qwen3.5 release blog introduces the open-weight Qwen3.5-397B-A17B native vision-language model and reports benchmark evaluations across language, coding, agents, multimodal understanding, and video understanding tasks at scale. It highlights significant performance improvements across benchmarks.

LTX-2: Efficient Joint Audio-Visual Foundation Model

Yoav HaCohen, Benny Brazowski, Nisan Chiprut, +26 authors · Jan 6, 2026

We introduce LTX-2, an open-source foundational model capable of generating high-quality, temporally synchronized audiovisual content in a unified manner. Recent text-to-video diffusion models remain silent — LTX-2 closes this gap by jointly modeling visual and audio streams end-to-end with a shared diffusion backbone.

NVIDIA Nemotron 3: Efficient and Open Intelligence

NVIDIA, Aaron Blakeman, Aaron Grattafiori, +355 authors · Dec 24, 2025

We introduce the Nemotron 3 family — Nano, Super, and Ultra — delivering strong agentic, reasoning, and conversational capabilities. The Nemotron 3 family uses a Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens.

DFlash: Block Diffusion for Flash Speculative Decoding

Jian Chen, Yesheng Liang, Zhijian Liu · Feb 5, 2026

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM.
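
As a rough illustration of the draft-then-verify loop described above, here is a minimal sketch of greedy speculative decoding. It is not the DFlash method: the draft here is an ordinary next-token function rather than a block diffusion model, and `speculative_step`, `generate`, `greedy_decode`, and the toy models are hypothetical names introduced only for this example. The target's per-position checks are written as a Python loop, whereas a real system would compute them in one batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> List[Token]:
    """Return the tokens accepted in one draft-and-verify round (1 to k+1 tokens)."""
    # 1) The cheap draft model proposes k tokens one after another.
    proposal: List[Token] = []
    for _ in range(k):
        proposal.append(draft(prefix + proposal))

    # 2) The target model gives its own choice at each of the k+1 positions.
    #    (Conceptually a single parallel pass over prefix + proposal.)
    target_choices = [target(prefix + proposal[:i]) for i in range(k + 1)]

    # 3) Keep the longest run of draft tokens that the target agrees with.
    accepted: List[Token] = []
    for i in range(k):
        if proposal[i] != target_choices[i]:
            break
        accepted.append(proposal[i])

    # 4) Always add one target token: the correction at the first mismatch,
    #    or a bonus token if every draft token was accepted.
    accepted.append(target_choices[len(accepted)])
    return accepted


def generate(prefix: List[Token], draft: NextToken, target: NextToken,
             k: int, n_tokens: int) -> List[Token]:
    """Generate n_tokens after prefix using repeated speculative rounds."""
    out = list(prefix)
    while len(out) - len(prefix) < n_tokens:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + n_tokens]


def greedy_decode(prefix: List[Token], target: NextToken, n_tokens: int) -> List[Token]:
    """Plain sequential greedy decoding with the target model, for comparison."""
    out = list(prefix)
    for _ in range(n_tokens):
        out.append(target(out))
    return out


if __name__ == "__main__":
    # Toy stand-ins for real models: the draft agrees with the target most of the time.
    toy_target: NextToken = lambda ctx: (sum(ctx) * 3 + len(ctx)) % 11
    toy_draft: NextToken = lambda ctx: toy_target(ctx) if len(ctx) % 3 else 0
    print(generate([1, 2], toy_draft, toy_target, k=4, n_tokens=10))
    print(greedy_decode([1, 2], toy_target, 10))
```

In this greedy variant the output matches what the target alone would produce; the draft only changes how many target calls are needed per emitted token. Sampling-based variants replace the exact-match check with a probabilistic acceptance rule.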

Tmax: A simple recipe for terminal agents

Hamish Ivison, Junjie Oscar Yin, Rulin Shao, +3 authors · Jun 22, 2026

Terminal-using agents have quickly become the most popular downstream application of language models. Despite their prevalence, relatively little academic work has examined RL-based training of these models. We present Tmax, the strongest open RL recipe for terminal agents to date.

Qwen3.6

Qwen · Apr 21, 2026

Qwen3.6 is an open-weight model focused on agentic coding, repository-level reasoning, long-context usage, and multimodal language-model capabilities. The release includes 35B-A3B and 27B base models plus their FP8 quantized variants for efficient local deployment.

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

Sen Xu, Shixi Liu, Wei Wang, +6 authors · Jun 15, 2026

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. We systematically enhance the model through an optimized pipeline including curriculum-based training and reward shaping.

Qwen3-TTS Technical Report

Hangrui Hu, Xinfa Zhu, Ting He, +13 authors · Jan 22, 2026

We present the Qwen3-TTS series, a family of advanced multilingual, controllable, robust, and streaming text-to-speech models. Qwen3-TTS supports state-of-the-art 3-second voice cloning and description-based control, enabling fine-grained manipulation over output speech prosody, style, and speaker identity.

QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Jian Xie, Tianhe Lin, Zilu Wang, +16 authors · May 22, 2026

Deep research agents extend the role of search engines from retrieving keyword-matched pages to synthesizing knowledge. We propose QUEST, a fully synthetic task generation pipeline for training open deep research agents that generalize across diverse task types and domains.

Data Science and Technology Towards AGI Part I: Tiered Data Management

Yudong Wang, Zixuan Fu, Hengyu Zhao, +14 authors · Feb 9, 2026

The development of artificial intelligence can be viewed as an evolution of data-driven learning paradigms. We propose a tiered data management framework that moves beyond unidirectional data scaling, enabling more structured and efficient data curation for LLM pretraining.

MiniMax Sparse Attention

Xunhao Lai, Weiqi Xu, Yufeng Yang, +8 authors · Jun 11, 2026

Ultra-long-context capability is becoming indispensable for frontier LLMs. We introduce MiniMax Sparse Attention, a production-grade sparse attention mechanism targeting hundreds-of-thousands to millions of tokens, achieving quadratic cost reduction without degrading downstream task accuracy.

MOSS-TTS Technical Report

Yitian Gong, Botian Jiang, Yiwei Zhao, +23 authors · Mar 18, 2026

We present MOSS-TTS, a speech generation foundation model built on discrete audio tokens, autoregressive modeling, and large-scale pretraining. It builds on MOSS-Audio-Tokenizer, which compresses 24 kHz audio to 12.5 fps with variable-bitrate RVQ and unified semantic-acoustic encoding.

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

DeepSeek-AI · Apr 24, 2026

We present DeepSeek-V4 series, including two strong Mixture-of-Experts language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens with key architectural upgrades.

Holo3.1: Fast & Local Computer Use Agents

Holo Team · Jun 1, 2026

Holo3.1 improves robustness across web, desktop, and mobile environments as well as agent frameworks. It introduces native function-calling support alongside quantized checkpoints (FP8, Q4 GGUF, NVFP4) for local inference on consumer hardware, with new small model sizes (0.8B, 4B).

FastContext: Training Efficient Repository Explorer for Coding Agents

Shaoqiu Zhang, Maoquan Wang, Yuling Shi, +5 authors · Jun 12, 2026

LLM coding agents achieve strong results on software engineering tasks, yet repository exploration remains a major bottleneck. We train a dedicated lightweight explorer model that locates relevant code efficiently, freeing the main agent's context budget for actual problem-solving.

Beyond Benchmarks: MathArena as an Evaluation Platform for Mathematics with LLMs

Jasper Dekoninck, Nikola Jovanović, Tim Gehrunger, +4 authors · May 1, 2026

Large language models are becoming increasingly capable mathematical collaborators, but static benchmarks are no longer sufficient. We introduce MathArena, a living evaluation platform with regularly updated competition problems that prevents saturation and enables reliable model comparison over time.

ACE-Step 1.5: Pushing the Boundaries of Open-Source Music Generation

Junmin Gong, Yulin Song, Wenxiao Zhao, +3 authors · Jan 31, 2026

We present ACE-Step v1.5, a highly efficient open-source music foundation model that delivers commercial-grade generation on consumer hardware. ACE-Step v1.5 generates a full song in under 2 seconds on an A100 and in under 10 seconds on a consumer GPU while matching or exceeding commercial music models on standard metrics.

Mix, MinHash, and Match: Cross-Source Agreement for Multilingual Pretraining Datasets

Sultan Alrashed, Francesco Orabona · Dec 21, 2025

Multilingual data from the web is essential for LLM pretraining, yet scraping it is expensive and research groups repeatedly crawl the same content. We found that over 40% of tokens across major Arabic web corpora are duplicated between sources, and propose using this redundancy as a quality signal for building high-quality multilingual datasets.
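
To make the idea of cross-source duplication as a quality signal more concrete, here is a minimal sketch that estimates near-duplicate overlap with MinHash and counts how many other sources contain a document. It is an illustration under stated assumptions, not the paper's pipeline: the shingle size, the number of hash functions, the `0.8` threshold, and the helper names (`shingles`, `minhash`, `estimated_jaccard`, `agreement_count`) are all hypothetical choices made for this example.

```python
import hashlib
from typing import Dict, List, Set


def shingles(text: str, n: int = 3) -> Set[str]:
    """Split text into overlapping word n-grams (at least one shingle is returned)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash(items: Set[str], num_perm: int = 64) -> List[int]:
    """MinHash signature: for each salted hash function, keep the minimum hash value."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt acts as a distinct hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def agreement_count(doc: str, other_sources: Dict[str, List[str]],
                    threshold: float = 0.8) -> int:
    """Count the other sources holding at least one near-duplicate of doc."""
    doc_sig = minhash(shingles(doc))
    count = 0
    for docs in other_sources.values():
        if any(estimated_jaccard(doc_sig, minhash(shingles(d))) >= threshold for d in docs):
            count += 1
    return count


if __name__ == "__main__":
    page = "the quick brown fox jumps over the lazy dog near the river bank"
    others = {
        "crawl_b": [page],  # same page, crawled independently
        "crawl_c": ["an unrelated article about rainfall statistics in the north"],
    }
    print(agreement_count(page, others))
```

A document found in several independently collected corpora would receive a higher count, which could then be used as one filtering or weighting signal. At corpus scale, the pairwise comparison here would normally be replaced by locality-sensitive hashing over the signatures.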

HRM-Text: Efficient Pretraining Beyond Scaling

Guan Wang, Changling Liu, Chenyu Wang, +6 authors · May 20, 2026

The current pretraining paradigm relies on massive compute and internet-scale raw text, creating a barrier to foundational research. We propose HRM-Text, a biologically-inspired multi-timescale pretraining architecture modeled on the frontoparietal loop, enabling sample-efficient learning without raw scale.

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Jianing Zhang, Chenhao Zheng, Yajun Yang, +10 authors · Jun 17, 2026

Motion forecasting is central to visual intelligence: agents must anticipate how objects will move to plan actions and reason about physical interactions. We argue that 3D points in world coordinates provide a general, class-agnostic, view-stable representation and introduce MolmoMotion to predict them from language instructions.

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Qihan Ren, Peng Wang, Ruikun Cai, +8 authors · Apr 8, 2026

A prevailing narrative holds that SFT memorizes while RL generalizes. We revisit this claim for reasoning SFT with long chain-of-thought supervision and find that cross-domain generalization is conditional, jointly shaped by optimization dynamics, training data diversity, and base model capability.

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

Google DeepMind · Jun 3, 2026

Gemma 4 12B is Google DeepMind's encoder-free multimodal model that projects raw image patches and audio directly into the LLM backbone, targeting laptop deployment with 16GB VRAM while delivering reasoning and agentic performance approaching the larger Gemma 4 26B MoE.

Stable Audio 3

Zach Evans, Julian D. Parker, Matthew Rice, +4 authors · May 18, 2026

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. We also support inpainting, enabling targeted audio editing while preserving surrounding context, making Stable Audio 3 a versatile tool for both creation and post-production.

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Shihao Wang, Shilong Liu, Yuanguo Kuang, +10 authors · May 26, 2026

Vision-language models commonly formulate visual grounding as a coordinate-token generation problem. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practical inference bottleneck. We propose parallel box decoding to solve both issues simultaneously.