Moonshot AI's Kimi K2.5 Deploys Up to 100 Sub-Agents Simultaneously, Cuts Execution Time by Up to 4.5x

The AI coding assistant race just shifted from single models to coordinated swarms. Instead of one agent grinding through tasks sequentially, the latest systems deploy dozens of agents working in parallel, cutting execution time dramatically.

That shift is now live. Moonshot AI released Kimi K2.5 today, an open-source model that deploys up to 100 sub-agents simultaneously across up to 1,500 coordinated tool calls. The Beijing-based startup reports that this cuts execution time by up to 4.5x compared with single-agent systems. In internal evaluations, Agent Swarm reduced end-to-end runtime by 80% while enabling more complex, long-horizon workloads.
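The two headline figures describe the same effect from different angles. A speedup of s leaves 1/s of the original runtime, so the runtime reduction is 1 - 1/s. The short Python sketch below converts between the two, assuming both figures refer to end-to-end wall-clock time; the function names are illustrative only.

```python
def runtime_reduction(speedup: float) -> float:
    """Fraction of runtime removed when a task runs `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


def speedup_from_reduction(reduction: float) -> float:
    """Speedup factor implied by removing `reduction` (0 to <1) of the runtime."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


print(f"4.5x speedup -> {runtime_reduction(4.5):.1%} less runtime")
print(f"80% reduction -> {speedup_from_reduction(0.80):.1f}x speedup")
```

Run as written, it prints 77.8% for the 4.5x figure and 5.0x for the 80% figure, so the two numbers are in the same range, even if they may come from different evaluations.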

The figures, shared by Moonshot in its technical release, suggest that agent coordination is becoming the new frontier. According to developer reports, K2.5-assisted workflows complete in seconds what previously took minutes.

What Developers Get with K2.5

Moonshot built the model through continued pre-training of Kimi K2 on approximately 15 trillion mixed visual and text tokens. It has 1 trillion total parameters, of which 32 billion are active, and native multimodal capabilities: it processes text, images, and video from a single prompt. Its 256,000-token context window is large enough to take in many entire codebases in one pass.
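Whether a particular codebase actually fits depends on the tokenizer. As a rough sketch, assuming about four characters per token (a common heuristic, not K2.5's actual tokenizer), you can estimate a repository's size before sending it. The file-extension filter and the output reserve are hypothetical choices for illustration.

```python
import math
import os

CONTEXT_WINDOW = 256_000  # K2.5's stated context length, in tokens
CHARS_PER_TOKEN = 4.0     # ASSUMPTION: rough heuristic, not K2.5's real tokenizer
# Hypothetical filter: which file types count as "the codebase"; adjust per repo.
SOURCE_EXTS = {".py", ".ts", ".js", ".go", ".rs", ".java", ".md"}


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Crude estimate: character count divided by chars-per-token, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def repo_token_estimate(root: str) -> int:
    """Sum estimated tokens over all matching source files under `root`."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1] in SOURCE_EXTS:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    total += estimate_tokens(f.read())
    return total


def fits_in_context(root: str, reserve_for_output: int = 16_000) -> bool:
    """True if the estimated codebase plus a (hypothetical) output reserve fits."""
    return repo_token_estimate(root) + reserve_for_output <= CONTEXT_WINDOW
```

Calling `fits_in_context("path/to/repo")` gives a quick yes or no. Because the heuristic can be off substantially for code with long identifiers or non-English text, treat a result near the limit as inconclusive and count tokens with the model's real tokenizer.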

For coding tasks, K2.5 excels at visual debugging. It reasons over images and video to improve image-to-code generation, making it easier for developers to express intent visually. In demonstrations, it reconstructed websites from video walkthroughs and turned hand-drawn sketches into functional 3D models with working animations.

Screenshot Evaluation (Image Credit: Kimi)

How Agent Swarm Works

The breakthrough comes from Parallel-Agent Reinforcement Learning (PARL). The system uses a trainable orchestrator that breaks complex tasks into parallelizable subtasks, then spawns frozen sub-agents to handle each piece concurrently. Moonshot tackled "serial collapse," the tendency of models to fall back on sequential execution, through staged reward shaping during training.
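PARL is a training method, and its orchestrator is a learned policy, so it cannot be reproduced in a few lines. The fan-out pattern the orchestrator learns can, however, be sketched with Python's asyncio. In this hypothetical example, `decompose`, `run_subagent`, and the `Subtask` fields are illustrative stand-ins rather than Moonshot's API; only the 100-agent cap comes from the article.

```python
import asyncio
from dataclasses import dataclass

MAX_SUBAGENTS = 100  # concurrency cap, matching the 100 sub-agents Moonshot cites


@dataclass
class Subtask:
    name: str
    steps: int  # hypothetical: number of tool calls this sub-agent needs


async def run_subagent(task: Subtask) -> str:
    """Hypothetical frozen sub-agent: simulates each tool call with a short sleep."""
    for _ in range(task.steps):
        await asyncio.sleep(0.01)  # stand-in for one tool call
    return f"{task.name}: done in {task.steps} steps"


def decompose(goal: str, n: int) -> list[Subtask]:
    """Hypothetical split into n subtasks; a trained orchestrator would decide this."""
    return [Subtask(f"{goal}-part{i}", steps=1 + i % 3) for i in range(n)]


async def orchestrate(goal: str, n: int) -> list[str]:
    limit = asyncio.Semaphore(MAX_SUBAGENTS)

    async def guarded(task: Subtask) -> str:
        async with limit:  # never more than MAX_SUBAGENTS sub-agents in flight
            return await run_subagent(task)

    # Fan out every subtask concurrently; gather returns results in input order.
    return await asyncio.gather(*(guarded(t) for t in decompose(goal, n)))


if __name__ == "__main__":
    results = asyncio.run(orchestrate("audit-repo", 8))
    print(len(results), results[0])
```

The semaphore keeps concurrency bounded even if the orchestrator proposes more subtasks than the cap. A production system would also need per-agent timeouts and error handling, which this sketch omits.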

Performance is measured using "critical steps," a latency-oriented metric inspired by the critical path in parallel computation. Instead of counting total steps, it tracks the slowest execution path, so parallelism improves the score only if it genuinely shortens task completion time.
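A small worked example makes the distinction concrete. The sketch assumes a task runs in stages, where each stage costs the orchestrator's own steps plus the steps of its slowest sub-agent, since parallel sub-agents overlap in time. That per-stage formula is a simplifying assumption for illustration, and the step counts are made up.

```python
# (orchestrator steps, [steps taken by each parallel sub-agent in this stage])
Stage = tuple[int, list[int]]


def total_steps(stages: list[Stage]) -> int:
    """Every step executed anywhere: orchestrator plus all sub-agents."""
    return sum(main + sum(subs) for main, subs in stages)


def critical_steps(stages: list[Stage]) -> int:
    """Steps on the slowest path: per stage, orchestrator steps plus the
    longest sub-agent, because parallel sub-agents run at the same time."""
    return sum(main + max(subs, default=0) for main, subs in stages)


serial = [(1, [10]), (1, [10]), (1, [10]), (1, [10])]  # one sub-agent at a time
parallel = [(1, [10, 10, 10, 10])]                      # balanced fan-out
skewed = [(1, [37, 1, 1, 1])]                           # fan-out, one long straggler

for label, run in [("serial", serial), ("parallel", parallel), ("skewed", skewed)]:
    print(label, total_steps(run), critical_steps(run))
```

The serial run scores 44 critical steps and the balanced fan-out scores 11, even though the fan-out executes nearly as many total steps (41). The skewed run spawns four sub-agents but still scores 38, because one long sub-agent dominates the critical path. That is the behavior a critical-path metric is meant to expose.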

Availability and Integration

K2.5 is available now on kimi.com in four modes: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (beta). Moonshot also launched Kimi Code, an open-source coding assistant that integrates with VS Code, Cursor, and Zed. The full model weights are on Hugging Face under a Modified MIT License.

AI coding tools are moving from single models to coordinated systems working in parallel. It started with code completion, then reasoning. Now it is orchestration, and the shift is happening fast.