Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed-optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including voice agents, search interfaces, and chatbots. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury).
Inception: Mercury – Recent Activity and Usage Stats | OpenRouter
Recent activity on Mercury
Total usage per day on OpenRouter
Prompt tokens: 3.23M
Completion tokens: 243K
Prompt tokens measure input size. Reasoning tokens show internal thinking before a response. Completion tokens reflect total output length.
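
To make the daily figures easier to compare, the following is a minimal sketch that converts the abbreviated counts shown above ("3.23M", "243K") into integers and computes how many prompt tokens were sent per completion token. The helper names `parse_count` and `prompt_to_completion_ratio` are hypothetical, and the suffix meanings (K = thousand, M = million, B = billion) are an assumption about how the page abbreviates its numbers. Because the displayed values are rounded, the result is approximate.

```python
from decimal import Decimal

# Assumed meaning of the suffixes used in the usage widget.
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '3.23M' or '243K' to an integer."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        # Decimal avoids binary floating-point error when scaling '3.23'.
        return int(Decimal(text[:-1]) * SUFFIXES[text[-1]])
    return int(Decimal(text))


def prompt_to_completion_ratio(prompt: str, completion: str) -> float:
    """Return prompt tokens per completion token for two abbreviated counts."""
    return parse_count(prompt) / parse_count(completion)


if __name__ == "__main__":
    ratio = prompt_to_completion_ratio("3.23M", "243K")
    print(f"{ratio:.1f} prompt tokens per completion token")
```

With the figures listed above, this works out to roughly 13 prompt tokens for every completion token, which suggests that usage on that day was dominated by input-heavy requests.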