Let’s get to know the benchmarks AI companies use to compare each others’ versions.
AI (Artificial Intelligence) is the latest tech Gold Rush. The richest countries, richest billionaires, and largest companies in the world are all investing heavily toward a “winner take all” dominance.
| Country | Vendor | LLM brand | Clients | |
|---|---|---|---|---|
| China | Alibaba | Qwen | ||
| US | ai21 (Allen AI) | Olmo | ||
| - | aion-labs | aion-2.0 | ||
| US | Amazon | Nova | ||
| US | Anthropic | Claude | ||
| US | Apple | MM1, ReALM | ||
| China | Bytedance | seed | ||
| China | Cerebras | - | ||
| France | Cohere | - | ||
| - | Contexual | - | ||
| China | DeepSeek | R1,R2,V3 | ||
| US | ElevenLabs | - | ||
| - | FAL | - | ||
| US | Fireworks.ai | KwaiKAT-Coder | ||
| US | Gemini | Remy, Antigravity IDE | ||
| - | IBM | granite | ||
| - | Jina | - | ||
| - | Leonardo.ai | - | ||
| US | Meta | Llama | ||
| US | Microsoft | Phi | ||
| Singapore | MiniMax | M2, Hailuo, Speech | ||
| France | Mistral | Medium, Large | ||
| - | Moonshot | Kimi | ||
| - | Morph | Morph | ||
| US | NVIDIA | Nemotron | ||
| - | nousresearch | hermes-llama | ||
| US | OpenAI | GPT | ChatGPT, OpenClaw | Largest context length of 2m for highest price. |
| - | Perplexity.ai | Sonar | Comet browser | |
| US, Paris | poolside.ai | laguna | pool | Free on OpenRouter |
| - | prime-intellect | intellect-3 | - | |
| - | reka.ai | reka-edge/flash | - | |
| China | Tencent | Hy3 | ||
| - | Together.ai | - | ||
| US | xAI | Grok | ||
| US | Xiaomi | mimo</a> | ||
| China | Z.Ai (Zhipu) | glm | chat |
Devstral
This is an analysis of major AI models and techniques to programmatically make API calls from your local machine.
The first to market in 2023 was the OpenAI API accessing its Codex model in it own cloud service. Today, some use OpenAI’s model to evaluate code generated by other models.
The OpenAI API client can also be used to access other clouds simply by changing the API and endpoint URL such as to NVIDIA’s NIM cloud’s Nemotron models.
NVIDIA also hosts in its AI cloud services, other provider’s models, such as IBM, Meta, and others. Some are offered free, albeit for limited rates.
OpenAI’s API client can also emulate xAI’s API as if Grok models are called using xAI’s own API client. Training with conversations on Twitter make Grok the most conversational and up-to-date, as well as least sychophantic on sensitive subjects. But it’s not available free nor locally.
OpenAI’s API client can also emulate the Claude API client accessing the Anthropic AI cloud. But remember that using API cloud emulation eats token the same rate BUT adds latency from compatibility layer overhead and loses Anthropic-native capabilities such as top_k, metadata, etc. That may cause subtle behavioral differences. The very latest model may not be available.
Claude’s models are currently recognized as best for prose and coding. Clude’s cloud and client tools require a $100/month subscription, but also allow access to other models, some for free. This strategy has resulted in Anthropic making billions.
The DeepSeek model on DeepSeek’s cloud is 30 times less expensive than Claude, so someone created a Proxy service that routes Claude API calls to DeepSeek’s cloud service. Yes, that is a security concern so I recommend blocking it. BTW, AFAIK, none of these services provide for two-way client certificates to ensure that services are who they say they are.
DeepSeek and other models from China, such as Alibaba’s Qwen, have been accused of being based on data distilled from Claude. So it’s smaller. There is still doubt about whether one can trust model providers with proprietary data. So for privacy, organizations created a shortage of Mac Minis to run behind an in-house firewall.
Google’s models were trained from all the books it has been scanning for decades, along with YouTube and searches. Google has one of the first APIs to their top-ranked LLMs. Google also open-sourced the Gemini models in its cloud as Gemma4 models for being pulled inside the firewall to as local models run, at no cost, via the Ollama service.
You can individually go to the websites of DeepSeek, Qwen, Kimi, Mistral, and others to download models to run in privacy offline, and program API calls to each, separately.
OpenAI’s API can simplify access to the OpenRouter.ai gateway service enables a single API (and chat) interface to use LLMs from 60+ authors, including free models and even AWS and its vast cloud Bedrock SageMaker ecosystem.
OpenRouter provides pass-through billing to abstract away the complexity of managing separate accounts, authentication, and billing to 370 models from 60+ providers. It can automatically route requests to the fastest or cheapest provider, including 30 free models. It provides a common security, observability, and tracing interface, which allows for easy A/B testing and comparison between different models. For that, it takes a 5.5% fee when you buy credits. However, this may limit the speed of access and impose usage limits to free LLMs.
OpenAI was hosted exclusively in Microsoft’s Azure cloud until April 2026 when it also appeared among models Amazon makes available on its AWS cloud.
There are now several LLM routers:
I have an entire section to Anthropic and its Claude technologies.
This GitHub Issue lists 65 free models and 254 paid models available on OpenRouter.ai.
The list is available as a JSON file and webpage.
PROTIP: I generated a Python program to create a CSV or JSON file at openrouter-models.py - last run on 2026-05-07 found 370 models among 60 providers.
Column Description
PROVIDER MODELS ================================================== ai21 1 aion-labs 4 alfredpros 1 alibaba 1 allenai 2 alpindale 1 amazon 5 anthracite-org 1 anthropic 14 arcee-ai 7 baidu 7 bytedance 1 bytedance-seed 4 cognitivecomputations 1 cohere 4 deepcogito 1 deepseek 13 essentialai 1 google 26 gryphe 1 ibm-granite 2 inception 1 inclusionai 2 inflection 2 kwaipilot 1 liquid 3 mancer 1 meta-llama 14 microsoft 3 minimax 8 mistralai 25 moonshotai 5 morph 2 nex-agi 1 nousresearch 6 nvidia 11 openai 65 openrouter 5 perplexity 5 poolside 2 prime-intellect 1 qwen 51 rekaai 2 relace 2 sao10k 5 stepfun 1 switchpoint 1 tencent 2 thedrummer 4 tngtech 1 undi95 1 upstage 1 writer 1 x-ai 11 xiaomi 5 z-ai 13 ~anthropic 3 ~google 2 ~moonshotai 1 ~openai 2
”~” in front of model names (such as “~google/gemini-flash-latest”) ???
Count of models by provider (alphabetically):
================================================== PROVIDER MODELS ================================================== alibaba 41 anthropic 15 aws 36 azure 25 bytedance 4 cerebras 7 cohere 22 contextualai 3 deepseek 2 elevenlabs 5 fal 4 google 38 google-ai 22 groq 16 jina 12 leonardoai 4 minimax 7 mistral 35 moonshotai 6 openai 67 orq 1 perplexity 4 togetherai 7 xai 19 zai 11 ==================================================
The .csv file adds location and model description.
================================================================================================ PROVIDER TYPE MODEL ID IN $/M OUT $/M FREE ================================================================================================ openai chat o1-pro $150.00 $600.00 openai chat gpt-5.4-pro $30.00 $180.00 openai chat gpt-5.2-pro $21.00 $168.00 openai chat gpt-5-pro $15.00 $120.00 leonardoai image leonardo-diffusion-xl $80.00 $80.00 leonardoai image leonardo-kino-xl $80.00 $80.00 leonardoai image leonardo-lightning-xl $80.00 $80.00 leonardoai image leonardo-vision-xl $80.00 $80.00 openai chat o3-pro $20.00 $80.00 anthropic chat claude-opus-4-0 $15.00 $75.00 anthropic chat claude-opus-4-1 $15.00 $75.00 anthropic chat claude-opus-4-20250514 $15.00 $75.00 anthropic chat claude-opus-4.1-20250805 $15.00 $75.00 aws chat anthropic/claude-opus-4.1 (US) $15.00 $75.00 google chat anthropic/claude-opus-4-1@20250805 $15.00 $75.00 google chat anthropic/claude-opus-4@20250514 $15.00 $75.00 azure chat o1 $15.00 $60.00 ...
poolside.ai Founded in 2023
Eiso Kent, CTO
Jason Warner, CEO
Elon Musk’s XAI Grok series of LLMs do not have free usage.
It can be accessed using OpenAI’s API.
ppm = price_per_million prices, as of 2026-05-07:
| model_id | context tokens |
features | input_ppm | output_ppm |
|---|---|---|---|---|
| grok-4.20-0309-reasoning | 2m | reasoning, tool calls, structured output | $1.25 | $2.50 |
| grok-4.20-0309-non-reasoning | 2m | tool calls, structured output | $1.25 | $2.50 |
| grok-4.20-multi-agent-0309 | 2m | multi-agent collaboration | $2.00 | $6.00 |
| grok-4-1-fast-reasoning | 2m | reasoning, vision, tool calls, structured output | $0.20 | $0.50 |
| grok-4-1-fast-non-reasoning | 2m | vision, tool calls, structured output | $0.20 | $0.50 |
| grok-code-fast-1 | 256k | code optimization, reasoning, tool calls | $0.20 | $1.50 |
| grok-4-0709 | 256k | reasoning, tool calls, structured output | $3.00 | $15.00 |
| grok-3 | 131k | tool calls, structured output | $3.00 | $15.00 |
| grok-3-mini | 131k | reasoning, tool calls, structured output | $0.30 | $0.50 |
| grok-4.3 | 1 million | general purpose | $1.25 | $2.50 |
| grok-2-image-1212 | — | image generation | — | — |
| grok-2-vision-1212 | 8k | image understanding | $5.00 | $15.00 |
| grok-2-vision-latest | 32k | image understanding | — | — |
| model_id | context_tokens | features | price |
|---|---|---|---|
| grok-imagine-image | — | image generation | $0.02 / image |
| grok-imagine-image-pro | — | high-quality image generation | $0.07 / image |
| grok-imagine-video | — | video generation | $0.05 / second |
PROTIP: I generated a Python program to create a CSV or JSON file at deepseek-rates.py
https://www.cybergym.io/ found that OpenAI’s GPT5.5 found 82% vs 83% by Mythos. Evaluating AI Agents’ Real-World Cybersecurity Capabilities at Scale
https://www.armorcode.com/blog/the-mythos-moment-is-real-the-fix-it-faster-response-is-not The Mythos Moment is Real. The Fix-It-Faster Response isn’t.
https://www.youtube.com/watch?v=T7bQ86m5AEk Warp open source office hours: why open source?
26-05-07 v040 poolside @ai-providers.md created 2024-12-28