bomonike

Let’s get to know the benchmarks AI companies use to compare each others’ versions.

Overview

AI Vendors and their LLM Brands
Clients to Access
PROTIP: Model API Availability
Capabilities
Anthropic’s Claude
OpenRouter models
orq.ai models & providers
Poolside
XAI’s Grok models
Meta Llama vs Muse Spark
References

AI (Artificial Intelligence) is the latest tech Gold Rush. The richest countries, richest billionaires, and largest companies in the world are all investing heavily toward a “winner take all” dominance.

AI Vendors and their LLM Brands

Country	Vendor	LLM brand	Clients
China	Alibaba	Qwen
US	ai21 (Allen AI)	Olmo
-	aion-labs	aion-2.0
US	Amazon	Nova
US	Anthropic	Claude	WebChat
US	Apple	MM1, ReALM
China	Baidu	Ernie
China	Bytedance	seed
China	Cerebras	-
France	Cohere	-
-	Contexual	-
China	DeepSeek	R1,R2,V3	WebChat
US	ElevenLabs	-
-	FAL	-
US	Fireworks.ai	KwaiKAT-Coder
-	Florian	Parakeet 3	speech-to-text
US	Google	Gemini	Remy, Antigravity IDE
US	IBM	granite
-	Jina	-
-	Leonardo.ai	-	img & video gen using Alibaba's video model
US	Meta	Llama, Muse Spark	Instagram
US	Microsoft	Phi
Singapore	MiniMax	M2, Hailuo, Speech
France	Mistral	Medium, Large
-	Moonshot	Kimi
-	Morph	Morph
US	NVIDIA	Nemotron
-	nousresearch	hermes-llama
US	OpenAI	GPT	ChatGPT, OpenClaw, Cursor	Largest context length of 2m for highest price.
-	Perplexity.ai	Sonar	Comet browser
US, Paris	poolside.ai	laguna	pool	Free on OpenRouter
-	prime-intellect	intellect-3	-
-	reka.ai	reka-edge/flash	-
China	Tencent	Hy3
-	Together.ai	-
US	Starlink(xAI)	Grok
US	Xiaomi	mimo</a>
China	Z.Ai (Zhipu)	glm	Webchat

Andromeda commodity futures market

Devstral

Clients to Access

CLI on Terminal
IDE GUI app (VSCode add-in, etc.)
Website
API access provides a way to batch work at discounted pricing.
LinkedIn’s CrossCheck provides a prompt UI that goes to two different LLMs so you can compare (crosscheck).

PROTIP: Model API Availability

This is an analysis of major AI models and techniques to programmatically make API calls from your local machine.

This applies to locally run models running behind a local firewall.

The first to market in 2023 was the OpenAI API accessing its Codex model in it own cloud service. Today, some use OpenAI’s model to evaluate code generated by other models.

The OpenAI API client can also be used to access other clouds simply by changing the API and endpoint URL such as to NVIDIA’s NIM cloud’s Nemotron models.

NVIDIA also hosts in its AI cloud services, other provider’s models, such as IBM, Meta, and others. Some are offered free, albeit for limited rates.

OpenAI’s API client can also emulate xAI’s API as if Grok models are called using xAI’s own API client. Training with conversations on Twitter make Grok the most conversational and up-to-date, as well as least sychophantic on sensitive subjects. But it’s not available free nor locally.

OpenAI’s API client can also emulate the Claude API client accessing the Anthropic AI cloud. But remember that using API cloud emulation eats token the same rate BUT adds latency from compatibility layer overhead and loses Anthropic-native capabilities such as top_k, metadata, etc. That may cause subtle behavioral differences. The very latest model may not be available.

Claude’s models are currently recognized as best for prose and coding. Clude’s cloud and client tools require a $100/month subscription, but also allow access to other models, some for free. This strategy has resulted in Anthropic making billions.

The DeepSeek model on DeepSeek’s cloud is 30 times less expensive than Claude, so someone created a Proxy service that routes Claude API calls to DeepSeek’s cloud service. Yes, that is a security concern so I recommend blocking it. BTW, AFAIK, none of these services provide for two-way client certificates to ensure that services are who they say they are.

DeepSeek and other models from China, such as Alibaba’s Qwen, have been accused of being based on data distilled from Claude. So it’s smaller. There is still doubt about whether one can trust model providers with proprietary data. So for privacy, organizations created a shortage of Mac Minis to run behind an in-house firewall.

Google’s models were trained from all the books it has been scanning for decades, along with YouTube and searches. Google has one of the first APIs to their top-ranked LLMs. Google also open-sourced the Gemini models in its cloud as Gemma4 models for being pulled inside the firewall to as local models run, at no cost, via the Ollama service.

You can individually go to the websites of DeepSeek, Qwen, Kimi, Mistral, and others to download models to run in privacy offline, and program API calls to each, separately.

OpenAI’s API can simplify access to the OpenRouter.ai gateway service enables a single API (and chat) interface to use LLMs from 60+ authors, including free models and even AWS and its vast cloud Bedrock SageMaker ecosystem.

OpenRouter provides pass-through billing to abstract away the complexity of managing separate accounts, authentication, and billing to 370 models from 60+ providers. It can automatically route requests to the fastest or cheapest provider, including 30 free models. It provides a common security, observability, and tracing interface, which allows for easy A/B testing and comparison between different models. For that, it takes a 5.5% fee when you buy credits. However, this may limit the speed of access and impose usage limits to free LLMs.

OpenAI was hosted exclusively in Microsoft’s Azure cloud until April 2026 when it also appeared among models Amazon makes available on its AWS cloud.

There are now several LLM routers:

https://router.orq.ai/ analyzes each prompt and routes it to the most cost-effective among its 413 models among 25 providers who meet quality requirements. It operates in Singapore, APAC, Europe, US to satisfy data sovereignty laws. My program orq-models.py needed to use tricky CSS tricks to obtain a list of 413 models.
https://ngrok.ai

Capabilities

Files
Text (translation)
Image (search, reverse search, .png, .jpg, etc.)
Speech
Audio (.mp3)
Video (.mp4)

Anthropic’s Claude

I have an entire section to Anthropic and its Claude technologies.

OpenRouter models

This GitHub Issue lists 65 free models and 254 paid models available on OpenRouter.ai.

The list is available as a JSON file and webpage.

PROTIP: I generated a Python program to create a CSV or JSON file at openrouter-models.py - last run on 2026-06-17 found 370 models among 60 providers.

Column Description

model_id Unique identifier for the model when making API calls
name Human-readable display name
provider Primary provider or organization offering the model
context_length Maximum tokens the model can process (input + reasoning)
pricing_prompt Cost per 1,000 input tokens (in USD)
pricing_completion Cost per 1,000 output tokens (in USD)
is_free TRUE if both input and output costs are $0
modalities Supported input types (text, image, audio, video, file)

PROVIDER                            MODELS
==================================================
ai21                                     1
aion-labs                                4
alfredpros                               1
alibaba                                  1
allenai                                  2
alpindale                                1
amazon                                   5
anthracite-org                           1
anthropic                               14
arcee-ai                                 7
baidu                                    7
bytedance                                1
bytedance-seed                           4
cognitivecomputations                    1
cohere                                   4
deepcogito                               1
deepseek                                13
essentialai                              1
google                                  26
gryphe                                   1
ibm-granite                              2
inception                                1
inclusionai                              2
inflection                               2
kwaipilot                                1
liquid                                   3
mancer                                   1
meta-llama                              14
microsoft                                3
minimax                                  8
mistralai                               25
moonshotai                               5
morph                                    2
nex-agi                                  1
nousresearch                             6
nvidia                                  11
openai                                  65
openrouter                               5
perplexity                               5
poolside                                 2
prime-intellect                          1
qwen                                    51
rekaai                                   2
relace                                   2
sao10k                                   5
stepfun                                  1
switchpoint                              1
tencent                                  2
thedrummer                               4
tngtech                                  1
undi95                                   1
upstage                                  1
writer                                   1
x-ai                                    11
xiaomi                                   5
z-ai                                    13
~anthropic                               3
~google                                  2
~moonshotai                              1
~openai                                  2

”~” in front of model names (such as “~google/gemini-flash-latest”) ???

orq.ai models & providers

Count of models by provider (alphabetically):

==================================================
PROVIDER                            MODELS
==================================================
alibaba                                 41
anthropic                               15
aws                                     36
azure                                   25
bytedance                                4
cerebras                                 7
cohere                                  22
contextualai                             3
deepseek                                 2
elevenlabs                               5
fal                                      4
google                                  38
google-ai                               22
groq                                    16
jina                                    12
leonardoai                               4
minimax                                  7
mistral                                 35
moonshotai                               6
openai                                  67
orq                                      1
perplexity                               4
togetherai                               7
xai                                     19
zai                                     11
==================================================

The .csv file adds location and model description.

================================================================================================
PROVIDER        TYPE         MODEL ID                                     IN $/M     OUT $/M   FREE
================================================================================================
openai          chat         o1-pro                                      $150.00     $600.00       
openai          chat         gpt-5.4-pro                                  $30.00     $180.00       
openai          chat         gpt-5.2-pro                                  $21.00     $168.00       
openai          chat         gpt-5-pro                                    $15.00     $120.00       
leonardoai      image        leonardo-diffusion-xl                        $80.00      $80.00       
leonardoai      image        leonardo-kino-xl                             $80.00      $80.00       
leonardoai      image        leonardo-lightning-xl                        $80.00      $80.00       
leonardoai      image        leonardo-vision-xl                           $80.00      $80.00       
openai          chat         o3-pro                                       $20.00      $80.00       
anthropic       chat         claude-opus-4-0                              $15.00      $75.00       
anthropic       chat         claude-opus-4-1                              $15.00      $75.00       
anthropic       chat         claude-opus-4-20250514                       $15.00      $75.00       
anthropic       chat         claude-opus-4.1-20250805                     $15.00      $75.00       
aws             chat         anthropic/claude-opus-4.1 (US)               $15.00      $75.00       
google          chat         anthropic/claude-opus-4-1@20250805           $15.00      $75.00       
google          chat         anthropic/claude-opus-4@20250514             $15.00      $75.00       
azure           chat         o1                                           $15.00      $60.00   
...

Poolside

poolside.ai Founded in 2023 to run airgapped in AWS only on PostgeSQL.

Train free models using proprietary synthetic data for reinforcement learning.

VIDEO: for internal infra (Northrup Grumman)

Eiso Kent, CTO

VIDEO Future of AI: A Vision from poolside @Nexus Luxembourg 2025 (toward AGI for devs)
Short

Jason Warner, CEO

VIDEO AWS
At AI Engineer about ADA used by govt.

Inside Poolside’s Mission to Reinvent Enterprise Software Engineering

Install (no brew):

curl -fsSL https://downloads.poolside.ai/pool/install.sh | sh

Installs to $HOME/.local/bin/pool

In VS Code or Visual Studio, use the native Poolside Assistant integration. In Zed, JetBrains, Neovim, and other ACP-compatible editors with pool acp. ACP (Agent Client Protocol)

https://docs.poolside.ai/

Shimmer https://shimmer.poolside.ai/login

XAI’s Grok models

Elon Musk’s XAI Grok series of LLMs do not have free usage.

It can be accessed using OpenAI’s API.

ppm = price_per_million prices, as of 2026-06-17:

model_id	context tokens	features	input_ppm	output_ppm
grok-4.20-0309-reasoning	2m	reasoning, tool calls, structured output	$1.25	$2.50
grok-4.20-0309-non-reasoning	2m	tool calls, structured output	$1.25	$2.50
grok-4.20-multi-agent-0309	2m	multi-agent collaboration	$2.00	$6.00
grok-4-1-fast-reasoning	2m	reasoning, vision, tool calls, structured output	$0.20	$0.50
grok-4-1-fast-non-reasoning	2m	vision, tool calls, structured output	$0.20	$0.50
grok-code-fast-1	256k	code optimization, reasoning, tool calls	$0.20	$1.50
grok-4-0709	256k	reasoning, tool calls, structured output	$3.00	$15.00
grok-3	131k	tool calls, structured output	$3.00	$15.00
grok-3-mini	131k	reasoning, tool calls, structured output	$0.30	$0.50
grok-4.3	1 million	general purpose	$1.25	$2.50
grok-2-image-1212	—	image generation	—	—
grok-2-vision-1212	8k	image understanding	$5.00	$15.00
grok-2-vision-latest	32k	image understanding	—	—

model_id	context_tokens	features	price
grok-imagine-image	—	image generation	$0.02 / image
grok-imagine-image-pro	—	high-quality image generation	$0.07 / image
grok-imagine-video	—	video generation	$0.05 / second

PROTIP: I generated a Python program to create a CSV or JSON file at deepseek-rates.py

Meta Llama vs Muse Spark

Meta’s Llama 4 uses Meta’s Community License with conditions for large-scale commercial deployments.

Muse Spark API was announced quietly on April 8, 2026, with a price tag instead of a download link, in a complete reversal of the free open-source Llama models.

The API & docs are available only to a private API preview to select users”

use Muse Spark right now through meta.ai or the Meta AI app. Free. Requires a Meta account (Facebook or Instagram login). The model runs in Instant and Thinking modes through the chat interface, with a Contemplating mode rolling out later.

Meta worked with a 1,000 doctors on Muse, which beat all other models on the HealthBench Hard. Muse is from Meta Superintelligence Labs, Meta’s new elite research division led by Alexandr Wang after his $14.3 billion acquihire from Scale AI.

See https://ai.gopubby.com/meta-just-killed-open-source-ai-696275b740ab

https://dev.to/o96a/metas-muse-spark-has-16-tools-and-a-secret-weapon-your-instagram-posts-37mg Meta’s Muse Spark Has 16 Tools and a Secret Weapon: Your Instagram Posts

“Meta’s tools pull from your Instagram posts, let you manipulate images you generated, run Python against them, and spawn sub-agents. That’s not a chatbot. That’s an operating system for multimodal workflows.”

Simon Willison poked around the meta.ai interface and extracted the complete tool catalog. The highlights:

browser.search / browser.open / browser.find — Web search and page analysis. Standard pattern now, but solid.
meta_1p.content_search — This is the sleeper. Semantic search across Instagram, Threads, and Facebook posts you have access to, filtered by author, celebrity mentions, comments, likes. Posts since January 2025 only. This turns your social graph into queryable context.
container.python_execution — Full Code Interpreter with Python 3.9, pandas, numpy, matplotlib, scikit-learn, OpenCV, Pillow, PyMuPDF. Files persist at /mnt/data/.
container.visual_grounding — This is Segment Anything integrated directly into the chat. Give it an image path and object names, get back bounding boxes, point coordinates, or counts. Yes, it can literally count whiskers on a generated raccoon.
container.create_web_artifact — Generate HTML/JS artifacts or SVG graphics, rendered inline Claude Artifacts style.
subagents.spawn_agent — The sub-agent pattern. Spawn independent agents for research or delegation.
media.image_gen — Image generation (likely Emu or an updated version) with artistic/realistic modes.
And the rest: file editing tools, Meta content download, third-party account linking (Google Calendar, Outlook, Gmail).

References

https://www.cybergym.io/ found that OpenAI’s GPT5.5 found 82% vs 83% by Mythos. Evaluating AI Agents’ Real-World Cybersecurity Capabilities at Scale

https://www.armorcode.com/blog/the-mythos-moment-is-real-the-fix-it-faster-response-is-not The Mythos Moment is Real. The Fix-It-Faster Response isn’t.

https://www.youtube.com/watch?v=T7bQ86m5AEk Warp open source office hours: why open source?

https://www.interconnects.ai/p/2025-open-models-year-in-review 2025 Open Models Year in Review

_{v045 firewall @ai-providers.md created 2024-12-28}