Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
Models
Google: Gemini Flash 2.0
google/gemini-2.0-flash-001
Modality
text,image
Context Length
1000K
Input
$0.10/M
Output
$0.40/M
Image
$0.03/M
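The per-million-token prices listed above translate directly into a per-request cost estimate. A minimal sketch, hardcoding the Gemini Flash 2.0 rates from this listing (the token counts in the example are hypothetical):

```python
# Prices from the listing above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.10   # Gemini Flash 2.0 input
OUTPUT_PRICE_PER_M = 0.40  # Gemini Flash 2.0 output

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
cost = request_cost(2_000, 500)
print(f"${cost:.6f}")  # → $0.000400
```

The same arithmetic applies to every entry below; only the two rate constants change per model.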
Meta: Llama 3.3 70B Instruct
meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Modality
text
Context Length
131K
Input
$0.12/M
Output
$0.30/M
OpenAI: GPT-4o-mini
openai/gpt-4o-mini
GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.
As their most advanced small model, it is many times more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains state-of-the-art intelligence while being significantly more cost-effective.
GPT-4o mini scores 82% on MMLU and currently ranks higher than GPT-4 on common chat-preference leaderboards.
#multimodal
Modality
text,image
Context Length
128K
Input
$0.15/M
Output
$0.60/M
Image
$0.22/M
Google: Gemini Flash 1.5 8B
google/gemini-flash-1.5-8b
Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.
Usage of Gemini is subject to Google's Gemini Terms of Use.
Modality
text,image
Context Length
1000K
Input
$0.04/M
Output
$0.15/M
Image
$0.00/M

DeepSeek: DeepSeek V3 0324
deepseek/deepseek-chat-v3-0324
DeepSeek V3 0324, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.
It succeeds the original DeepSeek V3 and performs well across a variety of tasks.
Modality
text
Context Length
64K
Input
$0.27/M
Output
$1.10/M
Anthropic: Claude 3.5 Sonnet
anthropic/claude-3.5-sonnet
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
#multimodal
Modality
text,image
Context Length
200K
Input
$3.00/M
Output
$15.00/M
Image
$4.80/M

Meta: Llama 3.1 8B Instruct
meta-llama/llama-3.1-8b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
Usage of this model is subject to Meta's Acceptable Use Policy.
Modality
text
Context Length
131K
Input
$0.02/M
Output
$0.05/M
Google: Gemini Flash 1.5
google/gemini-flash-1.5
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.
Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
Usage of Gemini is subject to Google's Gemini Terms of Use.
#multimodal
Modality
text,image
Context Length
1000K
Input
$0.07/M
Output
$0.30/M
Image
$0.04/M
Google: Gemini 2.0 Flash Lite
google/gemini-2.0-flash-lite-001
Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5, all at extremely economical token prices.
Modality
text,image
Context Length
1049K
Input
$0.07/M
Output
$0.30/M
Image
$0.00/M

Mistral: Mistral Nemo
mistralai/mistral-nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
It supports function calling and is released under the Apache 2.0 license.
Modality
text
Context Length
131K
Input
$0.04/M
Output
$0.08/M
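Mistral Nemo's function-calling support means a request can advertise callable tools alongside the prompt. A minimal sketch of such a request body, using the model slug from this listing; the OpenAI-style tool schema is an assumption for illustration, and `get_weather` is a hypothetical tool name (the exact endpoint and wire format depend on the provider you call):

```python
import json

# Hypothetical chat request exercising function calling.
# Tool schema shape is an assumption; adapt it to your provider's API.
request = {
    "model": "mistralai/mistral-nemo",  # slug from the listing above
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(request, indent=2))
```

A model that decides to call the tool would reply with the tool name and arguments rather than plain text; the caller then executes the function and returns the result in a follow-up message.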

DeepSeek: R1 Distill Llama 70B
deepseek/deepseek-r1-distill-llama-70b
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633
The model leverages fine-tuning on DeepSeek R1's outputs, achieving performance competitive with larger frontier models.
Modality
text
Context Length
131K
Input
$0.23/M
Output
$0.69/M

Qwen2.5 7B Instruct
qwen/qwen-2.5-7b-instruct
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to diverse system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens; can generate up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.
Modality
text
Context Length
33K
Input
$0.02/M
Output
$0.05/M
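Even with Qwen2.5's improved JSON generation, client code should validate structured output before using it. A minimal sketch; the reply string here is a hypothetical stand-in for an actual model response:

```python
import json

def parse_model_json(reply: str) -> dict:
    """Parse a model reply that is expected to be a JSON object."""
    data = json.loads(reply)  # raises on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data

# Stand-in for a structured reply from qwen/qwen-2.5-7b-instruct.
reply = '{"city": "Paris", "population_millions": 2.1}'
print(parse_model_json(reply)["city"])  # → Paris
```

In production, a failed parse is usually handled by re-prompting the model rather than crashing, since generated JSON can occasionally be truncated or malformed.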

DeepSeek: R1
deepseek/deepseek-r1
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model & technical report.
MIT licensed: Distill & commercialize freely!
Modality
text
Context Length
164K
Input
$0.55/M
Output
$2.19/M

Meta: Llama 3.1 70B Instruct
meta-llama/llama-3.1-70b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
Usage of this model is subject to Meta's Acceptable Use Policy.
Modality
text
Context Length
131K
Input
$0.12/M
Output
$0.30/M

WizardLM-2 8x22B
microsoft/wizardlm-2-8x22b
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models and consistently outperforms existing state-of-the-art open-source models.
It is an instruct finetune of Mixtral 8x22B.
#moe
Modality
text
Context Length
66K
Input
$0.50/M
Output
$0.50/M
Anthropic: Claude 3.5 Sonnet (self-moderated)
anthropic/claude-3.5-sonnet:beta
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
#multimodal
Modality
text,image
Context Length
200K
Input
$3.00/M
Output
$15.00/M
Image
$4.80/M

Mistral: Mistral 7B Instruct
mistralai/mistral-7b-instruct
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.
Modality
text
Context Length
33K
Input
$0.03/M
Output
$0.06/M

Mistral: Mistral Small 3
mistralai/mistral-small-24b-instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.
Modality
text
Context Length
33K
Input
$0.07/M
Output
$0.14/M

Google: Gemma 3 27B
google/gemma-3-27b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128K tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to Gemma 2.
Modality
text,image
Context Length
131K
Input
$0.10/M
Output
$0.20/M
Image
$0.03/M
Nous: Hermes 3 405B Instruct
nousresearch/hermes-3-llama-3.1-405b
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
Hermes 3 is competitive with, if not superior to, Llama-3.1 Instruct models in general capabilities, with varying strengths and weaknesses between the two.
Modality
text
Context Length
131K
Input
$0.80/M
Output
$0.80/M