Gemini Flash 2.0 offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5. It introduces notable enhancements in multimodal understanding, coding capabilities, complex instruction following, and function calling. These advancements come together to deliver more seamless and robust agentic experiences.
Models
Google: Gemini Flash 2.0
google/gemini-2.0-flash-001
Modality
text,image
Context Length
1000K
Input
$0.10/M
Output
$0.40/M
Image
$0.03/M
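The per-million-token prices listed above translate directly into a per-request cost estimate. A minimal sketch, hardcoding the Gemini Flash 2.0 rates from this listing (the token counts in the example are hypothetical):

```python
# Prices from the listing above, in USD per million tokens.
INPUT_PRICE_PER_M = 0.10   # Gemini Flash 2.0 input
OUTPUT_PRICE_PER_M = 0.40  # Gemini Flash 2.0 output

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
cost = request_cost(2_000, 500)
print(f"${cost:.6f}")  # → $0.000400
```

The same arithmetic applies to every entry below; only the two rate constants change per model.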
Meta: Llama 3.3 70B Instruct
meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model is optimized for multilingual dialogue use cases and outperforms many available open-source and closed chat models on common industry benchmarks.
Supported languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Modality
text
Context Length
131K
Input
$0.12/M
Output
$0.30/M
OpenAI: GPT-4o-mini
openai/gpt-4o-mini
GPT-4o mini is OpenAI's newest model after GPT-4 Omni, supporting both text and image inputs with text outputs.
As their most advanced small model, it is many times more affordable than other recent frontier models, and more than 60% cheaper than GPT-3.5 Turbo. It maintains state-of-the-art intelligence while being significantly more cost-effective.
GPT-4o mini scores 82% on MMLU and currently ranks higher than GPT-4 on common chat-preference leaderboards.
#multimodal
Modality
text,image
Context Length
128K
Input
$0.15/M
Output
$0.60/M
Image
$0.22/M
Google: Gemini Flash 1.5 8B
google/gemini-flash-1.5-8b
Gemini Flash 1.5 8B is optimized for speed and efficiency, offering enhanced performance in small prompt tasks like chat, transcription, and translation. With reduced latency, it is highly effective for real-time and large-scale operations. This model focuses on cost-effective solutions while maintaining high-quality results.
Usage of Gemini is subject to Google's Gemini Terms of Use.
Modality
text,image
Context Length
1000K
Input
$0.04/M
Output
$0.15/M
Image
$0.00/M

DeepSeek: DeepSeek V3 0324
deepseek/deepseek-chat-v3-0324
DeepSeek V3 0324, a 685B-parameter mixture-of-experts model, is the latest iteration of the flagship chat model family from the DeepSeek team.
It succeeds the original DeepSeek V3 and performs well across a variety of tasks.
Modality
text
Context Length
64K
Input
$0.27/M
Output
$1.10/M
Anthropic: Claude 3.5 Sonnet
anthropic/claude-3.5-sonnet
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
#multimodal
Modality
text,image
Context Length
200K
Input
$3.00/M
Output
$15.00/M
Image
$4.80/M

Meta: Llama 3.1 8B Instruct
meta-llama/llama-3.1-8b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
Usage of this model is subject to Meta's Acceptable Use Policy.
Modality
text
Context Length
131K
Input
$0.02/M
Output
$0.05/M
Google: Gemini Flash 1.5
google/gemini-flash-1.5
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.
Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves comparable quality to other Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
Usage of Gemini is subject to Google's Gemini Terms of Use.
#multimodal
Modality
text,image
Context Length
1000K
Input
$0.07/M
Output
$0.30/M
Image
$0.04/M
Google: Gemini 2.0 Flash Lite
google/gemini-2.0-flash-lite-001
Gemini 2.0 Flash Lite offers a significantly faster time to first token (TTFT) compared to Gemini Flash 1.5, while maintaining quality on par with larger models like Gemini Pro 1.5, all at extremely economical token prices.
Modality
text,image
Context Length
1049K
Input
$0.07/M
Output
$0.30/M
Image
$0.00/M

Mistral: Mistral Nemo
mistralai/mistral-nemo
A 12B parameter model with a 128k token context length built by Mistral in collaboration with NVIDIA.
The model is multilingual, supporting English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi.
It supports function calling and is released under the Apache 2.0 license.
Modality
text
Context Length
131K
Input
$0.04/M
Output
$0.08/M
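Mistral Nemo's function-calling support means a request can advertise callable tools alongside the prompt. A minimal sketch of such a request body, using the model slug from this listing; the OpenAI-style tool schema is an assumption for illustration, and `get_weather` is a hypothetical tool name (the exact endpoint and wire format depend on the provider you call):

```python
import json

# Hypothetical chat request exercising function calling.
# Tool schema shape is an assumption; adapt it to your provider's API.
request = {
    "model": "mistralai/mistral-nemo",  # slug from the listing above
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(request, indent=2))
```

A model that decides to call the tool would reply with the tool name and arguments rather than plain text; the caller then executes the function and returns the result in a follow-up message.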

DeepSeek: R1 Distill Llama 70B
deepseek/deepseek-r1-distill-llama-70b
DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, using outputs from DeepSeek R1. The model combines advanced distillation techniques to achieve high performance across multiple benchmarks, including:
- AIME 2024 pass@1: 70.0
- MATH-500 pass@1: 94.5
- CodeForces Rating: 1633
The model leverages fine-tuning on DeepSeek R1's outputs, achieving performance competitive with larger frontier models.
Modality
text
Context Length
131K
Input
$0.23/M
Output
$0.69/M

Qwen2.5 7B Instruct
qwen/qwen-2.5-7b-instruct
Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and mathematics, thanks to specialized expert models in these domains.
- Significant improvements in instruction following, generating long texts (over 8K tokens), understanding structured data (e.g., tables), and generating structured outputs, especially JSON. More resilient to diverse system prompts, enhancing role-play implementation and condition-setting for chatbots.
- Long-context support up to 128K tokens; can generate up to 8K tokens.
- Multilingual support for over 29 languages, including Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more.
Usage of this model is subject to Tongyi Qianwen LICENSE AGREEMENT.
Modality
text
Context Length
33K
Input
$0.02/M
Output
$0.05/M
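Even with Qwen2.5's improved JSON generation, client code should validate structured output before using it. A minimal sketch; the reply string here is a hypothetical stand-in for an actual model response:

```python
import json

def parse_model_json(reply: str) -> dict:
    """Parse a model reply that is expected to be a JSON object."""
    data = json.loads(reply)  # raises on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data

# Stand-in for a structured reply from qwen/qwen-2.5-7b-instruct.
reply = '{"city": "Paris", "population_millions": 2.1}'
print(parse_model_json(reply)["city"])  # → Paris
```

In production, a failed parse is usually handled by re-prompting the model rather than crashing, since generated JSON can occasionally be truncated or malformed.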

DeepSeek: R1
deepseek/deepseek-r1
DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.
Fully open-source model & technical report.
MIT licensed: Distill & commercialize freely!
Modality
text
Context Length
164K
Input
$0.55/M
Output
$2.19/M

Meta: Llama 3.1 70B Instruct
meta-llama/llama-3.1-70b-instruct
Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high-quality dialogue use cases.
It has demonstrated strong performance compared to leading closed-source models in human evaluations.
Usage of this model is subject to Meta's Acceptable Use Policy.
Modality
text
Context Length
131K
Input
$0.12/M
Output
$0.30/M

WizardLM-2 8x22B
microsoft/wizardlm-2-8x22b
WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models and consistently outperforms existing state-of-the-art open-source models.
It is an instruct finetune of Mixtral 8x22B.
#moe
Modality
text
Context Length
66K
Input
$0.50/M
Output
$0.50/M
Anthropic: Claude 3.5 Sonnet (self-moderated)
anthropic/claude-3.5-sonnet:beta
New Claude 3.5 Sonnet delivers better-than-Opus capabilities, faster-than-Sonnet speeds, at the same Sonnet prices. Sonnet is particularly good at:
- Coding: Scores ~49% on SWE-Bench Verified, higher than the last best score, and without any fancy prompt scaffolding
- Data science: Augments human data science expertise; navigates unstructured data while using multiple tools for insights
- Visual processing: excelling at interpreting charts, graphs, and images, accurately transcribing text to derive insights beyond just the text alone
- Agentic tasks: exceptional tool use, making it great at agentic tasks (i.e. complex, multi-step problem solving tasks that require engaging with other systems)
#multimodal
Modality
text,image
Context Length
200K
Input
$3.00/M
Output
$15.00/M
Image
$4.80/M

Mistral: Mistral 7B Instruct
mistralai/mistral-7b-instruct
A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.
Mistral 7B Instruct has multiple version variants, and this is intended to be the latest version.
Modality
text
Context Length
33K
Input
$0.03/M
Output
$0.06/M

Mistral: Mistral Small 3
mistralai/mistral-small-24b-instruct-2501
Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment.
The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.
Modality
text
Context Length
33K
Input
$0.07/M
Output
$0.14/M

Google: Gemma 3 27B
google/gemma-3-27b-it
Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128K tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open-source model and the successor to Gemma 2.
Modality
text,image
Context Length
131K
Input
$0.10/M
Output
$0.20/M
Image
$0.03/M
Nous: Hermes 3 405B Instruct
nousresearch/hermes-3-llama-3.1-405b
Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
Hermes 3 405B is a frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.
Hermes 3 is competitive with, if not superior to, Llama-3.1 Instruct models in general capabilities, with varying strengths and weaknesses between the two.
Modality
text
Context Length
131K
Input
$0.80/M
Output
$0.80/M