# Models
## Language Model Providers
### Ollama
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
DeepSeek V3 | deepseek-v3 | DeepSeek V3 model | 163840 | ❌ | ❌ | ❌ |
DeepSeek R1 1.5B | deepseek-r1:1.5b | DeepSeek R1 1.5B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 7B | deepseek-r1:7b | DeepSeek R1 7B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 8B | deepseek-r1:8b | DeepSeek R1 8B Llama model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 14B | deepseek-r1:14b | DeepSeek R1 14B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 32B | deepseek-r1:32b | DeepSeek R1 32B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 70B | deepseek-r1:70b | DeepSeek R1 70B Llama model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 671B | deepseek-r1:671b | DeepSeek R1 671B model | 131072 | ❌ | ❌ | ❌ |
Llama3 8b | llama3:latest | Llama 3 | 8192 | ❌ | ❌ | ❌ |
Llama 2-7b | llama2:latest | Llama 2 | 8192 | ❌ | ❌ | ❌ |
Mistral | mistral:latest | Mistral | 8192 | ❌ | ❌ | ❌ |
Code Llama | codellama:7b-code | Code Llama | 8192 | ❌ | ❌ | ❌ |
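Each of these models can be pulled locally with `ollama pull <model-id>` and called through Ollama's REST API. A minimal sketch of the request shape, assuming Ollama's documented `/api/chat` route and its default host and port:

```python
import json
import urllib.request

def build_chat_request(model_id: str, prompt: str) -> urllib.request.Request:
    """Build a request for Ollama's local /api/chat endpoint (default port 11434)."""
    payload = {
        "model": model_id,  # any Model ID from the table, e.g. "deepseek-r1:7b"
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a token stream
    }
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("mistral:latest", "Summarize RFC 2119 in one sentence.")
# urllib.request.urlopen(req) would send it to a running Ollama instance.
```

Because none of the Ollama rows support JSON Schema or function calls, the payload stays this simple; structured-output fields shown for other providers below do not apply here.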
### Replicate
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Mistral 7b instruct v0.2 | mistralai/mistral-7b-instruct-v0.2 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. | 128000 | ❌ | ❌ | ❌ |
Mistral 7b instruct v0.1 | mistralai/mistral-7b-instruct-v0.1 | An instruction-tuned 7 billion parameter language model from Mistral | 128000 | ❌ | ❌ | ❌ |
Mixtral 8x7b instruct v0.1 | mistralai/mixtral-8x7b-instruct-v0.1 | The Mixtral-8x7B-instruct-v0.1 Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts tuned to be a helpful assistant. | 128000 | ❌ | ❌ | ❌ |
Llama 2 13b chat | meta/llama-2-13b-chat | A 13 billion parameter language model from Meta, fine tuned for chat completions | 128000 | ❌ | ❌ | ❌ |
Llama 2 70b chat | meta/llama-2-70b-chat | A 70 billion parameter language model from Meta, fine tuned for chat completions | 128000 | ❌ | ❌ | ❌ |
### OpenAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
4.1 | gpt-4.1 | OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains. | 1047576 | ✅ | ✅ | ✅ |
4.1 mini | gpt-4.1-mini | GPT 4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases. | 1047576 | ✅ | ✅ | ✅ |
4.1 nano | gpt-4.1-nano | GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model. | 1047576 | ✅ | ✅ | ✅ |
4o | gpt-4o | Advanced, multimodal flagship model that's cheaper and faster than GPT-4 Turbo | 128000 | ✅ | ✅ | ✅ |
4o-mini | gpt-4o-mini | Affordable and intelligent small model for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18. | 128000 | ✅ | ✅ | ✅ |
o3-mini (Low Reasoning) | o3-mini-low | Fast and efficient o3-mini model with low reasoning effort. Optimized for quick responses with basic reasoning. | 200000 | ✅ | ✅ | ✅ |
o3-mini (Medium Reasoning) | o3-mini-medium | Balanced o3-mini model with medium reasoning effort. Good for general-purpose tasks requiring moderate analysis. | 200000 | ✅ | ✅ | ✅ |
o3-mini (High Reasoning) | o3-mini-high | Thorough o3-mini model with high reasoning effort. Best for complex tasks requiring deep analysis. | 200000 | ✅ | ✅ | ✅ |
o3 | o3 | o3 is a powerful reasoning model designed for complex problem-solving across domains. It combines advanced reasoning capabilities with high performance for demanding tasks. | 200000 | ✅ | ✅ | ✅ |
o4-mini (Low Reasoning) | o4-mini-low | Fast and efficient o4-mini model with low reasoning effort. Optimized for quick responses with basic reasoning. | 200000 | ✅ | ✅ | ✅ |
o4-mini (Medium Reasoning) | o4-mini-medium | Balanced o4-mini model with medium reasoning effort. Good for general-purpose tasks requiring moderate analysis. | 200000 | ✅ | ✅ | ✅ |
o4-mini (High Reasoning) | o4-mini-high | Thorough o4-mini model with high reasoning effort. Best for complex tasks requiring deep analysis. | 200000 | ✅ | ✅ | ✅ |
o4-mini | o4-mini | o4-mini is a compact and efficient model that delivers strong performance for a wide range of tasks. It offers a good balance of capabilities and resource efficiency. | 200000 | ✅ | ✅ | ✅ |
o1 | o1 | o1 is a reasoning model designed to solve hard problems across domains. The o1 series of models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user. | 200000 | ✅ | ✅ | ✅ |
o1-mini | o1-mini | o1-mini is a fast and affordable reasoning model for specialized tasks. The o1-mini series of models are trained with reinforcement learning to perform complex reasoning. o1-mini models think before they answer, producing a long internal chain of thought before responding to the user. | 128000 | ❌ | ✅ | ✅ |
4.5 | gpt-4.5-preview | This is a research preview of GPT-4.5, OpenAI's largest and most capable GPT model yet. Its deep world knowledge and better understanding of user intent makes it good at creative tasks and agentic planning. | 128000 | ✅ | ✅ | ✅ |
4.1 2025-04-14 | gpt-4.1-2025-04-14 | OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains. | 1047576 | ✅ | ✅ | ✅ |
4.1 mini 2025-04-14 | gpt-4.1-mini-2025-04-14 | GPT 4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases. | 1047576 | ✅ | ✅ | ✅ |
4.1 nano 2025-04-14 | gpt-4.1-nano-2025-04-14 | GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model. | 1047576 | ✅ | ✅ | ✅ |
4o 2024-08-06 | gpt-4o-2024-08-06 | 2024-08-06 version of gpt-4o | 128000 | ✅ | ✅ | ✅ |
4o-mini 2024-07-18 | gpt-4o-mini-2024-07-18 | 2024-07-18 version of gpt-4o-mini | 128000 | ✅ | ✅ | ✅ |
o1 2024-12-17 | o1-2024-12-17 | 2024-12-17 version of o1 | 200000 | ✅ | ✅ | ✅ |
o1-mini 2024-09-12 | o1-mini-2024-09-12 | 2024-09-12 version of o1-mini | 128000 | ❌ | ✅ | ✅ |
4 Turbo | gpt-4-turbo | The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. | 128000 | ✅ | ✅ | ✅ |
4 Turbo Preview | gpt-4-turbo-preview | The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. | 128000 | ✅ | ✅ | ✅ |
4 Vision | gpt-4-vision-preview | GPT-4 with the ability to understand images, in addition to all other GPT-4 Turbo capabilities. | 128000 | ✅ | ✅ | ✅ |
4 | gpt-4 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration. | 8192 | ❌ | ✅ | ✅ |
4 32K | gpt-4-32k | Same capabilities as the base gpt-4 model but with 4x the context length. Will be updated with our latest model iteration. | 32768 | ❌ | ✅ | ✅ |
4 Turbo 2024-04-09 | gpt-4-turbo-2024-04-09 | 2024-04-09 version of gpt-4-turbo | 128000 | ✅ | ✅ | ✅ |
3.5 Turbo | gpt-3.5-turbo | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration. | 4096 | ❌ | ✅ | ✅ |
3.5 Turbo 16K | gpt-3.5-turbo-16k | Same capabilities as the base gpt-3.5-turbo model but with 4x the context length. Will be updated with our latest model iteration. | 16384 | ❌ | ✅ | ✅ |
4 0613 | gpt-4-0613 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration. | 8192 | ❌ | ✅ | ✅ |
3.5 Turbo 0613 | gpt-3.5-turbo-0613 | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration. | 4096 | ❌ | ✅ | ✅ |
### Groq
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
GPT-OSS 20B | openai/gpt-oss-20b | OpenAI's flagship open source model, built on a Mixture-of-Experts (MoE) architecture with 20 billion parameters and 32 experts. Features tool use, browser search, code execution, JSON object mode, and reasoning capabilities. | 131072 | ❌ | ✅ | ✅ |
GPT-OSS 120B | openai/gpt-oss-120b | OpenAI's flagship open source model, built on a Mixture-of-Experts (MoE) architecture with 120 billion parameters and 128 experts. Features tool use, browser search, code execution, JSON object mode, and reasoning capabilities. | 131072 | ❌ | ✅ | ✅ |
Llama 4 Maverick | meta-llama/llama-4-maverick-17b-128e-instruct | Llama 4 Maverick | 131072 | ❌ | ✅ | ✅ |
Llama 4 Scout | meta-llama/llama-4-scout-17b-16e-instruct | Llama 4 Scout | 131072 | ❌ | ✅ | ✅ |
DeepSeek R1 Distilled Llama 70B | deepseek-r1-distill-llama-70b | DeepSeek R1 Distilled Llama 70B | 128000 | ❌ | ✅ | ✅ |
DeepSeek R1 Distilled Llama 70B SpecDec | deepseek-r1-distill-llama-70b-specdec | DeepSeek R1 Distilled Llama 70B SpecDec | 128000 | ❌ | ✅ | ✅ |
Llama 3.1 405B Reasoning | llama-3.1-405b-reasoning | Llama 3.1 405B Reasoning | 131072 | ❌ | ✅ | ❌ |
Llama 3.3 70B Versatile | llama-3.3-70b-versatile | Llama 3.3 70B Versatile | 32768 | ❌ | ✅ | ✅ |
Llama 3.3 70B SpecDec | llama-3.3-70b-specdec | Llama 3.3 70B SpecDec | 8192 | ❌ | ✅ | ✅ |
Llama 3.1 70B Versatile (Tool Use Preview) | llama3-groq-70b-8192-tool-use-preview | Llama 3.1 70B Versatile (Tool Use Preview) | 8192 | ❌ | ✅ | ✅ |
Llama 3.1 70B Versatile | llama-3.1-70b-versatile | Llama 3.1 70B Versatile | 131072 | ❌ | ✅ | ❌ |
Llama 3.1 8B Instant (Tool Use Preview) | llama3-groq-8b-8192-tool-use-preview | Llama 3.1 8B Instant (Tool Use Preview) | 8192 | ❌ | ✅ | ✅ |
Llama 3.1 8B Instant | llama-3.1-8b-instant | Llama 3.1 8B Instant | 131072 | ❌ | ✅ | ✅ |
LLaMA3-70b | llama3-70b-8192 | LLaMA3-70b | 8192 | ❌ | ✅ | ✅ |
LLaMA3-8b | llama3-8b-8192 | LLaMA3-8b | 8192 | ❌ | ✅ | ✅ |
LLaMA2-70b | llama2-70b-4096 | LLaMA2-70b | 4096 | ❌ | ❌ | ❌ |
Mixtral-8x7b | mixtral-8x7b-32768 | Mixtral-8x7b | 32768 | ❌ | ✅ | ✅ |
Gemma-7b-it | gemma-7b-it | Gemma-7b-it | 8192 | ❌ | ✅ | ✅ |
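Groq exposes an OpenAI-compatible API, so the "Supports Function Calls" column maps to the standard `tools` / `tool_choice` payload fields. A sketch of that shape (the `get_weather` tool is hypothetical, defined here purely for illustration):

```python
def build_tool_request(model_id: str, prompt: str, tools: list) -> dict:
    """Payload for Groq's OpenAI-compatible chat completions endpoint."""
    return {
        "model": model_id,  # pick a row with "Supports Function Calls" = ✅
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not part of any SDK
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_tool_request("llama-3.3-70b-versatile", "What's the weather in Oslo?", [weather_tool])
```

For rows marked ❌ under function calls (e.g. llama-3.1-70b-versatile), omit the `tools` field entirely rather than expecting the request to fail gracefully.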
### Google Generative AI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Gemini 2.5 Pro Preview 03-25 | gemini-2.5-pro-preview-03-25 | Gemini 2.5 Pro Preview 03-25 | 1000000 | ✅ | ✅ | ✅ |
Gemini 2.5 Flash Preview 04-17 | gemini-2.5-flash-preview-04-17 | Gemini 2.5 Flash Preview 04-17 | 1000000 | ✅ | ✅ | ✅ |
Gemini 2.0 Flash | gemini-2.0-flash-001 | Gemini 2.0 Flash | 1000000 | ✅ | ✅ | ✅ |
Gemini 2.0 Flash Experimental | gemini-2.0-flash-exp | Gemini 2.0 Flash Experimental | 1000000 | ✅ | ✅ | ✅ |
Gemini 1.0 Pro | gemini-pro | Gemini 1.0 Pro | 32000 | ❌ | ❌ | ❌ |
### Anthropic Claude
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Claude Opus 4.1 | claude-opus-4-1-20250805 | Anthropic's most capable and intelligent model yet. Claude Opus 4.1 sets new standards in complex reasoning and advanced coding. | 200000 | ✅ | ✅ | ✅ |
Claude Opus 4 | claude-opus-4-20250514 | Anthropic's most capable model with highest level of intelligence and capability. Features extended thinking and priority tier access. | 200000 | ✅ | ✅ | ✅ |
Claude Sonnet 4 | claude-sonnet-4-20250514 | Anthropic's high-performance model with balanced intelligence and speed. Features extended thinking and priority tier access. | 200000 | ✅ | ✅ | ✅ |
Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | Anthropic's most intelligent model. Highest level of intelligence and capability with toggleable extended thinking. This is the latest version of the model. | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet (V2) | claude-3-5-sonnet-20241022 | Anthropic's previous most intelligent model. High level of intelligence and capability. | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet (V1) | claude-3-5-sonnet-20240620 | Anthropic's previous most intelligent model. High level of intelligence and capability. | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Haiku | claude-3-5-haiku-20241022 | Anthropic's fastest model that can execute lightweight actions, with industry-leading speed. | 200000 | ✅ | ✅ | ✅ |
Claude 3 Opus | claude-3-opus-20240229 | Most powerful model for highly complex tasks, offering top-level performance with multilingual and vision capabilities. | 200000 | ✅ | ✅ | ✅ |
Claude 3 Sonnet | claude-3-sonnet-20240229 | Ideal balance of intelligence and speed for enterprise workloads, with multilingual and vision support. | 200000 | ✅ | ✅ | ✅ |
Claude 3 Haiku | claude-3-haiku-20240307 | Fastest and most compact model for near-instant responsiveness, includes multilingual and vision capabilities. | 200000 | ✅ | ✅ | ✅ |
### OctoAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Llama-3.1-Instruct (8B) | meta-llama-3.1-8b-instruct | Meta's Llama-3.1-Instruct model with 8 billion parameters for chat use cases. | 131072 | ❌ | ✅ | ✅ |
Llama-3.1-Instruct (70B) | meta-llama-3.1-70b-instruct | Meta's Llama-3.1-Instruct model with 70 billion parameters for chat use cases. | 131072 | ❌ | ✅ | ✅ |
Llama3-Instruct (8B) | meta-llama-3-8b-instruct | Meta's Llama3-Instruct model with 8 billion parameters for chat use cases. | 8192 | ❌ | ✅ | ✅ |
Llama3-Instruct (70B) | meta-llama-3-70b-instruct | Meta's Llama3-Instruct model with 70 billion parameters for chat use cases. | 8192 | ❌ | ✅ | ✅ |
Mistral Instruct v0.3 (7B) | mistral-7b-instruct | Mistral's Instruct v0.3 model with 7 billion parameters for chat and coding use cases. | 32768 | ❌ | ❌ | ❌ |
Mixtral Instruct (8x7B) | mixtral-8x7b-instruct | Mistral's Mixtral Instruct model with 8x7 billion parameters for chat and coding use cases. | 32768 | ❌ | ❌ | ❌ |
Nous Hermes 2 Mixtral DPO (8x7B) | nous-hermes-2-mixtral-8x7b-dpo | Nous Research's Hermes 2 Mixtral DPO model with 8x7 billion parameters for content moderation. | 32768 | ❌ | ❌ | ❌ |
Mixtral Instruct (8x22B) | mixtral-8x22b-instruct | Mistral's Mixtral Instruct model with 8x22 billion parameters for chat and coding use cases. | 65536 | ❌ | ❌ | ❌ |
WizardLM-2 (8x22B) | wizardlm-2-8x22b | Microsoft's WizardLM-2 model with 8x22 billion parameters for chat and coding use cases. | 65536 | ❌ | ❌ | ❌ |
Llama Guard 2 | llamaguard-2-7b | Meta's Llama Guard 2 model with 7 billion parameters for content moderation. | 4096 | ❌ | ❌ | ❌ |
### Perplexity AI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Llama-3.1-Sonar-Small (8B) | llama-3.1-sonar-small-128k-online | Perplexity's Sonar Small online model, built on Llama 3.1 with 8 billion parameters, for search-grounded chat. | 127072 | ❌ | ❌ | ❌ |
Llama-3.1-Sonar-Large (70B) | llama-3.1-sonar-large-128k-online | Perplexity's Sonar Large online model, built on Llama 3.1 with 70 billion parameters, for search-grounded chat. | 127072 | ❌ | ❌ | ❌ |
Llama-3.1-Sonar-Huge (405B) | llama-3.1-sonar-huge-128k-online | Perplexity's Sonar Huge online model, built on Llama 3.1 with 405 billion parameters, for search-grounded chat. | 127072 | ❌ | ❌ | ❌ |
### Amazon Bedrock
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | Anthropic's Claude 3.7 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet (V2) | anthropic.claude-3-5-sonnet-20241022-v2:0 | Anthropic's Claude 3.5 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0 | Anthropic's Claude 3.5 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3 Sonnet | anthropic.claude-3-sonnet-20240229-v1:0 | Anthropic's Claude 3 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ❌ |
Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 | Anthropic's Claude 3.5 Haiku model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0 | Anthropic's Claude 3 Haiku model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3 Opus | anthropic.claude-3-opus-20240229-v1:0 | Anthropic's Claude 3 Opus model on Amazon Bedrock | 200000 | ✅ | ✅ | ❌ |
Llama 3 8B Instruct | meta.llama3-8b-instruct-v1:0 | Meta's Llama 3 8B Instruct model on Amazon Bedrock | 4096 | ❌ | ❌ | ❌ |
Llama 3 70B Instruct | meta.llama3-70b-instruct-v1:0 | Meta's Llama 3 70B Instruct model on Amazon Bedrock | 4096 | ❌ | ❌ | ❌ |
Llama 3.1 8B Instruct | meta.llama3-1-8b-instruct-v1:0 | Meta's Llama 3.1 8B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.1 70B Instruct | meta.llama3-1-70b-instruct-v1:0 | Meta's Llama 3.1 70B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.1 405B Instruct | meta.llama3-1-405b-instruct-v1:0 | Meta's Llama 3.1 405B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 1B Instruct | us.meta.llama3-2-1b-instruct-v1:0 | Meta's Llama 3.2 1B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 3B Instruct | us.meta.llama3-2-3b-instruct-v1:0 | Meta's Llama 3.2 3B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 11B Instruct | us.meta.llama3-2-11b-instruct-v1:0 | Meta's Llama 3.2 11B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 90B Instruct | us.meta.llama3-2-90b-instruct-v1:0 | Meta's Llama 3.2 90B Instruct model on Amazon Bedrock | 128000 | ❌ | ✅ | ✅ |
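On Bedrock the model ID is passed to the `bedrock-runtime` client, while the request body follows the underlying vendor's schema. A sketch for the Anthropic rows above, assuming the documented `anthropic_version` tag for Claude on Bedrock:

```python
import json

def build_bedrock_body(prompt: str) -> str:
    """JSON body for bedrock-runtime invoke_model when targeting an Anthropic model ID."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # required version tag for Claude on Bedrock
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_bedrock_body("List three AWS regions.")
# With boto3 this would be sent as:
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId="anthropic.claude-3-5-haiku-20241022-v1:0", body=body)
```

The Meta rows use a different body schema (a `prompt` string rather than `messages`), so a body builder like this one is per-vendor, not per-provider.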
### Azure OpenAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
GPT-4.1 | gpt-4.1 | Most capable GPT-4.1 model for tasks requiring deep understanding and advanced reasoning. | 1047576 | ✅ | ✅ | ✅ |
GPT-4.1 Mini | gpt-4.1-mini | Smaller, faster version of GPT-4.1 optimized for efficiency. | 1047576 | ✅ | ✅ | ✅ |
GPT-4.1 Nano | gpt-4.1-nano | Smallest version of GPT-4.1 optimized for speed and cost efficiency. | 1047576 | ✅ | ✅ | ✅ |
GPT-4o | gpt-4o | Latest large GA model with structured outputs, text/image processing, enhanced accuracy and superior performance in non-English languages and vision tasks. | 128000 | ✅ | ✅ | ✅ |
GPT-4o mini | gpt-4o-mini | Latest small GA model optimized for fast, inexpensive tasks. Supports text and image processing, JSON Mode, and parallel function calling. | 128000 | ✅ | ✅ | ✅ |
o1 | o1 | o1 is a reasoning model designed to solve hard problems across domains. The o1 series of models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user. | 200000 | ✅ | ✅ | ✅ |
o1-mini | o1-mini | o1-mini is a fast and affordable reasoning model for specialized tasks. The o1-mini series of models are trained with reinforcement learning to perform complex reasoning. o1-mini models think before they answer, producing a long internal chain of thought before responding to the user. | 128000 | ❌ | ✅ | ✅ |
GPT-4 | gpt-4 | Most capable GPT-4 model for tasks requiring deep understanding and advanced reasoning. | 8192 | ❌ | ✅ | ✅ |
GPT-3.5 Turbo | gpt-35-turbo | Most capable GPT-3.5 model, optimized for chat at 1/10th the cost of GPT-4. | 16385 | ❌ | ✅ | ✅ |
### xAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Grok 3 | grok-3 | Grok 3 model with high performance capabilities. Choose this for reduced cost compared to grok-3-fast. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Latest | grok-3-latest | Latest version of Grok 3 model with high performance capabilities. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Fast | grok-3-fast | Same as Grok 3 model but optimized for latency-sensitive applications. Choose this for better response time at higher cost. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Fast Latest | grok-3-fast-latest | Latest faster version of Grok 3 model with optimized response time. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini | grok-3-mini | Lightweight version of Grok 3 model with lower cost and good performance. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini Latest | grok-3-mini-latest | Latest lightweight version of Grok 3 model with lower cost and good performance. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini Fast | grok-3-mini-fast | Faster lightweight version of Grok 3 model with balanced cost and performance. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini Fast Latest | grok-3-mini-fast-latest | Latest faster lightweight version of Grok 3 model with balanced cost and performance. | 131072 | ❌ | ✅ | ✅ |
Grok Beta | grok-beta | Comparable performance to Grok 2 but with improved efficiency, speed and capabilities. | 131072 | ❌ | ✅ | ✅ |
Grok Vision Beta | grok-vision-beta | Comparable performance to Grok 2 but with improved efficiency, speed and capabilities and with ability to process images. | 8192 | ✅ | ✅ | ❌ |
### Fireworks
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
GPT-OSS 20B | accounts/fireworks/models/gpt-oss-20b | A compact, open-weight language model optimized for low latency and resource-constrained environments, including local and edge deployments. It shares the same Harmony training foundation and capabilities as the 120B model, with faster inference and easier deployment, making it ideal for specialized or offline use cases. Supports chain-of-thought output, adjustable reasoning levels, and agentic workflows. | 131072 | ❌ | ✅ | ✅ |
GPT-OSS 120B | accounts/fireworks/models/gpt-oss-120b | A high-performance, open-weight language model designed for production-grade, general-purpose use cases. It fits on a single H100 GPU, making it accessible without requiring multi-GPU infrastructure. Trained on the Harmony response format, it excels at complex reasoning and supports configurable reasoning effort, full chain-of-thought transparency for easier debugging and trust, and native agentic capabilities for function calling, tool use, and structured outputs. | 131072 | ❌ | ✅ | ✅ |
Llama 4 Maverick Instruct (Basic) | accounts/fireworks/models/llama4-maverick-instruct-basic | The Llama 4 collection of models is natively multimodal, enabling text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. | 1000000 | ✅ | ✅ | ✅ |
Llama 4 Scout Instruct (Basic) | accounts/fireworks/models/llama4-scout-instruct-basic | The Llama 4 collection of models is natively multimodal, enabling text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. | 128000 | ✅ | ✅ | ✅ |
Qwen3 235B A22B | accounts/fireworks/models/qwen3-235b-a22b | Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models | 32768 | ❌ | ✅ | ✅ |
DeepSeek R1 | accounts/fireworks/models/deepseek-r1 | DeepSeek R1 is a large language model optimized for instruction following and coding tasks. | 160000 | ❌ | ❌ | ❌ |
DeepSeek V3 03-24 | accounts/fireworks/models/deepseek-v3-0324 | DeepSeek V3 is a large language model optimized for instruction following. This model is the version of the DeepSeek V3 model as of 3/24/2025. | 128000 | ❌ | ❌ | ❌ |
DeepSeek V3 | accounts/fireworks/models/deepseek-v3 | DeepSeek V3 is a large language model optimized for instruction following. | 128000 | ❌ | ❌ | ❌ |
Llama 3.3 70B Instruct | accounts/fireworks/models/llama-v3p3-70b-instruct | Llama 3.3 70B Instruct is a large language model that is optimized for instruction following. | 128000 | ❌ | ✅ | ✅ |
Llama 3.1 405B Instruct | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B Instruct is a large language model that is optimized for instruction following. | 128000 | ❌ | ✅ | ✅ |
Llama 3.1 70B Instruct | accounts/fireworks/models/llama-v3p1-70b-instruct | Llama 3.1 70B Instruct is a large language model that is optimized for instruction following. | 128000 | ❌ | ✅ | ✅ |
## Embedding Models
### OpenAI
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
Text Embedding Ada 002 | text-embedding-ada-002 | Text Embedding Ada 002 | 8191 | 1536 | ❌ |
Text Embedding 3 Small | text-embedding-3-small | Increased performance over 2nd generation ada embedding model | 8191 | 1536 | ✅ |
Text Embedding 3 Large | text-embedding-3-large | Most capable embedding model for both English and non-English tasks | 8191 | 3072 | ✅ |
### Cohere
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
Embed English v3.0 | embed-english-v3.0 | A model that allows for text to be classified or turned into embeddings. English only. | 512 | 1024 | ❌ |
Embed English Light v3.0 | embed-english-light-v3.0 | A smaller, faster version of embed-english-v3.0. Almost as capable, but a lot faster. English only. | 512 | 384 | ❌ |
Embed English v2.0 | embed-english-v2.0 | Cohere's older embeddings model that allows for text to be classified or turned into embeddings. English only. | 512 | 4096 | ❌ |
Embed English Light v2.0 | embed-english-light-v2.0 | A smaller, faster version of embed-english-v2.0. Almost as capable, but a lot faster. English only. | 512 | 1024 | ❌ |
Embed Multilingual v3.0 | embed-multilingual-v3.0 | Provides multilingual classification and embedding support. | 512 | 1024 | ❌ |
Embed Multilingual Light v3.0 | embed-multilingual-light-v3.0 | A smaller, faster version of embed-multilingual-v3.0. Almost as capable, but a lot faster. Supports multiple languages. | 512 | 384 | ❌ |
Embed Multilingual v2.0 | embed-multilingual-v2.0 | Provides multilingual classification and embedding support. | 256 | 768 | ❌ |
### Amazon Bedrock
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
Cohere Embed English | cohere.embed-english-v3 | Cohere English Embedding Model hosted on AWS Bedrock | 512 | 1024 | ❌ |
Cohere Embed Multilingual | cohere.embed-multilingual-v3 | Cohere Multilingual Embedding Model hosted on AWS Bedrock | 512 | 1024 | ❌ |
Amazon Titan Embeddings G1 - Text | amazon.titan-embed-text-v1 | Amazon's Titan G1 Text Embedding Model hosted on AWS Bedrock | 8192 | 1536 | ❌ |
Amazon Titan Embeddings V2 - Text | amazon.titan-embed-text-v2:0 | Amazon's Titan V2 Text Embedding Model hosted on AWS Bedrock | 8192 | 1024 | ❌ |
### Azure OpenAI
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
OpenAI embedding Large | text-embedding-3-large | OpenAI's Large Text Embedding Model hosted on Microsoft Azure | 8192 | 3072 | ✅ |
OpenAI embedding Small | text-embedding-3-small | OpenAI's Small Text Embedding Model hosted on Microsoft Azure | 8192 | 1536 | ✅ |