# Models
## Language Model Providers
### Ollama
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
DeepSeek V3 | deepseek-v3 | DeepSeek V3 model | 163840 | ❌ | ❌ | ❌ |
DeepSeek R1 1.5B | deepseek-r1:1.5b | DeepSeek R1 1.5B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 7B | deepseek-r1:7b | DeepSeek R1 7B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 8B | deepseek-r1:8b | DeepSeek R1 8B Llama model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 14B | deepseek-r1:14b | DeepSeek R1 14B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 32B | deepseek-r1:32b | DeepSeek R1 32B Qwen model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 70B | deepseek-r1:70b | DeepSeek R1 70B Llama model | 131072 | ❌ | ❌ | ❌ |
DeepSeek R1 671B | deepseek-r1:671b | DeepSeek R1 671B model | 131072 | ❌ | ❌ | ❌ |
Llama3 8b | llama3:latest | Llama 3 | 8192 | ❌ | ❌ | ❌ |
Llama 2-7b | llama2:latest | Llama 2 | 8192 | ❌ | ❌ | ❌ |
Mistral | mistral:latest | Mistral | 8192 | ❌ | ❌ | ❌ |
Code Llama | codellama:7b-code | Code Llama | 8192 | ❌ | ❌ | ❌ |
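Each of these models can be pulled locally with `ollama pull <model-id>` and called through Ollama's REST API. A minimal sketch of the request shape, assuming Ollama's documented `/api/chat` route and its default host and port:

```python
import json
import urllib.request

def build_chat_request(model_id: str, prompt: str) -> urllib.request.Request:
    """Build a request for Ollama's local /api/chat endpoint (default port 11434)."""
    payload = {
        "model": model_id,  # any Model ID from the table, e.g. "deepseek-r1:7b"
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a token stream
    }
    return urllib.request.Request(
        "http://localhost:11434/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("mistral:latest", "Summarize RFC 2119 in one sentence.")
# urllib.request.urlopen(req) would send it to a running Ollama instance.
```

Because none of the Ollama rows support JSON Schema or function calls, the payload stays this simple; structured-output fields shown for other providers below do not apply here.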
### Replicate
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Mistral 7b instruct v0.2 | mistralai/mistral-7b-instruct-v0.2 | The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an improved instruct fine-tuned version of Mistral-7B-Instruct-v0.1. | 128000 | ❌ | ❌ | ❌ |
Mistral 7b instruct v0.1 | mistralai/mistral-7b-instruct-v0.1 | An instruction-tuned 7 billion parameter language model from Mistral | 128000 | ❌ | ❌ | ❌ |
Mixtral 8x7b instruct v0.1 | mistralai/mixtral-8x7b-instruct-v0.1 | The Mixtral-8x7B-instruct-v0.1 Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts tuned to be a helpful assistant. | 128000 | ❌ | ❌ | ❌ |
Llama 2 13b chat | meta/llama-2-13b-chat | A 13 billion parameter language model from Meta, fine tuned for chat completions | 128000 | ❌ | ❌ | ❌ |
Llama 2 70b chat | meta/llama-2-70b-chat | A 70 billion parameter language model from Meta, fine tuned for chat completions | 128000 | ❌ | ❌ | ❌ |
### OpenAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
4.1 | gpt-4.1 | OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains. | 1047576 | ✅ | ✅ | ✅ |
4.1 mini | gpt-4.1-mini | GPT 4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases. | 1047576 | ✅ | ✅ | ✅ |
4.1 nano | gpt-4.1-nano | GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model. | 1047576 | ✅ | ✅ | ✅ |
4o | gpt-4o | Advanced, multimodal flagship model that's cheaper and faster than GPT-4 Turbo | 128000 | ✅ | ✅ | ✅ |
4o-mini | gpt-4o-mini | Affordable and intelligent small model for fast, lightweight tasks. GPT-4o mini is cheaper and more capable than GPT-3.5 Turbo. Currently points to gpt-4o-mini-2024-07-18. | 128000 | ✅ | ✅ | ✅ |
o3-mini (Low Reasoning) | o3-mini-low | Fast and efficient o3-mini model with low reasoning effort. Optimized for quick responses with basic reasoning. | 200000 | ✅ | ✅ | ✅ |
o3-mini (Medium Reasoning) | o3-mini-medium | Balanced o3-mini model with medium reasoning effort. Good for general-purpose tasks requiring moderate analysis. | 200000 | ✅ | ✅ | ✅ |
o3-mini (High Reasoning) | o3-mini-high | Thorough o3-mini model with high reasoning effort. Best for complex tasks requiring deep analysis. | 200000 | ✅ | ✅ | ✅ |
o3 | o3 | o3 is a powerful reasoning model designed for complex problem-solving across domains. It combines advanced reasoning capabilities with high performance for demanding tasks. | 200000 | ✅ | ✅ | ✅ |
o4-mini (Low Reasoning) | o4-mini-low | Fast and efficient o4-mini model with low reasoning effort. Optimized for quick responses with basic reasoning. | 200000 | ✅ | ✅ | ✅ |
o4-mini (Medium Reasoning) | o4-mini-medium | Balanced o4-mini model with medium reasoning effort. Good for general-purpose tasks requiring moderate analysis. | 200000 | ✅ | ✅ | ✅ |
o4-mini (High Reasoning) | o4-mini-high | Thorough o4-mini model with high reasoning effort. Best for complex tasks requiring deep analysis. | 200000 | ✅ | ✅ | ✅ |
o4-mini | o4-mini | o4-mini is a compact and efficient model that delivers strong performance for a wide range of tasks. It offers a good balance of capabilities and resource efficiency. | 200000 | ✅ | ✅ | ✅ |
o1 | o1 | o1 is a reasoning model designed to solve hard problems across domains. The o1 series of models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user. | 200000 | ✅ | ✅ | ✅ |
o1-mini | o1-mini | o1-mini is a fast and affordable reasoning model for specialized tasks. The o1-mini series of models are trained with reinforcement learning to perform complex reasoning. o1-mini models think before they answer, producing a long internal chain of thought before responding to the user. | 128000 | ❌ | ✅ | ✅ |
4.5 | gpt-4.5-preview | This is a research preview of GPT-4.5, OpenAI's largest and most capable GPT model yet. Its deep world knowledge and better understanding of user intent makes it good at creative tasks and agentic planning. | 128000 | ✅ | ✅ | ✅ |
4.1 2025-04-14 | gpt-4.1-2025-04-14 | OpenAI's flagship model for complex tasks. It is well suited for problem solving across domains. | 1047576 | ✅ | ✅ | ✅ |
4.1 mini 2025-04-14 | gpt-4.1-mini-2025-04-14 | GPT 4.1 mini provides a balance between intelligence, speed, and cost that makes it an attractive model for many use cases. | 1047576 | ✅ | ✅ | ✅ |
4.1 nano 2025-04-14 | gpt-4.1-nano-2025-04-14 | GPT-4.1 nano is the fastest, most cost-effective GPT 4.1 model. | 1047576 | ✅ | ✅ | ✅ |
4o 2024-08-06 | gpt-4o-2024-08-06 | 2024-08-06 version of gpt-4o | 128000 | ✅ | ✅ | ✅ |
4o-mini 2024-07-18 | gpt-4o-mini-2024-07-18 | 2024-07-18 version of gpt-4o-mini | 128000 | ✅ | ✅ | ✅ |
o1 2024-12-17 | o1-2024-12-17 | 2024-12-17 version of o1 | 200000 | ✅ | ✅ | ✅ |
o1-mini 2024-09-12 | o1-mini-2024-09-12 | 2024-09-12 version of o1-mini | 128000 | ❌ | ✅ | ✅ |
4 Turbo | gpt-4-turbo | The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. | 128000 | ✅ | ✅ | ✅ |
4 Turbo Preview | gpt-4-turbo-preview | The latest GPT-4 model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This preview model is not yet suited for production traffic. | 128000 | ✅ | ✅ | ✅ |
4 Vision | gpt-4-vision-preview | GPT-4 with the ability to understand images, in addition to all other GPT-4 Turbo capabilities. | 128000 | ✅ | ✅ | ✅ |
4 | gpt-4 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration. | 8192 | ❌ | ✅ | ✅ |
4 32K | gpt-4-32k | Same capabilities as the base gpt-4 model but with 4x the context length. Will be updated with our latest model iteration. | 32768 | ❌ | ✅ | ✅ |
4 Turbo 2024-04-09 | gpt-4-turbo-2024-04-09 | 2024-04-09 version of gpt-4-turbo | 128000 | ✅ | ✅ | ✅ |
3.5 Turbo | gpt-3.5-turbo | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration. | 4096 | ❌ | ✅ | ✅ |
3.5 Turbo 16K | gpt-3.5-turbo-16k | Same capabilities as the base gpt-3.5-turbo model but with 4x the context length. Will be updated with our latest model iteration. | 16384 | ❌ | ✅ | ✅ |
4 0613 | gpt-4-0613 | More capable than any GPT-3.5 model, able to do more complex tasks, and optimized for chat. Will be updated with our latest model iteration. | 8192 | ❌ | ✅ | ✅ |
3.5 Turbo 0613 | gpt-3.5-turbo-0613 | Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003. Will be updated with our latest model iteration. | 4096 | ❌ | ✅ | ✅ |
### Groq
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
GPT-OSS 20B | openai/gpt-oss-20b | OpenAI's flagship open source model, built on a Mixture-of-Experts (MoE) architecture with 20 billion parameters and 32 experts. Features tool use, browser search, code execution, JSON object mode, and reasoning capabilities. | 131072 | ❌ | ✅ | ✅ |
GPT-OSS 120B | openai/gpt-oss-120b | OpenAI's flagship open source model, built on a Mixture-of-Experts (MoE) architecture with 120 billion parameters and 128 experts. Features tool use, browser search, code execution, JSON object mode, and reasoning capabilities. | 131072 | ❌ | ✅ | ✅ |
Llama 4 Maverick | meta-llama/llama-4-maverick-17b-128e-instruct | Llama 4 Maverick | 131072 | ❌ | ✅ | ✅ |
Llama 4 Scout | meta-llama/llama-4-scout-17b-16e-instruct | Llama 4 Scout | 131072 | ❌ | ✅ | ✅ |
DeepSeek R1 Distilled Llama 70B | deepseek-r1-distill-llama-70b | DeepSeek R1 Distilled Llama 70B | 128000 | ❌ | ✅ | ✅ |
DeepSeek R1 Distilled Llama 70B SpecDec | deepseek-r1-distill-llama-70b-specdec | DeepSeek R1 Distilled Llama 70B SpecDec | 128000 | ❌ | ✅ | ✅ |
Llama 3.1 405B Reasoning | llama-3.1-405b-reasoning | Llama 3.1 405B Reasoning | 131072 | ❌ | ✅ | ❌ |
Llama 3.3 70B Versatile | llama-3.3-70b-versatile | Llama 3.3 70B Versatile | 32768 | ❌ | ✅ | ✅ |
Llama 3.3 70B SpecDec | llama-3.3-70b-specdec | Llama 3.3 70B SpecDec | 8192 | ❌ | ✅ | ✅ |
Llama 3.1 70B Versatile (Tool Use Preview) | llama3-groq-70b-8192-tool-use-preview | Llama 3.1 70B Versatile (Tool Use Preview) | 8192 | ❌ | ✅ | ✅ |
Llama 3.1 70B Versatile | llama-3.1-70b-versatile | Llama 3.1 70B Versatile | 131072 | ❌ | ✅ | ❌ |
Llama 3.1 8B Instant (Tool Use Preview) | llama3-groq-8b-8192-tool-use-preview | Llama 3.1 8B Instant (Tool Use Preview) | 8192 | ❌ | ✅ | ✅ |
Llama 3.1 8B Instant | llama-3.1-8b-instant | Llama 3.1 8B Instant | 131072 | ❌ | ✅ | ✅ |
LLaMA3-70b | llama3-70b-8192 | LLaMA3-70b | 8192 | ❌ | ✅ | ✅ |
LLaMA3-8b | llama3-8b-8192 | LLaMA3-8b | 8192 | ❌ | ✅ | ✅ |
LLaMA2-70b | llama2-70b-4096 | LLaMA2-70b | 4096 | ❌ | ❌ | ❌ |
Mixtral-8x7b | mixtral-8x7b-32768 | Mixtral-8x7b | 32768 | ❌ | ✅ | ✅ |
Gemma-7b-it | gemma-7b-it | Gemma-7b-it | 8192 | ❌ | ✅ | ✅ |
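Groq exposes an OpenAI-compatible API, so the "Supports Function Calls" column maps to the standard `tools` / `tool_choice` payload fields. A sketch of that shape (the `get_weather` tool is hypothetical, defined here purely for illustration):

```python
def build_tool_request(model_id: str, prompt: str, tools: list) -> dict:
    """Payload for Groq's OpenAI-compatible chat completions endpoint."""
    return {
        "model": model_id,  # pick a row with "Supports Function Calls" = ✅
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call a tool
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, not part of any SDK
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

body = build_tool_request("llama-3.3-70b-versatile", "What's the weather in Oslo?", [weather_tool])
```

For rows marked ❌ under function calls (e.g. llama-3.1-70b-versatile), omit the `tools` field entirely rather than expecting the request to fail gracefully.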
### Google Generative AI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Gemini 2.5 Pro Preview 03-25 | gemini-2.5-pro-preview-03-25 | Gemini 2.5 Pro Preview 03-25 | 1000000 | ✅ | ✅ | ✅ |
Gemini 2.5 Flash Preview 04-17 | gemini-2.5-flash-preview-04-17 | Gemini 2.5 Flash Preview 04-17 | 1000000 | ✅ | ✅ | ✅ |
Gemini 2.0 Flash | gemini-2.0-flash-001 | Gemini 2.0 Flash | 1000000 | ✅ | ✅ | ✅ |
Gemini 2.0 Flash Experimental | gemini-2.0-flash-exp | Gemini 2.0 Flash Experimental | 1000000 | ✅ | ✅ | ✅ |
Gemini 1.0 Pro | gemini-pro | Gemini 1.0 Pro | 32000 | ❌ | ❌ | ❌ |
### Anthropic Claude
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Claude Opus 4.1 | claude-opus-4-1-20250805 | Anthropic's most capable and intelligent model yet. Claude Opus 4.1 sets new standards in complex reasoning and advanced coding. | 200000 | ✅ | ✅ | ✅ |
Claude Opus 4 | claude-opus-4-20250514 | Anthropic's most capable model with highest level of intelligence and capability. Features extended thinking and priority tier access. | 200000 | ✅ | ✅ | ✅ |
Claude Sonnet 4 | claude-sonnet-4-20250514 | Anthropic's high-performance model with balanced intelligence and speed. Features extended thinking and priority tier access. | 200000 | ✅ | ✅ | ✅ |
Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | Anthropic's most intelligent model. Highest level of intelligence and capability with toggleable extended thinking. This is the latest version of the model. | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet (V2) | claude-3-5-sonnet-20241022 | Anthropic's previous most intelligent model. High level of intelligence and capability. | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet (V1) | claude-3-5-sonnet-20240620 | Anthropic's previous most intelligent model. High level of intelligence and capability. | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Haiku | claude-3-5-haiku-20241022 | Anthropic's fastest model that can execute lightweight actions, with industry-leading speed. | 200000 | ✅ | ✅ | ✅ |
Claude 3 Opus | claude-3-opus-20240229 | Most powerful model for highly complex tasks, offering top-level performance with multilingual and vision capabilities. | 200000 | ✅ | ✅ | ✅ |
Claude 3 Sonnet | claude-3-sonnet-20240229 | Ideal balance of intelligence and speed for enterprise workloads, with multilingual and vision support. | 200000 | ✅ | ✅ | ✅ |
Claude 3 Haiku | claude-3-haiku-20240307 | Fastest and most compact model for near-instant responsiveness, includes multilingual and vision capabilities. | 200000 | ✅ | ✅ | ✅ |
### OctoAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Llama-3.1-Instruct (8B) | meta-llama-3.1-8b-instruct | Meta's Llama-3.1-Instruct model with 8 billion parameters for chat use cases. | 131072 | ❌ | ✅ | ✅ |
Llama-3.1-Instruct (70B) | meta-llama-3.1-70b-instruct | Meta's Llama-3.1-Instruct model with 70 billion parameters for chat use cases. | 131072 | ❌ | ✅ | ✅ |
Llama3-Instruct (8B) | meta-llama-3-8b-instruct | Meta's Llama3-Instruct model with 8 billion parameters for chat use cases. | 8192 | ❌ | ✅ | ✅ |
Llama3-Instruct (70B) | meta-llama-3-70b-instruct | Meta's Llama3-Instruct model with 70 billion parameters for chat use cases. | 8192 | ❌ | ✅ | ✅ |
Mistral Instruct v0.3 (7B) | mistral-7b-instruct | Mistral's Instruct v0.3 model with 7 billion parameters for chat and coding use cases. | 32768 | ❌ | ❌ | ❌ |
Mixtral Instruct (8x7B) | mixtral-8x7b-instruct | Mistral's Mixtral Instruct model with 8x7 billion parameters for chat and coding use cases. | 32768 | ❌ | ❌ | ❌ |
Nous Hermes 2 Mixtral DPO (8x7B) | nous-hermes-2-mixtral-8x7b-dpo | Nous Research's Hermes 2 Mixtral DPO model with 8x7 billion parameters for content moderation. | 32768 | ❌ | ❌ | ❌ |
Mixtral Instruct (8x22B) | mixtral-8x22b-instruct | Mistral's Mixtral Instruct model with 8x22 billion parameters for chat and coding use cases. | 65536 | ❌ | ❌ | ❌ |
WizardLM-2 (8x22B) | wizardlm-2-8x22b | Microsoft's WizardLM-2 model with 8x22 billion parameters for chat and coding use cases. | 65536 | ❌ | ❌ | ❌ |
Llama Guard 2 | llamaguard-2-7b | Meta's Llama Guard 2 model with 7 billion parameters for content moderation. | 4096 | ❌ | ❌ | ❌ |
### Perplexity AI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Llama-3.1-Sonar-Small (8B) | llama-3.1-sonar-small-128k-online | Perplexity's Sonar Small online model, built on Llama 3.1 with 8 billion parameters, for search-grounded chat. | 127072 | ❌ | ❌ | ❌ |
Llama-3.1-Sonar-Large (70B) | llama-3.1-sonar-large-128k-online | Perplexity's Sonar Large online model, built on Llama 3.1 with 70 billion parameters, for search-grounded chat. | 127072 | ❌ | ❌ | ❌ |
Llama-3.1-Sonar-Huge (405B) | llama-3.1-sonar-huge-128k-online | Perplexity's Sonar Huge online model, built on Llama 3.1 with 405 billion parameters, for search-grounded chat. | 127072 | ❌ | ❌ | ❌ |
### Amazon Bedrock
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | Anthropic's Claude 3.7 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet (V2) | anthropic.claude-3-5-sonnet-20241022-v2:0 | Anthropic's Claude 3.5 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0 | Anthropic's Claude 3.5 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3 Sonnet | anthropic.claude-3-sonnet-20240229-v1:0 | Anthropic's Claude 3 Sonnet model on Amazon Bedrock | 200000 | ✅ | ✅ | ❌ |
Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 | Anthropic's Claude 3.5 Haiku model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0 | Anthropic's Claude 3 Haiku model on Amazon Bedrock | 200000 | ✅ | ✅ | ✅ |
Claude 3 Opus | anthropic.claude-3-opus-20240229-v1:0 | Anthropic's Claude 3 Opus model on Amazon Bedrock | 200000 | ✅ | ✅ | ❌ |
Llama 3 8B Instruct | meta.llama3-8b-instruct-v1:0 | Meta's Llama 3 8B Instruct model on Amazon Bedrock | 4096 | ❌ | ❌ | ❌ |
Llama 3 70B Instruct | meta.llama3-70b-instruct-v1:0 | Meta's Llama 3 70B Instruct model on Amazon Bedrock | 4096 | ❌ | ❌ | ❌ |
Llama 3.1 8B Instruct | meta.llama3-1-8b-instruct-v1:0 | Meta's Llama 3.1 8B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.1 70B Instruct | meta.llama3-1-70b-instruct-v1:0 | Meta's Llama 3.1 70B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.1 405B Instruct | meta.llama3-1-405b-instruct-v1:0 | Meta's Llama 3.1 405B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 1B Instruct | us.meta.llama3-2-1b-instruct-v1:0 | Meta's Llama 3.2 1B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 3B Instruct | us.meta.llama3-2-3b-instruct-v1:0 | Meta's Llama 3.2 3B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 11B Instruct | us.meta.llama3-2-11b-instruct-v1:0 | Meta's Llama 3.2 11B Instruct model on Amazon Bedrock | 128000 | ❌ | ❌ | ❌ |
Llama 3.2 90B Instruct | us.meta.llama3-2-90b-instruct-v1:0 | Meta's Llama 3.2 90B Instruct model on Amazon Bedrock | 128000 | ❌ | ✅ | ✅ |
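On Bedrock the model ID is passed to the `bedrock-runtime` client, while the request body follows the underlying vendor's schema. A sketch for the Anthropic rows above, assuming the documented `anthropic_version` tag for Claude on Bedrock:

```python
import json

def build_bedrock_body(prompt: str) -> str:
    """JSON body for bedrock-runtime invoke_model when targeting an Anthropic model ID."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # required version tag for Claude on Bedrock
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_bedrock_body("List three AWS regions.")
# With boto3 this would be sent as:
#   client = boto3.client("bedrock-runtime")
#   client.invoke_model(modelId="anthropic.claude-3-5-haiku-20241022-v1:0", body=body)
```

The Meta rows use a different body schema (a `prompt` string rather than `messages`), so a body builder like this one is per-vendor, not per-provider.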
### Azure OpenAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
GPT-4.1 | gpt-4.1 | Most capable GPT-4.1 model for tasks requiring deep understanding and advanced reasoning. | 1047576 | ✅ | ✅ | ✅ |
GPT-4.1 Mini | gpt-4.1-mini | Smaller, faster version of GPT-4.1 optimized for efficiency. | 1047576 | ✅ | ✅ | ✅ |
GPT-4.1 Nano | gpt-4.1-nano | Smallest version of GPT-4.1 optimized for speed and cost efficiency. | 1047576 | ✅ | ✅ | ✅ |
GPT-4o | gpt-4o | Latest large GA model with structured outputs, text/image processing, enhanced accuracy and superior performance in non-English languages and vision tasks. | 128000 | ✅ | ✅ | ✅ |
GPT-4o mini | gpt-4o-mini | Latest small GA model optimized for fast, inexpensive tasks. Supports text and image processing, JSON Mode, and parallel function calling. | 128000 | ✅ | ✅ | ✅ |
o1 | o1 | o1 is a reasoning model designed to solve hard problems across domains. The o1 series of models are trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user. | 200000 | ✅ | ✅ | ✅ |
o1-mini | o1-mini | o1-mini is a fast and affordable reasoning model for specialized tasks. The o1-mini series of models are trained with reinforcement learning to perform complex reasoning. o1-mini models think before they answer, producing a long internal chain of thought before responding to the user. | 128000 | ❌ | ✅ | ✅ |
GPT-4 | gpt-4 | Most capable GPT-4 model for tasks requiring deep understanding and advanced reasoning. | 8192 | ❌ | ✅ | ✅ |
GPT-3.5 Turbo | gpt-35-turbo | Most capable GPT-3.5 model, optimized for chat at 1/10th the cost of GPT-4. | 16385 | ❌ | ✅ | ✅ |
### xAI
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
Grok 3 | grok-3 | Grok 3 model with high performance capabilities. Choose this for reduced cost compared to grok-3-fast. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Latest | grok-3-latest | Latest version of Grok 3 model with high performance capabilities. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Fast | grok-3-fast | Same as Grok 3 model but optimized for latency-sensitive applications. Choose this for better response time at higher cost. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Fast Latest | grok-3-fast-latest | Latest faster version of Grok 3 model with optimized response time. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini | grok-3-mini | Lightweight version of Grok 3 model with lower cost and good performance. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini Latest | grok-3-mini-latest | Latest lightweight version of Grok 3 model with lower cost and good performance. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini Fast | grok-3-mini-fast | Faster lightweight version of Grok 3 model with balanced cost and performance. | 131072 | ❌ | ✅ | ✅ |
Grok 3 Mini Fast Latest | grok-3-mini-fast-latest | Latest faster lightweight version of Grok 3 model with balanced cost and performance. | 131072 | ❌ | ✅ | ✅ |
Grok Beta | grok-beta | Comparable performance to Grok 2 but with improved efficiency, speed and capabilities. | 131072 | ❌ | ✅ | ✅ |
Grok Vision Beta | grok-vision-beta | Comparable performance to Grok 2 but with improved efficiency, speed and capabilities and with ability to process images. | 8192 | ✅ | ✅ | ❌ |
### Fireworks
Model Name | Model ID | Description | Max Tokens | Supports Images | Supports JSON Schema | Supports Function Calls |
---|---|---|---|---|---|---|
GPT-OSS 20B | accounts/fireworks/models/gpt-oss-20b | A compact, open-weight language model optimized for low latency and resource-constrained environments, including local and edge deployments. It shares the same Harmony training foundation and capabilities as the 120B model, with faster inference and easier deployment, making it ideal for specialized or offline use cases. Supports chain-of-thought output, adjustable reasoning levels, and agentic workflows. | 131072 | ❌ | ✅ | ✅ |
GPT-OSS 120B | accounts/fireworks/models/gpt-oss-120b | A high-performance, open-weight language model designed for production-grade, general-purpose use cases. It fits on a single H100 GPU, making it accessible without requiring multi-GPU infrastructure. Trained on the Harmony response format, it excels at complex reasoning and supports configurable reasoning effort, full chain-of-thought transparency for easier debugging and trust, and native agentic capabilities for function calling, tool use, and structured outputs. | 131072 | ❌ | ✅ | ✅ |
Llama 4 Maverick Instruct (Basic) | accounts/fireworks/models/llama4-maverick-instruct-basic | The Llama 4 collection of models is natively multimodal, enabling text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. | 1000000 | ✅ | ✅ | ✅ |
Llama 4 Scout Instruct (Basic) | accounts/fireworks/models/llama4-scout-instruct-basic | The Llama 4 collection of models is natively multimodal, enabling text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. | 128000 | ✅ | ✅ | ✅ |
Qwen3 235B A22B | accounts/fireworks/models/qwen3-235b-a22b | Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models | 32768 | ❌ | ✅ | ✅ |
DeepSeek R1 | accounts/fireworks/models/deepseek-r1 | DeepSeek R1 is a large language model optimized for instruction following and coding tasks. | 160000 | ❌ | ❌ | ❌ |
DeepSeek V3 03-24 | accounts/fireworks/models/deepseek-v3-0324 | DeepSeek V3 is a large language model optimized for instruction following. This model is the version of the DeepSeek V3 model as of 3/24/2025. | 128000 | ❌ | ❌ | ❌ |
DeepSeek V3 | accounts/fireworks/models/deepseek-v3 | DeepSeek V3 is a large language model optimized for instruction following. | 128000 | ❌ | ❌ | ❌ |
Llama 3.3 70B Instruct | accounts/fireworks/models/llama-v3p3-70b-instruct | Llama 3.3 70B Instruct is a large language model that is optimized for instruction following. | 128000 | ❌ | ✅ | ✅ |
Llama 3.1 405B Instruct | accounts/fireworks/models/llama-v3p1-405b-instruct | Llama 3.1 405B Instruct is a large language model that is optimized for instruction following. | 128000 | ❌ | ✅ | ✅ |
Llama 3.1 70B Instruct | accounts/fireworks/models/llama-v3p1-70b-instruct | Llama 3.1 70B Instruct is a large language model that is optimized for instruction following. | 128000 | ❌ | ✅ | ✅ |
## Embedding Models
### OpenAI
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
Text Embedding Ada 002 | text-embedding-ada-002 | Text Embedding Ada 002 | 8191 | 1536 | ❌ |
Text Embedding 3 Small | text-embedding-3-small | Increased performance over 2nd generation ada embedding model | 8191 | 1536 | ✅ |
Text Embedding 3 Large | text-embedding-3-large | Most capable embedding model for both English and non-English tasks | 8191 | 3072 | ✅ |
### Cohere
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
Embed English v3.0 | embed-english-v3.0 | A model that allows for text to be classified or turned into embeddings. English only. | 512 | 1024 | ❌ |
Embed English Light v3.0 | embed-english-light-v3.0 | A smaller, faster version of embed-english-v3.0. Almost as capable, but a lot faster. English only. | 512 | 384 | ❌ |
Embed English v2.0 | embed-english-v2.0 | Cohere's older embeddings model that allows for text to be classified or turned into embeddings. English only. | 512 | 4096 | ❌ |
Embed English Light v2.0 | embed-english-light-v2.0 | A smaller, faster version of embed-english-v2.0. Almost as capable, but a lot faster. English only. | 512 | 1024 | ❌ |
Embed Multilingual v3.0 | embed-multilingual-v3.0 | Provides multilingual classification and embedding support. | 512 | 1024 | ❌ |
Embed Multilingual Light v3.0 | embed-multilingual-light-v3.0 | A smaller, faster version of embed-multilingual-v3.0. Almost as capable, but a lot faster. Supports multiple languages. | 512 | 384 | ❌ |
Embed Multilingual v2.0 | embed-multilingual-v2.0 | Provides multilingual classification and embedding support. | 256 | 768 | ❌ |
### Amazon Bedrock
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
Cohere Embed English | cohere.embed-english-v3 | Cohere English Embedding Model hosted on AWS Bedrock | 512 | 1024 | ❌ |
Cohere Embed Multilingual | cohere.embed-multilingual-v3 | Cohere Multilingual Embedding Model hosted on AWS Bedrock | 512 | 1024 | ❌ |
Amazon Titan Embeddings G1 - Text | amazon.titan-embed-text-v1 | Amazon's Titan G1 Text Embedding Model hosted on AWS Bedrock | 8192 | 1536 | ❌ |
Amazon Titan Embeddings V2 - Text | amazon.titan-embed-text-v2:0 | Amazon's Titan V2 Text Embedding Model hosted on AWS Bedrock | 8192 | 1024 | ❌ |
### Azure OpenAI
Model Name | Model ID | Description | Max Tokens | Max Output Dimensions | Supports Reduced Dimensions |
---|---|---|---|---|---|
OpenAI embedding Large | text-embedding-3-large | OpenAI's Large Text Embedding Model hosted on Microsoft Azure | 8192 | 3072 | ✅ |
OpenAI embedding Small | text-embedding-3-small | OpenAI's Small Text Embedding Model hosted on Microsoft Azure | 8192 | 1536 | ✅ |