mistralai/Mistral-Small-3.2-24B-Instruct-2506

Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503.

Small-3.2 improves in the following categories:

  • Instruction following: Small-3.2 is better at following precise instructions
  • Repetition errors: Small-3.2 produces fewer infinite generations and repetitive answers
  • Function calling: Small-3.2's function calling template is more robust (see the function calling examples under Usage)

In all other categories, Small-3.2 should match or slightly improve on Mistral-Small-3.1-24B-Instruct-2503.

Key Features

Benchmark Results

We compare Mistral-Small-3.2-24B to Mistral-Small-3.1-24B-Instruct-2503. For more comparisons against other models of similar size, please check Mistral-Small-3.1's benchmarks.

Text

Instruction Following / Chat / Tone

Model | Wildbench v2 | Arena Hard v2 | IF (Internal; accuracy)
Small 3.1 24B Instruct | 55.6% | 19.56% | 82.75%
Small 3.2 24B Instruct | 65.33% | 43.1% | 84.78%

Infinite Generations

Small 3.2 reduces infinite generations by 2x on challenging, long and repetitive prompts.

Model | Infinite Generations (Internal; lower is better)
Small 3.1 24B Instruct | 2.11%
Small 3.2 24B Instruct | 1.29%

STEM

Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP Plus - Pass@5 | HumanEval Plus - Pass@5 | SimpleQA (TotalAcc)
Small 3.1 24B Instruct | 80.62% | 66.76% | 69.30% | 44.42% | 45.96% | 74.63% | 88.99% | 10.43%
Small 3.2 24B Instruct | 80.50% | 69.06% | 69.42% | 44.22% | 46.13% | 78.33% | 92.90% | 12.10%

Vision

Model | MMMU | Mathvista | ChartQA | DocVQA | AI2D
Small 3.1 24B Instruct | 64.00% | 68.91% | 86.24% | 94.08% | 93.72%
Small 3.2 24B Instruct | 62.50% | 67.09% | 87.4% | 94.86% | 92.91%

Usage

The model can be used with the following frameworks: vLLM (recommended) and Transformers.

Note 1: We recommend using a relatively low temperature, such as temperature=0.15.

Note 2: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend using the one provided in the SYSTEM_PROMPT.txt file.
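
The two notes can be combined into one request payload. The sketch below is illustrative only: `build_request` and the placeholder prompt strings are hypothetical names, not part of the model card, and the returned dictionary is meant to be unpacked into an OpenAI-style `chat.completions.create` call.

```python
from typing import Any

RECOMMENDED_TEMPERATURE = 0.15  # value suggested in Note 1


def build_request(system_prompt: str, user_text: str,
                  temperature: float = RECOMMENDED_TEMPERATURE) -> dict[str, Any]:
    """Assemble keyword arguments for an OpenAI-style chat completion call.

    Hypothetical helper: the system message always comes first (Note 2),
    and the temperature defaults to the recommended low value (Note 1).
    """
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_text},
        ],
        "temperature": temperature,
    }


# Placeholder prompts, used only to show the resulting structure.
request = build_request("You are a concise assistant.", "Summarize vLLM in one line.")
print(request["temperature"], [m["role"] for m in request["messages"]])
```

With a client in hand, the result would be passed as `client.chat.completions.create(model=model, **request)`.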

vLLM (recommended)

We recommend using this model with vLLM.

Installation

Make sure to install vLLM >= 0.9.1:

pip install vllm --upgrade

Doing so should automatically install mistral_common >= 1.6.2.

To check:

python -c "import mistral_common; print(mistral_common.__version__)"
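
If you prefer to check both minimum versions in one go, a small script along these lines may help. It is a sketch, not an official tool: `parse_version` and `meets_minimum` are hypothetical helpers that compare only the leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.6.2' or '0.9.1.post1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings component by component."""
    return parse_version(installed) >= parse_version(minimum)


# Minimum versions quoted in the installation notes above.
for package, minimum in {"vllm": "0.9.1", "mistral-common": "1.6.2"}.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
    else:
        status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
        print(f"{package} {installed}: {status}")
```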

You can also use a ready-to-go Docker image, for example the one published on Docker Hub.

Serve

We recommend that you use Mistral-Small-3.2-24B-Instruct-2506 in a server/client setting.

  1. Spin up a server:
vllm serve mistralai/Mistral-Small-3.2-24B-Instruct-2506 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2

Note: Running Mistral-Small-3.2-24B-Instruct-2506 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
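
As a rough sanity check on that figure, most of the memory is the weights themselves. The arithmetic below assumes about 24 billion parameters (read off the model name, not an exact count) at 2 bytes each; what remains of the quoted total is presumably available for the KV cache, activations and runtime overhead.

```python
PARAMS = 24e9            # assumption: ~24B parameters, taken from the model name
BYTES_PER_PARAM = 2      # bf16 and fp16 both store 2 bytes per value
QUOTED_TOTAL_GB = 55     # figure from the note above

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
headroom_gb = QUOTED_TOTAL_GB - weights_gb

print(f"weights: ~{weights_gb:.0f} GB, remaining: ~{headroom_gb:.0f} GB")
```

This is also why the serve command splits the model across two GPUs with --tensor-parallel-size 2 when a single device has less memory than the total.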

  2. To query the server you can use a simple Python snippet. See the following examples.

Vision reasoning

Leverage the vision capabilities of Mistral-Small-3.2-24B-Instruct-2506 to make the best choice in a given scenario. Go catch them all!

Python snippet

from datetime import datetime, timedelta

from openai import OpenAI
from huggingface_hub import hf_hub_download

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 131072

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


model_id = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)
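
The `load_system_prompt` helper in the snippet above fills three placeholders in the prompt file: `name`, `today` and `yesterday`. The sketch below reproduces just that substitution step offline. `TEMPLATE` is a made-up stand-in (the real SYSTEM_PROMPT.txt is longer and may be worded differently), and `render_system_prompt` is a hypothetical name that takes the date as an argument so the result is reproducible.

```python
from datetime import datetime, timedelta

# Hypothetical template; the real SYSTEM_PROMPT.txt is longer and may differ.
TEMPLATE = "You are {name}. The current date is {today}. Yesterday was {yesterday}."


def render_system_prompt(template: str, repo_id: str, now: datetime) -> str:
    """Fill the name/today/yesterday placeholders the same way the snippet does."""
    today = now.strftime("%Y-%m-%d")
    yesterday = (now - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]  # drop the organisation prefix
    return template.format(name=model_name, today=today, yesterday=yesterday)


prompt = render_system_prompt(
    TEMPLATE, "mistralai/Mistral-Small-3.2-24B-Instruct-2506", datetime(2025, 6, 20)
)
print(prompt)
```

If a template is sent without this step, the braces reach the model as literal text, so it is worth confirming that the rendered prompt contains no leftover `{...}` fields.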

Function calling

Mistral-Small-3.2-24B-Instruct-2506 is excellent at function / tool calling tasks via vLLM. For example:

Python snippet - easy

from openai import OpenAI
from huggingface_hub import hf_hub_download

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 131072

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


model_id = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_population",
            "description": "Get the up-to-date population of a given country.",
            "parameters": {
                "type": "object",
                "properties": {
                    "country": {
                        "type": "string",
                        "description": "The country to find the population of.",
                    },
                    "unit": {
                        "type": "string",
                        "description": "The unit for the population.",
                        "enum": ["millions", "thousands"],
                    },
                },
                "required": ["country", "unit"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

# The history already contains one completed tool call (rewrite) and its result.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
    },
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "bbc5b7ede",
                "type": "function",
                "function": {
                    "name": "rewrite",
                    "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
        "tool_call_id": "bbc5b7ede",
        "name": "rewrite",
    },
    {
        "role": "assistant",
        "content": "---\n\nOpenAI is a FOR-profit company.",
    },
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Can you tell me what is the biggest country depicted on the map?",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    }
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

assistant_message = response.choices[0].message.content
print(assistant_message)

# Ask a follow-up question that should trigger get_current_population.
messages.extend([
    {"role": "assistant", "content": assistant_message},
    {"role": "user", "content": "What is the population of that country in millions?"},
])

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)

Python snippet - complex

import json

from openai import OpenAI
from huggingface_hub import hf_hub_download

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 131072

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


model_id = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

image_url = "https://math-coaching.com/img/fiche/46/expressions-mathematiques.jpg"


def my_calculator(expression: str) -> str:
    # eval() runs arbitrary code; use it only with trusted input.
    return str(eval(expression))


tools = [
    {
        "type": "function",
        "function": {
            "name": "my_calculator",
            "description": "A calculator that can evaluate a mathematical expression.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "The mathematical expression to evaluate.",
                    },
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "rewrite",
            "description": "Rewrite a given text for improved clarity",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The input text to rewrite",
                    }
                },
            },
        },
    },
]

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Can you calculate the results for all the equations displayed in the image? Only compute the ones that involve numbers.",
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": image_url,
                },
            },
        ],
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
    tools=tools,
    tool_choice="auto",
)

tool_calls = response.choices[0].message.tool_calls
print(tool_calls)

# Run each requested calculator call locally.
results = []
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_args = tool_call.function.arguments
    if function_name == "my_calculator":
        result = my_calculator(**json.loads(function_args))
        results.append(result)

# Send the tool results back so the model can write the final answer.
messages.append({"role": "assistant", "tool_calls": tool_calls})
for tool_call, result in zip(tool_calls, results):
    messages.append(
        {
            "role": "tool",
            "tool_call_id": tool_call.id,
            "name": tool_call.function.name,
            "content": result,
        }
    )

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

print(response.choices[0].message.content)

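The `my_calculator` tool in the complex snippet passes the model's output straight to `eval`, which is fine for a demo but risky when the arguments come from untrusted input. One possible alternative, sketched here under the assumption that only basic arithmetic is needed, walks the parsed expression tree and allows a fixed set of operators. `safe_calculator` is a hypothetical replacement, not part of the model card, and exponentiation is deliberately left out because very large powers can stall the process.

```python
import ast
import operator

# Operators the calculator is willing to evaluate.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST):
    """Recursively evaluate numbers and whitelisted operators only."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression: str) -> str:
    """Evaluate plain arithmetic; reject names, calls and attribute access."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))


print(safe_calculator("6 + 2 * 3"))
print(safe_calculator("(19 - 7) / 4"))
```

Because it keeps the same `expression: str -> str` signature, it could be dropped in wherever `my_calculator` is called, with a `try`/`except` around it for expressions it refuses.
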
Instruction following

Mistral-Small-3.2-24B-Instruct-2506 will follow your instructions to the letter!

Python snippet

from openai import OpenAI
from huggingface_hub import hf_hub_download

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

TEMP = 0.15
MAX_TOK = 131072

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt


model_id = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": "Write me a sentence where every word starts with the next letter in the alphabet - start with 'a' and end with 'z'.",
    },
]

response = client.chat.completions.create(
    model=model,
    messages=messages,
    temperature=TEMP,
    max_tokens=MAX_TOK,
)

assistant_message = response.choices[0].message.content
print(assistant_message)
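
To check a reply to the alphabet prompt above automatically rather than by eye, a small validator can help. This is a sketch: `follows_alphabet` is a hypothetical helper and the sample sentence is hand-written for illustration, not model output.

```python
import re
import string


def follows_alphabet(sentence: str) -> bool:
    """True if the sentence has 26 words whose initials run from 'a' to 'z'."""
    # Words may contain apostrophes or hyphens; punctuation around them is ignored.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    initials = "".join(word[0].lower() for word in words)
    return initials == string.ascii_lowercase


# Hand-written sample, used only to exercise the checker.
sample = ("A big cat dances elegantly, finding giant hats in jungles, keeping "
          "lively monkeys near old parrots, quietly resting, sipping tea under "
          "vast willows, xeroxing yellow zebras.")
print(follows_alphabet(sample))
```

Passing `assistant_message` from the snippet above to `follows_alphabet` would give a quick pass/fail signal when comparing models or sampling settings.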

Transformers

You can also use Mistral-Small-3.2-24B-Instruct-2506 with Transformers!

To make the best use of our model with Transformers, make sure mistral-common >= 1.6.2 is installed so that you can use our tokenizer.

pip install mistral-common --upgrade

Then load our tokenizer along with the model and generate:

Python snippet

from datetime import datetime, timedelta

import torch
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from huggingface_hub import hf_hub_download
from transformers import Mistral3ForConditionalGeneration


def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    today = datetime.today().strftime("%Y-%m-%d")
    yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
    model_name = repo_id.split("/")[-1]
    return system_prompt.format(name=model_name, today=today, yesterday=yesterday)


model_id = "mistralai/Mistral-Small-3.2-24B-Instruct-2506"
SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")

tokenizer = MistralTokenizer.from_hf_hub(model_id)

model = Mistral3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
)

image_url = "https://static.wikia.nocookie.net/essentialsdocs/images/7/70/Battle.png/revision/latest?cb=20220523172438"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What action do you think I should take in this situation? List all the possible actions and explain why you think they are good or bad.",
            },
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    },
]

tokenized = tokenizer.encode_chat_completion(ChatCompletionRequest(messages=messages))

input_ids = torch.tensor([tokenized.tokens])
attention_mask = torch.ones_like(input_ids)
pixel_values = torch.tensor(tokenized.images[0], dtype=torch.bfloat16).unsqueeze(0)
image_sizes = torch.tensor([pixel_values.shape[-2:]])

output = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    pixel_values=pixel_values,
    image_sizes=image_sizes,
    max_new_tokens=1000,
)[0]

decoded_output = tokenizer.decode(output[len(tokenized.tokens) :])
print(decoded_output)
