
Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.
When thinking is enabled, the output separates the model’s thinking from the model’s final answer. When thinking is disabled, the model does not think and outputs its answer directly.
Models that support thinking:
- DeepSeek R1
- Qwen 3
- More thinking models will be added.
Thinking in action
Enable thinking in DeepSeek R1
In the CLI, thinking is enabled using /set think followed by the prompt.
This can be useful for getting the model to think through different viewpoints and arrive at a more accurate answer.
(Video: the 8 billion parameter DeepSeek-R1-0528 Qwen 3 distilled model; the recording is not sped up.)
Disable thinking in DeepSeek R1
In the CLI, thinking is disabled using /set nothink followed by the prompt.
This is useful for getting fast answers from the model.
(Video: the 8 billion parameter DeepSeek-R1-0528 Qwen 3 distilled model; the recording is not sped up.)
Get started
Download the latest version of Ollama.
CLI
From the Ollama CLI, thinking can be enabled or disabled:
Enable thinking:

```
--think
```

Disable thinking:

```
--think=false
```
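For example, mirroring the scripting example later in this post, the flag can be combined with a one-off prompt (the model tag here is illustrative; any thinking-capable model works):

```
ollama run deepseek-r1:8b --think "how many r in the word strawberry?"
```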
Interactive sessions
When chatting inside an interactive session, thinking can be enabled or disabled:

Enable thinking:

```
/set think
```

Disable thinking:

```
/set nothink
```
Scripting
For scripting, a --hidethinking flag is available. It helps users who want to use thinking models but only want to see the final answer.

Example:

```
ollama run deepseek-r1:8b --hidethinking "is 9.9 bigger or 9.11?"
```

API
Both Ollama’s generate API (/api/generate) and chat API (/api/chat) have been updated to support thinking.
A new think parameter can be set to true or false to control a model’s thinking process. When think is set to true, the output separates the model’s thinking from the model’s final answer. This lets users craft new application experiences, such as animating the thinking process in a graphical interface or giving NPCs in games a thinking bubble before they respond. When think is set to false, the model does not think and outputs the content directly.
Example using Ollama’s chat API with thinking enabled
```
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [
    {
      "role": "user",
      "content": "how many r in the word strawberry?"
    }
  ],
  "think": true,
  "stream": false
}'
```

Output:

```
{
  "model": "deepseek-r1",
  "created_at": "2025-05-29T09:35:56.836222Z",
  "message": {
    "role": "assistant",
    "content": "The word \"strawberry\" contains **three** instances of the letter 'R' ...",
    "thinking": "First, the question is: \"how many r in the word strawberry?\" I need to count the number of times the letter 'r' appears in the word \"strawberry\". Let me write down the word:..."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 47975065417,
  "load_duration": 29758167,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 174191542,
  "eval_count": 2514,
  "eval_duration": 47770692833
}
```

Output is truncated for brevity.
Python library
Please update to the latest Ollama Python library.
```
pip install ollama
```

Example of enabling thinking:

```
from ollama import chat

messages = [
    {
        'role': 'user',
        'content': 'What is 10 + 23?',
    },
]

response = chat('deepseek-r1', messages=messages, think=True)

print('Thinking:\n========\n\n' + response.message.thinking)
print('\nResponse:\n========\n\n' + response.message.content)
```

Please visit the Ollama Python library for more information about its usage. More examples are available.
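For instance, streaming also works from Python, mirroring the JavaScript streaming example below. A minimal sketch, assuming a recent library version in which chat(..., stream=True) yields chunks whose message carries separate thinking and content fields:

```
from ollama import chat

messages = [
    {
        'role': 'user',
        'content': 'What is 10 + 23?',
    },
]

started_thinking = False
finished_thinking = False

# Each streamed chunk carries either thinking tokens or answer tokens.
for chunk in chat('deepseek-r1', messages=messages, think=True, stream=True):
    if chunk.message.thinking and not started_thinking:
        started_thinking = True
        print('Thinking:\n========\n')
    elif chunk.message.content and started_thinking and not finished_thinking:
        finished_thinking = True
        print('\n\nResponse:\n========\n')
    if chunk.message.thinking:
        print(chunk.message.thinking, end='', flush=True)
    elif chunk.message.content:
        print(chunk.message.content, end='', flush=True)
```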
JavaScript library
Please update to the latest Ollama JavaScript library.
```
npm i ollama
```

Example of enabling thinking:

```
import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: false,
    think: true,
  })

  console.log('Thinking:\n========\n\n' + response.message.thinking)
  console.log('\nResponse:\n========\n\n' + response.message.content + '\n\n')
}

main()
```

Example of streaming responses with thinking:

```
import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: true,
    think: true,
  })

  let startedThinking = false
  let finishedThinking = false

  for await (const chunk of response) {
    if (chunk.message.thinking && !startedThinking) {
      startedThinking = true
      process.stdout.write('Thinking:\n========\n\n')
    } else if (chunk.message.content && startedThinking && !finishedThinking) {
      finishedThinking = true
      process.stdout.write('\n\nResponse:\n========\n\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      process.stdout.write(chunk.message.content)
    }
  }
}

main()
```

Please visit the Ollama JavaScript library for more information about its usage. More examples are available.


