Thinking Mode in Ollama



Ollama now has the ability to enable or disable thinking. This gives users the flexibility to choose the model’s thinking behavior for different applications and use cases.

When thinking is enabled, the output will separate the model’s thinking from the model’s output. When thinking is disabled, the model will not think and will output the content directly.

Models that support thinking:

Thinking in action

Enable thinking in DeepSeek R1

In the CLI, thinking is enabled using /set think followed by the prompt.

This can be useful for getting the model to think through different viewpoints and arrive at a more accurate answer.

The model shown is the 8 billion parameter DeepSeek-R1-0528 Qwen 3 distilled model. This video is not sped up.

Disable thinking in DeepSeek R1

In the CLI, thinking is disabled using /set nothink followed by the prompt.

This is useful for getting fast answers from the model.

The model shown is the 8 billion parameter DeepSeek-R1-0528 Qwen 3 distilled model. This video is not sped up.

Get started

Download the latest version of Ollama.

CLI

From the Ollama CLI, thinking can be enabled or disabled:

Enable thinking

--think

Disable thinking

--think=false

Interactive sessions

When chatting inside an interactive session, thinking can be enabled or disabled:

Enable thinking

/set think

Disable thinking

/set nothink

Scripting

For scripting, a --hidethinking flag is available. This helps users who want to use thinking models but only want to see the answer.

Example:

ollama run deepseek-r1:8b --hidethinking "is 9.9 bigger or 9.11?"
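For scripts that shell out to the CLI, the command above can be assembled programmatically. The following is a minimal Python sketch, assuming the ollama binary is on the PATH and the model has already been pulled. The helper names build_run_command and ask are hypothetical and exist only for illustration; the --hidethinking flag is used exactly as in the example above.

```python
import subprocess


def build_run_command(model, prompt, hide_thinking=True):
    """Assemble the argv list for `ollama run` (hypothetical helper)."""
    cmd = ["ollama", "run", model]
    if hide_thinking:
        # Flag taken from the example above; it hides the thinking trace.
        cmd.append("--hidethinking")
    cmd.append(prompt)
    return cmd


def ask(model, prompt):
    """Run the model once and return only the answer text.

    Assumes the `ollama` binary is installed and the model is available.
    """
    result = subprocess.run(
        build_run_command(model, prompt),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the prompt itself contains quotes or question marks.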

API

Both Ollama’s generate API (/api/generate) and chat API (/api/chat) have been updated to support thinking.

A new think parameter can be set to true or false to enable or disable a model’s thinking process. When the think parameter is set to true, the output will separate the model’s thinking from the model’s output. This can help users craft new application experiences, such as animating the thinking process in a graphical interface, or giving NPCs in games a thinking bubble before the output. When the think parameter is set to false, the model will not think and will output the content directly.

Example using Ollama’s chat API with thinking enabled

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [
    {
      "role": "user",
      "content": "how many r in the word strawberry?"
    }
  ],
  "think": true,
  "stream": false
}'

Output

{
  "model": "deepseek-r1",
  "created_at": "2025-05-29T09:35:56.836222Z",
  "message": {
    "role": "assistant",
    "content": "The word \"strawberry\" contains **three** instances of the letter 'R' ...",
    "thinking": "First, the question is: \"how many r in the word strawberry?\" I need to count the number of times the letter 'r' appears in the word \"strawberry\". Let me write down the word:..."
  },
  "done_reason": "stop",
  "done": true,
  "total_duration": 47975065417,
  "load_duration": 29758167,
  "prompt_eval_count": 10,
  "prompt_eval_duration": 174191542,
  "eval_count": 2514,
  "eval_duration": 47770692833
}

Output is truncated for brevity.
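The request and response above can also be handled from Python with only the standard library. This is a minimal sketch, assuming a local Ollama server at the default address used in the curl example. The names build_chat_payload, split_message, and chat_once are hypothetical helpers, not part of any Ollama library.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # address from the curl example


def build_chat_payload(model, prompt, think=True):
    """Build the JSON body for /api/chat (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "think": think,
        "stream": False,
    }


def split_message(response):
    """Return (thinking, content) from a parsed /api/chat response.

    The thinking field is treated as optional, on the assumption that it
    may be absent when thinking is disabled.
    """
    message = response.get("message", {})
    return message.get("thinking", ""), message.get("content", "")


def chat_once(model, prompt, think=True):
    """Send one non-streaming chat request; assumes a server is running."""
    body = json.dumps(build_chat_payload(model, prompt, think)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return split_message(json.load(reply))
```

Keeping the payload builder and the response splitter separate from the network call makes it possible to check both against sample data without a running server.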

Python library

Please update to the latest Ollama Python library.

pip install ollama

Example of enabling thinking

from ollama import chat

messages = [
  {
    'role': 'user',
    'content': 'What is 10 + 23?',
  },
]

response = chat('deepseek-r1', messages=messages, think=True)

print('Thinking:\n========\n\n' + response.message.thinking)
print('\nResponse:\n========\n\n' + response.message.content)
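When responses are streamed, a chunk may carry a piece of thinking or a piece of content, so a consumer needs to track when the thinking phase ends. The following is a minimal Python sketch of that bookkeeping. It operates on plain dictionaries shaped like the message object in the chat API output, which is an assumption made so the logic can run without a server; render_stream is a hypothetical helper name.

```python
def render_stream(chunks):
    """Join streamed chunks into one labelled transcript (hypothetical helper).

    Each chunk is assumed to be a dict with a "message" dict that holds
    either a "thinking" fragment or a "content" fragment.
    """
    parts = []
    started_thinking = False
    finished_thinking = False
    for chunk in chunks:
        message = chunk.get("message", {})
        thinking = message.get("thinking")
        content = message.get("content")
        if thinking and not started_thinking:
            # First thinking fragment: emit the thinking header once.
            started_thinking = True
            parts.append("Thinking:\n========\n\n")
        elif content and started_thinking and not finished_thinking:
            # First content fragment after thinking: emit the response header once.
            finished_thinking = True
            parts.append("\n\nResponse:\n========\n\n")
        if thinking:
            parts.append(thinking)
        elif content:
            parts.append(content)
    return "".join(parts)
```

With a real client, the field access would need to match whatever chunk type that client yields; the header logic stays the same.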

Please visit the Ollama Python library for more information about its usage. More examples are available.

JavaScript library

Please update to the latest Ollama JavaScript library.

npm i ollama

Example of enabling thinking

import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: false,
    think: true,
  })

  console.log('Thinking:\n========\n\n' + response.message.thinking)
  console.log('\nResponse:\n========\n\n' + response.message.content + '\n\n')
}

main()

Example of streaming responses with thinking

import ollama from 'ollama'

async function main() {
  const response = await ollama.chat({
    model: 'deepseek-r1',
    messages: [
      {
        role: 'user',
        content: 'What is 10 + 23',
      },
    ],
    stream: true,
    think: true,
  })

  let startedThinking = false
  let finishedThinking = false

  for await (const chunk of response) {
    if (chunk.message.thinking && !startedThinking) {
      startedThinking = true
      process.stdout.write('Thinking:\n========\n\n')
    } else if (chunk.message.content && startedThinking && !finishedThinking) {
      finishedThinking = true
      process.stdout.write('\n\nResponse:\n========\n\n')
    }

    if (chunk.message.thinking) {
      process.stdout.write(chunk.message.thinking)
    } else if (chunk.message.content) {
      process.stdout.write(chunk.message.content)
    }
  }
}

main()

Please visit the Ollama JavaScript library for more information about its usage. More examples are available.
