A Gradio web application for exploring the experiments described in Apple's paper "The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models with locally hosted language models using Ollama. This project is designed to test and evaluate the reasoning capabilities of language models on well-defined problem-solving tasks.
The paper evaluates on four different types of puzzles with varying difficulty levels:
- Towers of Hanoi - The classic disk-moving puzzle with three pegs
- Checker Jumping - A one-dimensional board puzzle where checkers must swap positions
- River Crossing - A constraint-satisfaction puzzle involving actors and agents crossing a river
- Blocks World - A planning puzzle requiring rearrangement of stacked blocks
Each puzzle:
- Has configurable difficulty levels (n=1 to n=10)
- Provides structured system prompts to guide the language model
- Automatically evaluates the model's solution for correctness
- Supports real-time interaction through a web interface via Gradio
- Install Ollama from https://ollama.ai
- Pull at least one model (recommended models for reasoning tasks):
- Verify Ollama is running:
-
Clone the repository:
git clone <repository-url> cd illusion-of-thinking -
Install dependencies (using uv - recommended):
Or using pip:
pip install -r requirements.txt
-
Ensure Ollama is running with at least one model available
-
Launch the Gradio interface:
using uv
or if using a virtual environment
-
Open your browser to the displayed URL (typically http://127.0.0.1:7860)
- Chatbot Window: Displays the conversation between system prompts and model responses
- Model Dropdown: Select which Ollama model to use for solving puzzles
- Options: Advanced JSON configuration for model parameters (temperature, top_p, etc.)
- Clear Button: Reset the conversation history
- Puzzle Dropdown: Choose from Towers of Hanoi, Checker Jumping, River Crossing, or Blocks World
- Difficulty Slider: Set complexity level (n=1 for easiest, n=10 for hardest)
- Solve Button: Start the puzzle-solving process
- System Tab: View/edit the system prompt that guides the model
- User Tab: View/edit the specific puzzle instance description
- Select a model (e.g., "qwen3:8b")
- Choose "Towers of Hanoi" puzzle
- Set difficulty to 3
- Click "Solve"
- Watch as the model attempts to solve the puzzle
- View the automatic evaluation of the solution
To add a new puzzle:
- Create a new class inheriting from Puzzle in puzzles.py
- Implement required methods: parse_solution(), play(), move(), user_prompt()
- Define NAME, SYSTEM_PROMPT, and other class attributes
- Add the puzzle to the puzzles dictionary in main.py
.png)


