Efficiency scoring: Quantified recommendations for improvement
Cost impact: Shows potential savings in dollars and tokens
Multiple input modes: Interactive, file-based, or programmatic
Cross-platform: Works on Windows, macOS, and Linux
Zero-config: Works out of the box with graceful fallbacks
Terminal-friendly: Beautiful output with automatic color detection
Quick Start (Recommended)
```bash
# Clone the repository
git clone https://github.com/yourusername/token-visualizer.git
cd token-visualizer

# Install dependencies (optional but recommended)
pip install tiktoken transformers

# Run it!
python token_visualizer.py
```
```bash
# Create virtual environment (recommended)
python -m venv token-env
source token-env/bin/activate  # On Windows: token-env\Scripts\activate

# Install all dependencies
pip install -r requirements.txt

# Or install minimal dependencies
pip install tiktoken      # For OpenAI models (GPT-3.5, GPT-4)
pip install transformers  # For Hugging Face models (LLaMA, Claude, etc.)
```
| Package | Purpose | Required |
| --- | --- | --- |
| `tiktoken` | OpenAI tokenization (GPT models) | Recommended |
| `transformers` | Hugging Face tokenization | Recommended |
| Built-in modules | Core functionality | ✅ Always |
Note: The tool works without any of these dependencies by falling back to word-based tokenization.
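The word-based fallback is a simple heuristic. A minimal sketch of the idea (the function name and the per-6-characters adjustment here are illustrative assumptions, not the tool's actual implementation):

```python
import re

def fallback_token_estimate(text: str) -> int:
    """Rough token estimate when no tokenizer library is installed.

    Heuristic: split on word characters and punctuation; most short
    English words map to roughly one token, and long words cost extra
    because subword tokenizers split them.
    """
    words = re.findall(r"\w+|[^\w\s]", text)
    # One token per piece, plus one extra per ~6 characters beyond the first.
    return sum(1 + max(0, (len(w) - 1) // 6) for w in words)
```

This kind of estimate is why the fallback is approximate rather than exact: it never sees the model's real vocabulary.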
Perfect for quick analysis and experimentation:
```
python token_visualizer.py

Token Visualizer
Enter your text (press Ctrl+D when done):
--------------------------------------------------
Write a comprehensive blog post about the benefits
of using artificial intelligence in modern healthcare
systems, including specific examples and case studies.
^D

Select tokenizer:
1. gpt-4
2. gpt-3.5-turbo
3. claude-3-sonnet
4. llama-2-7b
Choice (1-4, default=1): 1
```
Analyze entire files or documents:
```bash
# Analyze a single file
python token_visualizer.py my_prompt.txt

# Analyze multiple files
for file in prompts/*.txt; do
  echo "Analyzing $file"
  python token_visualizer.py "$file"
done
```
Integrate into your own tools:
```python
from token_visualizer import TokenVisualizer

# Initialize with your preferred model
visualizer = TokenVisualizer("gpt-4")

# Analyze text
text = "Your prompt here..."
stats = visualizer.tokenize(text)

print(f"Tokens: {stats.token_count}")
print(f"Efficiency: {stats.efficiency:.2f} chars/token")

# Get optimization suggestions
visualizer.suggest_compression(text)
```
Example 1: Basic Analysis
Input:

```
Analyze the following customer feedback and provide actionable insights
for improving our product based on the sentiment and specific issues mentioned.
```

Output:

```
TOKEN ANALYSIS - GPT-4
============================================================
SUMMARY:
Total tokens: 23
Total characters: 134
Efficiency: 5.83 chars/token
Est. GPT-4 cost: $0.0007

LINE BREAKDOWN:
Line 1: 23 tokens (5.8 c/t) Analyze the following customer feedback and provide actionable...

COMPRESSION SUGGESTIONS
============================================================
Text appears well-optimized!

POTENTIAL SAVINGS:
Estimated reduction: 2 tokens (10%)
Cost savings: $0.0001 per request
```
Example 2: Verbose Text Analysis
Input:

```
In order to provide you with the most comprehensive and detailed analysis
of the current market situation, I would like to take this opportunity to
examine all of the various factors that may be contributing to the recent
changes that we have been observing in the marketplace over the course of
the past several months.
```

Output:

```
TOKEN ANALYSIS - GPT-4
============================================================
SUMMARY:
Total tokens: 62
Total characters: 312
Efficiency: 5.03 chars/token
Est. GPT-4 cost: $0.0019

LINE BREAKDOWN:
Line 1: 62 tokens (5.0 c/t) In order to provide you with the most comprehensive and...

EXPENSIVE LINES (>50 tokens):
Line 1: 62 tokens - In order to provide you with the most comprehensive...

COMPRESSION SUGGESTIONS
============================================================
Repetitive words: the, of, to, that, in
Consider using pronouns or abbreviations

Verbose phrases found:
'in order to' → 'to'
'for the purpose of' → 'to'

Low efficiency (5.0 c/t):
Consider removing filler words, combining sentences

POTENTIAL SAVINGS:
Estimated reduction: 18 tokens (30%)
Cost savings: $0.0005 per request
```
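The dollar figures in these reports come from straightforward per-1K-token pricing. A sketch of the arithmetic, using assumed rates that happen to reproduce the numbers in the examples (always check your provider's current pricing, which changes over time):

```python
# Assumed per-1K-input-token rates for illustration only; these are not
# guaranteed to match current OpenAI pricing.
PRICE_PER_1K_INPUT_TOKENS = {
    "gpt-4": 0.03,
    "gpt-3.5-turbo": 0.0015,
}

def estimate_cost(token_count: int, model: str = "gpt-4") -> float:
    """Estimated input cost in dollars for a single request."""
    rate = PRICE_PER_1K_INPUT_TOKENS[model]
    return token_count / 1000 * rate
```

Under these rates, 62 tokens works out to about $0.0019 and 23 tokens to about $0.0007, in line with the example reports.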
Example 3: Code Documentation
```python
# Analyze code comments and docstrings
visualizer = TokenVisualizer("gpt-4")

code_doc = """def process_user_data(user_input: str) -> dict:
    '''
    This function takes user input as a string parameter and processes it
    through various validation and transformation steps in order to return
    a properly formatted dictionary containing all relevant user information
    that has been extracted and validated from the input.
    '''
    pass"""

visualizer.visualize_tokens(code_doc, show_individual=True)
```
`__init__(model_name: str = "gpt-4")`

Initialize the visualizer with a specific model tokenizer.

Parameters:
- `model_name`: Supported models include `gpt-4`, `gpt-3.5-turbo`, `claude-3-sonnet`, and `llama-2-7b`

`tokenize(text: str) -> TokenStats`

Tokenize text and return comprehensive statistics.
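The examples above read `token_count` and `efficiency` from the result. A hypothetical sketch of what the `TokenStats` container could look like (field names beyond those two, and the helper below, are assumptions for illustration; the real class may carry more):

```python
from dataclasses import dataclass

@dataclass
class TokenStats:
    token_count: int
    char_count: int
    efficiency: float  # characters per token; higher is denser

def make_stats(text: str, tokens: list) -> TokenStats:
    """Build the stats shown in the reports from a text and its tokens."""
    chars = len(text)
    count = len(tokens)
    return TokenStats(count, chars, chars / count if count else 0.0)
```

Efficiency is simply characters divided by tokens, which is why the reports quote it in "chars/token".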
Modify the Colors class to customize the visual output:
```python
class Colors:
    RED = '\033[91m'     # Expensive sections
    YELLOW = '\033[93m'  # Medium efficiency
    GREEN = '\033[92m'   # Well optimized
    CYAN = '\033[96m'    # Headers
    # ... customize as needed
```
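The automatic color detection mentioned in the features is usually just a terminal check. A sketch of one common approach (not necessarily how this tool implements it):

```python
import os
import sys

def colors_enabled() -> bool:
    """Enable ANSI colors only when stdout is a real terminal and the
    user has not opted out via the NO_COLOR convention."""
    return sys.stdout.isatty() and "NO_COLOR" not in os.environ
```

When this returns `False` (e.g. output piped to a file), the color constants can simply be replaced with empty strings.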
Push to the branch (`git push origin feature/amazing-feature`)
Open a Pull Request
New tokenizer support
Additional visualization modes
More compression algorithms
Web interface
Mobile app
IDE plugins
Web interface with drag-and-drop file upload
Batch processing for multiple files
Export reports to PDF/HTML
API endpoint for integration
Streaming analysis for large files
Custom tokenizer training
Prompt template library
A/B testing framework
Cost tracking dashboard
Team collaboration features
AI-powered prompt optimization
Real-time optimization suggestions
Integration with popular LLM tools
Enterprise features (SSO, audit logs)
Q: How accurate is the token counting?
A: 100% accurate when using the official tokenizers (tiktoken for OpenAI models, transformers for others). The word-based fallback is ~95% accurate.
Q: Can I use this with my own custom models?
A: Yes! You can extend the `_load_tokenizer` method to support custom tokenizers.
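A hypothetical sketch of that extension point; `TokenVisualizer` is stubbed here so the example runs standalone, and the real `_load_tokenizer` signature may differ:

```python
# Stub standing in for the real TokenVisualizer, just to make the
# override pattern concrete and self-contained.
class TokenVisualizer:
    def __init__(self, model_name: str = "gpt-4"):
        self.tokenizer = self._load_tokenizer(model_name)

    def _load_tokenizer(self, model_name):
        return str.split  # stand-in for the real fallback logic

class WhitespaceTokenizer:
    """Example custom tokenizer: one token per whitespace-separated word."""
    def encode(self, text: str):
        return text.split()

class CustomVisualizer(TokenVisualizer):
    def _load_tokenizer(self, model_name):
        # Route your own model name to your own tokenizer,
        # and defer everything else to the built-in behavior.
        if model_name == "my-custom-model":
            return WhitespaceTokenizer()
        return super()._load_tokenizer(model_name)

viz = CustomVisualizer("my-custom-model")
tokens = viz.tokenizer.encode("hello world")  # ["hello", "world"]
```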
Q: Does this work offline?
A: Yes, once dependencies are installed, everything runs locally.
Q: What about privacy/security?
A: All processing happens locally. No data is sent to external servers.
Q: Can I integrate this into my existing tools?
A: Absolutely! The `TokenVisualizer` class is designed for programmatic use.
Q: How do I handle very large files?
A: The tool handles large files well, but consider processing in chunks for files >10MB.
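A sketch of what chunked processing could look like; the helper name and 100k-character chunk size are illustrative choices, and note that cutting a chunk mid-word can shift counts slightly at the boundaries:

```python
# Read a large file in fixed-size character chunks and sum the per-chunk
# token counts, instead of loading the whole file into memory.
def count_tokens_in_chunks(path, visualizer, chunk_chars=100_000):
    total = 0
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(chunk_chars)
            if not chunk:
                break
            total += visualizer.tokenize(chunk).token_count
    return total
```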
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Token Visualizer Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
OpenAI for the tiktoken library
Hugging Face for the transformers library
The open-source community for inspiration and feedback
All contributors who help make this tool better
For enterprise features, custom integrations, or priority support: