A minimal Flask + HTMX + LangChain (with llama.cpp) interface for learning purposes only
❗ This project is a personal learning experiment. It is not production-ready. Do not deploy this without major modifications.
This project was built as a hands-on experiment in combining:
- 🧠 A local LLM backend (via llama.cpp and LangChain)
- ⚡ HTMX for seamless front-end interactivity without JavaScript frameworks
- 🌐 Flask for backend routing and SSE streaming
- 💅 A Neumorphic UI for minimal styling
It’s a prototype intended for exploring techniques such as:
- Streaming LLM output over Server-Sent Events (SSE); sketched below
- Managing chat history per session in SQLite; sketched below
- Coordinating LLM calls with a thread-safe in-process queue; also covered in the first sketch below
- Building interactive web apps with minimal front-end JavaScript using htmx
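A minimal sketch of the SSE-plus-queue idea (illustrative only, not the project's actual code: the route name, worker, and fake token source are assumptions, and a real worker would iterate over the LangChain/llama.cpp stream instead):

```python
import queue
import threading
import time

from flask import Flask, Response

app = Flask(__name__)

# One shared job queue so only a single LLM call runs at a time.
jobs: queue.Queue = queue.Queue()

def llm_worker() -> None:
    """Consume prompts one by one and stream tokens back to the caller."""
    while True:
        prompt, out = jobs.get()
        # A real worker would stream tokens from the LangChain/llama.cpp model here.
        for token in ("This ", "is ", "a ", "streamed ", "reply ", "to: ", prompt):
            out.put(token)
            time.sleep(0.1)
        out.put(None)  # sentinel: generation finished
        jobs.task_done()

threading.Thread(target=llm_worker, daemon=True).start()

@app.route("/stream/<prompt>")
def stream(prompt: str) -> Response:
    out: queue.Queue = queue.Queue()
    jobs.put((prompt, out))

    def events():
        # Each SSE frame is a "data: ..." line followed by a blank line.
        while (token := out.get()) is not None:
            yield f"data: {token}\n\n"
        yield "event: done\ndata: end\n\n"

    return Response(events(), mimetype="text/event-stream")
```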
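And a sketch of per-session chat history in SQLite (table and column names are assumptions, not the project's schema; a real Flask app would typically open one connection per request or per thread):

```python
import sqlite3

def init_db(path: str = "chat.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               session_id TEXT NOT NULL,
               role       TEXT NOT NULL,      -- 'user' or 'assistant'
               content    TEXT NOT NULL,
               created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    conn.commit()
    return conn

def add_message(conn: sqlite3.Connection, session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()

def history(conn: sqlite3.Connection, session_id: str) -> list[tuple[str, str]]:
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY created_at",
        (session_id,),
    )
    return rows.fetchall()
```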
This project is not suitable for deployment:
- ❌ No authentication or session protection
- ❌ Very basic error handling
- ❌ Blocking, single-threaded queue for LLM calls
- ❌ Input is not sanitized or restricted
It’s meant for educational use only.
To run it yourself you will need:
- uv (Astral's Python package and project manager)
- a compatible GGUF LLM model (e.g. Phi-3.5)
- a relatively fast computer (ideally with a strong GPU)
Download Phi-3.5-mini-instruct-GGUF (tested with the Q5_K_M quantization). Set the `CHAT_MODEL_GGUF` environment variable to the full path of the `.gguf` file, or add `CHAT_MODEL_GGUF=/path/to/your/model.gguf` to the `.env` file.
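For illustration, the model path could be consumed roughly like this (a sketch assuming LangChain's community `LlamaCpp` wrapper and the `langchain-community` plus `llama-cpp-python` packages; parameter values are placeholders, not the project's actual configuration):

```python
import os

from langchain_community.llms import LlamaCpp

# Load the GGUF model from the path configured above.
llm = LlamaCpp(
    model_path=os.environ["CHAT_MODEL_GGUF"],  # full path to the .gguf file
    n_ctx=4096,        # context window size (placeholder value)
    temperature=0.7,   # placeholder sampling setting
    streaming=True,    # emit tokens as they are generated
)

# Stream a quick reply to check that the model loads.
for chunk in llm.stream("Say hello in one sentence."):
    print(chunk, end="", flush=True)
```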
Install uv, then create the environment and start the app.
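A typical uv workflow looks something like this (the commands are an assumption; check `pyproject.toml` for the project's actual entry point):

```bash
uv sync           # create the virtual environment and install dependencies
uv run flask run  # start the Flask development server
```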
Set `FLASK_DEBUG=1` for automatic reloading on code changes.
For CUDA support (NVIDIA GPUs), reinstall the llama-cpp-python library with GPU acceleration enabled (the CUDA toolkit must be installed).
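Something along these lines should work (the CMake flag is an assumption and varies between llama-cpp-python releases; newer builds use `GGML_CUDA`, older ones used `LLAMA_CUBLAS`):

```bash
CMAKE_ARGS="-DGGML_CUDA=on" uv pip install --force-reinstall llama-cpp-python
# If uv reuses a previously built CPU-only wheel, clear it first:
# uv cache clean llama-cpp-python
```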