Show HN: A minimal Flask + HTMX + LangChain (with llama.cpp) interface


A minimal Flask + HTMX + LangChain (with llama.cpp) interface for learning purposes only

This project is a personal learning experiment. It is not production-ready. Do not deploy this without major modifications.

[screenshot]


This project was built as a hands-on experiment in combining:

  • 🧠 A local LLM backend (via llama.cpp and LangChain; a wiring sketch follows this list)
  • HTMX for seamless front-end interactivity without JavaScript frameworks
  • 🌐 Flask for backend routing and SSE streaming
  • 💅 A Neumorphic UI for minimal styling
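As a rough sketch of how that LLM piece is wired up (illustrative only: the constructor arguments and prompt here are assumptions, not this project's actual code), LangChain's community wrapper around llama-cpp-python can load a GGUF model and stream from it like this:

import os

from langchain_community.llms import LlamaCpp  # LangChain wrapper around llama-cpp-python

# CHAT_MODEL_GGUF is the environment variable described in the setup section below.
llm = LlamaCpp(
    model_path=os.environ["CHAT_MODEL_GGUF"],
    n_ctx=4096,        # context window size (an assumed value)
    n_gpu_layers=-1,   # offload all layers when a GPU build is installed
    temperature=0.7,
)

# Stream the reply chunk by chunk instead of waiting for the full completion.
for chunk in llm.stream("Explain HTMX in one sentence."):
    print(chunk, end="", flush=True)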

It’s a prototype, intended for exploring techniques like:

  • Streaming LLM output over Server-Sent Events (SSE) (see the sketch after this list)
  • Managing chat history by session (in SQLite)
  • Coordinating LLM calls with a safe in-process queue
  • Building interactive web apps with minimal frontend JavaScript using HTMX
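To make the SSE and queueing items concrete, here is a minimal sketch of a Flask endpoint that streams tokens while a lock serializes access to the single in-process model. The route name, prompt parameter, and [DONE] sentinel are invented for illustration and are not taken from this project:

import os
import threading

from flask import Flask, Response, request
from langchain_community.llms import LlamaCpp

app = Flask(__name__)

# One model instance lives in the process; a lock keeps concurrent requests
# from calling into llama.cpp at the same time (a deliberately simple "queue").
llm = LlamaCpp(model_path=os.environ["CHAT_MODEL_GGUF"])
llm_lock = threading.Lock()

@app.route("/chat/stream")
def chat_stream():
    prompt = request.args.get("prompt", "")

    def generate():
        with llm_lock:
            for chunk in llm.stream(prompt):
                # SSE frames are "data: ...\n\n"; real code would also escape
                # newlines inside a chunk so the frame stays well-formed.
                yield f"data: {chunk}\n\n"
        yield "data: [DONE]\n\n"

    return Response(generate(), mimetype="text/event-stream")

On the page, htmx's SSE extension (or a plain EventSource handler) can then append each data frame to the chat transcript as it arrives.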

This project is not suitable for deployment.

  • ❌ No authentication or session protection
  • ❌ Very basic error handling
  • ❌ Blocking, single-threaded queue for LLM calls
  • ❌ Input is not sanitized or restricted

It’s meant for educational use only.


To run it you will need:

  • uv
  • a compatible GGUF LLM model (e.g. Phi-3.5)
  • a relatively fast computer (ideally with a strong GPU)

Download Phi-3.5-mini-instruct-GGUF (tested with the Q5_K_M quantization). Set the CHAT_MODEL_GGUF environment variable to the full path of the .gguf file, or edit the .env file to include: CHAT_MODEL_GGUF=/path/to/your/model.gguf
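For reference, reading that variable at startup could look like the following sketch (it assumes python-dotenv loads the .env file, which is an assumption about this project's setup):

import os

from dotenv import load_dotenv  # python-dotenv; assumed, not confirmed, for this project

load_dotenv()  # picks up CHAT_MODEL_GGUF from a .env file if one exists
model_path = os.environ.get("CHAT_MODEL_GGUF")
if not model_path or not os.path.isfile(model_path):
    raise RuntimeError("CHAT_MODEL_GGUF must point to an existing .gguf file")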

Install uv. Then run:

uv run flask --app main run

Set FLASK_DEBUG=1 for automatic reloading on code changes.

For CUDA support (Nvidia GPU) you must reinstall the llama-cpp-python library (and have the CUDA toolkit installed):

CMAKE_ARGS="-DGGML_CUDA=on -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=OFF -DLLAMA_BUILD_TOOLS=OFF" uv add --force-reinstall --no-cache-dir llama-cpp-python
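Note that a CUDA build only makes GPU offload possible; llama-cpp-python still runs on the CPU unless layers are explicitly offloaded, for example (illustrative, as in the earlier sketches):

from langchain_community.llms import LlamaCpp

# n_gpu_layers controls how many transformer layers go to the GPU;
# -1 offloads all of them, while the usual default of 0 stays on the CPU.
llm = LlamaCpp(model_path="/path/to/your/model.gguf", n_gpu_layers=-1)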