Every lecture has lore. Most of it is locked in 10-hour videos and cryptic PDFs.
The Lore Engine extracts it.
You know the drill:
- PDFs: Your professor's 200-slide PDF, filled with nothing but bullet points, vague diagrams, and your own shattered hopes.
- Handwritten notes: That one dude's notes from 2018, scanned so badly they look like a seismograph reading of a metal concert. Good luck deciphering them 3 hours before finals.
- Videos: You're rewatching a 2-hour lecture for the fifth time, trying to find that one explanation.
- Time sink: "Let me just scrub through this 40-hour course real quick..." (Narrator: It was not quick.)
- Comprehension gap: Slides are too sparse, textbooks are too dense, videos are too slow, and handwriting is too alien.
What if you could transform all of it into comprehensive, readable notes?
Lectures have the perfect amount of explanation—not a sparse slide deck, not a dense textbook. This tool gives you lecture-quality explanations for everything: your professor's cryptic PDFs, incomprehensible handwritten notes, and those endless video recordings.
The Lore Engine is a multimodal AI pipeline that transforms educational content—PDFs, videos, handwritten notes, and transcripts—into comprehensive, searchable markdown notes with explanations, screenshots and diagrams.
Think of it as a knowledge extraction engine: you feed it raw educational content, and it gives you organized, comprehensive "lore dumps."
Before: 10 hours of lecture watching
After: 2 hours of focused reading (with full details and better explanations)
Interactive mode makes it dead simple to use.
Point it at a folder of PDFs or .srt files (with or without video), and let it work its magic.
- 📄 PDF → Detailed Notes: Turn sparse slide decks into comprehensive explanations
- ✍️ Handwriting → Detailed Notes: OCR and explain your professor's illegible scrawls
- 📝 Transcripts + Video → Detailed Notes: Take SRT files and add visual context + better formatting
- 📸 Smart Screenshots: Automatically captures key moments, not redundant frames
- 📊 Mermaid Diagrams: Auto-generates flowcharts and architecture diagrams
- 🎯 Perceptual Deduplication: Hash-based frame selection (no more 50 identical slides)
- 🤖 Context-Aware Explanations: AI fills in the gaps between what's shown and what's implied
- 🚀 Blazing Fast: Process 10 hours of video in 40 minutes (15x real-time speed with 2 keys). Then consume it in a couple of focused hours.
- ⚡ Parallel Processing: Multi-process pipeline + round-robin API keys = scales linearly
- 💾 Memory Efficient: Doesn't load entire videos into RAM
- 🆓 Free-Tier Friendly: Optimized for Gemini's generous free tier
Performance:
- Frame extraction: ~2-4 seconds per chunk (video_reader-rs, not OpenCV)
- Memory efficient: no whole-video allocation (unlike Decord)
- Scales linearly: 2 API keys = 15x real-time, 10 keys = 75x real-time
- CPU usage: ~3% (I/O bound, not compute bound)
Clean, comprehensive markdown notes with screenshots and diagrams
Recommended: Using uv (fastest)
First, install uv if you haven't already.
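Then, from the repo root (assuming a standard `pyproject.toml` layout):

```bash
uv sync
```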
Alternative: Using pip
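```bash
# Editable install from the repo root (assumes a standard pyproject.toml)
pip install -e .
```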
With dev dependencies:
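```bash
# Assumes a 'dev' extra is defined in pyproject.toml
pip install -e ".[dev]"
```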
Note: This project uses `google-generativeai` (the legacy SDK). We may migrate to the new `google-genai` SDK in the future; see the migration guide for differences.
Note: On Windows, you may need to install ffmpeg separately:
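```bash
winget install Gyan.FFmpeg
# or with Chocolatey:
choco install ffmpeg
```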
- Go to Google AI Studio
- Click "Get API Key"
- Copy your key
Create a .env file in the project root:
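```
GEMINI_API_KEY_1=your_api_key_here
```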
Pro tip: Add multiple keys for faster parallel processing:
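```
GEMINI_API_KEY_1=first_key
GEMINI_API_KEY_2=second_key
GEMINI_API_KEY_3=third_key
```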
The engine uses numbered keys (`GEMINI_API_KEY_1`, `GEMINI_API_KEY_2`, etc.) in round-robin fashion. More keys = faster processing!
Interactive Mode (easiest):
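```bash
# Entry-point name is an assumption; check the repo for the actual script
python main.py
```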
Single File:
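```bash
python main.py lectures/week1.mp4
```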
Batch Process a Folder:
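```bash
# --output is an assumed flag; run with --help for the real options
python main.py lectures/ --output notes/
```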
The tool will:
- 📹 Extract smart keyframes from videos
- 📝 Process transcripts (auto-detects .srt files)
- 🤖 Generate comprehensive notes with Gemini
- 💾 Save markdown files in the output directory
1. Video or PDF Processing
- Uses video_reader-rs (Rust FFmpeg bindings) instead of OpenCV for frame extraction
- Batch frame extraction via the `get_batch()` API
- Memory efficient: only loads requested frames
2. Intelligent Frame Selection for Videos
- Perceptual hashing (pHash) with 8x8 DCT (sketched after this list)
- Temporal diversity scoring to avoid redundant frames
- Configurable similarity thresholds
- Global deduplication across entire video
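For the curious, here is a minimal pHash sketch in Python (illustrative only, not the project's exact implementation): resize the frame to 32x32 grayscale, take the 2D DCT, keep the low-frequency 8x8 block, and threshold against its median. Frames within a small Hamming distance of an already-kept frame are treated as duplicates.

```python
import numpy as np
from PIL import Image
from scipy.fftpack import dct

def phash(image: Image.Image, hash_size: int = 8) -> np.ndarray:
    """64-bit perceptual hash as a flat boolean array."""
    img = image.convert("L").resize((32, 32), Image.Resampling.LANCZOS)
    pixels = np.asarray(img, dtype=np.float64)
    # 2D DCT: transform rows, then columns
    freq = dct(dct(pixels, axis=0, norm="ortho"), axis=1, norm="ortho")
    low = freq[:hash_size, :hash_size]  # keep the 8x8 low-frequency block
    return (low > np.median(low)).flatten()

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Bits that differ; a small distance means visually similar frames."""
    return int(np.count_nonzero(a != b))
```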
3. Multimodal AI Orchestration
- Gemini 2.5 Flash for speed + quality balance (any Gemini model works)
- Automatic fallback: inline images → File API for large batches
- Exponential backoff with intelligent retry logic (sketched after this list)
- Rate limiting to maximize free-tier throughput
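A rough sketch of the round-robin key rotation plus backoff (names like `call_with_backoff` are illustrative, not the project's API):

```python
import itertools
import os
import time

# Collect GEMINI_API_KEY_1, GEMINI_API_KEY_2, ... and rotate through them
keys = [v for k, v in sorted(os.environ.items()) if k.startswith("GEMINI_API_KEY_")]
key_cycle = itertools.cycle(keys)

def call_with_backoff(request_fn, retries: int = 5, base_delay: float = 2.0):
    """Retry request_fn with exponentially growing delays (2s, 4s, 8s, ...)."""
    for attempt in range(retries):
        try:
            return request_fn(api_key=next(key_cycle))
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
```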
4. Output Processing
- Automatic Mermaid diagram syntax correction
- Screenshot placeholder replacement with relative paths
- Markdown cleaning and formatting
| Metric | Value | Notes |
|---|---|---|
| Frame extraction | 2-4s per chunk | 1080p video, 5 frames |
| LLM inference | 10-20s per chunk | ~50 subtitles + images |
| Rate limiting | 10s between calls | Gemini free tier |
| Throughput | 15x real-time | With 2 API keys |
| Memory usage | <500MB | Excluding video file |
Bottleneck: LLM API calls (expected and unavoidable)
Not the bottleneck: Frame extraction
Edit config.json to customize:
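A minimal example (field names match the settings below; the values here are illustrative, so check the shipped `config.json` for actual defaults):

```json
{
  "screenshots_per_minute": 2,
  "hash_similarity_threshold": 0.9,
  "request_interval": 10
}
```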
Key settings:
- screenshots_per_minute: How many frames to extract per minute of video
- hash_similarity_threshold: Lower = more strict deduplication
- request_interval: Seconds between API calls (respect rate limits!)
Q: Does this work with non-English content?
A: Yes! Gemini supports 100+ languages. Just make sure your SRT files are UTF-8 encoded, and modify the base prompt to mention your language.
Q: Can I use this for copyrighted content?
A: The tool processes content locally and sends frames to Gemini's API. Follow your institution's fair-use policies for educational content. Notes are derived content, so they should be fine :P, but I am no legal expert.
Q: Why Gemini and not GPT-5/Claude?
A: Gemini 2.5 has native multimodal support, a generous free tier (60 RPM), and excellent performance on educational content. But the architecture is LLM-agnostic, and support for other models is coming soon!
Q: How much does this cost?
A: Free if you stay within Gemini's limits. Heavy users might hit paid tiers.
Q: Can I run this on my own LLM?
A: Not yet, but the architecture supports it. PRs welcome for OpenRouter (and alternatives) integration.
Q: What about privacy?
A: The tool runs locally, but all content is sent to the Gemini API, so Google's Gemini privacy policy applies.
- Local LLM/OpenRouter/Alternative support
- GUI interface
- Anki flashcard generation
- Custom prompt templates
- Better lecture support (whiteboard detection): capture the latest fully annotated frame
Found a bug? Have a feature idea? PRs welcome!
Areas where help is needed:
- Testing on different video codecs
- Mermaid diagram prompt tuning
- LaTeX rendering improvements
- Local LLM integration
- UI/UX enhancements
Star this repo if it extracted the lore from your professor's cryptic slides ⭐