AI News Link Scraper extracts all URLs from the most recent AI News issue (from news.smol.ai) and prepares them for import into Google's NotebookLM. It collects the issue's sources in a dedicated folder, writes the non-social URLs to a sources.txt file, and generates an individual markdown file for each quoted tweet.
- Folder Generation: Creates a timestamped folder for each issue’s sources.
- sources.txt: Lists all URLs from the issue, excluding twitter.com, x.com, and discord.com.
- Tweet Markdown: Saves the full text of each quoted tweet as a separate markdown file.
- WebSync Ready: The contents of sources.txt can be pasted directly into the WebSync for NotebookLM Chrome extension to auto-import them into NotebookLM.
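The filtering behind sources.txt can be pictured with a minimal sketch. This is not the project's actual code: the names `SOCIAL_DOMAINS`, `is_social`, and `split_urls` are hypothetical, and both the subdomain matching and the de-duplication are assumptions about reasonable behavior rather than documented features.

```python
from urllib.parse import urlparse

# Hosts the feature list says are excluded from sources.txt.
SOCIAL_DOMAINS = {"twitter.com", "x.com", "discord.com"}


def is_social(url: str) -> bool:
    """Return True if the URL's host is a social domain or a subdomain of one.

    Matching subdomains (e.g. www.twitter.com) is an assumption.
    """
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)


def split_urls(urls):
    """Partition URLs into (non_social, social), keeping first-seen order.

    Dropping exact duplicates is an assumption, not documented behavior.
    """
    seen = set()
    non_social, social = [], []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        (social if is_social(url) else non_social).append(url)
    return non_social, social
```

Comparing the parsed hostname, rather than searching the whole URL for a substring, avoids misclassifying a host such as notx.com as x.com.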
git clone https://github.com/ThomsenDrake/ainews-source-extractor.git
cd ainews-source-extractor
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
Simply run the main scraper:
This will:
- Generate a folder named with the current date for the latest AI News issue.
- Create sources.txt inside that folder, containing all non-social URLs.
- Produce individual .md files for each tweet quoted in the issue.
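As a rough illustration of the output described above, the following sketch writes a dated folder containing sources.txt and one markdown file per tweet. The function name `write_issue_outputs`, the ISO date folder name, the `tweet_<id>.md` file naming, and the `(tweet_id, text)` input shape are all assumptions made for the example; the scraper's real naming scheme may differ.

```python
from datetime import date
from pathlib import Path


def write_issue_outputs(base_dir, source_urls, tweets, issue_date=None):
    """Create a dated folder holding sources.txt and one .md file per tweet.

    tweets is an iterable of (tweet_id, text) pairs (hypothetical shape).
    Returns the Path of the folder that was written.
    """
    issue_date = issue_date or date.today()
    # Folder name such as 2025-01-02 (assumed format).
    folder = Path(base_dir) / issue_date.isoformat()
    folder.mkdir(parents=True, exist_ok=True)

    # One URL per line, so the file can be pasted into an importer as-is.
    (folder / "sources.txt").write_text(
        "\n".join(source_urls) + "\n", encoding="utf-8"
    )

    # One markdown file per quoted tweet (assumed file naming).
    for tweet_id, text in tweets:
        (folder / f"tweet_{tweet_id}.md").write_text(text + "\n", encoding="utf-8")

    return folder
```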
Planned improvements:
- Improve URL-filtering logic to separate twitter.com, x.com, and discord.com links.
- Build discord_scraper.py to fetch and save referenced Discord messages as markdown.
- Parameterize the output folder path and issue source URL for greater flexibility.
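One possible shape for the parameterization item is a small `argparse` interface. This is only a sketch of an unimplemented idea: the flag names `--issue-url` and `--output-dir`, their defaults, and the `build_parser` helper are hypothetical and do not describe an existing command-line interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the issue URL and output folder (hypothetical flags)."""
    parser = argparse.ArgumentParser(
        description="Extract source URLs from an AI News issue."
    )
    parser.add_argument(
        "--issue-url",
        default="https://news.smol.ai",
        help="Page to scrape (assumed default: the AI News site).",
    )
    parser.add_argument(
        "--output-dir",
        default=".",
        help="Directory in which the per-issue folder is created.",
    )
    return parser


# Example: parse an explicit argument list instead of sys.argv.
# args = build_parser().parse_args(["--output-dir", "out"])
# args.output_dir is then "out" and args.issue_url keeps its default.
```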
Contributions welcome! Fork, branch, and submit a pull request.