Show HN: AI News Source Extractor – Easily Ingest AI News into Notebook LM


AI News Link Scraper extracts all URLs from the most recent AI News issue (from news.smol.ai) and prepares them for import into Google's NotebookLM. It organizes sources into a dedicated folder, writes the non-social URLs to a sources.txt file, and generates an individual markdown file for each quoted tweet.

Folder Generation: Creates a timestamped folder for each issue’s sources.
sources.txt: Lists all URLs from the issue, excluding twitter.com, x.com, and discord.com.
Tweet Markdown: Saves the full text of each quoted tweet as a separate markdown file.
WebSync Ready: sources.txt can be pasted directly into the WebSync for NotebookLM Chrome extension to auto-import into NotebookLM.
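
The sources.txt filtering described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names `EXCLUDED_DOMAINS`, `is_excluded`, and `split_urls` are hypothetical, and matching on the parsed hostname (including subdomains) is an assumption about how the exclusion might be done.

```python
from urllib.parse import urlparse

# Domains the README says are kept out of sources.txt.
EXCLUDED_DOMAINS = ("twitter.com", "x.com", "discord.com")


def is_excluded(url: str) -> bool:
    """Return True if the URL's host is an excluded domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def split_urls(urls):
    """Split URLs into (sources, social), dropping exact duplicates and keeping order."""
    sources, social = [], []
    seen = set()
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        (social if is_excluded(url) else sources).append(url)
    return sources, social
```

Comparing against the hostname rather than searching the whole URL string avoids false matches such as netflix.com, which ends in "x.com" but is not a subdomain of it.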

git clone https://github.com/ThomsenDrake/ainews-source-extractor.git
cd ainews-source-extractor
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Simply run the main scraper:

This will:

Generate a folder named with the current date for the latest AI News issue.
Create sources.txt inside that folder, containing all non-social URLs.
Produce individual .md files for each tweet quoted in the issue.
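
As a rough sketch of the three outputs listed above, the function below writes a dated folder, a sources.txt, and one markdown file per tweet. It is an assumption-laden illustration, not the repository's implementation: the function name `write_issue_outputs`, the ISO date folder name, and the `tweet_NN.md` file naming are all hypothetical.

```python
from datetime import date
from pathlib import Path


def write_issue_outputs(base_dir, sources, tweets, issue_date=None):
    """Write sources.txt and one .md file per tweet into a dated folder.

    The folder name (YYYY-MM-DD) and tweet file names are assumptions
    made for this sketch.
    """
    issue_date = issue_date or date.today()
    folder = Path(base_dir) / issue_date.isoformat()
    folder.mkdir(parents=True, exist_ok=True)

    # One URL per line, so the file can be pasted into an importer as-is.
    (folder / "sources.txt").write_text("\n".join(sources) + "\n", encoding="utf-8")

    # Each quoted tweet gets its own markdown file.
    for i, text in enumerate(tweets, start=1):
        (folder / f"tweet_{i:02d}.md").write_text(text.strip() + "\n", encoding="utf-8")

    return folder
```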

Improve URL-filtering logic to separate twitter.com, x.com, and discord.com links.
Build discord_scraper.py to fetch and save referenced Discord messages as markdown.
Parameterize the output folder path and issue source URL for greater flexibility.
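
One way the planned parameterization might look is a small argparse front end. The flag names `--output-dir` and `--issue-url` and the default URL are hypothetical; the project does not currently document any command-line options.

```python
import argparse

# Assumed default; the README only says issues come from news.smol.ai.
DEFAULT_ISSUE_URL = "https://news.smol.ai"


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the output folder and issue URL as options."""
    parser = argparse.ArgumentParser(
        description="Extract source URLs from an AI News issue."
    )
    parser.add_argument(
        "--output-dir",
        default=".",
        help="Directory in which the per-issue folder is created.",
    )
    parser.add_argument(
        "--issue-url",
        default=DEFAULT_ISSUE_URL,
        help="URL of the issue (or index page) to scrape.",
    )
    return parser
```

Calling `build_parser().parse_args()` would then yield `args.output_dir` and `args.issue_url` for the scraper to use in place of hard-coded values.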

Contributions welcome! Fork, branch, and submit a pull request.
