Ask HN: Is Common Crawl used exhaustively by any search engine?

1 point by n1xis10t 9 minutes ago

Common Crawl has about 300 billion pages in it, and if you downloaded all of it in extracted-text (WET) format, it would take up only about 816 TB compressed. I think a search engine built on all of it would be more comprehensive than Bing, and possibly comparable to Google. The only CC-based search engines I know of use a tiny fraction of the available data. Do you know of any that use the whole thing?
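As a rough sanity check on those numbers, dividing the quoted size by the page count gives the average compressed text per page. This is a minimal sketch that takes both figures from the post as-is and assumes decimal terabytes (10^12 bytes). `bytes_per_page` is a hypothetical helper name used only for illustration.

```python
# Back-of-envelope figures taken from the post; both are approximate.
TOTAL_PAGES = 300e9   # ~300 billion pages
TOTAL_BYTES = 816e12  # ~816 TB compressed extracted text (decimal TB assumed)


def bytes_per_page(total_bytes: float, total_pages: float) -> float:
    """Return the average compressed size per page, in bytes."""
    return total_bytes / total_pages


avg = bytes_per_page(TOTAL_BYTES, TOTAL_PAGES)
print(f"~{avg / 1e3:.2f} KB of compressed text per page")
```

With these inputs the average works out to about 2,720 bytes of compressed text per page.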
