# web-crawling

Here are 448 public repositories matching this topic...

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Jul 21, 2026
TypeScript

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Parsel, BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling selenium pip web-scraping beautifulsoup web-crawling headless-chrome apify parsel playwright

Updated Jul 21, 2026
Python

MontFerret / ferret

Declarative data automation language and Go runtime for structured extraction workflows.

go html golang library runtime dsl web-scraping data-extraction golang-library query-language web-crawling browser-automation chrome-devtools-protocol data-automation

Updated Jul 20, 2026
Go

omkarcloud / botasaurus

The All in One Framework to Build Undefeatable Scrapers

Updated Jun 29, 2026
Python

brightdata / brightdata-mcp

A powerful Model Context Protocol (MCP) server that provides an all-in-one solution for public web access.

mcp scraping web-scraping data-extraction data-collection structured-data web-crawling browser-automation ai-agents web-data scraping-tools anti-bot-detection llm ai-integrations mcp-server modelcontextprotocol

Updated Jun 21, 2026
JavaScript

cxcscmu / Craw4LLM

Official repository for "Craw4LLM: Efficient Web Crawling for LLM Pretraining"

crawler web-crawler crawling web-crawling pre-training pretraining large-language-models llm

Updated Feb 24, 2025
Python

scrapehero-code / amazon-scraper

A simple web scraper to extract Product Data and Pricing from Amazon

web-scraping web-crawling page-scraper web-scraping-tutorials amazon-scraper scrape-products

Updated Jun 13, 2023
Python

tiliondev / fortress

Stealth Chromium engine that stops scrapers and browser agents from getting blocked, with one line of code change.

Updated Jul 17, 2026
Python

crwlrsoft / crawler

Library for Rapid (Web) Crawler and Scraper Development

php crawler scraper web-crawler scraping crawling web-scraper web-scraping scraping-websites web-crawling hacktoberfest

Updated May 3, 2026
PHP

spyboy-productions / omnisci3nt

Omnisci3nt is an open-source web reconnaissance and intelligence tool for extracting deep technical insights from domains, including subdomains, SSL certificates, exposed services, archived content, and configuration data. Omnisci3nt gives you the full picture in seconds.

Updated Jun 15, 2026
Python

godkingjay / selenium-twitter-scraper

This is a Twitter Scraper which uses Selenium for scraping tweets. It is capable of scraping tweets from home, user profile, hashtag, query or search, and advanced searches.

scraper twitter selenium collaborate web-crawling hacktoberfest twitter-scraper selenium-scraper hacktoberfest-accepted

Updated Apr 12, 2025
Jupyter Notebook

jrbadiabo / Bet-on-Sibyl

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

python machine-learning algorithms scikit-learn machine-learning-algorithms selenium web-scraping beautifulsoup machinelearning predictive-analysis python-2 web-crawling sports-stats sportsanalytics

Updated Feb 12, 2017
Jupyter Notebook

TurnerSoftware / InfinityCrawler

A simple but powerful web crawler library for .NET

crawler spider web-crawler robots-txt web-crawling

Updated Dec 15, 2023
C#

ayakashi-io / ayakashi

⚡ Ayakashi.io - The next generation web scraping framework

data-mining automation web-scraping web-crawling headless-chrome

Updated Jun 29, 2023
TypeScript

serpapi / clauneck

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

ruby open-source rubygem automation command-line email email-marketing data-extraction serp command-line-tool webscraping web-crawling data-extractor email-extractor email-scraper social-media-scraper email-extraction email-extract-with-proxy

Updated Mar 19, 2024
Ruby

scrapinghub / scrapy-training

Scrapy Training companion code

python training web-scraping scrapy web-crawling

Updated Jan 30, 2019
Python

apify / apify-sdk-python

Apify SDK for Python. The official library for building Apify Actors: serverless cloud programs for web scraping, browser automation, data processing, and AI agents. Manages the Actor lifecycle, storages (datasets, key-value stores, request queues), events, proxies, and pay-per-event monetization. Built on top of the Apify API Client.

python automation sdk proxy scraping web-scraping actor data-extraction web-crawling apify crawlee

Updated Jul 21, 2026
Python

umbrellaDocumentation / Web-Data-Scraper

Web Data Scraper - no-code internet scraping. Extract and export to CSV, Excel, JSON, Google Sheets, and Webhook.

Updated Mar 17, 2026
JavaScript

MaxValue / Terpene-Profile-Parser-for-Cannabis-Strains

Parser and database to index the terpene profile of different strains of Cannabis from online databases

Updated Apr 28, 2023
Python

brianmadden / krawler

A web crawling framework written in Kotlin

kotlin link-checker framework web-crawler webcrawler web-crawling crawler4j

Updated Jun 29, 2021
Kotlin
