Scrapy, a fast high-level web crawling & scraping framework for Python.
Pythonic HTML Parsing for Humans™
HTML
Updated Nov 2, 2018
A scalable web crawler framework for Java.
Java
Updated Sep 30, 2018
Elegant Scraper and Crawler Framework for Golang
Distributed crawler powered by Headless Chrome
JavaScript
Updated Nov 5, 2018
Declarative web scraping
Getting started with Puppeteer and Chrome Headless for Web Scraping
JavaScript
Updated Oct 18, 2018
A Python module to scrape several search engines (like Google, Yandex, Bing, Duckduckgo, Baidu and others) by using p…
HTML
Updated Oct 30, 2018
Get info from any web service or page
PHP
Updated Oct 22, 2018
A browser testing and web crawling library for PHP and Symfony
artoo.js - the client-side scraping companion.
JavaScript
Updated Jul 17, 2018
Scrape the Instagram frontend. Inspired by twitter-scraper by @kennethreitz.
Python
Updated Jun 29, 2018
Creating Scrapy scrapers via the Django admin interface
This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
[Unmaintained] A simple and clean video/music/image downloader 👾
Free Web Scraping Tool with Java
JavaScript
Updated Jun 12, 2018
Analyze the Facebook copy of your data with Ruby. Download the zip file from Facebook and get info about friends ran…
✂️ High performance, multi-threaded image scraper
A framework for creating semi-automatic web content extractors
Python
Updated May 1, 2018
Parsel lets you extract data from XML/HTML documents using XPath or CSS selectors
Python
Updated Nov 2, 2018
Web scraping library made by the Phantombuster team. Modern, simple & works on all websites.
JavaScript
Updated Aug 9, 2018
A curated list of awesome Puppeteer resources.
Updated Oct 16, 2018
Jekyll-based static site for The Programming Historian
HTML
Updated Nov 6, 2018
Jsoup Annotations POJO
Java
Updated May 23, 2017
Use SQL on various data sources
C#
Updated Oct 25, 2018
A flexible, friendly crawler framework
Python
Updated Dec 13, 2017
[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn
Go
Updated Sep 1, 2018
Universal scraping tool that lets you extract data using multiple environments
JavaScript
Updated Oct 29, 2018
Functional HTML scraping and rewriting with CSS in OCaml.
OCaml
Updated Oct 9, 2018
Crawl all unique internal links found on a given website
PHP
Updated Oct 20, 2018