Concurrent Web Scraping with Python and Selenium

Want to learn how to build this project?

Check out the blog post.

Want to use this project?

Fork/clone the repository
Create and activate a virtual environment
Install the requirements:

(env)$ pip install -r requirements.txt

Run the scrapers:

# sync
(env)$ python script.py headless

# parallel with multiprocessing
(env)$ python script_parallel_1.py headless

# parallel with concurrent.futures
(env)$ python script_parallel_2.py headless

# concurrent with concurrent.futures (should be the fastest!)
(env)$ python script_concurrent.py headless

# parallel with concurrent.futures and concurrent with asyncio
(env)$ python script_asyncio.py headless
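The scripts above differ only in how they schedule the page loads. The sketch below compares a sequential loop, a thread pool, and asyncio on a simulated workload, under stated assumptions. `simulated_scrape`, `run_sync`, `run_threads`, `run_asyncio`, and `DELAY` are hypothetical names; no browser is started, and `time.sleep` stands in for a Selenium page load. The assumption that the "concurrent" variant uses threads (`ThreadPoolExecutor`) is ours, not confirmed by the repo. The asyncio version also simplifies: it only offloads blocking calls to the default thread pool and does not reproduce the process-pool plus asyncio combination in `script_asyncio.py`.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # seconds; hypothetical stand-in for a browser waiting on the network


def simulated_scrape(page_number: int) -> dict:
    """Hypothetical stand-in for one Selenium page load (no browser involved)."""
    time.sleep(DELAY)  # blocking I/O wait, like driver.get(...)
    return {"page": page_number, "title": f"Page {page_number}"}


def run_sync(pages):
    # One page after another: roughly len(pages) * DELAY seconds in total.
    return [simulated_scrape(p) for p in pages]


def run_threads(pages, max_workers=5):
    # Threads overlap the waits; executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(simulated_scrape, pages))


async def run_asyncio(pages):
    # asyncio.to_thread (Python 3.9+) runs each blocking call in the default
    # thread pool, so the event loop can await all of them together.
    return await asyncio.gather(*(asyncio.to_thread(simulated_scrape, p) for p in pages))


def timed(fn, pages):
    # Return (results, elapsed seconds) for one strategy.
    start = time.perf_counter()
    results = fn(pages)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    pages = list(range(1, 6))
    for label, fn in [
        ("sync", run_sync),
        ("threads", run_threads),
        ("asyncio", lambda p: asyncio.run(run_asyncio(p))),
    ]:
        results, elapsed = timed(fn, pages)
        print(f"{label:8} {len(results)} pages in {elapsed:.2f}s")
```

With five pages and a 0.2 s delay, the sequential run should take roughly 1 s. The threaded and asyncio runs should take roughly 0.2 s, because the waits overlap. Scraping is mostly waiting on I/O, which is why the thread-based variant is expected to be fastest. For the multiprocessing and `ProcessPoolExecutor` variants, `Pool.map` and `ProcessPoolExecutor.map` have the same shape. However, the worker function must be defined at module level, and the entry point should be guarded by `if __name__ == "__main__":`.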

Run the tests:

(env)$ python -m pytest test/test_scraper.py
(env)$ python -m pytest test/test_scraper_mock.py
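A mocked test suite like `test/test_scraper_mock.py` usually exists so that tests can run without launching a real browser. The sketch below shows one common way to achieve that: pass the scraping function a stand-in driver built with `unittest.mock`. This is an assumption about the approach, not a copy of the repo's tests. `summarize_page` and the test function are hypothetical. Only `get`, `title`, and `page_source` are real Selenium WebDriver members.

```python
from unittest import mock


def summarize_page(driver, url: str) -> dict:
    """Hypothetical scraper step: load url with a WebDriver-like object."""
    driver.get(url)  # Selenium: navigate to the page
    return {
        "url": url,
        "title": driver.title,  # Selenium: title of the current page
        "length": len(driver.page_source),  # Selenium: raw HTML of the current page
    }


def test_summarize_page_with_mock_driver():
    # A Mock accepts any method call, so no browser or network is needed.
    fake_driver = mock.Mock()
    fake_driver.title = "Example Domain"
    fake_driver.page_source = "<html><title>Example Domain</title></html>"

    result = summarize_page(fake_driver, "https://example.com")

    # Verify the scraper navigated exactly once, to the expected URL.
    fake_driver.get.assert_called_once_with("https://example.com")
    assert result == {
        "url": "https://example.com",
        "title": "Example Domain",
        "length": len(fake_driver.page_source),
    }
```

Passing the driver in as an argument, rather than creating it inside the function, is what makes this substitution possible without patching module paths.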
