chsin1/crawler

crawler

A web crawler.

  • Starts from a given URL and crawls web pages up to a specified depth.
  • Saves the pages that contain a keyword (if one is provided) to a SQLite database.
  • Supports multi-threading.
  • Supports logging.
  • Supports self-testing.
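
The depth-limited crawl and keyword filter described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function name `crawl`, the injected `fetch` callable, the regex-based link extraction, and the convention that depth 0 means "the start page only" are all assumptions made for the example.

```python
import re
from collections import deque
from urllib.parse import urljoin

# Naive link extraction; a real crawler would likely use an HTML parser.
HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def crawl(start_url, depth, fetch, keyword=None):
    """Breadth-first crawl up to `depth` link hops from `start_url`.

    `fetch` is a hypothetical callable that maps a URL to its HTML text,
    or returns None on failure. Returns a dict of saved URL -> HTML.
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    saved = {}
    while queue:
        url, level = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        # Keep every page when no keyword is given, else only matching pages.
        if keyword is None or keyword in html:
            saved[url] = html
        # Do not follow links beyond the requested depth.
        if level >= depth:
            continue
        for href in HREF_RE.findall(html):
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return saved
```

Passing `fetch` in as a parameter keeps the sketch free of network access, so it can be exercised with an in-memory dict of pages; in a real run it would wrap an HTTP client, and the returned pages would be written to the SQLite file instead of held in memory.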

usage

main.py [-h] -u URL -d DEPTH [--logfile FILE] [--loglevel {1,2,3,4,5}]
               [--thread NUM] [--dbfile FILE] [--key KEYWORD] [--testself]

optional arguments:

  -h, --help            show this help message and exit
  -u URL                Specify the begin url
  -d DEPTH              Specify the crawling depth
  --logfile FILE        The log file path. Default: spider.log
  --loglevel {1,2,3,4,5}
                        The level of logging detail. Larger numbers record
                        more detail. Default: 3
  --thread NUM          The number of threads. Default: 10
  --dbfile FILE         The SQLite file path. Default: data.sql
  --key KEYWORD         The keyword for crawling. Default: None. For more than
                        one word, quote them. Example: --key 'Hello world'
  --testself            Crawler self test
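
For readers who want to see how such a command line could be declared, here is a hedged sketch using Python's standard `argparse` module. The option names, choices, and defaults are taken from the help text above; the function name `build_parser`, the `dest` names, and the `int` conversions are assumptions, and the repository's own parser may be written differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-u", dest="url", metavar="URL", required=True,
                        help="Specify the begin url")
    parser.add_argument("-d", dest="depth", metavar="DEPTH", type=int,
                        required=True, help="Specify the crawling depth")
    parser.add_argument("--logfile", metavar="FILE", default="spider.log",
                        help="The log file path")
    parser.add_argument("--loglevel", type=int, choices=[1, 2, 3, 4, 5],
                        default=3, help="The level of logging detail")
    parser.add_argument("--thread", metavar="NUM", type=int, default=10,
                        help="The number of threads")
    parser.add_argument("--dbfile", metavar="FILE", default="data.sql",
                        help="The SQLite file path")
    parser.add_argument("--key", metavar="KEYWORD", default=None,
                        help="The keyword for crawling")
    parser.add_argument("--testself", action="store_true",
                        help="Crawler self test")
    return parser


# Example: crawl two levels deep, keeping pages that mention "Hello world".
args = build_parser().parse_args(
    ["-u", "http://example.com", "-d", "2", "--key", "Hello world"]
)
```

On a shell, the equivalent invocation would be `main.py -u http://example.com -d 2 --key 'Hello world'`; the quotes keep the two-word keyword together as a single argument.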

Languages

  • Python 100.0%