pyspider

A Powerful Spider System in Python. Try It Now!

Write scripts in Python with a powerful API
Powerful WebUI with script editor, task monitor, project manager, and result viewer
MySQL, MongoDB, and SQLite as database backends
JavaScript pages supported!
Task priority, retry, periodic crawling, and recrawl by age or by marks in the index page (such as update time)
Distributed architecture
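
To make "recrawl by age" concrete, here is a minimal sketch of the kind of check a scheduler could apply. It is not pyspider's implementation: the helper name should_recrawl, its parameters, and the rule that a negative age never expires are all assumptions made for this illustration.

```python
import time


def should_recrawl(last_crawl_time, age, now=None):
    """Return True when a page last fetched at last_crawl_time is older than age seconds.

    Hypothetical helper for illustration only; it is not part of pyspider.
    """
    if now is None:
        now = time.time()
    if last_crawl_time is None:
        # The page was never fetched, so it always needs a crawl.
        return True
    if age < 0:
        # Assumed convention for this sketch: a negative age never expires.
        return False
    return (now - last_crawl_time) >= age


TEN_DAYS = 10 * 24 * 60 * 60  # seconds
now = 1000000000.0            # fixed timestamp so the example is deterministic

fresh = should_recrawl(now - 3600, TEN_DAYS, now)               # fetched one hour ago
stale = should_recrawl(now - 11 * 24 * 60 * 60, TEN_DAYS, now)  # fetched 11 days ago
```

With these inputs, fresh is False and stale is True: only the page fetched more than ten days ago is scheduled again.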

Sample Code:

from libs.base_handler import *

class Handler(BaseHandler):
    '''
    This is a sample handler.
    '''
    # Run on_start once a day to seed the crawl.
    @every(minutes=24*60, seconds=0)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched index page as fresh for 10 days.
    @config(age=10*24*60*60)
    def index_page(self, response):
        # Follow every absolute http:// link found on the page.
        for each in response.doc('a[href^="http://"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is the result stored for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
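
The two numeric expressions in the sample are easy to misread, so the short check below spells them out using only the standard library. It assumes that age is given in seconds, which the README itself does not state. The variable names are chosen for this illustration.

```python
from datetime import timedelta

# @every(minutes=24*60, seconds=0): how often on_start is scheduled.
every_interval = timedelta(minutes=24 * 60, seconds=0)

# @config(age=10*24*60*60): assumed to be a duration in seconds.
index_page_age = timedelta(seconds=10 * 24 * 60 * 60)

print(every_interval)  # 1 day, 0:00:00
print(index_page_age)  # 10 days, 0:00:00
```

Under that assumption, the handler restarts from on_start once a day and considers an index page valid for ten days before fetching it again.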

Installation

Python 2.6/2.7
pip install --allow-all-external -r requirements.txt
Run ./run.py, then visit http://localhost:5000/

On Ubuntu, install the system dependencies first: apt-get install python python-dev python-distribute python-pip libcurl4-openssl-dev libxml2-dev libxslt1-dev python-lxml

Alternatively, run it with Docker.

Documents

Contribute

Use it. Open issues; pull requests are welcome.
Discuss, Document

License

Licensed under the Apache License, Version 2.0
