Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

UXScripts/pyspider

Open more actions menu
 
 

Repository files navigation

pyspider Build Status Coverage Status

A Powerful Spider System in Python. Try It Now!

  • Write script in python with powerful API
  • Powerful WebUI with script editor, task monitor, project manager and result viewer
  • MySQL, MongoDB, SQLite as database backend
  • Javascript pages supported!
  • Task priority, retry, periodical and recrawl by age or marks in index page (like update time)
  • Distributed architecture

Sample Code:

from libs.base_handler import *

class Handler(BaseHandler):
    '''
    this is a sample handler
    '''
    @every(minutes=24*60, seconds=0)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    @config(age=10*24*60*60)
    def index_page(self, response):
        for each in response.doc('a[href^="http://"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        return {
                "url": response.url,
                "title": response.doc('title').text(),
                }

demo

Installation

if ubuntu: apt-get install python python-dev python-distribute python-pip libcurl4-openssl-dev libxml2-dev libxslt1-dev python-lxml

or Running with Docker

Documents

Contribute

License

Licensed under the Apache License, Version 2.0

About

A Powerful Spider System with Web UI

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 81.5%
  • JavaScript 11.2%
  • CSS 7.1%
  • Shell 0.2%
Morty Proxy This is a proxified and sanitized view of the page, visit original site.