SpideyWeb

Simple WebCrawler

Usage

Crawl any website and return the specific results you are looking for.

Example

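    import requests
    from bs4 import BeautifulSoup

    max_pages = 100  # example value (an assumption); the highest page offset to crawl
    page = 25
    while page <= max_pages:
        # Add any URL here and increment it to match the site's "Next page" link.
        # This example searches the UsedOttawa classifieds for "honda".
        url = 'http://www.usedottawa.com/classifieds/all/' + str(page) + '?description=honda'
        source_code = requests.get(url)
        plain_text = source_code.text
        # Name the parser explicitly so BeautifulSoup does not have to guess one.
        soup = BeautifulSoup(plain_text, 'html.parser')
        # Inspect the website to find the tags and attributes you want to grab.
        for link in soup.find_all('a', {'itemprop': 'name'}):
            href = "http://www.usedottawa.com/" + link.get('href')
            title = link.string
            print(href)
            print(title)
            get_single_item_data(href)  # processes a single listing; defined elsewhere
        # Advance once per page, after every link on the page has been handled.
        page += 25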
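
The example builds each listing link by joining the site root and the `href` as plain strings, which can produce a doubled slash when the `href` already starts with `/`. As a minimal sketch of one alternative, the snippet below resolves links with `urljoin` from the standard library. The `absolute_link` helper and the sample `href` values are hypothetical and only illustrate the idea; they are not taken from spideyweb.py.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.usedottawa.com/"


def absolute_link(href, base=BASE_URL):
    """Resolve an href found on a page against the site root."""
    return urljoin(base, href)


# Hypothetical href values, for illustration only.
print(absolute_link("/classifieds/example-item"))
print(absolute_link("classifieds/example-item"))
print(absolute_link("http://www.usedottawa.com/classifieds/example-item"))
```

Root-relative, relative, and already-absolute links all resolve to the same absolute URL, so the loop does not need to know which form the site uses.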

To Install

    apt-get install python-bs4
    pip install requests

Notes

Look at the URL you are crawling and note how it changes each time you move to the next page, then set the page increment to match.
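
Once you know how the URL changes between pages, the pattern can be written down as a small generator. The following is a rough sketch, assuming the page offset sits at the end of the path and grows by a fixed step, as it does in the UsedOttawa example. The `page_urls` function and its parameter names are hypothetical, and the stop value of 75 is chosen only to keep the output short.

```python
from urllib.parse import urlencode


def page_urls(base_url, start, stop, step, params):
    """Yield one URL per results page, stepping the offset from start to stop inclusive."""
    query = urlencode(params)
    for offset in range(start, stop + 1, step):
        yield f"{base_url}{offset}?{query}"


urls = list(page_urls(
    'http://www.usedottawa.com/classifieds/all/',
    start=25,
    stop=75,   # assumed upper bound, for illustration
    step=25,   # the increment observed between pages in the example
    params={'description': 'honda'},
))

for url in urls:
    print(url)
```

For a different site, only the base URL, the step, and the query parameters should need to change, provided the site follows the same offset-in-path pattern.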
