wiki_philosophy

Wiki Crawler
Starting from a random Wikipedia article (for example, http://en.wikipedia.org/wiki/Art), clicking
the first non-italicized link in the main text that is not surrounded by parentheses, and then
repeating the process for each subsequent article, usually leads to http://en.wikipedia.org/wiki/Philosophy.
Please write a program that models this behavior and answers the following questions while
making as few HTTP requests as possible.
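
The link-selection rule described above can be sketched as follows. This is a minimal illustration using only the Python 3 standard library, not the BeautifulSoup-based code in wiki-crawler.py. The names `FirstLinkParser` and `first_link` are hypothetical, and the sketch assumes that the main text lives in `<p>` elements, that italics are marked with `<i>` or `<em>`, and that article links start with `/wiki/` and contain no colon.

```python
from html.parser import HTMLParser


class FirstLinkParser(HTMLParser):
    """Hypothetical helper: finds the first eligible article link in <p> text."""

    def __init__(self):
        super().__init__()
        self.paragraph_depth = 0  # > 0 while inside a <p> element
        self.italic_depth = 0     # > 0 while inside <i> or <em>
        self.paren_depth = 0      # > 0 while inside "(...)" in paragraph text
        self.first_link = None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.paragraph_depth += 1
        elif tag in ("i", "em"):
            self.italic_depth += 1
        elif tag == "a" and self.first_link is None:
            href = dict(attrs).get("href") or ""
            eligible = (
                self.paragraph_depth > 0
                and self.italic_depth == 0
                and self.paren_depth == 0
                and href.startswith("/wiki/")
                and ":" not in href  # assumption: skips File:, Help:, etc.
            )
            if eligible:
                self.first_link = href

    def handle_endtag(self, tag):
        if tag == "p" and self.paragraph_depth:
            self.paragraph_depth -= 1
        elif tag in ("i", "em") and self.italic_depth:
            self.italic_depth -= 1

    def handle_data(self, data):
        # Only parentheses in paragraph text count toward the nesting depth.
        if not self.paragraph_depth:
            return
        for ch in data:
            if ch == "(":
                self.paren_depth += 1
            elif ch == ")" and self.paren_depth:
                self.paren_depth -= 1


def first_link(html):
    """Return the href of the first eligible link, or None if there is none."""
    parser = FirstLinkParser()
    parser.feed(html)
    return parser.first_link


sample = (
    '<p>Art (from <a href="/wiki/Latin">Latin</a>) is '
    '<i><a href="/wiki/Italic">x</a></i> a '
    '<a href="/wiki/Human_activity">human activity</a>.</p>'
)
print(first_link(sample))
```

Real article markup has more edge cases (tables, infoboxes, footnote links), so a production version would probably need extra filtering.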

# Questions:
## What percentage of pages lead to philosophy?
## Using the random article link (found in the left sidebar of any Wikipedia article),
   what is the distribution of path lengths for 500 pages, discarding paths that never reach the Philosophy page?
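
Both questions come down to following each chain until it reaches Philosophy, loops, or hits a dead end. One way to keep the number of HTTP requests low is to remember the result for every page already visited, since chains from different starting pages tend to merge. The sketch below assumes a hypothetical `next_link(page)` callable that returns the title of the first eligible link (or `None`) and a `known` dictionary shared between calls. It is an illustration of the caching idea, not the repository's implementation.

```python
def path_length(start, next_link, known, target="Philosophy"):
    """Return the number of clicks from start to target, or None if unreachable.

    known maps page title -> clicks to target (None means it never arrives).
    next_link is only called for pages that are not in known yet.
    """
    path = []      # pages visited in this walk, in order
    seen = set()   # same pages, for fast loop detection
    page = start
    while True:
        if page == target:
            dist = 0
            break
        if page in known:
            dist = known[page]   # reuse an earlier result, no request needed
            break
        if page in seen:
            dist = None          # the chain loops back on itself
            break
        seen.add(page)
        path.append(page)
        page = next_link(page)   # the only place a request would happen
        if page is None:
            dist = None          # dead end: no eligible link on the page
            break

    # Back-fill the cache for every page on this walk.
    for visited in reversed(path):
        if dist is not None:
            dist += 1
        known[visited] = dist
    return known.get(start, dist)


# Made-up link graph standing in for real pages.
links = {"Art": "Human", "Human": "Philosophy", "Z": "Human", "X": "Y", "Y": "X"}
requests_made = []


def fake_next_link(page):
    requests_made.append(page)
    return links.get(page)


cache = {}
print(path_length("Art", fake_next_link, cache))
print(path_length("Z", fake_next_link, cache))
print(path_length("X", fake_next_link, cache))
print(len(requests_made))
```

In this toy graph the second walk stops as soon as it reaches a page whose distance is already cached, so it costs one lookup instead of two.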

Dependencies

python2
BeautifulSoup

Running Program:

From a terminal, run:

python wiki-crawler.py

The output will look something like:

percentage of page lead to philosophy: 100.0%
random percentage of page lead to philosophy: 80.0%
Counter({15: 3, 10: 1, 13: 1})
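
Output of this shape could be produced from a list of per-page results along the following lines. This is a hedged sketch: the function name `summarize` and the sample values are made up for illustration, and `None` is assumed to mark a page whose path never reached Philosophy.

```python
from collections import Counter


def summarize(lengths):
    """Return (percentage of pages reaching the target, Counter of path lengths).

    lengths holds one entry per starting page: an int path length,
    or None when the path never reached the target.
    """
    reached = [n for n in lengths if n is not None]
    percentage = 100.0 * len(reached) / len(lengths) if lengths else 0.0
    return percentage, Counter(reached)


# Made-up results for five starting pages.
percentage, distribution = summarize([15, 15, 15, 10, None])
print("percentage: %.1f%%" % percentage)
print(distribution)
```

Paths that never arrive are counted in the percentage but left out of the distribution, as the second question asks.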
