Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Latest commit

 

History

History
History
38 lines (31 loc) · 1.32 KB

File metadata and controls

38 lines (31 loc) · 1.32 KB
Copy raw file
Download raw file
Open symbols panel
Edit and raw actions
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
'''
Michael Galarnyk
1. Rename the socket1.py program from our textbook to URL_reader.py.
2. Modify the URL_reader.py program to use urllib instead of a socket.
3. Add code that prompts the user for the URL so it can read any web page.
4. Add error checking using try and except to handle the condition where the user enters an
improperly formatted or non-existent URL.
5. Count the number of characters received and stop displaying any text after it has shown 3000
characters.
6. Continue to retrieve the entire document, count the total number of characters, and display the
total number of characters.
'''
# Wasnt sure what exactly the assignment wanted, but this seems to work.
import urllib
import re
# https://www.gutenberg.org/files/1342/1342-h/1342-h.htm#link2HCH0034
try:
url = raw_input('Enter - ');
fhand = urllib.urlopen(url);
count = 0; # only characters upper and lowercase a-z.
adjusted_3000 = [];
while count < 3000:
new_char = fhand.read(1);
adjusted_3000 = adjusted_3000 + re.findall('[A-Za-z]', new_char)
count += 1;
print 'first 3000 thousand characters are: ', adjusted_3000;
new_chars = fhand.read()
entire_doc = adjusted_3000 + re.findall('[A-Za-z]', new_chars)
print 'The length of the entire document is: ', len(entire_doc)
except:
print "Bad link"
Morty Proxy This is a proxified and sanitized view of the page, visit original site.