Commit 6613b13
committed: updated email crawler
1 parent 780cad2 commit 6613b13

1 file changed: 9 additions, 14 deletions

08_basic_email_web_crawler.py (+9 −14)
```diff
@@ -1,26 +1,21 @@
 import requests
 import re
 
-#get url
-#url=input('Enter a URL (include 'http://'):')--this is wrong
+# get url
 url = input('Enter a URL (include `http://`): ')
 
+# connect to the url
+website = requests.get(url)
 
-#connect to the url
-website=requests.get(url)
+# read html
+html = website.text
 
-#read html
-html=website.text
-
-
-#use re.findall to grab all the links
+# use re.findall to grab all the links
 links = re.findall('"((http|ftp)s?://.*?)"', html)
+emails = re.findall('([\w\.,]+@[\w\.,]+\.\w+)', html)
 
-emails=re.findall('([\w\.,]+@[\w\.,]+\.\w+)',html)
 
-
-#prints the number of links in the list
+# print the number of links in the list
 print("\nFound {} links".format(len(links)))
-
 for email in emails:
-	print(email)
+    print(email)
```
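The committed script reads a URL from `input()` and fetches it with `requests`, so it needs a live network connection to run. A minimal sketch of the same extraction step, with the two `re.findall` patterns from the diff wrapped in a helper so they can be exercised on a plain HTML string (`extract_links_and_emails` is a name introduced here for illustration, not part of the commit):

```python
import re

# The same two patterns the committed script passes to re.findall,
# compiled once as raw strings.
LINK_RE = re.compile(r'"((http|ftp)s?://.*?)"')
EMAIL_RE = re.compile(r'([\w\.,]+@[\w\.,]+\.\w+)')

def extract_links_and_emails(html):
    """Return (links, emails) found in an HTML string.

    Because LINK_RE has two capture groups, findall returns
    (url, scheme) tuples rather than bare URL strings.
    """
    links = LINK_RE.findall(html)
    emails = EMAIL_RE.findall(html)
    return links, emails

if __name__ == "__main__":
    sample = '<a href="https://example.com/page">mail: bob@example.com</a>'
    links, emails = extract_links_and_emails(sample)
    print("\nFound {} links".format(len(links)))
    for email in emails:
        print(email)
```

Note that the email character class `[\w\.,]` also admits commas, so adjacent comma-separated addresses can be captured as one match; a stricter pattern would drop the comma from the class.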
