Commit 32dea94
Using requests instead of urllib2, final draft.
1 parent a22a6e9
‎docs/scenarios/scrape.rst

1 file changed: 12 additions & 10 deletions
@@ -14,27 +14,29 @@ This is where web scraping comes in. Web scraping is the practice of using
 computer program to sift through a web page and gather the data that you need
 in a format most useful to you.
 
-lxml
-----
+lxml and Requests
+-----------------
 
 `lxml <http://lxml.de/>`_ is a pretty extensive library written for parsing
-XML and HTML documents, which you can easily install using ``pip``. We will
-be using its ``html`` module to get example data from this web page: `econpy.org <http://econpy.pythonanywhere.com/ex/001.html>`_ .
+XML and HTML documents really fast. It even handles messed-up tags. We will
+also be using the `Requests <http://docs.python-requests.org/en/latest/>`_ module instead of the built-in ``urllib2``
+due to its improvements in speed and readability. You can easily install both
+using ``pip install lxml`` and ``pip install requests``.
 
-First we shall import the required modules:
+Let's start with the imports:
 
 .. code-block:: python
 
    from lxml import html
-   from urllib2 import urlopen
+   import requests
 
-We will use ``urllib2.urlopen`` to retrieve the web page with our data and
-parse it using the ``html`` module:
+Next we will use ``requests.get`` to retrieve the web page with our data
+and parse it using the ``html`` module, saving the result in ``tree``:
 
 .. code-block:: python
 
-   page = urlopen('http://econpy.pythonanywhere.com/ex/001.html')
-   tree = html.fromstring(page.read())
+   page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
+   tree = html.fromstring(page.text)
 
 ``tree`` now contains the whole HTML file in a nice tree structure which
 we can go over two different ways: XPath and CSSSelect. In this example, I
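The parsing step the diff introduces can be tried end to end without a network call by feeding ``html.fromstring`` an inline string instead of ``page.text``. This is a minimal sketch: the ``buyer-name``/``item-price`` markup below is hypothetical sample data standing in for the real page at econpy.pythonanywhere.com, and the XPath queries illustrate the "go over the tree" step the tutorial describes.

```python
from lxml import html

# Hypothetical markup standing in for the fetched page; the real
# page's structure may differ.
sample = """
<html><body>
  <div title="buyer-name">Carson Busses</div>
  <span class="item-price">$29.95</span>
</body></html>
"""

# Same call as in the diff, only with a local string instead of
# the response body from requests.get(...).text
tree = html.fromstring(sample)

# XPath queries return lists of matching text nodes.
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
prices = tree.xpath('//span[@class="item-price"]/text()')

print(buyers)  # ['Carson Busses']
print(prices)  # ['$29.95']
```

Because ``fromstring`` tolerates imperfect HTML, the same two lines work unchanged on a live response once ``page = requests.get(url)`` has run.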
