ENH: Change Pandas User-Agent and add possibility to set custom `http_headers` to `pd.read_*` functions #36688
Comments
We have had this request before, please search for these issues.

Related to #10526

@martindurant can we pass these through using

HTTP is the only one of the "protocol://" URLs that is not handled by fsspec, because pandas already had its own code for it (whereas s3fs and gcs were already using fsspec second-hand). For HTTPFileSystem, you can include

OK, so you think a PR to add an example in `read_csv` / `io.rst` would be sufficient then? @AstroMatt, if you are interested.


Currently Pandas makes HTTP requests using "Python-urllib/3.8" as the User-Agent.
This prevents downloading some resources and static files from various places.
What if Pandas made requests using a "Pandas/1.1.0" User-Agent header instead?
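To see where that header comes from, here is a small standard-library sketch that reads the User-Agent urllib's default opener attaches to every request. `default_user_agent` is a hypothetical helper name, and the version suffix depends on the Python interpreter running it.

```python
import urllib.request


def default_user_agent() -> str:
    """Return the User-Agent that urllib's default opener sends."""
    # OpenerDirector starts with addheaders = [("User-agent", "Python-urllib/X.Y")].
    opener = urllib.request.build_opener()
    return dict(opener.addheaders)["User-agent"]


print(default_user_agent())  # e.g. Python-urllib/3.8 on Python 3.8
```

This is the value a server such as readthedocs.io sees unless the caller overrides it.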
There should also be a way to add custom headers (auth and CSRF tokens, API versions and so on).

Use Case:
I am writing a book on Pandas:
I published data in CSV and JSON to use in code listings:
You can access those resources via a browser, `curl`, or even `requests`, but not using Pandas. The only change needed would be to set the User-Agent.
This is because readthedocs.io blocks the "Python-urllib/3.8" User-Agent for whatever reason. The same problem affects many other places where you can get data (not only readthedocs.io).

Currently I get those resources with `requests` and then pass `response.text` to one of:

- `pd.read_csv`
- `pd.read_json`
- `pd.read_html`

Unfortunately, this makes even the simplest code listings... quite complex (because I have to explain the `requests` library and why I do it this way).

Pandas uses `urllib.request.urlopen`, which does not allow setting custom HTTP headers: https://github.com/pandas-dev/pandas/blob/master/pandas/io/common.py#L146
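Because the request goes through `urllib.request.urlopen`, one possible stopgap (a sketch, not a pandas feature) is to install a process-wide urllib opener with a different User-Agent before calling `pd.read_*`. This assumes pandas calls `urlopen` without an SSL context or CA file; in that case urllib builds a fresh opener and ignores the installed one. `install_user_agent` is a hypothetical helper name.

```python
import urllib.request


def install_user_agent(user_agent: str) -> urllib.request.OpenerDirector:
    """Install a process-wide urllib opener that sends a custom User-Agent.

    Later urllib.request.urlopen(url) calls made without an SSL context
    reuse this opener, so code that delegates to urlopen picks the
    header up without modification.
    """
    opener = urllib.request.build_opener()
    # Replace the default ("User-agent", "Python-urllib/X.Y") entry.
    opener.addheaders = [("User-Agent", user_agent)]
    urllib.request.install_opener(opener)
    return opener
```

Calling `install_user_agent("Pandas/1.1.0")` once, before `pd.read_csv(url)`, should make that request carry the new header. The trade-off is scope: the override applies to every `urlopen` call in the process, which is why a per-call argument is the cleaner fix.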
However, `urllib.request.urlopen` can take a `urllib.request.Request` object as an argument, and a `urllib.request.Request` object can carry custom HTTP headers: https://docs.python.org/3/library/urllib.request.html#urllib.request.Request
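A minimal standard-library illustration of that route: the headers travel on the `Request` object, and `urlopen` accepts it in place of a URL string. The `data:` URL is only an offline stand-in for a real http(s) address (urllib ignores headers for it), and the token value is a placeholder.

```python
import urllib.request

req = urllib.request.Request(
    "data:,hello",  # offline stand-in for an http(s) URL
    headers={"User-Agent": "Pandas/1.1.0", "Authorization": "Bearer <token>"},
)
# Request stores each header name via str.capitalize(), e.g. "User-agent".
print(req.get_header("User-agent"))  # Pandas/1.1.0

with urllib.request.urlopen(req) as resp:  # urlopen accepts a Request object
    print(resp.read())  # b'hello'
```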
It should be possible to pass custom `http_headers` to the `pd.read_csv`, `pd.read_json` and `pd.read_html` functions. From what I see, the `read_*` call stack is three to four functions deep.
There are only 6 references in 4 files to the `urlopen(*args, **kwargs)` function, so the change shouldn't be too hard to implement.
The `http_headers` parameter can be `Optional[List]`, which would be fully backward compatible and would not require any changes to other people's code.
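A rough sketch of how such a parameter could be threaded through while staying backward compatible. Both functions are hypothetical stand-ins, not pandas' actual internals: `urlopen` mimics the helper in the `pandas/io/common.py` file linked above, and `read_text` plays the role of one `read_*` layer that only forwards the argument. The sketch types `http_headers` as a mapping rather than `Optional[List]`, because `urllib.request.Request` expects its headers as a dictionary.

```python
import urllib.request
from typing import Mapping, Optional


def urlopen(url, *args, http_headers: Optional[Mapping[str, str]] = None, **kwargs):
    """Hypothetical drop-in for pandas' internal urlopen helper.

    With http_headers=None it calls urllib.request.urlopen unchanged,
    so existing callers are unaffected.
    """
    if http_headers:
        if isinstance(url, urllib.request.Request):
            # Already a Request: add the extra headers to it.
            for name, value in http_headers.items():
                url.add_header(name, value)
        else:
            # Plain URL string: wrap it in a Request carrying the headers.
            url = urllib.request.Request(url, headers=dict(http_headers))
    return urllib.request.urlopen(url, *args, **kwargs)


def read_text(url: str, http_headers: Optional[Mapping[str, str]] = None) -> str:
    """Stand-in for one read_* layer: it only forwards http_headers."""
    with urlopen(url, http_headers=http_headers) as handle:
        return handle.read().decode("utf-8")


# "data:" URL used as an offline stand-in for an http(s) CSV file.
print(read_text("data:,a,b%0A1,2", http_headers={"User-Agent": "Pandas/1.1.0"}))
```

With `http_headers=None` the helper takes exactly the old code path; only callers that opt in get a `Request` with the extra headers.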