Description
Pandas version checks
-
I have checked that this issue has not already been reported.
-
I have confirmed this bug exists on the latest version of pandas.
-
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> import sqlite3, pandas as pd
>>> db = sqlite3.connect(':memory:')
>>> df = pd.DataFrame([{'a': 1, 'b': 2}, {'a': 3, 'b': 4}])
>>> df.to_sql('demo', db)
2
>>> db.execute('SELECT * FROM demo').fetchall()
[(0, 1, 2), (1, 3, 4)]
>>> pd.read_sql('demo', db)
Traceback (most recent call last):
File "/home/kale/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/io/sql.py", line 2202, in execute
cur.execute(sql, *args)
sqlite3.OperationalError: near "demo": syntax error
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kale/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/io/sql.py", line 635, in read_sql
return pandas_sql.read_query(
File "/home/kale/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/io/sql.py", line 2266, in read_query
cursor = self.execute(sql, params)
File "/home/kale/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/io/sql.py", line 2214, in execute
raise ex from exc
pandas.errors.DatabaseError: Execution failed on sql 'demo': near "demo": syntax error
>>> pd.read_sql_table('demo', db)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/kale/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/io/sql.py", line 340, in read_sql_table
table = pandas_sql.read_table(
File "/home/kale/.pyenv/versions/3.10.0/lib/python3.10/site-packages/pandas/io/sql.py", line 1368, in read_table
raise NotImplementedError
NotImplementedError
>>> pd.read_sql('SELECT * from demo', db)
index a b
0 0 1 2
1 1 3 4
Issue Description
According to the docs, the first argument to pd.read_sql()
can either be an SQL query or the name of a table. However, if the SQLite database backend is used, an exception is raised if a valid table name is provided.
The cause of this behavior is in this part of io.read_sql()
. If the SQLite backend is selected, the sql
argument is unconditionally passed to read_query()
, where it is interpreted as a query rather than a table name. If the SQLAlchemy backend is selected instead, the documented logic is used. It's not obvious to me why SQLite is special-cased like this.
I tried to work around this bug by directly calling pd.read_sql_table()
, but it turns out that this function is simply not implemented for the SQLite backend. The relevant class is io.SQLiteDatabase
. In fact, every method of PandasSQL
(the ABC for SQLiteDatabase
) is abstract except read_table()
, so it seems like an exception was made specifically to avoid having to implement this method for the SQLite backend.
Finally, I tried to work around this by providing a SELECT
statement instead of a table name, and that worked. So this behavior isn't a show-stopper, but it's definitely an unpleasant surprise. I'm not even sure that this is really a bug, since the code really seems to deliberately go out of its way to avoid reading SQLite tables, but I don't understand why.
Expected Behavior
>>> pd.read_sql('demo', db)
index a b
0 0 1 2
1 1 3 4
>>> pd.read_sql_table('demo', db)
index a b
0 0 1 2
1 1 3 4
Installed Versions
INSTALLED VERSIONS
commit : 0f43794
python : 3.10.0.final.0
python-bits : 64
OS : Linux
OS-release : 6.4.3-arch1-2
Version : #1 SMP PREEMPT_DYNAMIC Sat, 15 Jul 2023 19:25:49 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.3
numpy : 1.23.5
pytz : 2021.3
dateutil : 2.8.2
setuptools : 65.6.3
pip : 23.1.2
Cython : 0.29.30
pytest : 7.2.0
hypothesis : 6.39.4
sphinx : 6.2.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 8.0.1
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 10.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2022.11.0
scipy : 1.11.1
snappy : None
sqlalchemy : 1.4.31
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : 2.0.1
pyqt5 : None