WriteApi.write does not support pandas' nullable integer #590

Closed
@yannsartori

Description

Specifications

  • Client Version: 1.36.1
  • InfluxDB Version: 2.7.0
  • Platform: Mac

If a DataFrame has a column with pandas' nullable integer dtype (e.g. Int64) and a row contains pd.NA, write_api.write raises the following traceback:

Traceback (most recent call last):
    write_api.write(
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 366, in write
    return self._write_batching(bucket, org, record,
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write_api.py", line 469, in _write_batching
    serializer.serialize(chunk_idx),
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 270, in serialize
    return list(lp)
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 268, in <genexpr>
    lp = (re.sub('^(( |[^ ])* ),([a-zA-Z0-9])(.*)', '\\1\\3\\4', self.f(p))
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 269, in <lambda>
    for p in filter(lambda x: _any_not_nan(x, self.field_indexes), _itertuples(chunk)))
  File "venv/lib/python3.9/site-packages/influxdb_client/client/write/dataframe_serializer.py", line 27, in _any_not_nan
    return any(map(lambda x: _not_nan(p[x]), indexes))
  File "pandas/_libs/missing.pyx", line 388, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous

However, if you change the column's datatype to float (which has a native NaN encoding), it works.
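For illustration, a minimal sketch of that float workaround, using only pandas. It assumes the nullable column is named "x" as in the reproduction below; the name `df_float` is introduced here and is not part of the client library.

```python
import math

import pandas as pd

# Same frame as the reproduction: "x" is a nullable Int64 column holding pd.NA.
df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})

# Cast only the nullable column to float64; pd.NA is converted to float NaN.
df_float = df.astype({"x": "float64"})

# The missing value is now an ordinary float NaN rather than pd.NA.
second_value = df_float["x"].iloc[1]
is_plain_nan = isinstance(second_value, float) and math.isnan(second_value)
```

The trade-off is that the field is then written as a float rather than an integer, which may matter if the bucket already holds integer values for that field.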

Code sample to reproduce problem

import pandas as pd

# get_client() and BUCKET are my own helper and constant (definitions omitted here).
df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})
with get_client() as client:
    with client.write_api() as write_api:
        write_api.write(BUCKET, record=df, data_frame_measurement_name="test", data_frame_timestamp_column="time")

Expected behavior

I would expect this to behave the same as if the column were a float. My current workaround is to use floats.

If the code is too complicated to fix/would incur significant slowdown for other users, I think at minimum, raising a cleaner exception would be reasonable.
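As a rough sketch of what a cleaner exception could look like, here is a guard that could run before a write. The function name `check_no_missing_nullable_ints` and the error message are hypothetical and do not exist in influxdb_client; the check relies on the fact that a NumPy integer column cannot hold missing values, so an integer-dtype column with missing entries must be a nullable one.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype


def check_no_missing_nullable_ints(df):
    """Hypothetical pre-write guard: fail early with a readable message."""
    bad = [
        name
        for name, series in df.items()
        # Integer dtype plus missing values implies a nullable integer column.
        if is_integer_dtype(series.dtype) and series.isna().any()
    ]
    if bad:
        raise ValueError(
            f"Columns {bad} use a nullable integer dtype and contain pd.NA; "
            "cast them to float or drop the missing rows before writing."
        )


df = pd.DataFrame({"x": [1, pd.NA], "time": [0, 1]}).astype({"x": "Int64"})
try:
    check_no_missing_nullable_ints(df)
    guard_message = None
except ValueError as exc:
    guard_message = str(exc)
```

A single dtype check per column is cheap compared with the per-row serialization, so it should not slow down users who do not use nullable dtypes.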

Actual behavior

I get the same exception as in the traceback above, ending in "TypeError: boolean value of NA is ambiguous".

Additional info

My knee-jerk reaction: in client/write/dataframe_serializer.py, there is a function:

def _not_nan(x):
    return x == x

which I think can just be

def _not_nan(x):
    from ...extras import pd
    # pd.notna is False for NaN, None and pd.NA, so the result matches the function name
    return pd.notna(x)
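To make the difference concrete, here is a small standalone comparison of the two approaches. The names `not_nan_eq` and `not_nan_pd` are made up for this sketch; the second one uses `pd.notna` (the negation of `pd.isna`) so that it returns True for present values.

```python
import pandas as pd


def not_nan_eq(x):
    # Same idea as the existing helper: NaN is the only float unequal to itself.
    return x == x


def not_nan_pd(x):
    # Hypothetical alternative: let pandas decide what counts as missing.
    return bool(pd.notna(x))


samples = [1, float("nan"), pd.NA]

# pd.notna yields a plain bool for every sample, including pd.NA.
pd_results = [not_nan_pd(v) for v in samples]

# x == x works for the first two samples, but pd.NA == pd.NA is itself pd.NA,
# and any() then has to call bool() on it, which raises.
try:
    any([not_nan_eq(pd.NA)])
    eq_error = None
except TypeError as exc:
    eq_error = str(exc)
```

This only covers the `_any_not_nan` filter seen in the traceback; other parts of the serializer may need their own handling of pd.NA.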

However, I saw this block of code:

                if null_columns[index]:
                    key_value = f"""{{
                            '' if {val_format} == '' or type({val_format}) == float and math.isnan({val_format}) else
                            f',{key_format}={{str({val_format}).translate(_ESCAPE_STRING)}}'
                        }}"""

which looks pretty crazy, and I am not sure how the data would look at that point?
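One way to probe that question is to take the condition out of the generated f-string and run it as a plain function. The sketch below does that; `is_skipped` is a name invented here, and whether a pd.NA value actually reaches this code path in the serializer is an assumption that the traceback does not confirm.

```python
import math

import pandas as pd


def is_skipped(value):
    # Same condition as in the generated expression, with {val_format}
    # replaced by a plain argument. type(...) == float mirrors the source.
    return value == '' or type(value) == float and math.isnan(value)


skip_empty = is_skipped('')
skip_nan = is_skipped(float("nan"))
skip_text = is_skipped("abc")

# With pd.NA, value == '' evaluates to pd.NA, and `or` needs its truth value,
# so the condition itself raises before any formatting happens.
try:
    is_skipped(pd.NA)
    na_error = None
except TypeError as exc:
    na_error = str(exc)
```

So in isolation this check handles empty strings and float NaN, but it would hit the same "ambiguous" TypeError if it were ever evaluated on pd.NA.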

Labels

bug
