Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
I love pandas and use it extensively. One very common use case for me is saving large json / jsonl files that describe ML training datasets. Unfortunately, pandas uses ujson under the hood, which automatically escapes forward slashes (/ is written as \/). Forward slashes are very common in my dataset files because they appear in filepaths to images/videos/etc.
The escaped filepaths cause problems in some (non-pandas) downstream libs that ingest my json/jsonl dataset files. So instead of using the native pandas .to_json() function, I have to import the json package and write the file myself, which can be much slower for very large files.
I am ok living with this inconvenience, but it seems to me to be a gap in the pandas API. Perhaps adding an option to prevent the escaping would be a good enhancement.
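To make the problem concrete, here is a small standard-library sketch. The sample strings and file paths are made up for illustration; the escaped one mimics what a slash-escaping encoder writes. Both spellings are valid JSON and decode to the same value, so the difference only bites consumers that read the raw text or use a non-conforming parser.

```python
import json

# Hypothetical sample: the same record, with and without escaped slashes.
escaped = r'{"image": "data\/train\/0001.png"}'
plain = '{"image": "data/train/0001.png"}'

# A conforming JSON parser treats "\/" and "/" as the same character.
decoded_escaped = json.loads(escaped)
decoded_plain = json.loads(plain)

# The standard-library encoder writes "/" without a backslash.
reencoded = json.dumps(decoded_escaped)
```
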
Feature Description
Add a new parameter, escape_forward_slashes, to pandas.DataFrame.to_json():
def to_json(self, ..., escape_forward_slashes=True) -> str | None:
    ...
Or even a ujson_options dict (defaulting to None rather than a mutable {}):
def to_json(self, ..., ujson_options=None) -> str | None:
    ...
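Until something like this exists, the effect of escape_forward_slashes=False could be approximated by post-processing text that has already been serialized. The sketch below is only an illustration: unescape_forward_slashes is a hypothetical helper, not part of pandas or ujson, and the sample string stands in for serializer output. It walks escape sequences left to right so that an escaped backslash followed by a slash is not mangled.

```python
import json
import re

# Matches one JSON escape sequence: a backslash plus the character after it.
_ESCAPE = re.compile(r"\\(.)")


def unescape_forward_slashes(json_text: str) -> str:
    r"""Hypothetical helper: rewrite every '\/' escape as a bare '/'.

    Other escapes (such as '\\' or '\n') are left exactly as they are.
    """
    return _ESCAPE.sub(
        lambda m: "/" if m.group(1) == "/" else m.group(0),
        json_text,
    )


# Made-up sample standing in for text produced by a slash-escaping encoder.
raw = r'[{"path":"data\/img\/1.png"}]'
clean = unescape_forward_slashes(raw)
```

This costs an extra pass over the whole string, so a native option in the serializer would still be preferable for very large files.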
Alternative Solutions
Instead of
df.to_json(path)
you have to manually use the json package:
import json
with open(path, "w") as f:
    json.dump(df.to_dict(orient="records"), f)
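For the jsonl case, the same workaround can be sketched with one json.dumps call per record. This is a minimal standard-library example: write_jsonl, the sample records, and the temporary path are all illustrative, and in practice the records would come from something like df.to_dict(orient="records").

```python
import json
import os
import tempfile


def write_jsonl(records, path):
    """Write one JSON object per line; json.dumps leaves '/' unescaped."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False))
            f.write("\n")


# Made-up records standing in for rows of a dataset DataFrame.
records = [
    {"image": "data/train/0001.png", "label": 3},
    {"image": "data/train/0002.png", "label": 7},
]

path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")
write_jsonl(records, path)

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```
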
Additional Context
Also note that the ujson project explicitly states:
this library has been put into a maintenance-only mode... Users are encouraged to migrate to orjson which is both much faster and less likely to introduce a surprise buffer overflow vulnerability in the future.
So it might be worth migrating to orjson during this development effort.