ENH: add option to save json without escaping forward slashes #61442

Open
@ellisbrown

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I love pandas and use it extensively. one very common use case for me is saving large json / jsonl files that describe ML training datasets. unfortunately, pandas uses ujson under the hood, which automatically escapes forward slashes ("/" is written as "\/"). forward slashes are very common in my dataset files, which contain filepaths to images/videos/etc.

the escaped filepaths cause issues with some (non-pandas) downstream libs that ingest my json/jsonl dataset files. so instead of using the native pandas .to_json() function, I have to import the json package and manually write the file myself. this can be much slower for very large files.

I am ok living with this inconvenience, but it seems to me to be a gap in the pandas api. perhaps adding an option to prevent the escaping would be a good enhancement.
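for context, the JSON grammar treats "\/" as an optional escape for "/", so a conforming parser should decode both spellings to the same string. the breakage presumably comes from consumers that compare or process the raw text. a minimal stdlib-only sketch of the difference (the sample path is made up for illustration):

```python
import json

# Two spellings of the same document; they differ only in whether "/" is escaped.
escaped = '{"path": "images\\/train\\/0001.png"}'  # style emitted by an escaping encoder
unescaped = '{"path": "images/train/0001.png"}'

# A conforming JSON parser decodes both to the same Python object.
decoded_escaped = json.loads(escaped)
decoded_unescaped = json.loads(unescaped)
same = decoded_escaped == decoded_unescaped

# The standard-library encoder leaves forward slashes unescaped.
roundtrip = json.dumps(decoded_escaped)
```

here `same` is true and `roundtrip` equals the unescaped spelling, which is the output style this request is asking pandas to offer.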

Feature Description

add a new parameter, escape_forward_slashes, to pandas.DataFrame.to_json()

def to_json(self, ..., escape_forward_slashes=True) -> str | None:
    ...

or even a ujson_options dict

def to_json(self, ..., ujson_options=None) -> str | None:  # None rather than a mutable {} default
    ...

Alternative Solutions

instead of

df.to_json(path, orient="records")

you have to manually use the json package

import json

with open(path, "w") as f:
    json.dump(df.to_dict(orient="records"), f)
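for the jsonl case, the same workaround can be sketched as a small helper that writes one object per line. this is only an illustration: `write_jsonl` is a hypothetical name, not a pandas or json API, and it assumes every value in the records is something the stdlib json encoder can serialize.

```python
import json
from typing import Any, Iterable, Mapping


def write_jsonl(records: Iterable[Mapping[str, Any]], path: str) -> int:
    """Write one JSON object per line and return the number of lines written.

    Hypothetical helper for illustration; not part of pandas.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # json.dumps leaves "/" unescaped; ensure_ascii=False keeps
            # non-ASCII text as-is instead of \uXXXX escapes.
            f.write(json.dumps(record, ensure_ascii=False))
            f.write("\n")
            count += 1
    return count
```

with a DataFrame this would be called as `write_jsonl(df.to_dict(orient="records"), path)`. values such as timestamps may need converting to strings first, since the stdlib encoder does not handle them.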

Additional Context

also note that the ujson project explicitly states:

"this library has been put into a maintenance-only mode... Users are encouraged to migrate to orjson which is both much faster and less likely to introduce a surprise buffer overflow vulnerability in the future."

so it might be worth migrating to orjson during this development effort

Metadata

Assignees

No one assigned

    Labels

    Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
