-
Notifications
You must be signed in to change notification settings - Fork 139
fix: encode dataset name with urllib, removing default safe='/'
config
#1195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile
@@ -136,7 +138,7 @@ def get( | ||
) | ||
""" | ||
_response = self._client_wrapper.httpx_client.request( | ||
f"api/public/v2/datasets/{jsonable_encoder(dataset_name)}", | ||
f"api/public/v2/datasets/{urllib.parse.quote(dataset_name, safe='')}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Similar URL encoding fix needed for other methods that use dataset_name in URLs (get_run, delete_run, get_runs)
f"api/public/v2/datasets/{urllib.parse.quote(dataset_name, safe='')}", | |
f"api/public/datasets/{urllib.parse.quote(dataset_name, safe='')}/runs/{urllib.parse.quote(run_name, safe='')}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I save this work for a separate PR, or is it required to get this merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile
@@ -1654,7 +1654,7 @@ def get_dataset( | ||
|
||
while True: | ||
new_items = self.api.dataset_items.list( | ||
dataset_name=self._url_encode(name), | ||
dataset_name=name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Dataset name should be URL encoded here using self._url_encode(dataset_name)
for consistency with other API calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of the fix. When you do a get_dataset
, it will call this. This call will then fail if you encode it because httpx already encodes it. Then on the server side, the database search is performed with the url encoded name, which fails with a 404. This was the main cause of the issue.
Feels strange conversing with a bot, but I figure someone will read this and make a determination/respond? |
This fixes an issue where the
urllib.parse.quote
fn is used to encode a path component that has a slash in it. Ex: if you name a datasettest/me
, it needs to encode the slash.You can see here that
urllib.parse.quote
does not encode the/
by default, which is what is used in the_url_encode
fn.Related: langfuse/langfuse#7026
Part of the fix for #5962
Important
Fix dataset name encoding to handle special characters using
urllib.parse.quote
withsafe=''
and add tests for special character handling.get_dataset
inclient.py
andget
indatasets/client.py
usingurllib.parse.quote
withsafe=''
.test_dataset_special_characters
intest_datasets.py
to verify handling of special and unicode characters in dataset names.This description was created by
for 73421d5. You can customize this summary. It will automatically update as commits are pushed.
Greptile Summary
Disclaimer: Experimental PR review
This PR fixes URL encoding issues for dataset names in the langfuse-python client, particularly addressing the handling of forward slashes and special characters in URLs.
_url_encode()
inlangfuse/_client/client.py
to properly encode forward slashes by removing defaultsafe='/'
parametertest_dataset_special_characters()
intests/test_datasets.py
to validate handling of special and Unicode charactersget
method indatasets/client.py
to useurllib.parse.quote(dataset_name, safe='')
for consistent encodingjsonable_encoder
for dataset names which may cause inconsistent encoding behavior💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!