Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

fix: encode dataset name with urllib, removing default safe='/' config #1195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
Loading
from

Conversation

mattverantos
Copy link

@mattverantos mattverantos commented May 26, 2025

This fixes an issue where the urllib.parse.quote fn is used to encode a path component that has a slash in it. Ex: if you name a dataset test/me, it needs to encode the slash.

>>> import urllib.parse
>>> urllib.parse.quote('test/me')
'test/me'
>>> urllib.parse.quote('test/me', safe='')
'test%2Fme'

You can see here that urllib.parse.quote does not encode the / by default, which is what is used in the _url_encode fn.

Related: langfuse/langfuse#7026
Part of the fix for #5962


Important

Fix dataset name encoding to handle special characters using urllib.parse.quote with safe='' and add tests for special character handling.

  • Behavior:
    • Fix encoding of dataset names with special characters in get_dataset in client.py and get in datasets/client.py using urllib.parse.quote with safe=''.
  • Tests:
    • Add test_dataset_special_characters in test_datasets.py to verify handling of special and unicode characters in dataset names.

This description was created by Ellipsis for 73421d5. You can customize this summary. It will automatically update as commits are pushed.

Greptile Summary

Disclaimer: Experimental PR review

This PR fixes URL encoding issues for dataset names in the langfuse-python client, particularly addressing the handling of forward slashes and special characters in URLs.

  • Modified _url_encode() in langfuse/_client/client.py to properly encode forward slashes by removing default safe='/' parameter
  • Added test case test_dataset_special_characters() in tests/test_datasets.py to validate handling of special and Unicode characters
  • Updated get method in datasets/client.py to use urllib.parse.quote(dataset_name, safe='') for consistent encoding
  • Potential issue: Some methods still use jsonable_encoder for dataset names which may cause inconsistent encoding behavior

💡 (1/5) You can manually trigger the bot by mentioning @greptileai in a comment!

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

@@ -136,7 +138,7 @@ def get(
)
"""
_response = self._client_wrapper.httpx_client.request(
f"api/public/v2/datasets/{jsonable_encoder(dataset_name)}",
f"api/public/v2/datasets/{urllib.parse.quote(dataset_name, safe='')}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Similar URL encoding fix needed for other methods that use dataset_name in URLs (get_run, delete_run, get_runs)

Suggested change
f"api/public/v2/datasets/{urllib.parse.quote(dataset_name, safe='')}",
f"api/public/datasets/{urllib.parse.quote(dataset_name, safe='')}/runs/{urllib.parse.quote(run_name, safe='')}",

Copy link
Author

@mattverantos mattverantos May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I save this work for a separate PR, or is it required to get this merged?

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 file(s) reviewed, 1 comment(s)
Edit PR Review Bot Settings | Greptile

@@ -1654,7 +1654,7 @@ def get_dataset(

while True:
new_items = self.api.dataset_items.list(
dataset_name=self._url_encode(name),
dataset_name=name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Dataset name should be URL encoded here using self._url_encode(dataset_name) for consistency with other API calls

Copy link
Author

@mattverantos mattverantos May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the fix. When you do a get_dataset, it will call this. This call will then fail if you encode it because httpx already encodes it. Then on the server side, the database search is performed with the url encoded name, which fails with a 404. This was the main cause of the issue.

@mattverantos
Copy link
Author

Feels strange conversing with a bot, but I figure someone will read this and make a determination/respond?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant
Morty Proxy This is a proxified and sanitized view of the page, visit original site.