Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

feat: add signing of public URL #407

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Mar 7, 2025
Merged

feat: add signing of public URL #407

merged 9 commits into from
Mar 7, 2025

Conversation

danpoletaev
Copy link
Contributor

This PR is part of Issue #19363, which updates KeyValueStore.getPublicUrl(recordKey) to generate signed links using HMAC.

This PR:

  • creates signature and appends it to the public URL of the record in KV store

P.S. Before merging, we need to wait for the release of Crawlee, so we can update version of Crawlee.

P.P.S It's hard to test the changes, since master branch of SDK and master branch of Crawlee are out of sync. Crawlee has some breaking changes in version 6.0, which are not yet addressed in master branch of SDK.

Previous PR that adds storageObject to Crawlee Python is here

Same PR in SDK JS is here

@danpoletaev danpoletaev self-assigned this Feb 19, 2025
@github-actions github-actions bot added this to the 109th sprint - Platform team milestone Feb 19, 2025
@github-actions github-actions bot added t-core-services Issues with this label are in the ownership of the platform team. tested Temporary label used only programatically for some analytics. labels Feb 19, 2025
@@ -90,5 +92,10 @@ async def get_public_url(self, key: str) -> str:
key: The key for which the URL should be generated.
"""
public_api_url = self._api_public_base_url
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) # type: ignore[attr-defined]
Copy link
Contributor Author

@danpoletaev danpoletaev Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried that url_signing_secret_key isn't being properly transformed from camelCase? In the API response, it shows up as urlSigningSecretKey

@vdusek
Copy link
Contributor

vdusek commented Feb 24, 2025

@danpoletaev sorry, we've been asked to review, but the CI is broken, could you please fix it first?

@vdusek vdusek requested review from Pijukatel and removed request for vdusek February 25, 2025 15:20
@danpoletaev
Copy link
Contributor Author

danpoletaev commented Feb 25, 2025

@danpoletaev sorry, we've been asked to review, but the CI is broken, could you please fix it first?

I'm afraid that I won't be able to fix CI before the release of Crawlee 0.6 version and updating SDK to deal with breaking changes, but I'll not merge it before that

@@ -153,3 +155,37 @@ def decrypt_input_secrets(private_key: rsa.RSAPrivateKey, input_data: Any) -> An
)

return input_data


CHARSET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferably use predefined standard string constants: string.digits+string.ascii_letters
https://docs.python.org/3/library/string.html#string-constants

CHARSET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


def encode_base62(num: int) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method alone is perfect for writing unit test for. Could you please add one parametrized test with couple of input values.

https://docs.pytest.org/en/stable/how-to/parametrize.html#pytest-mark-parametrize

src/apify/_crypto.py Outdated Show resolved Hide resolved
src/apify/_crypto.py Show resolved Hide resolved
@@ -90,5 +92,10 @@ async def get_public_url(self, key: str) -> str:
key: The key for which the URL should be generated.
"""
public_api_url = self._api_public_base_url
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) # type: ignore[attr-defined]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about the ignore. I guess it is complaining about self.storage_object in such case we should not ignore it as it could raise an exception in cases where self does not have storage_object

I guess you can remove it and wait for your other change in crawlee to be merged and that should silence the warning.

@@ -105,3 +112,20 @@ def test_crypto_random_object_id_length_and_charset() -> None:
long_random_object_id = crypto_random_object_id(1000)
for char in long_random_object_id:
assert char in 'abcdefghijklmnopqrstuvwxyzABCEDFGHIJKLMNOPQRSTUVWXYZ0123456789'


# Check if the method is compatible with js version of the same method in:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what is the comment telling the reader. Could you rephrase it please. Currently it looks like a remainder to the author of the change to do something

# This test uses the same secret key and message as in JS tests.
secret_key = 'hmac-same-secret-key'
message = 'hmac-same-message-to-be-authenticated'
for _ in range(5):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why stress testing it? Is there some non-determinism involved?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no non-determinism involved, but I'd rather be sure, that for same input we get same output.
Refactored test to avoid "stress testing"

@B4nan
Copy link
Member

B4nan commented Mar 4, 2025

@vdusek please hold the SDK release till this one is merged, it was blocked by crawlee release (which we will first need to do once apify/crawlee-python#993 is merged)

Comment on lines 95 to 99
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None)
if url_signing_secret_key:
public_url += f'?signature={create_hmac_signature(url_signing_secret_key, key)}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use yarl.URL to build the URL? It is already a dependency of Crawlee.

@@ -90,5 +92,10 @@ async def get_public_url(self, key: str) -> str:
key: The key for which the URL should be generated.
"""
public_api_url = self._api_public_base_url
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}'

url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking, there is no guarantee if there is an attribute called url_signing_secret_key on the storage object that it will actually be what you expect. Could you use an isinstance check instead to make sure that the storage_object is of the correct type?


assert record_url == f'{public_api_url}/v2/key-value-stores/{default_store_id}/records/dummy'
record_key = 'dummy'
record_url = await cast(KeyValueStoreClient, store._resource_client).get_public_url(record_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I don't think we need to access KeyValueStoreClient._resource_client anymore - just KeyValueStore.get_public_url should be enough.

@danpoletaev danpoletaev requested a review from janbuchar March 7, 2025 14:37
Copy link
Contributor

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/apify/_crypto.py Outdated Show resolved Hide resolved
@vdusek vdusek changed the title feat: sign public url feat: add signing of public URL Mar 7, 2025
@vdusek vdusek merged commit a865461 into master Mar 7, 2025
28 checks passed
@vdusek vdusek deleted the feat/sign-public-url branch March 7, 2025 15:58
@fnesveda fnesveda added the validated Issues that are resolved and their solutions fulfill the acceptance criteria. label Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-core-services Issues with this label are in the ownership of the platform team. tested Temporary label used only programatically for some analytics. validated Issues that are resolved and their solutions fulfill the acceptance criteria.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.