-
Notifications
You must be signed in to change notification settings - Fork 12
feat: add signing of public URL #407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -90,5 +92,10 @@ async def get_public_url(self, key: str) -> str: | ||
key: The key for which the URL should be generated. | ||
""" | ||
public_api_url = self._api_public_base_url | ||
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}' | ||
|
||
url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit worried that url_signing_secret_key
isn't being properly transformed from camelCase? In the API response, it shows up as urlSigningSecretKey
@danpoletaev sorry, we've been asked to review, but the CI is broken, could you please fix it first? |
I'm afraid that I won't be able to fix CI before the release of Crawlee 0.6 version and updating SDK to deal with breaking changes, but I'll not merge it before that |
src/apify/_crypto.py
Outdated
@@ -153,3 +155,37 @@ def decrypt_input_secrets(private_key: rsa.RSAPrivateKey, input_data: Any) -> An | ||
) | ||
|
||
return input_data | ||
|
||
|
||
CHARSET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferably use predefined standard string constants: string.digits+string.ascii_letters
https://docs.python.org/3/library/string.html#string-constants
CHARSET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' | ||
|
||
|
||
def encode_base62(num: int) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method alone is perfect for writing unit test for. Could you please add one parametrized test with couple of input values.
https://docs.pytest.org/en/stable/how-to/parametrize.html#pytest-mark-parametrize
@@ -90,5 +92,10 @@ async def get_public_url(self, key: str) -> str: | ||
key: The key for which the URL should be generated. | ||
""" | ||
public_api_url = self._api_public_base_url | ||
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}' | ||
|
||
url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) # type: ignore[attr-defined] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused about the ignore. I guess it is complaining about self.storage_object
in such case we should not ignore it as it could raise an exception in cases where self does not have storage_object
I guess you can remove it and wait for your other change in crawlee
to be merged and that should silence the warning.
tests/unit/test_crypto.py
Outdated
@@ -105,3 +112,20 @@ def test_crypto_random_object_id_length_and_charset() -> None: | ||
long_random_object_id = crypto_random_object_id(1000) | ||
for char in long_random_object_id: | ||
assert char in 'abcdefghijklmnopqrstuvwxyzABCEDFGHIJKLMNOPQRSTUVWXYZ0123456789' | ||
|
||
|
||
# Check if the method is compatible with js version of the same method in: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what is the comment telling the reader. Could you rephrase it please. Currently it looks like a remainder to the author of the change to do something
tests/unit/test_crypto.py
Outdated
# This test uses the same secret key and message as in JS tests. | ||
secret_key = 'hmac-same-secret-key' | ||
message = 'hmac-same-message-to-be-authenticated' | ||
for _ in range(5): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why stress testing it? Is there some non-determinism involved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no non-determinism involved, but I'd rather be sure, that for same input we get same output.
Refactored test to avoid "stress testing"
@vdusek please hold the SDK release till this one is merged, it was blocked by crawlee release (which we will first need to do once apify/crawlee-python#993 is merged) |
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}' | ||
|
||
url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) | ||
if url_signing_secret_key: | ||
public_url += f'?signature={create_hmac_signature(url_signing_secret_key, key)}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please use yarl.URL
to build the URL? It is already a dependency of Crawlee.
@@ -90,5 +92,10 @@ async def get_public_url(self, key: str) -> str: | ||
key: The key for which the URL should be generated. | ||
""" | ||
public_api_url = self._api_public_base_url | ||
public_url = f'{public_api_url}/v2/key-value-stores/{self._client.resource_id}/records/{key}' | ||
|
||
url_signing_secret_key = getattr(self.storage_object, 'url_signing_secret_key', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, there is no guarantee if there is an attribute called url_signing_secret_key
on the storage object that it will actually be what you expect. Could you use an isinstance
check instead to make sure that the storage_object
is of the correct type?
|
||
assert record_url == f'{public_api_url}/v2/key-value-stores/{default_store_id}/records/dummy' | ||
record_key = 'dummy' | ||
record_url = await cast(KeyValueStoreClient, store._resource_client).get_public_url(record_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I don't think we need to access KeyValueStoreClient._resource_client
anymore - just KeyValueStore.get_public_url
should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This PR is part of Issue #19363, which updates
KeyValueStore.getPublicUrl(recordKey)
to generate signed links using HMAC.This PR:
P.S. Before merging, we need to wait for the release of Crawlee, so we can update version of Crawlee.
P.P.S It's hard to test the changes, since master branch of SDK and master branch of Crawlee are out of sync. Crawlee has some breaking changes in version 6.0, which are not yet addressed in master branch of SDK.
Previous PR that adds storageObject to Crawlee Python is here
Same PR in SDK JS is here