diff --git a/docs/03-concepts/03-storages.mdx b/docs/03-concepts/03-storages.mdx
index e229e14b..1b68f072 100644
--- a/docs/03-concepts/03-storages.mdx
+++ b/docs/03-concepts/03-storages.mdx
@@ -73,7 +73,7 @@ You can either use them without arguments to open the default storages,
 or you can pass a storage ID or name to open another storage.
 
 ```python title="src/main.py"
-from apify import Actor
+from apify import Actor, Request
 
 async def main():
     async with Actor:
@@ -87,7 +87,7 @@ async def main():
 
         # Work with the request queue with the name 'my-queue'
         request_queue = await Actor.open_request_queue(name='my-queue')
-        await request_queue.add_request({ 'uniqueKey': 'v0Nr', 'url': 'https://example.com' })
+        await request_queue.add_request(Request.from_url('https://example.com', unique_key='v0Nr'))
 ```
 
 ## Deleting storages
@@ -239,7 +239,7 @@ you can use the [`RequestQueue.is_finished()`](../../reference/class/RequestQueu
 ```python title="src/main.py"
 import asyncio
 import random
-from apify import Actor
+from apify import Actor, Request
 
 
 async def main():
@@ -249,13 +249,13 @@ async def main():
 
         # Add some requests to the queue
         for i in range(1, 10):
-            await queue.add_request({ 'uniqueKey': f'{i}', 'url': f'http://example.com/{i}' })
+            await queue.add_request(Request.from_url(f'http://example.com/{i}', unique_key=f'{i}'))
 
         # Add a request to the start of the queue, for priority processing
-        await queue.add_request({ 'uniqueKey': '0', 'url': 'http://example.com/0' }, forefront=True)
+        await queue.add_request(Request.from_url('http://example.com/0', unique_key='0'), forefront=True)
 
         # If you try to add an existing request again, it will not do anything
-        operation_info = await queue.add_request({ 'uniqueKey': '5', 'url': 'http://different-example.com/5' })
+        operation_info = await queue.add_request(Request.from_url('http://different-example.com/5', unique_key='5'))
         print(operation_info)
 
         print(await queue.get_request(operation_info['requestId']))
diff --git a/docs/04-upgrading/upgrading_to_v2.md b/docs/04-upgrading/upgrading_to_v2.md
index 99926fe4..90062305 100644
--- a/docs/04-upgrading/upgrading_to_v2.md
+++ b/docs/04-upgrading/upgrading_to_v2.md
@@ -11,9 +11,12 @@ Support for Python 3.8 has been dropped. The Apify Python SDK v2.x now requires
 
 ## Storages
 
-The SDK now uses [crawlee](https://github.com/apify/crawlee-python) for local storage emulation. This change should not affect intended usage (working with `Dataset`, `KeyValueStore` and `RequestQueue` classes from the `apify.storages` module or using the shortcuts exposed by the `Actor` class) in any way.
-
-Removing the `StorageClientManager` class is a significant change. If you need to change the storage client, use `crawlee.service_container` instead.
+- The SDK now uses [crawlee](https://github.com/apify/crawlee-python) for local storage emulation. This change should not affect intended usage (working with `Dataset`, `KeyValueStore` and `RequestQueue` classes from the `apify.storages` module or using the shortcuts exposed by the `Actor` class) in any way.
+- The `RequestQueue.add_request` method now accepts an `apify.Request` object instead of a free-form dictionary.
+  - A quick way to migrate from dict-based arguments is to wrap them in a `Request.model_validate()` call.
+  - The preferred way is to use the `Request.from_url` helper, which prefills the `unique_key` and `id` attributes, or to instantiate the class directly, e.g., `Request(url='https://example.tld', ...)`.
+  - For simple use cases, `add_request` also accepts plain strings that contain a URL, e.g., `queue.add_request('https://example.tld')`.
+- Removing the `StorageClientManager` class is a significant change. If you need to change the storage client, use `crawlee.service_container` instead.
 
 ## Configuration
 
@@ -28,6 +31,7 @@ Attributes suffixed with `_millis` were renamed to remove said suffix and have t
 - `Actor.start`, `Actor.call`, `Actor.start_task`, `Actor.set_status_message` and `Actor.abort` return instances of the `ActorRun` model instead of an untyped `dict`.
 - Upon entering the context manager (`async with Actor`), the `Actor` puts the default logging configuration in place. This can be disabled using the `configure_logging` parameter.
 - The `config` parameter of `Actor` has been renamed to `configuration`.
+- Event handlers registered via `Actor.on` will now receive Pydantic objects instead of untyped dicts. For example, where you would do `event['isMigrating']`, you should now use `event.is_migrating`.
 
 ## Scrapy integration
 
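
Note on the `add_request` migration described in the upgrading notes above: code that still builds v1-style dicts with camelCase keys may need them translated into snake_case keyword arguments before it can call `Request.from_url`. The following is a minimal sketch of that translation using only the standard library. The helper names `camel_to_snake` and `legacy_dict_to_request_kwargs` are hypothetical and not part of the SDK, and only the `url` and `unique_key` arguments are confirmed by this diff.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key such as 'uniqueKey' to 'unique_key'."""
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


def legacy_dict_to_request_kwargs(legacy: dict) -> dict:
    """Translate a v1-style request dict into snake_case keyword arguments.

    Hypothetical helper for illustration; it renames keys only and does not
    validate that the SDK accepts every resulting argument.
    """
    if 'url' not in legacy:
        raise ValueError("legacy request dict must contain a 'url' key")
    return {camel_to_snake(key): value for key, value in legacy.items()}


# The same dict that the old documentation example passed to add_request
kwargs = legacy_dict_to_request_kwargs({'uniqueKey': 'v0Nr', 'url': 'https://example.com'})
print(kwargs)
```

With the SDK installed, the result could then be passed along as `Request.from_url(kwargs.pop('url'), **kwargs)`. Whether keys other than `unique_key` are accepted this way is an assumption to verify against the `Request` reference.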