Named or Alias'ed Request_queue doesn't get used when using storage client #1608

@jaemingo-hhh

If I have the following code:

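from crawlee import service_locator
from crawlee.crawlers import PlaywrightCrawler
from crawlee.storage_clients import FileSystemStorageClient

# Note: the awaits below must run inside an async function.
configuration = service_locator.get_configuration()
configuration.purge_on_start = True
configuration.storage_dir = "/crawlee/storage"

# Create aliased dataset and request queue clients up front.
storage_client = FileSystemStorageClient()
dataset_client = await storage_client.create_dataset_client(alias="test-ds")
await storage_client.create_rq_client(alias="test-rq")
service_locator.set_storage_client(storage_client)

PlaywrightCrawler(
    storage_client=storage_client,
)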

This creates both the aliased dataset and the aliased request queue under /crawlee/storage, and the test-ds subfolder is populated. However, requests are written to the default request queue subfolder rather than the test-rq alias subfolder.

On closer inspection, get_request_manager seems to open the request queue without a name or alias, which makes me think this is intentional? I would expect that if I call create_rq_client with an alias or a name, the crawler would use that queue instead of always defaulting to the unnamed default storage.
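A possible workaround, sketched under assumptions rather than confirmed as the intended usage: open the aliased queue explicitly and hand it to the crawler. This assumes a Crawlee for Python version in which RequestQueue.open accepts alias and storage_client, and in which the crawler constructor accepts a request_manager argument. The helper name build_crawler is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from crawlee.crawlers import PlaywrightCrawler


async def build_crawler() -> PlaywrightCrawler:
    """Build a crawler that is explicitly bound to the "test-rq" alias queue."""
    # Imported lazily so this module can be loaded without Crawlee installed.
    from crawlee.crawlers import PlaywrightCrawler
    from crawlee.storage_clients import FileSystemStorageClient
    from crawlee.storages import RequestQueue

    storage_client = FileSystemStorageClient()

    # Assumption: RequestQueue.open() takes `alias` and `storage_client`.
    request_queue = await RequestQueue.open(
        alias="test-rq",
        storage_client=storage_client,
    )

    # Assumption: passing `request_manager` makes the crawler use this queue
    # instead of opening the unnamed default one.
    return PlaywrightCrawler(
        storage_client=storage_client,
        request_manager=request_queue,
    )
```

The helper would be called from async code, for example with asyncio.run(build_crawler()). If the assumptions hold, new requests should land in the test-rq subfolder rather than the default one.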

Metadata

    Labels

    t-tooling: Issues with this label are in the ownership of the tooling team.
