@Pijukatel
Contributor

@Pijukatel Pijukatel commented Nov 20, 2025

Description

  • Add ResourceCollectionClient._getIterablePagination which extends the return value of ResourceCollectionClient._list with asyncIterator that can be used to iterate over individual items. (It is made in a generic way and can be applied to various endpoints if desired.)
  • Apply _getIterablePagination to StoreCollectionClient.list
  • Add unit tests of async iteration through StoreCollectionClient.list return value.

Example usage

It can still be used the same way as before:

actors = await apifyClient.store().list({ limit, offset });
// Paginated response with up to 1000 items with actor details
console.log(actors.items.length);

Or it can be used as an asyncIterator, which may fetch more than one chunk depending on the limit and offset options and on the number of items the API returns:

for await (const actor of apifyClient.store().list({ limit, offset })) {
    // Single actor details
    console.log(actor);
}

Issues

@github-actions github-actions bot added this to the 128th sprint - Tooling team milestone Nov 20, 2025
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Nov 20, 2025
);

- return this._list(options);
+ return this._getIterablePagination(options);
Contributor Author

It can be easily applied to other similar endpoints, but maybe it is better to do it gradually to limit the size of the change?

@Pijukatel Pijukatel force-pushed the paginated-list-iterator branch from 604c018 to de2c43f Compare November 20, 2025 15:36
@Pijukatel Pijukatel changed the title from "feat: Add AsyncIterator to the apifyClient.list" to "feat: Add AsyncIterator to the StoreCollectionClient.list return value" Nov 20, 2025
@Pijukatel Pijukatel requested a review from B4nan November 20, 2025 15:42
@Pijukatel Pijukatel changed the title from "feat: Add AsyncIterator to the StoreCollectionClient.list return value" to "feat: Add asyncIterator to the StoreCollectionClient.list return value" Nov 20, 2025
export interface PaginationOptions {
/** Position of the first returned entry. */
offset?: number;
/** Maximum number of entries requested. */
Member

Suggested change
- /** Maximum number of entries requested. */
+ /** Maximum number of entries requested for one chunk. */

not sure if page or chunk is better, but it should be clear this is a limit for the chunk and not a total limit for the async iterator

Contributor Author

This is actually the total limit for the whole iterator. Chunk size is limited by the length of the platform response; it is not limited by this code.

items: Data[];
}

export interface IterablePaginatedList<Data> extends PaginatedList<Data>, AsyncIterable<PaginatedList<Data>> {}
Member

i think we could just enhance the PaginatedList interface directly, allowing this for every place where we return it

Contributor Author

I think we need to keep both types, because internally we will keep using the old PaginatedList


/**
* Returns async iterator to paginate through all pages and first page of results is returned immediately as well.
* @private
Member

no need for the @private hint on a protected method. i dont think we use it anywhere in TS, sometimes we use @internal or @ignore for public methods we dont want to be part of the docs.

Contributor Author

Removed

return {
...currentPage,
async *[Symbol.asyncIterator]() {
yield currentPage;
Member

so we are yielding the whole pages? i thought we would yield just the items, one by one

Contributor Author

Well, my idea was that the iterator should yield exactly the same type as the original response, so that you can use the same code to process it. We also already have the whole chunk in memory. For example:

actors = await apifyClient.store().list({ limit, offset });
processActors(actors)

for await (const actorsChunk of actors) {
    processActors(actorsChunk)
}

But I guess you would prefer this?

actors = await apifyClient.store().list({ limit, offset });
processActors(actors)

for await (const singleActor of actors) {
    processSingleActor(singleActor)
}

Member

If we return the full pages, it will make the usage harder (forcing users to use nested loops). I would yield items one by one, not pages.

Contributor Author

@Pijukatel Pijukatel Nov 24, 2025

Updated to allow iteration over individual items while still making API requests in chunks that are as large as possible.
So, for example, with limit 3000 it will make 3 requests (3 × 1000), but the for await loop will run 3000 times.
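
The request pattern described above can be sketched as a small simulation. This is not the PR's implementation: `fakeFetchPage`, `iterateItems`, `countItems`, the 5000-item backend, and the 1000-item cap per response are all assumptions made for illustration.

```typescript
// Hypothetical page shape and fetcher; names are illustrative, not the client's API.
interface Page<T> {
    items: T[];
    total: number;
}
type FetchPage<T> = (opts: { offset: number; limit?: number }) => Promise<Page<T>>;

const MAX_PAGE_SIZE = 1000; // assumed server-side cap per response

// Fake backend holding 5000 numbered items; counts how many requests it served.
let requestCount = 0;
const fakeFetchPage: FetchPage<number> = async ({ offset, limit }) => {
    requestCount += 1;
    const total = 5000;
    const size = Math.max(0, Math.min(limit ?? MAX_PAGE_SIZE, MAX_PAGE_SIZE, total - offset));
    return { items: Array.from({ length: size }, (_, i) => offset + i), total };
};

// Yields items one by one while always asking for everything that is still needed;
// the backend decides how much of that it returns per request.
async function* iterateItems<T>(
    fetchPage: FetchPage<T>,
    options: { offset?: number; limit?: number } = {},
): AsyncGenerator<T> {
    let offset = options.offset ?? 0;
    let remaining = options.limit; // undefined means "no total limit"
    while (remaining === undefined || remaining > 0) {
        const page = await fetchPage({ offset, limit: remaining });
        if (page.items.length === 0) return; // nothing left, or the list shrank meanwhile
        yield* page.items;
        offset += page.items.length;
        if (remaining !== undefined) remaining -= page.items.length;
    }
}

// Drains the iterator and reports how many items and requests it took.
async function countItems(limit: number): Promise<{ items: number; requests: number }> {
    requestCount = 0;
    let items = 0;
    for await (const _item of iterateItems(fakeFetchPage, { limit })) items += 1;
    return { items, requests: requestCount };
}
```

In this simulation, `countItems(3000)` resolves to 3000 items fetched in 3 requests, which matches the behavior described in the comment.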

@B4nan B4nan requested review from barjin and janbuchar November 21, 2025 10:15
@barjin
Member

barjin commented Nov 21, 2025

I agree w/ both your remarks about the API - can we still brainstorm the expected interface?

I would love something like

for await (const actor of await client.store().list()) {
    log(actor.name); // logs all the Actors' names
}

and

for await (const actor of await client.store().list({ limit: 10 })) {
    log(actor.name); // logs only the first 10 Actors' names
}

Alternatively, we could even attempt something like

- for await (const actor of await client.store().list()) {
+ for await (const actor of client.store().list()) {

if we can attach the async iterator to the promise (I suppose we can, but I'm not sure how happy TypeScript would be about this).


(Internally, all those solutions would still lazy-load some optimal-sized pages with the correct offsets.)

@B4nan
Member

B4nan commented Nov 21, 2025

for await (const actor of client.store().list()) {

Yes, this is exactly what I would like to see, but I am not entirely sure if it's doable. The list method would have to be an async generator itself, not an async function returning one, which would likely break the current API. We'd probably need a different method, which is feasible, but it would probably mean quite a lot of added code.

@barjin
Member

barjin commented Nov 21, 2025

The list method would have to be an async generator itself, not an async function returning one

for await requires the iterable to be of type AsyncIterable, which you can return from a function. The function could actually return Promise & AsyncIterable like this (see list() implementation):

type IterablePromise<TItem> = Promise<Iterable<TItem>> & AsyncIterable<TItem>;

async function fetchData() {
    await new Promise(res => setTimeout(res, 1000));
    return ['data1', 'data2', 'data3'];
}

function list(): IterablePromise<string> {
    const itemsPromise = fetchData();

    async function* asyncGenerator() {
        const items = await itemsPromise;
        for (const item of items) {
            yield item;
        }
    }

    Object.defineProperty(itemsPromise, Symbol.asyncIterator, {
        value: asyncGenerator
    });

    return itemsPromise as any;
}

async function main() {
    // treat the return value as Promise<string[]>
    for (const item of await list()) {
        console.log(item);
    }

    // treat the return value as AsyncIterable<string>
    for await (const item of list()) {
        console.log(item);
    }
}

main();

The only issue is that TypeScript requires the return value of an async function to be a Promise<T> (ts(1064)), but you can make do with .then() callbacks etc just fine (see example above).


return Object.defineProperty(paginatedListPromise, Symbol.asyncIterator, {
value: asyncGenerator,
}) as unknown as AsyncIterable<Data> & Promise<R>;
Contributor Author

Not sure if there is a more idiomatic TypeScript way to do this without the cast: as unknown as AsyncIterable<Data> & Promise<R>;

Member

afaik you can make a better-typed defineProperty method like this:

function defineProperty<T, K extends PropertyKey>(
    obj: T,
    key: K,
    descriptor: PropertyDescriptor,
): T & { [P in K]: PropertyDescriptor['value'] } {
    Object.defineProperty(obj, key, descriptor);
    return obj as T & { [P in K]: PropertyDescriptor['value'] };
}

This implementation will cast the return type to the original type & { key: typeof value }.

I'm not entirely sure if it's worth the extra 8 lines. It's a rather hacky solution, so explicit cast is IMO okay here.

Contributor Author

@Pijukatel Pijukatel Nov 25, 2025

To completely avoid any casts, I could do something like this. Is it worth it? Or should I just leave it as it is with as unknown as AsyncIterable<Data> & Promise<R>;

class IterablePromise<Data, R extends PaginatedResponse<Data>> implements AsyncIterable<Data>, Promise<R> {
    private iteratorFactory: () => AsyncIterator<Data>;
    private promise: Promise<R>;

    constructor(promise: Promise<R>, iteratorFactory: () => AsyncIterator<Data>) {
        this.iteratorFactory = iteratorFactory;
        this.promise = promise;
    }

    async then<TResult1 = R, TResult2 = never>(
        onfulfilled?: ((value: R) => TResult1 | PromiseLike<TResult1>) | undefined | null,
        onrejected?: ((reason: any) => TResult2 | PromiseLike<TResult2>) | undefined | null,
    ): Promise<TResult1 | TResult2> {
        return this.promise.then(onfulfilled, onrejected);
    }

    async catch<TResult = never>(
        onrejected?: ((reason: any) => TResult | PromiseLike<TResult>) | undefined | null,
    ): Promise<R | TResult> {
        return this.promise.catch(onrejected);
    }

    async finally(onfinally?: (() => void) | undefined | null): Promise<R> {
        return this.promise.finally(onfinally);
    }

    [Symbol.asyncIterator](): AsyncIterator<Data> {
        return this.iteratorFactory();
    }

    get [Symbol.toStringTag]() {
        return 'Promise';
    }
}

and use it in a simple way: new IterablePromise<Data, R>(paginatedListPromise, asyncIterator);

Member

...And I was doubting whether adding 8 LOC would be too much 😅

IMO it's not necessary, especially if it can be solved by one cast (admittedly, not a good practice, but here it's afaiac fine).

@Pijukatel
Contributor Author

for await requires the iterable to be of type AsyncIterable, which you can return from a function. The function could actually return Promise & AsyncIterable like this (see list() implementation):

Updated accordingly.

@Pijukatel Pijukatel requested a review from B4nan November 24, 2025 15:00
Comment on lines 34 to 36
let itemsFetched = currentPage.items.length;
let currentLimit = options.limit !== undefined ? options.limit - itemsFetched : undefined;
let currentOffset = (options.offset ?? 0) + itemsFetched;
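
A note on the `currentOffset` line in the hunk above: in JavaScript and TypeScript, `+` binds more tightly than `??`, so how `options.offset ?? 0` and `+ itemsFetched` are grouped decides the result. The sketch below contrasts the two groupings; the helper names are hypothetical and exist only for this comparison.

```typescript
// Hypothetical helper names, used only to contrast the two groupings.
function offsetWithoutParens(offset: number | undefined, itemsFetched: number): number {
    // Parsed as: offset ?? (0 + itemsFetched)
    // so itemsFetched is ignored whenever offset is defined.
    return offset ?? 0 + itemsFetched;
}

function offsetWithParens(offset: number | undefined, itemsFetched: number): number {
    // Defaults offset to 0 first, then always adds itemsFetched.
    return (offset ?? 0) + itemsFetched;
}
```

With an offset of 100 and 1000 fetched items, the first form returns 100 and the second returns 1100. With an undefined offset, both return 1000, which is why the difference is easy to miss in tests that never pass an offset.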
Member

@barjin barjin Nov 24, 2025

I have a feeling this could be simplified using the pagination fields from the response.

How about e.g.

const isLastPage = Math.min(page.total, options.limit + options.offset) <= page.count + page.offset

This works because:

$n = \text{page.count} + \text{page.offset}$ tells us that the current page includes the $n$-th list item. If $n = \text{page.total}$, we've seen the entire list.

The options.limit and options.offset options select an interval of the list:

$$(\text{options.offset},\ \text{options.offset} + \text{options.limit}]$$

If the current page includes the $n$-th item for $n = \text{options.offset} + \text{options.limit}$, it's the last page containing any items from the selected interval.

Then we could IMO do

let page = await fetchPage({ limit: options.limit, offset: options.offset });
yield* page.items;

/// The variable names are rather for explanation, not production ready
const lastItemIndex = Math.min(page.total, options.limit + options.offset);
let lastItemIndexFromThePreviousPage = page.count + page.offset;

while (lastItemIndexFromThePreviousPage < lastItemIndex) {
    const remainingItemCount = lastItemIndex - lastItemIndexFromThePreviousPage;
    page = await fetchPage({ limit: remainingItemCount, offset: lastItemIndexFromThePreviousPage });
    lastItemIndexFromThePreviousPage = page.count + page.offset;

    yield* page.items;
}

Having typed this out, I'm no longer convinced this is a better way... but feel free to get inspiration 😅
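
To make the last-page check above concrete, here is a small worked sketch. The `PageInfo` interface and the defaults for a missing `offset` or `limit` are assumptions added for illustration; the comment above only gives the comparison itself.

```typescript
// Field names follow the comment above (total, count, offset); the interface is an assumption.
interface PageInfo {
    total: number; // number of items in the whole list
    count: number; // number of items in this page
    offset: number; // position of this page's first item
}

function isLastPage(page: PageInfo, options: { offset?: number; limit?: number }): boolean {
    const offset = options.offset ?? 0;
    // Assumption: without a limit, the requested interval runs to the end of the list.
    const requestedEnd = options.limit === undefined ? page.total : offset + options.limit;
    const lastItemIndex = Math.min(page.total, requestedEnd);
    return lastItemIndex <= page.count + page.offset;
}

// A list of 2500 items, requested with offset 500 and limit 3000:
// the interval is capped at 2500 by the list's total.
const options = { offset: 500, limit: 3000 };
const firstPage: PageInfo = { total: 2500, count: 1000, offset: 500 };
const secondPage: PageInfo = { total: 2500, count: 1000, offset: 1500 };

const firstIsLast = isLastPage(firstPage, options); // 1500 < 2500
const secondIsLast = isLastPage(secondPage, options); // 2500 <= 2500
```

Here `firstIsLast` is `false` and `secondIsLast` is `true`: the first page ends at item 1500, short of the capped end at 2500, while the second page reaches it.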

Contributor Author

Regarding the pagination fields from the response: according to the documentation, they are not defined on all list endpoints. I think we can rely only on total and items. (Previously, I was under the impression that even total was optional, but re-checking the documentation, it seems it is always there.)

Here is an example of the minimal endpoint: https://docs.apify.com/api/v2/act-versions-get
(Tested and it does really return just items and total)

Therefore, I would define the algorithm in a way that does not require any optional fields from the response. But knowing I can get total allows at least some of the improvements you suggested.

I would also keep the currentPage.items.length > 0 check in the while condition, to handle any API problems or situations where the requested resources change during the iteration due to some external action. (Someone removing an Actor could otherwise lead to an infinite loop.)

const lastItemIndex = Math.min(page.total, options.limit + options.offset);
Since those fields are optional, this would not work for a user who defines just an offset.

Member

@barjin barjin Nov 25, 2025

https://api.apify.com/v2/acts/.../versions is not a paginated endpoint though; it just lists all the items under items and the length of this array under total (see implementation here). Note that it does not react to the offset or limit query params.

All paginated endpoints will use the Pagination class, which means those will always accept all the parameters (and return all the fields).

@Pijukatel Pijukatel requested a review from barjin November 25, 2025 11:48
Labels

t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.

Development

Successfully merging this pull request may close these issues.

Add better support for pagination

4 participants