-
Notifications
You must be signed in to change notification settings - Fork 39
feat: Add asyncIterator to the StoreCollectionClient.list return value
#790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| ); | ||
|
|
||
| return this._list(options); | ||
| return this._getIterablePagination(options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be easily applied to other similar endpoints, but maybe it is better to do it gradually to limit the size of the change?
604c018 to
de2c43f
Compare
AsyncIterator to the apifyClient.listAsyncIterator to the StoreCollectionClient.list return value
AsyncIterator to the StoreCollectionClient.list return valueasyncIterator to the StoreCollectionClient.list return value
| export interface PaginationOptions { | ||
| /** Position of the first returned entry. */ | ||
| offset?: number; | ||
| /** Maximum number of entries requested. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /** Maximum number of entries requested. */ | |
| /** Maximum number of entries requested for one chunk. */ |
not sure if page or chunk is better, but it should be clear this is a limit for the chunk and not a total limit for the async iterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually the total limit for the whole iterator. Chunk size is limited by the length of the platform response; it is not limited by this code.
| items: Data[]; | ||
| } | ||
|
|
||
| export interface IterablePaginatedList<Data> extends PaginatedList<Data>, AsyncIterable<PaginatedList<Data>> {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we could just enhance the PaginatedList interface directly, allowing this for every place where we return it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to keep both types, because internally we will keep using the old PaginatedList
|
|
||
| /** | ||
| * Returns async iterator to paginate through all pages and first page of results is returned immediately as well. | ||
| * @private |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the @private hint on a protected method. i dont think we use it anywhere in TS, sometimes we use @internal or @ignore for public methods we dont want to be part of the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
| return { | ||
| ...currentPage, | ||
| async *[Symbol.asyncIterator]() { | ||
| yield currentPage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we are yielding the whole pages? i thought we would yield just the items, one by one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, my idea was that the iterator should yield exactly the same type as the original response, so that you can use the same code to process it. We also already have the whole chunk in memory. For example:
actors = await apifyClient.store().list({ limit, offset });
processActors(actors)
for await (const actorsChunk of actors) {
processActors(actorsChunk)
}
But I guess you would prefer?
actors = await apifyClient.store().list({ limit, offset });
processActors(actors)
for await (const singleActor of actors) {
processSingleActor(singleActor)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we return the full pages, it will make the usage harder (forcing users to use nested loops). I would yield items one by one, not pages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to allow iteration over individual items, while making API requests only for as big chunks as possible.
So for example when using limit 3000, it will make 3 requests (3x1000), but for await loop will run 3000x times.
|
I agree w/ both your remarks about the API - can we still brainstorm the expected interface? I would love something like for await (const actor of await client.store().list()) {
log(actor.name); // logs all the Actor's names
}and for await (const actor of await client.store().list({ limit: 10 })) {
log(actor.name); // logs only first 10 Actor's names
}Alternatively, we could even attempt something like - for await (const actor of await client.store().list()) {
+ for await (const actor of client.store().list()) {if we can attach the async iterator to the promise (I suppose we can, but I'm not sure how happy TypeScript would be about this). (Internally, all those solutions would still lazy-load some optimal-sized pages with the correct offsets.) |
Yes, this is exactly what I would like to see, but I am not entirely sure if it's doable. The |
type IterablePromise<TItem> = Promise<Iterable<TItem>> & AsyncIterable<TItem>;
async function fetchData() {
await new Promise(res => setTimeout(res, 1000));
return ['data1', 'data2', 'data3'];
}
function list(): IterablePromise<string> {
const itemsPromise = fetchData();
async function* asyncGenerator() {
const items = await itemsPromise;
for (const item of items) {
yield item;
}
}
Object.defineProperty(itemsPromise, Symbol.asyncIterator, {
value: asyncGenerator
});
return itemsPromise as any;
}
async function main() {
// treat the return value as Promise<string[]>
for (const item of await list()) {
console.log(item);
}
// treat the return value as AsyncIterator<string>
for await (const item of list()) {
console.log(item);
}
}
main();The only issue is that TypeScript requires the return value of an async function to be a |
|
|
||
| return Object.defineProperty(paginatedListPromise, Symbol.asyncIterator, { | ||
| value: asyncGenerator, | ||
| }) as unknown as AsyncIterable<Data> & Promise<R>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there is some type-scriptish way to do this without the as keyword as unknown as AsyncIterable<Data> & Promise<R>;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaik you can make a better-typed defineProperty method like this:
function defineProperty<T, K extends PropertyKey>(
obj: T,
key: K,
descriptor: PropertyDescriptor,
): T & { [P in K]: PropertyDescriptor['value'] } {
Object.defineProperty(obj, key, descriptor);
return obj as T & { [P in K]: PropertyDescriptor['value'] };
}This implementation will cast the return type to the original type & { key: typeof value }.
I'm not entirely sure if it's worth the extra 8 lines. It's a rather hacky solution, so explicit cast is IMO okay here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To completely avoid any casts, I could do something like this. Is it worth it? Or should I just leave it as it is with as unknown as AsyncIterable<Data> & Promise<R>;
class IterablePromise<Data, R extends PaginatedResponse<Data>> implements AsyncIterable<Data>, Promise<R> {
private iteratorFactory: () => AsyncIterator<Data>;
private promise: Promise<R>;
constructor(promise: Promise<R>, iteratorFactory: () => AsyncIterator<Data>) {
this.iteratorFactory = iteratorFactory;
this.promise = promise;
}
async then<TResult1 = R, TResult2 = never>(
onfulfilled?: ((value: R) => TResult1 | PromiseLike<TResult1>) | undefined | null,
onrejected?: ((reason: any) => TResult2 | PromiseLike<TResult2>) | undefined | null,
): Promise<TResult1 | TResult2> {
return this.promise.then(onfulfilled, onrejected);
}
async catch<TResult = never>(
onrejected?: ((reason: any) => TResult | PromiseLike<TResult>) | undefined | null,
): Promise<R | TResult> {
return this.promise.catch(onrejected);
}
async finally(onfinally?: (() => void) | undefined | null): Promise<R> {
return this.promise.finally(onfinally);
}
[Symbol.asyncIterator](): AsyncIterator<Data> {
return this.iteratorFactory();
}
get [Symbol.toStringTag]() {
return 'Promise';
}
}
and use it in a simple way: new IterablePromise<Data, R>(paginatedListPromise, asyncIterator);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...And I was doubting whether adding 8 LOC would be too much 😅
IMO it's not necessary, especially if it can be solved by one cast (admittedly, not a good practice, but here it's afaiac fine).
Updated accordingly. |
| let itemsFetched = currentPage.items.length; | ||
| let currentLimit = options.limit !== undefined ? options.limit - itemsFetched : undefined; | ||
| let currentOffset = options.offset ?? 0 + itemsFetched; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling this could be simplified using the pagination fields from the response.
How about e.g.
const isLastPage = Math.min(page.total, options.limit + options.offset) <= page.count + page.offsetThis works because:
The options.limit and options.offset options select an interval of the list between
If the current page includes the
Then we could IMO do
let page = await fetchPage({ limit: options.limit, offset: options.offset });
yield* page.items;
/// The variable names are rather for explanation, not production ready
const lastItemIndex = Math.min(page.total, options.limit + options.offset);
let lastItemIndexFromThePreviousPage = page.count + page.offset;
while (lastItemIndexFromThePreviousPage < lastItemIndex) {
const remainingItemCount = lastItemIndex - lastPageItemIndex;
page = await fetchPage({ limit: remainingItemCount, offset: lastItemIndexFromThePreviousPage });
lastItemIndexFromThePreviousPage = page.count + page.offset;
yield* page.items;
}Having typed this out, I'm no longer convinced this is a better way... but feel free to get inspiration 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding pagination fields from the response. According to the documentation, they are not defined on all list endpoints. I think we can rely only on total and items. (Previously, I was under the impression that even total was optional, but re-checking the documentation, it seems it is always there.)
Here is an example of the minimal endpoint: https://docs.apify.com/api/v2/act-versions-get
(Tested and it does really return just items and total)
Therefore, I would define the algorithm in a way that does not require any optional fields from the response. But knowing I can get total allows at least some of the improvements you suggested.
I would also keep this in while condition currentPage.items.length > 0 condition, to handle any API problems or situations when the requested resources change during the iteration due to some external action. (Someone removing an actor could otherwise lead to an infinite loop)
const lastItemIndex = Math.min(page.total, options.limit + options.offset);
Since those fields are optional, this would not work for a user who defines just an offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://api.apify.com/v2/acts/.../versions is not a paginated endpoint though; it just lists all the items under items and the length of this array under total (see implementation here). See that it won't react to offset nor limit query params.
All paginated endpoints will use the Pagination class, which means those will always accept all the parameters (and return all the fields).
Description
ResourceCollectionClient._getIterablePaginationwhich extends the return value ofResourceCollectionClient._listwithasyncIteratorthat can be used to iterate over individual items. (It is made in a generic way and can be applied to various endpoints if desired.)_getIterablePaginationtoStoreCollectionClient.listStoreCollectionClient.listreturn value.Example usage
It can still be used the same way as before:
Or it can be used as
asyncIteratorthat can return more than one chunk based on thelimit, andoffsetoptions and also based on the number of items returned from the API:Issues