Description
[REQUIRED] Step 2: Describe your environment
- Android Studio version: Android Studio Narwhal Feature Drop | 2025.1.2
- Firebase Component: Firestore
- Component version: BOM 34.1.0
[REQUIRED] Step 3: Describe the problem
Steps to reproduce:
There seems to be a performance issue when reading from the cache for a collection that currently contains few documents but historically had many documents created and deleted. For example, suppose a user has a "notes" collection that currently holds only 5 notes, but thousands of notes were created and deleted in the past. Cache-backed queries against that "notes" collection are then much slower. I believe this is because the deleted documents still exist in the cache long after they have been deleted, and the garbage collector will not remove them until the cache size threshold is reached.
Although we could set a stricter cache size to trigger the garbage collector more often, I believe this is an underlying issue with persistence in the Firestore SDK. For example, reducing the cache size from the default 100MB to 1MB would reduce the chance of deleted documents building up in the cache, but it also restricts how large each document can be: if a user wants some really long notes where each document is big, say ~200KB, then reducing the cache to 1MB means they can only have about 5 notes available offline.
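For reference, this is roughly how the cache-size workaround would look with the current settings API; this is a minimal sketch of the mitigation (not a fix for the underlying issue), assuming a standard Android/Kotlin setup:

```kotlin
import com.google.firebase.firestore.FirebaseFirestore
import com.google.firebase.firestore.FirebaseFirestoreSettings
import com.google.firebase.firestore.PersistentCacheSettings

// Sketch: shrink the persistent cache so LRU garbage collection runs sooner.
// 1 MB is the minimum allowed cache size; this only makes the build-up of
// deleted documents less likely, it does not remove the per-query overhead.
fun configureSmallCache(db: FirebaseFirestore) {
    val settings = FirebaseFirestoreSettings.Builder()
        .setLocalCacheSettings(
            PersistentCacheSettings.newBuilder()
                .setSizeBytes(1L * 1024 * 1024) // 1 MB instead of the 100 MB default
                .build()
        )
        .build()
    // Must be applied before any other Firestore call on this instance.
    db.firestoreSettings = settings
}
```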
I believe the garbage collector should be stricter about cleaning up old documents that are no longer relevant to the user's query, or those documents should be excluded by the SQLite query so they do not need to be processed at all. They do end up being filtered out before results are returned, but they still consume a lot of resources while the query is executing. For example, if there are 10,000 deleted documents and the query will return no results, each document still gets processed, completely saturating the background queue for a few hundred milliseconds and decoding documents that will never be returned (which also slows down other concurrent queries).
I've attached a sample project to demo the issue. It queries two collections and records the average time each query takes. For the active collection it creates 10,000 documents and then deletes them, prefilling the cache with documents that are never returned but are still processed (you must set RUN_SETUP=true to create these documents). The query against the active collection reliably takes > 10x longer, despite both queries returning no documents.
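The repro amounts to the following sketch (the attached sample project is the authoritative version; the collection names, the `RUN_SETUP` flag usage, and the timing helper here are my approximations, and `await()` comes from the kotlinx-coroutines Play Services integration):

```kotlin
import com.google.firebase.firestore.FirebaseFirestore
import com.google.firebase.firestore.Source
import kotlinx.coroutines.tasks.await

// Set to true once to prefill the cache with deleted-document tombstones.
const val RUN_SETUP = false

// Create and then delete 10,000 documents so that deleted documents
// accumulate in the local cache without counting toward live results.
suspend fun setup(db: FirebaseFirestore) {
    repeat(10_000) { i ->
        val doc = db.collection("active").document("doc-$i")
        doc.set(mapOf("index" to i)).await()
        doc.delete().await()
    }
}

// Time a cache-only query against both collections. Both return 0 documents,
// but "active" (with the deleted-document history) is reliably >10x slower.
suspend fun measure(db: FirebaseFirestore) {
    for (name in listOf("active", "control")) {
        val start = System.nanoTime()
        val snapshot = db.collection(name).get(Source.CACHE).await()
        val elapsedMs = (System.nanoTime() - start) / 1_000_000
        println("$name: ${snapshot.size()} docs in ${elapsedMs}ms")
    }
}
```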