Skip to content

Conversation

@deividasstr
Copy link
Contributor

Hey, fixing a bug when disk persistence is not being used if requested max cache size (in DiskBufferingConfiguration.builder().setMaxCacheSize) is less than 7mb. Default 60mb sounds alot.

This log gave me a confusing hint: Insufficient folder cache size: -715243, it must be at least: 1048576.

The issue lies within fixed method getAvailableSpace, which returns available space only if max needed space is higher than available space.

It seems the fixed behavior is intended, as this method is used in DiskManager, names the returned value as availableCacheSize.

Screenshot of my debug breakpoint (requesting max 1_000_000 cache storage):
Screenshot 2025-01-28 at 15 09 04

From the numbers here, in current solution, given unlimited available disk space, more than 6_000_000 bytes have to be requested for max cache size for the if (calculatedSize < maxCacheFileSize) { to be false (and have persistence enabled).

Now writing this I got confused why would there be a min limit for cache file? Why do each signals folder have to be bigger than maxCacheFileSize with another maxCacheFileSize subtracted from the size? When CacheStorage already allocates the requested max bytes?

@deividasstr deividasstr requested a review from a team as a code owner January 28, 2025 15:32
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 28, 2025

CLA Signed


The committers listed above are authorized under a signed CLA.

@breedx-splk
Copy link
Contributor

@deividasstr Thanks for the contribution! That error message sure is confusing and maybe even a little embarassing. 😁 We will take a look at the PR soon, but in the meantime can you please sign the contributor license agreement (CLA)? Thanks again!

@deividasstr
Copy link
Contributor Author

Hey @breedx-splk , I have signed the CLA.

So to sum up, this PR turned into a question, if there a minimum cache size required or not.

If yes, I can update the methods and docs in a separate PR.
If no, please let me know how to proceed. Maybe my proposed changes aren't required and something around if (calculatedSize < maxCacheFileSize) needs to be adjusted.

@breedx-splk
Copy link
Contributor

Hey @deividasstr . I thought @LikeTheSalad was going to comment on this, but I think the overall reason we tried to limit the amount of disk used was to try and be a good citizen on the device. If the instrumentation were to get too aggressive with disk usage (due to being offline, or for whatever reason, like excessive telemetry), this could limit the app's own ability to function. We tried to find a reasonable compromise that would allow the app to continue using the cache while the instrumentation also used part of the cache.

The idea was to eventually drop telemetry or stop capturing data if the device is too limited...that way the app is unharmed.

there a minimum cache size required or not.

I think there is. Something too small would prevent our ability to correctly buffer even a few spans or events to disk. Curious on how @LikeTheSalad thinks we should proceed.

@LikeTheSalad
Copy link
Contributor

LikeTheSalad commented Jan 31, 2025

Hi @deividasstr, thank you for your PR!

Regarding a minimum cache size requirement, at first, I thought the same as @breedx-splk, that it would be useful to ensure we have some space for the disk buffering functionality to work properly. The way the cache works here is that each signal has a folder in which we store multiple files of 1MB size max each iirc, though they can be smaller than that, so it made sense to try and ensure that there's at least 1MB available per signal, plus 1MB to account for the data that's read-only to not get cleared before getting exported.

However, after taking another look at it and also after having some extra validations for when a file is corrupted, it seems like probably having a minimum cache size might not be needed after all, given that if something goes wrong, the data will be sent to the exporter right away, which is what would happen anyway in case the disk buffering can't get properly configured due to setting a max folder size of zero, which can happen because of this validation.

So to summarise, unless I missed something important, it seems like we can remove this validation altogether.

We still need to try and use as few resources as possible from the host app though, so I don't like the idea of setting the whole available disk space size as a max cache size for the APM use case. I'm not sure what could be the best approach here, though one option could be to avoid executing this method and instead always use the getUsableSpace() approach and then use a fraction of what it returns.

@deividasstr
Copy link
Contributor Author

deividasstr commented Feb 3, 2025

Thanks for being so involving in the discussion!

My suggestion would be:

  • Remove min size requirement (this validation)
  • Reduce default cache size 60 -> 10mb? Datadog default is ~12mb, Firebase - 10mb.
  • As a developer, I would like to see how often events are lost due to insufficient storage (and if it is due to general storage shortage, or because max cache size was insufficient). Are there some logs of persistence failure? Maybe even a callback of such case could be useful.

@deividasstr
Copy link
Contributor Author

What do you think @LikeTheSalad @breedx-splk?

@LikeTheSalad
Copy link
Contributor

What do you think @LikeTheSalad @breedx-splk?

I agree with the points you mentioned earlier, I just wanted to give some time for people to provide some other ideas in case we missed something, though that doesn't seem to be the case. I've added this topic for tomorrow's Android SIG meeting to discuss the default cache size value and, if there are no other concerns raised on the call then I think we can move forward with them. For your third item, I like the idea of adding a callback for those cases, that's something I can add to the disk buffering artifact.

@LikeTheSalad
Copy link
Contributor

Hi @deividasstr

We talked about this in yesterday's SIG meeting and decided to:

  • Remove the min size requirement, as well as removing any check for available disk space.
  • Set hardcoded 10MB max size as the default.
  • I'll take a look at adding a callback to the disk buffering lib for when there are issues during disk operations.

Please let me know if you'd like to update your PR to apply the 1st and 2nd items, otherwise I can also take a look at it in a separate PR.

Thank you for your patience!

@deividasstr
Copy link
Contributor Author

Hey @LikeTheSalad,

Yep, I can tackle the first and the second points, but I'll do it in separate PRs, linking to this one, which I am closing

@deividasstr
Copy link
Contributor Author

@LikeTheSalad, just to be clear about

Remove the min size requirement, as well as removing any check for available disk space.

Does that include not using and removing CacheStorage.ensureCacheSpaceAvailable() method, which also allocates the requested byte size?

@LikeTheSalad
Copy link
Contributor

@LikeTheSalad, just to be clear about

Remove the min size requirement, as well as removing any check for available disk space.

Does that include not using and removing CacheStorage.ensureCacheSpaceAvailable() method, which also allocates the requested byte size?

Correct. The allocation isn't available in all of the APIs we support anyway and it's also prone to race condition issues, it can also take quite long to run since it may delete some cache files in the process, so for 10MB and for this use case it's probably not needed. So I was planning to go with a simple approach until someone lets us know about a use case where they need allocation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants