Add option to calculate disk usage with Go-based function#1865
Add option to calculate disk usage with Go-based function#1865keith-ms wants to merge 10 commits intoAzure:mainfrom
Conversation
…f du, add a wrapper function and additional tests
There was a problem hiding this comment.
Pull Request Overview
This PR removes reliance on the external du command by implementing disk usage estimation in Go, updates all cache components to use the new APIs, and adds tests to cover symlink and subdirectory cases.
- Introduced
GetUsageInBytesandGetUsageInMegabytes, rewroteGetUsageusingfilepath.WalkDir - Updated FileCache and BlockCache to call the new disk-usage functions and handle errors
- Removed
du-based detection in LRU policy and expanded unit tests
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| common/util.go | Removed external du logic; added GetUsageInBytes, GetUsageInMegabytes, and new GetUsage implementation |
| common/util_test.go | Fixed test name; added tests for symlink and subdirectory disk usage |
| component/file_cache/file_cache.go | Switched all calls from GetUsage to new byte/MB variants; improved error handling |
| component/file_cache/cache_policy.go | Updated getUsagePercentage to use GetUsageInMegabytes |
| component/file_cache/lru_policy.go | Eliminated du detection logic |
| component/block_cache/block_cache.go | Changed disk-usage checks and StatFs to use new functions |
|
|
||
| // GetUsageWithDu: The current disk usage in MB | ||
| func GetUsageWithDu(path string) (float64, error) { | ||
| var duPath []string = []string{"/usr/bin/du", "/usr/local/bin/du", "/usr/sbin/du", "/usr/local/sbin/du", "/sbin/du", "/bin/du"} |
There was a problem hiding this comment.
These are static values, lets keep this as global variable only otherwise every time the function is called, it will end up creating a new string slice with these values. Just a waste of memory nothing much.
| if noDu { | ||
| bc.diskUsageConfiguration = common.DiskUsageConfiguration{ | ||
| DiskUsageFunction: common.GetUsageWithWalkInMegabytes, | ||
| UsesDu: false, |
There was a problem hiding this comment.
If the callback method is set then do we need this flag marking use du or no du. Based on the method linked here in function pointer we have any way made a decision to use or not use du.
| UsesDu: false, | ||
| } | ||
| } else { | ||
| bc.diskUsageConfiguration = common.DiskUsageConfiguration{ |
There was a problem hiding this comment.
Instead of being a file-cache or a block-cache member this could be a global variable in itself where GetUsage is generalised so that in any code space in blobfuse if we try to compute the disk usage it uses the configured method. By doing it in this format we are saying file-cache and block-cache will define du or no-du but other components are still free to rely on du command. Ideally a global function pointer shall be set based on the input cli flag and all other component shall just use that function pointer to get disk space, globally that function defines whether to use du or not.
Type of Change
Description
Previously, in order to estimate disk usage for a specific path, the
GetUsage()function would use theCommandfunction to execute theducommand, parse the output and return the results in megabytes. After troubleshooting an issue that involved file writes to a volume that made use of this driver causing the write to hang, I tracked down the hang to entry into theGetUsage()function.This change adds an option to use a Go-based disk usage calculate function and preserves the usage of
duas the default. I also added functions with clearer names, preserved the API of thecommonpackage, updated functions that previously called theGetUsage()function with selection logic, and added additional testing.duas the default.How Has This Been Tested?
I ran the unit tests for the components that I changed:
I added new unit tests that cover subdirectory, symlink and non-aligned file size scenarios.
For manual testing, I built the application, copied it to a VM running Ubuntu 24.04 and mounted a blob container in a storage account. I used a Python script that utilized
inotifyto monitor access to the/usr/bin/du. Enabling and disabling theno-duoption in the configuration file worked as expected. Theducommand only ran whenno-duwas absent from the configuration file. The default value forno-duis false, and if the option is missing from command line arguments, theno-duvalue is set to false. The Go-based calculation is only used whenno-duis set totrue.Checklist
Related Links