Add option to calculate disk usage with Go-based function #1865

Open
keith-ms wants to merge 10 commits into Azure:main from keith-ms:eliminate-du-usage

keith-ms commented Jul 9, 2025

Type of Change

  • Bug fix
  • New feature
  • Code quality improvement
  • Other (describe):

Description

Previously, to estimate disk usage for a specific path, the GetUsage() function used the Command function to execute the du command, parsed the output, and returned the result in megabytes. While troubleshooting an issue in which file writes to a volume backed by this driver would hang, I traced the hang to the entry into the GetUsage() function.

This change adds an option to use a Go-based disk usage calculation function and keeps du as the default. I also added functions with clearer names, preserved the API of the common package, added selection logic to the functions that previously called GetUsage(), and added further tests.

  • Feature / Bug Fix: Adding an option to use a Go-based disk usage calculation while keeping the usage of du as the default.
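
As a rough illustration of what a Go-based calculation can look like, here is a minimal sketch built on filepath.WalkDir. The names usageWithWalk and demoUsage are hypothetical and are not the functions added by this PR; the sketch also assumes that apparent file sizes are summed and that symlinks are skipped rather than followed, which may differ from the actual implementation.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// usageWithWalk is a hypothetical helper (not the PR's function).
// It sums the apparent size in bytes of every regular file under root.
// WalkDir does not follow symlinks; they show up as non-regular entries
// and are skipped here.
func usageWithWalk(root string) (int64, error) {
	var total int64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // directories, symlinks, devices, and so on
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		total += info.Size()
		return nil
	})
	return total, err
}

// demoUsage builds a small temporary tree (two files, one subdirectory,
// one symlink) and returns the usage computed for it.
func demoUsage() (int64, error) {
	dir, err := os.MkdirTemp("", "usage-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	if err := os.WriteFile(filepath.Join(dir, "a.bin"), make([]byte, 1000), 0o644); err != nil {
		return 0, err
	}
	if err := os.Mkdir(filepath.Join(dir, "sub"), 0o755); err != nil {
		return 0, err
	}
	if err := os.WriteFile(filepath.Join(dir, "sub", "b.bin"), make([]byte, 500), 0o644); err != nil {
		return 0, err
	}
	if err := os.Symlink(filepath.Join(dir, "a.bin"), filepath.Join(dir, "link")); err != nil {
		return 0, err
	}
	return usageWithWalk(dir)
}

func main() {
	total, err := demoUsage()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d bytes (%.6f MB)\n", total, float64(total)/(1024*1024))
}
```

Because the walk runs inside the process, no external command is spawned. In this sketch any error met during the walk is returned to the caller, so a cache directory whose files disappear mid-walk would need extra handling.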

How Has This Been Tested?

I ran the unit tests for the components that I changed:

~/azure-storage-fuse/component$ go test ./block_cache/ ./file_cache/
ok      github.com/Azure/azure-storage-fuse/v2/component/block_cache    (cached)
ok      github.com/Azure/azure-storage-fuse/v2/component/file_cache     (cached)
~/azure-storage-fuse/common$ go test ./...
ok      github.com/Azure/azure-storage-fuse/v2/common   1.688s
ok      github.com/Azure/azure-storage-fuse/v2/common/cache_policy      0.009s
ok      github.com/Azure/azure-storage-fuse/v2/common/config    0.015s
?       github.com/Azure/azure-storage-fuse/v2/common/exectime  [no test files]
ok      github.com/Azure/azure-storage-fuse/v2/common/log       0.814s

I added new unit tests that cover subdirectory, symlink and non-aligned file size scenarios.
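
One plausible reading of the non-aligned case is a file whose size is not a multiple of the filesystem block size: du reports allocated blocks, so such a file occupies more space on disk than its apparent size. The sketch below shows that rounding under stated assumptions; roundUpToBlock and the 4096-byte block size are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// assumedBlockSize is an illustrative value only; real filesystems vary.
const assumedBlockSize int64 = 4096

// roundUpToBlock is a hypothetical helper that rounds size up to the next
// multiple of blockSize. Non-positive sizes yield 0, and a non-positive
// blockSize leaves size unchanged.
func roundUpToBlock(size, blockSize int64) int64 {
	if size <= 0 {
		return 0
	}
	if blockSize <= 0 {
		return size
	}
	return (size + blockSize - 1) / blockSize * blockSize
}

func main() {
	for _, size := range []int64{0, 1, 4096, 4097} {
		fmt.Printf("apparent=%d allocated=%d\n", size, roundUpToBlock(size, assumedBlockSize))
	}
}
```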

For manual testing, I built the application, copied it to a VM running Ubuntu 24.04, and mounted a blob container from a storage account. I used a Python script that uses inotify to monitor access to /usr/bin/du. Enabling and disabling the no-du option in the configuration file worked as expected: the du command ran only when no-du was absent from the configuration file. The default value for no-du is false, and the value is also set to false when the option is missing from the command-line arguments. The Go-based calculation is used only when no-du is set to true.

Checklist

  • The purpose of this PR is explained in this or a referenced issue.
  • Tests are included and/or updated for code changes.
  • Documentation update required.
  • Updates to module CHANGELOG.md are included.
  • License headers are included in each file.

Related Links

keith-ms marked this pull request as ready for review July 9, 2025 18:04
Copilot AI review requested due to automatic review settings July 9, 2025 18:04
Contributor

Copilot AI left a comment

Pull Request Overview

This PR removes reliance on the external du command by implementing disk usage estimation in Go, updates all cache components to use the new APIs, and adds tests to cover symlink and subdirectory cases.

  • Introduced GetUsageInBytes and GetUsageInMegabytes, rewrote GetUsage using filepath.WalkDir
  • Updated FileCache and BlockCache to call the new disk-usage functions and handle errors
  • Removed du-based detection in LRU policy and expanded unit tests

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File                                    Description
common/util.go                          Removed external du logic; added GetUsageInBytes, GetUsageInMegabytes, and new GetUsage implementation
common/util_test.go                     Fixed test name; added tests for symlink and subdirectory disk usage
component/file_cache/file_cache.go      Switched all calls from GetUsage to new byte/MB variants; improved error handling
component/file_cache/cache_policy.go    Updated getUsagePercentage to use GetUsageInMegabytes
component/file_cache/lru_policy.go      Eliminated du detection logic
component/block_cache/block_cache.go    Changed disk-usage checks and StatFs to use new functions

vibhansa-msft added this to the v2-2.6.0 milestone Jul 10, 2025
keith-ms changed the title from "Eliminate du usage" to "Add option to calculate disk usage with Go-based function" Jul 22, 2025
vibhansa-msft modified the milestones: v2-2.6.0, v2-2.5.1 Jul 29, 2025

// GetUsageWithDu: The current disk usage in MB
func GetUsageWithDu(path string) (float64, error) {
    var duPath []string = []string{"/usr/bin/du", "/usr/local/bin/du", "/usr/sbin/du", "/usr/local/sbin/du", "/sbin/du", "/bin/du"}
Member

These are static values, so let's keep this as a global variable; otherwise, every call to the function creates a new string slice with these values. It is just a small waste of memory, nothing more.
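
A minimal sketch of the package-level form suggested here, assuming the list is only ever read. The names duCandidatePaths and findFirstExisting are hypothetical and do not come from the PR; only the path values are taken from the snippet under review.

```go
package main

import (
	"fmt"
	"os"
)

// duCandidatePaths is declared once at package level, so calls to the
// lookup below reuse the same slice. The name is hypothetical.
var duCandidatePaths = []string{
	"/usr/bin/du", "/usr/local/bin/du", "/usr/sbin/du",
	"/usr/local/sbin/du", "/sbin/du", "/bin/du",
}

// findFirstExisting returns the first candidate that exists and is not a
// directory, or "" when none is found.
func findFirstExisting(candidates []string) string {
	for _, p := range candidates {
		if info, err := os.Stat(p); err == nil && !info.IsDir() {
			return p
		}
	}
	return ""
}

func main() {
	if p := findFirstExisting(duCandidatePaths); p != "" {
		fmt.Println("du found at", p)
	} else {
		fmt.Println("du not found in the candidate paths")
	}
}
```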

if noDu {
    bc.diskUsageConfiguration = common.DiskUsageConfiguration{
        DiskUsageFunction: common.GetUsageWithWalkInMegabytes,
        UsesDu: false,
Member

If the callback method is set, do we still need this flag marking du or no-du? The method assigned to the function pointer already encodes the decision to use du or not.

        UsesDu: false,
    }
} else {
    bc.diskUsageConfiguration = common.DiskUsageConfiguration{
Member

Instead of being a member of file-cache or block-cache, this could be a global variable, with GetUsage generalised so that any code in blobfuse that computes disk usage goes through the configured method. With the current approach, file-cache and block-cache decide between du and no-du, but other components are still free to rely on the du command. Ideally, a global function pointer should be set based on the CLI flag, and all components should use that function pointer to get disk usage, so that a single global function determines whether du is used.
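
The global function pointer idea could be sketched roughly as follows. Everything here is hypothetical: usageFunc, diskUsage, configureDiskUsage, and the two stub backends are stand-ins that return fixed values so the selection logic can be seen in isolation; they are not the functions in this PR. No separate flag is stored, since the assigned function already records the choice.

```go
package main

import "fmt"

// usageFunc is the shape shared by both backends: disk usage in MB for a path.
type usageFunc func(path string) (float64, error)

// Stub backends for illustration only. They return fixed values (1 and 2)
// so it is visible which one was selected.
func usageWithDuStub(path string) (float64, error)   { return 1, nil }
func usageWithWalkStub(path string) (float64, error) { return 2, nil }

// diskUsage is the single package-level function pointer that components
// would call. It defaults to the du-based backend.
var diskUsage usageFunc = usageWithDuStub

// configureDiskUsage selects the backend once, based on the no-du setting.
// It is meant to be called during startup, before any component reads
// diskUsage, because the assignment is not synchronized.
func configureDiskUsage(noDu bool) {
	if noDu {
		diskUsage = usageWithWalkStub
		return
	}
	diskUsage = usageWithDuStub
}

func main() {
	configureDiskUsage(true)
	mb, err := diskUsage("/tmp/cache")
	fmt.Println(mb, err)
}
```

With this shape, file-cache, block-cache, and any other component would call diskUsage without knowing which backend is active.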

vibhansa-msft removed this from the v2-2.5.1 milestone Sep 4, 2025