TC-3071 add upload advisory and delete advisory scale test #77

bxf12315 wants to merge 1 commit into guacsec:main
Conversation
.cargo/config.toml (Outdated)

```toml
# @@ -0,0 +1,2 @@
[env]
CARGO_WORKSPACE_ROOT = { value = "", relative = true }  # No newline at end of file
```
Why is it not possible to use CARGO_MANIFEST_DIR?
CARGO_WORKSPACE_ROOT and CARGO_MANIFEST_DIR serve the same purpose, and I think CARGO_WORKSPACE_ROOT is more semantically appropriate. Could you please explain why CARGO_MANIFEST_DIR would be more reasonable?
.cargo/config.toml (Outdated)

```toml
# @@ -0,0 +1,2 @@
[env]
CARGO_MANIFEST_DIR = { value = "", relative = true }  # No newline at end of file
```
This variable is provided by the Cargo build; it does not need to be set here.
src/scenario/mod.rs (Outdated)

```rust
fn load_advisory_files(&self) -> anyhow::Result<Vec<PathBuf>> {
    // Try UPLOAD_FILE_PATH first, then fall back to CARGO_MANIFEST_DIR
    let base_path = std::env::var("UPLOAD_FILE_PATH")
        .or_else(|_| std::env::var("CARGO_MANIFEST_DIR"))
```
You need to use env! in this case, as it is a build-time variable.
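The distinction the reviewer points at can be sketched as follows (a standalone illustration, not code from the PR; `option_env!` is used instead of `env!` only so the sketch also compiles outside of Cargo):

```rust
// CARGO_MANIFEST_DIR is provided by Cargo at *compile* time. The env!
// macro reads it when the crate is built and embeds the value into the
// binary; std::env::var would look it up at *run* time instead, and fail
// whenever the binary is started outside of `cargo run`/`cargo test`.
fn manifest_dir() -> Option<&'static str> {
    option_env!("CARGO_MANIFEST_DIR")
}

fn main() {
    match manifest_dir() {
        Some(dir) => println!("built inside Cargo, manifest at {dir}"),
        None => println!("built without Cargo, no manifest dir baked in"),
    }
}
```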
Alternatively, why not use the current working directory?
I just want it to be as flexibly configurable as SCENARIO_FILE.
If you use env! to access CARGO_MANIFEST_DIR, you will get an absolute path. Isn't this potentially unreasonable?
Ok please check again.
src/restapi.rs (Outdated)

```rust
    advisory_file: String,
    user: &mut GooseUser,
) -> Result<String, Box<TransactionError>> {
    let file_bytes = tokio::fs::read(&advisory_file).await.map_err(|e| {
```
Part of this test is now the loading of the file, which won't change. Can we cache that?
Having a hard-coded list of a number of files will run into the same situation, just a bit later.

I think the appropriate solution for this would be to:

- load an example file on startup, hard-coded, only one
- create a function returning a new file, in-memory
- implement that function in a way that it generates a new version of the file by modifying certain elements with random values (e.g. adding a random suffix to all package names)
- use this function to generate the content each time it's required during the test
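The suggested approach could be sketched like this (a minimal, hypothetical generator; the JSON shape and id value are illustrative, and a real implementation would parse the JSON rather than patch the raw text):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Counter used to make every generated advisory unique across the test run.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Take the base advisory (loaded once at startup) and return a fresh
/// in-memory copy whose tracking id carries a unique suffix.
fn generate_advisory_content(base: &str, original_id: &str) -> String {
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    base.replace(original_id, &format!("{original_id}-{n}"))
}

fn main() {
    let base = r#"{"document":{"tracking":{"id":"EXAMPLE-2024:0001"}}}"#;
    let first = generate_advisory_content(base, "EXAMPLE-2024:0001");
    let second = generate_advisory_content(base, "EXAMPLE-2024:0001");
    // Each call yields a distinct document while the base stays untouched.
    assert_ne!(first, second);
    println!("{first}");
}
```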
I’ve randomly modified the csaf.document.tracking.id and also added support for xz.
415a670 to 90fd9bf
test-data/advisories/.DS_Store (Outdated)

src/restapi.rs (Outdated)

```rust
// @@ -74,35 +123,16 @@ pub async fn upload_advisory_and_get_id(
    advisory_file: String,
```
It looks like you're only using a reference to this. If that's the case, please use &str and don't clone.
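The suggestion amounts to borrowing instead of owning; a generic illustration (the function body here is a stand-in, not the PR's actual upload logic):

```rust
// Taking &str lets callers pass a String, a &String, or a literal without
// cloning; the function only needs to read the value.
fn upload_advisory(advisory_file: &str) -> String {
    format!("uploading {advisory_file}")
}

fn main() {
    let owned = String::from("advisory.json");
    // No clone needed: &owned coerces to &str.
    assert_eq!(upload_advisory(&owned), "uploading advisory.json");
    assert_eq!(upload_advisory("advisory.json"), "uploading advisory.json");
}
```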
src/main.rs (Outdated)

```rust
if let Some(file) = &scenario.upload_advisory_files {
    // Use sequential transaction execution to ensure immediate deletion after upload
    s = s.register_transaction(tx!(upload_and_immediately_delete(file.clone(), advisory_files_counter.clone()), name: format!("upload_and_immediately_delete")));
    s = s.register_transaction(tx!(upload_and_immediately_delete(file.clone()), name: format!("upload_and_immediately_delete")));
```
src/restapi.rs (Outdated)

```rust
let (cached_path, content) = result;

// Verify that the cached path matches the requested path
if cached_path != file_path {
```
Maybe it makes more sense to have a "map" instead of failing here. Or maybe, instead of using a global static with a dynamic capture, load the file in the setup of the tests, and provide the loaded content directly.
Thank you, I think we are on the right track now. Just a few small things to iron out.
668e969 to 66231c6
Updated, please check again.
src/scenario/mod.rs (Outdated)

```rust
        .await
}

/// Load advisory files from UPLOAD_FILE_PATH directory, or fall back to CARGO_MANIFEST_DIR
fn load_advisory_files(&self) -> anyhow::Result<Vec<PathBuf>> {
```
This is supposed to be a single file now, right? Why is this still a Vec?
This method keeps a certain level of generality and, here, simply takes the first file encountered.
If this is about 0..1 files, please use Result<Option<PathBuf>>. If this is about loading a configured file, it should actually be Result<PathBuf>, as configuring a file which is not present should be an error.

There's no need to read directories if we only load a single file.
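The suggested Result<PathBuf> shape could look like this (a sketch with illustrative names; a missing configured file is reported as an error instead of being hidden in an empty Vec):

```rust
use std::io;
use std::path::PathBuf;

/// A configured advisory file that does not exist is an error,
/// not an empty result.
fn configured_advisory_file(configured: &str) -> io::Result<PathBuf> {
    let path = PathBuf::from(configured);
    if path.is_file() {
        Ok(path)
    } else {
        Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("configured advisory file not found: {configured}"),
        ))
    }
}

fn main() -> io::Result<()> {
    // Create a scratch file so the happy path can be demonstrated.
    let tmp = std::env::temp_dir().join("advisory-example.json");
    std::fs::write(&tmp, b"{}")?;
    assert!(configured_advisory_file(tmp.to_str().unwrap()).is_ok());
    assert!(configured_advisory_file("/definitely/missing.json").is_err());
    Ok(())
}
```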
src/restapi.rs (Outdated)

```rust
    user: &mut GooseUser,
) -> TransactionResult {
    // 1. Upload file and get ID
    let advisory_id = upload_advisory_and_get_id(&advisory_file, user).await?;
```
Why not split this up into two operations, so that we have individual metrics? To my understanding, you can use GooseUserData to transport that information using https://docs.rs/goose/0.17.2/goose/goose/struct.GooseUser.html#method.set_session_data
This PR has been open for quite some time. The change to merge everything into a single transaction was implemented according to your comment. The PR description includes relevant details.
As for Goose, its transactions seem to be executed in a random order — may I ask how we can ensure they are executed sequentially?
Here we probably shouldn’t use session data, because this data is shared across different GooseUsers, while the session is only for the current GooseUser.
> Here we probably shouldn't use session data, because this data is shared across different GooseUsers, while the session is only for the current GooseUser.
To my understanding of the documentation (https://docs.rs/goose/latest/goose/goose/struct.GooseUser.html) this should be the state of the goose user. IIRC, we use 5 in the CI. So this would mean we have 5 concurrent users running through all of the transactions of a scenario.
Also, to my understanding of https://book.goose.rs/config/scheduler.html, the scenarios and transactions do have an order, so they should run in the order we define, as long as they have the same weight and we keep the (default) round-robin scheduler.
src/restapi.rs (Outdated)

```rust
}

/// Upload file and return advisory ID
pub async fn upload_advisory_and_get_id(
```
From the description and name I can't see a difference to upload_advisory_and_extract_id. It may also be good to pull out common functions like this into a dedicated module. But that's optional.
src/restapi.rs (Outdated)

```rust
}

/// Check if data is XZ compressed by examining magic bytes
fn is_xz_compressed(data: &[u8]) -> bool {
```
Please pull out common decompression functions into a dedicated module.
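For reference, the magic-byte check under discussion is conventionally a prefix comparison (a minimal sketch; the six-byte header comes from the standard .xz stream format):

```rust
/// XZ streams begin with the six magic bytes FD 37 7A 58 5A 00,
/// so detecting them needs nothing more than a prefix comparison.
fn is_xz_compressed(data: &[u8]) -> bool {
    data.starts_with(&[0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00])
}

fn main() {
    assert!(is_xz_compressed(&[0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00, 0x42]));
    assert!(!is_xz_compressed(b"{\"document\":{}}"));
    // Fewer than six bytes can never match the magic.
    assert!(!is_xz_compressed(&[0xFD, 0x37]));
}
```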
src/restapi.rs (Outdated)

```rust
}

/// Get base upload content from file path - cached and initialized on first access
async fn get_base_upload_content(file_path: &str) -> Result<Vec<u8>, String> {
```
Having this method called in parallel will result in multiple loads of the file, with all but one dropped in the end. If you want to avoid that, you can use entry() and detect whether the entry is occupied or vacant, and then perform the load. Holding a simple mutex lock should be enough, as the map lookup is fast, and the other callers would need to wait anyway while the file is being loaded.
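The entry-based pattern being suggested can be sketched like this (a synchronous stand-in for the async code in the PR; the type and method names are illustrative):

```rust
use std::collections::hash_map::{Entry, HashMap};
use std::io;
use std::sync::Mutex;

/// The mutex is held across the lookup *and* the load, so concurrent
/// callers cannot both read the same file; later callers simply find
/// the occupied entry.
struct FileCache {
    map: Mutex<HashMap<String, Vec<u8>>>,
}

impl FileCache {
    fn new() -> Self {
        Self { map: Mutex::new(HashMap::new()) }
    }

    fn get_or_load(&self, path: &str) -> io::Result<Vec<u8>> {
        let mut map = self.map.lock().unwrap();
        match map.entry(path.to_string()) {
            Entry::Occupied(e) => Ok(e.get().clone()),
            Entry::Vacant(e) => {
                // Only the first caller for this path reaches the read.
                let content = std::fs::read(path)?;
                Ok(e.insert(content).clone())
            }
        }
    }
}

fn main() -> io::Result<()> {
    let tmp = std::env::temp_dir().join("cache-example.bin");
    std::fs::write(&tmp, b"payload")?;
    let cache = FileCache::new();
    let path = tmp.to_str().unwrap();
    assert_eq!(cache.get_or_load(path)?, b"payload");
    // The second call is served from the map, with no further IO.
    assert_eq!(cache.get_or_load(path)?, b"payload");
    Ok(())
}
```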
src/restapi.rs (Outdated)

```rust
static FILE_CACHE: OnceCell<RwLock<HashMap<String, Vec<u8>>>> = OnceCell::const_new();

/// Generate a new advisory with modified ID
fn generate_advisory_content(base_content: &[u8]) -> Result<Vec<u8>, Box<TransactionError>> {
```
This may be a potentially time-consuming, blocking (sync) function. That may interfere with the IO scheduler and cause side effects in the scale testing. It should be forked off to a worker thread using tokio's mechanisms for running blocking workloads in an async context.
src/restapi.rs (Outdated)

```rust
/// Sequential execution: upload and then immediately delete
pub async fn upload_and_immediately_delete(
    advisory_file: String,
```
Why not load the file on startup, and then pass it in? That would allow dropping all the complex static map and mutex.
I thought this approach was already approved by you two weeks ago, wasn't it?
src/restapi.rs (Outdated)

```rust
    user: &mut GooseUser,
) -> Result<String, Box<TransactionError>> {
    // Check cache first - only cache the base template
    let base_content = get_base_upload_content(advisory_file)
```
To my understanding this content will never change. Why does it need to be re-read all the time?
https://github.com/guacsec/trustify-scale-testing/pull/77/files#diff-95544b31bf9265df9363ded490f5b9668f85ad72b5ce89c6e0318e70bf34b115R51
This is cache logic and is only initialized once.
Why not just pass in the actual content and drop get_base_upload_content?
Do you mean that we should create a parameter in the scenario file? But earlier you mentioned that we should avoid this kind of configuration approach.
What about just passing in the raw data?
Due to this PR taking too long, rebasing resulted in many errors, so I merged all the commits and created a new PR. Please review it. #83
If you can create a new PR, why can't you just push those changes to this PR?
d44d11d to 35a1761
Done
```rust
}

/// Load advisory file from UPLOAD_FILE_PATH directory, or fall back to CARGO_MANIFEST_DIR
pub fn load_advisory_files() -> anyhow::Result<PathBuf> {
```
It looks like this method doesn't "load" anything; it only returns a random file from a directory.

This feels like complexity that we don't need. If we have a single file, then we should stick to a single file.

If the content is supposed to be checked in to the repository, then the relative path can be hard-coded, and UPLOAD_FILE_PATH can be used to override the base path.

Alternatively, the relative path could also be made configurable, with the hard-coded value as a default.
Here are the specific solutions:

1. Hardcode the upload file in the full-20250604.json5 file.
2. On the basis of solution one, add UPLOAD_FILE_PATH; if UPLOAD_FILE_PATH exists, it overrides solution one.
3. Hardcode the relative path in full-20250604.json5, while keeping the original logic for getting the specific file unchanged.

Which one do you prefer? In my view, solution one should be sufficient.
OK, let's do this:

- The configuration gets an option for this, with a default value that is relative to the root of the `main` function
- The application will evaluate a base path, which is either the current working directory, or the value provided by `UPLOAD_FILE_PATH` (if present)
- The configuration value will be resolved against the base path. If the configuration was an absolute path, then this is a no-op

I think this should be both simple and flexible.
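The proposed resolution could be sketched like this (an illustrative helper, not code from the PR; note that `Path::join` already returns the right-hand side unchanged when it is absolute, so the "no-op for absolute paths" rule comes for free):

```rust
use std::path::{Path, PathBuf};

/// Resolve the configured value against a base path. An absolute
/// configured value replaces the base entirely.
fn resolve_upload_file(base: &Path, configured: &str) -> PathBuf {
    base.join(configured)
}

fn main() {
    // The base path would be UPLOAD_FILE_PATH if set, else the current dir.
    let base = std::env::var("UPLOAD_FILE_PATH")
        .map(PathBuf::from)
        .unwrap_or_else(|_| std::env::current_dir().expect("no working directory"));

    let relative = resolve_upload_file(Path::new("/test-data"), "advisories/a.json");
    assert_eq!(relative, PathBuf::from("/test-data/advisories/a.json"));

    // An absolute configuration value wins over any base path.
    let absolute = resolve_upload_file(&base, "/etc/hosts");
    assert_eq!(absolute, PathBuf::from("/etc/hosts"));
}
```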
> The configuration gets an option for this, with a default value that is relative to the root of the `main` function

What does configuration refer to? Is it full-20250604.json5? Is it like retrieving the fields in Scenario?

> The application will evaluate a base path, which is either the current working directory, or the value provided by `UPLOAD_FILE_PATH` (if present)

The base path mentioned here is just a folder, right? It is not a specific file, but something like "/test-data/advisories", correct?

> The configuration value will be resolved against the base path. If the configuration was an absolute path, then this is a no-op

Is the configuration value mentioned here a specific file name, like rhba-2024_11505.json.xz?

For example, suppose there is a file named rhba-2024_11505.json.xz. It first needs to be defined in both full-20250604.json5 and Scenario. Then, if UPLOAD_FILE_PATH exists, it will be combined to form a complete relative path: UPLOAD_FILE_PATH + rhba-2024_11505.json.xz. However, if the config value itself is already a full absolute path, that config value is used directly. If UPLOAD_FILE_PATH does not exist, rhba-2024_11505.json.xz is assumed to be located in the current working directory. Is this the intended logic?
```rust
}

/// Check if data is XZ compressed by examining magic bytes
pub fn is_xz_compressed(data: &[u8]) -> bool {
```
Please move all compression functionality into a dedicated module.
```rust
let delete_counter = Arc::new(AtomicUsize::new(0));

let advisory_file = utils::load_advisory_files()?;
let advisory_file_path = advisory_file
```
Please keep this as a PathBuf; that's the native type to use. You can use &Path (similar to &str) for passing a non-owned reference.
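The owning/borrowing split for paths mirrors String vs &str; a small standalone illustration (the file name is taken from an example mentioned earlier in this thread):

```rust
use std::path::{Path, PathBuf};

// PathBuf owns the path; &Path borrows it, just like String vs &str.
// Functions that only read the path should accept &Path.
fn file_stem_of(path: &Path) -> Option<&str> {
    path.file_stem().and_then(|s| s.to_str())
}

fn main() {
    let advisory_file: PathBuf =
        PathBuf::from("test-data/advisories/rhba-2024_11505.json.xz");
    // A PathBuf coerces to &Path, so no conversion to String is needed.
    assert_eq!(file_stem_of(&advisory_file), Some("rhba-2024_11505.json"));
}
```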
I have added a new environment variable called UPLOAD_FILE_PATH, which represents the base path for all uploaded files (default value: "./test-data").
I have made corresponding changes based on Jens's comment (#74 (comment)). However, there is still an issue: if the upload has not fully completed, the delete operation may return a 404 error.
First, we can tolerate a certain number of errors. Second, we can add a sleep between the two REST requests; however, adding a sleep may affect the results.
The test results are as shown below:

Since the metrics distinguish between different requests, the results can also be considered acceptable.