Skip to content

TC-3071 add upload advisory and delete advisory scale test#77

Open
bxf12315 wants to merge 1 commit intoguacsec:mainfrom
bxf12315:TC-3071-1
Open

TC-3071 add upload advisory and delete advisory scale test#77
bxf12315 wants to merge 1 commit intoguacsec:mainfrom
bxf12315:TC-3071-1

Conversation

@bxf12315
Copy link
Contributor

@bxf12315 bxf12315 commented Oct 30, 2025

I have added a new environment variable called UPLOAD_FILE_PATH, which is used to represent the path for all uploaded files.(default value is "./test-data").
I have made corresponding changes based on Jens's comment (#74 (comment)). However, there is still an issue: if the upload has not fully completed, the delete operation may return a 404 error.
First, we can tolerate a certain number of errors. Second, we can add a thread sleep between the two REST requests. However, adding thread sleep may affect the results.

The test results are as shown below:
Screenshot 2025-10-30 at 11 51 05
since they distinguish between different requests, they can also be considered acceptable.

@@ -0,0 +1,2 @@
[env]
CARGO_WORKSPACE_ROOT = { value = "", relative = true } No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not possible to use CARGO_MANIFEST_DIR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CARGO_WORKSPACE_ROOT and CARGO_MANIFEST_DIR serve the same purpose, and I think CARGO_WORKSPACE_ROOT is more semantically appropriate. Could you please explain why CARGO_MANIFEST_DIR would be more reasonable?

@@ -0,0 +1,2 @@
[env]
CARGO_MANIFEST_DIR = { value = "", relative = true } No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is provided by the cargo build. It does not need to be provided.

fn load_advisory_files(&self) -> anyhow::Result<Vec<PathBuf>> {
// Try UPLOAD_FILE_PATH first, then fall back to CARGO_MANIFEST_DIR
let base_path = std::env::var("UPLOAD_FILE_PATH")
.or_else(|_| std::env::var("CARGO_MANIFEST_DIR"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to use env! in this case. As it is a build time variable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, why not use the current working directory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want it to be as flexibly configurable as SCENARIO_FILE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use env to access CARGO_MANIFEST_DIR, you will get an absolute path. Isn’t this potentially unreasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok please check again.

src/restapi.rs Outdated
advisory_file: String,
user: &mut GooseUser,
) -> Result<String, Box<TransactionError>> {
let file_bytes = tokio::fs::read(&advisory_file).await.map_err(|e| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this test now is the loading of the file. Which won't change. Can we cache that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a hard-coded list of a number of files will run into the same situation. Just a bit later.

I think the appropriate solution for this would be to:

  • load an example file on startup, hard-coded, only one
  • create a function returning a new file. in-memory
  • implement that function in a way that it generates a new version of the file by modifying certain element with random values (e.g. adding a random suffix to all package names)
  • use this function to generate the content each time it's required during the test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve randomly modified the csaf.document.tracking.id and also added support for xz.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this file

src/restapi.rs Outdated
@@ -74,35 +123,16 @@ pub async fn upload_advisory_and_get_id(
advisory_file: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're only using a reference to this. If that's the case, please use &str and don't clone.

src/main.rs Outdated
if let Some(file) = &scenario.upload_advisory_files {
// Use sequential transaction execution to ensure immediate deletion after upload
s = s.register_transaction(tx!(upload_and_immediately_delete(file.clone(), advisory_files_counter.clone()),name: format!("upload_and_immediately_delete")));
s = s.register_transaction(tx!(upload_and_immediately_delete(file.clone()),name: format!("upload_and_immediately_delete")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is format! needed here?

src/restapi.rs Outdated
let (cached_path, content) = result;

// Verify that the cached path matches the requested path
if cached_path != file_path {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes more sense to have a "map" instead of failing here. Or maybe, instead of using a global static with a dynamic capture, load the file in the setup of the tests, and provide the loaded content directly.

@ctron
Copy link
Contributor

ctron commented Nov 27, 2025

Thank you, I think we are on the right track now. Just a few small things to iron out.

@bxf12315 bxf12315 force-pushed the TC-3071-1 branch 2 times, most recently from 668e969 to 66231c6 Compare November 27, 2025 13:24
@bxf12315
Copy link
Contributor Author

Thank you, I think we are on the right track now. Just a few small things to iron out.

Updated, please check again.

.await
}
/// Load advisory files from UPLOAD_FILE_PATH directory, or fall back to CARGO_MANIFEST_DIR
fn load_advisory_files(&self) -> anyhow::Result<Vec<PathBuf>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This supposed to be a single file now, right? Why is this still a Vec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method keeps a certain level of generality and, in here, simply takes the first file encountered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is about a 0..1 files, please use Result<Option<PathButh>>. If this is about loading a configured file, this should actually only be Result<PathBuf>, as configuring a file which is not present should be an error.

There's no need to read directories, if we only load a single file.

src/restapi.rs Outdated
user: &mut GooseUser,
) -> TransactionResult {
// 1. Upload file and get ID
let advisory_id = upload_advisory_and_get_id(&advisory_file, user).await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not split this up in two operations? So that we have individual metrics? To ny understanding, you can use GooseUserData to transport that information using https://docs.rs/goose/0.17.2/goose/goose/struct.GooseUser.html#method.set_session_data

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has been open for quite some time. The change to merge everything into a single transaction was implemented according to your comment. The PR description includes relevant details.

As for Goose, its transactions seem to be executed in a random order — may I ask how we can ensure they are executed sequentially?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we probably shouldn’t use session data, because this data is shared across different GooseUsers, while the session is only for the current GooseUser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we probably shouldn’t use session data, because this data is shared across different GooseUsers, while the session is only for the current GooseUser.

To my understanding of the documentation (https://docs.rs/goose/latest/goose/goose/struct.GooseUser.html) this should be the state of the goose user. IIRC, we use 5 in the CI. So this would mean we have 5 concurrent users running through all of the transactions of a scenario.

Also to my understanding of https://book.goose.rs/config/scheduler.html the scenarios and transaction do have an order. We those should be run in the order we define it. As long as they have the same weight, and we keep the (default) round-robin scheduler.

src/restapi.rs Outdated
}

/// Upload file and return advisory ID
pub async fn upload_advisory_and_get_id(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the description and name I can't see a difference to upload_advisory_and_extract_id. It may also be good to pull out common functions like this into a dedicated module. But that's optional.

src/restapi.rs Outdated
}

/// Check if data is XZ compressed by examining magic bytes
fn is_xz_compressed(data: &[u8]) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pull out common decompression functions into a dedicated module.

src/restapi.rs Outdated
}

/// Get base upload content from file path - cached and initialized on first access
async fn get_base_upload_content(file_path: &str) -> Result<Vec<u8>, String> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having this method called in parallel will result in multiple loads of the file. Dropping it in the end. If you want to avoid that, you can use entry() and a detect if the entry is occupied or vacant and then perform the load. Holding a simple mutex lock should be enough, as the map lookup would be fast. And the others would need to wait anyway in case the file is being loaded.

src/restapi.rs Outdated
static FILE_CACHE: OnceCell<RwLock<HashMap<String, Vec<u8>>>> = OnceCell::const_new();

/// Generate a new advisory with modified ID
fn generate_advisory_content(base_content: &[u8]) -> Result<Vec<u8>, Box<TransactionError>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a potentially time consuming, blocking (sync) function. That may interfere with the IO scheduler and cause side effects of the scale testing. It should be forked off to a worker thread using tokio's mechanisms to run blocking workload in an async context.

src/restapi.rs Outdated
}

/// Check if data is XZ compressed by examining magic bytes
fn is_xz_compressed(data: &[u8]) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please pull out common decompression functions into a dedicated module.

src/restapi.rs Outdated

/// Sequential execution: upload and then immediately delete
pub async fn upload_and_immediately_delete(
advisory_file: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not load the file on startup, and then pass it in? That would allow dropping all the complex static map and mutex.

@bxf12315
Copy link
Contributor Author

Thank you, I think we are on the right track now. Just a few small things to iron out.

I thought this approach was already approved by you two weeks ago, wasn’t it?

src/restapi.rs Outdated
user: &mut GooseUser,
) -> Result<String, Box<TransactionError>> {
// Check cache first - only cache the base template
let base_content = get_base_upload_content(advisory_file)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding this content will never change. Why does it need to be re-read all the time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just pass in the actual content and drop get_base_upload_content?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that we should create a parameter in the scenario file? But earlier you mentioned that we should avoid this kind of configuration approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about just passing in the raw data?

@bxf12315
Copy link
Contributor Author

Due to this PR taking too long, rebasing resulted in many errors, so I merged all the commits and created a new PR. Please review it. #83

@ctron
Copy link
Contributor

ctron commented Jan 22, 2026

If you can create a new PR, why can't you just push those changes to this PR?

@bxf12315 bxf12315 force-pushed the TC-3071-1 branch 2 times, most recently from d44d11d to 35a1761 Compare January 23, 2026 02:16
@bxf12315
Copy link
Contributor Author

If you can create a new PR, why can't you just push those changes to this PR?

Done

}

/// Load advisory file from UPLOAD_FILE_PATH directory, or fall back to CARGO_MANIFEST_DIR
pub fn load_advisory_files() -> anyhow::Result<PathBuf> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this method doesn't "load" anything. Also it only returns a random file from a directory.

This feels like complexity that we don't need. If we have a single file, then we should stick to a single file.

If the content is supposed to the checked in to the repository, then the relative path can be hard-coded, and the UPLOAD_FILE_PATH can be used to override the base path.

Alternatively, the relative path could also be made configurable, but have the hard-coded value as a default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the specific solutions:

    1. Hardcode the upload file in the full-20250604.json5 file.
    1. On the basis of solution one, add UPLOAD_FILE_PATH; if UPLOAD_FILE_PATH exists, it overrides solution one.
    1. Hardcode the relative path in full-20250604.json5, while keeping the original logic for getting the specific file unchanged.

Which one do you prefer? In my view, solution one should be sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's do this:

  • The configuration gets an option for this, with a default value that is relative to the root of the main function
  • The application will evaluate a base path, which is either the current working directory, or the value provided by UPLOAD_FILE_PATH (if present)
  • The configuration value will be resolved against the base path. If the configuration was an absolute path, then this is a no-op

I think this should be both simple and flexible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The configuration gets an option for this, with a default value that is relative to the root of the main function

What does configuration refer to? Is it full-20250604.json5? Is it like retrieving the fields in Scenario?

  • The application will evaluate a base path, which is either the current working directory, or the value provided by UPLOAD_FILE_PATH (if present)

The base path mentioned here is just a folder, right? It is not a specific file, but something like "/test-data/advisories", correct?

  • The configuration value will be resolved against the base path. If the configuration was an absolute path, then this is a no-op

Is the configuration value mentioned here a specific file name, like rhba-2024_11505.json.xz?

For example, suppose there is a file named rhba-2024_11505.json.xz.

It first needs to be defined in both full-20250604.json5 and Scenario.

Then, if UPLOAD_FILE_PATH exists, it will be combined to form a complete relative path:
UPLOAD_FILE_PATH + rhba-2024_11505.json.xz.

However, if the config value itself is already a full absolute path, that config value is used directly.

If UPLOAD_FILE_PATH does not exist, rhba-2024_11505.json.xz is assumed to be located in the current working directory.

Is this the intended logic?

}

/// Check if data is XZ compressed by examining magic bytes
pub fn is_xz_compressed(data: &[u8]) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move all compression functionality into a dedicated module.

let delete_counter = Arc::new(AtomicUsize::new(0));

let advisory_file = utils::load_advisory_files()?;
let advisory_file_path = advisory_file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep this as PathBuf, that's the native type to use. You can use &Path (similar to &str) for passing a "non owned" reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants