TC-3071 add upload advisory and delete advisory scale test #77

bxf12315 wants to merge 1 commit into guacsec:main
Conversation
.cargo/config.toml (Outdated)

```toml
# @@ -0,0 +1,2 @@
[env]
CARGO_WORKSPACE_ROOT = { value = "", relative = true }  # No newline at end of file
```
Why is it not possible to use CARGO_MANIFEST_DIR?
CARGO_WORKSPACE_ROOT and CARGO_MANIFEST_DIR serve the same purpose, and I think CARGO_WORKSPACE_ROOT is more semantically appropriate. Could you please explain why CARGO_MANIFEST_DIR would be more reasonable?
.cargo/config.toml (Outdated)

```toml
# @@ -0,0 +1,2 @@
[env]
CARGO_MANIFEST_DIR = { value = "", relative = true }  # No newline at end of file
```
This variable is provided by the Cargo build; it does not need to be set here.
src/scenario/mod.rs (Outdated)

```rust
fn load_advisory_files(&self) -> anyhow::Result<Vec<PathBuf>> {
    // Try UPLOAD_FILE_PATH first, then fall back to CARGO_MANIFEST_DIR
    let base_path = std::env::var("UPLOAD_FILE_PATH")
        .or_else(|_| std::env::var("CARGO_MANIFEST_DIR"))
```
You need to use env! in this case, as it is a build-time variable.
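The distinction the reviewer points at can be sketched as follows (a standalone illustration, not code from the PR; `option_env!` is used instead of `env!` only so the sketch also compiles outside of Cargo):

```rust
// CARGO_MANIFEST_DIR is provided by Cargo at *compile* time. The env!
// macro reads it when the crate is built and embeds the value into the
// binary; std::env::var would look it up at *run* time instead, and fail
// whenever the binary is started outside of `cargo run`/`cargo test`.
fn manifest_dir() -> Option<&'static str> {
    option_env!("CARGO_MANIFEST_DIR")
}

fn main() {
    match manifest_dir() {
        Some(dir) => println!("built inside Cargo, manifest at {dir}"),
        None => println!("built without Cargo, no manifest dir baked in"),
    }
}
```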
Alternatively, why not use the current working directory?
I just want it to be as flexibly configurable as SCENARIO_FILE.
If you use env! to access CARGO_MANIFEST_DIR, you will get an absolute path. Isn't this potentially unreasonable?
Ok please check again.
src/restapi.rs (Outdated)

```rust
    advisory_file: String,
    user: &mut GooseUser,
) -> Result<String, Box<TransactionError>> {
    let file_bytes = tokio::fs::read(&advisory_file).await.map_err(|e| {
```
Part of this test is now the loading of the file, which won't change. Can we cache that?
Having a hard-coded list of a number of files will run into the same situation, just a bit later.

I think the appropriate solution for this would be to:

- load an example file on startup, hard-coded, only one
- create a function returning a new file, in-memory
- implement that function in a way that it generates a new version of the file by modifying certain elements with random values (e.g. adding a random suffix to all package names)
- use this function to generate the content each time it's required during the test
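The suggested approach could be sketched like this (a minimal, hypothetical generator; the JSON shape and id value are illustrative, and a real implementation would parse the JSON rather than patch the raw text):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Counter used to make every generated advisory unique across the test run.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Take the base advisory (loaded once at startup) and return a fresh
/// in-memory copy whose tracking id carries a unique suffix.
fn generate_advisory_content(base: &str, original_id: &str) -> String {
    let n = COUNTER.fetch_add(1, Ordering::Relaxed);
    base.replace(original_id, &format!("{original_id}-{n}"))
}

fn main() {
    let base = r#"{"document":{"tracking":{"id":"EXAMPLE-2024:0001"}}}"#;
    let first = generate_advisory_content(base, "EXAMPLE-2024:0001");
    let second = generate_advisory_content(base, "EXAMPLE-2024:0001");
    // Each call yields a distinct document while the base stays untouched.
    assert_ne!(first, second);
    println!("{first}");
}
```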
I’ve randomly modified the csaf.document.tracking.id and also added support for xz.
415a670 to 90fd9bf
test-data/advisories/.DS_Store (Outdated)

src/restapi.rs (Outdated)

```rust
// @@ -74,35 +123,16 @@ pub async fn upload_advisory_and_get_id(
    advisory_file: String,
```
It looks like you're only using a reference to this. If that's the case, please use &str and don't clone.
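The suggestion amounts to borrowing instead of owning; a generic illustration (the function body here is a stand-in, not the PR's actual upload logic):

```rust
// Taking &str lets callers pass a String, a &String, or a literal without
// cloning; the function only needs to read the value.
fn upload_advisory(advisory_file: &str) -> String {
    format!("uploading {advisory_file}")
}

fn main() {
    let owned = String::from("advisory.json");
    // No clone needed: &owned coerces to &str.
    assert_eq!(upload_advisory(&owned), "uploading advisory.json");
    assert_eq!(upload_advisory("advisory.json"), "uploading advisory.json");
}
```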
src/main.rs (Outdated)

```rust
if let Some(file) = &scenario.upload_advisory_files {
    // Use sequential transaction execution to ensure immediate deletion after upload
    s = s.register_transaction(tx!(upload_and_immediately_delete(file.clone(), advisory_files_counter.clone()), name: format!("upload_and_immediately_delete")));
    s = s.register_transaction(tx!(upload_and_immediately_delete(file.clone()), name: format!("upload_and_immediately_delete")));
```
src/restapi.rs (Outdated)

```rust
let (cached_path, content) = result;

// Verify that the cached path matches the requested path
if cached_path != file_path {
```
Maybe it makes more sense to have a "map" instead of failing here. Or maybe, instead of using a global static with a dynamic capture, load the file in the setup of the tests, and provide the loaded content directly.
Thank you, I think we are on the right track now. Just a few small things to iron out.
668e969 to 66231c6
Updated, please check again.
src/scenario/mod.rs (Outdated)

```rust
        .await
}

/// Load advisory files from UPLOAD_FILE_PATH directory, or fall back to CARGO_MANIFEST_DIR
fn load_advisory_files(&self) -> anyhow::Result<Vec<PathBuf>> {
```
This is supposed to be a single file now, right? Why is this still a Vec?
This method keeps a certain level of generality and, here, simply takes the first file encountered.
If this is about 0..1 files, please use Result<Option<PathBuf>>. If this is about loading a configured file, it should actually be Result<PathBuf>, as configuring a file which is not present should be an error.

There's no need to read directories if we only load a single file.
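The suggested Result<PathBuf> shape could look like this (a sketch with illustrative names; a missing configured file is reported as an error instead of being hidden in an empty Vec):

```rust
use std::io;
use std::path::PathBuf;

/// A configured advisory file that does not exist is an error,
/// not an empty result.
fn configured_advisory_file(configured: &str) -> io::Result<PathBuf> {
    let path = PathBuf::from(configured);
    if path.is_file() {
        Ok(path)
    } else {
        Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("configured advisory file not found: {configured}"),
        ))
    }
}

fn main() -> io::Result<()> {
    // Create a scratch file so the happy path can be demonstrated.
    let tmp = std::env::temp_dir().join("advisory-example.json");
    std::fs::write(&tmp, b"{}")?;
    assert!(configured_advisory_file(tmp.to_str().unwrap()).is_ok());
    assert!(configured_advisory_file("/definitely/missing.json").is_err());
    Ok(())
}
```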
src/restapi.rs (Outdated)

```rust
    user: &mut GooseUser,
) -> TransactionResult {
    // 1. Upload file and get ID
    let advisory_id = upload_advisory_and_get_id(&advisory_file, user).await?;
```
Why not split this up into two operations, so that we have individual metrics? To my understanding, you can use GooseUserData to transport that information using https://docs.rs/goose/0.17.2/goose/goose/struct.GooseUser.html#method.set_session_data
This PR has been open for quite some time. The change to merge everything into a single transaction was implemented according to your comment. The PR description includes relevant details.
As for Goose, its transactions seem to be executed in a random order — may I ask how we can ensure they are executed sequentially?
Here we probably shouldn’t use session data, because this data is shared across different GooseUsers, while the session is only for the current GooseUser.
> Here we probably shouldn't use session data, because this data is shared across different GooseUsers, while the session is only for the current GooseUser.
To my understanding of the documentation (https://docs.rs/goose/latest/goose/goose/struct.GooseUser.html) this should be the state of the goose user. IIRC, we use 5 in the CI. So this would mean we have 5 concurrent users running through all of the transactions of a scenario.
Also, to my understanding of https://book.goose.rs/config/scheduler.html, the scenarios and transactions do have an order, so they should run in the order we define, as long as they have the same weight and we keep the (default) round-robin scheduler.
src/restapi.rs (Outdated)

```rust
}

/// Upload file and return advisory ID
pub async fn upload_advisory_and_get_id(
```
From the description and name I can't see a difference to upload_advisory_and_extract_id. It may also be good to pull out common functions like this into a dedicated module. But that's optional.
src/restapi.rs (Outdated)

```rust
}

/// Check if data is XZ compressed by examining magic bytes
fn is_xz_compressed(data: &[u8]) -> bool {
```
Please pull out common decompression functions into a dedicated module.
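For reference, the magic-byte check under discussion is conventionally a prefix comparison (a minimal sketch; the six-byte header comes from the standard .xz stream format):

```rust
/// XZ streams begin with the six magic bytes FD 37 7A 58 5A 00,
/// so detecting them needs nothing more than a prefix comparison.
fn is_xz_compressed(data: &[u8]) -> bool {
    data.starts_with(&[0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00])
}

fn main() {
    assert!(is_xz_compressed(&[0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00, 0x42]));
    assert!(!is_xz_compressed(b"{\"document\":{}}"));
    // Fewer than six bytes can never match the magic.
    assert!(!is_xz_compressed(&[0xFD, 0x37]));
}
```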
src/restapi.rs (Outdated)

```rust
}

/// Get base upload content from file path - cached and initialized on first access
async fn get_base_upload_content(file_path: &str) -> Result<Vec<u8>, String> {
```
Having this method called in parallel will result in multiple loads of the file, with all but one dropped in the end. If you want to avoid that, you can use entry() and detect whether the entry is occupied or vacant, and then perform the load. Holding a simple mutex lock should be enough, as the map lookup is fast, and the other callers would need to wait anyway while the file is being loaded.
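The entry-based pattern being suggested can be sketched like this (a synchronous stand-in for the async code in the PR; the type and method names are illustrative):

```rust
use std::collections::hash_map::{Entry, HashMap};
use std::io;
use std::sync::Mutex;

/// The mutex is held across the lookup *and* the load, so concurrent
/// callers cannot both read the same file; later callers simply find
/// the occupied entry.
struct FileCache {
    map: Mutex<HashMap<String, Vec<u8>>>,
}

impl FileCache {
    fn new() -> Self {
        Self { map: Mutex::new(HashMap::new()) }
    }

    fn get_or_load(&self, path: &str) -> io::Result<Vec<u8>> {
        let mut map = self.map.lock().unwrap();
        match map.entry(path.to_string()) {
            Entry::Occupied(e) => Ok(e.get().clone()),
            Entry::Vacant(e) => {
                // Only the first caller for this path reaches the read.
                let content = std::fs::read(path)?;
                Ok(e.insert(content).clone())
            }
        }
    }
}

fn main() -> io::Result<()> {
    let tmp = std::env::temp_dir().join("cache-example.bin");
    std::fs::write(&tmp, b"payload")?;
    let cache = FileCache::new();
    let path = tmp.to_str().unwrap();
    assert_eq!(cache.get_or_load(path)?, b"payload");
    // The second call is served from the map, with no further IO.
    assert_eq!(cache.get_or_load(path)?, b"payload");
    Ok(())
}
```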
src/restapi.rs (Outdated)

```rust
static FILE_CACHE: OnceCell<RwLock<HashMap<String, Vec<u8>>>> = OnceCell::const_new();

/// Generate a new advisory with modified ID
fn generate_advisory_content(base_content: &[u8]) -> Result<Vec<u8>, Box<TransactionError>> {
```
This may be a potentially time-consuming, blocking (sync) function. That may interfere with the IO scheduler and cause side effects in the scale testing. It should be forked off to a worker thread using tokio's mechanisms for running blocking workloads in an async context.
src/restapi.rs (Outdated)

```rust
/// Sequential execution: upload and then immediately delete
pub async fn upload_and_immediately_delete(
    advisory_file: String,
```
Why not load the file on startup, and then pass it in? That would allow dropping all the complex static map and mutex.
I thought this approach was already approved by you two weeks ago, wasn't it?
src/restapi.rs (Outdated)

```rust
    user: &mut GooseUser,
) -> Result<String, Box<TransactionError>> {
    // Check cache first - only cache the base template
    let base_content = get_base_upload_content(advisory_file)
```
To my understanding this content will never change. Why does it need to be re-read all the time?
https://github.com/guacsec/trustify-scale-testing/pull/77/files#diff-95544b31bf9265df9363ded490f5b9668f85ad72b5ce89c6e0318e70bf34b115R51
This is cache logic and is only initialized once.
Why not just pass in the actual content and drop get_base_upload_content?
Do you mean that we should create a parameter in the scenario file? But earlier you mentioned that we should avoid this kind of configuration approach.
What about just passing in the raw data?
Due to this PR taking too long, rebasing resulted in many errors, so I merged all the commits and created a new PR. Please review it. #83
If you can create a new PR, why can't you just push those changes to this PR?
d44d11d to 35a1761
Done
```rust
}

/// Load advisory file from UPLOAD_FILE_PATH directory, or fall back to CARGO_MANIFEST_DIR
pub fn load_advisory_files() -> anyhow::Result<PathBuf> {
```
It looks like this method doesn't "load" anything; it only returns a random file from a directory.

This feels like complexity that we don't need. If we have a single file, then we should stick to a single file.

If the content is supposed to be checked in to the repository, then the relative path can be hard-coded, and UPLOAD_FILE_PATH can be used to override the base path.

Alternatively, the relative path could also be made configurable, with the hard-coded value as a default.
Here are the specific solutions:

1. Hardcode the upload file in the full-20250604.json5 file.
2. On the basis of solution one, add UPLOAD_FILE_PATH; if UPLOAD_FILE_PATH exists, it overrides solution one.
3. Hardcode the relative path in full-20250604.json5, while keeping the original logic for getting the specific file unchanged.

Which one do you prefer? In my view, solution one should be sufficient.
OK, let's do this:

- The configuration gets an option for this, with a default value that is relative to the root of the `main` function
- The application will evaluate a base path, which is either the current working directory, or the value provided by `UPLOAD_FILE_PATH` (if present)
- The configuration value will be resolved against the base path. If the configuration was an absolute path, then this is a no-op

I think this should be both simple and flexible.
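The proposed resolution could be sketched like this (an illustrative helper, not code from the PR; note that `Path::join` already returns the right-hand side unchanged when it is absolute, so the "no-op for absolute paths" rule comes for free):

```rust
use std::path::{Path, PathBuf};

/// Resolve the configured value against a base path. An absolute
/// configured value replaces the base entirely.
fn resolve_upload_file(base: &Path, configured: &str) -> PathBuf {
    base.join(configured)
}

fn main() {
    // The base path would be UPLOAD_FILE_PATH if set, else the current dir.
    let base = std::env::var("UPLOAD_FILE_PATH")
        .map(PathBuf::from)
        .unwrap_or_else(|_| std::env::current_dir().expect("no working directory"));

    let relative = resolve_upload_file(Path::new("/test-data"), "advisories/a.json");
    assert_eq!(relative, PathBuf::from("/test-data/advisories/a.json"));

    // An absolute configuration value wins over any base path.
    let absolute = resolve_upload_file(&base, "/etc/hosts");
    assert_eq!(absolute, PathBuf::from("/etc/hosts"));
}
```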
> The configuration gets an option for this, with a default value that is relative to the root of the `main` function

What does configuration refer to? Is it full-20250604.json5? Is it like retrieving the fields in Scenario?

> The application will evaluate a base path, which is either the current working directory, or the value provided by `UPLOAD_FILE_PATH` (if present)

The base path mentioned here is just a folder, right? It is not a specific file, but something like "/test-data/advisories", correct?

> The configuration value will be resolved against the base path. If the configuration was an absolute path, then this is a no-op

Is the configuration value mentioned here a specific file name, like rhba-2024_11505.json.xz?

For example, suppose there is a file named rhba-2024_11505.json.xz. It first needs to be defined in both full-20250604.json5 and Scenario. Then, if UPLOAD_FILE_PATH exists, it will be combined to form a complete relative path: UPLOAD_FILE_PATH + rhba-2024_11505.json.xz. However, if the config value itself is already a full absolute path, that config value is used directly. If UPLOAD_FILE_PATH does not exist, rhba-2024_11505.json.xz is assumed to be located in the current working directory. Is this the intended logic?
```rust
}

/// Check if data is XZ compressed by examining magic bytes
pub fn is_xz_compressed(data: &[u8]) -> bool {
```
Please move all compression functionality into a dedicated module.
```rust
let delete_counter = Arc::new(AtomicUsize::new(0));

let advisory_file = utils::load_advisory_files()?;
let advisory_file_path = advisory_file
```
Please keep this as a PathBuf; that's the native type to use. You can use &Path (similar to &str) for passing a non-owned reference.
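The owning/borrowing split for paths mirrors String vs &str; a small standalone illustration (the file name is taken from an example mentioned earlier in this thread):

```rust
use std::path::{Path, PathBuf};

// PathBuf owns the path; &Path borrows it, just like String vs &str.
// Functions that only read the path should accept &Path.
fn file_stem_of(path: &Path) -> Option<&str> {
    path.file_stem().and_then(|s| s.to_str())
}

fn main() {
    let advisory_file: PathBuf =
        PathBuf::from("test-data/advisories/rhba-2024_11505.json.xz");
    // A PathBuf coerces to &Path, so no conversion to String is needed.
    assert_eq!(file_stem_of(&advisory_file), Some("rhba-2024_11505.json"));
}
```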
I have added a new environment variable called UPLOAD_FILE_PATH, which represents the base path for all uploaded files (default value: "./test-data").
I have made corresponding changes based on Jens's comment (#74 (comment)). However, there is still an issue: if the upload has not fully completed, the delete operation may return a 404 error.
First, we can tolerate a certain number of errors. Second, we can add a sleep between the two REST requests; however, adding a sleep may affect the results.
The test results are as shown below:

Since the metrics distinguish between different requests, the results can also be considered acceptable.