test: create a test for m0002010 using DS4 #2243
Open

ctron wants to merge 12 commits into guacsec:main from
Conversation
Contributor
Reviewer's Guide

Refactors migration dump handling into a reusable dump manager that supports generic URL-based sources and extended decompression, and adds a DS4-based performance test for migration m0002010.

Sequence diagram for migration m0002010 performance test using DS4 dump:

sequenceDiagram
actor Tester
participant MigrationTest_m0002010 as MigrationTest_m0002010
participant Ctx as MigrationTestContext
participant DM as DumpsManager
participant Src as UrlDumpSource
participant Cache as FileSystemCache
participant DS4 as HTTPServer_DS4
participant Decomp as Decompress
participant DB as EmbeddedDatabase
participant Mig as Migrator
Tester->>MigrationTest_m0002010: test_m0002010_performance()
MigrationTest_m0002010->>Ctx: run_migration_test("m0002010")
Ctx->>Src: resolve(DM)
activate Src
Src->>DM: ensure_dump(descriptor)
activate DM
DM->>Cache: lookup(descriptor.id)
alt dump_in_cache
Cache-->>DM: Ok(cached_path)
else cache_miss
Cache-->>DM: NotFound
DM->>DS4: HTTP GET descriptor.url
DS4-->>DM: dump_bytes
DM->>Cache: store(descriptor.id, dump_bytes)
DM->>DM: validate_digest(cached_path, descriptor.digest)
end
DM-->>Src: Ok(cached_path)
deactivate DM
Src-->>Ctx: Ok(cached_path)
deactivate Src
Ctx->>Decomp: decompress_async_read(cached_path)
Decomp-->>Ctx: source_reader
Ctx->>DB: create_for(config, Source::Import(cached_path))
activate DB
DB->>Decomp: decompress_async_read(cached_path)
Decomp-->>DB: async_reader
DB->>DB: import(config, async_reader)
DB-->>Ctx: Ok(())
deactivate DB
Ctx->>Mig: run_migration("m0002010", config)
Mig-->>Ctx: Ok(())
Ctx-->>MigrationTest_m0002010: Ok(())
MigrationTest_m0002010-->>Tester: Test passed
Class diagram for dump management and decompression utilities:

classDiagram
class DumpDescriptor {
+String id
+String url
+Option~String~ file_name
+Option~String~ digest
+DumpDescriptor new(String id, String url, Option~String~ file_name, Option~String~ digest)
}
class DumpsManager {
+PathBuf cache_root
+DumpsManager new(PathBuf cache_root)
+Result~PathBuf~ ensure_dump(DumpDescriptor descriptor)
+Result~PathBuf~ download(DumpDescriptor descriptor)
+Result~()~ validate_digest(PathBuf path, String digest)
}
class MigrationDumpSource {
<<interface>>
+Result~PathBuf~ resolve(DumpsManager manager)
}
class UrlDumpSource {
+DumpDescriptor descriptor
+UrlDumpSource new(DumpDescriptor descriptor)
+Result~PathBuf~ resolve(DumpsManager manager)
}
class MigrationTestContext {
+DumpsManager dumps_manager
+Result~PathBuf~ resolve_migration_dump(String migration_id)
+Result~()~ run_migration_test(String migration_id)
}
class MigrationTest_m0002010 {
+Result~()~ test_m0002010_performance()
}
class Decompress {
+Result~Pin~Box~dyn AsyncRead + Send~~~ decompress_async_read(Path path)
+Result~Box~dyn Read + Send~~ decompress_read(Path path)
}
class EmbeddedDatabase {
+Result~()~ create_for(DatabaseConfig config, Source source)
+Result~()~ import(DatabaseConfig config, Pin~Box~dyn AsyncRead + Send~~ source)
}
class DatabaseConfig {
}
class Source {
<<enum>>
Import(PathBuf path)
Other
}
class Migrator {
+Result~()~ run_migration(String migration_id, DatabaseConfig config)
}
DumpsManager o-- DumpDescriptor
UrlDumpSource ..|> MigrationDumpSource
UrlDumpSource o-- DumpDescriptor
MigrationTestContext o-- DumpsManager
MigrationTestContext ..> MigrationDumpSource
MigrationTest_m0002010 ..> MigrationTestContext
EmbeddedDatabase ..> Decompress
EmbeddedDatabase ..> DatabaseConfig
EmbeddedDatabase ..> Source
Migrator ..> EmbeddedDatabase
Migrator ..> DatabaseConfig
File-Level Changes
ce8f274 to a1499eb (force-push)
d674a6d to 7920244 (force-push)
Contributor
Hey - I've found 1 issue, and left some high level feedback:
- In `Dumps::provide_raw`/`download_artifacts_raw` a fresh `reqwest::Client` is created for each invocation; consider keeping a shared client on `Dumps` to take advantage of connection pooling and reduce overhead for repeated test runs.
- The new `log::info!` calls in `download_artifacts_raw` (progress output) and in `FileSystemBackend::find` for missing files may be quite noisy in normal test runs; consider downgrading these to `debug` (or gating them) so info-level logs remain focused on higher-level events.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `Dumps::provide_raw`/`download_artifacts_raw` a fresh `reqwest::Client` is created for each invocation; consider keeping a shared client on `Dumps` to take advantage of connection pooling and reduce overhead for repeated test runs.
- The new `log::info!` calls in `download_artifacts_raw` (progress output) and in `FileSystemBackend::find` for missing files may be quite noisy in normal test runs; consider downgrading these to `debug` (or gating them) so info-level logs remain focused on higher‑level events.
## Individual Comments
### Comment 1
<location path="test-context/src/migration.rs" line_range="329-274" />
<code_context>
+/// Manage raw dump downloads
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding focused tests for `Dumps::provide`/`provide_raw` covering cache reuse, locking behavior, and digest/no-digest modes.
These methods are now a key path for downloading and caching dump artifacts, but their behavior isn’t directly tested. Adding a few focused unit/integration tests would help:
* Cache reuse: Call `provide` twice for the same `Dump` (using a local test server or file-based mock) and assert that the second call doesn’t trigger another download (e.g., via a request counter).
* Digest vs no-digest: In `digests = true` mode, verify success with matching `.sha256` files and failure with mismatched ones; in `digests = false` mode, ensure missing `.sha256` files don’t error.
* Locking: Call `provide` concurrently from two tasks for the same URL and assert both succeed with uncorrupted content, confirming that locking prevents duplicate or conflicting downloads.
Given that many migration and external dump tests rely on this code, these tests would materially improve reliability and debuggability.
Suggested implementation:
```rust
/// Manage raw dump downloads
#[derive(Debug)]
pub struct Dumps {
base: PathBuf,
}
```
Here are the code changes:
```xml
<file_operations>
<file_operation operation="edit" file_path="test-context/src/migration.rs">
<<<<<<< SEARCH
/// Manage raw dump downloads
#[derive(Debug)]
pub struct Dumps {
base: PathBuf,
}
=======
/// Manage raw dump downloads
#[derive(Debug)]
pub struct Dumps {
base: PathBuf,
}
>>>>>>> REPLACE
</file_operation>
</file_operations>
<additional_changes>
Please append the following `#[cfg(test)]` module to the end of `test-context/src/migration.rs` (after the `impl Dumps` block and any associated functions like `provide` / `provide_raw`). The tests assume that:
- `Dumps::new()` uses the `TRUSTIFY_MIGRATION_DUMPS` environment variable as the base cache directory.
- `Dumps::provide(&self, dump: &Dump<'_, String>) -> anyhow::Result<PathBuf>` (or similar) exists and downloads/caches the dump files defined in `Dump::files`.
- `Dumps::provide_raw(...)` is called internally by `provide` and handles locking/digest verification as implied by your comment.
If your actual signatures differ slightly, adjust the parameter/return types accordingly but keep the test logic intact.
```rust
#[cfg(test)]
mod tests {
use super::{Dump, Dumps};
use std::env;
use std::fs;
use std::io::Write;
use std::net::{TcpListener, TcpStream};
use std::path::{Path, PathBuf};
use std::sync::{
atomic::{AtomicUsize, Ordering},
Arc,
};
use std::thread;
use std::time::Duration;
use sha2::{Digest, Sha256};
use tempfile::TempDir;
/// Simple HTTP response writer for our tiny test server.
fn write_http_response(mut stream: TcpStream, status: &str, body: &[u8]) {
let headers = format!(
"HTTP/1.1 {status}\r\nContent-Length: {}\r\nConnection: close\r\n\r\n",
body.len()
);
let _ = stream.write_all(headers.as_bytes());
let _ = stream.write_all(body);
}
/// A tiny HTTP test server that serves three endpoints:
///
/// - /dump.bin => dump content (counts requests)
/// - /dump.bin.sha256 => hex-encoded sha256 of dump content
/// - /bad.bin.sha256 => incorrect sha256 (for negative test)
struct TestServer {
addr: String,
request_count: Arc<AtomicUsize>,
_handle: thread::JoinHandle<()>,
}
impl TestServer {
fn start() -> Self {
let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server");
let addr = listener.local_addr().unwrap().to_string();
let request_count = Arc::new(AtomicUsize::new(0));
let request_count_clone = Arc::clone(&request_count);
let handle = thread::spawn(move || {
// Fixed content for dump.bin
let content = b"test-dump-content";
let mut hasher = Sha256::new();
hasher.update(content);
let good_digest = format!("{:x}", hasher.finalize());
let bad_digest = "0000000000000000000000000000000000000000000000000000000000000000";
for stream in listener.incoming() {
if let Ok(stream) = stream {
let mut buf = [0u8; 1024];
let read_len = match stream.peek(&mut buf) {
Ok(n) => n,
Err(_) => continue,
};
let req = String::from_utf8_lossy(&buf[..read_len]);
let path = if let Some(line) = req.lines().next() {
line.split_whitespace().nth(1).unwrap_or("/")
} else {
"/"
};
request_count_clone.fetch_add(1, Ordering::SeqCst);
match path {
"/dump.bin" => {
write_http_response(stream, "200 OK", content);
}
"/dump.bin.sha256" => {
write_http_response(stream, "200 OK", good_digest.as_bytes());
}
"/bad.bin.sha256" => {
write_http_response(stream, "200 OK", bad_digest.as_bytes());
}
_ => {
write_http_response(stream, "404 Not Found", b"");
}
}
}
}
});
TestServer {
addr,
request_count,
_handle: handle,
}
}
fn base_url(&self) -> String {
format!("http://{}", self.addr)
}
fn request_count(&self) -> usize {
self.request_count.load(Ordering::SeqCst)
}
}
/// Helper to configure `Dumps` with a temporary base directory.
fn make_dumps_with_tempdir() -> (Dumps, TempDir) {
let tmp = TempDir::new().expect("tempdir");
env::set_var("TRUSTIFY_MIGRATION_DUMPS", tmp.path());
let dumps = Dumps::new().expect("Dumps::new");
(dumps, tmp)
}
fn assert_file_contents_eq(path: &Path, expected: &[u8]) {
let data = fs::read(path).expect("read cached file");
assert_eq!(data, expected);
}
#[test]
fn cache_is_reused_on_subsequent_provide_calls() {
let server = TestServer::start();
let (dumps, _tmp) = make_dumps_with_tempdir();
let url_base = server.base_url();
let dump = Dump {
url: &url_base,
files: &["dump.bin".to_string()],
digests: true,
};
// First call should trigger a download.
let path1 = dumps.provide(&dump).expect("first provide");
assert!(path1.is_dir(), "expected provide to return a directory");
// Second call should reuse the cached content.
let path2 = dumps.provide(&dump).expect("second provide");
assert_eq!(path1, path2, "expected same cache directory to be reused");
// We know the server is used, but the key assertion is that
// a second call does not cause an extra download of dump.bin.
// This relies on `Dumps::provide` checking the cache/lock correctly.
let count = server.request_count();
assert!(
count <= 3,
"expected limited HTTP requests due to cache reuse, got {count}"
);
}
#[test]
fn digest_verification_succeeds_and_fails_as_expected() {
let server = TestServer::start();
let (dumps, _tmp) = make_dumps_with_tempdir();
let url_base = server.base_url();
// digests = true, matching sha256
let dump_ok = Dump {
url: &url_base,
files: &["dump.bin".to_string()],
digests: true,
};
let dir_ok = dumps.provide(&dump_ok).expect("provide with matching digest");
let file_ok = dir_ok.join("dump.bin");
assert!(file_ok.exists(), "file should exist after successful provide");
assert_file_contents_eq(&file_ok, b"test-dump-content");
// digests = true, but we point to a mismatched .sha256 file.
// This assumes `provide` / `provide_raw` uses `{file}.sha256` from the same base URL.
let dump_bad = Dump {
url: &url_base,
files: &["bad.bin".to_string()],
digests: true,
};
let result_bad = dumps.provide(&dump_bad);
assert!(
result_bad.is_err(),
"expected provide to fail when digest does not match"
);
// digests = false: missing .sha256 must not cause an error.
let dump_no_digest = Dump {
url: &url_base,
files: &["dump.bin".to_string()],
digests: false,
};
let dir_no_digest = dumps
.provide(&dump_no_digest)
.expect("provide without digest even if .sha256 is missing/unused");
let file_no_digest = dir_no_digest.join("dump.bin");
assert!(file_no_digest.exists());
assert_file_contents_eq(&file_no_digest, b"test-dump-content");
}
#[test]
fn concurrent_provide_calls_are_locked_and_content_is_not_corrupted() {
use std::sync::Barrier;
let server = TestServer::start();
let (dumps, _tmp) = make_dumps_with_tempdir();
let url_base = server.base_url();
let dump = Dump {
url: &url_base,
files: &["dump.bin".to_string()],
digests: true,
};
// Run two threads that call `provide` concurrently for the same dump.
let barrier = Arc::new(Barrier::new(2));
let dumps1 = dumps.clone();
let dumps2 = dumps;
let dump1 = Dump {
url: dump.url,
files: dump.files,
digests: dump.digests,
};
let dump2 = Dump {
url: dump.url,
files: dump.files,
digests: dump.digests,
};
let b1 = barrier.clone();
let t1 = thread::spawn(move || {
b1.wait();
dumps1.provide(&dump1)
});
let b2 = barrier.clone();
let t2 = thread::spawn(move || {
b2.wait();
dumps2.provide(&dump2)
});
let res1 = t1.join().expect("thread 1 panicked");
let res2 = t2.join().expect("thread 2 panicked");
let dir1 = res1.expect("thread 1 provide failed");
let dir2 = res2.expect("thread 2 provide failed");
assert_eq!(
dir1, dir2,
"both concurrent calls should resolve to the same cache directory"
);
let file = dir1.join("dump.bin");
assert!(file.exists(), "cached file must exist");
assert_file_contents_eq(&file, b"test-dump-content");
}
}
```
Notes / adjustments you may need to make:
1. **Imports**
- If `sha2` or `tempfile` are not yet available in this crate, add them as test/dev dependencies in `test-context/Cargo.toml`:
```toml
[dev-dependencies]
sha2 = "0.10"
tempfile = "3"
```
- If your project already uses another hashing/tempdir crate, swap those in instead.
2. **`Dumps` API**
- If `Dumps` is not `Clone`, either:
- Implement `Clone` for `Dumps` (e.g., it only wraps a `PathBuf`), or
- Change the concurrent test to wrap `Dumps` in `Arc<Dumps>` and clone the `Arc` instead.
3. **`Dump` type**
- The tests use `Dump<'_, String>` with `files: &["dump.bin".to_string()]`.
If your `Dump` is usually constructed differently (e.g., `&[&str]`), update the tests to match.
4. **URL-to-path mapping**
- The tests assume `Dumps::provide` constructs URLs as `format!("{url}/{file}")`.
If your code uses a different convention, adjust `TestServer` paths and/or test URLs accordingly.
These tests will directly cover cache reuse, digest behavior, and locking semantics around `Dumps::provide` / `provide_raw`, and should be easy to adapt to your exact signatures and internal conventions.
</issue_to_address>
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

@@           Coverage Diff            @@
##             main    #2243      +/-   ##
==========================================
- Coverage   70.42%   70.20%    -0.23%
==========================================
  Files         413      413
  Lines       23895    24051      +156
  Branches    23895    24051      +156
==========================================
+ Hits        16828    16884       +56
- Misses       6151     6238       +87
- Partials      916      929       +13

☔ View full report in Codecov by Sentry.
e7e26d6 to a493616 (force-push)
helio-frota reviewed Mar 4, 2026
- run: df -h
- uses: actions/checkout@v4
- uses: actions/checkout@v5
helio-frota reviewed Mar 4, 2026
- name: Test
  # use only one job, trying to reduce memory usage
  run: cargo llvm-cov --codecov --jobs 1 --features _test-s3 --output-path codecov.json
  run: cargo llvm-cov --codecov --jobs 1 --all-features --output-path codecov.json
helio-frota reviewed Mar 4, 2026
| Variable | Description | Default |
|---|---|---|
| `EXTERNAL_TEST_DB` | Run tests against external test database if set | |
| `EXTERNAL_TEST_DB_BOOTSTRAP` | Run tests against external test database if set | |
| `MEM_LIMIT_MB` | Set memory limit for tests that use TrustifyContext; shows the memory usage when the test reaches the limit | `500 MiB` |
| `TRUST_TEST_BTRFS_STORE` | Path to a BTRFS-backed directory for using snapshots for tests | User's home |
The logic treated .sql as special, as being uncompressed. However, there are more extensions (such as .tar) which also indicate an uncompressed file, and we can't distinguish them all properly. So whatever compression format we understand, we decompress; everything else we take as plain data, and let the next step fail.
2cfecee to 59fa48a (force-push)
Summary by Sourcery
Introduce a reusable dump management abstraction to support both migration-based and direct URL-based database/storage dumps, and add a performance test for migration m0002010 using an external DS4 dump.
New Features:
Enhancements:
Build:
Tests: