|
| 1 | +# Error Handling Guide |
| 2 | + |
| 3 | +This guide establishes principles and best practices for error handling in the Torrust Tracker Deploy application, aligning with our [development principles](../development-principles.md) of observability, traceability, and actionability. |
| 4 | + |
| 5 | +## 🎯 Core Principles |
| 6 | + |
| 7 | +### 1. Clarity - No Ambiguity |
| 8 | + |
| 9 | +Errors must be clear and unambiguous. Users should immediately understand what went wrong without needing to guess or interpret vague messages. |
| 10 | + |
| 11 | +#### ✅ Good Examples |
| 12 | + |
| 13 | +```rust |
| 14 | +// Clear, specific error |
| 15 | +pub enum ConfigError { |
| 16 | + FileNotFound { path: PathBuf }, |
| 17 | + InvalidFormat { line: usize, reason: String }, |
| 18 | + MissingRequiredField { field: String }, |
| 19 | +} |
| 20 | +``` |
| 21 | + |
| 22 | +#### ❌ Bad Examples |
| 23 | + |
| 24 | +```rust |
| 25 | +// Vague, unclear error |
| 26 | +return Err("Something went wrong".into()); |
| 27 | +return Err("Invalid input".into()); |
| 28 | +return Err("Error".into()); |
| 29 | +``` |
| 30 | + |
| 31 | +### 2. Context and Traceability |
| 32 | + |
| 33 | +Errors should include sufficient context to make them easy to diagnose and fix. This aligns with our **Observability** and **Traceability** principles. |
| 34 | + |
| 35 | +#### Context Requirements |
| 36 | + |
| 37 | +- **What**: What operation was being performed? |
| 38 | +- **Where**: Which component, file, or resource was involved? |
| 39 | +- **When**: Under what conditions did this occur? |
| 40 | +- **Why**: What caused the error? |
| 41 | + |
| 42 | +#### ✅ Good Examples |
| 43 | + |
| 44 | +```rust |
| 45 | +pub enum ProvisioningError { |
| 46 | + InstanceAlreadyExists { |
| 47 | + instance_name: String, |
| 48 | + provider: String |
| 49 | + }, |
| 50 | + InvalidConfiguration { |
| 51 | + config_path: PathBuf, |
| 52 | + validation_errors: Vec<String> |
| 53 | + }, |
| 54 | + NetworkTimeout { |
| 55 | + operation: String, |
| 56 | + timeout_duration: Duration, |
| 57 | + endpoint: String |
| 58 | + }, |
| 59 | +} |
| 60 | + |
| 61 | +impl Display for ProvisioningError { |
| 62 | + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { |
| 63 | + match self { |
| 64 | + Self::InstanceAlreadyExists { instance_name, provider } => { |
| 65 | + write!(f, "Instance '{}' already exists in {} provider. Use a different name or remove the existing instance.", instance_name, provider) |
| 66 | + }, |
| 67 | + Self::InvalidConfiguration { config_path, validation_errors } => { |
| 68 | + write!(f, "Configuration file '{}' is invalid:\n{}", |
| 69 | + config_path.display(), |
| 70 | + validation_errors.join("\n")) |
| 71 | + }, |
| 72 | + Self::NetworkTimeout { operation, timeout_duration, endpoint } => { |
| 73 | + write!(f, "Network timeout during '{}' operation to '{}' after {:?}. Check network connectivity and endpoint availability.", |
| 74 | + operation, endpoint, timeout_duration) |
| 75 | + }, |
| 76 | + } |
| 77 | + } |
| 78 | +} |
| 79 | +``` |
| 80 | + |
| 81 | +### 3. Actionability |
| 82 | + |
| 83 | +Errors should be actionable, telling users how to fix them when possible. This aligns with our **Actionability** principle. |
| 84 | + |
| 85 | +#### Requirements |
| 86 | + |
| 87 | +- **Clear Instructions**: Provide specific steps to resolve the issue |
| 88 | +- **Command Examples**: Include exact commands when applicable |
| 89 | +- **Alternative Solutions**: Offer multiple approaches when possible |
| 90 | +- **Next Steps**: Guide users on what to do next |
| 91 | + |
| 92 | +#### ✅ Good Examples |
| 93 | + |
| 94 | +```rust |
| 95 | +pub enum DeploymentError { |
| 96 | + SshKeyNotFound { |
| 97 | + expected_path: PathBuf, |
| 98 | + alternative_paths: Vec<PathBuf> |
| 99 | + }, |
| 100 | + InsufficientPermissions { |
| 101 | + required_permissions: String, |
| 102 | + current_user: String |
| 103 | + }, |
| 104 | +} |
| 105 | + |
| 106 | +impl Display for DeploymentError { |
| 107 | + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { |
| 108 | + match self { |
| 109 | + Self::SshKeyNotFound { expected_path, alternative_paths } => { |
| 110 | + write!(f, "SSH key not found at '{}'. \n\nTo fix this:\n1. Generate a new SSH key: ssh-keygen -t rsa -b 4096 -f '{}'\n2. Or specify an existing key path using --ssh-key-path\n3. Alternative locations checked: {}", |
| 111 | + expected_path.display(), |
| 112 | + expected_path.display(), |
| 113 | + alternative_paths.iter().map(|p| format!("'{}'", p.display())).collect::<Vec<_>>().join(", ")) |
| 114 | + }, |
| 115 | + Self::InsufficientPermissions { required_permissions, current_user } => { |
| 116 | + write!(f, "User '{}' lacks required permissions: {}\n\nTo fix this:\n1. Add your user to the required group: sudo usermod -aG lxd {}\n2. Log out and log back in to apply group changes\n3. Or run with sudo (not recommended for regular use)", |
| 117 | + current_user, required_permissions, current_user) |
| 118 | + }, |
| 119 | + } |
| 120 | + } |
| 121 | +} |
| 122 | +``` |
| 123 | + |
| 124 | +## 🛠️ Implementation Guidelines |
| 125 | + |
| 126 | +### Prefer Explicit Enum Errors with Thiserror |
| 127 | + |
| 128 | +Use explicit, strongly-typed enum errors with the `thiserror` crate instead of generic string-based errors for better pattern matching, automatic `Display` implementation, and error handling. |
| 129 | + |
| 130 | +#### Using Thiserror for Error Definitions |
| 131 | + |
| 132 | +The `thiserror` crate provides powerful macros for defining structured errors: |
| 133 | + |
| 134 | +```rust |
| 135 | +use thiserror::Error; |
| 136 | + |
| 137 | +#[derive(Debug, Error)] |
| 138 | +pub enum TemplateManagerError { |
| 139 | + #[error("Failed to create templates directory: {path}")] |
| 140 | + DirectoryCreation { |
| 141 | + path: String, |
| 142 | + #[source] |
| 143 | + source: std::io::Error, |
| 144 | + }, |
| 145 | + |
| 146 | + #[error("Template file not found in embedded resources: {relative_path}")] |
| 147 | + TemplateNotFound { relative_path: String }, |
| 148 | + |
| 149 | + #[error("Invalid UTF-8 in embedded template: {relative_path}")] |
| 150 | + InvalidUtf8 { |
| 151 | + relative_path: String, |
| 152 | + #[source] |
| 153 | + source: std::str::Utf8Error, |
| 154 | + }, |
| 155 | + |
| 156 | + #[error("Failed to write template file: {path}")] |
| 157 | + TemplateWrite { |
| 158 | + path: String, |
| 159 | + #[source] |
| 160 | + source: std::io::Error, |
| 161 | + }, |
| 162 | +} |
| 163 | +``` |
| 164 | + |
| 165 | +#### Key Benefits of Thiserror |
| 166 | + |
| 167 | +- **Automatic `Display`**: The `#[error("...")]` attribute generates the `Display` implementation |
| 168 | +- **Source Error Chaining**: The `#[source]` attribute maintains error chains for traceability |
| 169 | +- **Structured Data**: Variant fields provide context and enable pattern matching |
| 170 | +- **Better Debugging**: Automatic `Error` trait implementation with proper source chaining |
| 171 | + |
| 172 | +#### When to Use Enum Errors |
| 173 | + |
| 174 | +- **Recoverable errors**: When callers can take specific actions based on error type |
| 175 | +- **Domain-specific errors**: When errors are specific to your application domain |
| 176 | +- **Pattern matching**: When you need different handling for different error cases |
| 177 | +- **API boundaries**: When errors cross module or crate boundaries |
| 178 | + |
| 179 | +```rust |
| 180 | +// ✅ Preferred: Explicit enum with context using thiserror |
| 181 | +#[derive(Debug, Error)] |
| 182 | +pub enum ConfigValidationError { |
| 183 | + #[error("Missing required field '{field}' in section '{section}'")] |
| 184 | + MissingField { field: String, section: String }, |
| 185 | + |
| 186 | + #[error("Field '{field}' has invalid value '{value}', expected: {expected}")] |
| 187 | + InvalidValue { field: String, value: String, expected: String }, |
| 188 | + |
| 189 | + #[error("Cannot access configuration file: {path}")] |
| 190 | + FileAccessError { |
| 191 | + path: PathBuf, |
| 192 | + #[source] |
| 193 | + source: std::io::Error |
| 194 | + }, |
| 195 | +} |
| 196 | + |
| 197 | +// Allows for precise error handling |
| 198 | +match config_result { |
| 199 | + Err(ConfigValidationError::MissingField { field, section }) => { |
| 200 | + println!("Please add '{}' to the '{}' section", field, section); |
| 201 | + }, |
| 202 | + Err(ConfigValidationError::InvalidValue { field, value, expected }) => { |
| 203 | + println!("Field '{}' has invalid value '{}', expected: {}", field, value, expected); |
| 204 | + }, |
| 205 | + // ... handle other cases |
| 206 | +} |
| 207 | +``` |
| 208 | + |
| 209 | +### Source Error Preservation for Traceability |
| 210 | + |
| 211 | +Always include source errors when wrapping underlying errors. This maintains the full error chain and enables complete traceability. |
| 212 | + |
| 213 | +#### ✅ Good: Preserve source errors |
| 214 | + |
| 215 | +```rust |
| 216 | +#[derive(Debug, Error)] |
| 217 | +pub enum DeploymentError { |
| 218 | + #[error("Failed to read SSH key from {path}")] |
| 219 | + SshKeyRead { |
| 220 | + path: PathBuf, |
| 221 | + #[source] // Preserves the original I/O error |
| 222 | + source: std::io::Error, |
| 223 | + }, |
| 224 | + |
| 225 | + #[error("Network operation '{operation}' failed")] |
| 226 | + NetworkError { |
| 227 | + operation: String, |
| 228 | + #[source] // Preserves the original network error |
| 229 | + source: reqwest::Error, |
| 230 | + }, |
| 231 | +} |
| 232 | + |
| 233 | +// Usage with source preservation |
| 234 | +fn read_ssh_key(path: &Path) -> Result<String, DeploymentError> { |
| 235 | + std::fs::read_to_string(path) |
| 236 | + .map_err(|source| DeploymentError::SshKeyRead { |
| 237 | + path: path.to_path_buf(), |
| 238 | + source, // Original error is preserved |
| 239 | + }) |
| 240 | +} |
| 241 | +``` |
| 242 | + |
| 243 | +#### ❌ Bad: Losing source information |
| 244 | + |
| 245 | +```rust |
| 246 | +// Don't do this - loses original error information |
| 247 | +fn read_ssh_key(path: &Path) -> Result<String, DeploymentError> { |
| 248 | + std::fs::read_to_string(path) |
| 249 | + .map_err(|e| DeploymentError::SshKeyRead { |
| 250 | + path: path.to_path_buf(), |
| 251 | + // Missing source - loses traceability! |
| 252 | + }) |
| 253 | +} |
| 254 | +``` |
| 255 | + |
| 256 | +### When to Use Anyhow |
| 257 | + |
| 258 | +Use `anyhow` only when the caller cannot do anything meaningful to handle different error types, even with pattern matching. |
| 259 | + |
| 260 | +#### Appropriate Use Cases |
| 261 | + |
| 262 | +- **Utility functions**: Internal helpers where specific error handling isn't needed |
| 263 | +- **One-way operations**: When all errors should bubble up unchanged |
| 264 | +- **Rapid prototyping**: Early development phases (but migrate to enums later) |
| 265 | +- **External library integration**: When wrapping third-party errors temporarily |
| 266 | + |
| 267 | +```rust |
| 268 | +// ✅ Acceptable: Internal utility where caller can't handle specifics |
| 269 | +fn read_and_parse_internal_cache() -> anyhow::Result<CacheData> { |
| 270 | + let content = std::fs::read_to_string("cache.json")?; |
| 271 | + let data = serde_json::from_str(&content)?; |
| 272 | + Ok(data) |
| 273 | +} |
| 274 | + |
| 275 | +// ✅ Public API should use enums |
| 276 | +pub fn load_user_config(path: &Path) -> Result<Config, ConfigError> { |
| 277 | + let content = std::fs::read_to_string(path) |
| 278 | + .map_err(|e| ConfigError::FileAccess { path: path.to_path_buf(), source: e })?; |
| 279 | + |
| 280 | + serde_json::from_str(&content) |
| 281 | + .map_err(|e| ConfigError::InvalidJson { path: path.to_path_buf(), source: e }) |
| 282 | +} |
| 283 | +``` |
| 284 | + |
| 285 | +### Error Conversion Patterns |
| 286 | + |
| 287 | +```rust |
| 288 | +// Convert anyhow errors to domain errors at boundaries |
| 289 | +impl From<anyhow::Error> for DeploymentError { |
| 290 | + fn from(err: anyhow::Error) -> Self { |
| 291 | + DeploymentError::InternalError { |
| 292 | + message: err.to_string(), |
| 293 | + context: format!("{:?}", err.chain().collect::<Vec<_>>()) |
| 294 | + } |
| 295 | + } |
| 296 | +} |
| 297 | +``` |
| 298 | + |
| 299 | +## 📋 Error Review Checklist |
| 300 | + |
| 301 | +When reviewing error handling code, verify: |
| 302 | + |
| 303 | +- [ ] **Clarity**: Is the error message clear and unambiguous? |
| 304 | +- [ ] **Context**: Does the error include sufficient context (what, where, when, why)? |
| 305 | +- [ ] **Actionability**: Does the error tell users how to fix it? |
| 306 | +- [ ] **Type Safety**: Are domain-specific errors using enums instead of strings? |
| 307 | +- [ ] **Thiserror Usage**: Are enum errors using `thiserror` with proper `#[error]` attributes? |
| 308 | +- [ ] **Source Preservation**: Are source errors preserved with `#[source]` for traceability? |
| 309 | +- [ ] **Pattern Matching**: Can callers handle different error cases appropriately? |
| 310 | +- [ ] **Consistency**: Does the error follow project conventions? |
| 311 | + |
| 312 | +## 🔗 Related Documentation |
| 313 | + |
| 314 | +- [Development Principles](../development-principles.md) - Core principles including observability and actionability |
| 315 | +- [Contributing Guidelines](./README.md) - General contribution guidelines |
| 316 | +- [Testing Conventions](./testing.md) - Testing error scenarios |
| 317 | + |
| 318 | +## 📚 Examples in Codebase |
| 319 | + |
| 320 | +Look for error handling examples in: |
| 321 | + |
| 322 | +- `src/domain/` - Domain-specific error enums |
| 323 | +- `src/infrastructure/` - Infrastructure and I/O error handling |
| 324 | +- `src/application/commands/` - Command-level error aggregation |
| 325 | + |
| 326 | +## 🚀 Best Practices Summary |
| 327 | + |
| 328 | +1. **Design errors first**: Consider error cases during API design |
| 329 | +2. **Use enums by default**: Only use `anyhow` when justified |
| 330 | +3. **Include context**: Always provide enough information for diagnosis |
| 331 | +4. **Make errors actionable**: Tell users how to fix the problem |
| 332 | +5. **Test error paths**: Write tests for error scenarios |
| 333 | +6. **Document error types**: Document when and why specific errors occur |
| 334 | + |
| 335 | +By following these guidelines, we ensure that errors in the Torrust Tracker Deploy application are not just informative, but truly helpful in guiding users toward solutions. |
0 commit comments