diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/LICENSE b/security/security-design/shared-assets/oci-security-health-check-forensics/LICENSE new file mode 100644 index 000000000..5c3003e43 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. + +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. + +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/README.md b/security/security-design/shared-assets/oci-security-health-check-forensics/README.md new file mode 100644 index 000000000..62ba840da --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/README.md @@ -0,0 +1,284 @@ +# SHOW_OCI CSV Query Tool + +The SHOW_OCI Query Tool is designed to load and analyze data from Oracle Cloud Infrastructure (OCI) environments using SQL. This tool enables users to import CSV files containing OCI resource information (e.g., compute instances, users, compartments) and perform SQL queries on the data. + +## Features +- Automatic OCI data fetching using showoci integration +- **Audit events** and **Cloud Guard problems** fetching with parallel processing +- Advanced filtering capabilities for age-based and compartment analysis +- - Load CSV files with OCI data from multiple tenancies +- Execute SQL queries on the loaded data using DuckDB backend. Stay tuned for autonomous DB support. 
+- Support for `SHOW TABLES` and `DESCRIBE table_name` commands
+- Interactive tenancy selection from combined OCI configuration files
+- Command history and help system
+- Batch query execution from YAML files
+
+The tool is intended for forensic purposes: data can be collected by the customer and shipped to Oracle for forensic analysis.
+
+The tool is under development; the following items are on the backlog:
+- Switch the back-end database for large data sets (ADB support)
+- Customer documentation for extracting data and shipping it to Oracle securely
+
+## Known Errors
+- An error is shown when a query returns an empty data frame and a filter is applied.
+
+## Installation
+
+Clone the repository:
+```bash
+git clone <repository-url>
+cd healthcheck-forensic
+```
+
+Set up a Python virtual environment and install dependencies:
+```bash
+python3.10 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+The `requirements.txt` file contains dependencies for DuckDB, pandas, the OCI SDK, and other required libraries.
+
+### OCI Configuration Files
+
+The tool supports a split OCI configuration:
+
+- **`~/.oci/config`**: Contains only the DEFAULT domain configuration
+- **`qt_config`**: Contains additional tenancy configurations
+
+The tool automatically combines these files when selecting tenancies. This separation keeps your main OCI config clean while multiple tenancies are managed in a separate file.
+
+## Usage
+
+### Running the Tool
+
+To start the tool, use:
+```bash
+python healthcheck_forensic_tool.py
+```
+
+### Interactive Mode
+
+The tool supports an interactive mode for running SQL queries dynamically. Available commands include:
+
+#### Basic Commands
+- `show tables`: Lists all loaded tables
+- `describe <table_name>`: Displays columns and data types for a given table
+- `history`: Shows command history
+- `help [command]`: Shows help for commands
+- `exit` or `quit`: Exits the application
+
+#### Data Management
+- `set tenancy`: Switch between different OCI tenancies
+- `set queries [directory]`: Load queries from YAML files for batch execution
+- `run queries`: Execute all loaded queries in sequence
+
+#### Data Fetching
+- `audit_events fetch <end_date> <days>`: Fetch audit events for the given number of days prior to the specified end date (`DD-MM-YYYY`)
+- `audit_events fetch`: Interactive loader for existing audit data
+- `audit_events delete`: Delete audit events files and tables
+- `cloudguard fetch <end_date> <days>`: Fetch Cloud Guard problems for the given number of days prior to the specified end date (`DD-MM-YYYY`)
+- `cloudguard fetch`: Interactive loader for existing Cloud Guard data
+- `cloudguard delete`: Delete Cloud Guard files and tables
+
+#### Filtering and Analysis
+- `filter age <column> <older|younger> <days>`: Filter the last query result by date age
+- `filter compartment <option>`: Analyze compartment structures
+  - `root`: Show root compartment
+  - `depth`: Show maximum depth
+  - `tree_view`: Display compartment tree
+  - `path_to <compartment_name>`: Show the path to a specific compartment
+
+### Command-line Switches
+
+| Switch           | Description                                        |
+|------------------|----------------------------------------------------|
+| `--config-file`  | Path to the configuration file (`config.yaml`).    |
+| `--interactive`  | Enable interactive SQL mode.                       |
+
+Example usage:
+```bash
+python healthcheck_forensic_tool.py --config-file config.yaml --interactive
+```
+
+## Configuration Options (`config.yaml`)
+
+| Setting                    | Description |
+|----------------------------|-------------|
+| `oci_config_file`          | Path to the main OCI config file (default: `~/.oci/config`) |
+| `tqt_config_file`          | Path to the additional tenancies config file (default: `config/qt_config`) |
+| `csv_dir`                  | Directory for CSV files |
+| `prefix`                   | Filename prefix for filtering CSVs |
+| `resource_argument`        | Resource argument for showoci (a: all, i: identity, n: network, c: compute, etc.) |
+| `delimiter`                | Delimiter used in CSV files |
+| `case_insensitive_headers` | Convert column headers to lowercase |
+| `log_level`                | Logging level (`INFO`, `DEBUG`, `ERROR`) |
+| `interactive`              | Enable interactive mode |
+| `audit_worker_count`       | Number of parallel workers for audit/Cloud Guard fetching (default: 10) |
+| `audit_worker_window`      | Hours per batch for parallel fetching (default: 1) |
+
+### Example `config.yaml`
+```yaml
+# OCI Configuration
+oci_config_file: "~/.oci/config"   # Main OCI config (DEFAULT domain)
+tqt_config_file: "qt_config"       # Additional tenancies
+
+# Data Management
+csv_dir: "data"
+prefix: "oci"
+resource_argument: "a"
+
+# Output Settings
+output_format: "DataFrame"
+log_level: "INFO"
+delimiter: ","
+case_insensitive_headers: true
+
+# Interactive Mode
+interactive: true
+
+# Parallel Fetching Configuration
+audit_worker_count: 10
+audit_worker_window: 1
+```
+
+## Predefined Queries
+
+Queries can be defined in YAML files for batch execution. Example `queries.yaml`:
+```yaml
+queries:
+  - description: "List all users with API access"
+    sql: "SELECT display_name, can_use_api_keys FROM identity_domains_users WHERE can_use_api_keys = 1"
+  - description: "Show compute instances by compartment"
+    sql: "SELECT server_name, compartment_name, status FROM compute WHERE status = 'STOPPED'"
+    filter: "age last_modified older 30"
+  - description: "Show compute instances in a specific compartment"
+    sql: "SELECT server_name, compartment_name, status FROM compute WHERE compartment_name = '<compartment_name>'"
+```
+
+## Example Usage Scenarios
+
+### Getting Started
+```bash
+# Start the tool
+python healthcheck_forensic_tool.py
+
+# Select tenancy and load data
+# Tool will prompt for tenancy selection from combined configs
+
+# Basic exploration
+CMD> show tables
+CMD> describe identity_domains_users
+CMD> SELECT COUNT(*) FROM compute;
+```
+
+### Data Fetching
+```bash
+# Fetch 2 days of audit events ending June 15, 2025
+CMD> audit_events fetch 15-06-2025 2
+
+# Fetch 30 days of Cloud Guard problems ending January 1, 2025
+CMD> cloudguard fetch 01-01-2025 30
+
+# Load existing audit data interactively
+CMD> audit_events fetch
+```
+
+### Advanced Analysis
+```bash
+# Filter API keys older than 90 days
+CMD> SELECT display_name, api_keys FROM identity_domains_users;
+CMD> filter age api_keys older 90
+
+# Analyze compartment structure
+CMD> SELECT path FROM identity_compartments;
+CMD> filter compartment tree_view
+CMD> filter compartment path_to my-compartment
+```
+
+### Batch Operations
+```bash
+# Load and run predefined queries
+CMD> set queries < Select a query file using the query file browser >
+CMD> run queries
+
+# Switch between tenancies
+CMD> set tenancy
+```
+
+## Data Organization
+
+The tool organizes data in the following structure:
+```
+data/
+├── tenancy1/
+│   ├── tenancy1_20241215_143022/
+│   │   ├── oci_compute.csv
+│   │   ├── oci_identity_domains_users.csv
+│   │   ├── audit_events_15-06-2025_7.json
+│   │   └── cloudguard_problems_15062025_7.json
+│   └──
tenancy1_20241214_091545/ +└── tenancy2/ + └── tenancy2_20241215_100530/ +``` + +## Logging + +Logging is configured via the `log_level` setting in `config.yaml`. The tool provides detailed logging for: +- Configuration loading and validation +- CSV file loading and table creation +- Query execution and results +- Data fetching operations with progress tracking +- Error handling and troubleshooting information + +## Troubleshooting + +### Common Issues + +**OCI Configuration Problems** +- Ensure both `~/.oci/config` and `config/qt_config` exist and are readable +- Verify that tenancy profiles are properly configured with required keys +- Check that API keys and permissions are correctly set up + +**CSV Loading Issues** +- Ensure CSV files are properly formatted with consistent delimiters +- Column names in queries should match those in the loaded data (case-sensitive by default) +- Check that the specified prefix matches your CSV file naming convention + +**Data Fetching Problems** +- Verify OCI permissions for audit events and Cloud Guard APIs +- Check network connectivity and OCI service availability +- Ensure the date range doesn't exceed OCI's retention periods (365 days for audit events) + +**Query Execution** +- Use DuckDB-compatible SQL syntax +- Table names are derived from CSV filenames (minus prefix and extension) +- Check available tables with `show tables` and column structure with `describe ` + +### Getting Help + +For detailed command help: +```bash +CMD> help # Show all commands +CMD> help audit_events fetch # Show audit_events fetch options +CMD> help filter age # Show filter age options +``` + +## Advanced Features + +### Parallel Data Fetching +The tool supports parallel fetching for large datasets: +- Configurable worker count and time windows +- Progress tracking with detailed summaries +- Automatic retry handling for failed intervals +- Clean temporary file management + +### Smart Configuration Management +- Automatic detection and combination of split OCI configs +- Interactive tenancy selection with metadata display +- Temporary file creation for showoci integration +- Graceful handling of missing or invalid configurations + +### Comprehensive Filtering +- Date-based filtering with flexible column support +- Compartment hierarchy analysis and visualization +- Support for complex nested data structures +- Chainable filter operations on query results \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/__init__.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/__init__.py new file mode 100644 index 000000000..c4c5fe72a --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/__init__.py @@ -0,0 +1 @@ +from .output_formatter import OutputFormatter \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/api_key_filter.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/api_key_filter.py new file mode 100644 index 000000000..3aa6ad55c --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/api_key_filter.py @@ -0,0 +1,144 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. 
+This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +api_key_filter.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import pandas as pd +from datetime import datetime, timedelta +import re + +class ApiKeyFilter: + def __init__(self, column_name='api_keys', age_days=90, mode='older'): + """ + Initialize the ApiKeyFilter. + + Parameters: + - column_name (str): The name of the column containing API keys. + - age_days (int): The age threshold in days. + - mode (str): Either 'older' or 'younger' to filter dates accordingly. + 'older' shows dates older than age_days + 'younger' shows dates younger than or equal to age_days + """ + self.column_name = column_name + self.age_days = age_days + self.mode = mode.lower() + self.age_months = self.calculate_months(age_days) + + @staticmethod + def calculate_months(age_days): + """ + Calculate the number of months from the given days. + + Parameters: + - age_days (int): The number of days. + + Returns: + - int: The equivalent number of months. + """ + return (age_days + 29) // 30 # Round up to the nearest month + + def filter(self, df): + """ + Filter the DataFrame based on the age of API keys. + + Parameters: + - df (pd.DataFrame): The DataFrame to filter. + + Returns: + - pd.DataFrame: The filtered DataFrame. + """ + # Define the date threshold + today = datetime.now() + threshold_date = today - timedelta(days=self.age_days) + + # Check if the specified column exists in the DataFrame + if self.column_name not in df.columns: + print(f"Error: Column '{self.column_name}' does not exist in the DataFrame.") + return df + + # Extract the dates from the specified column + def extract_dates(key_str): + dates = [] + if pd.isnull(key_str): + return dates + + # Handle different formats by splitting entries by comma + entries = [entry.strip() for entry in key_str.split(',') if entry.strip()] + + date_formats = ['%Y-%m-%d %H:%M:%S', '%Y-%m-%d %H:%M'] + date_pattern = r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}(?::\d{2})?)' + + for entry in entries: + try: + # Case 1: Just a date string + if re.match(r'^\d{4}-\d{2}-\d{2} \d{2}:\d{2}(:\d{2})?$', entry.strip()): + for fmt in date_formats: + try: + date = datetime.strptime(entry.strip(), fmt) + dates.append(date) + break + except ValueError: + continue + + # Case 2: OCID with date (separated by spaces) + else: + # Look for date pattern in the entry + date_matches = re.findall(date_pattern, entry) + if date_matches: + for date_str in date_matches: + for fmt in date_formats: + try: + date = datetime.strptime(date_str, fmt) + dates.append(date) + break + except ValueError: + continue + # Fall back to original colon-based parsing if no date pattern found + elif ':' in entry: + # Split on the first occurrence of ':' + parts = entry.split(':', 1) + if len(parts) > 1: + date_part = parts[1].strip() + for fmt in date_formats: + try: + date = datetime.strptime(date_part, fmt) + dates.append(date) + break + except ValueError: + continue + else: + print(f"Warning: No valid date format found in entry: '{entry}'") + except Exception as e: + print(f"Error parsing date in entry: '{entry}', error: {e}") + + return dates + + # Apply the date extraction to the specified column + df['key_dates'] = df[self.column_name].apply(extract_dates) + + # Determine if any keys match the age criteria based on mode + def 
check_dates(dates_list): + if not dates_list: + return False + for date in dates_list: + if self.mode == 'older' and date <= threshold_date: + return True + elif self.mode == 'younger' and date >= threshold_date: # Changed from > to >= for inclusive younger + return True + return False + + # Apply the filter to the DataFrame + mask = df['key_dates'].apply(check_dates) + + # Keep rows where the condition is met + filtered_df = df[mask].copy() + + # Drop the temporary 'key_dates' column + filtered_df.drop(columns=['key_dates'], inplace=True) + + return filtered_df \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/audit_fetcher.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/audit_fetcher.py new file mode 100644 index 000000000..48666e8e4 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/audit_fetcher.py @@ -0,0 +1,403 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +audit_fetcher.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import glob +import json +import logging +import os +import time + +from datetime import datetime, timedelta, timezone +from concurrent.futures import ThreadPoolExecutor, as_completed +from typing import List, Tuple + +import oci +from oci.util import to_dict + +# Configure module-level logger +logger = logging.getLogger(__name__) +logger.setLevel(logging.INFO) + +class AuditFetcher: + """ + Fetch OCI Audit logs in parallel batches and consolidate into a single JSON file. + The window is completely prior to the reference_date (end date). + + Attributes: + reference_date (datetime): End date for retrieval window (UTC). + window (int): Total window size in days prior to reference_date. + workers (int): Number of parallel worker threads. + worker_window (int): Hours per batch. + config (dict): OCI config loaded from file. + compartment_id (str): Tenancy OCID from config. + audit_client (AuditClient): OCI Audit client. + intervals (List[Tuple[datetime, datetime]]): List of (start, end) batches. + verbose (bool): Whether to print detailed progress messages. + status_messages (List[str]): Collected status messages for summary. 
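+
+    Example (illustrative sketch; assumes a valid profile in ~/.oci/config and the
+    defaults documented in the README, not a prescribed invocation):
+
+        fetcher = AuditFetcher(
+            reference_date="15-06-2025",  # end date, DD-MM-YYYY
+            window=7,                     # days of history prior to the end date
+            workers=10,                   # parallel worker threads
+            worker_window=1,              # hours covered by each batch
+            verbose=False,
+        )
+        output_path, failed = fetcher.run("data/audit_events_15-06-2025_7.json")
+        if failed:
+            fetcher.retry_failed_timeframes(failed, output_file="data/audit_events_retry.json")
+        fetcher.cleanup()  # remove per-batch temporary files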
+ """ + def __init__( + self, + reference_date: str, + window: int, + workers: int, + worker_window: int, + profile_name: str = "DEFAULT", + config_file: str = None, + verbose: bool = True + ): + # Parse reference date (this becomes the END date) + try: + self.reference_date = datetime.strptime(reference_date, "%d-%m-%Y").replace(tzinfo=timezone.utc) + except ValueError as ve: + raise ValueError(f"Invalid reference_date format: {ve}") + + self.window = window + self.workers = workers + self.worker_window = worker_window + self.verbose = verbose # Set from parameter instead of defaulting to True + self.status_messages = [] # Store messages for later summary + + # Calculate start and end times (window days BEFORE reference_date) + self.end_time = self.reference_date.replace( + hour=0, minute=0, second=0, microsecond=0 + ) + self.start_time = (self.reference_date - timedelta(days=window)).replace( + hour=0, minute=0, second=0, microsecond=0 + ) + + self._log(f"Audit search window: {self.start_time.strftime('%Y-%m-%d %H:%M:%S UTC')} to {self.end_time.strftime('%Y-%m-%d %H:%M:%S UTC')}") + self._log(f"Window duration: {window} days prior to {self.reference_date.strftime('%Y-%m-%d')}") + + # Load OCI configuration + if config_file: + cfg_location = os.path.expanduser(config_file) + self.config = oci.config.from_file(file_location=cfg_location, profile_name=profile_name) + else: + self.config = oci.config.from_file(profile_name=profile_name) + + self.compartment_id = self.config.get("tenancy") + self.audit_client = oci.audit.AuditClient( + self.config, + retry_strategy=oci.retry.DEFAULT_RETRY_STRATEGY + ) + + # Prepare batch intervals + self.intervals = self._generate_intervals() + + def _log(self, message, level="INFO"): + """Store messages for later display instead of immediate printing when in quiet mode""" + self.status_messages.append(f"[{level}] {message}") + + # Only print immediately if in verbose mode + if self.verbose: + if level == "ERROR": + print(f"ERROR: {message}") + else: + print(message) + + def _generate_intervals(self) -> List[Tuple[datetime, datetime]]: + """Generate a list of (start, end) datetime tuples for each worker batch.""" + intervals: List[Tuple[datetime, datetime]] = [] + current = self.start_time + delta = timedelta(hours=self.worker_window) + + if self.verbose: + self._log(f"Generating audit intervals with {self.worker_window}-hour chunks...") + + while current < self.end_time: + next_end = min(current + delta, self.end_time) + intervals.append((current, next_end)) + if self.verbose: + self._log(f" Interval: {current.strftime('%Y-%m-%d %H:%M')} to {next_end.strftime('%Y-%m-%d %H:%M')}") + current = next_end + + self._log(f"Total audit intervals: {len(intervals)}") + return intervals + + def _fetch_and_write_events(self, start: datetime, end: datetime) -> Tuple[bool, str, str]: + """ + Fetch audit events for a single time window and write to a temp JSON file. 
+ + Returns: + tuple: (success: bool, result: str, timeframe_string: str) + """ + timeframe_string = f"{start.strftime('%d-%m-%Y %H:%M')},{end.strftime('%d-%m-%Y %H:%M')}" + + try: + # Only log fetch attempts in verbose mode + if self.verbose: + self._log(f"Fetching audit events from {start.strftime('%Y-%m-%d %H:%M')} to {end.strftime('%Y-%m-%d %H:%M')}") + + # Use OCI pagination helper to get all events in this interval + response = oci.pagination.list_call_get_all_results( + self.audit_client.list_events, + compartment_id=self.compartment_id, + start_time=start, + end_time=end + ) + events = response.data + logger.info(f"Fetched {len(events)} events from {start} to {end}") + + # Convert to serializable dicts + dicts = [to_dict(ev) for ev in events] + + # Write to temporary file + filename = f"audit_events_{start.strftime('%Y-%m-%dT%H-%M')}_to_{end.strftime('%Y-%m-%dT%H-%M')}.json" + with open(filename, 'w', encoding='utf-8') as f: + json.dump(dicts, f, indent=2) + + # Store detailed results for summary (always store, regardless of verbose mode) + result_msg = f"✓ {start.strftime('%Y-%m-%d %H:%M')}-{end.strftime('%H:%M')}: {len(dicts)} events → {filename}" + self.status_messages.append(result_msg) + + # Only print immediately if verbose + if self.verbose: + print(f" → Found {len(dicts)} audit events, saved to {filename}") + + return (True, filename, timeframe_string) + + except Exception as e: + error_msg = f"Error fetching audit events {start.strftime('%Y-%m-%d %H:%M')} to {end.strftime('%Y-%m-%d %H:%M')}: {e}" + logger.error(error_msg) + self.status_messages.append(f"{error_msg}") + + if self.verbose: + print(error_msg) + return (False, error_msg, timeframe_string) + + def run(self, output_file: str, progress_callback=None) -> Tuple[str, List[str]]: + """ + Execute the fetcher across all intervals and consolidate into a single JSON file. + + Args: + output_file (str): Path to final consolidated JSON file. + progress_callback (callable): Optional function called with each completed batch index. 
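+
+        Example (illustrative): the audit_events fetch command drives this with a
+        tqdm progress bar, roughly:
+
+            with tqdm(total=len(fetcher.intervals)) as pbar:
+                fetcher.run(output_path, progress_callback=lambda idx: pbar.update(1))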
+ + Returns: + tuple: (output_file_path: str, failed_timeframes: list) + """ + if self.verbose: + print(f"\nStarting parallel audit fetch with {self.workers} workers...") + print(f"Target output file: {output_file}") + + temp_files: List[str] = [] + failed_timeframes: List[str] = [] + + # Parallel fetching + with ThreadPoolExecutor(max_workers=self.workers) as executor: + future_to_idx = { + executor.submit(self._fetch_and_write_events, s, e): idx + for idx, (s, e) in enumerate(self.intervals) + } + + completed = 0 + total = len(self.intervals) + + for future in as_completed(future_to_idx): + idx = future_to_idx[future] + try: + success, result, timeframe_string = future.result() + + if success: + temp_files.append(result) + else: + failed_timeframes.append(timeframe_string) + # Store failure in status messages + self.status_messages.append(f"FAILED AUDIT TIMEFRAME: {timeframe_string}") + + # Only print immediately if verbose + if self.verbose: + print(f"FAILED AUDIT TIMEFRAME: {timeframe_string}") + print(f"Error: {result}") + + completed += 1 + + # Only show progress in verbose mode (progress bar handles this in quiet mode) + if self.verbose: + print(f"Progress: {completed}/{total} audit intervals completed") + + if progress_callback: + progress_callback(idx) + + except Exception as e: + logger.error(f"Audit batch {idx} exception: {e}") + if self.verbose: + print(f"EXCEPTION in audit batch {idx}: {e}") + + # Consolidate + self._log(f"Consolidating {len(temp_files)} audit temporary files...") + all_events = [] + + for tf in temp_files: + try: + with open(tf, 'r', encoding='utf-8') as f: + batch_events = json.load(f) + all_events.extend(batch_events) + if self.verbose: + self._log(f" → Added {len(batch_events)} audit events from {tf}") + except Exception as e: + logger.error(f"Error reading audit temp file {tf}: {e}") + self._log(f"Error reading audit temp file {tf}: {e}", "ERROR") + + # Sort by event_time if present + self._log(f"Sorting {len(all_events)} total audit events by event time...") + all_events.sort(key=lambda ev: ev.get('eventTime', ev.get('event_time', ''))) + + # Write final file + try: + os.makedirs(os.path.dirname(output_file), exist_ok=True) + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(all_events, f, indent=2) + logger.info(f"Consolidated {len(all_events)} events to {output_file}") + + self._log(f"✓ Consolidated audit file written: {output_file}") + self._log(f"✓ Total audit events found: {len(all_events)}") + + # Show date range of actual data + if all_events: + first_event = all_events[0].get('eventTime', all_events[0].get('event_time', 'Unknown')) + last_event = all_events[-1].get('eventTime', all_events[-1].get('event_time', 'Unknown')) + self._log(f"✓ Event time range: {first_event} to {last_event}") + + return (output_file, failed_timeframes) + except Exception as e: + logger.error(f"Error writing consolidated audit file: {e}") + self._log(f"Error writing consolidated audit file: {e}", "ERROR") + return ("", failed_timeframes) + + def retry_failed_timeframes(self, failed_timeframes: List[str], output_file: str = None) -> Tuple[int, List[str]]: + """ + Retry fetching for specific failed timeframes. 
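+
+        Example (illustrative): timeframe strings use the "DD-MM-YYYY HH:MM,DD-MM-YYYY HH:MM"
+        format reported for failed intervals, e.g.
+
+            fetcher.retry_failed_timeframes(
+                ["08-06-2025 00:00,08-06-2025 01:00"],
+                output_file="data/audit_events_retry.json",
+            )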
+ + Args: + failed_timeframes: List of timeframe strings in format "DD-MM-YYYY HH:MM,DD-MM-YYYY HH:MM" + output_file: Optional output file for retry results + + Returns: + tuple: (success_count: int, still_failed: list) + """ + print(f"\n{'='*60}") + print(f"RETRYING {len(failed_timeframes)} FAILED AUDIT TIMEFRAMES") + print(f"{'='*60}") + + retry_intervals = [] + invalid_timeframes = [] + + # Parse timeframe strings back to datetime objects + for tf_string in failed_timeframes: + try: + start_str, end_str = tf_string.split(',') + start_dt = datetime.strptime(start_str, "%d-%m-%Y %H:%M").replace(tzinfo=timezone.utc) + end_dt = datetime.strptime(end_str, "%d-%m-%Y %H:%M").replace(tzinfo=timezone.utc) + retry_intervals.append((start_dt, end_dt)) + print(f" Queued audit retry: {start_dt.strftime('%Y-%m-%d %H:%M')} to {end_dt.strftime('%Y-%m-%d %H:%M')}") + except Exception as e: + print(f" Invalid audit timeframe format '{tf_string}': {e}") + invalid_timeframes.append(tf_string) + + if not retry_intervals: + print("No valid audit timeframes to retry.") + return (0, invalid_timeframes) + + # Execute retries + temp_files = [] + still_failed = [] + + with ThreadPoolExecutor(max_workers=self.workers) as executor: + future_to_timeframe = { + executor.submit(self._fetch_and_write_events, start, end): (start, end) + for start, end in retry_intervals + } + + for future in as_completed(future_to_timeframe): + start, end = future_to_timeframe[future] + success, result, timeframe_string = future.result() + + if success: + temp_files.append(result) + print(f" AUDIT SUCCESS: {timeframe_string}") + else: + still_failed.append(timeframe_string) + print(f" AUDIT STILL FAILED: {timeframe_string}") + + # Consolidate retry results if requested + if output_file and temp_files: + print(f"\nConsolidating {len(temp_files)} audit retry results...") + all_events = [] + + for tf in temp_files: + try: + with open(tf, 'r', encoding='utf-8') as f: + batch_events = json.load(f) + all_events.extend(batch_events) + except Exception as e: + print(f"Error reading audit retry temp file {tf}: {e}") + + # Sort and write + all_events.sort(key=lambda ev: ev.get('eventTime', ev.get('event_time', ''))) + + try: + os.makedirs(os.path.dirname(output_file), exist_ok=True) + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(all_events, f, indent=2) + print(f"✓ Audit retry results written to: {output_file}") + print(f"✓ Total retry audit events found: {len(all_events)}") + except Exception as e: + print(f"Error writing audit retry file: {e}") + + # Report final status + success_count = len(retry_intervals) - len(still_failed) + still_failed.extend(invalid_timeframes) # Include invalid formats + + print(f"\n{'='*60}") + print(f" AUDIT RETRY SUMMARY") + print(f"{'='*60}") + print(f" Successful audit retries: {success_count}") + print(f" Still failed audit: {len(still_failed)}") + + if still_failed: + print("\nAudit timeframes still failing:") + print("STILL_FAILED_AUDIT_TIMEFRAMES = [") + for tf in still_failed: + print(f' "{tf}",') + print("]") + + return (success_count, still_failed) + + def cleanup(self) -> None: + """Remove all temporary batch files matching the audit events pattern.""" + pattern = "audit_events_*_to_*.json" + temp_files = glob.glob(pattern) + + if temp_files: + self._log(f"Cleaning up {len(temp_files)} audit temporary files...") + for tmp in temp_files: + try: + os.remove(tmp) + logger.debug(f"Removed audit temp file {tmp}") + if self.verbose: + self._log(f" → Removed {tmp}") + except Exception as e: + 
logger.error(f"Failed to remove audit temp file {tmp}: {e}") + self._log(f" → Failed to remove {tmp}: {e}", "ERROR") + else: + self._log("No audit temporary files to clean up.") + + def get_date_range_info(self) -> dict: + """Return information about the calculated date range.""" + return { + "reference_date": self.reference_date.strftime('%Y-%m-%d'), + "window_days": self.window, + "start_time": self.start_time.strftime('%Y-%m-%d %H:%M:%S UTC'), + "end_time": self.end_time.strftime('%Y-%m-%d %H:%M:%S UTC'), + "total_hours": (self.end_time - self.start_time).total_seconds() / 3600, + "worker_window_hours": self.worker_window, + "number_of_intervals": len(self.intervals) + } \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/cloudguard_fetcher.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/cloudguard_fetcher.py new file mode 100644 index 000000000..26576c80b --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/cloudguard_fetcher.py @@ -0,0 +1,364 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +cloudguard_fetcher.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import oci +import json +import os +import glob +from datetime import datetime, timedelta, timezone +from oci.util import to_dict +from oci.pagination import list_call_get_all_results +from concurrent.futures import ThreadPoolExecutor, as_completed + +class CloudGuardFetcher: + """ + Fetch OCI Cloud Guard problems in parallel batches and consolidate into a single JSON file. + The window is completely prior to the reference_date (end date). 
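+
+    Example (illustrative sketch; mirrors the README defaults rather than a
+    prescribed invocation):
+
+        fetcher = CloudGuardFetcher(
+            reference_date="01-01-2025",  # end date, DD-MM-YYYY
+            window=30,                    # days of problems prior to the end date
+            workers=10,
+            worker_window=1,
+        )
+        output_path, failed = fetcher.run("data/cloudguard_problems_01-01-2025_30.json")
+        if failed:
+            fetcher.retry_failed_timeframes(failed, output_file="data/cloudguard_retry.json")
+        fetcher.cleanup()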
+ """ + + def __init__( + self, + reference_date: str, + window: int, + workers: int, + worker_window: int, + profile_name: str = "DEFAULT", + config_file: str = None + ): + # Initialize status tracking + self.status_messages = [] + self.verbose = True # Set to False to suppress interval generation messages + + # Parse reference date (this becomes the END date) + try: + self.reference_date = datetime.strptime(reference_date, "%d-%m-%Y").replace(tzinfo=timezone.utc) + except ValueError as ve: + raise ValueError(f"Invalid reference_date format: {ve}") + + self.window = window + self.workers = workers + self.worker_window = worker_window + + # Calculate start and end times (window days BEFORE reference_date) + self.end_time = self.reference_date.replace( + hour=0, minute=0, second=0, microsecond=0 + ) + self.start_time = (self.reference_date - timedelta(days=window)).replace( + hour=0, minute=0, second=0, microsecond=0 + ) + + self._log(f"Search window: {self.start_time.strftime('%Y-%m-%d %H:%M:%S UTC')} to {self.end_time.strftime('%Y-%m-%d %H:%M:%S UTC')}") + self._log(f"Window duration: {window} days prior to {self.reference_date.strftime('%Y-%m-%d')}") + + # Load OCI config + if config_file: + cfg_loc = os.path.expanduser(config_file) + self.config = oci.config.from_file(file_location=cfg_loc, profile_name=profile_name) + else: + self.config = oci.config.from_file(profile_name=profile_name) + self.compartment_id = self.config.get("tenancy") + self.client = oci.cloud_guard.CloudGuardClient( + self.config, + retry_strategy=oci.retry.DEFAULT_RETRY_STRATEGY + ) + + # Prepare batch intervals + self.intervals = self._generate_intervals() + + def _log(self, message, level="INFO"): + """Store messages for later display instead of immediate printing""" + self.status_messages.append(f"[{level}] {message}") + + def _print_summary_report(self): + """Print collected status messages as a summary report""" + if not self.status_messages: + return + + print("\n" + "=" * 80) + print("CLOUD GUARD FETCH SUMMARY REPORT") + print("=" * 80) + for msg in self.status_messages: + print(msg) + print("=" * 80) + + def _generate_intervals(self): + """Generate time intervals for parallel processing.""" + intervals = [] + current = self.start_time + delta = timedelta(hours=self.worker_window) + + if self.verbose: + self._log(f"Generating intervals with {self.worker_window}-hour chunks...") + + while current < self.end_time: + next_end = min(current + delta, self.end_time) + intervals.append((current, next_end)) + if self.verbose: + self._log(f" Interval: {current.strftime('%Y-%m-%d %H:%M')} to {next_end.strftime('%Y-%m-%d %H:%M')}") + current = next_end + + self._log(f"Total intervals: {len(intervals)}") + return intervals + + def _fetch_and_write(self, start: datetime, end: datetime) -> tuple: + """ + Fetch problems for a single time window and write to a temp JSON file. + + Uses the correct parameters `time_last_detected_greater_than_or_equal_to` and + `time_last_detected_less_than_or_equal_to` as per the Python SDK. 
+ + Returns: + tuple: (success: bool, result: str, timeframe_string: str) + """ + timeframe_string = f"{start.strftime('%d-%m-%Y %H:%M')},{end.strftime('%d-%m-%Y %H:%M')}" + + try: + # Only log fetch attempts, not every individual fetch + response = list_call_get_all_results( + self.client.list_problems, + compartment_id=self.compartment_id, + time_last_detected_greater_than_or_equal_to=start, + time_last_detected_less_than_or_equal_to=end + ) + problems = response.data + dicts = [to_dict(p) for p in problems] + + fname = f"cloudguard_problems_{start.strftime('%Y-%m-%dT%H-%M')}_to_{end.strftime('%Y-%m-%dT%H-%M')}.json" + with open(fname, 'w', encoding='utf-8') as f: + json.dump(dicts, f, indent=2) + + # Store detailed results for summary + self._log(f"✓ {start.strftime('%Y-%m-%d %H:%M')}-{end.strftime('%H:%M')}: {len(dicts)} problems → {fname}") + return (True, fname, timeframe_string) + + except Exception as e: + error_msg = f"Error fetching Cloud Guard problems {start.strftime('%Y-%m-%d %H:%M')} to {end.strftime('%Y-%m-%d %H:%M')}: {e}" + self._log(error_msg, "ERROR") + return (False, error_msg, timeframe_string) + + def run(self, output_file: str, progress_callback=None) -> tuple: + """ + Execute the fetching process and consolidate results. + + Args: + output_file: Path for the final consolidated JSON file + progress_callback: Optional callback function for progress updates + + Returns: + tuple: (output_file_path: str, failed_timeframes: list) + """ + # Clear messages and start fresh + self.status_messages = [] + self._log(f"Starting parallel fetch with {self.workers} workers") + self._log(f"Target output file: {output_file}") + + temp_files = [] + failed_timeframes = [] + + with ThreadPoolExecutor(max_workers=self.workers) as executor: + future_to_idx = { + executor.submit(self._fetch_and_write, s, e): idx + for idx, (s, e) in enumerate(self.intervals) + } + + completed = 0 + total = len(self.intervals) + + for future in as_completed(future_to_idx): + idx = future_to_idx[future] + success, result, timeframe_string = future.result() + + if success: + temp_files.append(result) + else: + failed_timeframes.append(timeframe_string) + self._log(f" FAILED: {timeframe_string} - {result}", "ERROR") + + completed += 1 + + if progress_callback: + progress_callback(idx) + + # Consolidate all temp files + self._log(f"Consolidating {len(temp_files)} temporary files...") + all_items = [] + + for tf in temp_files: + try: + with open(tf, 'r', encoding='utf-8') as f: + batch_items = json.load(f) + all_items.extend(batch_items) + # Removed the detailed log per file to clean up output + except Exception as e: + self._log(f"Error reading temp file {tf}: {e}", "ERROR") + + # Sort by last detected time (chronological order) + self._log(f"Sorting {len(all_items)} total problems by detection time...") + all_items.sort(key=lambda ev: ev.get('timeLastDetected', ev.get('time_last_detected', ''))) + + # Write consolidated file + try: + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(all_items, f, indent=2) + self._log(f"✓ Consolidated file written: {output_file}") + self._log(f"✓ Total problems found: {len(all_items)}") + + # Show date range of actual data + if all_items: + first_detection = all_items[0].get('timeLastDetected', 'Unknown') + last_detection = all_items[-1].get('timeLastDetected', 'Unknown') + self._log(f"✓ Detection time range: {first_detection} to {last_detection}") + + # Show summary report after progress bar completes + self._print_summary_report() + + # Report failed timeframes 
after summary + if failed_timeframes: + print(f"\n{'='*60}") + print(f" {len(failed_timeframes)} TIMEFRAMES FAILED") + print(f"{'='*60}") + print("Copy and paste these timeframes to retry failed intervals:") + print("\nFAILED_TIMEFRAMES = [") + for tf in failed_timeframes: + print(f' "{tf}",') + print("]") + print(f"{'='*60}") + + return (output_file, failed_timeframes) + except Exception as e: + self._log(f"Error writing consolidated file: {e}", "ERROR") + self._print_summary_report() + return ("", failed_timeframes) + + def cleanup(self) -> None: + """Remove temporary files created during processing.""" + temp_pattern = "cloudguard_problems_*_to_*.json" + temp_files = glob.glob(temp_pattern) + + if temp_files: + self._log(f"Cleaning up {len(temp_files)} temporary files...") + for tmp in temp_files: + try: + os.remove(tmp) + self._log(f" → Removed {tmp}") + except Exception as e: + self._log(f" → Failed to remove {tmp}: {e}", "ERROR") + else: + self._log("No temporary files to clean up.") + + def retry_failed_timeframes(self, failed_timeframes: list, output_file: str = None) -> tuple: + """ + Retry fetching for specific failed timeframes. + + Args: + failed_timeframes: List of timeframe strings in format "DD-MM-YYYY HH:MM,DD-MM-YYYY HH:MM" + output_file: Optional output file for retry results + + Returns: + tuple: (success_count: int, still_failed: list) + """ + print(f"\n{'='*60}") + print(f" RETRYING {len(failed_timeframes)} FAILED TIMEFRAMES") + print(f"{'='*60}") + + retry_intervals = [] + invalid_timeframes = [] + + # Parse timeframe strings back to datetime objects + for tf_string in failed_timeframes: + try: + start_str, end_str = tf_string.split(',') + start_dt = datetime.strptime(start_str, "%d-%m-%Y %H:%M").replace(tzinfo=timezone.utc) + end_dt = datetime.strptime(end_str, "%d-%m-%Y %H:%M").replace(tzinfo=timezone.utc) + retry_intervals.append((start_dt, end_dt)) + print(f" Queued: {start_dt.strftime('%Y-%m-%d %H:%M')} to {end_dt.strftime('%Y-%m-%d %H:%M')}") + except Exception as e: + print(f" Invalid timeframe format '{tf_string}': {e}") + invalid_timeframes.append(tf_string) + + if not retry_intervals: + print("No valid timeframes to retry.") + return (0, invalid_timeframes) + + # Execute retries + temp_files = [] + still_failed = [] + + with ThreadPoolExecutor(max_workers=self.workers) as executor: + future_to_timeframe = { + executor.submit(self._fetch_and_write, start, end): (start, end) + for start, end in retry_intervals + } + + for future in as_completed(future_to_timeframe): + start, end = future_to_timeframe[future] + success, result, timeframe_string = future.result() + + if success: + temp_files.append(result) + print(f" SUCCESS: {timeframe_string}") + else: + still_failed.append(timeframe_string) + print(f" STILL FAILED: {timeframe_string}") + + # Consolidate retry results if requested + if output_file and temp_files: + print(f"\nConsolidating {len(temp_files)} retry results...") + all_items = [] + + for tf in temp_files: + try: + with open(tf, 'r', encoding='utf-8') as f: + batch_items = json.load(f) + all_items.extend(batch_items) + except Exception as e: + print(f"Error reading retry temp file {tf}: {e}") + + # Sort and write + all_items.sort(key=lambda ev: ev.get('timeLastDetected', ev.get('time_last_detected', ''))) + + try: + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(all_items, f, indent=2) + print(f"✓ Retry results written to: {output_file}") + print(f"✓ Total retry problems found: {len(all_items)}") + except Exception as e: + 
print(f"Error writing retry file: {e}") + + # Report final status + success_count = len(retry_intervals) - len(still_failed) + still_failed.extend(invalid_timeframes) # Include invalid formats + + print(f"\n{'='*60}") + print(f" RETRY SUMMARY") + print(f"{'='*60}") + print(f" Successful retries: {success_count}") + print(f" Still failed: {len(still_failed)}") + + if still_failed: + print("\nTimeframes still failing:") + print("STILL_FAILED_TIMEFRAMES = [") + for tf in still_failed: + print(f' "{tf}",') + print("]") + + return (success_count, still_failed) + + def get_date_range_info(self) -> dict: + """Return information about the calculated date range.""" + return { + "reference_date": self.reference_date.strftime('%Y-%m-%d'), + "window_days": self.window, + "start_time": self.start_time.strftime('%Y-%m-%d %H:%M:%S UTC'), + "end_time": self.end_time.strftime('%Y-%m-%d %H:%M:%S UTC'), + "total_hours": (self.end_time - self.start_time).total_seconds() / 3600, + "worker_window_hours": self.worker_window, + "number_of_intervals": len(self.intervals) + } \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/command_parser.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/command_parser.py new file mode 100644 index 000000000..829d3cc0b --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/command_parser.py @@ -0,0 +1,42 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +command_parser.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +class CommandParser: + ALIASES = { + 'ls': 'show tables', + 'desc': 'describe', + '!': 'history', + } + + def __init__(self, registry): + self.registry = registry + + def parse(self, user_input: str) -> (str, str): + text = user_input.strip() + if not text: + return None, None + + # 1) apply any aliases + for alias, full in self.ALIASES.items(): + if text == alias or text.startswith(alias + ' '): + text = text.replace(alias, full, 1) + break + + text_lower = text.lower() + + # 2) try to match one of the registered multi‑word commands + # (longest first so “show tables” wins over “show”) + for cmd in sorted(self.registry.all_commands(), key=len, reverse=True): + if text_lower.startswith(cmd): + args = text[len(cmd):].strip() + return cmd, args + + # 3) nothing matched → treat the *entire* line as SQL + return '', text diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/__init__.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/__init__.py new file mode 100644 index 000000000..cc3173365 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/__init__.py @@ -0,0 +1,14 @@ +# === classes/commands/__init__.py === + +# This file makes the `classes.commands` directory a Python package. +# It can be empty, or you can expose submodules here. 
+ +__all__ = [ + "registry", + "base_command", + "standard_commands", + "filter_commands", + "control_commands", + "command_history", + "exceptions", +] diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/audit_commands.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/audit_commands.py new file mode 100644 index 000000000..95ec7c6fb --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/audit_commands.py @@ -0,0 +1,582 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +audit_commands.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +from .base_command import Command +from datetime import datetime, timedelta +import os +import json +import glob +import questionary +import pandas as pd +from tqdm import tqdm +from ..audit_fetcher import AuditFetcher + +class AuditEventsFetchCommand(Command): + description = """Fetches OCI audit events or loads existing data. + +USAGE: + audit_events fetch # Fetch new data + audit_events fetch # Load existing data + +FETCH NEW DATA: + audit_events fetch 15-06-2025 7 + → Fetches audit events from June 8-15, 2025 (7 days ending on June 15) + + audit_events fetch 01-01-2025 30 + → Fetches audit events from December 2-31, 2024 (30 days ending on Jan 1) + +LOAD EXISTING DATA: + audit_events fetch + → Shows interactive file selector with details: + - Event count and file size + - Date range and creation time + - Target DuckDB table name + → Loads selected file into DuckDB for querying + +WHAT FETCH DOES: + ✓ Splits time window into parallel worker batches + ✓ Fetches all audit events using OCI Audit API + ✓ Shows clean progress bar with summary report + ✓ Creates: audit_events__.json + ✓ Loads into DuckDB table: audit_events_ + ✓ Provides retry instructions for failed periods + +CONFIGURATION: + audit_worker_count: 10 # Parallel workers (config.yaml) + audit_worker_window: 1 # Hours per batch (config.yaml) + +NOTE: OCI audit logs have a 365-day retention period. The window cannot extend +beyond this limit from the current date.""" + + def execute(self, args): + parts = args.split() + snapshot_dir = self.ctx.query_executor.current_snapshot_dir + if not snapshot_dir: + print("Error: No active tenancy snapshot. 
Use 'set tenancy' first.") + return + + # Mode 2: Interactive load of existing audit_events JSON files + if len(parts) == 0: + self._interactive_load_existing_data(snapshot_dir) + return + + # Mode 1: Fetch new audit events data + if len(parts) != 2: + print("Usage: audit_events fetch ") + print(" or: audit_events fetch (interactive mode)") + return + + self._fetch_new_data(parts, snapshot_dir) + + def _interactive_load_existing_data(self, snapshot_dir): + """Interactive mode to load existing audit events JSON files""" + pattern = os.path.join(snapshot_dir, "audit_events_*_*.json") + files = glob.glob(pattern) + + if not files: + print(f"No audit events JSON files found in {snapshot_dir}") + print("Use 'audit_events fetch ' to fetch new data first.") + return + + # Analyze files and create rich choices + file_choices = [] + for file_path in sorted(files, key=os.path.getmtime, reverse=True): + filename = os.path.basename(file_path) + file_info = self._analyze_file(file_path) + + choice_text = f"{filename}\n" \ + f" → {file_info['event_count']} events, {file_info['file_size']}, " \ + f"Created: {file_info['created']}\n" \ + f" → Date range: {file_info['date_range']}\n" \ + f" → Will load as table: {file_info['table_name']}" + + file_choices.append({ + 'name': choice_text, + 'value': { + 'path': file_path, + 'filename': filename, + 'table_name': file_info['table_name'] + } + }) + + print("\n" + "=" * 80) + print("LOAD EXISTING AUDIT EVENTS DATA") + print("=" * 80) + + selected = questionary.select( + "Select an audit events JSON file to load into DuckDB:", + choices=file_choices + ).ask() + + if not selected: + print("No file selected.") + return + + # Load the selected file + json_file = selected['path'] + table_name = selected['table_name'] + filename = selected['filename'] + + print(f"\nLoading {filename}...") + self._load_to_duckdb(json_file, table_name) + print(f"✓ Successfully loaded audit events into table: {table_name}") + print(f"✓ Use: SELECT event_name, event_time, source_name, resource_name, user_name FROM {table_name} ORDER BY event_time DESC LIMIT 10;") + + def _analyze_file(self, file_path): + """Analyze an audit events JSON file to extract metadata""" + filename = os.path.basename(file_path) + + # Get file stats + stat = os.stat(file_path) + file_size = self._format_file_size(stat.st_size) + created = datetime.fromtimestamp(stat.st_mtime).strftime('%Y-%m-%d %H:%M') + + # Extract date and window from filename + # Format: audit_events_DD-MM-YYYY_DAYS.json + try: + parts = filename.replace('audit_events_', '').replace('.json', '').split('_') + date_part = parts[0] # DD-MM-YYYY + days_part = parts[1] # DAYS + + # Parse date + end_date = datetime.strptime(date_part, "%d-%m-%Y") + start_date = end_date - pd.Timedelta(days=int(days_part)) + + date_range = f"{start_date.strftime('%B %d')} - {end_date.strftime('%B %d, %Y')} ({days_part} days)" + except: + date_range = "Unknown date range" + + # Count events in JSON + try: + with open(file_path, 'r') as f: + data = json.load(f) + event_count = len(data) if isinstance(data, list) else 0 + except: + event_count = "Unknown" + + # Generate table name + table_name = filename.replace('audit_events_', '').replace('.json', '').replace('-', '') + + return { + 'event_count': event_count, + 'file_size': file_size, + 'created': created, + 'date_range': date_range, + 'table_name': f"audit_events_{table_name}" + } + + def _format_file_size(self, size_bytes): + """Format file size in human readable format""" + if size_bytes == 0: + return "0 B" + 
size_names = ["B", "KB", "MB", "GB"] + import math + i = int(math.floor(math.log(size_bytes, 1024))) + p = math.pow(1024, i) + s = round(size_bytes / p, 1) + return f"{s} {size_names[i]}" + + def _fetch_new_data(self, parts, snapshot_dir): + """Fetch new audit events data from OCI API""" + reference_date, window = parts + + # Validate reference_date + try: + ref_date = datetime.strptime(reference_date, "%d-%m-%Y") + retention_days = 365 # OCI audit log retention period + if (datetime.now() - ref_date).days > retention_days: + print(f"Error: reference_date must be within the last {retention_days} days") + return + except ValueError: + print("Error: reference_date must be in format DD-MM-YYYY") + return + + # Validate window + try: + window = int(window) + if window < 1 or window > retention_days: + print(f"Error: window must be between 1 and {retention_days} days") + return + + # Check if the window extends beyond retention period + start_date = ref_date - timedelta(days=window) + if (datetime.now() - start_date).days > retention_days: + print(f"Error: The specified window extends beyond the {retention_days}-day audit log retention period") + return + except ValueError: + print("Error: window must be an integer") + return + + # Get configuration + worker_count = self.ctx.config_manager.get_setting("audit_worker_count") or 10 + worker_window = self.ctx.config_manager.get_setting("audit_worker_window") or 1 + + # Initialize fetcher + try: + # Create a quiet fetcher that doesn't print verbose messages during progress + fetcher = AuditFetcher( + reference_date=reference_date, + window=window, + workers=worker_count, + worker_window=worker_window, + profile_name=self.ctx.config_manager.get_setting("oci_profile") or "DEFAULT", + verbose=False # Suppress all verbose output including interval generation + ) + + # Use snapshot_dir for temporary batch files + original_cwd = os.getcwd() + os.chdir(snapshot_dir) + try: + total_intervals = len(fetcher.intervals) + + # Show clean progress without cluttered output + print(f"\nStarting parallel audit fetch with {worker_count} workers...") + print(f"Target: {total_intervals} intervals, {reference_date} ({window} days)") + + with tqdm(total=total_intervals, desc="Fetching audit events", + bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}]") as pbar: + def progress_callback(idx): + pbar.update(1) + + output_filename = f"audit_events_{reference_date}_{window}.json" + output_path = os.path.join(snapshot_dir, output_filename) + + # Fetch data with clean progress bar (no verbose output) + json_file, failed_timeframes = fetcher.run(output_path, progress_callback) + + # Now show the summary report that was collected during fetching + self._print_fetch_summary(fetcher, json_file, failed_timeframes) + + # Load into DuckDB if we got some data + if json_file and os.path.exists(json_file): + table_name = f"audit_events_{reference_date.replace('-', '')}_{window}" + self._load_to_duckdb(json_file, table_name) + fetcher.cleanup() + print(f"✓ Successfully loaded audit events into table: {table_name}") + print(f"✓ Use: SELECT event_name, event_time, source_name FROM {table_name} ORDER BY event_time DESC LIMIT 10;") + else: + print("❌ No data was successfully fetched") + + finally: + os.chdir(original_cwd) + + except Exception as e: + print(f"Error fetching audit events: {e}") + + def _print_fetch_summary(self, fetcher, json_file, failed_timeframes): + """Print a clean summary report after fetching is complete""" + print("\n" + "=" * 80) + 
print("AUDIT EVENTS FETCH SUMMARY REPORT") + print("=" * 80) + + # Show successful batches from fetcher's status messages + if hasattr(fetcher, 'status_messages'): + success_count = len([msg for msg in fetcher.status_messages if "✓" in msg]) + print(f"✓ Successful intervals: {success_count}") + + # Show a few examples of successful fetches + success_messages = [msg for msg in fetcher.status_messages if "✓" in msg] + if success_messages: + print("✓ Sample successful intervals:") + for msg in success_messages[:3]: # Show first 3 + print(f" {msg}") + if len(success_messages) > 3: + print(f" ... and {len(success_messages) - 3} more successful intervals") + + if json_file and os.path.exists(json_file): + # Get final file stats + stat = os.stat(json_file) + file_size = self._format_file_size(stat.st_size) + print(f"✓ Consolidated file: {os.path.basename(json_file)} ({file_size})") + + # Count total events + try: + with open(json_file, 'r') as f: + data = json.load(f) + total_events = len(data) if isinstance(data, list) else 0 + print(f"✓ Total events collected: {total_events:,}") + + if data and total_events > 0: + first_event = data[0].get('eventTime', data[0].get('event_time', 'Unknown')) + last_event = data[-1].get('eventTime', data[-1].get('event_time', 'Unknown')) + print(f"✓ Event time range: {first_event} to {last_event}") + except: + print("✓ Consolidated file created (event count unavailable)") + + # Handle failed timeframes + if failed_timeframes: + print(f"\n❌ Failed intervals: {len(failed_timeframes)}") + print("You can retry failed timeframes using the fetcher's retry method") + + print("=" * 80) + + def _load_to_duckdb(self, json_file, table_name): + """Load JSON file into DuckDB with flattening""" + try: + with open(json_file, 'r', encoding='utf-8') as f: + data = json.load(f) + + if not data: + print("Warning: JSON file contains no data") + return + + # Check if table already exists + existing_tables = self.ctx.query_executor.show_tables() + if table_name in existing_tables: + overwrite = questionary.confirm( + f"Table '{table_name}' already exists. Overwrite?" + ).ask() + if not overwrite: + print("Load cancelled.") + return + # Drop existing table + self.ctx.query_executor.conn.execute(f"DROP TABLE IF EXISTS {table_name}") + + # Flatten nested JSON + flattened = [] + for event in data: + flat_event = {} + self._flatten_dict(event, flat_event) + flattened.append(flat_event) + + df = pd.DataFrame(flattened) + + # Register and create table + self.ctx.query_executor.conn.register(table_name, df) + self.ctx.query_executor.conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM {table_name}") + print(f"Created table '{table_name}' with {len(df)} rows and {len(df.columns)} columns") + + except Exception as e: + print(f"Error loading audit events into DuckDB: {e}") + + def _flatten_dict(self, d, flat_dict, prefix=''): + """Recursively flatten nested dictionaries and handle lists""" + for k, v in d.items(): + key = f"{prefix}{k}" if prefix else k + key = key.replace(' ', '_').replace('-', '_').replace('.', '_') + + if isinstance(v, dict): + self._flatten_dict(v, flat_dict, f"{key}_") + elif isinstance(v, list): + flat_dict[key] = json.dumps(v) if v else None + else: + flat_dict[key] = v + + +class AuditEventsDeleteCommand(Command): + description = """Delete audit events JSON files and their corresponding DuckDB tables. 
+ +USAGE: + audit_events delete + +FUNCTIONALITY: + ✓ Shows interactive list of all audit events files in current snapshot + ✓ Displays file details: size, event count, date range, creation time + ✓ Allows single or multiple file selection + ✓ Confirms deletion with detailed summary + ✓ Removes corresponding DuckDB tables if they exist + ✓ Shows cleanup summary with freed disk space + +EXAMPLE OUTPUT: + Select audit events files to delete: + + [✓] audit_events_15-06-2025_7.json + → 1,243 events, 2.1 MB, June 8-15 2025, Table: audit_events_15062025_7 + + [ ] audit_events_01-01-2025_30.json + → 5,678 events, 8.7 MB, Dec 2-Jan 1 2025, Table: audit_events_01012025_30 + +SAFETY FEATURES: + ✓ Confirmation prompt before deletion + ✓ Shows exactly what will be deleted + ✓ Option to cancel at any time + ✓ Graceful handling of missing tables""" + + def execute(self, args): + snapshot_dir = self.ctx.query_executor.current_snapshot_dir + if not snapshot_dir: + print("Error: No active tenancy snapshot. Use 'set tenancy' first.") + return + + pattern = os.path.join(snapshot_dir, "audit_events_*_*.json") + files = glob.glob(pattern) + + if not files: + print(f"No audit events JSON files found in {snapshot_dir}") + return + + # Analyze files and create choices + file_choices = [] + for file_path in sorted(files, key=os.path.getmtime, reverse=True): + filename = os.path.basename(file_path) + file_info = self._analyze_file(file_path) + + choice_text = f"{filename}\n" \ + f" → {file_info['event_count']} events, {file_info['file_size']}, " \ + f"{file_info['date_range']}\n" \ + f" → Table: {file_info['table_name']}, Created: {file_info['created']}" + + file_choices.append({ + 'name': choice_text, + 'value': { + 'path': file_path, + 'filename': filename, + 'table_name': file_info['table_name'], + 'size_bytes': file_info['size_bytes'] + } + }) + + print("\n" + "=" * 80) + print("DELETE AUDIT EVENTS DATA") + print("=" * 80) + + # Multiple selection + selected_files = questionary.checkbox( + "Select audit events files to delete:", + choices=file_choices + ).ask() + + if not selected_files: + print("No files selected for deletion.") + return + + # Show deletion summary + total_size = sum(f['size_bytes'] for f in selected_files) + total_files = len(selected_files) + + print(f"\n{'='*60}") + print("DELETION SUMMARY") + print(f"{'='*60}") + print(f"Files to delete: {total_files}") + print(f"Total disk space to free: {self._format_file_size(total_size)}") + print("\nFiles and tables to be removed:") + + for file_info in selected_files: + print(f" 📄 {file_info['filename']}") + print(f" 🗃️ {file_info['table_name']} (if exists)") + + # Final confirmation + confirm = questionary.confirm( + f"\n❗ Are you sure you want to delete {total_files} file(s) and their tables?" 
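+ # A falsy answer (No, or a cancelled prompt) is treated as "do not delete" by the check below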
+ ).ask() + + if not confirm: + print("Deletion cancelled.") + return + + # Perform deletion + deleted_files = 0 + deleted_tables = 0 + freed_space = 0 + + existing_tables = self.ctx.query_executor.show_tables() + + for file_info in selected_files: + try: + # Delete JSON file + os.remove(file_info['path']) + deleted_files += 1 + freed_space += file_info['size_bytes'] + print(f"✓ Deleted file: {file_info['filename']}") + + # Delete DuckDB table if it exists + table_name = file_info['table_name'] + if table_name in existing_tables: + self.ctx.query_executor.conn.execute(f"DROP TABLE IF EXISTS {table_name}") + deleted_tables += 1 + print(f"✓ Deleted table: {table_name}") + + except Exception as e: + print(f"❌ Error deleting {file_info['filename']}: {e}") + + # Final summary + print(f"\n{'='*60}") + print("DELETION COMPLETE") + print(f"{'='*60}") + print(f"✓ Files deleted: {deleted_files}") + print(f"✓ Tables deleted: {deleted_tables}") + print(f"✓ Disk space freed: {self._format_file_size(freed_space)}") + + def _analyze_file(self, file_path): + """Analyze an audit events JSON file to extract metadata""" + filename = os.path.basename(file_path) + + # Get file stats + stat = os.stat(file_path) + file_size = self._format_file_size(stat.st_size) + created = datetime.fromtimestamp(stat.st_mtime).strftime('%Y-%m-%d %H:%M') + + # Extract date and window from filename + try: + parts = filename.replace('audit_events_', '').replace('.json', '').split('_') + date_part = parts[0] + days_part = parts[1] + + end_date = datetime.strptime(date_part, "%d-%m-%Y") + start_date = end_date - pd.Timedelta(days=int(days_part)) + + date_range = f"{start_date.strftime('%b %d')} - {end_date.strftime('%b %d %Y')}" + except: + date_range = "Unknown" + + # Count events in JSON + try: + with open(file_path, 'r') as f: + data = json.load(f) + event_count = len(data) if isinstance(data, list) else 0 + except: + event_count = "Unknown" + + # Generate table name + table_name = filename.replace('audit_events_', '').replace('.json', '').replace('-', '') + + return { + 'event_count': event_count, + 'file_size': file_size, + 'size_bytes': stat.st_size, + 'created': created, + 'date_range': date_range, + 'table_name': f"audit_events_{table_name}" + } + + def _format_file_size(self, size_bytes): + """Format file size in human readable format""" + if size_bytes == 0: + return "0 B" + size_names = ["B", "KB", "MB", "GB"] + import math + i = int(math.floor(math.log(size_bytes, 1024))) + p = math.pow(1024, i) + s = round(size_bytes / p, 1) + return f"{s} {size_names[i]}" + + +# Remove the old FetchAuditEventsCommand class (keeping it for backward compatibility if needed) +class FetchAuditEventsCommand(Command): + """Deprecated: Use 'audit_events fetch' instead""" + description = """⚠️ DEPRECATED: Use 'audit_events fetch' instead. + +This command is kept for backward compatibility but will be removed in future versions. 
+Please use the new audit_events commands: +- audit_events fetch # Fetch new data +- audit_events fetch # Load existing data +- audit_events delete # Delete files""" + + def execute(self, args): + print("⚠️ DEPRECATED: 'fetch audit_events' is deprecated.") + print("Please use the new commands:") + print(" - audit_events fetch # Fetch new data") + print(" - audit_events fetch # Load existing data") + print(" - audit_events delete # Delete files") + print() + + # For now, redirect to the new fetch command + fetch_cmd = AuditEventsFetchCommand(self.ctx) + fetch_cmd.execute(args) \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/base_command.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/base_command.py new file mode 100644 index 000000000..4214080fd --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/base_command.py @@ -0,0 +1,30 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +base_command.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +from abc import ABC, abstractmethod + +class ShellContext: + def __init__(self, query_executor, config_manager, logger, history, query_selector, reload_tenancy_fn=None): + self.query_executor = query_executor + self.config_manager = config_manager + self.logger = logger + self.history = history + self.query_selector = query_selector + self.reload_tenancy = reload_tenancy_fn + +class Command(ABC): + description = "No description available." # Default description + + def __init__(self, ctx: ShellContext): + self.ctx = ctx + + @abstractmethod + def execute(self, args: str): + """Perform the command; args is the raw string after the keyword.""" diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/cloudguard_commands.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/cloudguard_commands.py new file mode 100644 index 000000000..6af5a92cf --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/cloudguard_commands.py @@ -0,0 +1,507 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +cloudguard_commands.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +from .base_command import Command +from datetime import datetime +import os +import glob +import json +import questionary +import pandas as pd +from tqdm import tqdm +from ..cloudguard_fetcher import CloudGuardFetcher + +class CloudGuardFetchCommand(Command): + description = """Fetches OCI Cloud Guard problems or loads existing data. 
+ +USAGE: + cloudguard fetch # Fetch new data + cloudguard fetch # Load existing data + +FETCH NEW DATA: + cloudguard fetch 15-06-2025 7 + → Fetches Cloud Guard problems from June 8-15, 2025 (7 days ending on June 15) + + cloudguard fetch 01-01-2025 30 + → Fetches Cloud Guard problems from December 2-31, 2024 (30 days ending on Jan 1) + +LOAD EXISTING DATA: + cloudguard fetch + → Shows interactive file selector with details: + - Problem count and file size + - Date range and creation time + - Target DuckDB table name + → Loads selected file into DuckDB for querying + +WHAT FETCH DOES: + ✓ Splits time window into parallel worker batches + ✓ Fetches all Cloud Guard problems using OCI API + ✓ Shows clean progress bar with summary report + ✓ Creates: cloudguard_problems__.json + ✓ Loads into DuckDB table: cloudguard_problems_ + ✓ Provides retry instructions for failed periods + +CONFIGURATION: + audit_worker_count: 5 # Parallel workers (config.yaml) + audit_worker_window: 1 # Hours per batch (config.yaml)""" + + def execute(self, args): + parts = args.split() + snapshot_dir = self.ctx.query_executor.current_snapshot_dir + if not snapshot_dir: + print("Error: No active tenancy snapshot. Use 'set tenancy' first.") + return + + # Mode 2: Interactive load of existing cloudguard JSON files + if len(parts) == 0: + self._interactive_load_existing_data(snapshot_dir) + return + + # Mode 1: Fetch new Cloud Guard data + if len(parts) != 2: + print("Usage: cloudguard fetch ") + print(" or: cloudguard fetch (interactive mode)") + return + + self._fetch_new_data(parts, snapshot_dir) + + def _interactive_load_existing_data(self, snapshot_dir): + """Interactive mode to load existing Cloud Guard JSON files""" + pattern = os.path.join(snapshot_dir, "cloudguard_problems_*_*.json") + files = glob.glob(pattern) + + if not files: + print(f"No Cloud Guard JSON files found in {snapshot_dir}") + print("Use 'cloudguard fetch ' to fetch new data first.") + return + + # Analyze files and create rich choices + file_choices = [] + for file_path in sorted(files, key=os.path.getmtime, reverse=True): + filename = os.path.basename(file_path) + file_info = self._analyze_file(file_path) + + choice_text = f"{filename}\n" \ + f" → {file_info['problem_count']} problems, {file_info['file_size']}, " \ + f"Created: {file_info['created']}\n" \ + f" → Date range: {file_info['date_range']}\n" \ + f" → Will load as table: {file_info['table_name']}" + + file_choices.append({ + 'name': choice_text, + 'value': { + 'path': file_path, + 'filename': filename, + 'table_name': file_info['table_name'] + } + }) + + print("\n" + "=" * 80) + print("LOAD EXISTING CLOUD GUARD DATA") + print("=" * 80) + + selected = questionary.select( + "Select a Cloud Guard JSON file to load into DuckDB:", + choices=file_choices + ).ask() + + if not selected: + print("No file selected.") + return + + # Load the selected file + json_file = selected['path'] + table_name = selected['table_name'] + filename = selected['filename'] + + print(f"\nLoading {filename}...") + self._load_to_duckdb(json_file, table_name) + print(f"✓ Successfully loaded Cloud Guard data into table: {table_name}") + print(f"✓ Use: select resource_name, detector_rule_id, risk_level, labels, time_first_detected, time_last_detected, lifecycle_state, lifecycle_detail, detector_id from {table_name} where risk_level = 'HIGH' ORDER BY resource_name") + + def _analyze_file(self, file_path): + """Analyze a Cloud Guard JSON file to extract metadata""" + filename = os.path.basename(file_path) + + # Get file 
stats + stat = os.stat(file_path) + file_size = self._format_file_size(stat.st_size) + created = datetime.fromtimestamp(stat.st_mtime).strftime('%Y-%m-%d %H:%M') + + # Extract date and window from filename + # Format: cloudguard_problems_DDMMYYYY_DAYS.json + try: + parts = filename.replace('cloudguard_problems_', '').replace('.json', '').split('_') + date_part = parts[0] # DDMMYYYY + days_part = parts[1] # DAYS + + # Parse date + day = date_part[:2] + month = date_part[2:4] + year = date_part[4:8] + end_date = datetime.strptime(f"{day}-{month}-{year}", "%d-%m-%Y") + start_date = end_date - pd.Timedelta(days=int(days_part)) + + date_range = f"{start_date.strftime('%B %d')} - {end_date.strftime('%B %d, %Y')} ({days_part} days)" + except: + date_range = "Unknown date range" + + # Count problems in JSON + try: + with open(file_path, 'r') as f: + data = json.load(f) + problem_count = len(data) if isinstance(data, list) else 0 + except: + problem_count = "Unknown" + + # Generate table name + table_name = filename.replace('cloudguard_problems_', '').replace('.json', '').replace('-', '_') + + return { + 'problem_count': problem_count, + 'file_size': file_size, + 'created': created, + 'date_range': date_range, + 'table_name': f"cloudguard_problems_{table_name}" + } + + def _format_file_size(self, size_bytes): + """Format file size in human readable format""" + if size_bytes == 0: + return "0 B" + size_names = ["B", "KB", "MB", "GB"] + import math + i = int(math.floor(math.log(size_bytes, 1024))) + p = math.pow(1024, i) + s = round(size_bytes / p, 1) + return f"{s} {size_names[i]}" + + def _fetch_new_data(self, parts, snapshot_dir): + """Fetch new Cloud Guard data from OCI API""" + reference_date, window = parts + + # Validate reference_date + try: + ref_date = datetime.strptime(reference_date, "%d-%m-%Y") + retention_days = 365 + if (datetime.now() - ref_date).days > retention_days: + print(f"Warning: reference_date is more than {retention_days} days ago. 
Data may not be available.") + except ValueError: + print("Error: reference_date must be in format DD-MM-YYYY") + return + + # Validate window + try: + window = int(window) + if window < 1: + print("Error: window must be a positive integer") + return + except ValueError: + print("Error: window must be an integer") + return + + # Get configuration + worker_count = self.ctx.config_manager.get_setting("audit_worker_count") or 5 + worker_window = self.ctx.config_manager.get_setting("audit_worker_window") or 1 + + # Initialize fetcher + try: + fetcher = CloudGuardFetcher( + reference_date=reference_date, + window=window, + workers=worker_count, + worker_window=worker_window, + profile_name=self.ctx.config_manager.get_setting("oci_profile") or "DEFAULT" + ) + + # Use snapshot_dir for temporary batch files + original_cwd = os.getcwd() + os.chdir(snapshot_dir) + try: + total_intervals = len(fetcher.intervals) + with tqdm(total=total_intervals, desc="Fetching Cloud Guard problems") as pbar: + def progress_callback(idx): + pbar.update(1) + + output_filename = f"cloudguard_problems_{reference_date.replace('-', '')}_{window}.json" + output_path = os.path.join(snapshot_dir, output_filename) + + # Fetch data with clean progress bar + json_file, failed_timeframes = fetcher.run(output_path, progress_callback) + + # Handle failed timeframes + if failed_timeframes: + print(f"\n⚠️ Warning: {len(failed_timeframes)} timeframes failed during fetch") + print("You can retry failed timeframes using:") + print("FAILED_TIMEFRAMES = [") + for tf in failed_timeframes[:3]: + print(f' "{tf}",') + if len(failed_timeframes) > 3: + print(f" # ... and {len(failed_timeframes) - 3} more") + print("]") + + # Load into DuckDB if we got some data + if json_file and os.path.exists(json_file): + table_name = f"cloudguard_problems_{reference_date.replace('-', '')}_{window}" + self._load_to_duckdb(json_file, table_name) + fetcher.cleanup() + print(f"✓ Successfully loaded Cloud Guard problems into table: {table_name}") + print(f"✓ Use: SELECT * FROM {table_name} LIMIT 10;") + else: + print("❌ No data was successfully fetched") + + finally: + os.chdir(original_cwd) + + except Exception as e: + print(f"Error fetching Cloud Guard problems: {e}") + + def _load_to_duckdb(self, json_file, table_name): + """Load JSON file into DuckDB with flattening""" + try: + with open(json_file, 'r', encoding='utf-8') as f: + data = json.load(f) + + if not data: + print("Warning: JSON file contains no data") + return + + # Check if table already exists + existing_tables = self.ctx.query_executor.show_tables() + if table_name in existing_tables: + overwrite = questionary.confirm( + f"Table '{table_name}' already exists. Overwrite?" 
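+ # A falsy answer (No, or a cancelled prompt) keeps the existing table and cancels the load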
+ ).ask() + if not overwrite: + print("Load cancelled.") + return + # Drop existing table + self.ctx.query_executor.conn.execute(f"DROP TABLE IF EXISTS {table_name}") + + # Flatten nested JSON + flattened = [] + for item in data: + flat_item = {} + self._flatten_dict(item, flat_item) + flattened.append(flat_item) + + df = pd.DataFrame(flattened) + + # Register and create table + self.ctx.query_executor.conn.register(table_name, df) + self.ctx.query_executor.conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM {table_name}") + print(f"Created table '{table_name}' with {len(df)} rows and {len(df.columns)} columns") + + except Exception as e: + print(f"Error loading Cloud Guard data into DuckDB: {e}") + + def _flatten_dict(self, d, flat_dict, prefix=''): + """Recursively flatten nested dictionaries and handle lists""" + for k, v in d.items(): + key = f"{prefix}{k}" if prefix else k + key = key.replace(' ', '_').replace('-', '_').replace('.', '_') + + if isinstance(v, dict): + self._flatten_dict(v, flat_dict, f"{key}_") + elif isinstance(v, list): + flat_dict[key] = json.dumps(v) if v else None + else: + flat_dict[key] = v + + +class CloudGuardDeleteCommand(Command): + description = """Delete Cloud Guard JSON files and their corresponding DuckDB tables. + +USAGE: + cloudguard delete + +FUNCTIONALITY: + ✓ Shows interactive list of all Cloud Guard files in current snapshot + ✓ Displays file details: size, problem count, date range, creation time + ✓ Allows single or multiple file selection + ✓ Confirms deletion with detailed summary + ✓ Removes corresponding DuckDB tables if they exist + ✓ Shows cleanup summary with freed disk space + +EXAMPLE OUTPUT: + Select Cloud Guard files to delete: + + [✓] cloudguard_problems_15062025_7.json + → 67 problems, 145 KB, June 8-15 2025, Table: cloudguard_problems_15062025_7 + + [ ] cloudguard_problems_01012025_30.json + → 234 problems, 892 KB, Dec 2-Jan 1 2025, Table: cloudguard_problems_01012025_30 + +SAFETY FEATURES: + ✓ Confirmation prompt before deletion + ✓ Shows exactly what will be deleted + ✓ Option to cancel at any time + ✓ Graceful handling of missing tables""" + + def execute(self, args): + snapshot_dir = self.ctx.query_executor.current_snapshot_dir + if not snapshot_dir: + print("Error: No active tenancy snapshot. 
Use 'set tenancy' first.") + return + + pattern = os.path.join(snapshot_dir, "cloudguard_problems_*_*.json") + files = glob.glob(pattern) + + if not files: + print(f"No Cloud Guard JSON files found in {snapshot_dir}") + return + + # Analyze files and create choices + file_choices = [] + for file_path in sorted(files, key=os.path.getmtime, reverse=True): + filename = os.path.basename(file_path) + file_info = self._analyze_file(file_path) + + choice_text = f"{filename}\n" \ + f" → {file_info['problem_count']} problems, {file_info['file_size']}, " \ + f"{file_info['date_range']}\n" \ + f" → Table: {file_info['table_name']}, Created: {file_info['created']}" + + file_choices.append({ + 'name': choice_text, + 'value': { + 'path': file_path, + 'filename': filename, + 'table_name': file_info['table_name'], + 'size_bytes': file_info['size_bytes'] + } + }) + + print("\n" + "=" * 80) + print("DELETE CLOUD GUARD DATA") + print("=" * 80) + + # Multiple selection + selected_files = questionary.checkbox( + "Select Cloud Guard files to delete:", + choices=file_choices + ).ask() + + if not selected_files: + print("No files selected for deletion.") + return + + # Show deletion summary + total_size = sum(f['size_bytes'] for f in selected_files) + total_files = len(selected_files) + + print(f"\n{'='*60}") + print("DELETION SUMMARY") + print(f"{'='*60}") + print(f"Files to delete: {total_files}") + print(f"Total disk space to free: {self._format_file_size(total_size)}") + print("\nFiles and tables to be removed:") + + for file_info in selected_files: + print(f" 📄 {file_info['filename']}") + print(f" 🗃️ {file_info['table_name']} (if exists)") + + # Final confirmation + confirm = questionary.confirm( + f"\n❗ Are you sure you want to delete {total_files} file(s) and their tables?" 
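+ # A falsy answer (No, or a cancelled prompt) cancels the deletion via the check below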
+ ).ask() + + if not confirm: + print("Deletion cancelled.") + return + + # Perform deletion + deleted_files = 0 + deleted_tables = 0 + freed_space = 0 + + existing_tables = self.ctx.query_executor.show_tables() + + for file_info in selected_files: + try: + # Delete JSON file + os.remove(file_info['path']) + deleted_files += 1 + freed_space += file_info['size_bytes'] + print(f"✓ Deleted file: {file_info['filename']}") + + # Delete DuckDB table if it exists + table_name = file_info['table_name'] + if table_name in existing_tables: + self.ctx.query_executor.conn.execute(f"DROP TABLE IF EXISTS {table_name}") + deleted_tables += 1 + print(f"✓ Deleted table: {table_name}") + + except Exception as e: + print(f"❌ Error deleting {file_info['filename']}: {e}") + + # Final summary + print(f"\n{'='*60}") + print("DELETION COMPLETE") + print(f"{'='*60}") + print(f"✓ Files deleted: {deleted_files}") + print(f"✓ Tables deleted: {deleted_tables}") + print(f"✓ Disk space freed: {self._format_file_size(freed_space)}") + + def _analyze_file(self, file_path): + """Analyze a Cloud Guard JSON file to extract metadata""" + filename = os.path.basename(file_path) + + # Get file stats + stat = os.stat(file_path) + file_size = self._format_file_size(stat.st_size) + created = datetime.fromtimestamp(stat.st_mtime).strftime('%Y-%m-%d %H:%M') + + # Extract date and window from filename + try: + parts = filename.replace('cloudguard_problems_', '').replace('.json', '').split('_') + date_part = parts[0] + days_part = parts[1] + + day = date_part[:2] + month = date_part[2:4] + year = date_part[4:8] + end_date = datetime.strptime(f"{day}-{month}-{year}", "%d-%m-%Y") + start_date = end_date - pd.Timedelta(days=int(days_part)) + + date_range = f"{start_date.strftime('%b %d')} - {end_date.strftime('%b %d %Y')}" + except: + date_range = "Unknown" + + # Count problems in JSON + try: + with open(file_path, 'r') as f: + data = json.load(f) + problem_count = len(data) if isinstance(data, list) else 0 + except: + problem_count = "Unknown" + + # Generate table name + table_name = filename.replace('cloudguard_problems_', '').replace('.json', '').replace('-', '_') + + return { + 'problem_count': problem_count, + 'file_size': file_size, + 'size_bytes': stat.st_size, + 'created': created, + 'date_range': date_range, + 'table_name': f"cloudguard_problems_{table_name}" + } + + def _format_file_size(self, size_bytes): + """Format file size in human readable format""" + if size_bytes == 0: + return "0 B" + size_names = ["B", "KB", "MB", "GB"] + import math + i = int(math.floor(math.log(size_bytes, 1024))) + p = math.pow(1024, i) + s = round(size_bytes / p, 1) + return f"{s} {size_names[i]}" \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/command_history.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/command_history.py new file mode 100644 index 000000000..f3a2916ab --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/command_history.py @@ -0,0 +1,92 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +command_history.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import readline +import os +from typing import Optional, List +from .exceptions import ArgumentError + +class CommandHistory: + def __init__(self, history_file: str = ".sql_history"): + """Initialize command history manager""" + self.history_file = os.path.expanduser(history_file) + self.load_history() + + def load_history(self): + """Load command history from file""" + try: + readline.read_history_file(self.history_file) + except FileNotFoundError: + # Create history file if it doesn't exist + self.save_history() + + def save_history(self): + """Save command history to file""" + try: + readline.write_history_file(self.history_file) + except Exception as e: + print(f"Warning: Could not save command history: {e}") + + def add(self, command: str): + """Add a command to history""" + if command and command.strip(): # Only add non-empty commands + readline.add_history(command) + self.save_history() # Save after each command for persistence + + def get_history(self, limit: Optional[int] = None) -> List[str]: + """Get list of commands from history""" + history = [] + length = readline.get_current_history_length() + start = max(1, length - (limit or length)) + + for i in range(start, length + 1): + cmd = readline.get_history_item(i) + if cmd: # Only add non-None commands + history.append((i, cmd)) + return history + + def get_command(self, reference: str) -> str: + """ + Get a command from history using reference (e.g., !4 or !-1) + Returns the resolved command + """ + try: + # Remove the '!' from the reference + ref = reference.lstrip('!') + + # Handle negative indices + if ref.startswith('-'): + index = readline.get_current_history_length() + int(ref) + else: + index = int(ref) + + # Get the command + command = readline.get_history_item(index) + + if command is None: + raise ArgumentError(f"No command found at position {ref}") + + return command + + except ValueError: + raise ArgumentError(f"Invalid history reference: {reference}") + except Exception as e: + raise ArgumentError(f"Error accessing history: {e}") + + def show_history(self, limit: Optional[int] = None): + """Display command history""" + history = self.get_history(limit) + if not history: + print("No commands in history.") + return + + print("\nCommand History:") + for index, command in history: + print(f"{index}: {command}") \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/control_commands.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/control_commands.py new file mode 100644 index 000000000..7128fe0f8 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/control_commands.py @@ -0,0 +1,116 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +control_commands.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +from .base_command import Command +from classes.file_selector import FileSelector +from classes.query_selector import QuerySelector +from classes.output_formatter import OutputFormatter +from classes.commands.filter_commands import AgeFilterCommand, CompartmentFilterCommand + +class SetQueriesCommand(Command): + """ + Usage: set queries [] + Launches an interactive YAML-file picker and loads the selected queries. + """ + description = """Loads queries from a YAML file for batch execution. +Usage: set queries [directory] +- If directory is not specified, uses default query directory +- Opens an interactive file picker to select the YAML file +- Loads selected queries into the execution queue""" + + def execute(self, args: str): + # allow optional override of query-directory via args + directory = args or self.ctx.config_manager.get_setting("query_dir") or "query_files" + selector = FileSelector(directory) + yaml_path = selector.select_file() + if not yaml_path: + print("No YAML file selected.") + return + + qs = QuerySelector(yaml_path) + qs.select_queries() + self.ctx.query_selector = qs + print(f"Loaded queries from '{yaml_path}' into queue.") + +class SetTenancyCommand(Command): + """ + Usage: set tenancy + Re‑runs the tenancy‑selection & CSV loading flow, replacing the active QueryExecutor. + """ + description = """Changes the active tenancy and reloads CSV data. +Usage: set tenancy +- Prompts for tenancy selection +- Reloads CSV files for the selected tenancy +- Updates the query executor with new data""" + + def execute(self, args: str): + if not callable(self.ctx.reload_tenancy): + print("Error: tenancy reload not configured.") + return + new_executor = self.ctx.reload_tenancy() + + if new_executor: + self.ctx.query_executor = new_executor + self.ctx.last_result = None + print("Switched to new tenancy data.") + else: + print("Failed to change tenancy.") + +class RunQueriesCommand(Command): + """ + Usage: run queries + Executes all queries loaded by `set queries` in FIFO order. + """ + description = """Executes all queries that were loaded using 'set queries'. +Usage: run queries +- Executes queries in FIFO order +- Displays results after each query +- Can include both SQL queries and filter operations""" + + def execute(self, args: str): + qs = self.ctx.query_selector + if not qs or qs.query_queue.empty(): + print("No queries loaded (or queue is empty).") + return + + while True: + item = qs.dequeue_item() + if not item: + break + kind, val = item + + if kind == "Description": + print(f"\n== {val} ==") + + elif kind == "SQL": + print(f"Running SQL: {val}") + df = self.ctx.query_executor.execute_query(val) + if df is not None: + # store for potential filtering + self.ctx.last_result = df + fmt = self.ctx.config_manager.get_setting("output_format") or "dataframe" + # use the imported OutputFormatter + print(OutputFormatter.format_output(df, fmt)) + + elif kind == "Filter": + # val is something like "age api_keys 90" or "compartment tree_view" + parts = val.split() + filter_type = parts[0] # "age" or "compartment" + filter_args = " ".join(parts[1:]) # e.g. 
"api_keys 90" + + cmd_key = f"filter {filter_type}" + cmd_cls = self.ctx.registry.get(cmd_key) + if not cmd_cls: + print(f"Unknown filter command '{cmd_key}'") + continue + + # instantiate and run the filter command + cmd = cmd_cls(self.ctx) + cmd.execute(filter_args) diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/exceptions.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/exceptions.py new file mode 100644 index 000000000..c197ea0f3 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/exceptions.py @@ -0,0 +1,13 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +exceptions.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +class ArgumentError(Exception): + """Exception raised for errors in command arguments""" + pass \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/filter_commands.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/filter_commands.py new file mode 100644 index 000000000..217afe69b --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/filter_commands.py @@ -0,0 +1,102 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +filter_commands.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +from .base_command import Command +from classes.api_key_filter import ApiKeyFilter +from classes.compartment_structure import HCCompartmentStructure + +class AgeFilterCommand(Command): + description = """Filters results based on age in days for a specified column. +Usage: filter age + +Modes: +- older: Show only entries older than the specified days +- younger: Show only entries younger than or equal to the specified days + +The column specified by can contain dates in the following formats: +1. Direct date strings: 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD HH:MM' +2. Comma-separated lists of dates +3. OCID entries with dates (separated by spaces or colons) + +Examples: +- filter age creation_date older 90 (shows entries older than 90 days) +- filter age api_keys younger 30 (shows entries 30 days old or newer) +- filter age last_modified older 60 (shows entries older than 60 days) + +The command will: +1. Parse all dates found in the specified column +2. For 'older' mode: Keep only rows where any date is older than the specified number of days +3. For 'younger' mode: Keep only rows where any date is younger than or equal to the specified number of days +4. 
Remove rows where no valid dates are found + +Note: +- The 'older' filter shows entries strictly older than +- The 'younger' filter shows entries equal to or newer than +- Rows where the date column is NULL/None or contains no valid dates will be excluded from the results +- If a row contains multiple dates, it will be included if ANY of its dates match the filter criteria""" + + def execute(self, args): + parts = args.split() + if len(parts) != 3: + print("Usage: filter age ") + return + + col, mode, days = parts + if mode.lower() not in ['older', 'younger']: + print("Mode must be either 'older' or 'younger'") + return + + if self.ctx.last_result is None: + print("No prior result to filter.") + return + + try: + days = int(days) + df = ApiKeyFilter(column_name=col, age_days=days, mode=mode.lower()).filter(self.ctx.last_result) + self.ctx.last_result = df + fmt = self.ctx.config_manager.get_setting("output_format") + print(__import__('classes.output_formatter').OutputFormatter.format_output(df, fmt)) + except ValueError: + print("Days must be an integer.") + +class CompartmentFilterCommand(Command): + description = """Filters and analyzes compartment structures. +Usage: filter compartment [arg] +Subcommands: +- root: Show root compartment +- depth: Show maximum depth +- tree_view: Display compartment tree +- path_to : Show path to specific compartment +- subs : Show sub-compartments +- comps_at_depth : Show compartments at specific depth""" + + def execute(self, args): + parts = args.split() + if not parts: + print("Usage: filter compartment [arg]") + return + sub = parts[0]; param = parts[1] if len(parts)>1 else None + if self.ctx.last_result is None or 'path' not in self.ctx.last_result.columns: + print("No 'path' column in last result.") + return + inst = HCCompartmentStructure(self.ctx.last_result['path'].tolist()) + method = { + 'root': inst.get_root_compartment, + 'depth': inst.get_depth, + 'tree_view': inst.get_comp_tree, + 'path_to': lambda: inst.get_path_to(param), + 'subs': lambda: inst.get_sub_compartments(param), + 'comps_at_depth': lambda: inst.get_compartments_by_depth(int(param)), + }.get(sub) + if not method: + print(f"Unknown subcommand '{sub}'.") + return + out = method() + print(out) diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/registry.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/registry.py new file mode 100644 index 000000000..2c125963d --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/registry.py @@ -0,0 +1,33 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +registry.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +class CommandRegistry: + def __init__(self): + # maps normalized command names to Command subclasses + self._commands = {} + + def register(self, name: str, command_cls): + """ + Register a Command subclass under a given name. + e.g. registry.register('show tables', ShowTablesCommand) + """ + self._commands[name.lower()] = command_cls + + def get(self, name: str): + """ + Look up a Command subclass by name; returns None if not found. 
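+ e.g. registry.get('show tables') returns the ShowTablesCommand class when registered as above.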
+ """ + return self._commands.get(name.lower()) + + def all_commands(self): + """ + Returns a sorted list of all registered command names. + """ + return sorted(self._commands.keys()) diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/standard_commands.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/standard_commands.py new file mode 100644 index 000000000..637e5b789 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/commands/standard_commands.py @@ -0,0 +1,156 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +standard_commands.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +from .base_command import Command +from classes.output_formatter import OutputFormatter + + +class ShowTablesCommand(Command): + description = "Lists all available tables in the current database.\nUsage: show tables" + + def execute(self, args): + tables = self.ctx.query_executor.show_tables() + print("Available tables:") + for t in tables: + print(f" - {t}") + +class DescribeCommand(Command): + description = "Shows the structure of a table, including column names and types.\nUsage: describe " + + def execute(self, args): + if not args: + print("Usage: describe ") + return + info = self.ctx.query_executor.describe_table(args) + if not info: + print(f"Table '{args}' not found.") + else: + print(f"Columns in '{args}':") + for name, typ in info: + print(f" - {name}: {typ}") + +class ExecuteSqlCommand(Command): + description = "Executes a SQL query and displays the results.\nUsage: " + """Fallback for unrecognized commands: treat as SQL.""" + def execute(self, args): + sql = args.strip() + if not sql: + return + + # log directly (no print around it) + self.ctx.logger.log(f"Running SQL: {sql}", level="DEBUG") + + # execute the query + df = self.ctx.query_executor.execute_query(sql) + if df is not None: + self.ctx.last_result = df + + fmt = self.ctx.config_manager.get_setting("output_format") or "dataframe" + self.ctx.logger.log(f"Formatting output as {fmt}", level="DEBUG") + + # format and print the result + output = OutputFormatter.format_output(df, fmt) + print(output) + +class ExitCommand(Command): + description = "Exits the application.\nUsage: exit (or quit)" + + def execute(self, args): + print("Bye.") + raise SystemExit + +class HistoryCommand(Command): + description = """Shows command history. +Usage: history +- Shows all previously executed commands with their index numbers +- Use !n to re-run command number n from history +- Use !-n to re-run nth previous command""" + + def execute(self, args: str): + # Fetch the list of (index, command) tuples + history_items = self.ctx.history.get_history() + if not history_items: + print("No commands in history.") + return + + print("\nCommand History:") + for idx, cmd in history_items: + print(f"{idx}: {cmd}") +class HelpCommand(Command): + description = "Show help for available commands. 
Usage: help [command]" + + def execute(self, args): + if not args: + # Show all commands + print("Available commands:") + for name in self.ctx.registry.all_commands(): + cmd_cls = self.ctx.registry.get(name) + brief_desc = cmd_cls.description.split('\n')[0] # First line only + print(f" - {name:<20} - {brief_desc}") + print("\nType 'help ' for detailed help on a specific command.") + else: + # Show help for specific command + cmd_name = args.lower() + cmd_cls = self.ctx.registry.get(cmd_name) + if not cmd_cls: + print(f"Unknown command: {cmd_name}") + return + + print(f"\nHelp for '{cmd_name}':") + print(f"\n{cmd_cls.description}") + +class FilterCommand(Command): + def __init__(self, query_executor, original_result=None): + self.query_executor = query_executor + self.original_result = original_result + + def execute(self, args: str, **kwargs): + if self.original_result is None: + print("No results to filter. Run a query first.") + return + + try: + filter_parts = args.strip().lower().split() + if not filter_parts: + print("Invalid filter command. Usage: filter ") + return + + filter_type = filter_parts[0] + filter_args = filter_parts[1:] + + if filter_type == 'age': + return self._handle_age_filter(filter_args) + else: + print(f"Unknown filter type: {filter_type}") + + except Exception as e: + print(f"Error executing filter: {str(e)}") + + def _handle_age_filter(self, args): + if len(args) != 2: + print("Invalid age filter command. Usage: filter age ") + return + + column_name, age_days = args + try: + age_days = int(age_days) + from ..api_key_filter import ApiKeyFilter + + api_key_filter = ApiKeyFilter(column_name=column_name, age_days=age_days) + result = api_key_filter.filter(self.original_result.copy()) + + if result is not None and not result.empty: + print(OutputFormatter.format_output(result)) + else: + print("No records found after applying the filter.") + except ValueError: + print("Age must be a number") + except Exception as e: + print(f"Error applying age filter: {str(e)}") diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/compartment_structure.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/compartment_structure.py new file mode 100644 index 000000000..ccdad9eb3 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/compartment_structure.py @@ -0,0 +1,174 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +commpartment_structure.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +class HCCompartmentStructure: + def __init__(self, compartments): + self.compartments = [comp.strip() for comp in compartments] + + def get_root_compartment(self): + for comp in self.compartments: + if "(root)" in comp: + return comp + return None + + def get_depth(self): + root_depth = 1 + max_depth = max(len(comp.split('/')) for comp in self.compartments) + return max_depth + root_depth + + def get_sub_compartments_root(self): + return self.get_compartments_by_depth(1) + + def get_sub_compartments(self, target_compartment): + sub_compartments = set() # make unique + + for compartment in self.compartments: + if "(root)" in compartment: + continue + + parts = compartment.split(" / ") + if target_compartment in parts: + index = parts.index(target_compartment) + if index + 1 < len(parts): + sub_compartments.add(parts[index + 1]) + return list(sub_compartments) + + def get_compartments_by_depth(self, depth): + root_compartment = self.get_root_compartment() + compartments_at_depth = set() + + for compartment in self.compartments: + if root_compartment in compartment: + continue + parts = compartment.split(" / ") + + if len(parts) >= depth: + compartments_at_depth.add(parts[depth - 1]) + + return sorted(compartments_at_depth) + + def get_comp_tree(self): + tree = self.__build_tree(self.compartments) + return self.__print_tree(tree) + + def __build_tree(self, paths): + tree = {} + root = self.get_root_compartment() + tree[root] = {} + + for path in paths: + if path == root: + continue + parts = path.split('/') + current = tree[root] + for part in parts: + part = part.strip() + if part not in current: + current[part] = {} + current = current[part] + return tree + + def __print_tree(self, tree, prefix=''): + tree_str = "" + for idx, (key, value) in enumerate(sorted(tree.items())): + connector = "└── " if idx == len(tree) - 1 else "├── " + tree_str += f"{prefix}{connector}{key}\n" + if value: + extension = " " if idx == len(tree) - 1 else "│ " + tree_str += self.__print_tree(value, prefix + extension) + return tree_str + + def get_path_to(self, target_compartment): + """ + Return a list of all full paths from the root compartment + down to compartments whose name == `target_compartment`. + + Each path keeps the root compartment name, including '(root)', intact. + Example for 'acme-appdev-cmp': + ["/iosociiam (root)/acme-top-cmp/acme-appdev-cmp"] + """ + # 1) Build the tree from your existing compartments + tree = self.__build_tree(self.compartments) + + # 2) Identify the root compartment key (e.g. "/ iosociiam (root)") + root_key = self.get_root_compartment() + if root_key not in tree: + raise ValueError("Root compartment not found in the tree.") + + # Clean up leading/trailing spaces but **do not remove '(root)'**. + # For instance, if root_key is "/ (root)", + # `strip()` will remove extra leading/trailing whitespace but keep "(root)". + # If it starts with '/', we'll remove only that one slash so that + # the final path can start with a single slash. + cleaned_root = root_key.strip() + if cleaned_root.startswith("/"): + cleaned_root = cleaned_root[1:].strip() + + # Store any matching full paths in a list + results = [] + + def dfs(subtree, path_so_far): + """ + Depth-First Search through the compartment hierarchy. 
+ subtree: the nested dictionary for the current node + path_so_far: list of compartment names from the root down to this node + """ + for child_name, child_subtree in subtree.items(): + # Clean the child but DO NOT remove '(root)' + child_clean = child_name.strip() + + new_path = path_so_far + [child_clean] + + # If this child matches target_compartment, record the full path + if child_clean == target_compartment: + # Build final path. Example: + # path_so_far = ["iosociiam (root)", "acme-top-cmp"] + # child_clean = "acme-appdev-cmp" + # => "/iosociiam (root)/acme-top-cmp/acme-appdev-cmp" + full_path = " / " + " / ".join(new_path) + results.append(full_path) + + # Recur into the child node + dfs(child_subtree, new_path) + + # 3) Start DFS from the root's subtree, using [cleaned_root] as the path + dfs(tree[root_key], [cleaned_root]) + + # 4) If no matches, raise an error + if not results: + raise ValueError(f"Compartment '{target_compartment}' not found.") + + return results + + + """ + This is to handle the different subcommands from the CLI filter compartment command. + """ + def handle_request(self, request, *args): + if request == "get_root_compartment": + return self.get_root_compartment() + elif request == "get_max_depth": + return self.get_depth() + elif request == "get_sub_compartments_root": + return self.get_sub_compartments_root() + elif request == "get_tree_view": + return self.get_comp_tree() + elif request == "get_sub_compartments": + if args: + return self.get_sub_compartments(args[0]) + else: + raise ValueError("Compartment name required for 'get_sub_compartments' request.") + elif request == "get_compartments_at_depth": + if args: + return self.get_compartments_by_depth(int(args[0])) + else: + raise ValueError("Depth value required for 'get_compartments_at_depth' request.") + else: + raise ValueError("Invalid request.") diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/config_manager.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/config_manager.py new file mode 100644 index 000000000..52fb8ef69 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/config_manager.py @@ -0,0 +1,55 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +config_manager.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import yaml +import argparse +import sys + +class ConfigManager: + def __init__(self): + + if len(sys.argv) == 1: + sys.argv.extend(['--config-file=config.yaml', '--interactive']) + + # Parse arguments + self.args = self.parse_arguments() + + # Load YAML configuration if specified + if self.args.config_file: + self.config = self.load_yaml_config(self.args.config_file) + else: + self.config = {} + + def load_yaml_config(self, config_file): + with open(config_file, 'r') as file: + return yaml.safe_load(file) + + def parse_arguments(self): + parser = argparse.ArgumentParser(description="OCI Query Tool") + parser.add_argument("--config-file", type=str, help="Path to YAML config file") + parser.add_argument("--csv-dir", type=str, help="Directory with CSV files") + parser.add_argument("--prefix", type=str, help="File prefix for filtering CSV files") + parser.add_argument("--output-format", type=str, help="Output format (DataFrame, JSON, YAML)") + parser.add_argument("--query-file", type=str, help="Path to YAML query file") + parser.add_argument("--delimiter", type=str, help="CSV delimiter") + parser.add_argument("--case-insensitive-headers", action="store_true", help="Convert headers to lowercase") + parser.add_argument("--output-dir", type=str, help="Directory to save query results") + parser.add_argument("--interactive", action="store_true", help="Enable interactive mode") + parser.add_argument("--log-level", type=str, help="Set log level") + parser.add_argument("--debug", action="store_true", help="Enable debug mode") + + parser.add_argument("--train-model", type=str, help="Path to JSON file for training the username classifier") + # New argument for testing the model + parser.add_argument("--test-model", type=str, help="Username to test with the classifier") + return parser.parse_args() + + def get_setting(self, key): + # Return CLI argument if available, otherwise fallback to config file + return getattr(self.args, key.replace('-', '_'), None) or self.config.get(key, None) diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/csv_loader_duckdb.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/csv_loader_duckdb.py new file mode 100644 index 000000000..80fd235ab --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/csv_loader_duckdb.py @@ -0,0 +1,50 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +csv_loader_duckdb.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import duckdb +import pandas as pd +import os +import re + +class CSVLoaderDuckDB: + def __init__(self, csv_dir, prefix="csve", delimiter=',', case_insensitive_headers=False): + self.csv_dir = csv_dir + self.prefix = prefix # No auto-underscore to allow flexibility + self.delimiter = delimiter + self.case_insensitive_headers = case_insensitive_headers + self.conn = duckdb.connect(database=':memory:') # In-memory DuckDB connection + + def load_csv_files(self): + for filename in os.listdir(self.csv_dir): + if filename.endswith(".csv") and filename.startswith(self.prefix): # Ensure prefix check + # Remove only the prefix from the beginning, keeping the rest intact + table_name = filename[len(self.prefix):].removeprefix("_").removesuffix(".csv") + + # Ensure valid DuckDB table name + table_name = table_name.replace("-", "_").replace(" ", "_") + table_name = f'"{table_name}"' # Quote it to allow special characters + + print(f"Loading CSV file into DuckDB: {filename} as {table_name}") + + # Read CSV into pandas DataFrame + df = pd.read_csv(os.path.join(self.csv_dir, filename), delimiter=self.delimiter) + + # Replace dots in headers with underscores + df.columns = [re.sub(r'[.-]', '_', col) for col in df.columns] + + # Optionally convert headers to lowercase + if self.case_insensitive_headers: + df.columns = [col.lower() for col in df.columns] + + # Register DataFrame in DuckDB + self.conn.execute(f"CREATE TABLE {table_name} AS SELECT * FROM df") + + def query(self, sql): + return self.conn.execute(sql).fetchdf() # Fetch result as a pandas DataFrame diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/data_validator.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/data_validator.py new file mode 100644 index 000000000..39f4e47d6 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/data_validator.py @@ -0,0 +1,16 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +data_validator.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +class DataValidator: + @staticmethod + def validate_dataframe(df, required_columns): + missing_columns = [col for col in required_columns if col not in df.columns] + if missing_columns: + print(f"Warning: Missing columns {missing_columns} in DataFrame") diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/directory_selector.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/directory_selector.py new file mode 100644 index 000000000..071efa1e9 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/directory_selector.py @@ -0,0 +1,68 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +directory_selector.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import os +import questionary + +class DirectorySelector: + def __init__(self, parent_dir): + """ + Initialize with the parent directory which contains subdirectories. + :param parent_dir: Path to the parent directory. + """ + if not os.path.isdir(parent_dir): + raise ValueError(f"Provided path '{parent_dir}' is not a directory.") + self.parent_dir = os.path.abspath(parent_dir) + self.new_snapshot = "Create new snapshot of tenancy" + + + def list_subdirectories(self): + """ + List all subdirectories in the parent directory sorted by creation time (newest first). + :return: A list of subdirectory names. + """ + subdirs = [ + name for name in os.listdir(self.parent_dir) + if os.path.isdir(os.path.join(self.parent_dir, name)) + ] + + # Sort by creation time, newest first + subdirs.sort(key=lambda name: os.path.getctime(os.path.join(self.parent_dir, name)), reverse=True) + return subdirs + + def select_directory(self): + """ + Prompts the user to select a subdirectory using questionary. + :return: The full path to the selected subdirectory or None if no selection is made. + """ + subdirs = self.list_subdirectories() + if not subdirs: + print(f"No subdirectories found in {self.parent_dir}") + return None + + # Prompt the user to select one of the subdirectories. + subdirs.append(self.new_snapshot) + selected = questionary.select( + "Select a directory or create a new snapshot from the tenancy using showoci:", + choices=subdirs + ).ask() + + if selected is None: + # User cancelled the selection. + return None + + if selected == self.new_snapshot: + return selected + + # Return the full directory path. + return os.path.join(self.parent_dir, selected) + + def get_new_snapshot(self): + return self.new_snapshot \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/file_selector.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/file_selector.py new file mode 100644 index 000000000..d39d206a4 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/file_selector.py @@ -0,0 +1,43 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +file_selector.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import os +import questionary + +class FileSelector: + def __init__(self, directory): + """Initialize FileSelector with the given directory.""" + self.directory = directory + + def get_yaml_files(self): + """Retrieve a list of YAML files from the specified directory.""" + if not os.path.isdir(self.directory): + print(f"Error: The directory '{self.directory}' does not exist.") + return [] + + # List only .yaml or .yml files + return [f for f in os.listdir(self.directory) if f.endswith((".yaml", ".yml"))] + + def select_file(self): + """Allows the user to select a YAML file interactively.""" + yaml_files = self.get_yaml_files() + + if not yaml_files: + print("No YAML files found in the directory.") + return None + + # Use questionary to allow the user to select a file + selected_file = questionary.select( + "Select a YAML file:", choices=yaml_files + ).ask() + + if selected_file: + return os.path.join(self.directory, selected_file) + return None diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/logger.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/logger.py new file mode 100644 index 000000000..6d7396b23 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/logger.py @@ -0,0 +1,23 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +logger.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import logging + +class Logger: + def __init__(self, level='INFO'): + self.logger = logging.getLogger(__name__) + self.logger.setLevel(getattr(logging, level.upper(), logging.INFO)) + handler = logging.StreamHandler() + handler.setFormatter(logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')) + self.logger.addHandler(handler) + + def log(self, message, level='INFO'): + if hasattr(self.logger, level.lower()): + getattr(self.logger, level.lower())(message) diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/oci_config_selector.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/oci_config_selector.py new file mode 100644 index 000000000..ee594d621 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/oci_config_selector.py @@ -0,0 +1,219 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +oci_config_selector.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import configparser +import os +import questionary +import tempfile + +class OCIConfigSelector: + def __init__(self, oci_config_file, tqt_config_file, csv_dir): + """ + Initializes the OCIConfigSelector with paths to both config files. + The paths support '~' to denote the user's home directory. + :param oci_config_file: Path to the main OCI config file (DEFAULT domain). 
+ :param tqt_config_file: Path to the TQT config file (additional tenancies). + :param csv_dir: Base directory for CSV files. + """ + # Expand the user home directory (e.g., '~/.oci/config') + self.oci_config_file = os.path.expanduser(oci_config_file) + self.tqt_config_file = os.path.expanduser(tqt_config_file) + self.csv_dir = csv_dir + self.config = configparser.ConfigParser() + self.combined_config_content = None + self.read_and_combine_configs() + + def read_and_combine_configs(self): + """ + Reads both config files, concatenates their content, and loads the combined config. + """ + combined_content = [] + + # Read the main OCI config file (DEFAULT domain) + if os.path.exists(self.oci_config_file): + try: + with open(self.oci_config_file, 'r') as f: + oci_content = f.read().strip() + if oci_content: + combined_content.append(oci_content) + print(f"Loaded DEFAULT domain from: {self.oci_config_file}") + except Exception as e: + print(f"Warning: Could not read OCI config file {self.oci_config_file}: {e}") + else: + print(f"Warning: OCI config file not found: {self.oci_config_file}") + + # Read the TQT config file (additional tenancies) + if os.path.exists(self.tqt_config_file): + try: + with open(self.tqt_config_file, 'r') as f: + tqt_content = f.read().strip() + if tqt_content: + combined_content.append(tqt_content) + print(f"Loaded additional tenancies from: {self.tqt_config_file}") + except Exception as e: + print(f"Warning: Could not read TQT config file {self.tqt_config_file}: {e}") + else: + print(f"Warning: TQT config file not found: {self.tqt_config_file}") + + # Combine the content + if combined_content: + self.combined_config_content = '\n\n'.join(combined_content) + + # Create a temporary file to load the combined configuration + with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.config') as temp_file: + temp_file.write(self.combined_config_content) + temp_config_path = temp_file.name + + try: + # Read the combined configuration + read_files = self.config.read(temp_config_path) + if not read_files: + raise FileNotFoundError(f"Unable to read combined config from temporary file") + print(f"Successfully combined and loaded configuration from both files") + finally: + # Clean up the temporary file + try: + os.unlink(temp_config_path) + except: + pass + else: + raise FileNotFoundError("No valid configuration content found in either config file") + + def get_combined_config_content(self): + """ + Returns the combined configuration content as a string. + Useful for debugging or logging purposes. + """ + if self.combined_config_content is None: + raise ValueError("No combined configuration content available. Check if config files were read successfully.") + return self.combined_config_content + + def list_sections(self): + """ + Returns a list of sections available in the combined config. + Note: The DEFAULT section is not included in this list because configparser + treats it as a special section containing default values. + """ + return self.config.sections() + + def select_section(self): + """ + Uses questionary to prompt the user to select one of the available sections, + select the DEFAULT section, or create a new section. + Returns a tuple: (section_name, prefix) where prefix is the value of the + prefix attribute under the section if it exists, otherwise None. 
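+
+        Illustrative usage (assuming "selector" is an OCIConfigSelector instance;
+        the section name and prefix shown are made-up examples):
+            tenancy, prefix = selector.select_section()
+            # e.g. ("DEFAULT", None) or ("acme_prod", "oci")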
+ """ + sections = self.list_sections() + # Add "Create New Section" as an option + choices = ["DEFAULT"] + sections + ["Create New Section"] + + answer = questionary.select( + "Select a section (use arrow keys and press ENTER):", + choices=choices, + default="DEFAULT" + ).ask() + + if answer == "Create New Section": + answer = self.create_new_section() + else: + print(f"You selected: {answer}") + + # Check for the 'prefix' attribute in the selected section. + if answer == "DEFAULT": + # For DEFAULT, check the defaults() dictionary. + prefix = self.config.defaults().get("prefix", None) + else: + prefix = self.config.get(answer, "prefix") if self.config.has_option(answer, "prefix") else None + + return answer, prefix + + def create_new_section(self): + """ + Creates a new section in the TQT config file. + Asks the user whether they have CSV files or connection details. + For CSV files: asks for the prefix and shows the path where files must be pasted. + For connection details: prompts for necessary details and adds them to the new section. + """ + section_name = questionary.text("Enter the name for the new section:").ask() + # Check if the section already exists + if section_name in self.config.sections(): + print(f"Section '{section_name}' already exists. Please choose a different name.") + return self.select_section()[0] # Re-prompt for selection and return the section name + + option = questionary.select( + "Do you have CSV files or connection details?", + choices=["CSV files", "Connection Details"] + ).ask() + + if option == "CSV files": + prefix = questionary.text("Provide the prefix for the CSV files:").ask() + # Determine the path where the CSV files should be pasted. + csv_path = os.path.join(self.csv_dir, section_name, section_name + '__') + print(f"Please paste your CSV files with prefix '{prefix}' into the following directory:\n{csv_path}") + # Optionally, create the directory if it does not exist. + if not os.path.exists(csv_path): + os.makedirs(csv_path) + print(f"Created directory: {csv_path}") + + # Save the CSV prefix in the TQT config file for future reference. + self._add_section_to_tqt_config(section_name, {"prefix": prefix}) + + # Return the new section name. + print(f"New section '{section_name}' added to TQT config file. Restart to select and load your data.") + exit() + + elif option == "Connection Details": + oci_user = questionary.text("Enter OCI user:").ask() + fingerprint = questionary.text("Enter fingerprint:").ask() + tenancy = questionary.text("Enter tenancy:").ask() + region = questionary.text("Enter region:").ask() + key_file = questionary.text("Enter key file path:").ask() + + # Create new section with connection details in TQT config. + config_data = { + "user": oci_user, + "fingerprint": fingerprint, + "tenancy": tenancy, + "region": region, + "key_file": key_file + } + self._add_section_to_tqt_config(section_name, config_data) + print(f"New section '{section_name}' added to TQT config file.") + return section_name + + else: + print("Invalid option selected.") + return self.create_new_section() # Recurse until a valid option is provided. + + def _add_section_to_tqt_config(self, section_name, config_data): + """ + Adds a new section to the TQT config file. 
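+        For example (hypothetical section name), passing section_name "acme_prod"
+        and config_data {"prefix": "oci"} writes an entry like:
+            [acme_prod]
+            prefix = oci
+        to the TQT config file and then refreshes the combined configuration.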
+ :param section_name: Name of the section to add + :param config_data: Dictionary of key-value pairs for the section + """ + # Create the TQT config file if it doesn't exist + os.makedirs(os.path.dirname(self.tqt_config_file), exist_ok=True) + + # Read existing TQT config if it exists + tqt_config = configparser.ConfigParser() + if os.path.exists(self.tqt_config_file): + tqt_config.read(self.tqt_config_file) + + # Add the new section + tqt_config.add_section(section_name) + for key, value in config_data.items(): + tqt_config.set(section_name, key, value) + + # Write back to the TQT config file + with open(self.tqt_config_file, "w") as configfile: + tqt_config.write(configfile) + + # Refresh the combined configuration + self.read_and_combine_configs() \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/output_formatter.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/output_formatter.py new file mode 100644 index 000000000..655159167 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/output_formatter.py @@ -0,0 +1,33 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +output_formatter.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import pandas as pd +import json + +class OutputFormatter: + @staticmethod + def format_output(data, output_format="DataFrame"): + if data is None: + return "No data to display" + + try: + if isinstance(data, pd.DataFrame): + with pd.option_context('display.max_rows', None, + 'display.max_columns', None, + 'display.width', 1000): + return str(data) + elif isinstance(data, (list, tuple)): + return "\n".join(map(str, data)) + elif isinstance(data, dict): + return "\n".join(f"{k}: {v}" for k, v in data.items()) + else: + return str(data) + except Exception as e: + return f"Error formatting output: {str(e)}" \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/query_executor_duckdb.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/query_executor_duckdb.py new file mode 100644 index 000000000..0a26a17d6 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/query_executor_duckdb.py @@ -0,0 +1,37 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +query_executor_duckdb.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import duckdb + +class QueryExecutorDuckDB: + def __init__(self, conn): + self.conn = conn # DuckDB connection + + def execute_query(self, query): + try: + result = self.conn.execute(query).fetchdf() + return result + except Exception as e: + print(f"Error executing query: {e}") + return None + + def show_tables(self): + """Use DuckDB's SHOW TABLES command to list all tables.""" + result = self.conn.execute("SHOW TABLES").fetchall() + return [row[0] for row in result] + + def describe_table(self, table_name): + """Use DuckDB's DESCRIBE command to get column names and types.""" + try: + result = self.conn.execute(f"DESCRIBE {table_name}").fetchall() + return [(row[0], row[1]) for row in result] + except Exception as e: + print(f"Error describing table '{table_name}': {e}") + return None diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/classes/query_selector.py b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/query_selector.py new file mode 100644 index 000000000..8c20f9aec --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/classes/query_selector.py @@ -0,0 +1,68 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +query_selector.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import yaml +import questionary +import queue + +class QuerySelector: + def __init__(self, yaml_file=None): + """Initialize QuerySelector with an optional YAML file path and a FIFO queue.""" + self.yaml_file = yaml_file + self.query_queue = queue.Queue() # Always initialize an empty FIFO queue + + if yaml_file: + self.queries = self.load_queries() + else: + print("No YAML file provided. 
Initializing an empty queue.") + self.queries = [] # Empty query list if no file is provided + + def load_queries(self): + """Load queries from a YAML file.""" + try: + with open(self.yaml_file, "r") as file: + data = yaml.safe_load(file) + return data.get("queries", []) + except Exception as e: + print(f"Error loading YAML file: {e}") + return [] + + def select_queries(self): + """Displays a list of query descriptions, allowing multiple selections, and pushes each item separately onto FIFO queue.""" + if not self.queries: + print("No queries available.") + return [] + + # Prepare choices: Show description only + choices = [query["description"] for query in self.queries] + + # Use questionary to allow multiple selections + selected_descriptions = questionary.checkbox( + "Select one or more queries:", choices=choices + ).ask() + + for choice in selected_descriptions: + for query in self.queries: + if query["description"] == choice: + self.query_queue.put(("Description", query["description"])) + self.query_queue.put(("SQL", query["sql"])) + if query.get("filter") != None: + self.query_queue.put(("Filter", query.get("filter", "None"))) # Return filter as-is + break # Stop after adding matching query + + def dequeue_item(self): + """Dequeues and returns the next item from the FIFO queue.""" + if not self.query_queue.empty(): + return self.query_queue.get() + else: + print("Queue is empty.") + return None + + diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/config.yaml b/security/security-design/shared-assets/oci-security-health-check-forensics/config.yaml new file mode 100644 index 000000000..48d2085f3 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/config.yaml @@ -0,0 +1,28 @@ +# Directory to store CSV files +csv_dir: "data" + +oci_config_file: "~/.oci/config" # OCI config file location (default is ~/.oci/config) +tqt_config_file: "qt_config" # Tenancy Query Tool config file location qt_config + +# Prefix for CSV files +prefix: "oci" + +# Resource argument for showoci (a: all, i: identity, n: network, c: compute, etc.) +resource_argument: "a" + +# Output format (DataFrame, json, etc.) +output_format: "DataFrame" + +# Log level (DEBUG, INFO, WARNING, ERROR, CRITICAL) +log_level: "INFO" + +# CSV file settings +delimiter: "," +case_insensitive_headers: true + +# Interactive mode +interactive: true + +# Audit fetch settings +audit_worker_count: 10 +audit_worker_window: 1 # hours \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/healthcheck_forensic_tool.py b/security/security-design/shared-assets/oci-security-health-check-forensics/healthcheck_forensic_tool.py new file mode 100644 index 000000000..2ab34185b --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/healthcheck_forensic_tool.py @@ -0,0 +1,374 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. 
+ +healthcheck_forensic_tool.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import sys +import os +import glob +import shutil +import requests +import readline +import atexit +import datetime +import tempfile +from pathlib import Path + +from classes.config_manager import ConfigManager +from classes.logger import Logger +from classes.csv_loader_duckdb import CSVLoaderDuckDB as CSVLoader +from classes.query_executor_duckdb import QueryExecutorDuckDB as QueryExecutor +from classes.oci_config_selector import OCIConfigSelector +from classes.directory_selector import DirectorySelector +from classes.command_parser import CommandParser +from classes.commands.registry import CommandRegistry +from classes.commands.base_command import ShellContext +from classes.commands.command_history import CommandHistory +from classes.commands.cloudguard_commands import CloudGuardFetchCommand, CloudGuardDeleteCommand +import classes.commands.standard_commands as std +import classes.commands.filter_commands as filt +import classes.commands.control_commands as ctl +import classes.commands.audit_commands as audit + +# Pandas display options (if you ever import pandas here) +try: + import pandas as pd + pd.set_option("display.max_rows", None) + pd.set_option("display.max_columns", None) + pd.set_option("display.width", 1000) + pd.set_option("display.max_colwidth", None) +except ImportError: + pass + +# Global variable to store the combined config file path +_combined_config_file = None + +# ----------------------------------------------------------------------------- +def is_repo_accessible(url: str) -> bool: + try: + r = requests.get(url, timeout=5) + return r.status_code == 200 + except requests.RequestException: + return False + +def setup_showoci() -> bool: + repo_url = "https://github.com/oracle/oci-python-sdk" + repo_path = "oci-python-sdk" + showoci_dir = "showoci" + backup_zip = "showoci_zip/showoci.zip" + + # First, try to clone/update from GitHub + if is_repo_accessible(repo_url): + print("✓ Internet connection detected. Attempting to clone/update OCI SDK...") + try: + # Clone or pull + if not os.path.isdir(repo_path): + print("Cloning OCI SDK from GitHub...") + import git + git.Repo.clone_from(repo_url, repo_path) + print("✓ Successfully cloned OCI SDK repository") + else: + print("Updating existing OCI SDK repository...") + import git + repo = git.Repo(repo_path) + repo.remotes.origin.pull() + print("✓ Successfully updated OCI SDK repository") + + # Create symlink and copy files + link_target = os.path.join(repo_path, "examples", "showoci") + if not os.path.exists(showoci_dir): + os.symlink(link_target, showoci_dir) + + # Copy the .py files into the CWD + for src in glob.glob(os.path.join(showoci_dir, "*.py")): + shutil.copy(src, ".") + + print("✓ ShowOCI setup completed using GitHub repository") + return True + + except Exception as e: + print(f"⚠️ Failed to clone/update from GitHub: {e}") + print("📦 Falling back to local backup...") + # Fall through to backup method + + else: + print("❌ No internet connection detected or GitHub is not accessible") + print("📦 Using local backup archive...") + + # Fallback: Use local backup zip file + if not os.path.exists(backup_zip): + print(f"❌ Error: Backup file '{backup_zip}' not found!") + print(" Please ensure you have either:") + print(" 1. Internet connection to download from GitHub, OR") + print(" 2. 
The backup file 'showoci_zip/showoci.zip' in your project directory") + return False + + try: + print(f"📦 Extracting ShowOCI from backup archive: {backup_zip}") + import zipfile + + # Extract zip to current directory + with zipfile.ZipFile(backup_zip, 'r') as zip_ref: + zip_ref.extractall(".") + + print("✓ Successfully extracted ShowOCI from backup archive") + print("📋 Note: Using offline backup - some features may be outdated") + return True + + except Exception as e: + print(f"❌ Failed to extract from backup archive: {e}") + print(" Please check that 'showoci_zip/showoci.zip' is a valid archive") + return False + +def create_combined_config_file(oci_config_selector): + """ + Creates a temporary combined config file that showoci can use. + Returns the path to the temporary file. + """ + global _combined_config_file + + # Clean up any existing combined config file + cleanup_combined_config_file() + + # Create a new temporary file that won't be automatically deleted + temp_fd, temp_path = tempfile.mkstemp(suffix='.config', prefix='combined_oci_') + + try: + with os.fdopen(temp_fd, 'w') as temp_file: + temp_file.write(oci_config_selector.get_combined_config_content()) + + _combined_config_file = temp_path + print(f"Created temporary combined config file: {temp_path}") + return temp_path + except Exception as e: + # Clean up on error + try: + os.unlink(temp_path) + except: + pass + raise e + +def cleanup_combined_config_file(): + """ + Cleans up the temporary combined config file. + """ + global _combined_config_file + + if _combined_config_file and os.path.exists(_combined_config_file): + try: + os.unlink(_combined_config_file) + print(f"Cleaned up temporary config file: {_combined_config_file}") + except Exception as e: + print(f"Warning: Could not clean up temporary config file {_combined_config_file}: {e}") + finally: + _combined_config_file = None + +def call_showoci(combined_conf_file, profile, tenancy, out_dir, prefix, arg): + """ + Updated to use the combined config file instead of the original one. + """ + sys.argv = [ + "main.py", + "-cf", combined_conf_file, # Use the combined config file + "-t", tenancy, + f"-{arg}", + "-csv", os.path.join(out_dir, prefix), + "-jf", os.path.join(out_dir, "showoci.json") + ] + from showoci import execute_extract + execute_extract() + +def new_snapshot(tenancy, base, prefix, combined_conf_file, arg): + """ + Updated to use the combined config file. 
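+    A snapshot directory named <tenancy>_<YYYYMMDD_HHMMSS> is created under base,
+    e.g. data/acme_prod/acme_prod_20250101_120000 (illustrative), showoci output is
+    written into it, and the directory path is returned.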
+ """ + ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S") + target = os.path.join(base, f"{tenancy}_{ts}") + os.makedirs(target, exist_ok=True) + call_showoci(combined_conf_file, tenancy, tenancy, target, prefix, arg) + return target + +def set_tenancy_data(logger, cfg_mgr): + csv_dir = cfg_mgr.get_setting("csv_dir") + oci_conf = cfg_mgr.get_setting("oci_config_file") + tqt_conf = cfg_mgr.get_setting("tqt_config_file") + prefix = cfg_mgr.get_setting("prefix") + resource_arg= cfg_mgr.get_setting("resource_argument") + + print(f"\nConfig → csv_dir={csv_dir}, oci_config_file={oci_conf}, tqt_config_file={tqt_conf}, prefix={prefix}\n") + + # Create the OCI config selector with both config files + sel = OCIConfigSelector(oci_conf, tqt_conf, csv_dir) + tenancy, override_prefix = sel.select_section() + prefix = override_prefix or prefix + + # Create a temporary combined config file for showoci to use + combined_conf_file = create_combined_config_file(sel) + + tenancy_dir = os.path.join(csv_dir, tenancy) + if os.path.isdir(tenancy_dir) and os.listdir(tenancy_dir): + ds = DirectorySelector(tenancy_dir) + choice = ds.select_directory() + if choice == ds.get_new_snapshot(): + choice = new_snapshot(tenancy, tenancy_dir, prefix, combined_conf_file, resource_arg) + else: + choice = new_snapshot(tenancy, tenancy_dir, prefix, combined_conf_file, resource_arg) + + print(f"Loading CSVs from → {choice}") + loader = CSVLoader( + csv_dir=choice, + prefix=prefix, + delimiter=cfg_mgr.get_setting("delimiter"), + case_insensitive_headers=cfg_mgr.get_setting("case_insensitive_headers") or False + ) + loader.load_csv_files() + logger.log("CSV files loaded.") + + tables = loader.conn.execute("SHOW TABLES").fetchall() + tables = [t[0] for t in tables] + logger.log(f"Tables: {tables}") + + executor = QueryExecutor(loader.conn) + executor.current_snapshot_dir = choice + return executor + +def show_startup_help(): + """Display helpful information when the tool starts""" + print("=" * 80) + print("OCI QUERY TOOL - Interactive Mode") + print("=" * 80) + print("Available commands:") + print(" show tables - List all loaded CSV tables") + print(" describe
<table_name>          - Show table structure")
+    print("  SELECT * FROM <table_name>
- Run SQL queries on your data") + print(" history - Show command history") + print(" help [command] - Get detailed help") + print() + print("Data Fetching Commands:") + print(" audit_events fetch DD-MM-YYYY - Fetch audit events") + print(" audit_events fetch - Load existing audit data") + print(" audit_events delete - Delete audit files") + print(" cloudguard fetch DD-MM-YYYY - Fetch Cloud Guard problems") + print(" cloudguard fetch - Load existing Cloud Guard data") + print(" cloudguard delete - Delete Cloud Guard files") + print(" Example: audit_events fetch 15-06-2025 7") + print(" (Fetches 7 days of data ending on June 15, 2025)") + print() + print("Filtering & Analysis:") + print(" filter age - Filter by date") + print(" filter compartment - Analyze compartments") + print() + print("Batch Operations:") + print(" set queries - Load queries from YAML file") + print(" run queries - Execute loaded queries") + print(" set tenancy - Switch to different tenancy") + print() + print("Type 'help ' for detailed usage or 'exit' to quit.") + print("=" * 80) + +# Register cleanup function to run at exit +def cleanup_at_exit(): + cleanup_combined_config_file() + +# ----------------------------------------------------------------------------- +def main(): + # Register cleanup function + atexit.register(cleanup_at_exit) + + try: + # 1) load config & logger + cfg = ConfigManager() + log = Logger(level=cfg.get_setting("log_level") or "INFO") + + # 2) initial setup & CLI history + setup_showoci() + cmd_history = CommandHistory(".sql_history") + + # 3) build context + executor = set_tenancy_data(log, cfg) + ctx = ShellContext( + query_executor=executor, + config_manager=cfg, + logger=log, + history=cmd_history, + query_selector=None, + reload_tenancy_fn=lambda: set_tenancy_data(log, cfg) + ) + + # 4) command registry & parser + registry = CommandRegistry() + parser = CommandParser(registry) + ctx.registry = registry + + # register commands + registry.register('show tables', std.ShowTablesCommand) + registry.register('describe', std.DescribeCommand) + registry.register('exit', std.ExitCommand) + registry.register('quit', std.ExitCommand) + registry.register('history', std.HistoryCommand) + registry.register('help', std.HelpCommand) + registry.register('filter age', filt.AgeFilterCommand) + registry.register('filter compartment', filt.CompartmentFilterCommand) + registry.register('set queries', ctl.SetQueriesCommand) + registry.register('run queries', ctl.RunQueriesCommand) + registry.register('set tenancy', ctl.SetTenancyCommand) + registry.register('audit_events fetch', audit.AuditEventsFetchCommand) + registry.register('audit_events delete', audit.AuditEventsDeleteCommand) + registry.register('cloudguard fetch', CloudGuardFetchCommand) + registry.register('cloudguard delete', CloudGuardDeleteCommand) + registry.register('', std.ExecuteSqlCommand) + + # Show startup help + show_startup_help() + + # 5) REPL + while True: + try: + user_input = input("CMD> ").strip() + if not user_input: + continue + + low = user_input.lower() + if low in ('exit','quit'): + cmd_history.save_history() + break + if low == 'history': + cmd_history.show_history() + continue + if user_input.startswith('!'): + user_input = cmd_history.get_command(user_input) + + # save it (unless it was a bang-exec) + if not user_input.startswith('!'): + cmd_history.add(user_input) + + cmd_name, args = parser.parse(user_input) + cmd_cls = registry.get(cmd_name) + if not cmd_cls: + print(f"Unknown command: {cmd_name}") + continue + + cmd = 
cmd_cls(ctx) + cmd.execute(args) + + except EOFError: + cmd_history.save_history() + break + except KeyboardInterrupt: + print("\nCancelled.") + except Exception as e: + log.log(f"Error: {e}", level="ERROR") + + except Exception as e: + print(f"Fatal error: {e}") + finally: + # Ensure cleanup happens + cleanup_combined_config_file() + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/json_file_analyzer_showoci.py b/security/security-design/shared-assets/oci-security-health-check-forensics/json_file_analyzer_showoci.py new file mode 100644 index 000000000..88165a1c3 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/json_file_analyzer_showoci.py @@ -0,0 +1,397 @@ +""" +Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. +This software is dual-licensed to you under the Universal Permissive License (UPL) 1.0 as shown at https://oss.oracle.com/licenses/upl or Apache License 2.0 as shown at http://www.apache.org/licenses/LICENSE-2.0. You may choose either license. + +json_file_analyzer_showoci.py +@author base: Jacco Steur +Supports Python 3 and above + +coding: utf-8 +""" +import json +import argparse +from collections import OrderedDict, defaultdict + +def get_type_name(value): + """Returns a string representation of the value's type.""" + if isinstance(value, str): + return "string" + elif isinstance(value, bool): + return "boolean" + elif isinstance(value, int): + return "integer" + elif isinstance(value, float): + return "float" + elif value is None: + return "null" + elif isinstance(value, list): + return "array" + elif isinstance(value, dict): + return "object" + else: + return type(value).__name__ + +def discover_json_structure_recursive(data, max_depth=5, current_depth=0): + """ + Recursively discovers the structure of JSON data with depth limiting. 
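+
+    Illustrative example (made-up input):
+        discover_json_structure_recursive({"name": "vcn-1", "cidr_blocks": ["10.0.0.0/16"]})
+        returns {"name": "string", "cidr_blocks": "array of string"}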
+ """ + if isinstance(data, dict): + if current_depth >= max_depth: + return f"object (max depth {max_depth} reached - {len(data)} keys)" + + structure = OrderedDict() + for key, value in data.items(): + structure[key] = discover_json_structure_recursive(value, max_depth, current_depth + 1) + return structure + elif isinstance(data, list): + if not data: + return "array (empty)" + else: + # Always try to show the structure of array elements + first_element = data[0] + + # If it's an array of simple types, handle that + if not isinstance(first_element, (dict, list)): + element_type = get_type_name(first_element) + # Check if all elements are the same simple type + if all(get_type_name(item) == element_type for item in data[:min(5, len(data))]): + return f"array of {element_type}" + else: + return "array of mixed simple types" + + # For complex types (objects/arrays), analyze structure + element_structure = discover_json_structure_recursive(first_element, max_depth, current_depth + 1) + + # Check if other elements have similar structure (sample a few) + sample_size = min(5, len(data)) + similar_structure = True + + if len(data) > 1: + for i in range(1, sample_size): + other_structure = discover_json_structure_recursive(data[i], max_depth, current_depth + 1) + if not structures_are_similar(element_structure, other_structure): + similar_structure = False + break + + if similar_structure: + return { + "_array_info": f"array of {len(data)} items", + "_element_structure": element_structure + } + else: + return { + "_array_info": f"array of {len(data)} items (mixed structures)", + "_example_element": element_structure + } + else: + return get_type_name(data) + +def structures_are_similar(struct1, struct2): + """Check if two structures are similar enough to be considered the same type.""" + if type(struct1) != type(struct2): + return False + + if isinstance(struct1, dict) and isinstance(struct2, dict): + # Consider similar if they have mostly the same keys + keys1 = set(struct1.keys()) + keys2 = set(struct2.keys()) + common_keys = keys1 & keys2 + total_keys = keys1 | keys2 + + # Similar if at least 70% of keys are common + similarity = len(common_keys) / len(total_keys) if total_keys else 1 + return similarity >= 0.7 + + return struct1 == struct2 + +def merge_structures(struct1, struct2): + """ + Merge two structure representations, handling cases where the same field + might have different types across different records of the same type. + """ + if struct1 == struct2: + return struct1 + + if isinstance(struct1, dict) and isinstance(struct2, dict): + merged = OrderedDict() + all_keys = set(struct1.keys()) | set(struct2.keys()) + + for key in all_keys: + if key in struct1 and key in struct2: + merged[key] = merge_structures(struct1[key], struct2[key]) + elif key in struct1: + merged[key] = f"{struct1[key]} (optional)" + else: + merged[key] = f"{struct2[key]} (optional)" + + return merged + else: + # If structures are different and not both dicts, show both possibilities + return f"{struct1} | {struct2}" + +def analyze_json_by_type(data, max_depth=5): + """ + Analyze JSON data grouped by 'type' field. 
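+    Each input item is expected to look like {"type": "identity", "data": {...}}
+    (illustrative sample), i.e. the list format of the showoci JSON export.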
+ + Args: + data: List of dictionaries, each containing 'type' and 'data' fields + max_depth: Maximum depth for structure analysis + + Returns: + Dictionary mapping type names to their data structures + """ + if not isinstance(data, list): + raise ValueError("Expected JSON data to be a list of objects") + + type_structures = {} + type_counts = defaultdict(int) + + for item in data: + if not isinstance(item, dict): + print(f"Warning: Found non-dict item: {type(item)}") + continue + + if 'type' not in item: + print(f"Warning: Found item without 'type' field: {list(item.keys())}") + continue + + if 'data' not in item: + print(f"Warning: Found item without 'data' field for type '{item['type']}'") + continue + + item_type = item['type'] + item_data = item['data'] + type_counts[item_type] += 1 + + # Discover structure of this item's data + current_structure = discover_json_structure_recursive(item_data, max_depth) + + # If we've seen this type before, merge structures + if item_type in type_structures: + type_structures[item_type] = merge_structures( + type_structures[item_type], + current_structure + ) + else: + type_structures[item_type] = current_structure + + return type_structures, dict(type_counts) + +def print_dict_structure(struct, indent=0, max_line_length=120): + """Print dictionary structure with proper indentation and special handling for arrays.""" + spaces = " " * indent + if isinstance(struct, dict): + # Special handling for array structures + if "_array_info" in struct: + print(f"{spaces}{struct['_array_info']}") + if "_element_structure" in struct: + print(f"{spaces}Each element has structure:") + print_dict_structure(struct["_element_structure"], indent + 2, max_line_length) + elif "_example_element" in struct: + print(f"{spaces}Example element structure:") + print_dict_structure(struct["_example_element"], indent + 2, max_line_length) + return + + # Normal object structure + print(f"{spaces}{{") + for i, (key, value) in enumerate(struct.items()): + comma = "," if i < len(struct) - 1 else "" + if isinstance(value, dict): + if "_array_info" in value: + # Special compact display for arrays + print(f"{spaces} \"{key}\": {value['_array_info']}") + if "_element_structure" in value: + print(f"{spaces} Each element:") + print_dict_structure(value["_element_structure"], indent + 6, max_line_length) + elif "_example_element" in value: + print(f"{spaces} Example element:") + print_dict_structure(value["_example_element"], indent + 6, max_line_length) + if comma: + print(f"{spaces} ,") + else: + print(f"{spaces} \"{key}\": {{") + print_dict_structure(value, indent + 4, max_line_length) + print(f"{spaces} }}{comma}") + else: + # Handle simple values and other types + value_str = format_value_string(value, max_line_length - indent - len(key) - 6) + print(f"{spaces} \"{key}\": {value_str}{comma}") + print(f"{spaces}}}") + else: + formatted_value = format_value_string(struct, max_line_length - indent) + print(f"{spaces}{formatted_value}") + +def format_value_string(value, max_length=80): + """Format a value string with appropriate truncation and cleaning.""" + value_str = str(value) + + # Clean up common patterns + value_str = value_str.replace("OrderedDict(", "").replace("})", "}") + + # For array descriptions, make them more readable + if value_str.startswith("array of ") or value_str.startswith("array ("): + # Keep array descriptions intact but clean them up + if len(value_str) > max_length: + # Find a good break point + if "," in value_str and len(value_str) > max_length: + parts = 
value_str.split(",") + truncated = parts[0] + if len(truncated) < max_length - 10: + truncated += ", ..." + value_str = truncated + else: + value_str = value_str[:max_length-3] + "..." + else: + # For other long strings, truncate normally + if len(value_str) > max_length: + value_str = value_str[:max_length-3] + "..." + + return value_str + +def print_type_analysis(type_structures, type_counts, filter_types=None): + """Print the analysis results in a readable format.""" + print("=" * 80) + print("JSON STRUCTURE ANALYSIS BY TYPE") + print("=" * 80) + + # Filter if requested + if filter_types: + filter_set = set(filter_types) + type_structures = {k: v for k, v in type_structures.items() if k in filter_set} + type_counts = {k: v for k, v in type_counts.items() if k in filter_set} + + print(f"\nFound {len(type_structures)} different types:") + for type_name in sorted(type_counts.keys()): + print(f" - {type_name}: {type_counts[type_name]} record(s)") + + if not filter_types: + print("\n" + "=" * 80) + print("TIP: Use --type-filter to focus on specific types for detailed analysis") + print(" Example: --type-filter \"identity,showoci\"") + + print("\n" + "=" * 80) + + for type_name in sorted(type_structures.keys()): + structure = type_structures[type_name] + print(f"\nTYPE: {type_name}") + print(f"Records: {type_counts[type_name]}") + print("-" * 60) + print("Data structure:") + + # Pretty print with better formatting + if isinstance(structure, dict): + print_dict_structure(structure, indent=2) + else: + print(f" {structure}") + + # Show field count for complex structures + if isinstance(structure, dict): + print(f" → {len(structure)} top-level fields") + print() + +def show_sample_data(data, sample_type, max_items=1): + """Show sample data for a specific type.""" + print("=" * 80) + print(f"SAMPLE DATA FOR TYPE: {sample_type}") + print("=" * 80) + + count = 0 + for item in data: + if isinstance(item, dict) and item.get('type') == sample_type: + print(f"\nSample {count + 1}:") + print("-" * 40) + sample_data = json.dumps(item['data'], indent=2) + if len(sample_data) > 2000: + lines = sample_data.split('\n') + truncated = '\n'.join(lines[:50]) + print(f"{truncated}\n... (truncated - showing first 50 lines)") + else: + print(sample_data) + + count += 1 + if count >= max_items: + break + + if count == 0: + print(f"No records found for type '{sample_type}'") + +def main(): + """ + Main function to parse arguments, read JSON file, analyze by type, + and print the results. + """ + parser = argparse.ArgumentParser( + description="Analyze JSON file structure grouped by 'type' field." + ) + parser.add_argument("json_file", help="Path to the JSON file to analyze.") + parser.add_argument( + "--max-depth", + type=int, + default=4, + help="Maximum depth to analyze nested structures (default: 4)" + ) + parser.add_argument( + "--type-filter", + help="Only analyze specific type(s), comma-separated" + ) + parser.add_argument( + "--list-types", + action="store_true", + help="Just list all available types and exit" + ) + parser.add_argument( + "--sample", + help="Show sample data for a specific type" + ) + + args = parser.parse_args() + + try: + with open(args.json_file, 'r', encoding='utf-8') as f: + try: + data = json.load(f, object_pairs_hook=OrderedDict) + except json.JSONDecodeError as e: + print(f"Error: Invalid JSON file. 
{e}") + return + + print(f"Analyzing file: {args.json_file}") + + type_structures, type_counts = analyze_json_by_type(data, args.max_depth) + + # List types mode + if args.list_types: + print("\nAvailable types/sections:") + for type_name in sorted(type_counts.keys()): + print(f" - {type_name} ({type_counts[type_name]} records)") + return + + # Sample data mode + if args.sample: + show_sample_data(data, args.sample) + return + + # Filter by type if specified + filter_types = None + if args.type_filter: + filter_types = [t.strip() for t in args.type_filter.split(',')] + print(f"Filtering to types: {', '.join(filter_types)}") + + print_type_analysis(type_structures, type_counts, filter_types=filter_types) + + # Additional analysis info + print("=" * 80) + print("USAGE TIPS:") + print(f"- Use --list-types to see all available types") + print(f"- Use --type-filter \"type1,type2\" to focus on specific types") + print(f"- Use --sample \"type_name\" to see actual sample data") + print(f"- Use --max-depth N to control analysis depth (current: {args.max_depth})") + + except FileNotFoundError: + print(f"Error: File not found at {args.json_file}") + except ValueError as e: + print(f"Error: {e}") + except Exception as e: + print(f"An unexpected error occurred: {e}") + +if __name__ == "__main__": + main() \ No newline at end of file diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/CIS_3.0.0_OCI_Foundations_Benchmark_Identity_And_Access_Management.yaml b/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/CIS_3.0.0_OCI_Foundations_Benchmark_Identity_And_Access_Management.yaml new file mode 100644 index 000000000..76c93dbb8 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/CIS_3.0.0_OCI_Foundations_Benchmark_Identity_And_Access_Management.yaml @@ -0,0 +1,295 @@ +queries: + # Identity and Access Management + - description: "[CIS 3.0.0]:1.1 Ensure service level admins are created to manage resources of particular service (Manual)" + sql: > + SELECT DISTINCT + ic.name as compartment_name, + ic.path as compartment_path, + ip.statement, + ip.policy_name + FROM identity_policy ip + JOIN identity_compartments ic ON ip.compartment_id = ic.id + WHERE LOWER(ip.statement) LIKE '%allow group%' + AND LOWER(ip.statement) LIKE '%to manage all-resources%' + AND LOWER(ip.policy_name) != 'tenant admin policy' + ORDER BY ip.policy_name; + + - description: "[CIS 3.0.0]:1.2 Ensure permissions on all resources are given only to the tenancy administrator group (Automated)" + sql: > + SELECT DISTINCT + ic.name as compartment_name, + ic.path as compartment_path, + ip.statement, + ip.policy_name + FROM identity_compartments ic + JOIN identity_policy ip ON ic.id = ip.compartment_id + WHERE LOWER(ip.statement) LIKE '%allow group%' + AND LOWER(ip.statement) LIKE '%to manage all-resources in tenancy%' + AND LOWER(ip.policy_name) != 'tenant admin policy' + ORDER BY ip.policy_name; + + - description: "[CIS 3.0.0]:1.3 Ensure IAM administrators cannot update tenancy Administrators group" + sql: > + SELECT DISTINCT + ic.name as compartment_name, + ic.path as compartment_path, + ip.statement, + ip.policy_name + FROM identity_policy ip + JOIN identity_compartments ic ON ip.compartment_id = ic.id + WHERE LOWER(ip.policy_name) NOT IN ('tenant admin policy', 'psm-root-policy') + AND LOWER(ip.statement) LIKE '%allow group%' + AND LOWER(ip.statement) LIKE '%tenancy%' + AND (LOWER(ip.statement) LIKE '%to 
manage%' OR LOWER(ip.statement) LIKE '%to use%') + AND (LOWER(ip.statement) LIKE '%all-resources%' OR (LOWER(ip.statement) LIKE '%groups%' AND LOWER(ip.statement) LIKE '%users%')); + ORDER BY ip.policy_name; + + - description: "[CIS 3.0.0]:1.4 Ensure IAM password policy requires minimum length of 14 or greater (Automated). Ensure that 1 or more is selected for Numeric (minimum) OR Special (minimum)" + sql: > + SELECT DISTINCT + ic.name as compartment_name, + ic.path as compartment_path, + ip.domain_name, + ip.name, + ip.min_length, + ip.min_numerals, + ip.min_special_chars + FROM identity_domains_pwd_policies ip + JOIN identity_compartments ic ON ip.compartment_ocid = ic.id + WHERE ip.name = 'defaultPasswordPolicy' + AND min_length < 14 + AND (ip.min_numerals IS NULL OR ip.min_numerals < 1 OR ip.min_special_chars IS NULL OR ip.min_special_chars < 1) + ORDER BY LOWER(ip.domain_name) + + - description: "[CIS 3.0.0]:1.5 Ensure IAM password policy expires passwords within 365 days (Manual)" + sql: > + SELECT DISTINCT + ic.name as compartment_name, + ic.path as compartment_path, + ip.domain_name, + ip.name, + ip.password_expires_after + FROM identity_domains_pwd_policies ip + JOIN identity_compartments ic ON ip.compartment_ocid = ic.id + WHERE ip.name = 'defaultPasswordPolicy' + AND (ip.password_expires_after IS NULL OR ip.password_expires_after > 365) + ORDER BY LOWER(ip.domain_name) + + - description: "[CIS 3.0.0]:1.6 Ensure IAM password policy prevents password reuse (Manual)" + sql: > + SELECT DISTINCT + ic.name as compartment_name, + ic.path as compartment_path, + ip.domain_name, + ip.name, + ip.num_passwords_in_history + FROM identity_domains_pwd_policies ip + JOIN identity_compartments ic ON ip.compartment_ocid = ic.id + WHERE ip.name = 'defaultPasswordPolicy' + AND (ip.num_passwords_in_history IS NULL OR ip.num_passwords_in_history < 24) + ORDER BY LOWER(ip.domain_name) + + - description: "[CIS 3.0.0]:1.7 Ensure MFA is enabled for all users with a console password (Automated)" + sql: > + SELECT DISTINCT + domain_name, + display_name, + mfa_status, + is_federated_user, + can_use_console_password + FROM identity_domains_users + WHERE active = 'True' + AND is_federated_user IS NULL + AND mfa_status IS NULL + AND can_use_console_password = 'True' + ORDER BY LOWER(domain_name) + + - description: "[CIS 3.0.0]:1.8 Ensure user API keys rotate within 90 days (Automated)" + sql: > + SELECT DISTINCT + domain_name, + display_name, + can_use_api_keys, + api_keys + FROM identity_domains_users + WHERE can_use_api_keys = 'True' + AND api_keys IS NOT NULL + filter : "age api_keys older 90" + + - description: "[CIS 3.0.0]:1.9 Ensure user customer secret keys rotate within 90 days (Automated)" + sql: > + SELECT DISTINCT + domain_name, + display_name, + can_use_customer_secret_keys, + customer_secret_keys + FROM identity_domains_users + WHERE can_use_customer_secret_keys = 'True' + AND customer_secret_keys IS NOT NULL + filter : "age customer_secret_keys older 90" + + - description: "[CIS 3.0.0]:1.10 Ensure user auth tokens rotate within 90 days or less (Automated)" + sql: > + SELECT DISTINCT + domain_name, + display_name, + can_use_auth_tokens, + auth_tokens + FROM identity_domains_users + WHERE can_use_auth_tokens = 'True' + AND auth_tokens IS NOT NULL + filter : "age auth_tokens older 90" + + - description: "[CIS 3.0.0]:1.11 Ensure user IAM Database Passwords rotate within 90 days (Manual)" + sql: > + SELECT DISTINCT + domain_name, + display_name, + can_use_db_credentials, + db_credentials + FROM 
identity_domains_users + WHERE can_use_db_credentials = 'True' + AND db_credentials IS NOT NULL + filter : "age db_credentials older 90" + + - description: "[CIS 3.0.0]:1.12 Ensure API keys are not created for tenancy administrator users (Automated)" + sql: > + SELECT DISTINCT + domain_name, + display_name, + can_use_api_keys, + api_keys, + groups + FROM identity_domains_users + WHERE api_keys IS NOT NULL + AND can_use_api_keys = True + AND domain_name = 'Default' + AND groups LIKE '%Administrators%' + + - description: "[CIS 3.0.0]:1.13 Ensure all OCI IAM user accounts have a valid and current email address (Manual) ⚠️ Assuming account_recovery_required is true when email is not verified." + sql: > + SELECT DISTINCT + domain_name, + display_name, + external_id, + active, + status, + account_recovery_required + FROM identity_domains_users + WHERE account_recovery_required is true + AND active is true + AND external_id is null + + - description: "[CIS 3.0.0]:1.14 Ensure Instance Principal authentication is used for OCI instances, OCI Cloud Databases and OCI Functions to access OCI resources (Manual)" + sql: > + SELECT DISTINCT + c.path AS compartment_path, + p.policy_name, + p.statement + FROM identity_policy p + JOIN identity_compartments c ON p.compartment_id = c.id + WHERE LOWER(p.statement) LIKE '%request.principal%' + + - description: "[CIS 3.0.0]:1.15 Ensure storage service-level admins cannot delete resources they manage (Manual)" + sql: > + WITH storage_policies AS ( + SELECT DISTINCT + tenant_name, + policy_name, + statement, + LOWER(statement) as statement_lower, + CASE + WHEN LOWER(statement) LIKE '%where%' THEN + REPLACE(REPLACE(LOWER(SPLIT_PART(statement, 'WHERE', 2)), ' ', ''), '''', '') + ELSE '' + END as clean_where_clause + FROM identity_policy + WHERE LOWER(policy_name) NOT IN ('tenant admin policy', 'psm-root-policy') + AND LOWER(statement) LIKE '%allow group%' + AND LOWER(statement) LIKE '%to manage%' + AND ( + LOWER(statement) LIKE '%object-family%' OR + LOWER(statement) LIKE '%file-family%' OR + LOWER(statement) LIKE '%volume-family%' OR + LOWER(statement) LIKE '%buckets%' OR + LOWER(statement) LIKE '%objects%' OR + LOWER(statement) LIKE '%file-systems%' OR + LOWER(statement) LIKE '%volumes%' OR + LOWER(statement) LIKE '%mount-targets%' OR + LOWER(statement) LIKE '%volume-backups%' OR + LOWER(statement) LIKE '%boot-volume-backups%' + ) + ), + non_compliant_policies AS ( + SELECT * + FROM storage_policies + WHERE + -- Exclude storage admin policies (they are allowed to have = permissions) + NOT (clean_where_clause LIKE '%request.permission=bucket_delete%' OR + clean_where_clause LIKE '%request.permission=object_delete%' OR + clean_where_clause LIKE '%request.permission=file_system_delete%' OR + clean_where_clause LIKE '%request.permission=mount_target_delete%' OR + clean_where_clause LIKE '%request.permission=export_set_delete%' OR + clean_where_clause LIKE '%request.permission=volume_delete%' OR + clean_where_clause LIKE '%request.permission=volume_backup_delete%' OR + clean_where_clause LIKE '%request.permission=boot_volume_backup_delete%') + AND ( + -- No WHERE clause (unrestricted access) + (clean_where_clause = '') OR + -- WHERE clause exists but doesn't properly restrict delete permissions based on resource type + (clean_where_clause != '' AND NOT ( + -- Object-family restrictions + (statement_lower LIKE '%object-family%' AND + clean_where_clause LIKE '%request.permission!=bucket_delete%' AND + clean_where_clause LIKE '%request.permission!=object_delete%') OR + -- 
File-family restrictions + (statement_lower LIKE '%file-family%' AND + clean_where_clause LIKE '%request.permission!=export_set_delete%' AND + clean_where_clause LIKE '%request.permission!=mount_target_delete%' AND + clean_where_clause LIKE '%request.permission!=file_system_delete%' AND + clean_where_clause LIKE '%request.permission!=file_system_delete_snapshot%') OR + -- Volume-family restrictions + (statement_lower LIKE '%volume-family%' AND + clean_where_clause LIKE '%request.permission!=volume_backup_delete%' AND + clean_where_clause LIKE '%request.permission!=volume_delete%' AND + clean_where_clause LIKE '%request.permission!=boot_volume_backup_delete%') OR + -- Individual resource restrictions + (statement_lower LIKE '%buckets%' AND clean_where_clause LIKE '%request.permission!=bucket_delete%') OR + (statement_lower LIKE '%objects%' AND clean_where_clause LIKE '%request.permission!=object_delete%') OR + (statement_lower LIKE '%file-systems%' AND + clean_where_clause LIKE '%request.permission!=file_system_delete%' AND + clean_where_clause LIKE '%request.permission!=file_system_delete_snapshot%') OR + (statement_lower LIKE '%mount-targets%' AND clean_where_clause LIKE '%request.permission!=mount_target_delete%') OR + (statement_lower LIKE '%volumes%' AND clean_where_clause LIKE '%request.permission!=volume_delete%') OR + (statement_lower LIKE '%volume-backups%' AND clean_where_clause LIKE '%request.permission!=volume_backup_delete%') OR + (statement_lower LIKE '%boot-volume-backups%' AND clean_where_clause LIKE '%request.permission!=boot_volume_backup_delete%') + )) + ) + ) + SELECT + tenant_name, + policy_name, + statement, + FROM non_compliant_policies + ORDER BY tenant_name, policy_name + + - description: "[CIS 3.0.0]:1.16 Ensure OCI IAM credentials unused for 45 days or more are disabled (Automated)" + sql: > + SELECT DISTINCT + domain_name, + user_name, + password_last_successful_login_date + FROM identity_domains_users + filter : "age password_last_successful_login_date older 45" + + - description: "[CIS 3.0.0]:1.17 Ensure there is only one active API Key for any single OCI IAM user (Automated)" + sql: > + SELECT DISTINCT + domain_name, + display_name, + can_use_api_keys, + api_keys + FROM identity_domains_users + WHERE can_use_api_keys = 'True' + AND api_keys IS NOT NULL + AND CONTAINS(api_keys, ',') diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/FORENSIC_Audit.yaml b/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/FORENSIC_Audit.yaml new file mode 100644 index 000000000..f6212cad3 --- /dev/null +++ b/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/FORENSIC_Audit.yaml @@ -0,0 +1,10 @@ +# Replace the table names for audit logs and cloudguard events +queries: + - description: "[FORENSIC]: Fetch distict set of eventtypes from the fetched audit logs window." 
+    sql: "SELECT DISTINCT event_type, source, data_event_name, data_compartment_name, data_identity_principal_name FROM audit_events_15042025_10"
+
+  - description: "[FORENSIC] Get all event types (and related fields) ordered by principal_name for IdentityControlPlane"
+    sql: "SELECT data_identity_principal_name, data_identity_ip_address, event_type, source, data_compartment_name, data_event_name FROM audit_events_15042025_10 where source = 'IdentityControlPlane' GROUP BY data_identity_principal_name, data_identity_ip_address, event_type, source, data_compartment_name, data_event_name ORDER BY data_identity_principal_name"
+
+  - description: "[FORENSIC] Get all event types (and related fields) ordered by principal_name for console sign-in (IdentitySignOn)"
+    sql: "SELECT data_identity_principal_name, data_identity_ip_address, event_type, source, data_compartment_name, data_event_name FROM audit_events_15042025_10 where source = 'IdentitySignOn' GROUP BY data_identity_principal_name, data_identity_ip_address, event_type, source, data_compartment_name, data_event_name ORDER BY data_identity_principal_name"
\ No newline at end of file
diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/FORENSIC_Cloudguard.yaml b/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/FORENSIC_Cloudguard.yaml
new file mode 100644
index 000000000..d7e449a3b
--- /dev/null
+++ b/security/security-design/shared-assets/oci-security-health-check-forensics/query_files/FORENSIC_Cloudguard.yaml
@@ -0,0 +1,4 @@
+# Replace the table names for audit logs and cloudguard events
+queries:
+  - description: "[FORENSIC] Get all the Cloud Guard problems sorted by resource_name"
+    sql: "select resource_name, detector_rule_id, risk_level, labels, time_first_detected, time_last_detected, lifecycle_state, lifecycle_detail, detector_id from cloudguard_problems_10052025_12 ORDER BY resource_name"
\ No newline at end of file
diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/requirements.txt b/security/security-design/shared-assets/oci-security-health-check-forensics/requirements.txt
new file mode 100644
index 000000000..e8e3ed687
--- /dev/null
+++ b/security/security-design/shared-assets/oci-security-health-check-forensics/requirements.txt
@@ -0,0 +1,8 @@
+pandas
+pyyaml
+duckdb
+oci
+questionary
+tqdm
+gitpython
+requests
\ No newline at end of file
diff --git a/security/security-design/shared-assets/oci-security-health-check-forensics/showoci_zip/showoci.zip b/security/security-design/shared-assets/oci-security-health-check-forensics/showoci_zip/showoci.zip
new file mode 100644
index 000000000..af15e5a18
Binary files /dev/null and b/security/security-design/shared-assets/oci-security-health-check-forensics/showoci_zip/showoci.zip differ