Refactor file detection and scanning logic to fix commit file handling #101

Merged 2 commits on Jul 23, 2025
4 changes: 3 additions & 1 deletion .gitignore
@@ -24,4 +24,6 @@ file_generator.py
.env.local
Pipfile
test/
-logs
+logs
+ai_testing/
+verify_find_files_lazy_loading.py
91 changes: 80 additions & 11 deletions README.md
@@ -7,10 +7,10 @@ The Socket Security CLI was created to enable integrations with other tools like
```` shell
socketcli [-h] [--api-token API_TOKEN] [--repo REPO] [--integration {api,github,gitlab}] [--owner OWNER] [--branch BRANCH]
[--committers [COMMITTERS ...]] [--pr-number PR_NUMBER] [--commit-message COMMIT_MESSAGE] [--commit-sha COMMIT_SHA]
-[--target-path TARGET_PATH] [--sbom-file SBOM_FILE] [--files FILES] [--default-branch] [--pending-head]
-[--generate-license] [--enable-debug] [--enable-json] [--enable-sarif] [--disable-overview] [--disable-security-issue]
-[--allow-unverified] [--ignore-commit-files] [--disable-blocking] [--scm SCM] [--timeout TIMEOUT]
-[--exclude-license-details]
+[--target-path TARGET_PATH] [--sbom-file SBOM_FILE] [--files FILES] [--save-submitted-files-list SAVE_SUBMITTED_FILES_LIST]
+[--default-branch] [--pending-head] [--generate-license] [--enable-debug] [--enable-json] [--enable-sarif]
+[--disable-overview] [--disable-security-issue] [--allow-unverified] [--ignore-commit-files] [--disable-blocking]
+[--scm SCM] [--timeout TIMEOUT] [--exclude-license-details]
````

If you don't want to provide the Socket API Token every time then you can use the environment variable `SOCKET_SECURITY_API_KEY`
@@ -40,13 +40,15 @@ If you don't want to provide the Socket API Token every time then you can use th
| --commit-sha | False | "" | Commit SHA |

#### Path and File
-| Parameter | Required | Default | Description |
-|:----------------------|:---------|:----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| --target-path | False | ./ | Target path for analysis |
-| --sbom-file | False | | SBOM file path |
-| --files | False | [] | Files to analyze (JSON array string) |
-| --excluded-ecosystems | False | [] | List of ecosystems to exclude from analysis (JSON array string). You can get supported files from the [Supported Files API](https://docs.socket.dev/reference/getsupportedfiles) |
-| --license-file-name | False | `license_output.json` | Name of the file to save the license details to if enabled |
+| Parameter | Required | Default | Description |
+|:----------------------------|:---------|:----------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| --target-path | False | ./ | Target path for analysis |
+| --sbom-file | False | | SBOM file path |
+| --files | False | [] | Files to analyze (JSON array string) |
+| --excluded-ecosystems | False | [] | List of ecosystems to exclude from analysis (JSON array string). You can get supported files from the [Supported Files API](https://docs.socket.dev/reference/getsupportedfiles) |
+| --license-file-name | False | `license_output.json` | Name of the file to save the license details to if enabled |
+| --save-submitted-files-list | False | | Save list of submitted file names to JSON file for debugging purposes |
+| --save-manifest-tar | False | | Save all manifest files to a compressed tar.gz archive with original directory structure |

#### Branch and Scan Configuration
| Parameter | Required | Default | Description |
@@ -133,6 +135,73 @@ The CLI determines which files to scan based on the following logic:
- **Using `--files`**: If you specify `--files '["package.json"]'`, the CLI will check if this file exists and is a manifest file before triggering a scan.
- **Using `--ignore-commit-files`**: This forces a scan of all manifest files in the target path, regardless of what's in your commit.
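
The `--files` check described above can be sketched roughly as follows. This is an illustration only, not the CLI's actual code: the `MANIFEST_NAMES` set and the `has_manifest_to_scan` helper are hypothetical names, and the short list of manifest names is an assumption (the Supported Files API linked in the parameter table lists what Socket really supports).

```python
import json
import os

# Hypothetical, abbreviated set of manifest file names used for illustration.
MANIFEST_NAMES = {"package.json", "requirements.txt", "Pipfile", "pyproject.toml"}


def has_manifest_to_scan(files_arg: str, target_path: str) -> bool:
    """Return True if any path in the --files JSON array exists under
    target_path and has a recognized manifest file name."""
    for rel_path in json.loads(files_arg):
        full_path = os.path.join(target_path, rel_path)
        is_manifest = os.path.basename(rel_path) in MANIFEST_NAMES
        if os.path.isfile(full_path) and is_manifest:
            return True
    return False
```

With this sketch, `has_manifest_to_scan('["package.json"]', "./")` is true only when `./package.json` exists on disk; a listed file that is missing, or that is not a manifest, does not trigger a scan.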

## Debugging and Troubleshooting

### Saving Submitted Files List

The CLI provides a debugging option to save the list of files that were submitted for scanning:

```bash
socketcli --save-submitted-files-list submitted_files.json
```

This will create a JSON file containing:
- Timestamp of when the scan was performed
- Total number of files submitted
- Total size of all files (in bytes and human-readable format)
- Complete list of file paths that were found and submitted for scanning

Example output file:
```json
{
  "timestamp": "2025-01-22 10:30:45 UTC",
  "total_files": 3,
  "total_size_bytes": 2048,
  "total_size_human": "2.00 KB",
  "files": [
    "./package.json",
    "./requirements.txt",
    "./Pipfile"
  ]
}
```

This feature is useful for:
- **Debugging**: Understanding which files the CLI found and submitted
- **Verification**: Confirming that expected manifest files are being detected
- **Size Analysis**: Understanding the total size of manifest files being uploaded
- **Troubleshooting**: Identifying why certain files might not be included in scans or if size limits are being hit

> **Note**: This option works with both differential scans (when git commits are detected) and full scans (API mode).
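
For readers who want to reproduce a report of this shape outside the CLI, here is a minimal sketch. The helper names (`human_size`, `build_submitted_files_report`, `save_submitted_files_report`) are hypothetical and the 1024-based size units are an assumption inferred from the example above (2048 bytes shown as `2.00 KB`); the CLI's own implementation may differ.

```python
import json
import os
from datetime import datetime, timezone


def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, e.g. 2048 -> '2.00 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.2f} {unit}"
        size /= 1024
    return f"{size:.2f} GB"


def build_submitted_files_report(files: list[str]) -> dict:
    """Build a dict with the same keys as the example output file."""
    # Paths that are not regular files contribute nothing to the size total.
    total = sum(os.path.getsize(f) for f in files if os.path.isfile(f))
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC"),
        "total_files": len(files),
        "total_size_bytes": total,
        "total_size_human": human_size(total),
        "files": files,
    }


def save_submitted_files_report(files: list[str], output_path: str) -> None:
    """Write the report as indented JSON to output_path."""
    with open(output_path, "w", encoding="utf-8") as fh:
        json.dump(build_submitted_files_report(files), fh, indent=2)
```

Comparing `total_size_bytes` from such a report against any upload limit you are concerned about is a quick way to check whether size is the reason a scan fails.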

### Saving Manifest Files Archive

For backup, sharing, or analysis purposes, you can save all manifest files to a compressed tar.gz archive:

```bash
socketcli --save-manifest-tar manifest_files.tar.gz
```

This will create a compressed archive containing all the manifest files that were found and submitted for scanning, preserving their original directory structure relative to the scanned directory.

Example usage with other options:
```bash
# Save both files list and archive
socketcli --save-submitted-files-list files.json --save-manifest-tar backup.tar.gz

# Use with specific target path
socketcli --target-path ./my-project --save-manifest-tar my-project-manifests.tar.gz
```

The manifest archive feature is useful for:
- **Backup**: Creating portable backups of all dependency manifest files
- **Sharing**: Sending the exact files being analyzed to colleagues or support
- **Analysis**: Examining the dependency files offline or with other tools
- **Debugging**: Verifying file discovery and content issues
- **Compliance**: Maintaining records of scanned dependency files

> **Note**: The tar.gz archive preserves the original directory structure, making it easy to extract and examine the files in their proper context.
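
The directory-preserving behavior can be illustrated with Python's standard `tarfile` module. This is a sketch under stated assumptions, not the CLI's implementation: the `save_manifest_tar` function name, its `base_dir` parameter, and the choice to skip paths that are not regular files are all hypothetical.

```python
import os
import tarfile


def save_manifest_tar(files: list[str], output_path: str, base_dir: str = ".") -> int:
    """Write files into a gzip-compressed tar archive at output_path.

    Each member is stored under its path relative to base_dir, so the
    archive mirrors the scanned directory layout. Returns the number of
    files added; paths that are not regular files are skipped.
    """
    added = 0
    with tarfile.open(output_path, "w:gz") as tar:
        for path in files:
            if not os.path.isfile(path):
                continue
            # arcname controls the name stored inside the archive.
            tar.add(path, arcname=os.path.relpath(path, base_dir))
            added += 1
    return added
```

Extracting the result with `tar -xzf manifest_files.tar.gz -C some-empty-dir` then recreates the manifests at the same relative paths they had under the scanned directory.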

## Development

This project uses `pyproject.toml` as the primary dependency specification.
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "hatchling.build"

[project]
name = "socketsecurity"
-version = "2.1.21"
+version = "2.1.23"
requires-python = ">= 3.10"
license = {"file" = "LICENSE"}
dependencies = [
2 changes: 1 addition & 1 deletion socketsecurity/__init__.py
@@ -1,2 +1,2 @@
__author__ = 'socket.dev'
-__version__ = '2.1.21'
+__version__ = '2.1.23'
16 changes: 16 additions & 0 deletions socketsecurity/config.py
@@ -57,6 +57,8 @@ class CliConfig:
     jira_plugin: PluginConfig = field(default_factory=PluginConfig)
     slack_plugin: PluginConfig = field(default_factory=PluginConfig)
     license_file_name: str = "license_output.json"
+    save_submitted_files_list: Optional[str] = None
+    save_manifest_tar: Optional[str] = None

     @classmethod
     def from_args(cls, args_list: Optional[List[str]] = None) -> 'CliConfig':
@@ -101,6 +103,8 @@ def from_args(cls, args_list: Optional[List[str]] = None) -> 'CliConfig':
             'repo_is_public': args.repo_is_public,
             "excluded_ecosystems": args.excluded_ecosystems,
             'license_file_name': args.license_file_name,
+            'save_submitted_files_list': args.save_submitted_files_list,
+            'save_manifest_tar': args.save_manifest_tar,
             'version': __version__
         }
         try:
@@ -262,6 +266,18 @@ def create_argument_parser() -> argparse.ArgumentParser:
         metavar="<string>",
         help="SBOM file path"
     )
+    path_group.add_argument(
+        "--save-submitted-files-list",
+        dest="save_submitted_files_list",
+        metavar="<path>",
+        help="Save list of submitted file names to JSON file for debugging purposes"
+    )
+    path_group.add_argument(
+        "--save-manifest-tar",
+        dest="save_manifest_tar",
+        metavar="<path>",
+        help="Save all manifest files to a compressed tar.gz archive with original directory structure"
+    )
     path_group.add_argument(
         "--files",
         metavar="<json>",
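
To see how the two new options parse in isolation, here is a minimal sketch. `build_parser` is a hypothetical stand-in for `create_argument_parser()`, and the group title is an assumption; only the option names, `dest`, `metavar`, and help strings are taken from the hunk above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for create_argument_parser(); it registers only
    # the two options added in this PR.
    parser = argparse.ArgumentParser(prog="socketcli")
    path_group = parser.add_argument_group("Path and File")
    path_group.add_argument(
        "--save-submitted-files-list",
        dest="save_submitted_files_list",
        metavar="<path>",
        help="Save list of submitted file names to JSON file for debugging purposes",
    )
    path_group.add_argument(
        "--save-manifest-tar",
        dest="save_manifest_tar",
        metavar="<path>",
        help="Save all manifest files to a compressed tar.gz archive with original directory structure",
    )
    return parser


# Only one of the two options is supplied here.
args = build_parser().parse_args(["--save-manifest-tar", "manifests.tar.gz"])
```

Because neither `add_argument` call sets a `default`, an omitted option parses to `None`, which lines up with the `Optional[str] = None` fields added to `CliConfig`.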