feat(python): add pylock.toml support #10137
DmitriyLewen wants to merge 11 commits into aquasecurity:main
Add support for pylock.toml (PEP 751) lock file format.
- Add PyLock LangType and TypePyLock analyzer type
- Add pylock analyzer for fs and repo modes
- Update parser to accept context parameter
- Update purl package to map PyLock to PyPi
- Add integration test and documentation
type pylockAnalyzer struct{}

func (a pylockAnalyzer) Analyze(ctx context.Context, input analyzer.AnalysisInput) (*analyzer.AnalysisResult, error) {
PEP 751 states that pylock.toml should be placed alongside pyproject.toml:
The lock file(s) SHOULD be located in the directory as appropriate for the scope of the lock file. Locking against a single pyproject.toml, for instance, would place the pylock.toml in the same directory.
Don't we need to parse pyproject.toml (e.g., [project.dependencies]) to determine direct vs indirect dependencies, similar to how the poetry analyzer does it? If so, this would need to be a PostAnalyzer instead of a simple Analyzer.
I initially added the analyzer only for pylock.toml, but it makes sense to support pyproject.toml as well.
Added in 2ad6402
}

func (a pylockAnalyzer) Required(filePath string, _ os.FileInfo) bool {
	return filepath.Base(filePath) == types.PyLockFile
PEP 751 defines that a lock file must be named pylock.toml or match the regex r"^pylock\.([^.]+)\.toml$" (e.g., pylock.linux.toml, pylock.prod.toml). Do we need to handle the pylock.<IDENTIFIER>.toml pattern as well?
Nice catch!
Added in d6ce02c.
I don't think it makes sense to use the regex from PEP 751.
I only added a check for the prefix and suffix.
I believe this should be sufficient for our case.
PEP 751 allows lock files to be named pylock.toml or pylock.<identifier>.toml (e.g., pylock.linux.toml, pylock.prod.toml).
Prepare for pyproject.toml integration to identify direct/indirect dependencies, similar to how poetry analyzer works.
Parse pyproject.toml alongside pylock.toml to identify direct vs indirect dependencies using [project.dependencies] (PEP 621).
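As an illustration of the direct vs indirect split described above, here is a minimal sketch. It assumes the `[project.dependencies]` entries have already been read from pyproject.toml as PEP 508 requirement strings; the `pkg` type and the `normalizeName`, `requirementName`, and `markIndirect` helpers are hypothetical names for this example and are not taken from the Trivy code base.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// separators matches runs of "-", "_" and "." (PEP 503 name normalization).
var separators = regexp.MustCompile(`[-_.]+`)

// normalizeName lowercases a project name and collapses separator runs to "-".
func normalizeName(name string) string {
	return strings.ToLower(separators.ReplaceAllString(name, "-"))
}

// requirementName extracts the normalized project name from a PEP 508
// requirement string by cutting at the first character that cannot be
// part of a name (extras, version specifiers, markers, URLs).
func requirementName(req string) string {
	req = strings.TrimSpace(req)
	if end := strings.IndexAny(req, " \t[(<>=!~;@"); end >= 0 {
		req = req[:end]
	}
	return normalizeName(req)
}

// pkg is a hypothetical stand-in for a package parsed from pylock.toml.
type pkg struct {
	Name     string
	Indirect bool
}

// markIndirect flags every package that is not listed in projectDeps.
func markIndirect(pkgs []pkg, projectDeps []string) {
	direct := make(map[string]struct{}, len(projectDeps))
	for _, d := range projectDeps {
		direct[requirementName(d)] = struct{}{}
	}
	for i := range pkgs {
		_, ok := direct[normalizeName(pkgs[i].Name)]
		pkgs[i].Indirect = !ok
	}
}

func main() {
	pkgs := []pkg{{Name: "requests"}, {Name: "urllib3"}}
	markIndirect(pkgs, []string{"Requests[socks]>=2.31; python_version >= '3.9'"})
	for _, p := range pkgs {
		fmt.Printf("%s indirect=%t\n", p.Name, p.Indirect)
	}
}
```

Normalizing both sides matters because pyproject.toml and the lock file may spell the same project differently (for example `Typing_Extensions` and `typing-extensions`).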
func (a pylockAnalyzer) matchLockFile(filePath string) bool {
	// Match pylock.toml or pylock.<identifier>.toml (PEP 751)
	base := filepath.Base(filePath)
	return strings.HasPrefix(base, "pylock.") && strings.HasSuffix(base, ".toml")
PEP 751 defines the regex r"^pylock\.([^.]+)\.toml$" for named lock files, where the identifier must NOT contain dots ([^.]+). The current HasPrefix/HasSuffix approach is more permissive and would also match files like pylock.linux.arm64.toml, which is technically not valid per the spec. Is this intentional, or should we use a stricter regex match?
Yes, this is an intentional trade-off. I prefer to avoid regex unless absolutely necessary, especially in a hot path like this (checking every scanned file).
I think we can afford to be lenient here. The likelihood of encountering a file with a dot in the identifier (e.g., pylock.linux.arm64.toml) is extremely low. Even if we do find such a file:
- We scan it successfully. The user likely ignored the PEP 751 naming convention, but I don't think it's Trivy's job to enforce that.
- If the file is invalid for some reason, we will simply show a debug log.
At the very least, we should state that detecting files that violate PEP 751 is the intended behavior. Otherwise, it just looks like a bug. Even looking at the tests, they simply expect true, which also makes it look like dots are allowed in PEP 751.
{
name: "named lock file with dots in identifier",
filePath: "pylock.linux.arm64.toml",
want: true,
},
I don't know what I was thinking yesterday 😄.
I realized I can simply check that there is no dot in the identifier.
Updated the logic in 57006f5
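For readers following the thread, a regex-free check along these lines could look like the sketch below. It is not the code from 57006f5; the free function `matchLockFile` is written for this example only, and it assumes Go 1.20 or later for `strings.CutPrefix` and `strings.CutSuffix`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// matchLockFile reports whether the file name is pylock.toml or
// pylock.<identifier>.toml, where the identifier is non-empty and
// contains no dots (the constraint expressed by [^.]+ in PEP 751).
func matchLockFile(filePath string) bool {
	base := filepath.Base(filePath)
	if base == "pylock.toml" {
		return true
	}
	id, ok := strings.CutPrefix(base, "pylock.")
	if !ok {
		return false
	}
	id, ok = strings.CutSuffix(id, ".toml")
	if !ok {
		return false
	}
	return id != "" && !strings.Contains(id, ".")
}

func main() {
	for _, name := range []string{
		"pylock.toml",
		"pylock.linux.toml",
		"pylock.linux.arm64.toml",
		"pylock..toml",
	} {
		fmt.Printf("%s -> %t\n", name, matchLockFile(name))
	}
}
```

The explicit `pylock.toml` comparison comes first because the prefix `pylock.` and the suffix `.toml` overlap in that name, so cutting both would otherwise fail.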
		return xerrors.Errorf("unable to parse %s: %w", path, err)
	}

	directDeps := p.MainDeps()
Should we also mark dev dependencies (i.e., set Dev: true) here? The poetry analyzer identifies production dependencies by traversing the dependency graph from MainDeps(), and marks packages not reachable from that graph as Dev: true. A similar approach could work here.
Traverse the dependency graph from production dependencies defined in pyproject.toml [project.dependencies] to identify dev packages. Packages not reachable from production roots are marked with Dev: true.
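The reachability idea can be sketched as a breadth-first walk. This assumes the lock file has already been turned into an adjacency map from package ID to dependency IDs; `devPackages` and the map shape are hypothetical for this example and do not mirror Trivy's internal types.

```go
package main

import (
	"fmt"
	"sort"
)

// devPackages walks the graph breadth-first from the production roots and
// returns every package in the graph that was never reached.
func devPackages(graph map[string][]string, roots []string) map[string]bool {
	reachable := make(map[string]bool)
	queue := append([]string(nil), roots...)
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		if reachable[id] {
			continue // already visited; this also guards against cycles
		}
		reachable[id] = true
		queue = append(queue, graph[id]...)
	}

	dev := make(map[string]bool)
	for id, deps := range graph {
		if !reachable[id] {
			dev[id] = true
		}
		for _, d := range deps {
			if !reachable[d] {
				dev[d] = true
			}
		}
	}
	return dev
}

func main() {
	graph := map[string][]string{
		"requests": {"urllib3"},
		"urllib3":  nil,
		"pytest":   {"pluggy"},
		"pluggy":   nil,
	}
	dev := devPackages(graph, []string{"requests"})
	names := make([]string, 0, len(dev))
	for id := range dev {
		names = append(names, id)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

A package shared by a production and a dev dependency stays non-dev here, because it is reachable from a production root.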
Use strict regex matching per PEP 751 specification: ^pylock\.([^.]+)\.toml$ - identifier cannot contain dots. Also update documentation with pylock features.
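A minimal sketch of the strict match, assuming the pattern is compiled once at package level so the per-file cost is only the match itself. The names `namedLockFile` and `isPyLockFile` are made up for this example and may differ from the actual commit.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// namedLockFile is the PEP 751 pattern for named lock files.
// The identifier group [^.]+ rejects empty identifiers and dots.
var namedLockFile = regexp.MustCompile(`^pylock\.([^.]+)\.toml$`)

// isPyLockFile accepts the default name pylock.toml as well as
// pylock.<identifier>.toml names that satisfy the PEP 751 pattern.
func isPyLockFile(filePath string) bool {
	base := filepath.Base(filePath)
	return base == "pylock.toml" || namedLockFile.MatchString(base)
}

func main() {
	for _, name := range []string{
		"pylock.toml",
		"pylock.prod.toml",
		"pylock.linux.arm64.toml",
	} {
		fmt.Printf("%s -> %t\n", name, isPyLockFile(name))
	}
}
```

The default name needs its own comparison because the pattern alone requires an identifier and therefore does not match `pylock.toml`.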
cbd1b61 to 57006f5
Description
Add support for pylock.toml (PEP 751) lock file format.
This PR implements:
- pylock analyzer that works in fs and repo modes
- PyLock LangType and TypePyLock analyzer type constants
Changes
- pkg/fanal/types/const.go: Add PyLock LangType and rename filename constant to PyLockFile
- pkg/fanal/analyzer/const.go: Add TypePyLock analyzer type and update slices
- pkg/fanal/analyzer/language/python/pylock/: New pylock analyzer package
- pkg/dependency/parser/python/pylock/parse.go: Update parser to accept context parameter
- pkg/purl/purl.go: Add PyLock to PyPi type mapping
- integration/: Add integration test for pylock
- docs/guide/coverage/language/python.md: Add pylock documentation
Example
Related issues
- pylock.toml (PEP 751) #9410
Related PRs
Checklist