Commit 8efbc88

1.4 - State tracking for Phylum incomplete package data, fixed yarn parsing, cleanup (#5)
* Adding state tracking for incomplete packages
* fix if clause
* fix input variables
* fix input variables
* fix path resolve()
* enable tmate
* fix paths; disable tmate
* fix output declaration
* update success and complete_succcess files with CORRECT files
* updated testing files
* enable tmate
* update testing files with old and new reqs approach
* fix string issue in .replace() for incompletes
* disable tmate
* Refactor support for yarn lockfile parsing
  Added parse_yarn module to support identification and parsing of yarn v1 and v2 lockfiles returning a list of tuples (pkg,ver)
* remove IPython import
* break out functions for lockfile submission and changes submission
* fix return stmt to parse_yarn module
* fix error message when looking for PREVIOUS_INCOMPLETE env var
* add debug for parse_yarn
* enable tmate
* update to fix single package upgrade bug
* disable tmate
* clean up
* update comment message to fix #5 (comment)
* re-enable exit condition when environment variables cannot be identified: #5 (comment)
* update comment text to generalize references to requirements.txt
1 parent b37490e commit 8efbc88
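
The commit message describes a `parse_yarn` module that handles yarn v1 and v2 lockfiles and returns `(pkg, ver)` tuples, but the module's source is not shown on this page. The following is only a rough sketch of how such parsing could work, not the module itself. The function name `parse_yarn_entries` and both regular expressions are hypothetical, and the sketch assumes it receives raw lockfile lines with their indentation intact: v1 entries of the form `version "x.y.z"` and v2 entries of the form `version: x.y.z`.

```python
import re

# Hypothetical patterns, not taken from the real parse_yarn module.
# Entry header: optional quote, package name (optionally scoped), then "@<range>".
HEADER_PAT = re.compile(r'^"?(@?[^@"\s]+)@')
# Version line: v1 uses `version "1.2.3"`, v2 uses `version: 1.2.3`.
VERSION_PAT = re.compile(r'^\s+version:?\s+"?([^"\s]+)"?\s*$')


def parse_yarn_entries(lines):
    """Return a list of (package_name, version) tuples from lockfile lines."""
    pkg_ver = []
    current = None  # name from the most recent entry header, if any
    for raw in lines:
        line = raw.rstrip("\n")
        if line and not line[0].isspace() and not line.startswith("#"):
            # Unindented, non-comment line: a new entry header (or metadata).
            match = HEADER_PAT.match(line)
            current = match.group(1) if match else None
        elif current is not None:
            match = VERSION_PAT.match(line)
            if match:
                pkg_ver.append((current, match.group(1)))
                current = None  # take only the first version line per entry
    return pkg_ver


v1_lines = [
    '"@babel/core@^7.0.0":',
    '  version "7.12.3"',
    'lodash@^4.17.15, lodash@^4.17.20:',
    '  version "4.17.20"',
]
v2_lines = [
    '__metadata:',
    '  version: 6',
    '"lodash@npm:^4.17.20":',
    '  version: 4.17.20',
]
print(parse_yarn_entries(v1_lines))
print(parse_yarn_entries(v2_lines))
```

Tracking the current header rather than matching four consecutive lines (as the removed `parse_yarn_lock` did) is one way to tolerate the differing field order between the two formats; whether the real module takes this approach is not known from the diff.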

10 files changed, +2205 -75 lines changed

action.yml

Lines changed: 51 additions & 9 deletions
@@ -29,6 +29,14 @@ inputs:
     description: "Phylum version"
     required: false
     default: '0'
+  incomplete_package_strategy:
+    description: "Method for resolving incomplete packages"
+    required: false
+    default: "pass_with_comment"
+  invoke_test_matrix:
+    description: "Only used for testing"
+    required: false
+    default: false


 runs:
@@ -40,6 +48,20 @@ runs:
         phylum_token: ${{ inputs.phylum_token }}
         phylum_version: ${{ inputs.phylum_version }}

+    - name: Check for previous comment
+      uses: peter-evans/find-comment@v1
+      id: fc
+      with:
+        issue-number: ${{ github.event.pull_request.number }}
+        body-includes: INCOMPLETE
+
+    - name: Store result of id=fc in environment
+      shell: bash
+      if: "contains(steps.fc.outputs.comment-body, 'Phylum')"
+      run: |
+        echo "storing PREVIOUS_INCOMPLETE"
+        echo PREVIOUS_INCOMPLETE=1 >> $GITHUB_ENV
+
     - name: Check for existing project
       shell: bash
       run: |
@@ -100,11 +122,23 @@ runs:
         popd


+    # - name: tmate
+    # uses: mxschmitt/action-tmate@v3
+
+    - name: invoke test matrix
+      shell: bash
+      if: "contains(inputs.invoke_test_matrix, 'true')"
+      run: |
+        python $GITHUB_ACTION_PATH/test_matrix.py
+
     - name: python script analyze.py
       shell: bash
       if: "!contains(steps.get-prtype.outputs.prtype, 'NA')"
       run: python $GITHUB_ACTION_PATH/analyze.py "analyze" $GITHUB_REPOSITORY ${{ github.event.number }} ${{ inputs.vul_threshold }} ${{ inputs.mal_threshold }} ${{ inputs.eng_threshold }} ${{ inputs.lic_threshold }} ${{ inputs.aut_threshold }}

+    # - name: tmate
+    # uses: mxschmitt/action-tmate@v3
+
     - id: get-returncode
       shell: bash
       run: |
@@ -114,13 +148,7 @@ runs:
         ret="${ret//$'\r'/'%0A'}"
         echo "::set-output name=ret::$ret"

-    - name: return 5 for incomplete packages
-      shell: bash
-      if: "contains(steps.get-returncode.outputs.ret, '5')"
-      run: |
-        echo 'exiting with 5 for incomplete packages'
-        exit 5
-
+    # This will catch SUCCESS cases
     - name: return 0 for success
       shell: bash
       if: "contains(steps.get-returncode.outputs.ret, '0')"
@@ -129,7 +157,9 @@ runs:
         exit 0

     - id: get-comment-body
-      if: "contains(steps.get-returncode.outputs.ret, '1')"
+      # this will have to check for 1 or 5 AND if on the second run
+      # if: "contains(steps.get-returncode.outputs.ret, '1')"
+      if: "steps.get-returncode.outputs.ret > 0"
       shell: bash
       run: |
         body="$(cat ~/pr_comment.txt)"
@@ -139,12 +169,24 @@ runs:
         echo "::set-output name=body::$body"

     - name: Set comment
-      if: "contains(steps.get-returncode.outputs.ret, '1')"
+      # This will have to check for 1 or 5
+      # Could check for > 0 ?
+      #if: "contains(steps.get-returncode.outputs.ret, '1')"
+      if: "steps.get-returncode.outputs.ret > 0"
       uses: peter-evans/create-or-update-comment@v1
       with:
         issue-number: ${{ github.event.pull_request.number }}
         body: ${{ steps.get-comment-body.outputs.body }}

+    # This will catch INCOMPLETE and COMPLETE_SUCCESS
+    - name: handle ret values of 4 or 5
+      shell: bash
+      if: "steps.get-returncode.outputs.ret >= 4"
+      run: |
+        echo 'exiting with 0 for success - ret = ${{ steps.get-returncode.outputs.ret }}'
+        exit 0
+
+    # This will catch FAILURE and COMPLETE_FAILURE
     - name: return 1 for risk analysis failure
       shell: bash
       if: "contains(steps.get-returncode.outputs.ret, '1')"
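
To see which of the gated steps fire for each return code after this change, the `if:` conditions can be modelled in a few lines of Python. This is a rough sketch and not part of the action: the helper `steps_that_run` is hypothetical, `contains` is approximated by substring matching, and the numeric comparisons assume that `ret` is a plain integer string that gets coerced to a number. The model only evaluates the conditions; it does not model a failing step stopping later steps.

```python
def steps_that_run(ret):
    """Return the names of the gated steps whose condition holds for `ret`."""
    code = int(ret)  # assumption: ret is a string such as "0", "1", "4" or "5"
    gates = [
        ("return 0 for success", "0" in ret),                # contains(ret, '0')
        ("get-comment-body", code > 0),                      # ret > 0
        ("Set comment", code > 0),                           # ret > 0
        ("handle ret values of 4 or 5", code >= 4),          # ret >= 4
        ("return 1 for risk analysis failure", "1" in ret),  # contains(ret, '1')
    ]
    return [name for name, enabled in gates if enabled]


for ret in ("0", "1", "4", "5"):
    print(ret, steps_that_run(ret))
```

Under these assumptions, codes 4 and 5 both post a comment and then reach the step that exits with 0, while code 1 posts a comment and reaches the failure step. Because two of the gates are substring checks, a multi-digit code such as "10" would satisfy both of them, which is worth keeping in mind if further return codes are added.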

analyze.py

Lines changed: 71 additions & 66 deletions
@@ -6,12 +6,7 @@
 from unidiff import PatchSet
 import pathlib
 from subprocess import run
-
-# TODO:
-# [DONE] 1. Clearly document which environment variables are used
-# [DONE] 2. Don't assume PRs are going into master branch, need to get the target
-# [DONE] 3. Add Gmefile support
-# [DONE] 4. Document file paths
+import parse_yarn

 ENV_KEYS = [
     "GITHUB_SHA", # for get_PR_diff; this is the SHA of the commit for the branch being merged
@@ -26,6 +21,35 @@
     "pr_comment": "/home/runner/pr_comment.txt",
 }

+'''
+States on returncode
+0 = No comment
+1 = FAILED_COMMENT
+5 = INCOMPLETE_COMMENT then:
+    4 = COMPLETE_SUCCESS_COMMENT
+    1 = COMPLETE_FAILED_COMMENT
+'''
+
+# Headers for distinct comment types
+DETAILS_DROPDOWN = "<details>\n<summary>Background</summary>\n<br />\nThis repository uses a GitHub Action to automatically analyze the risk of new dependencies added via Pull Request. An administrator of this repository has set score requirements for Phylum's five risk domains.<br /><br />\nIf you see this comment, one or more dependencies added to the package manager lockfile in this Pull Request have failed Phylum's risk analysis.\n</details>\n\n"
+
+INCOMPLETE_COMMENT = "## Phylum OSS Supply Chain Risk Analysis - INCOMPLETE\n\n"
+INCOMPLETE_COMMENT += "This pull request contains TKTK package versions Phylum has not yet processed, preventing a complete risk analysis. Phylum is processing these packages currently and should complete within 30 minutes. Please wait for at least 30 minutes, then re-run the GitHub Check pertaining to `phylum-analyze-pr-action`.\n\n"
+INCOMPLETE_COMMENT += DETAILS_DROPDOWN
+
+COMPLETE_FAILED_COMMENT = "## Phylum OSS Supply Chain Risk Analysis - COMPLETE\n\n"
+COMPLETE_FAILED_COMMENT += "The Phylum risk analysis is now complete.\n\n"
+COMPLETE_FAILED_COMMENT += DETAILS_DROPDOWN
+
+COMPLETE_SUCCESS_COMMENT = "## Phylum OSS Supply Chain Risk Analysis - COMPLETE\n\n"
+COMPLETE_SUCCESS_COMMENT += "The Phylum risk analysis is now complete and did not identify any issues for this PR.\n\n"
+COMPLETE_SUCCESS_COMMENT += DETAILS_DROPDOWN
+
+FAILED_COMMENT = "## Phylum OSS Supply Chain Risk Analysis\n\n"
+FAILED_COMMENT +=DETAILS_DROPDOWN
+
+
+
 class AnalyzePRForReqs():
     def __init__(self, repo, pr_num, vul, mal, eng, lic, aut):
         self.repo = repo
@@ -38,17 +62,21 @@ def __init__(self, repo, pr_num, vul, mal, eng, lic, aut):
         self.gbl_failed = False
         self.gbl_incomplete = False
         self.incomplete_pkgs = list()
+        self.previous_incomplete = False
         self.env = dict()
         self.get_env_vars()

+
     def get_env_vars(self):
         for key in ENV_KEYS:
             temp = os.environ.get(key)
             if temp is not None:
                 self.env[key] = temp
             else:
-                print(f"[ERROR] could not get value for os.environ.get({key})")
+                print(f"[ERROR] could not get value for required env variable os.environ.get({key})")
                 sys.exit(11)
+        if os.environ.get("PREVIOUS_INCOMPLETE"):
+            self.previous_incomplete = True
         return

     def new_get_PR_diff(self):
@@ -161,26 +189,15 @@ def parse_package_lock(self, changes):
                             ver = version_match.groups()[0]
                             pkg_ver.append((name,ver))
             cur +=1
+
+        print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
         return pkg_ver

     ''' Parse yarn.lock diff to generate a list of tuples of (package_name, version) '''
-    def parse_yarn_lock(self, changes):
-        cur = 0
-        name_pat = re.compile(r"[\"]?(@?.*?)(?=@)")
-        version_pat = re.compile(r".*version \"(.*?)\"")
-        resolved_pat = re.compile(r".*resolved \"(.*?)\"")
-        integrity_pat = re.compile(r".*integrity.*")
-        pkg_ver = list()

-        while cur < len(changes)-3:
-            if name_match := re.match(name_pat, changes[cur]):
-                if version_match := re.match(version_pat, changes[cur+1]):
-                    if resolved_match := re.match(resolved_pat, changes[cur+2]):
-                        if integrity_match := re.match(integrity_pat, changes[cur+3]):
-                            name = name_match.groups()[0]
-                            ver = version_match.groups()[0]
-                            pkg_ver.append((name,ver))
-            cur += 1
+    def parse_yarn_lock(self, changes):
+        pkg_ver = parse_yarn.parse_yarn_lock_changes(changes)
+        print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
         return pkg_ver

     def parse_gemfile_lock(self, changes):
@@ -194,6 +211,8 @@ def parse_gemfile_lock(self, changes):
                 ver = name_ver_match.groups()[1]
                 pkg_ver.append((name,ver))
             cur += 1
+
+        print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
         return pkg_ver

     def parse_requirements_txt(self, changes):
@@ -207,6 +226,8 @@ def parse_requirements_txt(self, changes):
                 ver = name_ver_match.groups()[1]
                 pkg_ver.append((name,ver))
             cur += 1
+
+        print(f"[DEBUG]: pkg_ver length: {len(pkg_ver)}")
         return pkg_ver


@@ -226,24 +247,7 @@ def generate_pkgver(self, changes, pr_type):
             pkg_ver_tup = self.parse_gemfile_lock(changes)
             return pkg_ver_tup

-        # no_version = 0
-        # pkg_ver = dict()
-        # pkg_ver_tup = list()
-
-        # for line in changes:
-        # if line == '\n':
-        # continue
-        # if match := re.match(pat, line):
-        # pkg,ver = match.groups()
-        # pkg_ver[pkg] = ver
-        # pkg_ver_tup.append((pkg,ver))
-        # else:
-        # no_version += 1
-
-        # if no_version > 0:
-        # print(f"[ERROR] Found entries that do not specify version, preventing analysis. Exiting")
-        # sys.exit(11)
-
+        # shouldn't get here
         return pkg_ver_tup

     ''' Read phylum_analysis.json file '''
@@ -330,17 +334,10 @@ def check_risk_scores(self, package_json):
         else:
             return None

-    #TODO: generalize this
     def build_issues_list(self, package_json, issue_flags: list):
         issues = list()
         pkg_issues = package_json.get("issues")
-        # pkg_vulns = package_json.get("vulnerabilities")

-        # if 'vul' in issue_flags:
-        # for vuln in pkg_vulns:
-        # risk_level = vuln.get("risk_level")
-        # title = vuln.get("title")
-        # issues.append(('VUL', risk_level,title))

         for flag in issue_flags:
             for pkg_issue in pkg_issues:
@@ -373,37 +370,45 @@ def run_analyze(self):
         pr_type = self.determine_pr_type(diff_data)
         changes = self.get_diff_hunks(diff_data, pr_type)
         pkg_ver = self.generate_pkgver(changes, pr_type)
-        # phylum_json = self.read_phylum_analysis('/home/runner/phylum_analysis.json')
         phylum_json = self.read_phylum_analysis(FILE_PATHS.get("phylum_analysis"))
         risk_data = self.parse_risk_data(phylum_json, pkg_ver)
         project_url = self.get_project_url(phylum_json)
         returncode = 0

-        # Write pr_comment.txt only if the analysis failed (self.gbl_result == 1)
-        if self.gbl_failed:
-            returncode += 1
-
-            header = "## Phylum OSS Supply Chain Risk Analysis\n\n"
-            header += "<details>\n<summary>Background</summary>\n<br />\nThis repository uses a GitHub Action to automatically analyze the risk of new dependencies added to requirements.txt via Pull Request. An administrator of this repository has set score requirements for Phylum's five risk domains.<br /><br />\nIf you see this comment, one or more dependencies added to the requirements.txt file in this Pull Request have failed Phylum's risk analysis.\n</details>\n\n"
-
-            # with open('/home/runner/pr_comment.txt','w') as outfile:
-            with open(FILE_PATHS.get("pr_comment"),'w') as outfile:
-                outfile.write(header)
-                for line in risk_data:
-                    if line:
-                        outfile.write(line)
-                outfile.write(f"\n[View this project in Phylum UI]({project_url})")
-                print(f"[DEBUG] pr_comment.txt: wrote {outfile.tell()} bytes")
+        output = ""
+        # Write pr_comment.txt only if the analysis failed and all pkgvers are completed(self.gbl_result == 1)
+        if self.gbl_failed == True and self.gbl_incomplete == False:
+            returncode = 1
+            # if this is a repeated test of previously incomplete packages, set the comment based on states of failed, not incomplete and previous
+            if self.previous_incomplete == True:
+                output = COMPLETE_FAILED_COMMENT
+            else:
+                output = FAILED_COMMENT
+
+            # write data from risk analysis
+            for line in risk_data:
+                if line:
+                    output += line
+
         # If any packages are incomplete, add 5 to the returncode so we know the results are incomplete
         if self.gbl_incomplete == True:
+            returncode = 5
             print(f"[DEBUG] {len(self.incomplete_pkgs)} packages were incomplete as of the analysis job")
-            returncode += 5
+            output = INCOMPLETE_COMMENT.replace("TKTK",str(len(self.incomplete_pkgs)))
+
+        if self.gbl_failed == False and self.gbl_incomplete == False and self.previous_incomplete == True:
+            returncode = 4
+            print(f"[DEBUG] failed=False incomplete=False previous_incomplete=True")
+            output = COMPLETE_SUCCESS_COMMENT

-        # with open('/home/runner/returncode.txt','w') as resultout:
         with open(FILE_PATHS.get("returncode"),'w') as resultout:
             resultout.write(str(returncode))
             print(f"[DEBUG] returncode: wrote {str(returncode)}")

+        with open(FILE_PATHS.get("pr_comment"),'w') as outfile:
+            outfile.write(output)
+            outfile.write(f"\n[View this project in Phylum UI]({project_url})")
+            print(f"[DEBUG] pr_comment.txt: wrote {outfile.tell()} bytes")

 if __name__ == "__main__":
     argv = sys.argv
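
The return code logic added to `run_analyze` depends on three flags: `gbl_failed`, `gbl_incomplete` and `previous_incomplete`. A compact way to check the "States on returncode" table is to restate that logic as a pure function. This is a sketch for illustration only: `decide_state` is a hypothetical helper that does not exist in `analyze.py`, and it returns the name of the comment constant rather than the comment text.

```python
from itertools import product


def decide_state(failed, incomplete, previous_incomplete):
    """Return (returncode, comment_name) for one combination of flags."""
    if incomplete:
        # Incomplete packages take priority over any failure.
        return 5, "INCOMPLETE_COMMENT"
    if failed:
        name = "COMPLETE_FAILED_COMMENT" if previous_incomplete else "FAILED_COMMENT"
        return 1, name
    if previous_incomplete:
        # Re-run after an earlier INCOMPLETE comment, now passing.
        return 4, "COMPLETE_SUCCESS_COMMENT"
    return 0, None


# Enumerate all eight flag combinations.
for failed, incomplete, previous in product([False, True], repeat=3):
    print(failed, incomplete, previous, decide_state(failed, incomplete, previous))
```

Enumerating the combinations makes one property of the diff explicit: when a run is both failed and incomplete, the incomplete state wins and the code is 5, so a failure is only reported once every package has been processed.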
