feat: Add the main file for determine package reachability level of Python project #2131

p1gc0rn · 2025-08-01T01:18:35Z

The new feature helps determine the reachability of imported Python libraries in a Python project. This is part of an in-development project on the reachability of imported Python libraries.

This PR adds support for imported Python libraries defined in a poetry.lock file.

experimental/main.go

experimental/main_test.go

experimental/main.go

cuixq · 2025-08-01T07:07:55Z

can you add some description for this PR?

cuixq

also let's move this to experimental/pythonreach folder and add a README file.

cuixq · 2025-09-24T04:54:00Z

experimental/pythonreach/example/poetry.lock

@@ -0,0 +1,415 @@
+# This file is automatically @generated by Poetry 2.1.3 and should not be changed by hand.


I don't see this example project being referenced anywhere in the tests?

cuixq · 2025-09-24T04:56:42Z

experimental/pythonreach/main.go

+	Dependencies []string      // Direct dependencies declared in library's metadata
+}
+
+// Constants for terminal output formatting


this comment should be removed?

cuixq · 2025-09-24T05:21:21Z

experimental/pythonreach/main.go

+
+		// 6. Comparison between the collected imported libraries and the PYPI dependencies of the libraries
+		// to find the reachability of the PYPI dependencies.
+		for _, library := range importedLibraries {


let's make a struct for the output so that we can test the reachability result properly as well.

https://github.com/google/osv-scalibr/blob/main/enricher/reachability/java/java.go can be a reference about what the results should look like.
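A minimal sketch of what such an output struct could look like. The names `ReachabilityResult` and `analyze` are hypothetical and are not taken from this PR or from the referenced Java enricher; the only point is that returning data instead of printing it lets a test compare the result directly.

```go
package main

import (
	"fmt"
	"slices"
)

// ReachabilityResult is a hypothetical output record: one entry per PyPI
// dependency of an imported library.
type ReachabilityResult struct {
	Library    string // imported library, e.g. "requests"
	Dependency string // dependency declared in the library's metadata
	Reachable  bool   // whether the dependency is imported by code that is used
}

// analyze is a stand-in for the comparison loop: a dependency is marked
// reachable when it appears among the modules the library actually imports.
func analyze(library string, deps, imported []string) []ReachabilityResult {
	results := make([]ReachabilityResult, 0, len(deps))
	for _, dep := range deps {
		results = append(results, ReachabilityResult{
			Library:    library,
			Dependency: dep,
			Reachable:  slices.Contains(imported, dep),
		})
	}
	return results
}

func main() {
	for _, r := range analyze("requests", []string{"urllib3", "idna"}, []string{"urllib3"}) {
		fmt.Printf("%s -> %s reachable=%t\n", r.Library, r.Dependency, r.Reachable)
	}
}
```

A test can then build the expected `[]ReachabilityResult` and compare it with the returned slice instead of parsing terminal output.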

cuixq · 2025-09-24T06:24:45Z

experimental/pythonreach/main_test.go

+
+	for _, tc := range testCases {
+		t.Run(tc.name, func(t *testing.T) {
+			ctx := context.Background()


you may use t.Context() for context in testing

cuixq · 2025-09-24T06:29:26Z

experimental/pythonreach/main_test.go

+				}
+			}
+
+			// To compare slices, we need a canonical order.


I assume the main entry point will be a single string so we probably don't need to sort it.

cuixq · 2025-09-24T06:52:45Z

experimental/pythonreach/main.go

+	response, _ := reg.GetIndex(ctx, libraryInfo.Name)
+	downloadURL := ""
+	fileName := strings.ToLower(fmt.Sprintf(`%s-%s.tar.gz`, libraryInfo.Name, libraryInfo.Version))
+	for _, file := range response.Files {


you may use pypi.SdistVersion to get the version from the file name, see https://github.com/google/osv-scalibr/blob/629500d6fb6a1392afe40e0c9327c48a787358d8/clients/resolution/pypi_registry_client.go#L95


cuixq · 2025-09-24T06:54:04Z

experimental/pythonreach/main.go

+		}
+	}
+
+	downloadFileSource, err := downloadPackageSource(downloadURL)


you may find https://github.com/google/osv-scalibr/blob/629500d6fb6a1392afe40e0c9327c48a787358d8/clients/datasource/pypi_registry.go#L91 useful

thank you for the link!

cuixq · 2025-09-24T06:55:48Z

experimental/pythonreach/testdata/multifileswithentrypoint/script.sh

@@ -0,0 +1 @@
+#!/bin/bash


is there any particular reason that we want to keep this shell script?

cuixq · 2025-09-24T06:56:20Z

experimental/pythonreach/testdata/multifileswithentrypoint/main.py

@@ -0,0 +1,3 @@
+print('hello')


let's add a newline at the end of these test files if that won't cause any trouble

cuixq · 2025-09-24T06:58:56Z

we probably need go.mod as well

github-actions · 2025-12-13T02:29:29Z

This pull request has not had any activity for 60 days and will be automatically closed in two weeks

github-actions · 2025-12-27T02:30:31Z

Automatically closing stale pull request

cuixq · 2026-01-19T04:17:23Z

/gemini review

gemini-code-assist

Code Review

This PR introduces a new tool for analyzing Python dependency reachability, which is a great feature. The implementation follows a logical flow: finding entry points, parsing manifests, analyzing imports, and then diving into library source code. However, the current implementation has several critical and high-severity issues that need to be addressed. The tool downloads and extracts package source code into the current working directory, which is unsafe. There are cases where errors are ignored, which can lead to panics. The logic for identifying manifest files and parsing Python code has some flaws that will cause the tool to fail on valid projects or produce incorrect results. I've left specific comments on these points. Addressing these issues will significantly improve the tool's reliability and usability.

gemini-code-assist · 2026-01-19T04:19:41Z

experimental/pythonreach/main.go

+		} else if slices.Contains(unsupportedManifests, fileName) {
+			return nil, fmt.Errorf("unsupported manifest file found: %s", fileName)
+		}


The logic for finding manifest files will incorrectly fail for most Poetry projects. pyproject.toml is a standard file in projects using Poetry and will almost always be present alongside poetry.lock. The current implementation considers pyproject.toml as an "unsupported" manifest and returns an error, which will prevent the tool from running on valid projects. You should remove pyproject.toml from the unsupportedManifests list. A better approach would be to ignore files that are not in supportedManifests instead of erroring out.

Suggested change

-		} else if slices.Contains(unsupportedManifests, fileName) {
-			return nil, fmt.Errorf("unsupported manifest file found: %s", fileName)
-		}
+		} else if slices.Contains(unsupportedManifests, fileName) {
+			return nil, fmt.Errorf("unsupported manifest file found: %s", fileName)
+		}
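A sketch of the "ignore instead of error" approach described in this comment. The helper name `findManifests` and the contents of `supportedManifests` are assumptions for illustration; the actual list in the PR may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"slices"
)

// supportedManifests mirrors the idea from the review comment; the exact
// list in the PR may differ.
var supportedManifests = []string{"poetry.lock"}

// findManifests returns the supported manifest paths and silently skips
// every other file, including pyproject.toml.
func findManifests(paths []string) ([]string, error) {
	var found []string
	for _, p := range paths {
		if slices.Contains(supportedManifests, filepath.Base(p)) {
			found = append(found, p)
		}
	}
	if len(found) == 0 {
		return nil, fmt.Errorf("no supported manifest found (supported: %v)", supportedManifests)
	}
	return found, nil
}

func main() {
	manifests, err := findManifests([]string{"app/pyproject.toml", "app/poetry.lock", "app/main.py"})
	fmt.Println(manifests, err)
}
```

With this shape, an error is only returned when no supported manifest exists at all, so a Poetry project that ships both pyproject.toml and poetry.lock is still analyzed.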

gemini-code-assist · 2026-01-19T04:19:41Z

experimental/pythonreach/main.go

+// collect dependencies of the imported library.
+func retrieveSourceAndCollectDependencies(ctx context.Context, libraryInfo *LibraryInfo) error {
+	reg := datasource.NewPyPIRegistryAPIClient("")
+	response, _ := reg.GetIndex(ctx, libraryInfo.Name)


The error returned from reg.GetIndex(ctx, libraryInfo.Name) is being ignored. If this API call fails (e.g., due to network issues or the package not being found), response will be nil, and the subsequent code will cause a nil pointer dereference when accessing response.Files. You must handle this error.

response, err := reg.GetIndex(ctx, libraryInfo.Name)
if err != nil {
	return fmt.Errorf("failed to get PyPI index for %s: %w", libraryInfo.Name, err)
}

gemini-code-assist · 2026-01-19T04:19:41Z

experimental/pythonreach/main.go

+	importRegex       = regexp.MustCompile(`^\s*import\s+([a-zA-Z0-9_.]+)(?:\s+as\s+([a-zA-Z0-9_]+))?`)
+	fromImportRegex   = regexp.MustCompile(`^\s*from\s+([a-zA-Z0-9_.]+)\s+import\s+(.+)`)
+	importItemRegex   = regexp.MustCompile(`([a-zA-Z0-9_.*]+)(?:\s+as\s+([a-zA-Z0-9_]+))?`)
+	memberImportRegex = regexp.MustCompile(`import (\w+)\.(\w+)`)


The memberImportRegex is not anchored with ^, which means it can match an import statement anywhere in a line, including inside a string literal or comment (e.g., print("import foo.bar")). This will lead to incorrect parsing of imported libraries.

Furthermore, its placement in the if-else if chain in findImportedLibraries makes it unlikely to be reached for valid import statements, as the broader importRegex will match them first.

To fix this, you should anchor the regex and reconsider its purpose or placement in the parsing logic.

Suggested change

-	memberImportRegex = regexp.MustCompile(`import (\w+)\.(\w+)`)
+	memberImportRegex = regexp.MustCompile(`^\s*import (\w+)\.(\w+)`)
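To make the difference concrete, the small program below runs both patterns from the suggestion against a few sample lines. The variable names `unanchored` and `anchored` are only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The pattern as it appears in the PR.
	unanchored = regexp.MustCompile(`import (\w+)\.(\w+)`)
	// The pattern from the suggested change.
	anchored = regexp.MustCompile(`^\s*import (\w+)\.(\w+)`)
)

func main() {
	lines := []string{
		`import os.path`,
		`    import os.path`,
		`print("import foo.bar")`,
		`# import foo.bar`,
	}
	for _, line := range lines {
		fmt.Printf("%-28q unanchored=%t anchored=%t\n",
			line, unanchored.MatchString(line), anchored.MatchString(line))
	}
}
```

The anchored pattern still matches real import statements, including indented ones, but no longer matches the string literal or the comment.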

gemini-code-assist · 2026-01-19T04:19:41Z

experimental/pythonreach/main.go

+// downloadPackageSource downloads the source code of a package from PyPI.
+func downloadPackageSource(downloadLink string) (string, error) {
+	filename := filepath.Base(downloadLink)
+	tempFile, err := os.CreateTemp(".", filename)


The function downloadPackageSource downloads files to the current working directory by using os.CreateTemp(".", filename). This is risky as it can pollute the user's project directory. It's strongly recommended to download files into a dedicated temporary directory created using os.MkdirTemp to avoid side effects.
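A minimal sketch of the os.MkdirTemp approach, assuming a hypothetical helper `createDownloadFile` that only prepares the destination file; the HTTP download itself is left out. The directory prefix `pythonreach-*` and the example URL are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// createDownloadFile creates a dedicated temporary directory and an empty
// file inside it named after the last element of downloadLink. The returned
// cleanup function removes the whole directory.
func createDownloadFile(downloadLink string) (*os.File, func(), error) {
	dir, err := os.MkdirTemp("", "pythonreach-*")
	if err != nil {
		return nil, nil, fmt.Errorf("failed to create temp dir: %w", err)
	}
	cleanup := func() { os.RemoveAll(dir) }

	f, err := os.Create(filepath.Join(dir, filepath.Base(downloadLink)))
	if err != nil {
		cleanup()
		return nil, nil, fmt.Errorf("failed to create download file: %w", err)
	}
	return f, cleanup, nil
}

func main() {
	f, cleanup, err := createDownloadFile("https://files.example.invalid/packages/foo-1.0.tar.gz")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer cleanup()
	defer f.Close()
	// The HTTP response body would be copied into f here, e.g. with io.Copy.
	fmt.Println("downloading into", f.Name())
}
```

Returning a cleanup function keeps removal of the temporary directory next to its creation, so callers can simply `defer cleanup()`.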

gemini-code-assist · 2026-01-19T04:19:41Z

experimental/pythonreach/main.go

+		if err != nil {
+			return fmt.Errorf("failed to read tar header: %w", err)
+		}
+		targetPath := filepath.Join(".", header.Name)


The function extractCompressedPackageSource extracts the package source code into the current working directory using filepath.Join(".", header.Name). This is dangerous as it can pollute the user's project directory and potentially overwrite existing files with the same names. It's strongly recommended to extract the contents into a dedicated temporary directory created using os.MkdirTemp.
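As a sketch, the target path computation could look like the helper below, where `destDir` would come from os.MkdirTemp. The name `safeTarget` is hypothetical. The `filepath.IsLocal` check goes slightly beyond what this comment asks for: it is an added assumption that archive entries with absolute paths or ".." components should be rejected rather than extracted.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// safeTarget maps a tar entry name to a path inside destDir and rejects
// names that would land outside of it (absolute paths, ".." components).
func safeTarget(destDir, entryName string) (string, error) {
	if !filepath.IsLocal(entryName) {
		return "", fmt.Errorf("refusing to extract %q outside of %s", entryName, destDir)
	}
	return filepath.Join(destDir, entryName), nil
}

func main() {
	// destDir would normally be the result of os.MkdirTemp("", "pythonreach-*").
	destDir := filepath.Join("tmp", "pythonreach-123")
	for _, name := range []string{"foo-1.0/setup.py", "../outside.py", "/etc/passwd"} {
		target, err := safeTarget(destDir, name)
		fmt.Printf("%q -> %q (err: %v)\n", name, target, err)
	}
}
```

In the extraction loop, `filepath.Join(".", header.Name)` would then become a call such as `safeTarget(destDir, header.Name)`.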

gemini-code-assist · 2026-01-19T04:19:42Z

experimental/pythonreach/main.go

+	// TODO: Find alternative ways for these regexes.
+	mainEntryRegex    = regexp.MustCompile(`^\s*if\s+__name__\s*==\s*['"]__main__['"]\s*:`)
+	importRegex       = regexp.MustCompile(`^\s*import\s+([a-zA-Z0-9_.]+)(?:\s+as\s+([a-zA-Z0-9_]+))?`)
+	fromImportRegex   = regexp.MustCompile(`^\s*from\s+([a-zA-Z0-9_.]+)\s+import\s+(.+)`)


The regular expression for from ... import ... statements does not handle multi-line imports that use parentheses, such as:

from my_library import (
    module1,
    module2,
)

The current regex ^\s*from\s+([a-zA-Z0-9_.]+)\s+import\s+(.+) will only capture the first line. To handle this, you would need a more complex parsing logic that can read across multiple lines when an opening parenthesis is detected.
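One possible shape for that logic, as a rough sketch: a pre-pass that collapses a parenthesized `from ... import (...)` statement into a single logical line before the existing regexes run. The helper names `joinParenthesizedImports` and `tidy` are hypothetical, and the sketch does not handle comments inside the parentheses or backslash continuations.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// tidy collapses runs of whitespace and drops a trailing comma.
func tidy(s string) string {
	return strings.TrimSuffix(strings.Join(strings.Fields(s), " "), ",")
}

// joinParenthesizedImports returns trimmed logical lines. A `from X import (`
// statement spanning several physical lines becomes one line without
// parentheses; an unterminated statement at end of input is dropped.
func joinParenthesizedImports(src string) ([]string, error) {
	var logical []string
	var pending strings.Builder
	open := false
	scanner := bufio.NewScanner(strings.NewReader(src))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !open {
			if !strings.HasPrefix(line, "from ") || !strings.Contains(line, "(") {
				logical = append(logical, line)
				continue
			}
			// Start of a parenthesized import: drop the opening parenthesis.
			open = true
			pending.Reset()
			line = strings.Replace(line, "(", " ", 1)
		}
		if before, _, found := strings.Cut(line, ")"); found {
			// Closing parenthesis reached: emit the joined statement.
			pending.WriteString(" " + before)
			logical = append(logical, tidy(pending.String()))
			open = false
			continue
		}
		pending.WriteString(" " + line)
	}
	return logical, scanner.Err()
}

func main() {
	src := "from my_library import (\n    module1,\n    module2,\n)\nimport os\n"
	lines, err := joinParenthesizedImports(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

The joined line has the same form as a single-line import, so fromImportRegex can be applied to it unchanged.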

gemini-code-assist · 2026-01-19T04:19:42Z

experimental/pythonreach/main.go

+		if err != nil {
+			return err
+		}
+		if d.IsDir() && strings.Contains(d.Name(), folderName) {


The findFolder function uses strings.Contains(d.Name(), folderName) to find the library's source directory. This can be inaccurate. For example, if searching for foo-1.0, it could incorrectly match a directory named my-foo-1.0-extra. It would be more reliable to check if the directory name is the folder name, or at least starts with it, e.g., d.Name() == folderName or strings.HasPrefix(d.Name(), folderName). Since extracted tarballs usually create a directory like packagename-version, checking for a prefix would be safer.
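The difference between the two checks can be seen in a few lines. The helper name `matchesSourceDir` is hypothetical; it is only meant to illustrate the prefix check suggested above.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSourceDir reports whether dirName looks like the extracted source
// directory for folderName (typically "<package>-<version>").
func matchesSourceDir(dirName, folderName string) bool {
	return strings.HasPrefix(dirName, folderName)
}

func main() {
	for _, dir := range []string{"foo-1.0", "my-foo-1.0-extra"} {
		fmt.Printf("%-18s contains=%t prefix=%t\n",
			dir, strings.Contains(dir, "foo-1.0"), matchesSourceDir(dir, "foo-1.0"))
	}
}
```

A prefix check would still accept a name like `foo-1.0.1` when searching for `foo-1.0`, so exact equality remains the strictest option when the directory name is known.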

gemini-code-assist · 2026-01-19T04:19:42Z

experimental/pythonreach/main.go

+
+		return scanFile(file, func(line string) error {
+			for _, module := range libraryInfo.Modules {
+				searchTerm := fmt.Sprintf("def %s(", module.Name)


The method for finding where an imported item is defined is based on a simple string search: searchTerm := fmt.Sprintf("def %s(", module.Name). This is not very robust and can lead to both false positives and false negatives.

False positives: It can match function names in comments or strings.

False negatives: It won't match functions written with different spacing (e.g., extra whitespace after def or before the opening parenthesis), functions defined in classes (methods), or other imported symbols like classes or variables.

Consider using regular expressions or, for a more robust solution, a Python AST parser to accurately locate definitions.
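A sketch of the regular-expression variant, assuming a hypothetical helper `definitionRegex`. It covers `def`, `async def`, and `class` definitions at the start of a line, but not variables, and it can still be fooled by text inside multi-line strings; an AST parser would avoid that.

```go
package main

import (
	"fmt"
	"regexp"
)

// definitionRegex builds a pattern matching `def name(`, `async def name(`
// or `class name` at the start of a (possibly indented) line.
func definitionRegex(name string) *regexp.Regexp {
	return regexp.MustCompile(`^\s*(?:(?:async\s+)?def|class)\s+` + regexp.QuoteMeta(name) + `\s*[(:]`)
}

func main() {
	re := definitionRegex("my_func")
	lines := []string{
		"def my_func(x):",
		"def  my_func (x):",
		"    async def my_func(self):",
		"# def my_func(x):",
		`print("def my_func(")`,
		"def my_func2(x):",
	}
	for _, line := range lines {
		fmt.Printf("%-32q match=%t\n", line, re.MatchString(line))
	}
}
```

Anchoring at the start of the line removes the comment and string-literal matches, and `\s+` and `\s*` tolerate spacing differences.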

gemini-code-assist · 2026-01-19T04:19:42Z

experimental/pythonreach/main.go

+			}
+
+			for _, module := range library.Modules {
+				if module.SourceDefinedPaths == nil {


The logic here assumes that if the source definition path for a module is not found (module.SourceDefinedPaths == nil), all of the library's dependencies are reachable. This is a risky assumption and can lead to incorrect reachability analysis. It would be better to report the reachability as "unknown" or "undetermined" in this case, rather than defaulting to "reachable". This makes the tool's output more accurate about its limitations.
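One way to express this is a three-valued status instead of a boolean. The sketch below uses hypothetical names (`Reachability`, `classify`) and a simplified input; it is not the PR's data model.

```go
package main

import "fmt"

// Reachability is a hypothetical three-valued result.
type Reachability int

const (
	Unknown     Reachability = iota // definition not located; no claim is made
	Reachable                       // dependency is imported by the used code
	Unreachable                     // definition located, dependency not imported
)

func (r Reachability) String() string {
	switch r {
	case Reachable:
		return "reachable"
	case Unreachable:
		return "unreachable"
	default:
		return "unknown"
	}
}

// classify reports Unknown when no source definition path was found,
// instead of defaulting to Reachable.
func classify(sourceDefinedPaths []string, dependencyImported bool) Reachability {
	if len(sourceDefinedPaths) == 0 {
		return Unknown
	}
	if dependencyImported {
		return Reachable
	}
	return Unreachable
}

func main() {
	fmt.Println(classify(nil, false))
	fmt.Println(classify([]string{"requests/api.py"}, true))
	fmt.Println(classify([]string{"requests/api.py"}, false))
}
```

Making `Unknown` the zero value means a result that was never filled in is reported as undetermined rather than as reachable.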

gemini-code-assist · 2026-01-19T04:19:42Z

experimental/pythonreach/main_test.go

+		},
+		{
+			name:           "Malformed poetry.lock - Parser error",
+			fpathInTestDir: "./testdata/tmultifileswithentrypoint/poetry.lock",


There seems to be a typo in the file path for this test case. The directory is named multifileswithentrypoint, but the path used is ./testdata/tmultifileswithentrypoint/poetry.lock (with an extra 't'). This will cause the test to pass because os.Open will fail to find the file, which matches expectError: true, but it's not testing what's intended (parsing a malformed file).

Suggested change

-			fpathInTestDir: "./testdata/tmultifileswithentrypoint/poetry.lock",
+			fpathInTestDir: "./testdata/multifileswithentrypoint/poetry.lock",
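Independently of fixing the path, a test can guard against this kind of silent pass by checking which error it received. A small sketch, assuming hypothetical helpers `isMissingFile` and `openError` and a stand-in `errMalformed` for whatever error the real parser returns:

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// errMalformed is a stand-in for the error a manifest parser would return.
var errMalformed = errors.New("malformed poetry.lock")

// isMissingFile reports whether err came from a path that does not exist,
// as opposed to a failure while parsing an existing file.
func isMissingFile(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

// openError opens path and returns only the resulting error, if any.
func openError(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	return f.Close()
}

func main() {
	err := openError("./testdata/tmultifileswithentrypoint/poetry.lock")
	fmt.Println("mistyped path is a missing file:", isMissingFile(err))
	fmt.Println("parser error is a missing file:", isMissingFile(errMalformed))
}
```

A "malformed file" test case could then fail explicitly when `isMissingFile(err)` is true, so a mistyped path no longer satisfies `expectError: true`.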

Add the main file

9feba41

p1gc0rn requested a review from cuixq August 1, 2025 01:18

p1gc0rn changed the title ~~Add the main file for package reachability level~~ #feat Add the main file for package reachability level Aug 1, 2025

p1gc0rn changed the title ~~#feat Add the main file for package reachability level~~ feat: Add the main file for package reachability level Aug 1, 2025

Add a comment

388c206

cuixq reviewed Aug 1, 2025


p1gc0rn changed the title ~~feat: Add the main file for package reachability level~~ feat: Add the main file for determine package reachability level of Python project Aug 17, 2025

p1gc0rn added 5 commits August 17, 2025 21:35

Update main.go with suggestion changes

c26429c

Resolved comments

413dab2

Add helper function

fddeaf7

Add testdata

c5f2001

Clean up and add example of reachable library

a652588

cuixq reviewed Sep 24, 2025


github-actions bot added the stale label Dec 13, 2025

github-actions bot added the autoclosed label Dec 27, 2025

github-actions bot closed this Dec 27, 2025

cuixq removed the autoclosed and stale labels Dec 31, 2025

cuixq reopened this Dec 31, 2025

gemini-code-assist bot reviewed Jan 19, 2026


