model/src/main/kotlin/licenses/ResolvedLicenseInfo.kt: 29 changes (19 additions, 10 deletions)
@@ -28,6 +28,7 @@ import org.ossreviewtoolkit.model.config.CopyrightGarbage
import org.ossreviewtoolkit.model.config.LicenseFilePatterns
import org.ossreviewtoolkit.model.config.PathExclude
import org.ossreviewtoolkit.model.utils.PathLicenseMatcher
import org.ossreviewtoolkit.utils.common.FileMatcher

fviernau (Member), May 16, 2025:

Only maybe related: are we actually missing some kind of distinct() call on relativeFilePaths in ...?

fun getApplicableLicenseFilesForDirectories(
        relativeFilePaths: Collection<String>,
        directories: Collection<String>
)

I'm asking because I just noticed the comment saying that the issue came from a large number of paths.
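A minimal sketch of why a missing distinct() could matter, assuming the paths are collected with flatMap as in mainLicense(): a file in which several licenses were detected contributes its path once per license. `PathRef`, `LicenseFinding`, and both functions below are hypothetical simplifications, not ORT API.

```kotlin
// Hypothetical, simplified stand-ins for ORT's ResolvedLicense and its locations.
data class PathRef(val path: String)
data class LicenseFinding(val license: String, val locations: Set<PathRef>)

// flatMap keeps one entry per (license, location) pair, so a file with several licenses appears repeatedly.
fun licensePathsWithDuplicates(findings: List<LicenseFinding>): List<String> =
    findings.flatMap { finding -> finding.locations.map { it.path } }

// Collecting into a set passes each path on only once, regardless of how many licenses were found in it.
fun distinctLicensePaths(findings: List<LicenseFinding>): Set<String> =
    findings.flatMapTo(mutableSetOf()) { finding -> finding.locations.map { it.path } }

val sampleFindings = listOf(
    LicenseFinding("MIT", setOf(PathRef("LICENSE"), PathRef("src/a.c"))),
    LicenseFinding("Apache-2.0", setOf(PathRef("LICENSE"), PathRef("src/b.c"))),
    LicenseFinding("BSD-3-Clause", setOf(PathRef("src/a.c")))
)
```

Here the list variant yields five paths while the set variant yields three; whether this matters depends on how getApplicableLicenseFilesForDirectories() handles duplicates internally.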

import org.ossreviewtoolkit.utils.ort.CopyrightStatementsProcessor
import org.ossreviewtoolkit.utils.spdx.SpdxExpression
import org.ossreviewtoolkit.utils.spdx.SpdxLicenseChoice
@@ -83,22 +84,30 @@ data class ResolvedLicenseInfo(
* in any of the configured [LicenseFilePatterns] matched against the root path of the package (or project).
*/
fun mainLicense(): SpdxExpression? {
val matcher = PathLicenseMatcher(LicenseFilePatterns.getInstance())
val licensePaths = flatMap { resolvedLicense ->
val licenseFilePatterns = LicenseFilePatterns.getInstance()

Member:

commit-message: If this fixes such a serious performance issue, I believe that should be made more prominent.

Furthermore, could you add some details on why exactly it was so slow before?

Member Author:

For these topics, I'd refer to the original fix in ebc6fe0.

val fileMatcher = FileMatcher(licenseFilePatterns.allLicenseFilenames, ignoreCase = true)
val licenseMatcher = PathLicenseMatcher(licenseFilePatterns)

// Only keep those resolved licenses that can contribute to the main license as they match the configured
// license file patterns. This vastly reduces the search for applicable license files for scan results with a
// lot of detected license findings, like from file headers in a large code base.
val relevantResolvedLicenses = mapNotNull { resolvedLicense ->
val locations = resolvedLicense.locations.filterTo(mutableSetOf()) { fileMatcher.matches(it.location.path) }
if (locations.isNotEmpty()) resolvedLicense.copy(locations = locations) else null
}

Member:

If the filtering of resolved licenses against the paths were extracted into a function, and a comment were added explaining why filtering first is important for performance, the issue would be less likely to be reintroduced. What do you think?

Member Author:

How does extracting the filtering into a function help to avoid reintroducing the problem? Any user would still need to be aware of the function and make use of it.

fviernau (Member), May 16, 2025:

If someone attempts to refactor the code within mainLicense(), it is currently not obvious, IMO, that the filtering has to come first.

Member Author:

I decided to add a code comment instead.
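The extracted-function variant discussed in this thread could look roughly like the following minimal sketch. `LicenseLocation`, `LicenseWithLocations`, `filterToLicenseFileLocations()`, and `isLicenseFilename()` are hypothetical, simplified stand-ins rather than ORT's actual types, and the filename check replaces ORT's `FileMatcher` with a plain case-insensitive comparison against "LICENSE".

```kotlin
// Hypothetical, simplified stand-ins for ORT's ResolvedLicense and its locations.
data class LicenseLocation(val path: String)
data class LicenseWithLocations(val license: String, val locations: Set<LicenseLocation>)

/**
 * Keeps only those licenses, and only those of their locations, whose path is a license file.
 *
 * Filtering first matters for performance: the subsequent, more expensive matching of license files against
 * directories then only sees the few license file paths instead of every file with a license header.
 */
fun List<LicenseWithLocations>.filterToLicenseFileLocations(
    isLicenseFile: (String) -> Boolean
): List<LicenseWithLocations> =
    mapNotNull { resolved ->
        val locations = resolved.locations.filterTo(mutableSetOf()) { isLicenseFile(it.path) }
        if (locations.isNotEmpty()) resolved.copy(locations = locations) else null
    }

// Case-insensitive check on the file name only; ORT instead matches the configured license file patterns.
fun isLicenseFilename(path: String): Boolean =
    path.substringAfterLast('/').equals("LICENSE", ignoreCase = true)

// 10,000 files with an MIT header finding plus one LICENSE file.
val headerLocations = (1..10_000).map { LicenseLocation("src/File$it.kt") }.toSet()
val licenses = listOf(
    LicenseWithLocations("MIT", headerLocations + LicenseLocation("LICENSE")),
    LicenseWithLocations("Apache-2.0", setOf(LicenseLocation("src/Other.kt")))
)

val relevant = licenses.filterToLicenseFileLocations(::isLicenseFilename)
```

In this example, 10,002 locations shrink to a single one before any directory matching happens, which is the effect the code comment in the PR describes.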


val licensePaths = relevantResolvedLicenses.flatMap { resolvedLicense ->
resolvedLicense.locations.map { it.location.path }
}

val applicablePathsCache = mutableMapOf<String, Map<String, Set<String>>>()
val detectedLicenses = filterTo(mutableSetOf()) { resolvedLicense ->
val detectedLicenses = relevantResolvedLicenses.filterTo(mutableSetOf()) { resolvedLicense ->

Member:

Could lines 87-105 actually be simplified to something like this, and would it make sense? (This would avoid calling getApplicableLicenseFilesForDirectories() in a loop as well as manually messing around with the PathMatcher(); I guess it would be more efficient, too.)

val licensePaths = flatMapTo(mutableSetOf()) { resolvedLicense ->
    resolvedLicense.locations.map { it.location.path }
}

// Assumes that all findings share the same provenance, so the first one determines the root path.
val vcsPath = (licenseInfo.detectedLicenseInfo.findings.firstOrNull()?.provenance as? RepositoryProvenance)
    ?.vcsInfo?.path.orEmpty()

// The result maps each directory to its applicable license files, so look up the entry for vcsPath.
val applicableLicensePaths = licenseMatcher.getApplicableLicenseFilesForDirectories(
    licensePaths,
    listOf(vcsPath)
)[vcsPath].orEmpty()

val detectedMainLicenses = mapNotNull { resolvedLicense ->
    val locations = resolvedLicense.locations.filterTo(mutableSetOf()) { it.location.path in applicableLicensePaths }
    if (locations.isNotEmpty()) resolvedLicense.copy(locations = locations) else null
}
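The suggested "compute once, then filter" shape can be tried out in isolation with the minimal sketch below. `FileLocation`, `DetectedLicense`, `applicableLicenseFiles()`, and `mainLicenseCandidates()` are hypothetical; in particular, `applicableLicenseFiles()` is a deliberate simplification of getApplicableLicenseFilesForDirectories() that only returns license files located directly in each directory, whereas the real ORT logic also considers parent directories and pattern priorities.

```kotlin
// Hypothetical, simplified stand-ins for ORT's ResolvedLicense and its locations.
data class FileLocation(val path: String)
data class DetectedLicense(val license: String, val locations: Set<FileLocation>)

// Hypothetical simplification: for each directory, the license file paths located directly in it.
fun applicableLicenseFiles(
    licensePaths: Collection<String>,
    directories: Collection<String>
): Map<String, Set<String>> =
    directories.associateWith { dir ->
        licensePaths.filterTo(mutableSetOf()) { path -> path.substringBeforeLast('/', "") == dir }
    }

fun mainLicenseCandidates(licenses: List<DetectedLicense>, rootPath: String): List<DetectedLicense> {
    val licensePaths = licenses.flatMapTo(mutableSetOf()) { lic -> lic.locations.map { it.path } }

    // Computed once for the root path instead of once per detected license.
    val applicable = applicableLicenseFiles(licensePaths, listOf(rootPath))[rootPath].orEmpty()

    return licenses.mapNotNull { lic ->
        val locations = lic.locations.filterTo(mutableSetOf()) { it.path in applicable }
        if (locations.isNotEmpty()) lic.copy(locations = locations) else null
    }
}

val detected = listOf(
    DetectedLicense("MIT", setOf(FileLocation("LICENSE"), FileLocation("sub/LICENSE"))),
    DetectedLicense("Apache-2.0", setOf(FileLocation("sub/LICENSE")))
)
```

Note that `in` on a `Map` checks its keys, not the values, which is why the set for the root path is looked up before filtering.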

resolvedLicense.locations.any {
val rootPath = (it.provenance as? RepositoryProvenance)?.vcsInfo?.path.orEmpty()

val applicableLicensePaths = applicablePathsCache.getOrPut(rootPath) {
matcher.getApplicableLicenseFilesForDirectories(
licensePaths,
listOf(rootPath)
)
}
val applicableLicensePaths = licenseMatcher.getApplicableLicenseFilesForDirectories(
licensePaths,
listOf(rootPath)
)

val applicableLicenseFiles = applicableLicensePaths[rootPath].orEmpty()

model/src/main/kotlin/utils/PathLicenseMatcher.kt: 2 changes (1 addition, 1 deletion)
@@ -115,4 +115,4 @@ class PathLicenseMatcher(licenseFilePatterns: LicenseFilePatterns = LicenseFileP
}

private fun createFileMatcher(filenamePatterns: Collection<String>): FileMatcher =
FileMatcher(filenamePatterns.map { "/**/$it" }, true)
FileMatcher(filenamePatterns.map { "/**/$it" }, ignoreCase = true)
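The change in this file replaces the bare `true` with the named argument `ignoreCase = true`, which makes the meaning of the Boolean obvious at the call site. The hypothetical, heavily simplified matcher below illustrates the pattern; `SimpleFilenameMatcher` is not ORT's `FileMatcher` and compares file names only instead of evaluating `/**/` glob patterns.

```kotlin
// Hypothetical, heavily simplified matcher: compares only the file name against the given names.
class SimpleFilenameMatcher(
    private val filenames: Collection<String>,
    private val ignoreCase: Boolean = false
) {
    fun matches(path: String): Boolean {
        val name = path.substringAfterLast('/')
        return filenames.any { it.equals(name, ignoreCase = ignoreCase) }
    }
}

// The named argument documents the meaning of the Boolean at the call site, unlike a bare "true".
val matcher = SimpleFilenameMatcher(listOf("LICENSE", "COPYING"), ignoreCase = true)
```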