Skip to content

Conversation

daveverwer
Copy link
Member

Supercedes #3355 and implements the first part of #551. UI changes still pending.

@supersonicbyte I was seeing test failures in test_decode_Metadata_null and test_fetchMetadata which I have fixed by making Parent optional in the GitHub metadata. I don’t believe that parent is a mandatory field supplied by GitHub, so this feels like a reasonable fix to me but could you verify please?

Thanks!

@finestructure
Copy link
Member

Yes, parent is optional in the GraphQL schema:

CleanShot 2024-09-05 at 09 03 31@2x

Note that GraphQL uses sort of a reverse syntax to Swift:

optionalField: String          -> var optionalField: String?
nonOptionalField: String!      -> var nonOptionalField: String

var summary: String?

@Field(key: "forked_from")
var forkedFrom: Fork?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: we're trying (😅) to keep the data fields sorted (although this isn't true generally, because it's something we started half-way through without going back and fixing it everywhere). I'll patch this up real quick.

Copy link
Collaborator

@supersonicbyte supersonicbyte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the review and fixes. @finestructure could you link the logic for normalizing the repo urls from PackageList? I couldn't find it.

}
} catch {
Current.logger().warning("updating forked from failed")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather we didn't stick this in updateRepository like this - the function should really only update repository from parameters passed in (even if the "computation" is rather trivial) and shouldn't have other side effects like running a package query.

Instead, let's make updateRepository take a parameter:

func updateRepository(on database: Database,
                      for repository: Repository,
                      metadata: Github.Metadata,
                      licenseInfo: Github.License?,
                      readmeInfo: Github.Readme?,
                      s3Readme: S3Readme?,
                      fork: Fork?) async throws {

and move the "fork logic" to

func getFork(on database: Database, metadata: Github.Metadata) async throws -> Fork? {
    guard let parentUrl = metadata.repository?.normalizedParentUrl else { return nil }

    if let packageId = try await Package.query(on: database)
        .filter(\.$url, .custom("ilike"), parentUrl)
        .first()?.id {
        return .parentId(packageId)
    } else {
        return .parentURL(parentUrl)
    }
}

which we then use around line 128 before the updateRepository call:

                    let fork = try await getFork(on: database, metadata: metadata)

                    try await updateRepository(on: database,
                                               for: repo,
                                               metadata: metadata,
                                               licenseInfo: license,
                                               readmeInfo: readme,
                                               s3Readme: s3Readme,
                                               fork: fork)

This will also allow us to add a couple of tests specifically for getFork behaviour like

  • returning nil on normalisation failure
  • correct matching when urls differ in case
  • covering both the url and the id based lookups

.filter(\Repository.$package.$id == forkedId).first()

XCTAssertNil(repo?.forkedFrom)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By moving the main functionality to getFork, these tests become much easier to set up because we don't need to test all of ingest. Instead we just test getFork in isolation:

    func test_getFork() async throws {
        try await Package(id: .id0, url: "https://github.com/foo/parent.git".url, processingStage: .analysis).save(on: app.db)
        try await Package(url: "https://github.com/bar/forked.git", processingStage: .analysis).save(on: app.db)

        var metadata = Github.Metadata.mock

        do {  // test lookup when package is in the index
            metadata.repository?.parent = .init(url: "https://github.com/foo/parent.git")
            let fork = try await getFork(on: app.db, metadata: metadata)
            XCTAssertEqual(fork, .parentId(.id0))
        }

        do {  // test lookup when package is in the index but with different case in URL
            metadata.repository?.parent = .init(url: "https://github.com/Foo/Parent.git")
            let fork = try await getFork(on: app.db, metadata: metadata)
            XCTAssertEqual(fork, .parentId(.id0))
        }

        do {  // test lookup when package is not in the index
            metadata.repository?.parent = .init(url: "https://github.com/some/other.git")
            let fork = try await getFork(on: app.db, metadata: metadata)
            XCTAssertEqual(fork, .parentURL("https://github.com/some/other.git"))
        }

        do {  // test lookup when parent url is nil
            metadata.repository?.parent = .init(url: nil)
            let fork = try await getFork(on: app.db, metadata: metadata)
            XCTAssertEqual(fork, nil)
        }
    }

and then we simply extend a single test where we ensure fork is written to the db in updateRepository. For example, I'd modify test_updateRepository_update and add fork (call on or around line 152):

        try await updateRepository(on: app.db,
                                   for: repo,
                                   metadata: md,
                                   licenseInfo: .init(htmlUrl: "license url"),
                                   readmeInfo: .init(etag: "etag",
                                                     html: "readme html",
                                                     htmlUrl: "readme html url",
                                                     imagesToCache: []),
                                   s3Readme: .cached(s3ObjectUrl: "url", githubEtag: "etag"),
                                   fork: .parentURL("https://github.com/foo/bar.git"))

and validate it (around line 168):

            XCTAssertEqual(repo.forkedFrom, .parentURL("https://github.com/foo/bar.git"))

@finestructure
Copy link
Member

LGTM! Thanks for the review and fixes. @finestructure could you link the logic for normalizing the repo urls from PackageList? I couldn't find it.

It's here: https://github.com/SwiftPackageIndex/PackageList/blob/08b5021396a618f2c596dc6f48c336c82901d966/.github/add_package.swift#L24

This is essentially what determines the normalisation in the packages.url field when we add packages.

@supersonicbyte
Copy link
Collaborator

LGTM! Thanks for the review and fixes. @finestructure could you link the logic for normalizing the repo urls from PackageList? I couldn't find it.

It's here: https://github.com/SwiftPackageIndex/PackageList/blob/08b5021396a618f2c596dc6f48c336c82901d966/.github/add_package.swift#L24

This is essentially what determines the normalisation in the packages.url field when we add packages.

Thanks! That could be definitely reused.

Copy link

cla-bot bot commented Sep 5, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Ucanbarlic.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@supersonicbyte
Copy link
Collaborator

Hey @finestructure I incorporated your suggestions, please take a look when u find the time! I also took the URL approach from PackageList, it seems nicer than the way I did it.

Copy link
Member

@finestructure finestructure left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only a couple of small things left that are essentially changes to my previously proposed changes.

Comment on lines 127 to 134

let fork: Fork?
do {
fork = try await getFork(on: database, metadata: metadata)
} catch {
fork = nil
Current.logger().warning("updating forked from failed")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let fork: Fork?
do {
fork = try await getFork(on: database, metadata: metadata)
} catch {
fork = nil
Current.logger().warning("updating forked from failed")
}
let fork = try? await getFork(on: database, metadata: metadata)

Let's just straight up assign it. getFork is technically throwing due to query but we could even write try? await Package.query there, to be honest.

In fact, I think

func getFork(on database: Database, metadata: Github.Metadata) async -> Fork? {

is better, because we're mixing using nil as an "error" value and throws at the same time - my bad!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that I'd left another little tweak to getFork in a follow up comment on the other PR #3355 (comment)

So in total it should be

func getFork(on database: Database, parent: Github.Metadata.Parent?) async -> Fork? {
    guard let parentUrl = parent?.normalizedURL else { return nil }

    if let packageId = try? await Package.query(on: database)
        .filter(\.$url, .custom("ilike"), parentUrl)
        .first()?.id {
        return .parentId(packageId)
    } else {
        return .parentURL(parentUrl)
    }
}

Sorry for the back and forth!

try await repository.save(on: database)
}

func getFork(on database: Database, metadata: Github.Metadata) async throws -> Fork? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func getFork(on database: Database, metadata: Github.Metadata) async throws -> Fork? {
func getFork(on database: Database, metadata: Github.Metadata) async -> Fork? {

return nil
}

if let packageId = try await Package.query(on: database)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if let packageId = try await Package.query(on: database)
if let packageId = try? await Package.query(on: database)

}

private extension URL {
var normalizedParent: Self? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var normalizedParent: Self? {
var normalized: Self? {

as this is for URLs generally.

And I think we should drop the private and put it in a new file URL+ext.swift in Core/Extensions.

@supersonicbyte
Copy link
Collaborator

@finestructure thanks for the feedback! Here we go again, could you please review again? Thanks!

s3Readme = .error("\(error)")
}

let fork: Fork? = try? await getFork(on: database, parent: metadata.repository?.parent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a warning here, I'll fix it up real quick.

@finestructure finestructure merged commit 71387f5 into main Sep 7, 2024
5 checks passed
@finestructure finestructure deleted the add-forked-from branch September 7, 2024 11:02
@finestructure finestructure mentioned this pull request Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants