Skip to content

Conversation

@bartek
Copy link

@bartek bartek commented Dec 4, 2024

This allows the fetching of items using files.get from Google Drive

Copy link
Contributor

@tballison tballison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to merge this into main? Or actually just the tika-grpc-3x-features branch?

If we merge to main, I'll cherrypick back to branch_3x.

main is now at 4.0.0-SNAPSHOT and is the dev branch.

}

if (spoolToTemp) {
File tempFile = Files.createTempFile("spooled-temp", ".dat").toFile();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this will ensure that the temp file gets deleted when the TikaInputStream is closed: https://github.com/apache/tika/blob/main/tika-pipes/tika-fetchers/tika-fetcher-s3/src/main/java/org/apache/tika/pipes/fetcher/s3/S3Fetcher.java#L221

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, used in this fixup: 0823d12

<parent>
<artifactId>tika-fetchers</artifactId>
<groupId>org.apache.tika</groupId>
<version>3.0.0-SNAPSHOT</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4.0.0-SNAPSHOT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3.0.0-SNAPSHOT is referenced in many of the pom.xml files inside tika-pipes. I can update this Google Fetcher to reference 4.0.0-SNAPSHOT, but should there be a separate PR to get everything else on the same version identifier?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, main is 4.0.0-SNAPSHOT now. If you're targetting main, this needs to be 4.0.0-SNAPSHOT.

Separately for the grpc work, y, you'll want to update everything in that branch to 4.0.0-SNAPSHOT.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed da30ede, hopefully this satisfies what you were expecting?

I can also adjust the merge-to branch to be main. Are all the changes inside tika-grpc-3x-features in main? (excluding this one)

@bartek
Copy link
Author

bartek commented Dec 4, 2024

@tballison Thanks for review! Honestly, I wasn't expecting one as I'm mostly pushing this to collaborate with @nddipiazza, however, if it makes sense to work as a group, then I will happily do so.

I am pushing to tika-grpc-3x-features as that is the branch we're working off in terms of using the tika-pipes project, but I'm not sure on the overall state of things -- if you have a clearer picture, it'd be great to know.

Happy to hop on a quick call if it's easier. I'll read your other comments in the meantime 👍

@tballison
Copy link
Contributor

To the degree we can make small/logical changes in main to achieve the project's goals, all the better? This is definitely a standalone PR that can go straight into main, I think.

@bartek
Copy link
Author

bartek commented Dec 4, 2024

To the degree we can make small/logical changes in main to achieve the project's goals, all the better? This is definitely a standalone PR that can go straight into main, I think.

Sounds great, I'm glad you think so. Let me sort out your feedback and then we'll go from there. Appreciate the response!

@bartek bartek force-pushed the tika-grpc-3x-features branch from c28c558 to e06b9f1 Compare December 5, 2024 01:54
@bartek bartek requested a review from tballison December 5, 2024 13:57
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to GoogleDriveFetcher as the files.get call is part of the Google Drive API

I don't expect this fetcher to service other Google APIs, and the fetchKey/configuration is kept simple by narrowing the fetchers scope

@tballison
Copy link
Contributor

Sorry, what I meant was that over on the tika-grpc-3x branch you should change everything to 4.0.0-SNAPSHOT. On this PR, you should only need to do that for the GoogleDriveFetcher, and you should make this PR against main.

@bartek bartek force-pushed the tika-grpc-3x-features branch from da30ede to d439aca Compare December 6, 2024 00:46
@bartek
Copy link
Author

bartek commented Dec 6, 2024

Sorry, what I meant was that over on the tika-grpc-3x branch you should change everything to 4.0.0-SNAPSHOT. On this PR, you should only need to do that for the GoogleDriveFetcher, and you should make this PR against main.

No worries, made that change here: d439aca

Regarding merging to main, it looks like main and tika-grpc-3x are not in sync. There's an additional ~100 commits if I change the branch I wish to merge to as main.

Not sure how this was previously sorted between yourself/Nick, but I'm happy to figure it out if you offer some suggestions.

@bartek bartek changed the base branch from tika-grpc-3x-features to main December 6, 2024 00:47
@bartek bartek changed the base branch from main to tika-grpc-3x-features December 6, 2024 00:47
Nicholas DiPiazza and others added 26 commits December 6, 2024 12:22
This allows the fetching of items using files.get from Google Drive
Rename to GoogleDriveFetcher. This name is more appropriate as the
files.get call is specific to Google Drive
@bartek bartek force-pushed the tika-grpc-3x-features branch from d439aca to 4c254df Compare December 6, 2024 16:23
@bartek
Copy link
Author

bartek commented Dec 6, 2024

Closing this in favour of #2077, which is based off main and targeting it. The tika-grpc-3x-features branch needs significant cleanup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants