Add GoogleFetcher #2074
Do you want to merge this into main? Or actually just the tika-grpc-3x-features branch?
If we merge to main, I'll cherry-pick back to branch_3x.
main is now at 4.0.0-SNAPSHOT and is the dev branch.
    }

    if (spoolToTemp) {
        File tempFile = Files.createTempFile("spooled-temp", ".dat").toFile();
Something like this will ensure that the temp file gets deleted when the TikaInputStream is closed: https://github.com/apache/tika/blob/main/tika-pipes/tika-fetchers/tika-fetcher-s3/src/main/java/org/apache/tika/pipes/fetcher/s3/S3Fetcher.java#L221
Thanks, used in this fixup: 0823d12
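For illustration, the delete-on-close idea suggested above can be sketched with JDK classes only. This is not the linked S3Fetcher code and it does not use Tika's API; `SpooledTempStream` and its `spool` and `getTempFile` methods are hypothetical names invented for this sketch.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: spools a stream to a temp file and deletes that file on close. */
public class SpooledTempStream extends FilterInputStream {
    private final Path tempFile;

    private SpooledTempStream(InputStream in, Path tempFile) {
        super(in);
        this.tempFile = tempFile;
    }

    /** Copies the source stream to a temp file and returns a stream that reads from it. */
    public static SpooledTempStream spool(InputStream source) throws IOException {
        Path tmp = Files.createTempFile("spooled-temp", ".dat");
        try {
            Files.copy(source, tmp, StandardCopyOption.REPLACE_EXISTING);
            return new SpooledTempStream(Files.newInputStream(tmp), tmp);
        } catch (IOException e) {
            // Do not leave the temp file behind if spooling fails.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    /** Exposed only so callers (and tests) can inspect the spooled file. */
    public Path getTempFile() {
        return tempFile;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            // Runs even if closing the underlying stream throws.
            Files.deleteIfExists(tempFile);
        }
    }
}
```

Tying the cleanup to `close()` means a caller using try-with-resources cannot forget to delete the file, which is the property the review comment asks for.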
    <parent>
      <artifactId>tika-fetchers</artifactId>
      <groupId>org.apache.tika</groupId>
      <version>3.0.0-SNAPSHOT</version>
4.0.0-SNAPSHOT?
3.0.0-SNAPSHOT is referenced in many of the pom.xml files inside tika-pipes. I can update this Google Fetcher to reference 4.0.0-SNAPSHOT, but should there be a separate PR to get everything else on the same version identifier?
So, main is 4.0.0-SNAPSHOT now. If you're targeting main, this needs to be 4.0.0-SNAPSHOT.
Separately, for the grpc work, yes, you'll want to update everything in that branch to 4.0.0-SNAPSHOT.
I pushed da30ede, hopefully this satisfies what you were expecting?
I can also adjust the merge-to branch to be main. Are all the changes inside tika-grpc-3x-features in main? (excluding this one)
@tballison Thanks for the review! Honestly, I wasn't expecting one as I'm mostly pushing this to collaborate with @nddipiazza; however, if it makes sense to work as a group, then I will happily do so. I am pushing to Happy to hop on a quick call if it's easier. I'll read your other comments in the meantime 👍
To the degree we can make small/logical changes in
Sounds great, I'm glad you think so. Let me sort out your feedback and then we'll go from there. Appreciate the response!
c28c558 to e06b9f1
Renamed to GoogleDriveFetcher, as the files.get call is part of the Google Drive API.
I don't expect this fetcher to service other Google APIs, and the fetchKey/configuration is kept simple by narrowing the fetcher's scope.
Sorry, what I meant was that over on the tika-grpc-3x branch you should change everything to 4.0.0-SNAPSHOT. On this PR, you should only need to do that for the GoogleDriveFetcher, and you should make this PR against
da30ede to d439aca
No worries, made that change here: d439aca. Regarding merging to main, it looks like main and tika-grpc-3x are not in sync. There are roughly 100 additional commits if I change the target branch to main. Not sure how this was previously sorted out between you and Nick, but I'm happy to figure it out if you can offer some suggestions.
…r so that it can be modified
…r so that it can be modified
This allows the fetching of items using files.get from Google Drive
Rename to GoogleDriveFetcher. This name is more appropriate as the files.get call is specific to Google Drive
d439aca to 4c254df
Closing this in favour of #2077, which is based off
This allows the fetching of items using files.get from Google Drive