
Update file types supported by drive_download() #465

Merged
jennybc merged 15 commits into tidyverse:main from ateucher:update-download-mimetypes
Sep 9, 2025

Conversation

@ateucher
Contributor

@ateucher ateucher commented Jul 9, 2025

Since Google Docs now supports exporting to Markdown, it seemed like a good idea to enable that support in drive_download().

The only code I changed manually was to add the default .md extension for text/markdown mimetype in data-raw/extension-mime-type-defaults.csv. Then I just ran data-raw/mime-types.R to pull in the Google MIME types and create the csv files in inst/extdata/data/.

This does work now:

library(googledrive)

drive_auth(email = "andy@andyteucher.ca")

drive_download(
  "https://docs.google.com/document/d/1EK6WZyEUfdy6sj3GjLII20Ok7qTcmRD-vv-osfHnkPU",
  path = "my_document.md",
)
#> File downloaded:
#> • 'Test for google drive' <id: 1EK6WZyEUfdy6sj3GjLII20Ok7qTcmRD-vv-osfHnkPU>
#> Saved locally as:
#> • 'my_document.md'

Created on 2025-07-09 with reprex v2.1.1

@ateucher ateucher marked this pull request as draft July 10, 2025 16:51
@ateucher
Contributor Author

Converted to draft - I'm going to do a bit more work around extensions and defaults

ateucher added 2 commits July 10, 2025 10:43
* If mimetype does not have an extension in mime_tbl.csv NA is appended to the filename
@ateucher
Contributor Author

ateucher commented Jul 10, 2025

I don't think there is any more to do around adding default extensions to mimetypes. All of the changes here come from changes in mime::mimemap and in the importFormats and exportFormats in Google's about endpoint, and it seems sensible to just accept them.

I did, however, discover that if a mimetype does not have an extension in inst/extdata/data/mime_tbl.csv, .NA is appended to the given filename when that mimetype is specified in type. E.g.,

drive_download(
  "https://docs.google.com/document/d/1EK6WZyEUfdy6sj3GjLII20Ok7qTcmRD-vv-osfHnkPU",
  type = "text/x-markdown",
  path = "my_document.md",
  overwrite = TRUE
)
#> File downloaded:
#> • 'Test for google drive' <id: 1EK6WZyEUfdy6sj3GjLII20Ok7qTcmRD-vv-osfHnkPU>
#> Saved locally as:
#> • 'my_document.md.NA'

I added a fix and a test for this, so now it does not add .NA:

drive_download(
  "https://docs.google.com/document/d/1EK6WZyEUfdy6sj3GjLII20Ok7qTcmRD-vv-osfHnkPU",
  type = "text/x-markdown",
  path = "my_document.md",
  overwrite = TRUE
)
#> File downloaded:
#> • 'Test for google drive' <id: 1EK6WZyEUfdy6sj3GjLII20Ok7qTcmRD-vv-osfHnkPU>
#> Saved locally as:
#> • 'my_document.md'

Created on 2025-07-10 with reprex v2.1.1
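
The `.NA` suffix above is what base R produces whenever a missing extension is pasted onto a path without a guard. Below is a minimal sketch of that failure and one possible guard. `add_ext_naive()` and `add_ext_safe()` are hypothetical names used only for illustration; they are not the functions changed in this PR, and the actual fix may look different.

```r
# Hypothetical helpers for illustration only, not googledrive internals.

# Naive version: paste0() converts a missing extension into the string "NA".
add_ext_naive <- function(path, ext) {
  paste0(path, ".", ext)
}

# Guarded version: return the path unchanged when no extension is known,
# or when the path already ends with that extension.
add_ext_safe <- function(path, ext) {
  if (is.na(ext) || !nzchar(ext)) {
    return(path)
  }
  if (identical(tolower(tools::file_ext(path)), tolower(ext))) {
    return(path)
  }
  paste0(path, ".", ext)
}

add_ext_naive("my_document.md", NA_character_) # reproduces the ".NA" suffix
add_ext_safe("my_document.md", NA_character_)  # path is left alone
add_ext_safe("my_document", "md")              # extension is appended
```

The guard checks for `NA` first so that `nzchar()` and `tools::file_ext()` only ever see a real string.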

@ateucher ateucher marked this pull request as ready for review July 10, 2025 18:44
@ateucher
Contributor Author

ateucher commented Jul 10, 2025

The opposite direction, uploading a Markdown file and converting it to a Google Doc, also works:

tfile <- tempfile(fileext = ".md")
writeLines(
  "# Title

  text
  ",
  tfile
)

drive_auth("andy@andyteucher.ca")

drive_upload(tfile, type = "document")
#> Local file:
#> • '/var/folders/_f/n9fw7ctx3fqf2ty9ylw502g80000gn/T//Rtmpk9yyHy/file82652c912409.md'
#> Uploaded into Drive file:
#> • 'file82652c912409.md' <id: 1APiJm3DzCRdq2WH8_X2SE5xCDYxyOU1lVYjVmoa30ns>
#> With MIME type:
#> • 'application/vnd.google-apps.document'

Created on 2025-07-10 with reprex v2.1.1

import,application/vnd.google-apps.document,text/plain,NA
import,application/vnd.google-apps.document,text/richtext,NA
import,application/vnd.google-apps.document,text/rtf,NA
import,application/vnd.google-apps.document,text/x-markdown,NA
Member


text/x-markdown appears to be a rather weird/legacy MIME type, but if it's one of the official ones (and it is), I agree we need to plan for it. This is a note to myself.

jennybc and others added 9 commits September 9, 2025 12:42
The test won't work as written anymore, using the "text/x-markdown" MIME type. Instead of modifying it, I tested the filepath utility itself.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@jennybc
Member

jennybc commented Sep 9, 2025

Thanks! As for my commits, this was mostly an effort to recollect how these MIME type / file extension tables are constructed. The way I had rigged this was not terribly clear. I think it's somewhat improved -- it'll have to do for now. And we've got tests for upload/download.

@jennybc jennybc merged commit fe38eba into tidyverse:main Sep 9, 2025
12 checks passed
@ateucher
Contributor Author

ateucher commented Sep 9, 2025

Great, thanks so much for finishing it off!

@jrosell

jrosell commented Sep 10, 2025

@ateucher thanks for your PR!

It would have been awesome if the code block were not lost in the upload + download round trip, but I guess it's a Google issue and not an implementation issue.

pak::pak("tidyverse/googledrive")
#> ℹ Loading metadata database
#> ✔ Loading metadata database ... done
#> 
#> 
#> ✔ All system requirements are already installed.
#> 
#> ℹ No downloads are needed
#> ✔ 1 pkg + 24 deps: kept 24 [4.8s]
library(googledrive)

tfile <- tempfile(fileext = ".md")
code <- r"(

# Title

```r
1+1
```

text
)"

writeLines(code, tfile, sep = "\n")
cat(paste0(readLines(tfile), "\n"))
#> 
#>  
#>  # Title
#>  
#>  ```r
#>  1+1
#>  ```
#>  
#>  text
#> 
options(gargle_oauth_email = Sys.getenv("GOOGLE_EMAIL"))
drive_auth_configure(
  path = Sys.getenv("GOOGLE_API_CREDENTIALS_OAUTH20_CLIENT_ID_DESTOP_FILE")
)
drive_upload(tfile, type = "document")
#> Local file:
#> • '/tmp/Rtmp2Lg9bf/file27bff5042c5a8.md'
#> Uploaded into Drive file:
#> • 'file27bff5042c5a8.md' <id: 1aX6tY9RIyMiq_RB1LSDeai7judGcVhGNKi4JhEdD7f8>
#> With MIME type:
#> • 'application/vnd.google-apps.document'


drive_download(
  basename(tfile),
  path = "markdown-googledrive.md",
  overwrite = TRUE
)
#> File downloaded:
#> • 'file27bff5042c5a8.md' <id: 1aX6tY9RIyMiq_RB1LSDeai7judGcVhGNKi4JhEdD7f8>
#> Saved locally as:
#> • 'markdown-googledrive.md'
cat(paste0(readLines("markdown-googledrive.md"), "\n"))
#> # Title
#>  
#>  1+1
#>  
#>  text

Created on 2025-09-10 with reprex v2.1.1.9000

@jennybc
Member

jennybc commented Sep 10, 2025

If I send this markdown:

# Title

```r
1+1
```

```rust
fn main() { println!("Hello, world!"); }
```

text

then re-download it, I get:

# Title

```
1+1
```

```rust
fn main() { println!("Hello, world!"); }
```

text  

The code fences are retained. But the language identifier for r is lost, whereas the one for rust is retained. I think this is a Google problem, because R is not available in this dropdown list of languages:

[Video attachment: Screen.Recording.2025-09-10.at.3.26.47.PM.mov]

It would be great to include R there, since it is more popular than some of these languages. But it's beyond the scope of what googledrive can do.
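
For anyone who wants to check which language identifiers survive a round trip, here is a small base-R sketch. `fence_langs()` is a hypothetical helper, not part of googledrive, and it assumes well-formed, non-nested triple-backtick fences. The two character vectors simply restate the Markdown sent and received above.

```r
# Hypothetical helper: return the language identifier of each fenced code
# block, assuming well-formed, non-nested ``` fences.
fence_langs <- function(lines) {
  fences <- grep("^[[:space:]]*```", lines, value = TRUE)
  # Fence lines alternate opener, closer, opener, ...; keep the openers.
  openers <- fences[seq_along(fences) %% 2 == 1]
  trimws(sub("^[[:space:]]*```", "", openers))
}

sent <- c(
  "# Title", "",
  "```r", "1+1", "```", "",
  "```rust", "fn main() { println!(\"Hello, world!\"); }", "```", "",
  "text"
)
received <- c(
  "# Title", "",
  "```", "1+1", "```", "",
  "```rust", "fn main() { println!(\"Hello, world!\"); }", "```", "",
  "text"
)

# Side-by-side view of the identifiers before and after the round trip.
data.frame(sent = fence_langs(sent), received = fence_langs(received))
```

An empty string in the `received` column marks a block whose fence survived but whose identifier was dropped.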
