|
| 1 | +### video subtitles (vtt files) |
| 2 | + |
| 3 | +The `IQSS/dataverse` PR sets the content type to `text/vtt` for files with extension `vtt`, but only for new files. |
| 4 | +This type is presented as "_Web Video Text Tracks_". The PR also enables full-text indexing for these files, |
| 5 | +if [configured](https://guides.dataverse.org/en/latest/installation/config.html#solrfulltextindexing). |
| 6 | + |
| 7 | +The `gdcc/dataverse-previewer` PRs provide a new version of the video previewer. |
| 8 | +The new previewer version presents `vtt` files as subtitles for videos;
| 9 | +the naming convention is `<video-basename>.<language-tag>.vtt`.
| 10 | +The previewer does not rely on the content type.
| 11 | +A proper content type may, however, prompt users to request access to the subtitles together with the video.
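
As an illustration of the naming convention, here is a minimal Python sketch that pairs subtitle files with a video. It is not the previewer's code: the function name `find_subtitles` is hypothetical, and it assumes that the video basename is the file name without its last extension and that a language tag consists of letters, digits, and hyphens.

```python
import re
from pathlib import PurePosixPath


def find_subtitles(video_name, file_names):
    """Return {language_tag: subtitle_file_name} for one video.

    Illustrative only; assumes the basename is the video file name
    without its final extension.
    """
    basename = PurePosixPath(video_name).stem  # "talk.mp4" -> "talk"
    # <video-basename>.<language-tag>.vtt
    pattern = re.compile(re.escape(basename) + r"\.([A-Za-z0-9-]+)\.vtt")
    subtitles = {}
    for name in file_names:
        match = pattern.fullmatch(name)
        if match:
            subtitles[match.group(1)] = name
    return subtitles
```

With the files `talk.mp4`, `talk.en.vtt`, and `talk.pt-BR.vtt` in one dataset, the sketch would pair both subtitle files with the video, while `other.en.vtt` and `talk.vtt` would be ignored.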
| 12 | + |
| 13 | +Existing files with extension `vtt` keep the content type `application/octet-stream`, which is presented as "_Unknown_".
| 14 | +The following query shows the number of files per extension with an "_Unknown_" content type: |
| 15 | + |
| 16 | +    SELECT substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2)) AS extension, COUNT(DISTINCT f.id) AS count
| 17 | +    FROM datafile f LEFT JOIN filemetadata m ON f.id = m.datafile_id
| 18 | +    WHERE f.contenttype = 'application/octet-stream'
| 19 | +    GROUP BY extension;
| 20 | + |
| 21 | +If `vtt` does not appear in the result, you are done. |
| 22 | +Otherwise, you may want to update the content type of the existing `vtt` files and reindex the datasets that contain them.
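
The extension expression used in the queries of this section can be hard to read. The following Python sketch mirrors it step by step, assuming PostgreSQL semantics (`strpos` is 1-based and returns 0 when no dot is found); the function name `extension` is only illustrative.

```python
def extension(label):
    """Mimic substring(label from (length(label) - strpos(reverse(label), '.') + 2))."""
    # strpos(reverse(label), '.'): 1-based position of the last dot,
    # counted from the end of the label; 0 if there is no dot.
    pos_from_end = label[::-1].find(".") + 1
    # 1-based start position of the text after the last dot.
    start = len(label) - pos_from_end + 2
    # SQL substring positions are 1-based, Python slices are 0-based.
    return label[start - 1:]
```

For `talk.en.vtt` this yields `vtt`. A label without a dot yields an empty string, so such files are grouped under an empty extension.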
| 23 | + |
| 24 | +First, determine which datasets need [reindexing](https://guides.dataverse.org/en/latest/admin/solr-search-index.html#manual-reindexing):
| 25 | + |
| 26 | +    SELECT DISTINCT
| 27 | +      o.protocol, o.authority, o.identifier,
| 28 | +      v.versionnumber, v.minorversionnumber, v.versionstate
| 29 | +    FROM datafile f
| 30 | +      LEFT JOIN filemetadata m ON f.id = m.datafile_id
| 31 | +      LEFT JOIN datasetversion v ON v.id = m.datasetversion_id
| 32 | +      LEFT JOIN dvobject o ON o.id = v.dataset_id
| 33 | +    WHERE f.contenttype = 'application/octet-stream'
| 34 | +      AND 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
| 35 | +    ;
| 36 | + |
| 37 | +Then update the content type for the files: |
| 38 | + |
| 39 | +    UPDATE datafile SET contenttype = 'text/vtt'
| 40 | +    WHERE contenttype = 'application/octet-stream' AND id IN (
| 41 | +      SELECT m.datafile_id FROM filemetadata m
| 42 | +      WHERE 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
| 43 | +    );
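
After the update, the datasets found by the reindexing query still need to be reindexed. As a hedged sketch, the rows of that query could be turned into reindex requests as shown below. The base URL `http://localhost:8080` and the function name `build_reindex_urls` are assumptions, and the `api/admin/index/dataset?persistentId=...` endpoint should be verified against the reindexing guide linked above for your Dataverse version.

```python
from urllib.parse import quote

# Assumption: the admin API is reachable on localhost, as is common for Dataverse installations.
BASE_URL = "http://localhost:8080"


def build_reindex_urls(rows, base_url=BASE_URL):
    """Build one reindex URL per dataset from (protocol, authority, identifier, ...) rows.

    The query returns one row per dataset version, so persistent IDs are de-duplicated.
    """
    urls = []
    seen = set()
    for protocol, authority, identifier, *_ in rows:
        pid = f"{protocol}:{authority}/{identifier}"
        if pid in seen:
            continue
        seen.add(pid)
        urls.append(f"{base_url}/api/admin/index/dataset?persistentId={quote(pid, safe=':/')}")
    return urls
```

Each resulting URL could then be requested with a tool such as `curl` from a host that is allowed to reach the admin API.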