Commit 72d7d9c

Merge pull request #11300 from DANS-KNAW-jp/datatype-for-vtt
content type for files with vtt extension
2 parents 9c2a8bf + 770a7e3 commit 72d7d9c

4 files changed, 46 additions and 0 deletions

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+### video subtitles (vtt files)
+
+The `IQSS/dataverse` PR sets the content type for new(!) files with extension `vtt` to `text/vtt`,
+which is presented as "_Web Video Text Tracks_". The PR also enables full text indexing for these files,
+if [configured](https://guides.dataverse.org/en/latest/installation/config.html#solrfulltextindexing).
+
+The `gdcc/dataverse-previewer` PRs provide a new version of the video previewer.
+The new previewer version presents `vtt` files as subtitles for videos;
+the naming convention is `<video-basename>.<language-tag>.vtt`.
+The previewer does not rely on the content type.
+A proper content type may prompt users to request permission for the subtitles together with a video.
+
+Existing files with extension `vtt` will keep content type `application/octet-stream`, presented as "_Unknown_".
+The following query shows the number of files per extension with an "_Unknown_" content type:
+
+    SELECT substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2)) AS extension, COUNT(*) AS count
+    FROM datafile f LEFT JOIN filemetadata m ON f.id = m.datafile_id
+    WHERE f.contenttype = 'application/octet-stream'
+    GROUP BY extension;
+
+If `vtt` does not appear in the result, you are done.
+Otherwise, you may want to update the content type for existing files and reindex those datasets.
+
+First figure out which datasets would need [reindexing](https://guides.dataverse.org/en/latest/admin/solr-search-index.html#manual-reindexing):
+
+    SELECT DISTINCT
+        o.protocol, o.authority, o.identifier,
+        v.versionnumber, v.minorversionnumber, v.versionstate
+    FROM datafile f
+        LEFT JOIN filemetadata m ON f.id = m.datafile_id
+        LEFT JOIN datasetversion v ON v.id = m.datasetversion_id
+        LEFT JOIN dvobject o ON o.id = v.dataset_id
+    WHERE f.contenttype = 'application/octet-stream'
+        AND 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
+    ;
+
+Then update the content type for the files:
+
+    UPDATE datafile SET contenttype = 'text/vtt' WHERE id IN (
+        SELECT m.datafile_id FROM filemetadata m
+        WHERE datafile.contenttype = 'application/octet-stream'
+        AND 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
+    );
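
The queries in this release note all derive the extension with the same `substring(... strpos(reverse(m.label), '.') ...)` expression, which is easy to misread. As a rough aid, the Python sketch below mirrors that expression step by step. The function name `extension_of` and the sample labels are illustrative assumptions and are not part of the commit.

```python
def extension_of(label: str) -> str:
    """Mirror the SQL expression that derives a file extension from a label."""
    # SQL strpos() is 1-based and returns 0 when the substring is absent;
    # str.find() is 0-based and returns -1, so adding 1 aligns the two.
    dot_from_end = label[::-1].find(".") + 1
    # 1-based start position of the extension, as computed in the SQL.
    start = len(label) - dot_from_end + 2
    # SQL substring(... from start) is 1-based; Python slices are 0-based.
    return label[start - 1:]


# Hypothetical file labels, for illustration only.
labels = ["movie.en.vtt", "movie.mp4", "README", "CLIP.VTT"]
vtt_labels = [label for label in labels if extension_of(label) == "vtt"]
print(vtt_labels)
```

Under this reading, a label without a dot yields an empty extension. The comparison with `'vtt'` is presumably case-sensitive in the database as well, so a label such as `CLIP.VTT` would not be matched by the queries as written.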

src/main/java/propertyFiles/MimeTypeDetectionByFileExtension.properties

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ mat=application/matlab-mat
 md=text/markdown
 mp3=audio/mp3
 m4a=audio/mp4
+vtt=text/vtt
 nii=image/nii
 nc=application/netcdf
 ods=application/vnd.oasis.opendocument.spreadsheet
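
To illustrate what the added `vtt=text/vtt` entry expresses, here is a minimal Python sketch of an extension-to-content-type lookup over the entries shown in this hunk. It is not Dataverse's detection code. The names `parse_properties` and `content_type_for`, the sample file names, and the `application/octet-stream` fallback are assumptions made for illustration.

```python
from typing import Dict

# Entries copied from the hunk above.
PROPERTIES_TEXT = """\
md=text/markdown
mp3=audio/mp3
m4a=audio/mp4
vtt=text/vtt
nii=image/nii
nc=application/netcdf
ods=application/vnd.oasis.opendocument.spreadsheet
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def content_type_for(filename: str, mapping: Dict[str, str],
                     default: str = "application/octet-stream") -> str:
    """Look up a content type by extension (assumed fallback: default)."""
    # Assumption: the extension is whatever follows the last dot.
    _, dot, extension = filename.rpartition(".")
    if not dot:
        return default
    return mapping.get(extension, default)


mapping = parse_properties(PROPERTIES_TEXT)
print(content_type_for("interview.en.vtt", mapping))
print(content_type_for("notes.xyz", mapping))
```

In this sketch a file such as `interview.en.vtt` resolves through its last extension only, which matches the `<video-basename>.<language-tag>.vtt` naming convention described in the release note.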

src/main/java/propertyFiles/MimeTypeDisplay.properties

Lines changed: 1 addition & 0 deletions
@@ -217,6 +217,7 @@ video/x-m4v=MPEG-4 Video
 video/ogg=OGG Video
 video/quicktime=Quicktime Video
 video/webm=WebM Video
+text/vtt=Web Video Text Tracks
 # Network Data
 text/xml-graphml=GraphML Network Data
 # 3D Data

src/main/java/propertyFiles/MimeTypeFacets.properties

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ text/richtext=Text
 text/turtle=Text
 application/xml=Text
 text/xml=Text
+text/vtt=Text
 # Code
 text/x-c=Code
 text/x-c++src=Code
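
Taken together, the `MimeTypeDisplay.properties` and `MimeTypeFacets.properties` entries give `text/vtt` a display name and a facet group. The Python sketch below shows one way to read both for a content type, using only entries visible in the two hunks. The helper names `load` and `describe` are hypothetical, and returning `None` for an unmapped type is an assumption, not Dataverse's behavior.

```python
from typing import Dict, Optional, Tuple

# Entries copied from the MimeTypeDisplay.properties hunk.
DISPLAY_TEXT = """\
video/ogg=OGG Video
video/quicktime=Quicktime Video
video/webm=WebM Video
text/vtt=Web Video Text Tracks
# Network Data
text/xml-graphml=GraphML Network Data
"""

# Entries copied from the MimeTypeFacets.properties hunk.
FACET_TEXT = """\
text/turtle=Text
application/xml=Text
text/xml=Text
text/vtt=Text
# Code
text/x-c=Code
text/x-c++src=Code
"""


def load(text: str) -> Dict[str, str]:
    """Parse key=value lines; lines starting with '#' are section comments."""
    pairs = (
        line.split("=", 1)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )
    return {key.strip(): value.strip() for key, value in pairs}


DISPLAY = load(DISPLAY_TEXT)
FACETS = load(FACET_TEXT)


def describe(content_type: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (display name, facet group); None where no entry is listed."""
    return DISPLAY.get(content_type), FACETS.get(content_type)


print(describe("text/vtt"))
```

Only the entries copied from the hunks are known here, so a type such as `video/webm` has a display name but no facet in this sketch, even though the full properties files may define one.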
