Commit ff7fd7b

release note for vtt files
1 parent 43a80f8 commit ff7fd7b

1 file changed: 43 additions and 0 deletions

@@ -0,0 +1,43 @@
### video subtitles (vtt files)

The `IQSS/dataverse` PR sets the content type for new(!) files with extension `vtt` to `text/vtt`,
which is presented as "_Web Video Text Tracks_". The PR also enables full-text indexing for these files,
if [configured](https://guides.dataverse.org/en/latest/installation/config.html#solrfulltextindexing).

The `gdcc/dataverse-previewer` PRs provide a new version of the video previewer.
The new previewer version presents `vtt` files as subtitles for videos.
The naming convention is `<video-basename>.<language-tag>.vtt`.
The previewer does not rely on the content type.
A proper content type may, however, prompt users to request permission for the subtitles together with a video.

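The naming convention above can be illustrated with a small sketch. The helper names `parse_subtitle_name` and `subtitles_for` are hypothetical, and the matching rule (subtitle basename equals the video file name without its extension, compared case-sensitively) is an assumption, not a description of how the previewer is actually implemented.

```python
from typing import Dict, List, Optional, Tuple


def parse_subtitle_name(filename: str) -> Optional[Tuple[str, str]]:
    """Split '<video-basename>.<language-tag>.vtt' into (basename, language tag).

    Returns None when the name does not follow the convention.
    """
    # Split on the last two dots only, so the basename may itself contain dots.
    parts = filename.rsplit(".", 2)
    if len(parts) != 3:
        return None
    basename, language_tag, extension = parts
    if extension != "vtt" or not basename or not language_tag:
        return None
    return basename, language_tag


def subtitles_for(video: str, filenames: List[str]) -> Dict[str, str]:
    """Map language tag -> subtitle file name for one video (assumed matching rule)."""
    video_basename = video.rsplit(".", 1)[0]
    result: Dict[str, str] = {}
    for name in filenames:
        parsed = parse_subtitle_name(name)
        if parsed is not None and parsed[0] == video_basename:
            result[parsed[1]] = name
    return result
```

For example, under these assumptions `lecture.en.vtt` and `lecture.nl.vtt` would be picked up as English and Dutch subtitles for `lecture.mp4`, while `lecture.vtt` (no language tag) would not match.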
Existing files with extension `vtt` will keep content type `application/octet-stream`, presented as "_Unknown_".
The following query shows the number of files per extension with an "_Unknown_" content type:

    SELECT substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2)) AS extension, COUNT(*) AS count
    FROM datafile f LEFT JOIN filemetadata m ON f.id = m.datafile_id
    WHERE f.contenttype = 'application/octet-stream'
    GROUP BY extension;

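The extension expression in the query is compact, so here is a rough Python rendering of the same arithmetic for readers who want to check it by hand. The function name `extension` and the sample labels are made up for illustration; the only thing taken from the release note is the expression itself.

```python
from collections import Counter
from typing import Iterable


def extension(label: str) -> str:
    """Mirror the SQL extension expression (PostgreSQL string positions are 1-based)."""
    # strpos(reverse(label), '.'): position of the last dot counted from the end,
    # or 0 when the label contains no dot.
    pos_from_end = label[::-1].find(".") + 1
    # substring(label from length(label) - pos_from_end + 2)
    start = len(label) - pos_from_end + 2
    # Convert the 1-based SQL start position to a 0-based Python slice.
    return label[start - 1:]


def count_by_extension(labels: Iterable[str]) -> Counter:
    """Rough equivalent of the GROUP BY extension / COUNT(*) part of the query."""
    return Counter(extension(label) for label in labels)
```

Two properties follow from the arithmetic: labels without a dot end up under the empty extension, and the comparison with `'vtt'` is case-sensitive, so a label such as `clip.VTT` would be reported under `VTT` rather than `vtt`.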
If `vtt` does not appear in the result, you are done.
Otherwise, you may want to update the content type for existing files and reindex the affected datasets.

First figure out which datasets would need [reindexing](https://guides.dataverse.org/en/latest/admin/solr-search-index.html#manual-reindexing):

    SELECT DISTINCT
        o.protocol, o.authority, o.identifier,
        v.versionnumber, v.minorversionnumber, v.versionstate
    FROM datafile f
    LEFT JOIN filemetadata m ON f.id = m.datafile_id
    LEFT JOIN datasetversion v ON v.id = m.datasetversion_id
    LEFT JOIN dvobject o ON o.id = v.dataset_id
    WHERE f.contenttype = 'application/octet-stream'
    AND 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
    ;

Then update the content type for the files:

    UPDATE datafile SET contenttype = 'text/vtt'
    WHERE contenttype = 'application/octet-stream'
    AND id IN (
        SELECT m.datafile_id FROM filemetadata m
        WHERE 'vtt' = substring(m.label from (length(m.label) - strpos(reverse(m.label), '.') + 2))
    );
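
After the update, the datasets found by the reindexing query still need to be reindexed. The sketch below only builds the request URLs from the `protocol`, `authority` and `identifier` columns of that query. The endpoint path `/api/admin/index/dataset?persistentId=...` is given as I recall it from the manual reindexing section of the admin guide linked above and should be verified there; the base URL, the sample rows and the function names are hypothetical.

```python
from typing import Iterable, List, Sequence
from urllib.parse import urlencode


def persistent_id(protocol: str, authority: str, identifier: str) -> str:
    """Combine the dvobject columns into a persistent ID such as 'doi:10.5072/FK2/ABCDEF'."""
    return f"{protocol}:{authority}/{identifier}"


def reindex_url(base_url: str, protocol: str, authority: str, identifier: str) -> str:
    """Build a reindex URL for one dataset (endpoint path is an assumption to verify)."""
    query = urlencode({"persistentId": persistent_id(protocol, authority, identifier)})
    return f"{base_url}/api/admin/index/dataset?{query}"


def reindex_urls(base_url: str, rows: Iterable[Sequence]) -> List[str]:
    """One URL per dataset; the query returns one row per dataset version, so deduplicate."""
    datasets = dict.fromkeys((row[0], row[1], row[2]) for row in rows)
    return [reindex_url(base_url, *dataset) for dataset in datasets]
```

Each resulting URL could then be requested with `curl` or any HTTP client from a host that is allowed to reach the admin API.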
