[FIX] Issue#1665 Enhanced Matroska Language Tag Handling #1671
In raising this pull request, I confirm the following:
My familiarity with the project is as follows:
Description
This PR improves the handling of language tags in the Matroska parser. It addresses an issue where IETF BCP47 language tags (e.g., "en-US") were not processed correctly, which could lead to segmentation faults and inaccurate subtitle extraction, as reported in issue #1665.
The Initial Problem: Modern MKV Files and IETF Language Tags
Modern Matroska (MKV) files increasingly use IETF BCP47 language tags to identify subtitle tracks. These tags offer more precision than the traditional 3-letter ISO 639-2 codes, allowing regional variations, scripts, and other linguistic details to be specified (e.g., `en-GB` for British English, `es-MX` for Mexican Spanish).
The existing parser was primarily designed for the older 3-letter codes and did not fully account for the presence and proper handling of these IETF tags. This resulted in the parser failing to correctly identify and utilize the IETF language tags, leading to issues such as:
Summary of Changes
- `sub_track->lang_ietf = lang_ietf;` is set during subtitle track creation to ensure IETF language tags are properly stored in the `matroska_sub_track` structure.
- `generate_filename_from_track()` prioritizes IETF language tags when available, creating more descriptive and accurate filenames.
- `matroska_save_all()` first attempts matching against IETF language tags before falling back to 3-letter ISO 639 codes, improving language selection accuracy.
- Handling of the `lang_ietf` field to prevent memory leaks and segmentation faults.

This enhancement is crucial for:
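For reviewers, the behaviour described under Summary of Changes can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `struct sub_track_sketch`, `dup_string()`, `track_matches_language()`, `preferred_language()`, `build_filename()`, and `free_track()` are hypothetical names, and the `basename_lang.srt` filename pattern and the `"und"` fallback are assumptions made only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the parser's subtitle track. */
struct sub_track_sketch {
    char *lang;      /* 3-letter ISO 639-2 code, e.g. "eng"; may be NULL */
    char *lang_ietf; /* IETF BCP47 tag, e.g. "en-US"; may be NULL */
    int track_number;
};

/* Heap-allocates a copy of s; returns NULL for NULL input or on failure. */
static char *dup_string(const char *s)
{
    if (s == NULL)
        return NULL;
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Try the IETF tag first, then fall back to the 3-letter code.
 * A NULL `wanted` means "no language filter". */
static int track_matches_language(const struct sub_track_sketch *t,
                                  const char *wanted)
{
    if (wanted == NULL)
        return 1;
    if (t->lang_ietf != NULL && strcmp(t->lang_ietf, wanted) == 0)
        return 1;
    if (t->lang != NULL && strcmp(t->lang, wanted) == 0)
        return 1;
    return 0;
}

/* Prefer the IETF tag for naming; "und" (undetermined) is an assumed
 * last-resort value for this sketch. */
static const char *preferred_language(const struct sub_track_sketch *t)
{
    if (t->lang_ietf != NULL && t->lang_ietf[0] != '\0')
        return t->lang_ietf;
    if (t->lang != NULL && t->lang[0] != '\0')
        return t->lang;
    return "und";
}

/* Assumed filename pattern: "<basename>_<language>.srt". */
static void build_filename(char *out, size_t out_size, const char *basename,
                           const struct sub_track_sketch *t)
{
    snprintf(out, out_size, "%s_%s.srt", basename, preferred_language(t));
}

/* Release both language strings; NULL-ing the pointers guards against
 * double frees and use-after-free. free(NULL) is a no-op. */
static void free_track(struct sub_track_sketch *t)
{
    if (t == NULL)
        return;
    free(t->lang);
    free(t->lang_ietf);
    t->lang = NULL;
    t->lang_ietf = NULL;
}
```

With a track carrying both `"eng"` and `"en-US"`, this sketch matches a request for either value and names the output after the IETF tag; a track with no IETF tag falls back to its 3-letter code.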
How Has This Been Tested?
Thank you,
Tank0nf.