Merged
ee7f55c to 5cae95e
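This commit fully implements the comment syntax and sorting rules for txt dictionaries, including a backward-compatible API design and command-line tool support.

## Comment syntax support

**Basic syntax:**

- Comment line: a whole line starting with #
- Dictionary record line: a tab-separated key/value pair
- Blank line: a line containing no visible characters

**Comment block classification:**

- Header block: the comment block at the start of the file (before the last blank line that precedes the first dictionary record)
- Footer block: the comment block at the end of the file (after the last dictionary record)
- Attached block: a comment block directly adjacent to a dictionary record line (no blank line in between)
- Floating block: a detached comment block (one that does not meet the attach condition)

**Sorting rules:**

- The smallest sorting unit is a dictionary record plus its attached comment block
- Header/Footer blocks stay fixed at the start/end of the file
- A stable sort is applied to the keys of dictionary records only
- After sorting, each floating block is inserted at its anchor position

## Backward-compatible design

**Default behavior (preserveComments=false):**

- Fully compatible with previous versions
- A line starting with # throws an exception (the original behavior)
- Comment structure is neither parsed nor stored

**New behavior (preserveComments=true):**

- Lines starting with # are recognized as comments and do not raise an error
- The comment block structure is stored for use in sorting and serialization

## API changes

**Core API:**

- Lexicon::ParseLexiconFromFile(FILE* fp, bool preserveComments = false)
- TextDict::NewFromFile(FILE* fp, bool preserveComments = false)
- TextDict::NewFromSortedFile(FILE* fp, bool preserveComments = false)
- ConvertDictionary(..., bool preserveComments = false)

**Command-line tool:**

opencc_dict gains a -p, --preserve-comments option.

Usage examples:

```bash
# Default behavior (backward compatible): fails on files that contain comments
opencc_dict -i input.txt -o output.txt -f text -t text

# Preserve comments and sort
opencc_dict -i input.txt -o output.txt -f text -t text --preserve-comments
```

## Implementation details

**Data structures:**

- CommentBlock: a comment block
- AnnotatedEntry: a dictionary entry with its comments
- Lexicon now stores header/footer/annotated/floating blocks

**Core logic:**

- Rewrote ParseLexiconFromFile to support both parsing modes
- Implemented SortWithAnnotations so that comment blocks move together with their entries
- Modified TextDict::SerializeToFile to output comment blocks and blank lines correctly

## Tests

Added full test coverage (LexiconAnnotationTest):

- ParseCommentLines: parsing comment lines
- ParseAttachedComment: parsing attached comments
- ParseFloatingComment: parsing floating comments
- ParseFooterComment: parsing footer comments
- SerializeWithAnnotations: serialization with annotations
- SortWithAnnotations: sorting with annotations
- DefaultBehaviorIgnoresComments: verifies the default behavior
- DefaultBehaviorRejectsCommentLines: verifies backward compatibility

All 8 tests pass. Manual testing of the command-line tool shows it works as expected.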
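For readers who want a concrete picture of the block classification and sorting rules described above, here is a minimal, self-contained C++ sketch. It is not the code from this PR: the types `ParsedLexicon` and `FloatingBlock` and the functions `ParseLines` and `SortEntries` are hypothetical, `AnnotatedEntry` only borrows the name mentioned in the description, and the floating-block "anchor" is assumed here to be the number of records seen before the block, which may differ from the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified types; the PR's real structures may differ.
struct AnnotatedEntry {
  std::vector<std::string> attached;  // comment lines directly above the record
  std::string key;
  std::string value;
};

struct FloatingBlock {
  std::size_t anchor;  // ASSUMPTION: number of records seen before this block
  std::vector<std::string> lines;
};

struct ParsedLexicon {
  std::vector<std::string> header;  // comments separated from the first record
  std::vector<AnnotatedEntry> entries;
  std::vector<FloatingBlock> floating;
  std::vector<std::string> footer;  // comments after the last record
};

// A blank line contains no visible characters.
inline bool IsBlank(const std::string& line) {
  return line.find_first_not_of(" \t\r") == std::string::npos;
}

ParsedLexicon ParseLines(const std::vector<std::string>& lines) {
  ParsedLexicon out;
  std::vector<std::string> pending;  // comment lines not yet classified
  for (const std::string& line : lines) {
    if (IsBlank(line)) {
      if (pending.empty()) continue;
      // A blank line detaches the pending comments from whatever follows.
      if (out.entries.empty()) {
        out.header.insert(out.header.end(), pending.begin(), pending.end());
      } else {
        out.floating.push_back({out.entries.size(), pending});
      }
      pending.clear();
    } else if (line[0] == '#') {
      pending.push_back(line);
    } else {
      const std::size_t tab = line.find('\t');
      if (tab == std::string::npos) {
        throw std::invalid_argument("expected key<TAB>value: " + line);
      }
      // Comments with no blank line before the record are attached to it.
      out.entries.push_back({pending, line.substr(0, tab), line.substr(tab + 1)});
      pending.clear();
    }
  }
  // Comment blocks that come after the last record form the footer.
  const std::size_t total = out.entries.size();
  auto trailing = std::find_if(
      out.floating.begin(), out.floating.end(),
      [total](const FloatingBlock& b) { return b.anchor == total; });
  for (auto it = trailing; it != out.floating.end(); ++it) {
    out.footer.insert(out.footer.end(), it->lines.begin(), it->lines.end());
  }
  out.floating.erase(trailing, out.floating.end());
  out.footer.insert(out.footer.end(), pending.begin(), pending.end());
  return out;
}

// Stable sort by key; attached comments travel with their record because
// they live inside the same AnnotatedEntry.
void SortEntries(ParsedLexicon& lex) {
  auto by_key = [](const AnnotatedEntry& a, const AnnotatedEntry& b) {
    return a.key < b.key;
  };
  std::stable_sort(lex.entries.begin(), lex.entries.end(), by_key);
  assert(std::is_sorted(lex.entries.begin(), lex.entries.end(), by_key));
}
```

Storing the attached comments inside each entry is what makes "record plus attached block" the sorting unit: the sort never sees the comments separately. Re-inserting floating blocks during serialization is left out of this sketch because the description does not spell out how anchors are resolved.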
5cae95e to aca33bf
Add standardized headers listing the official config usage for each top-level dictionary file.
aca33bf to 34b4af5
BYVoid reviewed Jan 15, 2026
d96a31d to 2ed1fd4
BYVoid approved these changes Jan 16, 2026