Commit 7384966

Merge pull request #1800 from pengli09/emb_doc
The description for vocabulary file is not consistent with the latest file
2 parents c1b47b2 + bf1a4af commit 7384966

2 files changed, with 4 additions and 2 deletions

doc/tutorials/embedding_model/index_cn.md

Lines changed: 2 additions & 1 deletion
@@ -6,9 +6,10 @@
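
 ## Introduction ###
 ### Chinese Word Dictionary ###
-Our dictionary is produced by segmenting the Baidu ZhiDao and Baidu Baike corpora with an in-house word segmentation tool. The segmentation style is as follows: "《红楼梦》" is split into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded and has two columns: the word itself and its frequency. The dictionary contains 3206325 words and 3 special tokens
+Our dictionary is produced by segmenting the Baidu ZhiDao and Baidu Baike corpora with an in-house word segmentation tool. The segmentation style is as follows: "《红楼梦》" is split into "《", "红楼梦", "》", and "《红楼梦》". The dictionary is UTF-8 encoded and has two columns: the word itself and its frequency. The dictionary contains 3206326 words and 4 special tokens
 - `<s>`: the start of a segmented sequence
 - `<e>`: the end of a segmented sequence
+- `PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING`: a placeholder with no actual meaning
 - `<unk>`: an unknown word

 ### Pretrained Chinese Word Embedding Model ###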

doc/tutorials/embedding_model/index_en.md

Lines changed: 2 additions & 1 deletion
@@ -6,9 +6,10 @@ We thank @lipeng for the pull request that defined the model schemas and pretrai

 ## Introduction ###
 ### Chinese Word Dictionary ###
-Our Chinese-word dictionary is created on Baidu ZhiDao and Baidu Baike by using in-house word segmentor. For example, the participle of "《红楼梦》" is "《","红楼梦","》",and "《红楼梦》". Our dictionary (using UTF-8 format) has has two columns: word and its frequency. The total word count is 3206325, including 3 special token:
+Our Chinese-word dictionary is created on Baidu ZhiDao and Baidu Baike by using in-house word segmentor. For example, the participle of "《红楼梦》" is "《","红楼梦","》",and "《红楼梦》". Our dictionary (using UTF-8 format) has has two columns: word and its frequency. The total word count is 3206326, including 4 special token:
 - `<s>`: the start of a sequence
 - `<e>`: the end of a sequence
+- `PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING`: a placeholder, just ignore it and its embedding
 - `<unk>`: a word not included in dictionary

 ### Pretrained Chinese Word Embedding Model ###
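
For readers who want to check a local copy of the dictionary against the updated description, the following is a minimal sketch, not code from the repository. It assumes the two columns are separated by a tab (the documentation only says there are two columns), and the names `load_dictionary` and `missing_special_tokens` are hypothetical helpers introduced here. The sample rows and their frequencies are made up for illustration.

```python
import io

# The four special tokens listed in the updated documentation.
SPECIAL_TOKENS = (
    "<s>",
    "<e>",
    "PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING",
    "<unk>",
)


def load_dictionary(lines, sep="\t"):
    """Parse two-column lines (word, frequency) into a dict.

    Assumption: columns are separated by `sep` (a tab here); the
    documentation only states that the file has two columns.
    """
    vocab = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Split on the last separator so the frequency is always the final field.
        word, freq = line.rsplit(sep, 1)
        vocab[word] = int(freq)
    return vocab


def missing_special_tokens(vocab):
    """Return the documented special tokens that are absent from vocab."""
    return [tok for tok in SPECIAL_TOKENS if tok not in vocab]


# Made-up sample rows standing in for the real file; frequencies are illustrative only.
sample = io.StringIO(
    "<s>\t1\n"
    "<e>\t1\n"
    "PALCEHOLDER_JUST_IGNORE_THE_EMBEDDING\t1\n"
    "<unk>\t1\n"
    "红楼梦\t42\n"
)
vocab = load_dictionary(sample)
print(len(vocab), missing_special_tokens(vocab))
```

For the real file, pass a handle opened with `open(path, encoding="utf-8")`, where `path` is wherever the dictionary was saved. If the documented total of 3206326 includes the 4 special tokens, `len(vocab)` should match it, provided every entry is unique.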
