Commit 4ee20db

committed 2025-04-06
1 parent 40f5b83 commit 4ee20db

File tree

8 files changed: +280 -34 lines changed


.DS_Store

0 Bytes
Binary file not shown.

cntext/.DS_Store

0 Bytes
Binary file not shown.

docs/embeddings.md

Lines changed: 36 additions & 3 deletions

@@ -1,8 +1,41 @@
-# Word-vector resources
+# 4. Word-Embedding Resources, Code & Literature
 
+## 4.1 Word-embedding model resources
 The word-vector resources trained with cntext2.x are summarized below.
 
-
 | Dataset | Word vectors | Download link |
 | --- | --- | --- |
-| In progress | In progress | In progress |
+| [Message board](https://textdata.cn/blog/2023-12-22-renmin-gov-leader-comment-board/) | ***留言板-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/1n7vwCOBnrye1CYrt_IBqZA?pwd=9m42 |
+| [A-share annual reports](https://textdata.cn/blog/2023-03-23-china-a-share-market-dataset-mda-from-01-to-21/) | ***mda01-23-GloVe.200.15.bin*** | https://pan.baidu.com/s/1vXvbomHjOaFBeEz7GV0R6A?pwd=y6hd |
+| [A-share annual reports](https://textdata.cn/blog/2023-03-23-china-a-share-market-dataset-mda-from-01-to-21/) | ***mda01-23-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/11V1RyqH_cKE9eju0Mm-1TQ?pwd=kcwx |
+| [HK-stock annual reports](https://textdata.cn/blog/2024-01-21-hk-stock-market-anual-report/) | ***英文港股年报-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/1ISGAoZnA_1Ben6M2DCliOQ?pwd=nagx |
+| [HK-stock annual reports](https://textdata.cn/blog/2024-01-21-hk-stock-market-anual-report/) | ***中文港股年报-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/1smMcrPtIP8g635YABCodig?pwd=sjdj |
+| [Heimao consumer complaints](https://textdata.cn/blog/2025-03-05-consumer-complaint-dataset/) | ***消费者黑猫投诉-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/1FOI2BIVRojOswdKfqaNbsw?pwd=catc |
+| [Douban movie reviews](2024-04-16-douban-movie-1000w-ratings-comments-dataset) | ***douban-movie-1000w-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/1uq6Ti7HbEWyT4CgktKrMng?pwd=63jg |
+| [Bilibili](2023-11-12-using-100m-bilibili-user-sign-data-to-training-word2vec) | ***B站签名-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/1OtBU9BzitcNxkmPzhzH6FQ?pwd=m3iv |
+| [People's Daily](https://textdata.cn/blog/2023-12-14-daily-news-dataset/) | [Per-year Word2Vec](https://textdata.cn/blog/2023-12-28-visualize-the-culture-change-using-people-daily-dataset/) | https://pan.baidu.com/s/1Ru_wxu9egsmhM7lATjSlgQ?pwd=bcea |
+| [People's Daily](https://textdata.cn/blog/2023-12-14-daily-news-dataset/) | [Aligned_Word2Vec (aligned models)](https://textdata.cn/blog/2023-12-28-visualize-the-culture-change-using-people-daily-dataset/) | https://pan.baidu.com/s/1IVgP0MyQpez0hpoJyEyFdA?pwd=7qsu |
+| [Patent applications](https://textdata.cn/blog/2023-04-13-3571w-patent-dataset-in-china-mainland/) | ***专利摘要-Word2Vec.200.15.bin*** | https://pan.baidu.com/s/1FHI_J7wU9eQGRckD12QB5g?pwd=6rr2 |
+| [Patent applications](https://textdata.cn/blog/2023-11-20-word2vec-by-year-by-province/) | ***province_w2vs*** (per-province models) | https://pan.baidu.com/s/1eBFTIZcv2DWssLiaRnCqZQ?pwd=ikpu |
+| [Patent applications](https://textdata.cn/blog/2023-11-20-word2vec-by-year-by-province/) | ***year_w2vs*** (per-year models) | https://pan.baidu.com/s/1lrVkML92cVJdHQa1HQyAwA?pwd=4gqa |
+
+<br><br>
+
+## 4.2 Related code
+- [Experiment | Training a GloVe model on a Chinese corpus with the Stanford GloVe code](https://textdata.cn/blog/2025-03-28-train_a_glove_model_on_chinese_corpus_using_stanfordnlp/)
+- [Word vectors | Training a Word2Vec model on the **People's Daily Online leader message board** corpus](https://textdata.cn/blog/2023-12-28-train-word2vec-using-renmin-gov-leader-board-dataset/)
+- [Visualization | Seventy years of cultural change reflected in the People's Daily corpus](https://textdata.cn/blog/2023-12-28-visualize-the-culture-change-using-people-daily-dataset/)
+- [Training word vectors by year (and by province) on a 50-million-record patent-application dataset](https://textdata.cn/blog/2023-11-20-word2vec-by-year-by-province/)
+
+<br><br>
+
+## 4.3 Related literature
+- [Extending social-science research methods in the big-data era: applications of text analysis based on word-embedding techniques](https://textdata.cn/blog/2022-04-07-word-embeddings-in-social-science/)
+- [OS2022 | Concept spaces | How word-embedding models inform measurement and theory in organization science](https://textdata.cn/blog/2023-11-03-organization-science-with-word-embeddings/)
+- [39 FAQs on data mining with word embeddings in the social sciences](https://textdata.cn/blog/2023-03-15-39faq-about-word-embeddings-for-social-science/)
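The pretrained models listed in 4.1 are all queried the same way once loaded (for the `.bin` files, typically via gensim's `KeyedVectors`). The sketch below keeps that workflow self-contained by substituting tiny hand-made 3-d vectors for a downloaded 200-d model; the words and numbers are illustrative only, not taken from any of the models above.

```python
import math

# Toy stand-in for a pretrained embedding such as 留言板-Word2Vec.200.15.bin;
# the real models map each word to a 200-d vector, here 3-d keeps it short.
vectors = {
    "经济": [0.9, 0.1, 0.0],   # "economy"
    "金融": [0.8, 0.2, 0.1],   # "finance"
    "文化": [0.1, 0.9, 0.2],   # "culture"
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, topn=2):
    """Rank all other words by cosine similarity to `word`."""
    scores = [(w, cosine(vectors[word], v))
              for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:topn]

print(most_similar("经济"))  # "金融" ranks above "文化"
```

With a real downloaded model, the equivalent query would be gensim's `wv.most_similar("经济")` after loading the `.bin` file.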

docs/intro.md

Lines changed: 2 additions & 0 deletions

@@ -54,8 +54,10 @@ cntext2.x contains five modules: io, model, stats, plot, mind
 | ***plot*** | ***ct.matplotlib_chinese()*** | enable Chinese text in matplotlib plots |
 | ***plot*** | ***ct.lexical_dispersion_plot1(text, targets_dict, lang, title, figsize)*** | for a single text, visualize where each category of target words in targets_dict appears |
 | ***plot*** | ***ct.lexical_dispersion_plot2(texts_dict, targets, lang, title, figsize)*** | for several texts, visualize the relative positions (0–100) of target words |
+| ***mind*** | ``ct.generate_concept_axis(wv, c_words1, c_words2)`` | generate a concept-axis vector |
 | ***mind*** | ***tm = ct.Text2Mind(wv)*** | mine latent attitudes, biases, and stereotypes within a single word2vec model; tm provides several methods |
 | ***mind*** | ***ct.sematic_projection(wv, words, c_words1, c_words2)*** | measure semantic projection |
+| ***mind*** | ***ct.project_word(wv, a, b)*** | measure the projection of word a onto word b |
 | ***mind*** | ***ct.sematic_distance(wv, words, c_words1, c_words2)*** | measure semantic distance |
 | ***mind*** | ***ct.divergent_association_task(wv, words)*** | measure divergent thinking (creativity) |
 | ***mind*** | ***ct.discursive_diversity_score(wv, words)*** | measure discursive diversity (cognitive diversity) |
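The two `mind` functions added in this commit rest on one geometric idea: a concept axis is the difference between the centroids of two pole word sets, and a word is scored by projecting its vector onto that axis. A minimal numpy sketch of that general technique follows; the helper names echo the cntext functions but the signatures are simplified, the 2-d vectors are made up, and none of this is cntext's actual implementation.

```python
import numpy as np

# Toy 2-d embedding; in practice wv comes from a trained word2vec model.
wv = {
    "man":      np.array([1.0, 0.2]),
    "woman":    np.array([-1.0, 0.2]),
    "engineer": np.array([0.6, 0.8]),
    "nurse":    np.array([-0.5, 0.9]),
}

def generate_concept_axis(wv, c_words1, c_words2):
    """Unit vector pointing from the pole-2 centroid toward the pole-1 centroid."""
    pole1 = np.mean([wv[w] for w in c_words1], axis=0)
    pole2 = np.mean([wv[w] for w in c_words2], axis=0)
    axis = pole1 - pole2
    return axis / np.linalg.norm(axis)

def project_word(wv, word, axis):
    """Cosine of the word's vector with the (unit-length) concept axis."""
    v = wv[word]
    return float(np.dot(v, axis) / np.linalg.norm(v))

gender = generate_concept_axis(wv, ["man"], ["woman"])
print(project_word(wv, "engineer", gender))  # positive: leans toward the "man" pole
print(project_word(wv, "nurse", gender))     # negative: leans toward the "woman" pole
```

Semantic projection in the Kozlowski et al. sense works the same way, just with larger pole word sets so that idiosyncrasies of single words average out.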

docs/llm.md

Lines changed: 2 additions & 2 deletions
@@ -1,11 +1,11 @@
-# 5. LLM Module
+# 6. LLM Module
 
 Running large language models locally is becoming more and more convenient.
 
 | Module | Function (class) | Purpose |
 | --------------- | ---------------------------------------------------- | ---------------------------------------------------------- |
 | ***LLM*** | ***text_analysis_by_llm(text, prompt, base_url, api_key, model_name, temperature, output_format)*** | text analysis with a large language model |
 
-## 5.1 analysis_by_llm()
+## 6.1 analysis_by_llm()
 
 Use a large language model (local or via API) to analyze text: identify patterns in unstructured text data, extract key information, understand semantics, and convert the text into structured data for further analysis and application.
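The `base_url` / `api_key` / `model_name` parameters of `text_analysis_by_llm` suggest an OpenAI-compatible endpoint. As a stdlib-only sketch of what one such call reduces to: the endpoint path, message roles, and the example model name and local URL below are common-convention assumptions, not cntext's verified internals, and the network call itself is left commented out.

```python
import json
import urllib.request

def build_llm_request(text, prompt, base_url, api_key, model_name, temperature=0.2):
    """Assemble an OpenAI-style chat-completions request for one analysis call."""
    payload = {
        "model": model_name,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": prompt},  # the analysis instruction
            {"role": "user", "content": text},      # the text to analyze
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )

req = build_llm_request(
    text="这家店服务态度很好",
    prompt="判断下面文本的情感倾向,只返回 positive 或 negative",
    base_url="http://localhost:11434/v1",  # e.g. a local OpenAI-compatible server (assumption)
    api_key="EMPTY",
    model_name="qwen2.5",                  # hypothetical model name
)
print(req.full_url)  # http://localhost:11434/v1/chat/completions
# urllib.request.urlopen(req) would actually send the request.
```

Structured output (`output_format`) is then a matter of asking the model for JSON in `prompt` and parsing the response body.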
