Commit a220a70

add paper and dataset url to README.md

1 parent a486879

1 file changed (+19 -14 lines)

README.md

Lines changed: 19 additions & 14 deletions
@@ -2,9 +2,15 @@
 
 ## Introduction
 
-This project aims to provide a comprehensive paradigm for the establishment of an EEG dataset based on Chinese linguistic corpus. It seeks to facilitate the advancement of technologies related to EEG-based semantic decoding and brain-computer interfaces. The project is currently divided into the following modules: Chinese corpus segmentation and text embeddings, experimental design and stimulus presentation, data preprocessing, and data masking. For detailed information on each module, please refer to the README document in the respective folders or view the relevant code.
+An Electroencephalography (EEG) dataset utilizing rich text stimuli can advance the understanding of how the brain encodes semantic information and contribute to semantic decoding in brain-computer interfaces (BCIs). Addressing the scarcity of EEG datasets featuring Chinese linguistic stimuli, we present ChineseEEG, a high-density EEG dataset complemented by simultaneous eye-tracking recordings. The dataset was compiled while 10 participants silently read approximately 11 hours of Chinese text from two well-known novels. It provides long-duration EEG recordings, along with pre-processed EEG sensor-level data and semantic embeddings of the reading materials extracted by a pre-trained natural language processing (NLP) model.
 
-For now, We recruited a total of 10 participants whose native language is Chinese. Each participant fully engaged in a Chinese novel reading task with a total duration of 12 hours, collectively accumulating 120 hours of data.
+**For more detailed information about our dataset, see our preprint on bioRxiv: [ChineseEEG: A Chinese Linguistic Corpora EEG Dataset for Semantic Alignment and Neural Decoding](https://www.biorxiv.org/content/10.1101/2024.02.08.579481v1).**
+
+**You can find the dataset via the ChineseNeuro Symphony community (CHNNeuro) on the Science Data Bank platform ([https://doi.org/10.57760/sciencedb.CHNNeuro.00007](https://doi.org/10.57760/sciencedb.CHNNeuro.00007)) or via OpenNeuro ([https://openneuro.org/datasets/ds004952](https://openneuro.org/datasets/ds004952)).**
+
+This repository contains all the code needed to reproduce the experiment and the data-processing procedure in our paper. It aims to provide a comprehensive paradigm for establishing an EEG dataset based on Chinese linguistic corpora and to facilitate the advancement of technologies related to EEG-based semantic decoding and brain-computer interfaces.
+
+The project is mainly divided into four modules. The script `cut_chinese_novel.py` in the `novel_segmentation_and_text_embeddings` folder contains the code to prepare the stimulus materials from the source texts. The script `play_novel.py` in the experiment module contains the code for the experiment, including text-stimulus presentation and control of the EGI device and the Tobii Glasses 3 eye-tracker. The script `preprocessing.py` in the `data_preprocessing_and_alignment` module contains the main pre-processing code for the EEG data. The script `align_eeg_with_sentence.py` in the same module aligns the EEG segments with the corresponding text contents and text embeddings. The `docker` module contains the Docker image required to deploy and run the code, as well as a tutorial on using Docker for environment setup. For detailed information on each module, please refer to the README in the respective module.
 
 ## Pipeline
 
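The alignment step the modules above describe (pairing per-sentence EEG segments with their text embeddings) can be sketched in a few lines. This is an illustrative sketch only, not the repository's `align_eeg_with_sentence.py`: the sampling rate, the timing format, and the array shapes are assumptions.

```python
# Hypothetical sketch of EEG-to-sentence alignment; the sampling rate,
# timing format, and shapes are assumptions, not the repository's code.
import numpy as np

SFREQ = 250  # assumed EEG sampling rate in Hz


def align_eeg_with_sentences(eeg, sentence_times, embeddings):
    """Slice a (channels, samples) EEG array into per-sentence segments.

    sentence_times: list of (onset_s, offset_s) pairs in seconds.
    embeddings: (n_sentences, dim) array of text embeddings.
    Returns a list of (eeg_segment, embedding) pairs.
    """
    pairs = []
    for (onset, offset), emb in zip(sentence_times, embeddings):
        start, stop = int(onset * SFREQ), int(offset * SFREQ)
        pairs.append((eeg[:, start:stop], emb))
    return pairs


# Toy usage: 2 channels, 10 s of signal, two sentences.
eeg = np.random.randn(2, 10 * SFREQ)
times = [(0.0, 3.5), (4.0, 9.0)]
embs = np.zeros((2, 768))  # e.g. BERT-sized embedding vectors
pairs = align_eeg_with_sentences(eeg, times, embs)
print(len(pairs), pairs[0][0].shape)  # 2 (2, 875)
```

The same slicing generalizes to runs of any length as long as sentence onsets and offsets are recorded on the EEG clock.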

@@ -99,11 +105,11 @@ After you have your texts, text embeddings and runs of EEG data, you can align t
 
 ## Credit
 
-- [Mou Xinyu](https://github.com/12485953) - Coder for all parts of the project, Data processing, README writer for all parts.
+- [Mou Xinyu](https://github.com/12485953) - Coder for all parts of the project, Data processing.
 
-- [He Cuilin](https://github.com/CuilinHe) - Experiment conductor, Data processing, README writing.
+- [He Cuilin](https://github.com/CuilinHe) - Experiment conductor, Data processing.
 
-- [Tan Liwei](https://github.com/tanliwei09) - Experiment conductor, Data processing, README writing.
+- [Tan Liwei](https://github.com/tanliwei09) - Experiment conductor, Data processing.
 
 - [Zhang Jianyu](https://github.com/ionaaaa) - Coder for Chinese corpus segmentation and EEG random masking.
 
@@ -114,26 +120,25 @@ After you have your texts, text embeddings and runs of EEG data, you can align t
 Feel free to contact us if you have any questions about the project!
 
 ## Collaborators
-- [Wu Haiyan](https://github.com/haiyan0305) - 澳门大学
+- [Wu Haiyan](https://github.com/haiyan0305) - University of Macau
 
-- [Liu Quanying] - 南方科技大学
+- [Liu Quanying] - Southern University of Science and Technology
 
 - [Wang Xindi](https://github.com/sandywang)
 
-- [Wang Qing] - 上海精神卫生中心
+- [Wang Qing] - Shanghai Jiao Tong University
 
 - [Chen Zijiao] - National University of Singapore
 
 - [Yang Yu-Fang] - Freie Universität Berlin
 
-- [Hu Chuanpeng] - 南京师范大学
+- [Hu Chuanpeng] - Nanjing Normal University
 
-- [Xu Ting] - Child Mind Institute
+- [Xu Ting] - Center for the Integrative Developmental Neuroscience, Child Mind Institute, New York
 
-- [Cao Miao] - 北京大学
+- [Cao Miao] - Swinburne University of Technology
 
-- [Liang Huadong](https://github.com/Romantic-Pumpkin) - 科大讯飞股份有限公司
+- [Liang Huadong](https://github.com/Romantic-Pumpkin) - iFLYTEK Co., Ltd.
 ## Funding
 
-本项目受到天桥脑科学研究院MindX数据支持计划的部分资助(共计伍万元)。
-其余资助来源:澳门科学技术发展基金项目(FDCT)
+This work was mainly supported by the MindD project of the Tianqiao and Chrissy Chen Institute (TCCI), the Science and Technology Development Fund (FDCT) of Macau [0127/2020/A3, 0041/2022/A], the Natural Science Foundation of Guangdong Province (2021A1515012509), the Shenzhen-Hong Kong-Macao Science and Technology Innovation Project (Category C) (SGDX2020110309280100), and the SRG of the University of Macau (SRG2020-00027-ICI). We also thank all research assistants who provided general support in participant recruiting and data collection.
