You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+19-14Lines changed: 19 additions & 14 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -2,9 +2,15 @@
2
2
3
3
## Introduction
4
4
5
-
This project aims to provide a comprehensive paradigm for the establishment of an EEG dataset based on Chinese linguistic corpus. It seeks to facilitate the advancement of technologies related to EEG-based semantic decoding and brain-computer interfaces. The project is currently divided into the following modules: Chinese corpus segmentation and text embeddings, experimental design and stimulus presentation, data preprocessing, and data masking. For detailed information on each module, please refer to the README document in the respective folders or view the relevant code.
5
+
An Electroencephalography (EEG) dataset utilizing rich text stimuli can advance the understanding of how the brain encodes semantic information and contribute to semantic decoding in brain-computer interface (BCI). Addressing the scarcity of EEG datasets featuring Chinese linguistic stimuli, we present the ChineseEEG dataset, a high-density EEG dataset complemented by simultaneous eye-tracking recordings. This dataset was compiled while 10 participants silently read approximately 11 hours of Chinese text from two well-known novels. This dataset provides long-duration EEG recordings, along with pre-processed EEG sensor-level data and semantic embeddings of reading materials extracted by a pre-trained natural language processing (NLP) model.
6
6
7
-
For now, We recruited a total of 10 participants whose native language is Chinese. Each participant fully engaged in a Chinese novel reading task with a total duration of 12 hours, collectively accumulating 120 hours of data.
7
+
**For more detailed information about our dataset, you can reach our preprint paper on bioRxiv: [ChineseEEG: A Chinese Linguistic Corpora EEG Dataset for Semantic Alignment and Neural Decoding](https://www.biorxiv.org/content/10.1101/2024.02.08.579481v1).**
8
+
9
+
**You can find the dataset via the ChineseNeuro Symphony community (CHNNeuro) in Science Data Bank platform ([https://doi.org/10.57760/sciencedb.CHNNeuro.00007](https://doi.org/10.57760/sciencedb.CHNNeuro.00007)) or via Openneuro ([https://openneuro.org/datasets/ds004952](https://openneuro.org/datasets/ds004952)).**
10
+
11
+
This repository contains all the code to reproduce the experiment and data processing procedure in our paper. It aims to provide a comprehensive paradigm for the establishment of an EEG dataset based on Chinese linguistic corpora. It seeks to facilitate the advancement of technologies related to EEG-based semantic decoding and brain-computer interfaces.
12
+
13
+
The project is mainly divided into four modules. The script `cut_chinese_novel.py` in the `novel_segmentation_and_text_embeddings` folder contains the code to prepare the stimulation materials from source materials. The script `play_novel.py` in the experiment module contains code for the experiment, including text stimuli presentation and control of the EGI device and Tobii Glasses 3 eye-tracker. The script `preprocessing.py` in `data_preprocessing_and_alignment` module contains the main part of the code to apply pre-processing on EEG data. The script `align_eeg_with_sentence.py` in the same module contains code to align the EEG segments with corresponding text contents and text embeddings. The `docker` module contains the Docker image required for deploying and running the code, as well as tutorials on how to use Docker for environment deployment. For detailed information on each module, please refer to the README document in the respective module.
8
14
9
15
## Pipeline
10
16
@@ -99,11 +105,11 @@ After you have your texts, text embeddings and runs of EEG data, you can align t
99
105
100
106
## Credit
101
107
102
-
-[Mou Xinyu](https://github.com/12485953) - Coder for all parts of the project, Data processing, README writer for all parts.
108
+
-[Mou Xinyu](https://github.com/12485953) - Coder for all parts of the project, Data processing.
103
109
104
-
-[He Cuilin](https://github.com/CuilinHe) - Experiment conductor, Data processing, README writing.
110
+
-[He Cuilin](https://github.com/CuilinHe) - Experiment conductor, Data processing.
105
111
106
-
-[Tan Liwei](https://github.com/tanliwei09) - Experiment conductor, Data processing, README writing.
112
+
-[Tan Liwei](https://github.com/tanliwei09) - Experiment conductor, Data processing.
107
113
108
114
-[Zhang Jianyu](https://github.com/ionaaaa) - Coder for Chinese corpus segmentation and EEG random masking.
109
115
@@ -114,26 +120,25 @@ After you have your texts, text embeddings and runs of EEG data, you can align t
114
120
Feel free to contact us if you have any questions about the project !!!
This work was mainly supported by the MindD project of Tianqiao and Chrissy Chen Institute(TCCI), the Science and Technology Development Fund (FDCT) of Macau [0127/2020/A3, 0041/2022/A], the Natural Science Foundation of Guangdong Province(2021A1515012509), Shenzhen-Hong Kong-Macao Science and Technology Innovation Project (Category C) (SGDX2020110309280100), and the SRG of University of Macau (SRG2020-00027-ICI). We also thank all research assistants who provided general support in participant recruiting and data collection.
0 commit comments