
Commit 881e348 by Kelly: readme update (1 parent: eb3c4a0)


README.md

Lines changed: 20 additions & 9 deletions

# NASA Cognitive State Determination :thought_balloon:

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

![Python](https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54)
![Pandas](https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white)
![NumPy](https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white)

In the near future, astronauts will need to carry out autonomous operations as they venture beyond Low Earth Orbit. For many of the activities that astronauts perform, the same training is provided to all crew members months ahead of their mission. A system that provides real-time operations support, optimized and tailored to each astronaut's psychophysiological state at the time of the activity, and that can also be used during the training leading up to the mission, would significantly improve human spaceflight operations.

In this challenge, solvers were asked to determine the present cognitive state of a trainee based on biosensor data, as well as to predict what their cognitive state will be three seconds in the future.

## Data loading

The provided data has a high frequency (one second spans thousands of rows), but the labeling was done manually. Also, the final predictions are expected at a frequency of one row per second. I decided to transform the data so that one second equals one row, both for training and testing. I rounded all timestamps to the closest second and tested several approaches:

1. Take the first value
…

The first approach showed the best results.
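
A minimal sketch of that resampling, assuming the raw sensor data is a pandas DataFrame with a `timestamp` column (the column name is illustrative):

```python
import pandas as pd

def to_one_row_per_second(raw: pd.DataFrame, time_col: str = "timestamp") -> pd.DataFrame:
    """Round timestamps to the closest second and keep one row per second."""
    out = raw.copy()
    out[time_col] = pd.to_datetime(out[time_col]).dt.round("1s")
    # Approach 1: take the first observed value inside each second.
    return out.groupby(time_col, as_index=False).first()
```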

## EDA

During EDA, I noticed that the timestamps might have “holes” between neighbors: if the time delta between two neighboring rows is above one second, there is a “hole” between them, and the two neighbors belong to different “sessions.” I noticed that the sensor data might differ across sessions, and that the actual target is always constant within a session. I incorporated these findings into feature generation and into the postprocessing of my predictions.
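
A sketch of that session-splitting rule, assuming the per-second frame produced above is sorted by `timestamp` (the `session_id` name is illustrative):

```python
import pandas as pd

def assign_sessions(df: pd.DataFrame, time_col: str = "timestamp") -> pd.DataFrame:
    """Start a new session wherever the gap between neighboring rows exceeds one second."""
    out = df.sort_values(time_col).copy()
    gap = out[time_col].diff() > pd.Timedelta(seconds=1)
    # The cumulative sum of "hole" flags yields a monotonically increasing session id.
    out["session_id"] = gap.cumsum()
    return out
```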

## Creating target

We have to predict the cognitive state for times `t` and `t+3`. The target for `t` is equal to the value of the `induced_state` column. The target for `t+3` is the same as the target for `t` because the cognitive state is constant within a session. Since the data and target are identical, I decided to train a single model and use it for both the `t` and `t+3` predictions. Note: the code still separates the `t` and `t+3` models; I kept that split in case the data differs in the future.
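
In code this amounts to two assignments; a minimal sketch, assuming the per-second frame `df` carries the provided `induced_state` column (the target column names are illustrative):

```python
# Target for time t comes straight from the provided label column.
df["target_t"] = df["induced_state"]
# The state is constant within a session, so the t+3 target is identical;
# a separate column is kept only in case future data turns out to differ.
df["target_t_plus_3"] = df["target_t"]
```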

## Feature generation

I had several ideas for feature generation, and I combined them into the following groups.

…
5 - Features based on the distances between eye positions and gazing points (a sketch follows this list).
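
As an illustration of the last group, a sketch of one such distance feature; the eye and gaze coordinate column names are hypothetical, not taken from the actual dataset:

```python
import numpy as np

# Euclidean distance between a (hypothetical) left-eye position and the gazing point.
df["left_eye_gaze_dist"] = np.sqrt(
    (df["left_eye_x"] - df["gaze_x"]) ** 2 + (df["left_eye_y"] - df["gaze_y"]) ** 2
)
```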

## Validation

I used stratified group k-fold for validation. Stratified: each fold has approximately the same number of samples for each class. Group: splits are grouped by the provided `test_suite` column.
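
A minimal sketch of that split with scikit-learn; the feature matrix `X`, target `y`, and the choice of five folds are illustrative assumptions:

```python
from sklearn.model_selection import StratifiedGroupKFold

cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(cv.split(X, y, groups=df["test_suite"])):
    print(f"fold {fold}: {len(train_idx)} train rows, {len(valid_idx)} validation rows")
```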

## Model

I used the LightGBM classifier and optimized its hyperparameters with Optuna. The final prediction is the average of the predictions of several LightGBM classifiers with different hyperparameters.
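
A compressed sketch of that setup; the search space, parameter values, and the `X_train`/`X_valid`/`X_test` splits are illustrative, not the ones actually used:

```python
import lightgbm as lgb
import numpy as np
import optuna

def objective(trial):
    # Illustrative search space for a few common LightGBM hyperparameters.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.2, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 16, 256),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    model = lgb.LGBMClassifier(n_estimators=500, **params)
    model.fit(X_train, y_train)
    return model.score(X_valid, y_valid)  # maximize validation accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)

# Final prediction: average the class probabilities of several classifiers
# trained with different tuned hyperparameter sets (here just the best trial).
param_sets = [study.best_params]
models = [lgb.LGBMClassifier(n_estimators=500, **p).fit(X_train, y_train) for p in param_sets]
pred = np.mean([m.predict_proba(X_test) for m in models], axis=0)
```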

## Postprocessing

As mentioned in the EDA section, the target is the same within a single session. Therefore, I post-processed the predictions by calculating the running average of the model’s predictions from the beginning of the session up to and including the time `t` for which we are making the prediction.
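
A sketch of that smoothing with pandas, assuming the per-second frame `df` has the `session_id` column from the EDA step and a raw `pred` column holding the model's per-row output (names are illustrative):

```python
# Expanding (running) mean within each session: the average of all predictions
# from the start of the session up to and including the current second.
df["pred_smoothed"] = (
    df.groupby("session_id")["pred"]
    .expanding()
    .mean()
    .reset_index(level=0, drop=True)
)
```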

## Important features

To determine the most important features for the time `t` predictions, I used the built-in SHAP value calculation. Then I selected the top-3 sensor features from the output.
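
A sketch of that step using LightGBM's built-in contribution output; `model` and `X_valid` refer to the fitted classifier and validation features from the sections above, and the binary-classification output shape is my assumption:

```python
import numpy as np

# pred_contrib=True returns per-feature SHAP contributions; for binary
# classification the shape is (n_samples, n_features + 1), where the last
# column is the expected value and is dropped before aggregating.
contrib = np.asarray(model.predict(X_valid, pred_contrib=True))[:, :-1]

# Rank features by mean absolute contribution and keep the top 3.
importance = np.abs(contrib).mean(axis=0)
top3 = [X_valid.columns[i] for i in np.argsort(importance)[::-1][:3]]
print(top3)
```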

## Libraries

You can find the list of used libraries in `requirenments.txt`.
