```
tensorflow-datasets
tensorflow-addons==0.13.0
scipy
seaborn
scikit-learn
notebook
```

A GPU is recommended (with all necessary drivers installed), and a moderate amount of RAM will be required to run the data preprocessing and model training.
### Downloading Data
The full dataset is stored on Zenodo at the following URL: https://zenodo.org/record/8220494
These can be downloaded from the site directly, but the following script may be preferable due to the large file size:
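A minimal sketch of such a script, assuming the archives are attached to the Zenodo record under the `data_<start>_<end>.tar.gz` names used by the extraction step below (the repository's actual script may differ):

```shell
#!/bin/bash
# Hypothetical download helper: generate one URL per archive, then fetch them.
# Assumes archives are numbered in steps of five, data_000_004 ... data_165_169.
base_url="https://zenodo.org/record/8220494/files"

for i in $(seq -w 0 5 165); do
    # Compute the zero-padded end index; 10# forces base-10 despite leading zeros.
    printf -v j "%03d" $((10#$i + 4))
    echo "${base_url}/data_${i}_${j}.tar.gz"
done > urls.txt

# Then fetch the list, e.g.: wget --continue -i urls.txt
```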
> These files are very large (4.0GB each, 135.4GB total).
> Ensure you have enough disk space before downloading.

To extract the files:

```bash
#!/bin/bash

# Archives come in steps of five: data_000_004.tar.gz ... data_165_169.tar.gz.
for i in $(seq -w 0 5 165); do
    # Compute the zero-padded end index; 10# forces base-10 despite leading zeros.
    printf -v j "%03d" $((10#$i + 4))
    tar xzf "data_${i}_${j}.tar.gz"
done
```
See the instructions below on processing the resulting files for use.
## Usage
### TensorFlow Container
Change the `docker-compose.yml` to ensure the device is mounted in the container.
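As an illustration, a device can be passed through with a `devices` mapping in the service definition; the service name and device path below are hypothetical and will depend on your setup:

```yaml
services:
  tensorflow:                        # hypothetical service name
    devices:
      - /dev/bus/usb:/dev/bus/usb    # hypothetical device path
```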
The scripts in the `preprocessing` directory process the database file(s) into NumPy files, and then TFRecord datasets.
It is recommended to run these scripts from within the TensorFlow container described above.
> [!NOTE]
> Converting databases to NumPy files and filtering is only necessary if you are doing your own data collection.
> If the provided dataset on Zenodo is used, only the `np-to-tfrecord.py` script is needed.

> [!IMPORTANT]
> Please note that these scripts load the full datasets into memory, and will consume large amounts of RAM.
> It is recommended that you run them on a machine with at least 128GB of RAM.

#### db-to-np-multiple.py

```bash
python3 db-to-np-multiple.py
```

The resulting files will be placed in `code/processed` (ensure this directory already exists).
#### np-filter.py
This script normalizes the IQ samples, and filters out unusable data.
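As an illustration of what per-example normalization can look like (a hypothetical sketch, not the repository's actual filter logic), IQ captures are often scaled to unit average power:

```python
import numpy as np

def normalize_iq(iq):
    """Scale a complex IQ capture to unit average power (hypothetical illustration)."""
    power = np.mean(np.abs(iq) ** 2)
    return iq / np.sqrt(power)
```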

```bash
python3 np-filter.py
```

The resulting files will be placed in `code/filtered` (ensure this directory already exists).
#### np-to-tfrecord.py
This script converts NumPy files into the TFRecord format, for use in model training.
To run this script, ensure your data has been processed into NumPy files with the following format:

- `samples_<suffix>.npy`
- `ra_sat_<suffix>.npy`
- `ra_cell_<suffix>.npy`

> [!NOTE]
> The `db-to-np-multiple.py` script will produce files in this format.
> The dataset available from Zenodo is also in this format.
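The three arrays for a given suffix are expected to be aligned row-for-row (one satellite and one cell label per IQ example). A quick sanity check before conversion can look like the following; the `check_alignment` helper and the `train` suffix are hypothetical, not part of the repository:

```python
import numpy as np

def check_alignment(samples, ra_sat, ra_cell):
    """Return True when every IQ example has exactly one satellite and one cell label."""
    return len(samples) == len(ra_sat) == len(ra_cell)

# Hypothetical usage, assuming files with a "train" suffix:
# samples = np.load("samples_train.npy")
# ra_sat = np.load("ra_sat_train.npy")
# ra_cell = np.load("ra_cell_train.npy")
# assert check_alignment(samples, ra_sat, ra_cell)
```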