This repository contains all the data collection, model training, and analysis code for the `SatIQ` system, described in the paper "Watch This Space: Securing Satellite Communication through Resilient Transmitter Fingerprinting".
This system can be used to authenticate Iridium satellite transmitters using high sample rate message headers.
> [!NOTE]
> This version of the repository contains the code used for the paper "SatIQ: Extensible and Stable Satellite Authentication using Hardware Fingerprinting".
> The code used for "Watch This Space: Securing Satellite Communication through Resilient Transmitter Fingerprinting" can be found [here](https://github.com/ssloxford/SatIQ/tree/v1.0.2).

Additional materials (SatIQ):
- Full dataset (UK): https://doi.org/10.7910/DVN/P5FUAW
- Full dataset (Germany): https://doi.org/10.7910/DVN/RXWV1M
- Full dataset (Switzerland): https://doi.org/10.7910/DVN/OSSJ68
- Trained model weights: https://doi.org/10.7910/DVN/GANMDZ
Additional materials (Watch This Space):
- "Watch This Space" paper: https://arxiv.org/abs/2305.06947
- Full dataset: https://zenodo.org/record/8220494
- Trained model weights: https://zenodo.org/record/8298532
When using this code, please cite the following paper: "Watch This Space: Securing Satellite Communication through Resilient Transmitter Fingerprinting".
A GPU is recommended (with all necessary drivers installed), and a moderate amount of RAM will be required to run the data preprocessing and model training.
### Downloading Data (SatIQ)
The full dataset for "SatIQ" is stored on the Harvard Dataverse at the following URL: https://dataverse.harvard.edu/dataverse/satiq.
This includes a dataset for each of the three locations (UK, Germany, Switzerland), as well as trained model weights.
These can be downloaded from the site directly, but the following script may be preferable due to the large file size:
```bash
# Sketch (untested): bulk download via the Dataverse native API; requires curl and jq.
# Based on https://eamonnbell.webspace.durham.ac.uk/2023/03/07/bulk-downloading-from-dataverse/
DOI="doi:10.7910/DVN/P5FUAW"  # UK dataset; repeat for each DOI listed above
curl -s "https://dataverse.harvard.edu/api/datasets/:persistentId?persistentId=$DOI" \
  | jq -r '.data.latestVersion.files[].dataFile.id' \
  | while read -r id; do curl -OJL "https://dataverse.harvard.edu/api/access/datafile/$id"; done
```
> [!WARNING]
> The files are very large (approximately 1TB total).
> Ensure you have enough disk space before downloading.
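
Before starting, it may be worth checking free space programmatically (a sketch, not from the repository; the 1 TB threshold mirrors the warning above):

```bash
# Warn if the current filesystem has less than ~1 TB available.
# POSIX df -Pk reports available space in 1K blocks in column 4.
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt $((1000 * 1000 * 1000)) ]; then
  echo "Warning: less than 1 TB free on this filesystem" >&2
fi
```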
### Downloading Data (Watch This Space)
The full dataset for "Watch This Space" is stored on Zenodo at the following URL: https://zenodo.org/record/8220494.
These can be downloaded from the site directly, but the following script may be preferable due to the large file size:
```bash
# Sketch reconstruction (the original loop is elided in this copy of the README):
# list every file in the Zenodo record and fetch it; requires curl, jq, and wget.
curl -s "https://zenodo.org/api/records/8220494" \
  | jq -r '.files[].links.self' \
  | while read -r url; do
  wget "$url"
done
```
See the instructions below on processing the resulting files for use.
### Directory Structure
The training and analysis scripts expect the repository to be laid out as follows:
```
SatIQ
├── ...
└── data
    ├── models
    │   ├── downsample
    │   │   └── ...
    │   └── ...
    ├── tfrecord
    │   ├── ...
    │   ├── germany
    │   │   └── ...
    │   ├── switzerland
    │   │   └── ...
    │   └── uk-switzerland
    └── test
        ├── embeddings
        │   └── ...
        └── labels
            └── ...
```
Any downloaded model/loss files with `downsample` in the name should be placed in `data/models/downsample`, and any other model files should be placed in `data/models`.
The `uk-switzerland` directory can be populated using `preprocessing/dataset-combine.sh`, and the `embeddings` and `labels` directories using `preprocessing/generate-embeddings.py`.
These are described in greater detail below.
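
The empty skeleton can be created up front with a single command (directory names taken from the tree above; run from the repository root):

```bash
# Create the directory layout expected by the training and analysis scripts.
mkdir -p data/models/downsample \
         data/tfrecord/germany \
         data/tfrecord/switzerland \
         data/tfrecord/uk-switzerland \
         data/test/embeddings \
         data/test/labels
```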
## Usage
### TensorFlow Container
The `util` directory contains the main data processing and model code:
- `data.py` contains utilities for data loading and preprocessing.
- `processing.py` contains utilities for processing and analysis of results.
- `models.py` contains the main model code.
- `model_utils.py` contains various helper classes and functions used during model construction and training.

Data will be stored in `data/db.sqlite3`.

If a different SDR is used, the `iridium_extractor` configuration may need to be altered.
Change the `docker-compose.yml` to ensure the device is mounted in the container, and modify `iridium_extractor/iridium_extractor.py` to use the new device as a source.
The `autorun.sh` and `restart.sh` scripts are provided for convenience; they automate stopping the container and moving the resulting database files to a permanent storage location.
### Data Preprocessing

It is recommended to run these scripts from within the TensorFlow container described above.

> [!NOTE]
> Converting databases to NumPy files and filtering is only necessary if you are doing your own data collection.
> If the "SatIQ" dataset is used, no preprocessing is required.
> If the "Watch This Space" dataset is used, only the `np-to-tfrecord.py` script is required.
> [!IMPORTANT]
> Please note that these scripts load the full datasets into memory, and will consume large amounts of RAM.
> It is recommended that you run them on a machine with at least 128GB of RAM.
#### db-to-tfrecord.py
This script extracts database files and processes them directly into TFRecord files, optionally adding weather data if provided.
It should be used in preference to the legacy scripts described below.
To run this script, use the command-line arguments as directed by the script itself.
212
+
213
+
```bash
python3 db-to-tfrecord.py --help
```
#### db-to-np-multiple.py
This script extracts the database files into NumPy files.
> This script in particular will use a large amount of RAM, since it loads the entire dataset into memory at once.
> Processing may be done in batches by using the `--max-files` and `--skip-files` command-line arguments, or the script below.
#### np-to-tfrecord-parallel.sh
This script runs multiple instances of `np-to-tfrecord.py` in parallel, speeding up preprocessing and/or reducing the amount of RAM required.
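
As an illustration of the batching described above, a wrapper along these lines could drive `np-to-tfrecord.py` in fixed-size chunks (a dry-run sketch; the batch size and file count are assumptions, and the `echo` would be removed to actually run each command):

```bash
# Print the batched invocations for 400 input files, 100 at a time.
BATCH=100
TOTAL=400
for skip in $(seq 0 "$BATCH" "$((TOTAL - BATCH))"); do
  echo "python3 np-to-tfrecord.py --max-files $BATCH --skip-files $skip"
done
```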

#### dataset-combine.sh

This script builds the combined UK-Switzerland dataset out of the two separate datasets by linking files.
This script has no configuration options and is used as follows:
```bash
./dataset-combine.sh
```
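
The linking it performs can be pictured roughly as follows (a hypothetical sketch of the idea rather than the script's actual contents; the source directory names are assumptions):

```bash
# Combine two TFRecord datasets by symlinking their files into one directory.
mkdir -p data/tfrecord/uk-switzerland
for f in data/tfrecord/uk/*.tfrecord data/tfrecord/switzerland/*.tfrecord; do
  if [ -e "$f" ]; then
    ln -s "$(realpath "$f")" data/tfrecord/uk-switzerland/
  fi
done
```

Linking rather than copying keeps disk usage down, since the combined dataset shares its underlying files with the per-location datasets.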
#### generate-embeddings.py
This script takes one or more trained models and generates the embeddings that each model produces for the dataset, enabling faster analysis.
To use this script, modify `data_dir`, `model_dir`, and `output_dir` to point to the relevant input/output directories, and ensure `model_names` contains the correct names of the models from which the embeddings should be generated.
The script should then be run with no arguments:
```bash
python3 generate-embeddings.py
```
#### Noise
> [!NOTE]
> These scripts are only used for "Watch This Space", and should not be used with the "SatIQ" data.
The `noise` directory contains modified versions of the above scripts that filter the dataset to remove the messages with the highest noise.
Use them in the same way as described above.
Ensure that all the requisite directories have been created before these scripts are run.
### Model Training
The scripts for model training can be found in the `training` directory.
Ensure that data is placed in the `data` directory before running.
The `ae-triplet-conv-dataset-slices.py` script is used to train models from "SatIQ", and `ae-triplet-conv.py` to train models from "Watch This Space".
Additionally, `train-___.sh` scripts are provided as examples for training multiple models sequentially under different configurations.
Adjust the arguments at the top of the script to ensure the data and output directories are set correctly (these should be fine if running inside the TensorFlow Docker container), then run the script with no arguments:
```bash
python3 ae-triplet-conv-dataset-slices.py
```
Additional command-line arguments can be used to adjust characteristics of the model.
Training will take a long time.
The checkpoints will appear in `data/models`.
### Analysis
Note that these also require a large amount of RAM, and a GPU is recommended in order to run the models.
The `satiq-data.ipynb` notebook contains plots relating to the raw samples.
The `satiq-models.ipynb` notebook contains all the analysis of the trained models.
> [!NOTE]
> The `past-ai-___.pdf` plots require access to the dataset from the paper "PAST-AI: Physical-layer authentication of satellite transmitters via deep learning".
> Please contact the authors of this paper for access if needed.
> [!NOTE]
> The `wts-data.ipynb` and `wts-models.ipynb` notebooks are also included for legacy purposes; these are the equivalent analysis scripts from "Watch This Space".
0 commit comments