Commit 9f53f2f

author
rtlingg
committed
Formatting for PyPI
1 parent db04c3b commit 9f53f2f

6 files changed, +258 -203 lines changed

README.md

Lines changed: 38 additions & 22 deletions
@@ -1,32 +1,35 @@
11
## The Sagitta Pipeline
22
Sagitta is a deep neural network based python3 pipeline that relies on Gaia DR2 and 2MASS photometry to identify pre-main sequence (PMS) stars and derive their age estimates.
33

4+
# Installation:
5+
```pip install sagitta``` (requires Python3)
6+
47
## Description
5-
Sagitta is a python3 script that takes a Flexible Image Transport System (FITS) file as input. The only column that must be specified for predictions to be generated is the Gaia DR2 source ID column, given with the "--source_id" flag. The source ID values must be unique for each star. All other missing required fields will be downloaded automatically when the pipeline is run. If a file contains stars both with and without Gaia source IDs, only the stars with a source ID will be run through the pipeline. In its default configuration, the pipeline produces three predictions for each star: 1) an estimate of stellar extinction (Av), 2) the probability that the star is PMS (from 0, meaning 0% probability, to 1, meaning 100% probability), and 3) the estimated age of the star. Once the pipeline has run and the output table has been saved automatically, the user should inspect the output to choose an appropriate PMS probability cutoff for their predicted PMS subset (e.g. select pms > 0.8). Because of how the age model was trained, only stars with a sufficiently high PMS probability should be considered to have accurate age predictions.
8+
Sagitta is a python3 script that takes a Flexible Image Transport System (FITS) file as input. The only column that must be specified for predictions to be generated is the Gaia DR2 source ID column, given with the ```--source_id``` flag. The source ID values must be unique for each star. All other missing required fields will be downloaded automatically when the pipeline is run. If a file contains stars both with and without Gaia source IDs, only the stars with a source ID will be run through the pipeline. In its default configuration, the pipeline produces three predictions for each star: 1) an estimate of stellar extinction (Av), 2) the probability that the star is PMS (from 0, meaning 0% probability, to 1, meaning 100% probability), and 3) the estimated age of the star. Once the pipeline has run and the output table has been saved automatically, the user should inspect the output to choose an appropriate PMS probability cutoff for their predicted PMS subset (e.g. select pms > 0.8). Because of how the age model was trained, only stars with a sufficiently high PMS probability should be considered to have accurate age predictions.
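As a rough illustration of the cutoff step described above, the sketch below keeps only rows whose PMS probability exceeds a threshold. It assumes the default output column name `pms`, works on plain dictionaries rather than a real FITS table, and the helper name `select_pms` and all sample values are made up for illustration.

```python
def select_pms(rows, threshold=0.8, pms_col="pms"):
    """Return the rows whose PMS probability exceeds the threshold."""
    return [row for row in rows if row[pms_col] > threshold]


# Made-up rows standing in for the pipeline's output table.
rows = [
    {"source_id": 1, "pms": 0.95},
    {"source_id": 2, "pms": 0.40},
    {"source_id": 3, "pms": 0.81},
]

candidates = select_pms(rows)
print([row["source_id"] for row in candidates])
```

The same comparison can be applied to the `pms` column of the saved output table with whichever table library is used to read it.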
69

710
Behind the scenes, Sagitta uses three separate convolutional neural networks (CNNs) to make its predictions. The first, the Av model, generates stellar extinction (Av) values for the stars in the input table. The second, the PMS model, generates the probability that each star is pre-main sequence. The third, the age model, generates the predicted age of each star.
811

9-
## Pipeline Options
12+
## Pipeline Usage Options
1013

1114
#### Flow Control Options
1215

1316
###### Turning off Av, PMS, or age predictions
14-
In the default configuration all three models are run and their outputs are saved as columns in an output FITS file. The user can choose not to produce outputs from any of these models using the "--no_av_prediction", "--no_pms_prediction", and "--no_age_prediction" flags. However, in order to make PMS or age predictions, Av values must either be generated with the Av model or be supplied in an input column whose name is specified with the input naming option. Note that, for optimal predictions, the Av values used by the PMS and age models should be generated by the Av model.
17+
In the default configuration all three models are run and their outputs are saved as columns in an output FITS file. The user can choose not to produce outputs from any of these models using the ```--no_av_prediction```, ```--no_pms_prediction```, and ```--no_age_prediction``` flags. However, in order to make PMS or age predictions, Av values must either be generated with the Av model or be supplied in an input column whose name is specified with the input naming option. Note that, for optimal predictions, the Av values used by the PMS and age models should be generated by the Av model.
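The dependency between the three models can be written down as a small check. This is only a sketch of the rule stated above, not the pipeline's own code: the function name `check_flow` and its keyword arguments are hypothetical, with `av_column` standing for an input Av column name supplied by the user.

```python
def check_flow(no_av_prediction=False, no_pms_prediction=False,
               no_age_prediction=False, av_column=None):
    """Raise ValueError if PMS or age predictions are requested without Av values."""
    # PMS and age predictions both consume Av values.
    needs_av = not (no_pms_prediction and no_age_prediction)
    # Av values exist if the Av model runs or an input Av column is named.
    has_av = (not no_av_prediction) or (av_column is not None)
    if needs_av and not has_av:
        raise ValueError(
            "PMS and age predictions need Av values: "
            "run the Av model or name an input Av column"
        )


# Av model skipped, but a previously generated column is supplied: allowed.
check_flow(no_av_prediction=True, av_column="av")
```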
1518

1619
###### Only Downloading Data
17-
If you only want to download the data required by the pipeline and NOT run any of the models, use the "--download_only" flag. It will download all required Gaia and 2MASS fields along with their associated errors, parallax, PMRA, PMDEC, PMRA_error, and PMDEC_error for every star with a Gaia source ID specified.
20+
If you only want to download the data required by the pipeline and NOT run any of the models, use the ```--download_only``` flag. It will download all required Gaia and 2MASS fields along with their associated errors, parallax, PMRA, PMDEC, PMRA_error, and PMDEC_error for every star with a Gaia source ID specified.
1821

1922
###### Prediction Uncertainty Statistic Generation
20-
The pipeline also includes an uncertainty statistics generator for each model's predictions. The statistics are generated on a per-star basis by randomly varying the input parameters by their associated errors and analyzing the outputs. The number of times each star is sampled is chosen by the user, but note that computational cost scales linearly with the number of samples. These uncertainty generators are turned off by default and can be turned on with the "--av_uncertainty", "--pms_uncertainty", or "--age_uncertainty" flags, each followed by the number of times to sample each star (e.g. "--age_uncertainty 10" generates the age model output statistics by sampling each star 10 times, varying the inputs, and analyzing the predictions). The statistics produced for each model output are the mean, median, standard deviation, variance, minimum, and maximum.
23+
The pipeline also includes an uncertainty statistics generator for each model's predictions. The statistics are generated on a per-star basis by randomly varying the input parameters by their associated errors and analyzing the outputs. The number of times each star is sampled is chosen by the user, but note that computational cost scales linearly with the number of samples. These uncertainty generators are turned off by default and can be turned on with the ```--av_uncertainty```, ```--pms_uncertainty```, or ```--age_uncertainty``` flags, each followed by the number of times to sample each star (e.g. ```--age_uncertainty 10``` generates the age model output statistics by sampling each star 10 times, varying the inputs, and analyzing the predictions). The statistics produced for each model output are the mean, median, standard deviation, variance, minimum, and maximum.
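The sampling idea can be sketched in a few lines of standard-library Python. Several things here are assumptions rather than the pipeline's actual behavior: the inputs are perturbed with Gaussian noise whose width is the quoted error, population (not sample) standard deviation and variance are used, and `toy_model` is a hypothetical stand-in for a trained network.

```python
import random
import statistics


def toy_model(features):
    """Hypothetical stand-in for one of the trained networks."""
    return sum(features) / len(features)


def uncertainty_stats(values, errors, n_samples, model=toy_model, rng=None):
    """Sample one star n_samples times and summarize the model outputs."""
    if rng is None:
        rng = random.Random()
    outputs = []
    for _ in range(n_samples):
        # Assumption: each input is varied with Gaussian noise of width `error`.
        perturbed = [rng.gauss(v, e) for v, e in zip(values, errors)]
        outputs.append(model(perturbed))
    return {
        "mean": statistics.fmean(outputs),
        "median": statistics.median(outputs),
        "std": statistics.pstdev(outputs),
        "var": statistics.pvariance(outputs),
        "min": min(outputs),
        "max": max(outputs),
    }


# Made-up magnitudes and errors for a single star, sampled 10 times.
stats = uncertainty_stats([12.1, 11.4, 10.9], [0.02, 0.03, 0.05],
                          n_samples=10, rng=random.Random(0))
print(sorted(stats))
```

Each sample costs one model evaluation, which is why the run time grows linearly with the number of samples.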
2124

2225
###### Uncertainty Av Scattering Range Option
23-
Because the Av values from the Av model do not carry a true uncertainty, the PMS and age model uncertainty generation varies them by drawing a random value from a uniform distribution spanning +/- 0.1 around the original Av. Since the +/- 0.1 default was chosen only from current Av model output trends, the size of this range can be changed with the "--av_scatter_range" flag.
26+
Because the Av values from the Av model do not carry a true uncertainty, the PMS and age model uncertainty generation varies them by drawing a random value from a uniform distribution spanning +/- 0.1 around the original Av. Since the +/- 0.1 default was chosen only from current Av model output trends, the size of this range can be changed with the ```--av_scatter_range``` flag.
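A minimal sketch of that scattering step, assuming the range is an absolute offset in Av (not a fraction of it) and that the flag's value is the half-width of the interval. The function name `scatter_av` is hypothetical.

```python
import random


def scatter_av(av, scatter_range=0.1, rng=None):
    """Draw a value uniformly from [av - scatter_range, av + scatter_range]."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(av - scatter_range, av + scatter_range)


# Five scattered copies of a made-up Av value of 1.5.
rng = random.Random(42)
draws = [scatter_av(1.5, rng=rng) for _ in range(5)]
print(draws)
```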
2427

2528
###### Testing Mode
26-
Before running the pipeline on a large set of data, it is recommended that you first check that it executes properly by using the "--test" flag. In this mode only the first 10000 stars of the input file are processed. The output of the test run is saved by default as "{tableIn}-test-sagitta.fits" so that you can inspect it and make sure it is as desired.
29+
Before running the pipeline on a large set of data, it is recommended that you first check that it executes properly by using the ```--test``` flag. In this mode only the first 10000 stars of the input file are processed. The output of the test run is saved by default as "{tableIn}-test-sagitta.fits" so that you can inspect it and make sure it is as desired.
2730

2831
###### Specifying an Av Input Column
29-
Using the "--av" flag to specify an input Av column is ONLY recommended for situations where you have already generated Av values with the pipeline and are specifying that previous output column. In that case, the flag prevents redundant generation of Av values. Note, though, that for the pipeline to produce its best predictions the Av column used should always be generated by the Av model.
32+
Using the ```--av``` flag to specify an input Av column is ONLY recommended for situations where you have already generated Av values with the pipeline and are specifying that previous output column. In that case, the flag prevents redundant generation of Av values. Note, though, that for the pipeline to produce its best predictions the Av column used should always be generated by the Av model.
3033

3134
#### Data Processing Options
3235

@@ -70,24 +73,37 @@ It is recommended that in the the case where the input table already contains an
7073
If a column name is not specified but is in the required list of photometric fields then it will be downloaded and saved in the output table with its default name.
7174

7275
###### Output Fits File Naming Option
73-
The user can specify the name of the output file via the "--tableOut" flag. If this flag is not specified, the output table is named {tableIn}-sagitta.fits by default if NOT in testing mode, or {tableIn}-test-sagitta.fits if in testing mode.
76+
The user can specify the name of the output file via the ```--tableOut``` flag. If this flag is not specified, the output table is named {tableIn}-sagitta.fits by default if NOT in testing mode, or {tableIn}-test-sagitta.fits if in testing mode.
7477

7578
###### Output Column Naming Options
76-
There are three flags for naming the output columns: "--av_out", "--pms_out", and "--age_out", with default values "av", "pms", and "age" respectively. These names are used for the output columns of the three models and also in the column names produced by the uncertainty statistic generation.
79+
There are three flags for naming the output columns: ```--av_out```, ```--pms_out```, and ```--age_out```, with default values "av", "pms", and "age" respectively. These names are used for the output columns of the three models and also in the column names produced by the uncertainty statistic generation.
80+
81+
## Examples
82+
Testing all three models in the pipeline on example.fits and renaming the Av and PMS output columns:
83+
```sagitta example.fits --av_out av_sagitta --pms_out pms_sagitta --test```
84+
85+
Running all three models and specifying the output table name to be output.fits:
86+
```sagitta example.fits --tableOut output.fits```
87+
88+
Only running the Av and PMS models:
89+
```sagitta example.fits --no_age_prediction```
90+
91+
Running all three models AND generating the PMS output uncertainty statistics with the sampling rate set to 5 times per star:
92+
```sagitta example.fits --pms_uncertainty 5```
93+
94+
Specifying that the source ID column of example.fits is named Gaia_DR2_ID:
95+
```sagitta example.fits --source_id Gaia_DR2_ID```
96+
97+
Pulling up the terminal help:
98+
```sagitta --help```
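For scripted runs, the command lines above can be assembled from Python. The helper below is a hypothetical convenience, not part of Sagitta: it only turns keyword arguments into the flags documented in this README and does not run anything itself.

```python
import shlex


def build_sagitta_command(table_in, **options):
    """Build an argument list for the sagitta command from keyword options."""
    argv = ["sagitta", table_in]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            # Boolean switch such as --test or --no_age_prediction.
            argv.append(flag)
        elif value is not None and value is not False:
            # Flag that takes a value, such as --tableOut output.fits.
            argv.extend([flag, str(value)])
    return argv


cmd = build_sagitta_command("example.fits", av_out="av_sagitta",
                            pms_out="pms_sagitta", test=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the `sagitta` command is installed and on the PATH.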
7799

78100
## Required Packages
79-
1. [AstroPy](https://www.astropy.org/)
80-
2. [AstroQuery](https://astroquery.readthedocs.io/)
81-
3. [GalPy](https://docs.galpy.org/)
82-
4. [NumPy](https://numpy.org/)
83-
5. [Pandas](https://pandas.pydata.org/)
84-
6. [Pytorch](https://pytorch.org/)
85-
86-
## Usage
87-
The Sagitta pipeline can be invoked by running the [sagitta.py](./Sagitta/sagitta.py) file directly or using command "sagitta" via the command line.
88-
89-
(TODO: Build this into the setup.py functionality)
90-
(TODO: Make some TTYGIFs of the pipeline being run)
101+
* [AstroPy](https://www.astropy.org/)
102+
* [AstroQuery](https://astroquery.readthedocs.io/)
103+
* [GalPy](https://docs.galpy.org/)
104+
* [NumPy](https://numpy.org/)
105+
* [Pandas](https://pandas.pydata.org/)
106+
* [Pytorch](https://pytorch.org/)
91107

92108
## Paper Reference
93109
[Untangling the Galaxy III: Photometric Search for Pre-main Sequence Stars with Deep Learning](https://arxiv.org/abs/2012.10463)

requirements.txt

Lines changed: 0 additions & 6 deletions
This file was deleted.

sagitta/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
#
