Commit 085e15c

add package, readme, tests, and workflow
1 parent e14e856 commit 085e15c

File tree

8 files changed: +665 −1 lines changed

.github/workflows/run-tests.yml

Lines changed: 28 additions & 0 deletions

name: Run pytest

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v3
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install '.[dev]'
      - name: Test with pytest
        run: |
          pytest
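The `matrix` block above fans out into one independent job per Python version, and `fail-fast: false` keeps the remaining jobs running when one fails. As a rough sketch (not Actions code; the `matrix` dict here just mirrors the workflow's settings), the expansion is a Cartesian product of the matrix axes:

```python
from itertools import product

# Hypothetical sketch of how the workflow's build matrix expands:
# each combination of axis values becomes one independent job, and
# with fail-fast disabled a failing job does not cancel its siblings.
matrix = {
    "os": ["ubuntu-latest"],
    "python-version": ["3.9", "3.10", "3.11", "3.12", "3.13"],
}

jobs = [dict(zip(matrix, combo)) for combo in product(*matrix.values())]
for job in jobs:
    print(f"build ({job['os']}, {job['python-version']})")
```

With a single OS and five Python versions this produces five jobs.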

README.md

Lines changed: 67 additions & 1 deletion

# lsst_refcats

Download (trimmed) GaiaDR3 / GaiaDR2 / Pan-STARRS1 reference catalogs for use with the LSST Science Pipelines.

Requires the LSST Science Pipelines as a dependency: https://pipelines.lsst.io/
- This package has been tested with version `w_2024_34` of the Science Pipelines.

Install:
```
$ python -m pip install get-lsst-refcats
```

Usage:
```
usage: lsst-refcats [-h] [--output OUTPUT] [--paths PATHS [PATHS ...]] [--repo REPO] [--dataset DATASET] [--collections COLLECTIONS]
                    [--where WHERE] [--refcat_indexer REFCAT_INDEXER] [--pixel_margin PIXEL_MARGIN] [--log-level LOG_LEVEL]
                    [--export-run EXPORT_RUN] [--export-dataset-name EXPORT_DATASET_NAME] [--import-file] [--processes PROCESSES]
                    refcat_name

positional arguments:
  refcat_name           The reference catalog name

options:
  -h, --help            show this help message and exit
  --output OUTPUT, -o OUTPUT
                        The output file to write the trimmed refcat YAML to (default: data/refcats)
  --paths PATHS [PATHS ...], -p PATHS [PATHS ...]
                        The paths to fits files to search (default: [])
  --repo REPO, -b REPO  The repo to search for exposures (default: None)
  --dataset DATASET     The dataset name to search if using repo (default: None)
  --collections COLLECTIONS
                        The collections to search if using repo (default: None)
  --where WHERE         A constraint for the dataset search if using repo (default: )
  --refcat_indexer REFCAT_INDEXER
                        The refcat indexer (default: HTM)
  --pixel_margin PIXEL_MARGIN
                        The pixel margin for determining overlapping refcat shards (default: 300)
  --log-level LOG_LEVEL
                        The logging level, one of DEBUG, INFO, WARN, ERROR (default: INFO)
  --export-run EXPORT_RUN
                        The RUN collection name to export collections into (default: refcats)
  --export-dataset-name EXPORT_DATASET_NAME
                        The dataset name to use for exported datasets (default: None)
  --import-file         Make import ECSV file (new style) instead of YAML export file (default: False)
  --processes PROCESSES, -J PROCESSES
                        Number of processes to use for opening fits files or loading dataset refs (default: 8)
```

Example:
```
$ lsst-refcats gaiadr3 --paths image.fits.fz --import-file
$ butler create ./repo
$ butler register-dataset-type ./repo gaia_dr3_20230707 SimpleCatalog htm7
$ butler ingest-files ./repo gaia_dr3_20230707 refcats/gaia_dr3 ./data/refcats/gaia_dr3_20230707.ecsv
lsst.daf.butler.script.ingest_files INFO: Ingesting 27 dataset ref(s) from 27 file(s)
$ butler query-datasets ./repo gaia_dr3_20230707 --collections "refcats/gaia_dr3"
lsst.daf.butler.script.queryDatasets INFO: Processing 1 dataset type

       type              run                             id                  htm7
----------------- ---------------- ------------------------------------ ------
gaia_dr3_20230707 refcats/gaia_dr3 ae761abd-3d9a-4ede-a21d-bb9b757af459 188496
...
```

Credits:

Code authored by [stevenstetzler](https://github.com/stevenstetzler/) and [DinoBektesevic](https://github.com/DinoBektesevic/).

pyproject.toml

Lines changed: 44 additions & 0 deletions

[build-system]
requires = ["setuptools>=80", "setuptools-scm>=8"]
build-backend = "setuptools.build_meta"

[project]
name = "get-lsst-refcats"
license = {file = "LICENSE"}
readme = "README.md"
authors = [
    { name = "Steven Stetzler", email = "steven.stetzler@gmail.com" }
]
classifiers = [
    "Development Status :: 4 - Beta",
    "License :: OSI Approved :: MIT License",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "Operating System :: OS Independent",
    "Programming Language :: Python",
]
dynamic = ["version"]
requires-python = ">=3.9"
dependencies = [
    "astropy",
    "requests",
    "joblib",
]
description = "Get reference catalogs for the LSST Science Pipelines."

[project.urls]
"Source Code" = "https://github.com/stevenstetzler/get-lsst-refcats"

[project.scripts]
lsst-refcats = "get_lsst_refcats.trim:main"

[tool.setuptools_scm]
write_to = "src/get_lsst_refcats/_version.py"
version_scheme = "guess-next-dev"
local_scheme = "no-local-version"

[project.optional-dependencies]
dev = [
    "pytest"
]
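The `[project.scripts]` entry maps the `lsst-refcats` console command to the string `get_lsst_refcats.trim:main`. As a simplified sketch of how such a `module:function` spec resolves (real installers go through `importlib.metadata` entry points and generate a wrapper script; `resolve_entry_point` is a hypothetical helper, demonstrated on a stdlib stand-in since the package itself may not be installed):

```python
import importlib

def resolve_entry_point(spec: str):
    """Split a console-script spec like 'pkg.module:func' and import it.

    Simplified sketch of what a pip-generated wrapper does.
    """
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# The commit's spec "get_lsst_refcats.trim:main" would resolve the same
# way; here we resolve a stdlib function to keep the sketch runnable.
func = resolve_entry_point("json:dumps")
print(func({"a": 1}))  # '{"a": 1}'
```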

src/get_lsst_refcats/__init__.py

Whitespace-only changes.

src/get_lsst_refcats/_version.py

Lines changed: 34 additions & 0 deletions

# file generated by setuptools-scm
# don't change, don't track in version control

__all__ = [
    "__version__",
    "__version_tuple__",
    "version",
    "version_tuple",
    "__commit_id__",
    "commit_id",
]

TYPE_CHECKING = False
if TYPE_CHECKING:
    from typing import Tuple
    from typing import Union

    VERSION_TUPLE = Tuple[Union[int, str], ...]
    COMMIT_ID = Union[str, None]
else:
    VERSION_TUPLE = object
    COMMIT_ID = object

version: str
__version__: str
__version_tuple__: VERSION_TUPLE
version_tuple: VERSION_TUPLE
commit_id: COMMIT_ID
__commit_id__: COMMIT_ID

__version__ = version = '0.1.dev1'
__version_tuple__ = version_tuple = (0, 1, 'dev1')

__commit_id__ = commit_id = 'ge14e856fa'
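The generated file exposes the version both as a string and as a tuple. The tuple form is what makes ordered version checks safe, since tuples compare element-wise while strings compare lexicographically (e.g. `"0.10" < "0.9"` as strings). A minimal sketch, with `at_least` as a hypothetical helper mirroring the generated `version_tuple` above:

```python
# Why setuptools-scm exposes a version *tuple*: tuples compare
# element-wise, avoiding string pitfalls like "0.10" < "0.9".
version_tuple = (0, 1, "dev1")  # mirrors the generated _version.py

def at_least(required, actual=version_tuple):
    # Compare only the leading numeric components; suffixes like
    # "dev1" are ignored in this simplified check.
    numeric = tuple(p for p in actual if isinstance(p, int))
    return numeric >= required

print(at_least((0, 1)))  # True
print(at_least((0, 2)))  # False
```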

src/get_lsst_refcats/trim.py

Lines changed: 128 additions & 0 deletions

from .utils import (
    deferred_import,
    refcat_name_to_dataset_name,
    resolve_bbox2shard_ids,
    create_bbox_and_wcs_from_decam_fits,
    resolve_exposure_shard_ids,
    load_refcat_yaml,
    make_refcat_import,
)

import argparse
import os
import yaml
import logging
import sys
import joblib

logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s:%(filename)s:%(lineno)s - %(funcName)5s()] %(message)s")


def main():
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("refcat_name", type=str, help="The reference catalog name")
    parser.add_argument("--output", "-o", type=str, default="data/refcats", help="The output file to write the trimmed refcat YAML to")
    parser.add_argument("--paths", "-p", nargs="+", type=str, default=[], help="The paths to fits files to search")
    parser.add_argument("--repo", "-b", type=str, default=None, help="The repo to search for exposures")
    parser.add_argument("--dataset", default=None, help="The dataset name to search if using repo")
    parser.add_argument("--collections", default=None, help="The collections to search if using repo")
    parser.add_argument("--where", default="", help="A constraint for the dataset search if using repo")
    parser.add_argument("--refcat_indexer", default="HTM", help="The refcat indexer")
    parser.add_argument("--pixel_margin", type=int, default=300, help="The pixel margin for determining overlapping refcat shards")
    parser.add_argument("--log-level", help="The logging level, one of DEBUG, INFO, WARN, ERROR", default="INFO")
    parser.add_argument("--export-run", type=str, default="refcats", help="The RUN collection name to export collections into")
    parser.add_argument("--export-dataset-name", type=str, default=None, help="The dataset name to use for exported datasets")
    parser.add_argument("--import-file", action="store_true", help="Make import ECSV file (new style) instead of YAML export file")
    parser.add_argument("--processes", "-J", type=int, default=8, help="Number of processes to use for opening fits files or loading dataset refs")

    args = parser.parse_args()
    if args.repo is not None and args.dataset is None:
        raise ValueError("must use argument --dataset if specifying repo")
    logger.setLevel(getattr(logging, args.log_level))

    os.makedirs(args.output, exist_ok=True)

    # deferred_import injects the module into this namespace as `measAlgs`
    deferred_import("lsst.meas.algorithms", "measAlgs", ns=globals())
    refCatConf = measAlgs.DatasetConfig()
    ref_dataset_name = refcat_name_to_dataset_name.get(args.refcat_name, None)
    if ref_dataset_name is None:
        raise ValueError(f"{args.refcat_name} is not an alias for any dataset name, use one of {list(refcat_name_to_dataset_name.keys())}")
    refCatConf.ref_dataset_name = ref_dataset_name
    if args.refcat_indexer != "HTM":
        raise ValueError(f"refcat indexer {args.refcat_indexer} is not supported")
    refCatConf.indexer = args.refcat_indexer

    if args.export_dataset_name is None:
        export_dataset_name = ref_dataset_name
    else:
        export_dataset_name = args.export_dataset_name

    bboxes = []
    wcss = []
    if args.paths:
        def work(path):
            import logging
            logging.basicConfig()
            logger = logging.getLogger(__name__)
            logger.setLevel(getattr(logging, args.log_level))
            logger.info(f"loading fits {path}")
            return create_bbox_and_wcs_from_decam_fits(path)

        results = joblib.Parallel(n_jobs=args.processes)(joblib.delayed(work)(path) for path in args.paths)
        # each fits file yields lists of per-detector bboxes and WCSs
        for bbox, wcs in results:
            bboxes.extend(bbox)
            wcss.extend(wcs)

    if args.repo:
        deferred_import("lsst.daf.butler", "dafButler", ns=globals())
        butler = dafButler.Butler(args.repo, collections=args.collections)
        refs = butler.registry.queryDatasets(args.dataset, where=args.where)

        def work(ref):
            import logging
            logging.basicConfig()
            logger = logging.getLogger(__name__)
            logger.setLevel(getattr(logging, args.log_level))
            logger.info(f"loading dataset {ref}")

            wcs = butler.get(f"{args.dataset}.wcs", ref.dataId, collections=ref.run)
            bbox = butler.get(f"{args.dataset}.detector", ref.dataId, collections=ref.run).getBBox()
            return bbox, wcs

        results = joblib.Parallel(n_jobs=args.processes)(joblib.delayed(work)(ref) for ref in refs)
        for bbox, wcs in results:
            bboxes.append(bbox)
            wcss.append(wcs)

    shards = []
    for bbox, wcs in zip(bboxes, wcss):
        shards.extend(resolve_bbox2shard_ids(refCatConf, bbox, wcs, pixelMargin=args.pixel_margin))

    shards = list(set(shards))
    logger.info("shards: %s", shards)
    if args.import_file:
        import_table = make_refcat_import(ref_dataset_name, shards, args.output)
        import_table.write(os.path.join(args.output, ref_dataset_name + ".ecsv"), format="ascii.ecsv")
    else:
        # load the full yaml and trim to include just the chosen shards
        logger.info(f"loading refcat for {ref_dataset_name}")
        refcat = load_refcat_yaml(ref_dataset_name)
        datasets = list(filter(lambda d: d["type"] == "dataset", refcat["data"]))[0]
        collection = list(filter(lambda d: d["type"] == "collection", refcat["data"]))[0]
        dataset_type = list(filter(lambda d: d["type"] == "dataset_type", refcat["data"]))[0]
        collection["name"] = args.export_run
        dataset_type["name"] = export_dataset_name
        logger.info("trimming records")
        records = list(filter(lambda rec: rec["data_id"][0]["htm7"] in shards, datasets["records"]))
        datasets["records"] = records
        datasets["run"] = args.export_run
        datasets["dataset_type"] = export_dataset_name
        refcat["data"] = [collection, dataset_type, datasets]

        print(yaml.dump(refcat))


if __name__ == "__main__":
    main()
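The YAML-export branch of `trim.py` keeps only the dataset records whose `htm7` shard id falls in the resolved shard set. A toy sketch of that filtering step on plain dicts (the `refcat` data here is fabricated for illustration; the real structure comes from a `butler export` YAML file):

```python
# Toy sketch of the record-trimming step: drop every dataset record
# whose htm7 shard id is not in the resolved shard set.
refcat = {
    "data": [
        {"type": "collection", "name": "refcats/orig"},
        {"type": "dataset_type", "name": "gaia_dr3_20230707"},
        {
            "type": "dataset",
            "records": [
                {"data_id": [{"htm7": 188496}], "path": "188496.fits"},
                {"data_id": [{"htm7": 188497}], "path": "188497.fits"},
                {"data_id": [{"htm7": 190000}], "path": "190000.fits"},
            ],
        },
    ]
}

shards = {188496, 188497}  # as computed from the exposure bboxes/WCSs
datasets = next(d for d in refcat["data"] if d["type"] == "dataset")
datasets["records"] = [
    rec for rec in datasets["records"] if rec["data_id"][0]["htm7"] in shards
]

print([rec["path"] for rec in datasets["records"]])
# ['188496.fits', '188497.fits']
```

Because the mutation happens in place, the trimmed `refcat` dict can then be dumped back out as YAML, which is exactly what the script does with `yaml.dump`.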
