Commit 8d25ab8

updates from Tom
1 parent e4b9ddb commit 8d25ab8

1 file changed

+4
-4
lines changed

joss/paper.md

Lines changed: 4 additions & 4 deletions
@@ -35,7 +35,7 @@ bibliography: paper.bib
3535

3636
# Summary
3737

38-
Cryogenic electron microscopy (cryo-EM) [@cryoem-drug-review; @cryoem-challenges] is an imaging technique used to obtain the structure of objects of near-atomic scales experimentally via transmission electron microscopy of cryogenically frozen samples. The Electron Microscopy Public Image Archive (EMPIAR) [@empiar] is a public resource for the raw image data collected by cryo-EM experiments and facilitates free access to this data, allowing it to be used for methods development and validation. Deep learning-based image processing approaches have been applied to many steps of the cryo-EM reconstruction workflow [@ai-in-cryoem]. Many of the resulting algorithms have been widely adopted as they enable quicker processing or improved interpretation of the data. Deep learning-based approaches require large amounts of data to train the algorithms. However, as datasets can have hundreds of files and sizes on the order of terabytes or hundreds of gigabytes, downloading and managing these datasets can become a barrier to the development of deep-learning methods. Additionally, the currently recommended tools to download data from EMPIAR either use proprietary software, require a user account or necessitate a web browser.
38+
Cryogenic electron microscopy (cryo-EM) [@cryoem-drug-review; @cryoem-challenges] is an imaging technique used to obtain the structure of biomolecular objects at near-atomic scales experimentally via transmission electron microscopy of cryogenically frozen samples. The Electron Microscopy Public Image Archive (EMPIAR) [@empiar] is a public resource for the raw image data collected by cryo-EM experiments and facilitates free access to this data, allowing it to be used for methods development and validation. Deep learning-based image processing approaches have been applied to many steps of the cryo-EM reconstruction workflow [@ai-in-cryoem]. Many of the resulting algorithms have been widely adopted as they enable quicker processing and/or improved interpretation of the data. Deep learning-based approaches require large amounts of data to train the algorithms. However, as datasets can have hundreds of files and sizes on the order of terabytes or hundreds of gigabytes, downloading and managing these datasets can become a barrier to the development of deep-learning methods. Additionally, the currently recommended tools to download data from EMPIAR either use proprietary software, require a user account or necessitate a web browser.
3939
To address this and to provide a way to integrate EMPIAR data into machine learning codebases, we have developed EMPIARreader. This is an open source tool which provides a Python library to allow lazy loading of EMPIAR datasets into a machine learning-compatible format. It parses EMPIAR metadata, uses the mrcfile library [@mrcfile] to interpret MRC files, supports common image file formats and uses the starfile library [@starfile] to interpret STAR files. To our knowledge, there are no other tools to effectively make use of EMPIAR in a dynamic manner for data intensive tasks such as machine learning. EMPIARreader additionally provides a simple, lightweight command line interface (CLI) which allows users to search and download EMPIAR entries using glob patterns or regular expressions and then download files via FTP or HTTP(S).
4040
EMPIARreader is easily installed in a Python environment via the standard Python package management tools pip and Poetry and has been released as a PyPI [@pypi] package ([EMPIARreader](https://pypi.org/project/empiarreader/)).
4141

@@ -44,9 +44,9 @@ EMPIARreader is easily installed in a Python environment via the standard Python
4444
In cryo-EM, the scattering of the electron beam by the electrostatic potential of the molecules in the sample is recorded in the images captured by the detector.
4545
Due to advancements in hardware and software since 2013, the resolution achievable via cryo-EM reconstruction rivals that possible through x-ray crystallography [@cryoem-resolution], with cryo-EM being the preferable technique for determining the conformations of many macromolecules [@cryoem-development].
4646
The images which make up cryo-EM datasets commonly have a very low signal to noise ratio (SNR) germane to minimisation of radiation damage induced disorder. Consequently, the structures are obtained by averaging through thousands of examples of the structures in the samples, which necessitates a very large dataset per experiment.
47-
Raw image datasets are deposited into the online public image archive, EMPIAR [@empiar]. There is a loose structure to follow, but generally each deposited dataset is structured according to the needs or preferences of the depositing user with no particular directory structure enforced. With over 1300 entries and >3PB of data hosted, EMPIAR has become an important resource for the structural biology community, amassing over 700 citations in published works.
47+
Raw image datasets can be deposited into the online public image archive, EMPIAR [@empiar]. There is a loose structure to follow, but generally each deposited dataset is structured according to the needs or preferences of the depositing user with no particular directory structure enforced. With over 1300 entries and >3PB of data hosted, EMPIAR has become an important resource for the structural biology community, amassing over 700 citations in published works.
4848

49-
Deep-learning-based methods have developed significantly in recent years [@dl-development] and a number of algorithms have been developed for use in cryo-EM data processing. Deep-learning has been applied to the particle picking [@topaz; @cryolo], 3D classification and dynamics [@cryodrgn; @3dflex; @dynamight], postprocessing [@deepemhancer] and model building [@jamali2023automated; @backbonepred] stages of the reconstruction pipeline among many more examples [@ai-in-cryoem]. Datasets from EMPIAR have been used extensively for training and validating cryo-EM related deep learning algorithms, particularly for those which rely on raw image data. To make optimal use of the archive it is essential that the datasets are easily accessible and their size does not hinder accessibility or algorithm performance.
49+
Deep-learning-based methods have developed significantly in recent years [@dl-development] and a number of algorithms have been developed for use in cryo-EM data processing. Deep-learning has been applied to stages of the image processing and reconstruction pipeline, including particle picking [@topaz; @cryolo], 3D classification and dynamics [@cryodrgn; @3dflex; @dynamight] and model building [@jamali2023automated; @backbonepred] among many more examples [@ai-in-cryoem]. Datasets from EMPIAR have been used extensively for training and validating cryo-EM related deep learning algorithms, particularly for those which rely on raw image data. To make optimal use of the archive it is essential that the datasets are easily accessible and their size does not hinder accessibility or algorithm performance.
5050

5151
The current recommended methods to download data from EMPIAR are via:
5252

@@ -61,7 +61,7 @@ EMPIARreader allows the granularity of downloads to be configured from an entire
6161

6262

6363
# Licensing and userbase
64-
EMPIARreader is offered under a BSD 3-clause license and can be utilised either from a CLI or via a Python library. It is currently in active use by researchers at the Alan Turing Institute and the STFC Scientific Computing Department.
64+
EMPIARreader is offered under a BSD 3-clause license and can be utilised either from a CLI or via a Python library. It is currently in active use by researchers at the Alan Turing Institute and STFC Scientific Computing Department.
6565

6666
# Acknowledgements
6767
