Open
161 commits
bd248ab
modified io.py and main.py to dump powdercif object as json
berrakozer Jul 27, 2021
9fb7848
deleted print statement
berrakozer Jul 27, 2021
1c15864
added argparse library
berrakozer Aug 5, 2021
0531245
made the slight modification on the if statement
berrakozer Aug 9, 2021
9749d18
removed --rank and empty line
berrakozer Aug 9, 2021
2c80570
removed unnecessary argparser code lines and used pathlib
berrakozer Aug 11, 2021
ded8823
xtype wavelength user_input_file are added as command line arguments
berrakozer Aug 11, 2021
0bb9b7a
made the modifications required on the parser
berrakozer Aug 12, 2021
5a774f4
updates on the args inside argparser
berrakozer Sep 1, 2021
0a9872a
edits for argparser in main.py and debugging cif_io.py
berrakozer Sep 1, 2021
c6c0c5e
fix for the unit of the wavelength
berrakozer Sep 3, 2021
c4fe5b1
deleted short index for xquantity and xunit, introduced optional outp…
berrakozer Oct 7, 2021
3372e3b
Merge pull request #63 from Billingegroup/json_dump
sbillinge Oct 21, 2021
4d63d98
generalized correlation analysis through correlate() function in util…
maak-sdu Sep 29, 2021
7b35f1a
changed default behavior for invalid corr_type to pearson, test passes
maak-sdu Oct 7, 2021
6cfd121
tell users the allowed corr_types in the docstring
maak-sdu Oct 26, 2021
11754c8
edited the docstring for the returned float
maak-sdu Oct 26, 2021
5b5b83c
fixed rebasing error in line 132 in main.py, main.py runs
maak-sdu Oct 26, 2021
3ff1d24
removed old input file info from top of main.py
maak-sdu Oct 26, 2021
551fa48
writing skipped cifs to a log file in _output
berrakozer Sep 28, 2021
43e0462
removed chatter from main.py and cif_io.py
berrakozer Sep 28, 2021
a451b3a
introduced verbose for chatter in main.py and cif_io.py, main also ru…
berrakozer Oct 5, 2021
919bfe7
modified cif_io.py to be PEP8 compliant regarding the verbose feature
berrakozer Oct 6, 2021
e207730
out commented all lines with bg_mpl_style in plotters.py
berrakozer Oct 6, 2021
7ba458e
rebase successful, edited main.py for a minor error, main.py runs and…
berrakozer Oct 26, 2021
fe0581a
Merge pull request #81 from maak-sdu/generalize_comparison_function_i…
sbillinge Oct 26, 2021
5d59358
uncommented the commented-out code linesusing bg-mpl-stylesheet, main…
berrakozer Oct 28, 2021
4350401
Merge pull request #80 from Billingegroup/log_cif_keyerror_issue41
sbillinge Nov 9, 2021
b663ba9
fixed some missing renaming in main such that main now runs
maak-sdu Jan 12, 2022
859f76a
cache builds for all cifs
maak-sdu Jan 24, 2022
8865d3e
logging missing twotheta, intensity, wavelength
maak-sdu Jan 25, 2022
8752f48
fix such that tests pass
maak-sdu Jan 25, 2022
e48954c
cleaning main
maak-sdu Jan 31, 2022
ba172f7
modifying rank_plot
maak-sdu Jan 31, 2022
d215803
browsing more cif keys and cleaning
maak-sdu Jan 31, 2022
5b37bad
Merge pull request #85 from maak-sdu/cif_key_browsing_and_logging
sbillinge Feb 1, 2022
47e5e08
modfied rank_plot such that it is easier to tweak the plots for the p…
maak-sdu Feb 2, 2022
5e20020
Merge pull request #86 from maak-sdu/plot_edits_for_paper
sbillinge Feb 2, 2022
376ae2f
xticks inside and no blankspace for plot
maak-sdu Feb 3, 2022
72be62a
increased the height of the plots
maak-sdu Feb 4, 2022
a794110
reading refs together with dois from .txt file to save 5 mins of Cros…
maak-sdu Feb 4, 2022
d41f12f
Merge pull request #87 from maak-sdu/plot_ticks_inside_no_blankspace
sbillinge Feb 6, 2022
58c8c8e
wrote iucrids, dois, and refs to .json instead of .txt, now directly …
maak-sdu Feb 7, 2022
54ed76a
modified main.py to also rank papers, modified rank_write() to take r…
maak-sdu Feb 7, 2022
2ec5f0e
modified rank_plot() to accept ranktype as arg, now making cif rank p…
maak-sdu Feb 7, 2022
b36d91a
including cifs where twotheta array is calculated from twotheta min/m…
maak-sdu Feb 7, 2022
cab2c6d
only logging 'log reason' (no wl, tt, int) when creating cache
maak-sdu Feb 7, 2022
94f65fb
cif_read() now processes cifs with ',', '.', '?', and '-' as intensit…
maak-sdu Feb 10, 2022
856202f
altered test_xy_resample(), added test files, tests pass
maak-sdu Feb 10, 2022
f83610e
removed poor input files for test_utils_xy_resample
maak-sdu Feb 10, 2022
522e8ba
added input files for test_xy_resample()
maak-sdu Feb 10, 2022
853efe6
typo in xy_resample(), altered test_xy_resample(), tests pass
maak-sdu Feb 10, 2022
27a203a
calculating twotheta using twotheta_inc instead of len(intensity)
maak-sdu Feb 11, 2022
05aeea6
Merge pull request #88 from maak-sdu/ranking_papers
sbillinge Feb 12, 2022
cc52c8c
added test for getting doi from iucrid
maak-sdu Feb 14, 2022
361713f
added function for getting doi from iucrid, test passes
maak-sdu Feb 14, 2022
96fb48c
introduced global variabels for min and max returns, together with th…
maak-sdu Feb 14, 2022
bee1858
using group stylesheet for plots
maak-sdu Feb 15, 2022
fef05fc
fix for legend handles which appeared for .pdf files
maak-sdu Feb 15, 2022
34396fb
using legend.getlines().set_linewidth instead of white_patch
maak-sdu Feb 15, 2022
5486c7e
decreased iucr_id/doi to be tested, test passes
maak-sdu Feb 15, 2022
4bacf01
using dicts rather than lists when possible in main()
maak-sdu Feb 15, 2022
cb009bd
outcommented all lines related to return of cifs as cif returns are n…
maak-sdu Feb 16, 2022
bf51ed7
remove timing part and some leftover from cif investigation
maak-sdu Feb 16, 2022
a8aa793
added test_rank_returns()
maak-sdu Feb 16, 2022
25c9494
added rank_returns()
maak-sdu Feb 16, 2022
364aca2
using rank_returns() to get cifs/papers to return, main() runs, tests…
maak-sdu Feb 16, 2022
eded2f5
cleaned up print statements in the bottom of main()
maak-sdu Feb 16, 2022
1f56c76
Merge pull request #92 from maak-sdu/doi_crossref_top5
sbillinge Feb 16, 2022
4ed6723
replaced example cifs with those referred to in paper
maak-sdu Jun 17, 2022
533d1d7
removed empty dir that is not used
maak-sdu Jun 17, 2022
bec7589
renaming example file, providing the three used in the paper
maak-sdu Jun 17, 2022
be52930
Merge pull request #97 from maak-sdu/file_replacements
sbillinge Jun 17, 2022
4f619c8
ValueError fix when using _fixIfWindowsPath() for CifFile.ReadCif(), …
maak-sdu Jun 17, 2022
301902a
Merge pull request #98 from maak-sdu/cifread_fix
sbillinge Jun 17, 2022
336d57d
included bg-mpl-stylesheets in run.txt
maak-sdu Jun 17, 2022
dd3c527
Merge pull request #99 from maak-sdu/bg_mpl_stylesheets_inclusion
sbillinge Jun 17, 2022
32d1be2
included rever.xsh
maak-sdu Jul 6, 2022
07b65d3
Merge pull request #100 from maak-sdu/rever_prep
sbillinge Jul 9, 2022
79ac52a
included 'Columbia Trustees' in LICENSE file, added AUTHORS file
maak-sdu Jul 11, 2022
a1b2d52
polishing terminal print if 'verbose' in main()
maak-sdu Jul 11, 2022
97eec65
removed Columbia part from LICENSE
maak-sdu Jul 19, 2022
cb2cb9b
printing title line always, fixing prob with double dashed lines when…
maak-sdu Jul 19, 2022
8ef14ae
removed bozer, re-added Columbia, added UoSD
maak-sdu Jul 19, 2022
e5246b9
removed commented out lines, removed os import (not used)
maak-sdu Jul 19, 2022
74d0a66
Merge pull request #102 from maak-sdu/terminal_output_polish
sbillinge Jul 19, 2022
43635fa
paper ref without doi, url for gh repo, guidelines on installing and …
maak-sdu Jul 19, 2022
2f16543
reincluded two topmost lines
maak-sdu Jul 19, 2022
227cef3
Merge pull request #103 from maak-sdu/setup_cleaning
sbillinge Jul 19, 2022
f67b488
Merge pull request #101 from maak-sdu/license_authors_update
sbillinge Jul 19, 2022
11f3bc6
removed pin to python 3.9
maak-sdu Jul 19, 2022
eb7d54c
developer_notes on Google Cloud Storage Platform and FastAPI
maak-sdu Jul 19, 2022
b5f5a00
updated instructions to run package outside module dir
maak-sdu Jul 19, 2022
5cb123b
cleaning and included dir stru needed for program to run
maak-sdu Jul 20, 2022
1cd8734
reincluded pip install
maak-sdu Jul 20, 2022
48d148c
Merge pull request #104 from maak-sdu/readme_cleaning
sbillinge Jul 20, 2022
02a6c25
added arXiv doi, removed gh repo url
maak-sdu Jul 20, 2022
e1b2a94
Merge pull request #105 from maak-sdu/readme_update
sbillinge Jul 20, 2022
f4259bd
docs update, figs dir added
maak-sdu Jul 20, 2022
c59796f
bumped version to v1.0.0
berrakozer Jul 20, 2022
a4645e6
Merge pull request #106 from maak-sdu/docs_update
sbillinge Jul 25, 2022
de93ac3
moving cif files to separate measured and calculated directories
berrakozer Aug 25, 2022
3ca55e1
added simulated patterns
berrakozer Aug 25, 2022
0a27208
Merge pull request #107 from berrakozer/moving_files
sbillinge Aug 26, 2022
4c63232
now running with measured PDFs in the new directory structure
sbillinge Sep 22, 2022
6582811
Merge pull request #110 from sbillinge/fixed_paths
sbillinge Sep 22, 2022
99ddc6b
added simulated patterns to repo
berrakozer Oct 24, 2022
a70935e
one more try
berrakozer Oct 24, 2022
f01f21d
BUG: fix the usless import in the cif_io that causes the importError
st3107 Oct 24, 2022
76f9c0c
Merge pull request #112 from st3107/fix_the_useless_import
sbillinge Oct 24, 2022
d469f4f
deleting incorrect calculated files
berrakozer Oct 24, 2022
e397ed7
Merge branch 'main' of github.com:Billingegroup/pydatarecognition int…
berrakozer Oct 24, 2022
6961ba1
added .ds_store to .gitignore
berrakozer Oct 24, 2022
1bcf741
removing .icloud files merging songsheng fix
berrakozer Oct 24, 2022
233d566
Merge pull request #111 from berrakozer/simulated_patterns
sbillinge Oct 24, 2022
94f56e2
fix bugs in the logic for correcting for length of arrays generated b…
sbillinge Oct 25, 2022
7e72f36
cleaning rogue files and moving iucr_doi_mapping.json out of docs
sbillinge Oct 25, 2022
95dcfa2
Merge pull request #113 from sbillinge/phooey
sbillinge Oct 25, 2022
e0cd39a
better wrapping of resampling so that it shows why a cif is skipped
sbillinge Oct 25, 2022
77e2f67
Merge pull request #114 from sbillinge/detry
sbillinge Oct 25, 2022
2e682b4
simplifying syntax when checking if variables are set
sbillinge Oct 27, 2022
4616250
adding validators and arg processor
sbillinge Oct 27, 2022
59dc019
Merge pull request #115 from sbillinge/isinstance
sbillinge Oct 27, 2022
50c3970
cleaning up the io a bit and adding ability to load different x-coord…
sbillinge Oct 27, 2022
c709206
fixed xy_resample test
sbillinge Oct 27, 2022
7c1aa67
news
sbillinge Oct 27, 2022
8e0fa66
Merge pull request #116 from sbillinge/io
sbillinge Oct 27, 2022
8bddaf4
allow for different correlation coefficients to be used
sbillinge Oct 29, 2022
15721cb
cleaning loggin and printing and some other cleaning edits
sbillinge Oct 29, 2022
89a1a4d
passing tests and news
sbillinge Oct 29, 2022
25e395f
removed logging
sbillinge Oct 29, 2022
f4ca1ab
improved testing on correlator function
sbillinge Oct 29, 2022
ed95270
Merge pull request #117 from sbillinge/logging
sbillinge Oct 29, 2022
845349e
renamde input_data_3 for users
berrakozer Nov 10, 2022
aaf58aa
Merge pull request #118 from berrakozer/update_powderdata_example3
sbillinge Nov 10, 2022
e0a6954
threshold 0.8 instead of 8
sbillinge Nov 15, 2022
7474de9
Merge pull request #119 from sbillinge/logging
sbillinge Nov 15, 2022
1ca78f5
catch 'nan' from correlate and skip the cif
sbillinge Nov 15, 2022
f6cae3a
Merge pull request #120 from sbillinge/logging
sbillinge Nov 15, 2022
b03376a
handling min max as numbers correctly
sbillinge Nov 15, 2022
c14bceb
Merge pull request #121 from sbillinge/logging
sbillinge Nov 15, 2022
810d731
handle attributeError in user_resampled
sbillinge Nov 15, 2022
72386b3
Merge pull request #122 from sbillinge/error
sbillinge Nov 15, 2022
d89e1a0
added continue when error is encountered
sbillinge Nov 15, 2022
826d8f2
Merge pull request #123 from sbillinge/error
sbillinge Nov 15, 2022
d9af1a1
redo the attribute error line
sbillinge Nov 15, 2022
34488b6
Merge pull request #124 from sbillinge/errors
sbillinge Nov 15, 2022
41136ce
added function to utils that will return a y-range for plotting given…
sbillinge Dec 3, 2022
f2799ed
added an all_plotter to plot them all
sbillinge Dec 3, 2022
1cff2a4
now plotting multiple pages, 24 cifs per page
sbillinge Dec 3, 2022
175dd65
Merge pull request #125 from sbillinge/plot_all
sbillinge Dec 3, 2022
3a155d6
all_plots now saves a multi-page pdf file
sbillinge Dec 12, 2022
eaeae01
fixed bug that only user data was being plotted
sbillinge Dec 13, 2022
61fcf32
removing duplicate from subsequent pages
sbillinge Dec 13, 2022
e04361d
Merge pull request #127 from sbillinge/plot_all
sbillinge Dec 13, 2022
f6389dc
reset the n_subplots to the currently preferred value
sbillinge Dec 13, 2022
b87c81c
Merge pull request #128 from sbillinge/nsubplots
sbillinge Dec 13, 2022
c8b2b3d
merging main
sbillinge Feb 21, 2023
61d31ae
fiddling to make infrastructur work for simon. Successfully uploaded…
sbillinge Feb 21, 2023
497d676
app now running but login failing
sbillinge Feb 24, 2023
5e19e9b
remove deprecated reference to cif-iucr dictionary
sbillinge Feb 25, 2023
12 changes: 12 additions & 0 deletions .gitignore
@@ -9,6 +9,18 @@ __pycache__/
# cache
_cache

# .idea
.idea

# .DS_Store
.DS_Store

# cifs
docs/examples/cifs/_cache
docs/examples/cifs/_cache_all
docs/examples/cifs/cif_all
docs/examples/cifs/pydr_paper_cifs

# outputs
_output/

10 changes: 10 additions & 0 deletions AUTHORS.txt
@@ -0,0 +1,10 @@
Authors:

Berrak Özer,
Martin Aaskov Karlsen,
Zachary Allan Thatcher,
Simon J.L. Billinge

Contributors:

https://github.com/Billingegroup/pydatarecognition/graphs/contributors
6 changes: 5 additions & 1 deletion LICENSE
@@ -1,6 +1,10 @@
BSD 3-Clause License

Copyright (c) 2021, berrakozer
Copyright (c) 2021-2022, The Trustees of Columbia University in the City
of New York

Copyright (c) 2021-2022, The Trustees of University of Southern Denmark

All rights reserved.

Redistribution and use in source and binary forms, with or without
82 changes: 81 additions & 1 deletion README.md
@@ -1,6 +1,86 @@
# pydatarecognition
# pyDataRecognition

[![tests](https://circleci.com/github/Billingegroup/pydatarecognition.svg?style=shield&circle-token=b187a993ea69930d37388bf61dccaf499456a481)](<test>)

## Preprint on arXiv
For a thorough description of the project, please see the paper by Özer *et al.*:
https://doi.org/10.48550/arXiv.2204.00434.

## Setup and installation

The following guidelines assume that the user runs a conda distribution, i.e. Anaconda or Miniconda.

### Create/activate conda environment
- Create/activate new conda env by running:
```shell
conda create -n pydatarecognition python=3
conda activate pydatarecognition
```
### Install dependencies
- Navigate to the main **pydatarecognition** directory and run:
```shell
conda install --file requirements/run.txt
pip install -r requirements/pip_requirements.txt
```
### Install package
- Install the package by navigating to the main **pydatarecognition**
directory and running:
```shell
python setup.py install
```

## Running the program

### Directory structure
Currently, the program should be run from a directory containing the cif files.
Within `docs/examples`, example cifs are located in the `cifs/measured` and `cifs/calculated` subdirectories, e.g., in `docs/examples/cifs/measured`.

### Example files
Within `docs/examples/powder_data`, three example input data files are available:
- 01_Mg-free-whitlockite_wl=1.540598.txt
- 02_BaTiO3_wl=0.1665.txt
- 03_KNaLi-NbMnO3_perovskite_wl=1.5482.txt
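
Each example file name above appears to carry the wavelength after a `wl=` tag, and the same number is passed to `-w` in the commands further down. The sketch below shows one way to read that number back out of a file name. It is only an illustration: the helper `wavelength_from_filename` is hypothetical, is not part of pydatarecognition, and assumes the `wl=<number>` convention inferred from the three names listed here.

```python
import re
from pathlib import Path


def wavelength_from_filename(path):
    """Return the number following 'wl=' in a file name, or None if absent.

    Hypothetical helper: the 'wl=<number>' naming convention is an assumption
    based on the example file names, not a documented pydatarecognition feature.
    """
    match = re.search(r"wl=(\d+(?:\.\d+)?)", Path(path).name)
    return float(match.group(1)) if match else None


# Print the wavelength encoded in each of the three example file names.
for name in (
    "01_Mg-free-whitlockite_wl=1.540598.txt",
    "02_BaTiO3_wl=0.1665.txt",
    "03_KNaLi-NbMnO3_perovskite_wl=1.5482.txt",
):
    print(name, wavelength_from_filename(name))
```

The program itself still expects the wavelength on the command line, so a helper like this would only be a convenience when scripting runs over many files.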

### How to run the program
With your `pydatarecognition` conda env activated, get information on how to run the program by typing:
```shell
pydr --help
```
or
```shell
pydatarecognition -h
```
The program expects a syntax somewhat similar to:
```shell
pydr -i INPUTFILE --xquantity XQUANTITY --xunit XUNIT -w WAVELENGTH
```
For a full description, please run the program with the help flag as shown above.
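
To make the command-line syntax above concrete, here is a minimal sketch of how flags of this shape can be declared with Python's `argparse`. It is not the project's actual parser: the long option names `--input` and `--wavelength`, the help texts, and which options are required are assumptions made for illustration. Only `-i`, `--xquantity`, `--xunit`, and `-w` are taken from this README.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build an illustrative parser mirroring the flags shown in the README.

    Long names '--input' and '--wavelength' are hypothetical; the real
    pydatarecognition parser may name or validate its options differently.
    """
    parser = argparse.ArgumentParser(
        prog="pydr",
        description="Illustrative parser for the flags shown in the README.",
    )
    parser.add_argument("-i", "--input", type=Path, required=True,
                        help="path to the input data file")
    parser.add_argument("--xquantity", required=True,
                        help="quantity on the x-axis, e.g. twotheta")
    parser.add_argument("--xunit", required=True,
                        help="unit of the x-axis quantity, e.g. deg")
    parser.add_argument("-w", "--wavelength", type=float,
                        help="wavelength of the radiation used")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run as-is.
args = build_parser().parse_args(
    ["-i", "data.txt", "--xquantity", "twotheta", "--xunit", "deg", "-w", "1.5482"]
)
print(args.input, args.xquantity, args.xunit, args.wavelength)
```

Declaring the wavelength with `type=float` means a malformed value is rejected by the parser before any data are read.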

### Running the program for the measured data example files
Navigate to `docs/examples/cifs/measured`.

#### Running the program for the first example measured data file
```shell
pydr -i ../../powder_data/01_Mg-free-whitlockite_wl=1.540598.txt --xquantity twotheta --xunit deg -w 1.540598
```
#### Running the program for the second example measured data file
```shell
pydr -i ../../powder_data/02_BaTiO3_wl=0.1665.txt --xquantity twotheta --xunit deg -w 0.1665
```
#### Running the program for the third example measured data file
```shell
pydr -i ../../powder_data/03_KNaLi-NbMnO3_perovskite_wl=1.5482.txt --xquantity twotheta --xunit deg -w 1.5482
```

### Running the program for the calculated cif example files
Navigate to `docs/examples/cifs/calculated` and rerun the commands above.


### Program output
Output files will be available in the `_output` folder created in the current working directory, i.e.
`docs/examples/_output`.

=====================================
### Development
- Create/activate new conda env
```shell
Expand Down
48 changes: 48 additions & 0 deletions developer_notes.md
@@ -0,0 +1,48 @@
# Developer notes
The notes below on **Google Cloud Storage Development** and **FastAPI** were
written by former Billinge Group member Zachary Thatcher. The notes describe
how to set up a Google Cloud Platform (GCP) account and how to set up FastAPI
for use with Mongo databases.

### Google Cloud Storage Development
The pydantic model now automatically exports all numpy arrays to Google Cloud when they are JSON-serialized.
This should only affect use on the Mongo server, since running the application locally leaves out these arrays
when serializing to the cache.

If you would like to develop for the Mongo database, you should set up your own GCP service account and put the key
in the project's pydatarecognition folder with the name 'testing-cif-datarec-secret.json'. Instructions on how to do so
can be found below.
- create a GCP account and, in the top right, go to the console
- in the console, click the three hexagons in the top left and select new project
- go through the steps, and make sure the project is active
- on the LHS, select the triple bar icon for the dropdown, go to APIs and Services, and go to library
- search for google cloud storage and google cloud storage json api, click on each of them, and enable them
- on the LHS, select the triple bar icon for the dropdown, go to APIs and Services, and select service account
- create a new service account with an arbitrary name; as a role, go to basic -> owner and select it; skip the final step
- go to keys, click add key, select json, and click create
- rename this json file to 'testing-cif-datarec-secret.json' and place it in pydatarecognition/pydatarecognition
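
After the steps above, it can be useful to confirm that the key file is where the notes say it should be and that it looks like a service account key. The following is a minimal sketch using only the standard library. The function `check_service_account_key` is hypothetical and not part of pydatarecognition, and the field list reflects the usual layout of a downloaded GCP service account JSON key rather than anything the project itself checks.

```python
import json
from pathlib import Path

# Fields normally present in a downloaded GCP service account JSON key.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def check_service_account_key(key_path):
    """Return a list of problems with the key file (an empty list means none found).

    Hypothetical helper for illustration; it only inspects the file locally
    and never contacts Google Cloud.
    """
    key_path = Path(key_path)
    if not key_path.is_file():
        return [f"{key_path} does not exist"]
    try:
        data = json.loads(key_path.read_text())
    except json.JSONDecodeError as err:
        return [f"{key_path} is not valid JSON: {err}"]
    if not isinstance(data, dict):
        return [f"{key_path} does not contain a JSON object"]
    problems = [f"missing field: {name}"
                for name in sorted(REQUIRED_FIELDS - set(data))]
    if "type" in data and data["type"] != "service_account":
        problems.append("field 'type' is not 'service_account'")
    return problems


# Location taken from the notes above, relative to the repository root.
print(check_service_account_key(
    "pydatarecognition/testing-cif-datarec-secret.json"
))
```

An empty list only means the file is present and has the expected fields; it does not verify that the key is still valid or that the storage APIs are enabled.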

### FastAPI Development
- update your dependencies
```shell
conda install --file requirements/run.txt
```
- Add a secret username and password to a yml file named secret_password.yml in the pydatarecognition folder
- These should take the following form (replace the placeholders, including the <>, with your own values)
```yaml
username: <username>
password: <password>
```
- run the following command from the base dir terminal to run the app
```shell
uvicorn pydatarecognition.app:app --reload
```
- go to the following in your browser to see (and try out) the API
```shell
http://127.0.0.1:8000/docs
```
- the `__name__ == "__main__"` section of mongo_utils.py is currently set up to export example cif data to the group
MongoDB Atlas instance, which is different from the URI currently hardcoded into the FastAPI app
- Be wary of this and feel free to develop in your own free MongoDB Atlas instance (or locally, with no need for a
username or password)
- start to make the app look more like the following project https://github.com/markqiu/fastapi-mongodb-realworld-example-app
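
The interactive page at `/docs` mentioned above is generated from an OpenAPI schema, which FastAPI serves at `/openapi.json` by default. As a rough sketch, the routes exposed by the running app can be listed from that schema with the standard library alone. The helper names `list_routes` and `fetch_schema` are hypothetical, and the sketch assumes the app is running locally on port 8000 and has not changed the default schema URL. Nothing is assumed here about which routes pydatarecognition actually defines.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_routes(schema):
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema dict."""
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as 'parameters'.
            if method in HTTP_METHODS:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_schema(base_url="http://127.0.0.1:8000"):
    """Download the OpenAPI schema from a running FastAPI app (assumed default URL)."""
    with urlopen(f"{base_url}/openapi.json") as response:
        return json.load(response)


# With the app started via uvicorn as described above, one could run:
#     for method, path in list_routes(fetch_schema()):
#         print(method, path)
```

Keeping `list_routes` separate from the network call makes it possible to test against a saved schema without a running server.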