Commit e1d5ea4

Update readme + add example schema
1 parent ab3aea5 commit e1d5ea4

2 files changed

+85
-2
lines changed

mpcontribs-lux/README.md

Lines changed: 26 additions & 2 deletions
@@ -1,5 +1,29 @@
11
## MPContribs-LUX
22

3-
<span style="color:forestgreen"><i>Ego sum lux datorum</i></span>.
3+
<span style="color:goldenrod"><i><b>Ego sum lux datorum</b></i></span>.
44

5-
MPContribs-lux is a package which <it>sheds light</it> on data stored on the [Materials Project's AWS S3 OpenData bucket](https://materialsproject-contribs.s3.amazonaws.com/index.html#) by providing annotated schemas and optionally analysis tools to better explore user-submitted data.
5+
MPContribs-lux is a package that <i>sheds light</i> on data stored in the [Materials Project's AWS S3 OpenData bucket](https://materialsproject-contribs.s3.amazonaws.com/index.html#) by providing annotated schemas and, optionally, analysis tools to better explore user-submitted data.
6+
7+
Adding a schema to this database is a <span style="color:red"><b>pre-requisite</b></span> for obtaining permission/IAM credentials for uploading data to MP's OpenData Bucket.
8+
Once a staff member from MP reviews and approves your data schema, your IAM role will be granted or updated (as appropriate).
9+
10+
<span style="color:red"><b>What if I don't want my schemas / data made public yet?</b></span>
11+
12+
To expedite the process of review, follow [these instructions](https://docs.github.com/en/repositories/creating-and-managing-repositories/duplicating-a-repository) to make a private copy (not a fork, which cannot be private) of the `MPContribs` repo.
13+
If you name your new repository `PrivateMPContribs` and your username is `<username>`, you would run these commands from a terminal:
14+
```console
15+
git clone --bare https://github.com/materialsproject/MPContribs.git
16+
cd MPContribs.git
17+
git push --mirror https://github.com/<username>/PrivateMPContribs.git
18+
cd ..
19+
rm -rf MPContribs.git
20+
```
21+
22+
Then add your schemas to the private repo `PrivateMPContribs` and invite the maintainers of `MPContribs` to view it (you don't need to give us edit access).
23+
We will then review your schemas.
24+
When you're ready to make your data public, you will also have to make a public PR with your new schemas.
25+
26+
<span style="color:red"><b>But my CSV/JSON/YAML/etc. file isn't complicated. Why do I need to upload a schema?</b></span>
27+
28+
Schemas are important for ensuring accessibility, interoperability, and reproducibility, and for ensuring that you are fully aware of possible errors in your dataset.
29+
If you are not comfortable mimicking the example `pydantic` schemas in `mpcontribs.lux.projects.examples`
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
"""Define example schemas for users.
2+
3+
This schema is used for the public MPContribs project `test_solid_data`:
4+
https://next-gen.materialsproject.org/contribs/projects/test_solid_data
5+
The data can be downloaded from AWS S3:
6+
https://materialsproject-contribs.s3.amazonaws.com/index.html#test_solid_data/solid_data.parquet
7+
"""
8+
9+
from functools import cached_property
10+
11+
from pydantic import BaseModel, Field
12+
13+
from pymatgen.core import Structure
14+
15+
class ExampleSchema(BaseModel):
16+
    """Define example schema with appropriate levels of annotated metadata."""
17+
18+
    formula: str | None = Field(
19+
        None, description="The chemical formula of the unit cell."
20+
    )
21+
22+
    a0: float | None = Field(
23+
        None, description="The experimental equilibrium cubic "
24+
        "lattice constant a, in Å, including zero-point corrections "
25+
        "for nuclear vibration."
26+
    )
27+
28+
    b0: float | None = Field(
29+
        None, description="The experimental bulk modulus at "
30+
        "optimal lattice geometry, in GPa, including zero-point "
31+
        "corrections for nuclear vibration."
32+
    )
33+
34+
    e0: float | None = Field(
35+
        None, description="The experimental cohesive energy, in eV/atom, "
36+
        "including zero-point corrections for nuclear vibration."
37+
    )
38+
39+
    cif: str | None = Field(
40+
        None, description="The structure represented as a Crystallographic Information File."
41+
    )
42+
43+
    material_id: str | None = Field(
44+
        None, description="The Materials Project ID of the structure which "
45+
        "corresponds to this entry. The ID will start with `mp-`."
46+
    )
47+
48+
    @cached_property
49+
    def get_pymatgen_structure(self) -> Structure | None:
50+
        """Get the pymatgen structure for this entry, if it exists.
51+
52+
        Example of adding functionality that lets downstream users
53+
        interact with your data.
54+
55+
        You can provide more advanced analysis tools, which we also show below.
56+
        """
57+
        if self.cif:
58+
            return Structure.from_str(self.cif, fmt="cif")
59+
        return None
