Commit 47db3b8

0 parents  commit 47db3b8

31,284 files changed

+10346368
-0
lines changed

.nojekyll

Whitespace-only changes.

CNAME

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
docs.metatensor.org

README.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
# metatensor-docs
2+
3+
Documentation website for metatensor. This is in a separate repository to limit
4+
the size of the main repository.

index.html

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
1+
<!DOCTYPE html>
2+
<html>
3+
4+
<head>
5+
<meta charset="utf-8" />
6+
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
7+
<meta http-equiv="refresh" content="0;URL=latest/index.html" />
8+
</head>
9+
10+
<body></body>
11+
</html>

latest/.buildinfo

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
# Sphinx build info version 1
2+
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3+
config: 3fb5854ba905c2a5cfbd2afb9af91fa2
4+
tags: 645f666f9bcd5a90fca523b33c5a78b7
Lines changed: 237 additions & 0 deletions
@@ -0,0 +1,237 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"\n\n# Handling sparsity\n\nThe one-sentence introduction to metatensor mentions that this is a \"self-describing\n**sparse** tensor data format\". The `previous tutorial <core-tutorial-first-steps>`\nexplained the self-describing part of the format, and in this tutorial we will explore\nwhat makes metatensor a sparse format, and how to remove the corresponding sparsity when\nrequired.\n\nLike in the `previous tutorial <core-tutorial-first-steps>`, we will load the data\nwe need from a file. The code used to generate this file can be found below:\n\n.. details:: Show the code used to generate the :file:`radial-spectrum.npz` file\n\n ..\n\n The data was generated with `featomic`_, a package to compute atomistic\n representations for machine learning applications.\n\n\n .. literalinclude:: radial-spectrum.py.example\n :language: python\n\nThe file contains a representation of three molecules called the radial spectrum. The atom\n$i$ is represented by the radial spectrum $R_i^\\alpha$, which is an\nexpansion of the neighbor density $\\rho_i^\\alpha(r)$ on a set of radial basis\nfunctions $f_n(r)$:\n\n\\begin{align}R_i^\\alpha(n) = \\int f_n(r) \\rho_i^\\alpha(r) dr\\end{align}\n\nThe density $\\rho_i^\\alpha(r)$ associated with all neighbors of species\n$\\alpha$ of the atom $i$ (each neighbor is replaced with a Gaussian function\ncentered on the neighbor $g(r_{ij})$) is defined as:\n\n\\begin{align}\\rho_i^\\alpha(r) = \\sum_{j \\in \\text{ neighborhood of i }} g(r_{ij})\n \\delta_{\\alpha_j,\\alpha}\\end{align}\n\n\nThe exact mathematical details above do not matter much for this tutorial; the main\npoint is that this representation treats atomic species as completely independent,\neffectively using the neighbor species $\\alpha$ for `one-hot encoding`_.\n\n\n.. py:currentmodule:: metatensor\n"
8+
]
9+
},
10+
{
11+
"cell_type": "code",
12+
"execution_count": null,
13+
"metadata": {
14+
"collapsed": false
15+
},
16+
"outputs": [],
17+
"source": [
18+
"import ase\nimport ase.visualize.plot\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nimport metatensor"
19+
]
20+
},
21+
{
22+
"cell_type": "markdown",
23+
"metadata": {},
24+
"source": [
25+
"We will work on the radial spectrum representation of three molecules in our system:\na carbon monoxide molecule, an oxygen molecule, and a nitrogen molecule.\n\n"
26+
]
27+
},
28+
{
29+
"cell_type": "code",
30+
"execution_count": null,
31+
"metadata": {
32+
"collapsed": false
33+
},
34+
"outputs": [],
35+
"source": [
36+
"atoms = ase.Atoms(\n \"COO2N2\",\n positions=[(0, 0, 0), (1.2, 0, 0), (0, 6, 0), (1.1, 6, 0), (6, 0, 0), (7.3, 0, 0)],\n)\n\nfig, ax = plt.subplots(figsize=(3, 3))\nase.visualize.plot.plot_atoms(atoms, ax)\nax.set_axis_off()\nplt.show()"
37+
]
38+
},
39+
{
40+
"cell_type": "markdown",
41+
"metadata": {},
42+
"source": [
43+
"## Sparsity in ``TensorMap``\n\nThe radial spectrum representation has two keys: ``center_type`` indicating the\nspecies of the central atom (atom $i$ in the equations); and\n``neighbor_type`` indicating the species of the neighboring atoms (atom $j$\nin the equations).\n\n"
44+
]
45+
},
46+
{
47+
"cell_type": "code",
48+
"execution_count": null,
49+
"metadata": {
50+
"collapsed": false
51+
},
52+
"outputs": [],
53+
"source": [
54+
"radial_spectrum = metatensor.load(\"radial-spectrum.npz\")\n\nprint(radial_spectrum)"
55+
]
56+
},
57+
{
58+
"cell_type": "markdown",
59+
"metadata": {},
60+
"source": [
61+
"This shows the first level of sparsity in ``TensorMap``: block sparsity.\n\nOut of all possible combinations of ``center_type`` and ``neighbor_type``, some\nare missing, such as ``center_type=7, neighbor_type=8``. This is because we are\nusing a spherical cutoff of 2.5 \u00c5, and as such there are no oxygen neighbor atoms\nclose enough to the nitrogen centers. This means that all the corresponding radial\nspectrum coefficients $R_i^\\alpha(n)$ will be zero (since the neighbor density\n$\\rho_i^\\alpha(r)$ is zero everywhere).\n\nInstead of wasting memory by storing all of these zeros explicitly, we simply\navoid creating the corresponding blocks from the get-go and save a lot of memory!\n\n"
62+
]
63+
},
64+
{
65+
"cell_type": "markdown",
66+
"metadata": {},
67+
"source": [
68+
"Let's now look at the block containing the representation for oxygen centers and\ncarbon neighbors:\n\n"
69+
]
70+
},
71+
{
72+
"cell_type": "code",
73+
"execution_count": null,
74+
"metadata": {
75+
"collapsed": false
76+
},
77+
"outputs": [],
78+
"source": [
79+
"block = radial_spectrum.block(center_type=8, neighbor_type=6)"
80+
]
81+
},
82+
{
83+
"cell_type": "markdown",
84+
"metadata": {},
85+
"source": [
86+
"Naively, this block should contain samples for all oxygen atoms (since\n``center_type=8``); in practice we only have a single sample!\n\n"
87+
]
88+
},
89+
{
90+
"cell_type": "code",
91+
"execution_count": null,
92+
"metadata": {
93+
"collapsed": false
94+
},
95+
"outputs": [],
96+
"source": [
97+
"print(block.samples)"
98+
]
99+
},
100+
{
101+
"cell_type": "markdown",
102+
"metadata": {},
103+
"source": [
104+
"There is a second level of sparsity happening here, using a format related to\n[coordinate sparse arrays (COO format)](COO_). Since there is only one oxygen atom\nwith carbon neighbors, we only include this atom in the samples, and the\ndensity/radial spectrum coefficient for all the other oxygen atoms is assumed to be\nzero.\n\n\n"
105+
]
106+
},
107+
{
108+
"cell_type": "markdown",
109+
"metadata": {},
110+
"source": [
111+
"## Making the data dense again\n\nSometimes, we might have to use data in a sparse metatensor format with code that does\nnot understand this sparsity. One solution is to convert the data to a dense format,\nmaking the zeros explicit as much as possible. Metatensor provides functions to\nremove the keys sparsity, and the metadata required to remove the sample sparsity.\n\nFirst, the sample sparsity can be removed block by block by creating a new array full\nof zeros, and copying the data according to the indices in ``block.samples``:\n\n"
112+
]
113+
},
114+
{
115+
"cell_type": "code",
116+
"execution_count": null,
117+
"metadata": {
118+
"collapsed": false
119+
},
120+
"outputs": [],
121+
"source": [
122+
"dense_block_data = np.zeros((len(atoms), block.values.shape[1]))\n\n# only copy the non-zero data stored in the block\ndense_block_data[block.samples[\"atom\"]] = block.values\n\nprint(dense_block_data)"
123+
]
124+
},
125+
{
126+
"cell_type": "markdown",
127+
"metadata": {},
128+
"source": [
129+
"Alternatively, we can undo the keys sparsity with\n:py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`,\nwhich merge multiple blocks along the samples or properties dimensions respectively.\n\nWhich one of these functions to call will depend on the data you are handling.\nTypically, one-hot encoding (the ``neighbor_type`` key here) should be merged\nalong the properties dimension; and keys that define subsets of the samples\n(``center_type``) should be merged along the samples dimension.\n\n"
130+
]
131+
},
132+
{
133+
"cell_type": "code",
134+
"execution_count": null,
135+
"metadata": {
136+
"collapsed": false
137+
},
138+
"outputs": [],
139+
"source": [
140+
"dense_radial_spectrum = radial_spectrum.keys_to_samples(\"center_type\")\ndense_radial_spectrum = dense_radial_spectrum.keys_to_properties(\"neighbor_type\")"
141+
]
142+
},
143+
{
144+
"cell_type": "markdown",
145+
"metadata": {},
146+
"source": [
147+
"After calling these two functions, we now have a :py:class:`TensorMap` with a single\nblock and no keys:\n\n"
148+
]
149+
},
150+
{
151+
"cell_type": "code",
152+
"execution_count": null,
153+
"metadata": {
154+
"collapsed": false
155+
},
156+
"outputs": [],
157+
"source": [
158+
"print(dense_radial_spectrum)\n\nblock = dense_radial_spectrum.block()"
159+
]
160+
},
161+
{
162+
"cell_type": "markdown",
163+
"metadata": {},
164+
"source": [
165+
"We can see that the resulting dense data array contains a lot of zeros (and has a well\ndefined block-sparse structure):\n\n"
166+
]
167+
},
168+
{
169+
"cell_type": "code",
170+
"execution_count": null,
171+
"metadata": {
172+
"collapsed": false
173+
},
174+
"outputs": [],
175+
"source": [
176+
"with np.printoptions(precision=3):\n print(block.values)"
177+
]
178+
},
179+
{
180+
"cell_type": "markdown",
181+
"metadata": {},
182+
"source": [
183+
"And using the metadata attached to the block, we can understand which part of the data\nis zero and why. For example, the lower-right corner of the array corresponds to\nnitrogen atoms (the last two samples):\n\n"
184+
]
185+
},
186+
{
187+
"cell_type": "code",
188+
"execution_count": null,
189+
"metadata": {
190+
"collapsed": false
191+
},
192+
"outputs": [],
193+
"source": [
194+
"print(block.samples.print(max_entries=-1))"
195+
]
196+
},
197+
{
198+
"cell_type": "markdown",
199+
"metadata": {},
200+
"source": [
201+
"And these two bottom rows are zero everywhere, except in the part representing the\nnitrogen neighbor density:\n\n"
202+
]
203+
},
204+
{
205+
"cell_type": "code",
206+
"execution_count": null,
207+
"metadata": {
208+
"collapsed": false
209+
},
210+
"outputs": [],
211+
"source": [
212+
"print(block.properties.print(max_entries=-1))"
213+
]
214+
}
215+
],
216+
"metadata": {
217+
"kernelspec": {
218+
"display_name": "Python 3",
219+
"language": "python",
220+
"name": "python3"
221+
},
222+
"language_info": {
223+
"codemirror_mode": {
224+
"name": "ipython",
225+
"version": 3
226+
},
227+
"file_extension": ".py",
228+
"mimetype": "text/x-python",
229+
"name": "python",
230+
"nbconvert_exporter": "python",
231+
"pygments_lexer": "ipython3",
232+
"version": "3.12.7"
233+
}
234+
},
235+
"nbformat": 4,
236+
"nbformat_minor": 0
237+
}
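
The two levels of sparsity described in the notebook above (missing blocks, and missing samples inside a block) can be illustrated without metatensor at all. The sketch below is a toy stand-in written in plain Python, not the metatensor API: the `blocks` dictionary, the `densify` helper, and every numeric value are hypothetical and exist only for illustration. It assumes six atoms ordered as in the notebook (C, O, O, O, N, N) and two radial basis functions.

```python
# Toy stand-in for a block-sparse container (NOT the metatensor API).
# Keys are (center_type, neighbor_type); each block stores only the atoms
# that have at least one neighbor of that type. All numbers are made up.
blocks = {
    (6, 8): {"atoms": [0], "values": [[0.5, 0.25]]},
    (8, 6): {"atoms": [1], "values": [[0.4, 0.2]]},
    (8, 8): {"atoms": [2, 3], "values": [[0.3, 0.1], [0.3, 0.1]]},
    (7, 7): {"atoms": [4, 5], "values": [[0.6, 0.3], [0.6, 0.3]]},
}
N_ATOMS = 6   # C, O, O, O, N, N
N_RADIAL = 2  # number of radial basis functions in this toy example


def densify(blocks, key, n_atoms=N_ATOMS, n_radial=N_RADIAL):
    """Return an explicit n_atoms x n_radial table of values for one key."""
    dense = [[0.0] * n_radial for _ in range(n_atoms)]
    block = blocks.get(key)  # block sparsity: a missing key means all zeros
    if block is not None:
        # sample sparsity: only the stored atoms receive non-zero rows
        for atom, row in zip(block["atoms"], block["values"]):
            dense[atom] = list(row)
    return dense


oxygen_carbon = densify(blocks, (8, 6))    # one stored sample (atom 1)
nitrogen_oxygen = densify(blocks, (7, 8))  # key absent from `blocks`

for row in oxygen_carbon:
    print(row)
```

Only the row for atom 1 is filled in `oxygen_carbon`, while `nitrogen_oxygen` is entirely zero because no block exists for that key. The notebook does the same thing with NumPy indexing on `block.samples["atom"]`.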
Binary file not shown.
Binary file not shown.