metatensor
diff --git a/.nojekyll b/.nojekyll
diff --git a/CNAME b/CNAME
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 4 additions & 0 deletions
diff --git a/index.html b/index.html
Lines changed: 11 additions & 0 deletions
diff --git a/latest/.buildinfo b/latest/.buildinfo
Lines changed: 4 additions & 0 deletions
diff --git a/latest/_datasets/17a04deb338b5b4e44f524068318d127222319290af5781f039f61bffc0543c4-fig_2-running-ase-md_001.json.gz b/latest/_datasets/17a04deb338b5b4e44f524068318d127222319290af5781f039f61bffc0543c4-fig_2-running-ase-md_001.json.gz
114 KB
diff --git a/latest/_datasets/1f38575b0ef35eeaaef2ab3e48291e8ed7fdac0a94c931f334d4ebc108139963-fig_3-atomistic-model-with-nl_002.json.gz b/latest/_datasets/1f38575b0ef35eeaaef2ab3e48291e8ed7fdac0a94c931f334d4ebc108139963-fig_3-atomistic-model-with-nl_002.json.gz
168 KB
diff --git a/latest/_downloads/06a401252cd11c041771bae30d8fcd80/2-handling-sparsity.ipynb b/latest/_downloads/06a401252cd11c041771bae30d8fcd80/2-handling-sparsity.ipynb
Lines changed: 237 additions & 0 deletions
diff --git a/latest/_downloads/0cc0ba974aa15743303d66de4a7f7bb9/radial-spectrum.npz b/latest/_downloads/0cc0ba974aa15743303d66de4a7f7bb9/radial-spectrum.npz
10.2 KB
diff --git a/latest/_downloads/0e60c5708ef3ea3957a153ed3f69f975/2-handling-sparsity.zip b/latest/_downloads/0e60c5708ef3ea3957a153ed3f69f975/2-handling-sparsity.zip
16.9 KB
@@ -0,0 +1 @@
+docs.metatensor.org
@@ -0,0 +1,4 @@
+# metatensor-docs
+
+Documentation website for metatensor. This is in a separate repository to limit
+the size of the main repository.
@@ -0,0 +1,11 @@
+<!DOCTYPE html>
+<html>
+
+<head>
+  <meta charset="utf-8" />
+  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
+  <meta http-equiv="refresh" content="0;URL=latest/index.html" />
+</head>
+
+<body></body>
+</html>
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 3fb5854ba905c2a5cfbd2afb9af91fa2
+tags: 645f666f9bcd5a90fca523b33c5a78b7
@@ -0,0 +1,237 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "\n\n# Handling sparsity\n\nThe one-sentence introduction to metatensor mentions that this is a \"self-describing\n**sparse** tensor data format\". The `previous tutorial <core-tutorial-first-steps>`\nexplained the self-describing part of the format, and in this tutorial we will explore\nwhat makes metatensor a sparse format, and how to remove the corresponding sparsity when\nrequired.\n\nLike in the `previous tutorial <core-tutorial-first-steps>`, we will load the data\nwe need from a file. The code used to generate this file can be found below:\n\n.. details:: Show the code used to generate the :file:`radial-spectrum.npz` file\n\n    ..\n\n        The data was generated with `featomic`_, a package to compute atomistic\n        representations for machine learning applications.\n\n\n        .. literalinclude:: radial-spectrum.py.example\n            :language: python\n\nThe file contains a representation of three molecules, called the radial spectrum. The atom\n$i$ is represented by the radial spectrum $R_i^\\alpha$, which is an\nexpansion of the neighbor density $\\rho_i^\\alpha(r)$ on a set of radial basis\nfunctions $f_n(r)$:\n\n\\begin{align}R_i^\\alpha(n) = \\int f_n(r) \\rho_i^\\alpha(r) dr\\end{align}\n\nThe density $\\rho_i^\\alpha(r)$ associated with all neighbors of species\n$\\alpha$ of the atom $i$ (each neighbor is replaced with a Gaussian function\ncentered on the neighbor $g(r_{ij})$) is defined as:\n\n\\begin{align}\\rho_i^\\alpha(r) = \\sum_{j \\in \\text{ neighborhood of i }} g(r_{ij})\n        \\delta_{\\alpha_j,\\alpha}\\end{align}\n\n\nThe exact mathematical details above don't matter too much for this tutorial; the main\npoint is that this representation treats atomic species as completely independent,\neffectively using the neighbor species $\\alpha$ for `one-hot encoding`_.\n\n\n.. py:currentmodule:: metatensor\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "import ase\nimport ase.visualize.plot\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nimport metatensor"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "We will work on the radial spectrum representation of three molecules in our system:\na carbon monoxide molecule, an oxygen molecule and a nitrogen molecule.\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "atoms = ase.Atoms(\n    \"COO2N2\",\n    positions=[(0, 0, 0), (1.2, 0, 0), (0, 6, 0), (1.1, 6, 0), (6, 0, 0), (7.3, 0, 0)],\n)\n\nfig, ax = plt.subplots(figsize=(3, 3))\nase.visualize.plot.plot_atoms(atoms, ax)\nax.set_axis_off()\nplt.show()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Sparsity in ``TensorMap``\n\nThe radial spectrum representation has two keys: ``center_type`` indicating the\nspecies of the central atom (atom $i$ in the equations); and\n``neighbor_type`` indicating the species of the neighboring atoms (atom $j$\nin the equations).\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "radial_spectrum = metatensor.load(\"radial-spectrum.npz\")\n\nprint(radial_spectrum)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "This shows the first level of sparsity in ``TensorMap``: block sparsity.\n\nOut of all possible combinations of ``center_type`` and ``neighbor_type``, some\nare missing, such as ``center_type=7, neighbor_type=8``. This is because we are\nusing a spherical cutoff of 2.5 \u00c5, and as such there are no oxygen neighbor atoms\nclose enough to the nitrogen centers. This means that all the corresponding radial\nspectrum coefficients $R_i^\\alpha(n)$ will be zero (since the neighbor density\n$\\rho_i^\\alpha(r)$ is zero everywhere).\n\nInstead of wasting memory by storing all of these zeros explicitly, we simply\navoid creating the corresponding blocks from the get-go and save a lot of memory!\n\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Let's now look at the block containing the representation for oxygen centers and\ncarbon neighbors:\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "block = radial_spectrum.block(center_type=8, neighbor_type=6)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Naively, this block should contain samples for all oxygen atoms (since\n``center_type=8``); in practice we only have a single sample!\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "print(block.samples)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "There is a second level of sparsity happening here, using a format related to\n[coordinate sparse arrays (COO format)](COO_). Since there is only one oxygen atom\nwith carbon neighbors, we only include this atom in the samples, and the\ndensity/radial spectrum coefficient for all the other oxygen atoms is assumed to be\nzero.\n\n\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Making the data dense again\n\nSometimes, we might have to use data in a sparse metatensor format with code that does\nnot understand this sparsity. One solution is to convert the data to a dense format,\nmaking the zeros explicit as much as possible. Metatensor provides functions to\nremove the keys sparsity, and the metadata needed to remove the sample sparsity\nby hand.\n\nFirst, the sample sparsity can be removed block by block by creating a new array full\nof zeros, and copying the data according to the indices in ``block.samples``:\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "dense_block_data = np.zeros((len(atoms), block.values.shape[1]))\n\n# only copy the non-zero data stored in the block\ndense_block_data[block.samples[\"atom\"]] = block.values\n\nprint(dense_block_data)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Alternatively, we can undo the keys sparsity with\n:py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`,\nwhich merge multiple blocks along the samples or properties dimensions respectively.\n\nWhich one of these functions to call will depend on the data you are handling.\nTypically, one-hot encoding (the ``neighbor_types`` key here) should be merged\nalong the properties dimension; and keys that define subsets of the samples\n(``center_type``) should be merged along the samples dimension.\n\n"
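+        "Alternatively, we can undo the keys sparsity with\n:py:meth:`TensorMap.keys_to_samples` and :py:meth:`TensorMap.keys_to_properties`,\nwhich merge multiple blocks along the samples or properties dimensions respectively.\n\nWhich of these functions to call depends on the data you are handling.\nTypically, one-hot encoding (the ``neighbor_type`` key here) should be merged\nalong the properties dimension; and keys that define subsets of the samples\n(``center_type``) should be merged along the samples dimension.\n\n"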
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "dense_radial_spectrum = radial_spectrum.keys_to_samples(\"center_type\")\ndense_radial_spectrum = dense_radial_spectrum.keys_to_properties(\"neighbor_type\")"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "After calling these two functions, we now have a :py:class:`TensorMap` with a single\nblock and no keys:\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "print(dense_radial_spectrum)\n\nblock = dense_radial_spectrum.block()"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "We can see that the resulting dense data array contains a lot of zeros (and has a\nwell-defined block-sparse structure):\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "with np.printoptions(precision=3):\n    print(block.values)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "And using the metadata attached to the block, we can understand which part of the data\nis zero and why. For example, the lower-right corner of the array corresponds to\nnitrogen atoms (the last two samples):\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "print(block.samples.print(max_entries=-1))"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "And these two bottom rows are zero everywhere, except in the part representing the\nnitrogen neighbor density:\n\n"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "print(block.properties.print(max_entries=-1))"
+      ]
+    }
+  ],
+  "metadata": {
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.12.7"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}
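
The sample-densification step in the notebook above (allocate zeros, then copy the stored rows to their sample indices) can be tried without the `radial-spectrum.npz` file. The sketch below uses only NumPy and toy values; `densify_samples` is a hypothetical helper written for illustration, not part of metatensor, and the numbers are made up.

```python
import numpy as np


def densify_samples(values, sample_indices, n_total):
    """Scatter sparsely stored rows into a dense, zero-filled array.

    `values` holds one row per stored sample, and `sample_indices` gives the
    row each of them occupies in the dense array. Hypothetical helper for
    illustration only; it is not a metatensor function.
    """
    values = np.asarray(values)
    dense = np.zeros((n_total, values.shape[1]), dtype=values.dtype)
    # only the stored rows are copied; every other row stays zero
    dense[np.asarray(sample_indices)] = values
    return dense


# toy data: a system of 6 atoms where only atom 1 has a stored row
sparse_values = np.array([[0.5, 0.25, 0.125]])
dense = densify_samples(sparse_values, [1], n_total=6)
print(dense)
```

In the notebook, `block.values` and `block.samples["atom"]` play the roles of `values` and `sample_indices`, and `len(atoms)` plays the role of `n_total`.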