Commit 0f19223

Merge pull request #30 from synthesizer-project/sophie_edits_docs
Sophie edits docs
2 parents c23928e + 403178a commit 0f19223

File tree

2 files changed

+82
-60
lines changed

docs/source/library_gen/basic_library_generation.ipynb

Lines changed: 47 additions & 32 deletions
@@ -7,9 +7,11 @@
77
"source": [
88
"# Basic Library Generation\n",
99
"\n",
10-
"Now that we have a basic understanding of how to use Synthesizer to create a galaxy object, we can move on to generating a library of galaxies. This is done using the `GalaxyBasis` class, which takes a set of parameters and generates a library of galaxies based on those parameters.\n",
10+
"Now that we have a basic understanding of how to use Synthesizer to create a galaxy object, we can move on to generating a library of galaxies. This is done using the `GalaxyBasis` class within Synference, which takes a set of parameters and generates a library of galaxies based on those parameters.\n",
1111
"\n",
12-
"In the most simple use-case, we define outside of the GalaxyBasis object the parameter space we want to sample, and then define the `GalaxyBasis` object with these arrays. We provide a helper function to generate these arrays for us, but you have complete freedom to define more complex parameter spaces if you wish."
12+
"In the simplest scenario, you define the parameter space you want to sample outside of the `GalaxyBasis` object. You then initialize the `GalaxyBasis` using these arrays.\n",
13+
"\n",
14+
"We provide helper functions to generate these arrays for you, but you have complete flexibility to define your own, more complex parameter spaces if you wish."
1315
]
1416
},
1517
{
@@ -40,11 +42,13 @@
4042
"id": "11b7cf83",
4143
"metadata": {},
4244
"source": [
43-
"These arrays can be generated using the `draw_from_hypercube` function, which takes as input the number of samples you want to draw, and the ranges for each parameter. The ranges are defined as a dictionary of tuples, where they key is the parameter range and each tuple contains the minimum and maximum value for that parameter. The function will then return a dictionary of arrays, where each array contains the values for that parameter.\n",
45+
"These arrays can be generated using the `draw_from_hypercube` function, which takes as input the number of samples you want to draw, and a dictionary of parameter ranges. In this dictionary, each key is a parameter name, and the value is a tuple defining the minimum and maximum value for that parameter. The function will then return a dictionary of arrays, where each array contains the sampled values for each parameter.\n",
46+
"\n",
47+
"By default, the `draw_from_hypercube` function uses a form of low-discrepancy sampling called **Latin Hypercube Sampling (LHS)**. LHS is highly efficient compared to random sampling, especially when dealing with many parameters, as it ensures that the parameter space is sampled more evenly.\n",
4448
"\n",
45-
"The `draw_from_hypercube` function by default uses a form of low-discrepancy sampling called Latin Hypercube Sampling (LHS), which is a more efficient way of sampling a multi-dimensional space than random sampling. This is particularly useful when the number of parameters is large, as it ensures that the parameter space is sampled more evenly.\n",
49+
"The dictionary we must define encapsulates our prior knowledge about the parameter space we intend to sample. \n",
4650
"\n",
47-
"The dictionary we must define will encapsulate our prior knowledge of the parameter space we wish to sample. For this example, we will define a parameter space with two parameters: redshift and log_stellar_mass. The default option is uniform sampling between the upper and lower bounds provided, but you can draw from any distribution you like by defining the arrays yourself.\n"
51+
"For this example, we will define a parameter space with two parameters: `redshift` and `log_stellar_mass`. The default sampling method is uniform sampling between the upper and lower bounds provided, but you can draw from any distribution you like by defining the arrays yourself.\n"
4852
]
4953
},
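The Latin Hypercube Sampling described above can be sketched with SciPy's `qmc` module. This is an illustrative assumption, not synference's actual implementation of `draw_from_hypercube`; the parameter names and ranges follow this tutorial.

```python
import numpy as np
from scipy.stats import qmc

# Parameter ranges as in this tutorial: key -> (min, max)
ranges = {"redshift": (0.0, 5.0), "log_stellar_mass": (8.0, 12.0)}

# Draw 100 points in the unit hypercube with LHS, then scale to the ranges
sampler = qmc.LatinHypercube(d=len(ranges), seed=42)
unit_samples = sampler.random(n=100)

lows = [lo for lo, hi in ranges.values()]
highs = [hi for lo, hi in ranges.values()]
scaled = qmc.scale(unit_samples, lows, highs)

# Repack into a dictionary of per-parameter arrays, as described above
param_grid = {name: scaled[:, i] for i, name in enumerate(ranges)}
```

LHS stratifies each dimension into `n` bins and places exactly one sample per bin, which is why it covers the space more evenly than independent uniform draws.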
5054
{
@@ -64,7 +68,7 @@
6468
"id": "29174ec4",
6569
"metadata": {},
6670
"source": [
67-
"Above we have defined a parameter space with two parameters: redshift and log_stellar_mass. We have defined the ranges for these parameters as (0.0, 5.0) and (8.0, 12.0) respectively. We then use the `draw_from_hypercube` function to draw 100 samples from this parameter space, which returns a dictionary of arrays containing the values for each parameter."
71+
"Above we have defined a parameter space with two parameters: `redshift` and `log_stellar_mass`. We have defined the ranges for these parameters as (0.0, 5.0) and (8.0, 12.0) respectively. We then use the `draw_from_hypercube` function to draw 100 samples from this parameter space, which returns a dictionary of arrays containing the values for each parameter."
6872
]
6973
},
7074
{
@@ -92,23 +96,26 @@
9296
"metadata": {},
9397
"outputs": [],
9498
"source": [
95-
"plt.scatter(param_grid[\"redshift\"], param_grid[\"log_stellar_mass\"])"
99+
"plt.scatter(param_grid[\"redshift\"], param_grid[\"log_stellar_mass\"])\n",
100+
"plt.xlabel(\"redshift\")\n",
101+
"plt.ylabel(\"log_stellar_mass\")"
96102
]
97103
},
98104
{
99105
"cell_type": "markdown",
100106
"id": "b9bb9564",
101107
"metadata": {},
102108
"source": [
103-
"For our basic model for this tutorial we will define a 6 dimensional parameter space, with the following parameters:\n",
104-
" - redshift - the redshift of the galaxy\n",
105-
" - log_stellar_mass - the logarithm (base 10) of the stellar mass in solar masses\n",
106-
" - log_zmet - the logarithm (base 10) of the stellar metallicity\n",
107-
" - peak_age - the age of the peak of the log-normal SFH\n",
108-
" - tau - the width of the log-normal SFH\n",
109-
" - tau_v - V-band optical depth\n",
109+
"For this basic tutorial, we will define a 6-dimensional parameter space for our library of galaxies. We will draw 10,000 samples to construct our synthetic galaxy population; this sample size is chosen for quick demonstration, and a real, useful library would require significantly more samples.\n",
110110
"\n",
111-
"We will draw 10,000 samples from this parameter space, and use these to generate a library of galaxies. This isn't a large enough sample size for a useful library, but it will run quickly for this demonstration."
111+
"Our six parameters are:\n",
112+
"\n",
113+
" - `redshift`: The redshift of the galaxy\n",
114+
" - `log_stellar_mass`: The logarithm (base 10) of the stellar mass in units of solar mass\n",
115+
" - `log_zmet`: The logarithm (base 10) of the stellar metallicity\n",
116+
" - `peak_age`: The age corresponding to the peak of the log-normal star formation history (SFH)\n",
117+
" - `tau`: The dimensionless width of the log-normal SFH\n",
118+
" - `tau_v`: The V-band optical depth"
112119
]
113120
},
114121
{
@@ -159,9 +166,11 @@
159166
"id": "81f62cd7",
160167
"metadata": {},
161168
"source": [
162-
"The way the `generate_sfh_basis` function works is that we give it the desired SFH class (e.g. `SFH.LogNormal`), and then provide it with the parameters required to instantiate that class. The function will then return an array of `SFH` instances, each with the parameters drawn from the arrays we provide. We also provide parameter units as unyt Units, which ensures that the parameters are correctly interpreted by the `SFH` class. \n",
169+
"The `generate_sfh_basis` function is used to generate an array of SFH objects from our sampled parameter arrays.\n",
170+
"\n",
171+
"To do this, we pass the function the desired SFH class (e.g. `SFH.LogNormal`) and the dictionary of sampled arrays. The function then returns an array of instantiated `SFH` instances, where the parameters for each object are drawn from the arrays we provided. We also require that you provide parameter units using `unyt` Units (e.g. `Myr` for age), which ensures that the parameters are correctly interpreted by the `SFH` class. \n",
163172
"\n",
164-
"Optionally we can define a redshift dependent star formation history, and provide a parameter which will be scaled by the available lookback time at the redshift of the galaxy. This is useful for parameters which are physically constrained by the age of the universe, such as the peak age of a log-normal SFH. "
173+
"Optionally, we can define a redshift-dependent star formation history. To do this, designate a parameter which will be scaled by the available lookback time at the redshift of the galaxy. This is useful for parameters which are physically constrained by the age of the universe, such as the peak age of a log-normal SFH."
165174
]
166175
},
167176
{
@@ -197,7 +206,7 @@
197206
"source": [
198207
"## Generating Metallicity Distributions\n",
199208
"\n",
200-
"In the same manner that we can generate a library of star formation histories, we can also generate a library of metallicity histories. These are simpler to generate, so we don't provide a helper function for this, but it is still straightforward to do manually. "
209+
"In the same manner that we can generate a library of star formation histories, we can also generate a library of metallicity histories. While we don't provide a helper function for this, these are simpler to generate:"
201210
]
202211
},
203212
{
@@ -222,7 +231,7 @@
222231
"id": "230c6626",
223232
"metadata": {},
224233
"source": [
225-
"All the above code does is loop over the metallicities drawn from the prior samples, and generate a metallicity class instance for each, and append it to a list."
234+
"The above code loops over the metallicities drawn from the prior samples. For each metallicity value, it instantiates the metallicity class and appends the result to a list."
226235
]
227236
},
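That loop can be sketched in plain Python. `DeltaMetallicity` here is a placeholder dataclass standing in for Synthesizer's metallicity class, whose exact name and signature are not shown in this diff; the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class DeltaMetallicity:
    """Placeholder for the Synthesizer metallicity distribution class."""

    metallicity: float


# Stand-in for param_grid["log_zmet"], the sampled log10 metallicities
log_zmet_samples = [-2.5, -2.0, -1.5]

# Instantiate one distribution per sampled value and collect them in a list
zdists = [DeltaMetallicity(metallicity=10.0**z) for z in log_zmet_samples]
```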
228237
{
@@ -312,9 +321,11 @@
312321
"source": [
313322
"## Instruments and Filters\n",
314323
"\n",
315-
"Finally, the last thing we have to define is what we actually want to observe. By default we would just get the rest-frame and observed-frame spectra on the wavelengths of the `Grid`, but what if we want observations in specific photometric filters, or matched to the resolution of a specfic spectroscopic instrument? We must create and define an `Instrument`, which contains this information.\n",
324+
"Finally, we must define **what** we actually want to observe. We could simply call `get_spectra` to get the rest-frame and observed-frame spectra on the wavelengths of the `Grid`.\n",
316325
"\n",
317-
"Here we will load a predefined NIRCam instrument containing the wideband photometric filters."
326+
"But what if we want observations in specific photometric filters, or matched to the resolution of a specific spectroscopic instrument? We must create and define an `Instrument`, which contains this information.\n",
327+
"\n",
328+
"Here we will load a predefined NIRCam instrument containing its wideband photometric filters."
318329
]
319330
},
320331
{
@@ -338,7 +349,11 @@
338349
"\n",
339350
"Now that we have all the constituent components of our model we can finally put it all together and generate our library of observables. Firstly we simply instantiate a `GalaxyBasis` object, passing in the various components we have created above.\n",
340351
"\n",
341-
"We also have to set a `galaxy_params` dictionary, which contains any arguments we wish to set on the individual galaxies. In this case, it is just the optical depth `tau_v`, but there could be a lot of emission model parameters which are set on individual galaxy, star or black hole instances. We are also explicitly telling the code to ignore the 'max_age' parameter, which will vary between galaxies but is simply a defined transformation of the redshift, and is hence a redundant parameter.\n",
352+
"We must also define a `galaxy_params` dictionary, which contains any arguments we wish to set on the individual `Galaxy` objects, rather than globally on the emission model. \n",
353+
"\n",
354+
"In this case, we set the V-band optical depth (`tau_v`) for each galaxy. However, this dictionary can include any number of emission model parameters intended to vary per galaxy, star or black hole instance. \n",
355+
"\n",
356+
"We are also explicitly telling the code to ignore the `max_age` parameter. While `max_age` varies per galaxy, it is redundant because it is already a defined transformation of the redshift, meaning its value can be calculated from another parameter already present in our set.\n",
342357
"\n",
343358
"We give this model a name, 'testing_model', which will be used when saving the output files."
344359
]
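The per-galaxy parameter plumbing described above can be sketched as follows. The values are made up, and the structure is an assumption based on this tutorial's prose, not the exact `GalaxyBasis` keyword signature.

```python
# Stand-in for the sampled arrays from the hypercube draw (made-up values)
param_grid = {"tau_v": [0.1, 0.3, 0.5]}

# Arguments set on each individual Galaxy object, not globally on the
# emission model
galaxy_params = {"tau_v": param_grid["tau_v"]}

# max_age varies per galaxy but is a defined transformation of redshift,
# so it is excluded from the sampled parameter set as redundant
ignored_params = ["max_age"]
```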
@@ -371,17 +386,17 @@
371386
"id": "f6383be2",
372387
"metadata": {},
373388
"source": [
374-
"Now that we have created an instance of the `GalaxyBasis`, we can run the grid creation using the `create_mock_library` method. Here we set the output model name, which emission model key to generate the grid from (by default 'total'), as well as setting optional configuration parameters. We can see for our above emission model that the root key is 'emergent', so we will set that here. We could also change the output folder - it will default to the internal libraries/ folder where the synference code is installed.\n",
389+
"Now that we have created an instance of the `GalaxyBasis`, we can run the grid creation using the `create_mock_library` method. Here we set the output model name, which emission model key to generate the grid from (by default `total`), as well as setting optional configuration parameters. We can see for our above emission model that the root key is `emergent`, so we will set that here. We could also change the output folder, which will otherwise default to the internal 'libraries/' folder where the synference code is installed.\n",
375390
"\n",
376-
"The `create_mock_library` method uses the Synthesizer packages `Pipeline` functionality, which batches the galaxies and allows parallel execution if Synthesizer has been installed with OpenMP support (see the Synthesizer installaion documentaiton for more information.) This speeds up galaxy creation, and allows batching across nodes and cores on a HPC for generating very large libraries."
391+
"The `create_mock_library` method utilizes Synthesizer's built-in `Pipeline` functionality, which handles processing the galaxies in batches and supports parallel execution if Synthesizer has been installed with OpenMP support (see the Synthesizer installation documentation for more information). This significantly speeds up galaxy library creation, making it scalable across cores and nodes for generating very large samples."
377392
]
378393
},
379394
{
380395
"cell_type": "markdown",
381396
"id": "2aba55cb",
382397
"metadata": {},
383398
"source": [
384-
"We also save the parameter transforms the 'max_age' parameter, which is a derived parameter based on the redshift of the galaxy. This is useful for later analysis so we can easily reconstruct the original parameter space."
399+
"Before we make our mock library, we also choose to save the `max_age` parameter. This is a derived parameter based on the redshift of the galaxy and is useful for later analysis so that we can easily reconstruct the original parameter space."
385400
]
386401
},
387402
{
@@ -441,7 +456,7 @@
441456
"\n",
442457
"There are numerous packages for inspecting HDF5 files, including the `h5py` package, which you have installed if you have run the code to this point without crashing. For more visual views, we recommend [H5Web](https://h5web.panosc.eu/), which has a VS Code extension, or the command line interface `h5forest`, which you can find on Github [here](https://github.com/WillJRoper/h5forest).\n",
443458
"\n",
444-
"Below we are printing some details of our saved dataset using `h5py`. "
459+
"Below we use the `h5py` library to inspect the structure and contents of our saved dataset:"
445460
]
446461
},
447462
{
@@ -471,12 +486,12 @@
471486
"id": "2e41a8e6",
472487
"metadata": {},
473488
"source": [
474-
"We can see that we have:\n",
475-
"- an array called parameters, with our 6 parameters (mass, redshift, tau_v, metallicity, peak_age, tau) and 1000 draws from the prior.\n",
476-
"- an array called photometry, with 8 photometric fluxes (for the 8 NIRCam widebands) for each of the 1000 draws,\n",
477-
"- an empty supplementary parameters array. This would be used to store optional derived quantities such as star formation rates, or the surviving stellar mass. We didn't set any of these, so it is empty.\n",
478-
"- a Model Group, which stores information about the emission model and instrument used. This lets us recreate the emission model and instrument later if we need to.\n",
489+
"Our HDF5 file contains the following key components:\n",
479490
"\n",
491+
"- Parameters array: Contains the input samples for all 6 parameters we defined for our prior (mass, redshift, tau_v, metallicity, peak_age, tau) with 100 draws from the prior for each parameter\n",
492+
"- Photometry array: Our synthetic observables consisting of 8 photometric fluxes (corresponding to the 8 NIRCam widebands) calculated for each of the 100 galaxy draws\n",
493+
"- Supplementary parameters array: An empty supplementary parameters array in this case, designed to store optional derived quantities such as star formation rates, or the surviving stellar mass\n",
494+
"- Model Group: This stores information about the emission model and instrument used, allowing us to recreate the emission model and instrument later if we need to\n",
480495
"\n",
481496
"It's worth noting that the only required arrays are the 'parameters' and 'photometry' datasets. So you can entirely avoid using Synthesizer and build models externally using your code and method of choice; as long as you can produce an HDF5 file with the same simple format, you will be able to use the SBI functionality of synference with your code. Please see the tutorial where we train a model from the outputs of the hydrodynamical simulation SPHINX for an example."
482497
]
@@ -488,7 +503,7 @@
488503
"source": [
489504
"## Plotting a galaxy from our model\n",
490505
"\n",
491-
"synference has some debug methods to plot specific or random individual galaxy SEDs, photometry and star formation histories - `plot_galaxy` and `plot_random_galaxy`. Below we plot a random galaxy from the model. "
506+
"Lastly, Synference has some debug methods to plot specific or random individual galaxy SEDs, photometry and star formation histories: `plot_galaxy` and `plot_random_galaxy`. Below we plot a random galaxy from the model."
492507
]
493508
},
494509
{