Merged
14 changes: 9 additions & 5 deletions docs/_config.yml
@@ -5,6 +5,7 @@ title: Tstrait manual
author: Tskit Developers
copyright: "2023"
only_build_toc_files: true
favicon: favicon.ico

execute:
execute_notebooks: cache
@@ -22,14 +23,9 @@ html:
use_repository_button: true
use_edit_page_button: true

extra_footer: |
<div>
tstrait __TSTRAIT_VERSION__
</div>

sphinx:
extra_extensions:
- numpydoc
- sphinx_copybutton
- sphinx_design
- sphinx.ext.autodoc
@@ -38,9 +34,17 @@ sphinx:
- sphinx.ext.viewcode
- sphinx.ext.intersphinx
- sphinx_issues
- IPython.sphinxext.ipython_console_highlighting

config:
html_theme: sphinx_book_theme
html_theme_options:
pygments_dark_style: monokai
navigation_with_keys: false
logo:
text: |
tstrait<br/>
version __TSTRAIT_VERSION__
myst_enable_extensions:
- colon_fence
- deflist
Binary file added docs/favicon.ico
Binary file not shown.
141 changes: 62 additions & 79 deletions tstrait/genetic_value.py
@@ -49,13 +49,11 @@ def _accumulate_individual_values(
class _GeneticValue:
"""GeneticValue class to compute genetic values of individuals.

Parameters
----------
ts : tskit.TreeSequence
Tree sequence data with mutation
trait_df : pandas.DataFrame
Dataframe that includes causal site ID, causal allele, simulated effect
size, and trait ID.
:param ts: Tree sequence data with mutation
:type ts: tskit.TreeSequence
:param trait_df: Dataframe that includes causal site ID, causal allele,
simulated effect size, and trait ID.
:type trait_df: pandas.DataFrame
"""

def __init__(self, ts, trait_df):
@@ -102,10 +100,8 @@ def _individual_genetic_values(self, tree, site, causal_allele, effect_size):
def _run(self):
"""Computes genetic values of individuals.

Returns
-------
pandas.DataFrame
Dataframe with genetic value, individual ID, and trait ID.
:returns: Dataframe with genetic value, individual ID, and trait ID.
:rtype: pandas.DataFrame
"""

num_ind = self.ts.num_individuals
@@ -139,56 +135,53 @@ def genetic_value(ts, trait_df):
"""
Obtains genetic value from a trait dataframe.

Parameters
----------
ts : tskit.TreeSequence
The tree sequence data that will be used in the quantitative trait
:param ts: The tree sequence data that will be used in the quantitative trait
simulation.
trait_df : pandas.DataFrame
Trait dataframe.

Returns
-------
pandas.DataFrame
Pandas dataframe that includes genetic value of individuals in the
:type ts: tskit.TreeSequence
:param trait_df: Trait dataframe.
:type trait_df: pandas.DataFrame
:returns: Pandas dataframe that includes genetic value of individuals in the
tree sequence.
:rtype: pandas.DataFrame

.. seealso::
:func:`trait_model` Return a trait model, which can be used as `model` input.

:func:`sim_trait` Return a trait dataframe, which can be used as a
`trait_df` input.

See Also
--------
trait_model : Return a trait model, which can be used as `model` input.
sim_trait : Return a trait dataframe, whch can be used as a `trait_df` input.
sim_env : Genetic value dataframe output can be used as an input to simulate
environmental noise.
:func:`sim_env` Genetic value dataframe output can be used as an input
to simulate environmental noise.

Notes
-----
The `trait_df` input has some requirements that will be noted below.
.. note::
The `trait_df` input has some requirements that will be noted below.

1. Columns
1. Columns

The following columns must be included in `trait_df`:
The following columns must be included in `trait_df`:

* **site_id**: Site IDs that have causal allele.
* **effect_size**: Simulated effect size of causal allele.
* **causal_allele**: Causal allele.
* **trait_id**: Trait ID.
* **site_id**: Site IDs that have causal allele.
* **effect_size**: Simulated effect size of causal allele.
* **causal_allele**: Causal allele.
* **trait_id**: Trait ID.

2. Data requirements
2. Data requirements

* Site IDs in **site_id** column must be sorted in an ascending order. Please
refer to :py:meth:`pandas.DataFrame.sort_values` for details on sorting
values in a :class:`pandas.DataFrame`.
* Site IDs in **site_id** column must be sorted in an ascending order. Please
refer to :py:meth:`pandas.DataFrame.sort_values` for details on sorting
values in a :class:`pandas.DataFrame`.

* Trait IDs in **trait_id** column must start from zero and be consecutive.
* Trait IDs in **trait_id** column must start from zero and be consecutive.

The genetic value dataframe contains the following columns:
The genetic value dataframe contains the following columns:

* **trait_id**: Trait ID.
* **individual_id**: Individual ID inside the tree sequence input.
* **genetic_value**: Genetic values that are obtained from the trait dataframe.
* **trait_id**: Trait ID.
* **individual_id**: Individual ID inside the tree sequence input.
* **genetic_value**: Genetic values that are obtained from the trait
dataframe.

.. rubric:: Examples

Examples
--------
See :ref:`genetic_value` for worked examples.
"""
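The `trait_df` requirements listed in the docstring above (required columns, ascending `site_id`, zero-based consecutive `trait_id`) can be sketched as a pre-flight check. This is a minimal illustration, not tstrait's own validation code: `check_trait_rows` and `REQUIRED_COLUMNS` are hypothetical names, and plain dictionaries stand in for dataframe rows so that the sketch runs with the standard library alone. With a real dataframe, the sorting step corresponds to `trait_df.sort_values("site_id")`.

```python
# Hypothetical names; not part of the tstrait API.
REQUIRED_COLUMNS = ("site_id", "effect_size", "causal_allele", "trait_id")


def check_trait_rows(rows):
    """Validate trait records and return them sorted by site ID.

    `rows` is a list of dicts standing in for the rows of `trait_df`.
    """
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"missing columns: {missing}")
    # Trait IDs must be exactly 0, 1, ..., k - 1.
    trait_ids = sorted({row["trait_id"] for row in rows})
    if trait_ids != list(range(len(trait_ids))):
        raise ValueError("trait_id must start from zero and be consecutive")
    # Site IDs must be ascending; sorted() is stable, so ties keep their order.
    return sorted(rows, key=lambda row: row["site_id"])


rows = [
    {"site_id": 7, "effect_size": 0.1, "causal_allele": "A", "trait_id": 0},
    {"site_id": 2, "effect_size": -0.3, "causal_allele": "T", "trait_id": 0},
    {"site_id": 5, "effect_size": 0.2, "causal_allele": "G", "trait_id": 1},
]
prepared = check_trait_rows(rows)
```

Running a check like this before calling `genetic_value` surfaces ordering or ID problems early, instead of leaving them to appear during the computation.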

@@ -217,44 +210,34 @@ def genetic_value(ts, trait_df):
def normalise_genetic_value(genetic_df, mean=0, var=1, ddof=1):
"""Normalise genetic value dataframe.

Parameters
----------
genetic_df : pandas.DataFrame
Genetic value dataframe.
mean : float, default 0
Mean of the resulting genetic value.
var : float, default 1
Variance of the resulting genetic value.
ddof : int, default 1
Delta degrees of freedom. The divisor used in computing the variance
:param genetic_df: Genetic value dataframe.
:type genetic_df: pandas.DataFrame
:param mean: Mean of the resulting genetic value.
:type mean: float
:param var: Variance of the resulting genetic value.
:type var: float
:param ddof: Delta degrees of freedom. The divisor used in computing the variance
is N - ddof, where N represents the number of elements.
:type ddof: int
:returns: Dataframe with normalised genetic value.
:rtype: pandas.DataFrame
:raises ValueError: If `var` <= 0.

Returns
-------
pandas.DataFrame
Dataframe with normalised genetic value.

Raises
------
ValueError
If `var` <= 0.
.. note::
The following columns must be included in `genetic_df`:

Notes
-----
The following columns must be included in `genetic_df`:
* **trait_id**: Trait ID.
* **individual_id**: Individual ID inside the tree sequence input.
* **genetic_value**: Simulated genetic values.

* **trait_id**: Trait ID.
* **individual_id**: Individual ID inside the tree sequence input.
* **genetic_value**: Simulated genetic values.
The dataframe output has the following columns:

The dataframe output has the following columns:
* **trait_id**: Trait ID.
* **individual_id**: Individual ID inside the tree sequence input.
* **genetic_value**: Normalised genetic values.

* **trait_id**: Trait ID.
* **individual_id**: Individual ID inside the tree sequence input.
* **genetic_value**: Normalised genetic values.
.. rubric:: Examples

Examples
--------
See :ref:`normalise_genetic_value` section for worked examples.
"""
if var <= 0:
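The `mean`, `var` and `ddof` parameters documented above describe a standard rescaling: centre the values, divide by the sample standard deviation computed with divisor N - ddof, then scale to the target variance and shift to the target mean. The sketch below illustrates that arithmetic for a single trait. The function name `normalise` is hypothetical and plain lists replace the dataframe; how tstrait handles several `trait_id` values is not shown in this diff, so the sketch makes no assumption about it.

```python
import math


def normalise(values, mean=0.0, var=1.0, ddof=1):
    """Rescale `values` to the requested mean and variance (hypothetical helper)."""
    if var <= 0:
        raise ValueError("var must be greater than 0")
    n = len(values)
    sample_mean = sum(values) / n
    # Variance with divisor N - ddof, as described in the docstring.
    sample_var = sum((v - sample_mean) ** 2 for v in values) / (n - ddof)
    scale = math.sqrt(var / sample_var)
    return [(v - sample_mean) * scale + mean for v in values]


normalised = normalise([1.0, 2.0, 3.0, 4.0], mean=10.0, var=4.0)
```

The sketch assumes the input has non-zero variance and that N is larger than `ddof`; either violation leads to a division by zero.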
133 changes: 60 additions & 73 deletions tstrait/simulate_effect_size.py
@@ -15,12 +15,10 @@ class _FreqResult:
"""
Data class that contains simulated effect size and allele frequency.

Attributes
----------
beta_array : numpy.array
Numpy array that includes simulated effect size.
allele_freq : numpy.array
Allele frequency of each causal mutation.
:ivar beta_array: Numpy array that includes simulated effect size.
:vartype beta_array: numpy.array
:ivar allele_freq: Allele frequency of each causal mutation.
:vartype allele_freq: numpy.array
"""

beta_array: np.array
@@ -31,18 +29,16 @@ class _TraitSimulator:
"""Simulator class to select causal alleles and simulate effect sizes of causal
mutations.

Parameters
----------
ts : tskit.TreeSequence
Tree sequence data with mutation.
causal_sites : list
List of causal site IDs.
model : TraitModel
Trait model that will be used to simulate effect sizes.
alpha : float
Parameter that determines the degree of the frequency dependence model.
rng : numpy.random.Generator
Generator object that will be used to generate random numbers.
:param ts: Tree sequence data with mutation.
:type ts: tskit.TreeSequence
:param causal_sites: List of causal site IDs.
:type causal_sites: list
:param model: Trait model that will be used to simulate effect sizes.
:type model: TraitModel
:param alpha: Parameter that determines the degree of the frequency dependence model.
:type alpha: float
:param rng: Generator object that will be used to generate random numbers.
:type rng: numpy.random.Generator
"""

def __init__(self, ts, causal_sites, model, alpha, rng):
@@ -169,62 +165,53 @@ def sim_trait(
"""
Simulates traits.

Parameters
----------
ts : tskit.TreeSequence
The tree sequence data that will be used in the quantitative trait
:param ts: The tree sequence data that will be used in the quantitative trait
simulation.
model : tstrait.TraitModel
Trait model that will be used to simulate effect sizes.
num_causal : int, default None
Number of causal sites that will be randomly selected . If both `num_causal` and
`causal_sites` are None, number of causal sites will be 1.
causal_sites : list, default None
List of site IDs that have causal allele. If None, causal site IDs will be
chosen randomly according to `num_causal`.
alpha : float, default None
Parameter that determines the degree of the frequency dependence model. Please
see :ref:`frequency_dependence` for details on how this parameter influences
effect size simulation. If None, alpha will be 0.
random_seed : int, default None
Random seed of simulation. If None, simulation will be conducted randomly.

Returns
-------
pandas.DataFrame
Trait dataframe that includes simulated effect sizes.

Raises
------
ValueError
If the number of mutations in `ts` is smaller than `num_causal`.
ValueError
If both `num_causal` and `causal_sites` are specified.
ValueError
If there are repeated values in `causal_sites`.

See Also
--------
trait_model : Return a trait model, which can be used as `model` input.
genetic_value : The trait dataframe output can be used as an input to obtain
genetic values.

Notes
-----
The simulation output is given as a :py:class:`pandas.DataFrame` and contains the
following columns:

* **position**: Position of sites that have causal allele in genome coordinates.
* **site_id**: Site IDs that have causal allele. The output dataframe has sorted
site IDs.
* **effect_size**: Simulated effect size of causal allele.
* **causal_allele**: Causal allele.
* **allele_freq**: Allele frequency of causal allele. It is described in detail
in :ref:`trait_frequency_dependence`.
* **trait_id**: Trait ID.

Examples
--------
:type ts: tskit.TreeSequence
:param model: Trait model that will be used to simulate effect sizes.
:type model: tstrait.TraitModel
:param num_causal: Number of causal sites that will be randomly selected.
If both `num_causal` and `causal_sites` are None, number of causal sites
will be 1.
:type num_causal: int
:param causal_sites: List of site IDs that have causal allele. If None,
causal site IDs will be chosen randomly according to `num_causal`.
:type causal_sites: list
:param alpha: Parameter that determines the degree of the frequency
dependence model. Please see :ref:`frequency_dependence` for details on how
this parameter influences effect size simulation. If None, alpha will be 0.
:type alpha: float
:param random_seed: Random seed of simulation. If None, simulation will be
conducted randomly.
:type random_seed: int
:returns: Trait dataframe that includes simulated effect sizes.
:rtype: pandas.DataFrame
:raises ValueError: If the number of mutations in `ts` is smaller than `num_causal`.
:raises ValueError: If both `num_causal` and `causal_sites` are specified.
:raises ValueError: If there are repeated values in `causal_sites`.

.. seealso::
:func:`trait_model` Return a trait model, which can be used as `model` input.

:func:`genetic_value` The trait dataframe output can be used as an input
to obtain genetic values.

.. note::
The simulation output is given as a :py:class:`pandas.DataFrame` and contains the
following columns:

* **position**: Position of sites that have causal allele in genome
coordinates.
* **site_id**: Site IDs that have causal allele. The output dataframe
has sorted site IDs.
* **effect_size**: Simulated effect size of causal allele.
* **causal_allele**: Causal allele.
* **allele_freq**: Allele frequency of causal allele. It is described
in detail in :ref:`trait_frequency_dependence`.
* **trait_id**: Trait ID.

.. rubric:: Examples

See :ref:`sim_trait` for worked examples.
"""
ts = _check_instance(ts, "ts", tskit.TreeSequence)
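The argument rules in the `sim_trait` docstring (`num_causal` and `causal_sites` are mutually exclusive, repeated site IDs are rejected, and one causal site is chosen when neither is given) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: `choose_causal_sites` and `num_candidates` are hypothetical names, the candidate pool is taken to be the site IDs `0 .. num_candidates - 1`, and the standard-library `random` module stands in for the `numpy.random.Generator` named in the docstring, so the selected IDs will not match tstrait's output for the same seed.

```python
import random


def choose_causal_sites(num_candidates, num_causal=None, causal_sites=None,
                        random_seed=None):
    """Return sorted causal site IDs following the documented rules."""
    if num_causal is not None and causal_sites is not None:
        raise ValueError("specify only one of num_causal and causal_sites")
    if causal_sites is not None:
        if len(set(causal_sites)) != len(causal_sites):
            raise ValueError("causal_sites contains repeated values")
        return sorted(causal_sites)
    if num_causal is None:
        num_causal = 1  # documented default when both arguments are None
    if num_causal > num_candidates:
        raise ValueError("not enough candidates for num_causal")
    rng = random.Random(random_seed)
    # Sample without replacement, then sort so the site IDs are ascending.
    return sorted(rng.sample(range(num_candidates), num_causal))


sites = choose_causal_sites(10, num_causal=3, random_seed=1)
```

Returning the IDs sorted mirrors the documented guarantee that the output dataframe has sorted site IDs, which is also what `genetic_value` expects of its `trait_df` input.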