Merged
12 changes: 6 additions & 6 deletions .pre-commit-config.yaml
@@ -1,37 +1,37 @@
 # pre-commit run --all-files
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+    rev: v5.0.0
     hooks:
       - id: check-merge-conflict
       - id: debug-statements
       - id: mixed-line-ending
       - id: check-case-conflict
       - id: check-yaml
   - repo: https://github.com/asottile/reorder_python_imports
-    rev: v3.10.0
+    rev: v3.14.0
     hooks:
       - id: reorder-python-imports
         args: [--application-directories=python,
                ]
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.10.1
+    rev: v3.19.1
     hooks:
       - id: pyupgrade
         args: [--py3-plus, --py37-plus]
   - repo: https://github.com/psf/black
-    rev: 23.7.0
+    rev: 25.1.0
     hooks:
       - id: black
         language_version: python3
   - repo: https://github.com/pycqa/flake8
-    rev: 6.1.0
+    rev: 7.1.2
     hooks:
       - id: flake8
         args: [--config=.flake8]
         additional_dependencies: ["flake8-bugbear==22.10.27", "flake8-builtins==2.0.1"]
   - repo: https://github.com/adamchainz/blacken-docs
-    rev: 1.16.0
+    rev: 1.19.1
     hooks:
       - id: blacken-docs
         args: [--skip-errors]
12 changes: 4 additions & 8 deletions tstrait/__init__.py
@@ -1,11 +1,7 @@
-"""
-tstrait
-=======
-
-tstrait is a quantitative trait simulator of a tree sequence data.
-
-See https://tskit.dev/ for complete documentation.
-"""
+# tstrait
+# =======
+# tstrait is a quantitative trait simulator for tree sequence data.
+# See https://tskit.dev/ for complete documentation.
 from .provenance import __version__ # NOQA
 from .simulate_effect_size import (
     sim_trait,
104 changes: 45 additions & 59 deletions verification.py
@@ -1,62 +1,48 @@
-"""
-Script to automate verification of tstrait against known statistical
-results and benchmark programs such as AlphaSimR and simplePHENOTYPES.
-
-We have conducted the following tests:
-
-1. Exact tests
-We simulated effect sizes and phenotypes by using AlphaSimR, simplePHENOTYPES
-and the simulation framework described in ARG-Needle paper, and used the
-simulated effect sizes in tstrait to simulate phenotypes, while setting the
-environmental noise to be zero in all simulations. We then tested if the
-simulated phenotypes in tstrait exactly match the simulated phenotypes
-of external programs.
-
-This test aims to examine whether tstrait can correctly use the genetic information
-of individuals to accurately compute the genetic values. We have validated the
-tstrait's output for single trait simulation and pleiotropic trait simulation.
-These tests are implemented in `ExactTest` class.
-
-2. Comparison tests
-We simulated phenotypes in AlphaSimR, simplePHENOTYPES and the simulation
-framework described in ARG-Needle paper by using the same parameters as the
-tstrait simulation. We have simulated traits for a single individual in the
-tree sequence multiple times and examined if their phenotype distributions match
-by using a QQ-plot.
-
-This test serves as an end to end testing of tstrait with environmental noise
-simulation and tries to examine if the statistical properties of the
-simulated traits matches the output of different simulation packages.
-We have examined the tstrait output for different values of
-heritability and the alpha parameter that is used in the frequency dependence
-architecture. These tests are implemented in ComparisonTest.
-
-3. Statistical tests
-We have examined the statistical properties of tstrait's simulation output.
-The tests in `EffectSizeDistribution` examine the statistical properties of
-simulated effect sizes and the tests in `EnvironmentalNoise` examine the
-simulated environmental noise.
-
-NOTE: The properties of tstrait's simulation algorithm (such as whether it
-can correctly detect mutations in a tree sequence) are validated in unit tests.
-
-THe differences between each simulators are highlighted as the following:
-1. simplePHENOTYPES
-- Effect sizes can only be simulated from geometric series, so a normal
-distribution must be specified if we want to simulate traits where effect
-sizes are drawn from a normal distribution
-- Ancestral state is set as a causal state in simplePHENOTYPES
-
-2. AlphaSimR
-- Genetic values are normalized in the simulation process
-
-3. Simulation framework in ARG-Needle paper
-- We assume that all sites are causal
-
-These codes are largely adapted from msprime/verification.py. Please
-see its documentation for usage.
-
-"""
+# Script to automate verification of tstrait against known statistical
+# results and benchmark programs such as AlphaSimR and simplePHENOTYPES.
+# We have conducted the following tests:
+# 1. Exact tests
+# We simulated effect sizes and phenotypes by using AlphaSimR, simplePHENOTYPES
+# and the simulation framework described in the ARG-Needle paper, and used the
+# simulated effect sizes in tstrait to simulate phenotypes, while setting the
+# environmental noise to zero in all simulations. We then tested whether the
+# simulated phenotypes in tstrait exactly match the simulated phenotypes
+# of the external programs.
+# This test examines whether tstrait correctly uses the genetic information
+# of individuals to compute the genetic values. We have validated
+# tstrait's output for single trait simulation and pleiotropic trait simulation.
+# These tests are implemented in the `ExactTest` class.
+# 2. Comparison tests
+# We simulated phenotypes in AlphaSimR, simplePHENOTYPES and the simulation
+# framework described in the ARG-Needle paper by using the same parameters as the
+# tstrait simulation. We simulated traits for a single individual in the
+# tree sequence multiple times and examined whether the phenotype distributions
+# match by using a QQ-plot.
+# This test serves as an end-to-end test of tstrait with environmental noise
+# simulation and examines whether the statistical properties of the
+# simulated traits match the output of different simulation packages.
+# We have examined the tstrait output for different values of
+# heritability and the alpha parameter that is used in the frequency dependence
+# architecture. These tests are implemented in the `ComparisonTest` class.
+# 3. Statistical tests
+# We have examined the statistical properties of tstrait's simulation output.
+# The tests in `EffectSizeDistribution` examine the statistical properties of
+# simulated effect sizes and the tests in `EnvironmentalNoise` examine the
+# simulated environmental noise.
+# NOTE: The properties of tstrait's simulation algorithm (such as whether it
+# can correctly detect mutations in a tree sequence) are validated in unit tests.
+# The differences between the simulators are as follows:
+# 1. simplePHENOTYPES
+# - Effect sizes can only be simulated from geometric series, so a normal
+# distribution must be specified if we want to simulate traits where effect
+# sizes are drawn from a normal distribution
+# - Ancestral state is set as a causal state in simplePHENOTYPES
+# 2. AlphaSimR
+# - Genetic values are normalized in the simulation process
+# 3. Simulation framework in the ARG-Needle paper
+# - We assume that all sites are causal
+# This code is largely adapted from msprime/verification.py. Please
+# see its documentation for usage.
 import argparse
 import concurrent.futures
 import inspect
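
Note on the verification.py header comment: the exact tests and comparison tests it describes can be illustrated with a small standalone sketch. This is not code from verification.py or tstrait. The helper names `genetic_values` and `qq_points` and the toy data are hypothetical. The sketch assumes an additive model in which, with zero environmental noise, a phenotype equals the sum over sites of effect size times causal allele count.

```python
import random
import statistics


def genetic_values(genotypes, effect_sizes):
    """Additive genetic value per individual (hypothetical helper).

    genotypes: one row per individual, holding causal allele counts per site.
    effect_sizes: one effect size per site.
    """
    return [
        sum(count * beta for count, beta in zip(row, effect_sizes))
        for row in genotypes
    ]


def qq_points(sample_a, sample_b, num_quantiles=19):
    """Pair up matching quantiles of two samples (hypothetical helper).

    Points that lie close to the line y = x suggest similar distributions.
    """
    cuts_a = statistics.quantiles(sample_a, n=num_quantiles + 1)
    cuts_b = statistics.quantiles(sample_b, n=num_quantiles + 1)
    return list(zip(cuts_a, cuts_b))


# Exact-test idea: with zero environmental noise the phenotype equals the
# genetic value, so two simulators given the same effect sizes should agree.
toy_genotypes = [[0, 1, 2], [1, 1, 0]]  # made-up allele counts
toy_effects = [0.5, -1.0, 2.0]  # made-up effect sizes
phenotypes = genetic_values(toy_genotypes, toy_effects)

# Comparison-test idea: draw repeated phenotypes from two sources and compare
# quantiles. Both "simulators" here are the same normal distribution.
rng = random.Random(1)
sim_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sim_b = [rng.gauss(0.0, 1.0) for _ in range(1000)]
points = qq_points(sim_a, sim_b)
```

In a real comparison, `sim_a` and `sim_b` would hold phenotypes of the same individual simulated repeatedly by tstrait and by an external program, and the resulting points would be plotted rather than inspected numerically.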