diff --git a/docs/stats.md b/docs/stats.md
index 87a2ed5f83..700a0268e2 100644
--- a/docs/stats.md
+++ b/docs/stats.md
@@ -684,14 +684,319 @@ and boolean expressions (e.g., {math}`(x > 0)`) are interpreted as 0/1.
   and {math}`v_j` is the covariance of the trait with the j-th covariate.
 
 
+(sec_stats_multi_site)=
+
 ## Multi site statistics
 
-:::{todo}
-Document statistics that use information about correlation between sites, such as
-LdCalculator (and perhaps reference {ref}`sec_identity`). Note that if we have a general
-framework which has the same calling conventions as the single site stats,
-we can rework the sections above.
-:::
+(sec_stats_two_locus)=
+
+### Two-locus statistics
+
+The {meth}`~TreeSequence.ld_matrix` method provides an interface to
+a collection of two-locus statistics with predefined summary functions (see
+{ref}`sec_stats_two_locus_summary_functions`) and `site` and `branch`
+{ref}`modes <sec_stats_mode>`. The LD matrix method differs from other
+statistics methods in that it provides a unified API with an argument to
+specify different two-locus summaries of the data. It otherwise behaves
+similarly to most other functions with respect to `sample_sets` and `indexes`.
+
+Two-locus statistics can be computed using two modes, either `site` or
+`branch`, and these should be interpreted in the same way as these modes in the
+single-site statistics. Within this framework, statistics may be either
+polarised or unpolarised. For statistics that are polarised, we compute
+statistic values for pairs of derived alleles. Unpolarised statistics compute
+statistics over all possible alleles, derived and ancestral, and the result in
+averaged over statistics computed for all pairs of alleles (see weighting
+schemes below). The option for polarisation is not exposed to the user.
+Instead, we implement statistics that are polarised where appropriate.
+
+(sec_stats_two_locus_site)=
+
+#### Site
+
+The `site` mode computes two-locus statistics summarized over alleles between
+all pairs of specified sites. The default behavior, leaving `sites`
+unspecified, will compute a matrix for all pairs of sites, with
+one row and column for each site in the tree sequence (i.e., an n x n
+matrix where n is the number of sites in the tree sequence). We can also
+restrict the output to a subset of sites, either by specifying a single vector
+for both rows and columns or a pair of vectors for the row sites and column
+sites separately.
+
+The following computes a matrix of the {math}`r^2` measure of linkage
+disequilibrium (LD) computed pairwise between the first 4 sites in the tree
+sequence among all samples. In our computations, row sites are used as the row
+("left-hand") loci and column sites are used as the column ("right-hand")
+locus, and with a single list of sites specified, we obtain a symmetric square
+matrix. 
+
+```{code-cell} ipython3
+ld = ts.ld_matrix(sites=[[0, 1, 2, 3]])
+print(ld)
+```
+
+The following demonstrates how we can specify the row and column sites
+independently of each other. We're specifying 3 columns and 2 rows, which
+computes a subset of the matrix shown above.
+
+```{code-cell} ipython3
+ld = ts.ld_matrix(sites=[[0, 1], [1, 2, 3]])
+print(ld)
+```
+
+Because we allow for two-locus statistics to be computed for multi-allelic
+data, we need to be able to combine statistical results from each pair of
+alleles into one summary for a pair of sites. We use two implementations for
+combining results from multiple alleles: `hap_weighted` and `total_weighted`.
+These are statistic-specific and not chosen by the user.
+
+Briefly, consider a pair of sites with {math}`n` alleles at the first locus and
+{math}`m` alleles at the second. Write {math}`f_{i,j}` as the statistic
+computed for focal alleles {math}`A_i` and {math}`B_j`, with haplotype weights
+{math}`(A_i B_j, A_i b_j, a_i B_j)`, where {math}`a_i` and {math}`b_j` are the
+collection of alleles that are not the focal alleles {math}`A_i` or
+{math}`B_j`, respectively. Then the weighting schemes are defined as:
+
+- `hap_weighted`: {math}`\sum_{i=1}^{n}\sum_{j=1}^{m}p(A_{i}B_{j})f_{ij}`,
+  where {math}`p(A_{i}B_{j})` is the frequency of haplotype {math}`A_{i}B_{j}`.
+  This method was first introduced in [Karlin
+  (1981)](https://doi.org/10.1111/j.1469-1809.1981.tb00308.x) and reviewed in
+  [Zhao (2007)](https://doi.org/10.1017/S0016672307008634).
+
+- `total_weighted`: {math}`\frac{1}{n m}\sum_{i=1}^{n}\sum_{j=1}^{m}f_{ij}`.
+  This method assigns equal weight to each of the possible pairs of focal
+  alleles at the two sites, taking the arithmetic mean of statistics over
+  focal haplotypes.
+
+Out of all of the available summary functions, only {math}`r^2` uses
+`hap_weighted` normalisation, with the remainder using uniform weighting
+(`total_weighted`).
+
+(sec_stats_two_locus_branch)=
+
+#### Branch
+
+The `branch` mode computes expected two-locus statistics between pairs of
+trees, conditioned on the marginal topologies and branch lengths of those
+trees. The trees for which we compute statistics are specified by positions,
+and for a pair of positions we consider all possible haplotypes that could be
+generated by a single mutation occurring at the two trees.
+
+For two trees, one with {math}`n` branches and the other with {math}`m`
+branches, there are {math}`nm` possible pairs of branches that may carry the
+pair of mutations. For each pair, we compute the two-locus statistic, and then
+sum these values weighted by the product of the two branch lengths. Given the
+two mutations occur, this accounts for the relative probability that the two
+mutations fall on any pair of branches.
+
+The time complexity of this method is quadratic in the number of samples,
+due to the pairwise comparisons of branches from each pair of trees.
+By default, this method computes
+a symmetric matrix for all pairs of trees, with rows and columns representing
+each tree in the tree sequence. Similar to the site method, we can restrict the
+output to a subset of trees, either by specifying a vector of positions or
+a pair of vectors for row and column positions separately. To select a specific
+tree, the specified positions must land in the tree span (`[start, end)`).
+
+In the following, we compute a matrix of expected {math}`r^2` within and
+between the first 4 trees in the tree sequence. The tree breakpoints are
+a convenient way to specify those first four trees.
+
+```{code-cell} ipython3
+ld = ts.ld_matrix(mode="branch", positions=[ts.breakpoints(as_array=True)[0:4]])
+print(ld)
+```
+
+Again, we can specify the row and column trees separately.
+
+```{code-cell} ipython3
+breakpoints = ts.breakpoints(as_array=True)
+ld = ts.ld_matrix(mode="branch", positions=[breakpoints[[0]], breakpoints[0:4]])
+print(ld)
+```
+
+(sec_stats_two_locus_sample_sets)=
+
+#### Sample Sets
+
+The two-locus API provides a mechanism by which to subset the samples under
+consideration, providing the ability to compute a separate LD matrix for each
+sample set or an LD matrix between sample sets. Output dimensions are handled
+in the same manner as the rest of the stats API (see
+{ref}`sec_stats_output_dimensions`). The two-locus API differs from the rest of
+the stats API in that we handle one-way and two-way statistics in the same
+function call.
+
+To compute a two-way two-locus statistic, the `index` argument must be
+provided. The statistics are selected in the same way (with the `stat`
+argument), but we provide a restricted set of two-way statistics (see
+{ref}`sec_stats_two_locus_summary_functions_two_way`). The dimension-dropping
+rules for the result follow the rest of the tskit stats API in that a single list
+or tuple will produce a single two-dimensional matrix, while list of these
+will produce a three-dimensional array, with the outer dimension of length
+equal to the length of the list.
+
+For concreteness, we would expect the following dimensions with the specified
+`sample_sets` and `indexes` arguments.
+
+```
+# one-way
+ts.ld_matrix(sample_sets=None) -> 2 dimensions
+ts.ld_matrix(sample_sets=[0, 1, 2, 3]) -> 2 dimensions
+ts.ld_matrix(sample_sets=[[0, 1, 2, 3]]) -> 3 dimensions
+# two-way
+ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=(0, 1)) -> 2 dimensions
+ts.ld_matrix(sample_sets=[[0, 1, 2, 3], [4, 5, 6, 7]], indexes=[(0, 1)]) -> 3 dimensions
+```
+
+By computing the LD statistic for a subset of samples in the tree sequence, it
+is possible that statistics are computed over pairs of sites that are not
+variable within the sample set. This can result in zero- or nan-valued statistics
+as outputs. To avoid this, the tree sequence may first be simplified to the
+sample subset, which will remove sites that are invariant within that subset:
+
+```{code-cell} ipython3
+ts = msprime.sim_ancestry(
+    20,
+    population_size=10000,
+    sequence_length=1000,
+    recombination_rate=2e-8,
+    random_seed=12)
+ts = msprime.sim_mutations(ts, rate=2e-8, random_seed=12)
+
+ld_all = ts.ld_matrix(mode="site")
+print(ld_all)
+```
+
+Considering a subset of samples:
+
+```{code-cell} ipython3
+ld_sub = ts.ld_matrix(mode="site", sample_sets=range(8))
+print(ld_sub)
+```
+
+Simplifying and computing the LD matrix avoids computing {math}`r^2` for
+invariant sites in the subsample:
+
+```{code-cell} ipython3
+ts_simp = ts.simplify(samples=range(8), filter_sites=True)
+ld_sub = ts_simp.ld_matrix(mode="site", sample_sets=range(8))
+print(ld_sub)
+```
+
+Any remaining sites that produce {math}`r^2` with a nan values are those with
+a mutation that occurs above the root of the tree, so that all samples in the
+subset carry the derived allele (so that its allele frequency is 1). Also note
+that under certain mutation models, back mutations may create sites that are
+invariant in a sample, and this will also result in nan values observed in the
+LD matrix.
+
+(sec_stats_two_locus_sample_one_way_stats)=
+
+##### One-way Statistics
+
+One-way statistics are summaries of two loci in a single sample set, using a
+triple of haplotype counts {math}`\{w_{Ab}, w_{aB}, w_{AB}\}` and the size of
+the sample set {math}`n`, where the capitalized and lowercase letters in our notation
+represent alternate alleles.
+
+(sec_stats_two_locus_sample_two_way_stats)=
+
+##### Two-way Statistics
+
+Two-way statistics are summaries of haplotype counts between two sample sets,
+which operate on the three haplotype counts (as in one-way stats, above)
+computed from each sample set, indexed by `(i, j)`. These statistics take on
+a different meaning from their one-way counterparts. For instance {math}`D_i
+D_j` can be thought of as the covariance between two loci when computed for
+a haplotype in a single sample set.
+
+Only a subset of our summary functions are two-way statistics (see
+{ref}`sec_two_locus_summary_functions_two_way`). Note that the unbiased two-way
+statistics expect non-overlapping sample sets, and we do not make any
+assertions about the sample sets and assume that `i` and `j` represent disjoint
+sets of samples (see also the note in {meth}`~TreeSequence.divergence`).
+
+(sec_stats_two_locus_summary_functions)=
+
+#### Summary Functions
+
+(sec_stats_two_locus_summary_functions_one_way)=
+
+##### One-way
+
+The two-locus summary functions all take haplotype counts and sample set size as
+input. Each of our summary functions has the signature
+{math}`f(w_{Ab}, w_{aB}, w_{AB}, n)`, converting to haplotype frequencies
+{math}`\{p_{Ab}, p_{aB}, p_{AB}\}` where appropriate by dividing by {math}`n`.
+
+`D`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{ab} - p_{a}p_{b}`
+
+  This statistic is polarised, as the unpolarised result, which averages over
+  allele labelings, is zero. Uses the `total` weighting method.
+
+`D_prime`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{D_{\max}}`,
+
+  where {math}
+            \min\{p_A (1-p_B), p_B (1-p_B)\} & \textrm{if }D>=0 \\
+            \min\{p_A p_B, (1-p_B) (1-p_B)\} & \textrm{otherwise}
+        \end{cases}```
+
+  and {math}`D` is defined above. Polarised, `total` weighted.
+
+`D2`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = (p_{ab} - p_{a} p_{b})^2`
+
+  Unpolarised, `total` weighted.
+
+`Dz`
+:  {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D (1 - 2 p_{a})(1 - 2p_{b})`,
+
+  where {math}`D` is defined above. Unpolarised, `total` weighted.
+
+`pi2`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = p_{a}p_{b}(1-p_{a})(1-p_{b})`
+
+  Unpolarised, `total` weighted.
+
+`r`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D}{\sqrt{p_{a}p_{b}(1-p_{a})(1-p_{b})}}`,
+
+  where {math}`D` is defined above. Polarised, `total` weighted.
+
+`r2`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{D^{2}}{p_{a}p_{b}(1-p_{a})(1-p_{b})}`,
+
+  where {math}`D` is defined above. Unpolarised, `haplotype` weighted.
+
+Unbiased two-locus statistics from the Hill-Robertson (1968) system are
+computed from haplotype counts. Definitions of these unbiased estimators can
+be found in [Ragsdale and Gravel
+(2020)](https://doi.org/10.1093/molbev/msz265). They require at least 4 samples
+to be valid and are specified as `stat="D2_unbiased"`, `"Dz_unbiased"`, or
+`"pi2_unbiased"`.
+
+(sec_two_locus_summary_functions_two_way)=
+
+(sec_stats_two_locus_summary_functions_two_way)=
+
+##### Two-way
+
+Two-way statistics are indexed by sample sets {math}`i, j` and compute values
+using haplotype counts within pairs of sample sets.
+
+`D2`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = D_i * D_j`,
+
+where {math}`D` is defined above.
+
+`r2`
+: {math}`f(w_{Ab}, w_{aB}, w_{AB}, n) = \frac{(p_{AB_i} - (p_{A_i}  p_{B_i})) (p_{AB_j} - (p_{A_j}  p_{B_j}))}{\sqrt{p_{A_i} (1 - p_{A_i}) p_{B_i} (1 - p_{B_i})}\sqrt{p_{A_j} (1 - p_{A_j}) p_{B_j} (1 - p_{B_j})}}`
+
+And `D2_unbiased`, which can be found in [Ragsdale and Gravel
+(2020)](https://doi.org/10.1093/molbev/msz265).
 
 
 (sec_stats_notes)=
diff --git a/python/CHANGELOG.rst b/python/CHANGELOG.rst
index 916412cc81..bc9b6217da 100644
--- a/python/CHANGELOG.rst
+++ b/python/CHANGELOG.rst
@@ -7,6 +7,12 @@ In development
 - Add ``json+struct`` metadata codec that allows storing binary data using a struct
   schema alongside JSON metadata. (:user:`benjeffery`, :pr:`3306`)
 
+**Features**
+
+- Add ``TreeSequence.ld_matrix`` stats method and documentation, for computing
+  two-locus statistics in site and branch mode.
+  (:user:`lkirk`, :user:`apragsdale`, :pr:`3416`) 
+
 --------------------
 [1.0.2] - 2026-03-06
 --------------------
diff --git a/python/tskit/trees.py b/python/tskit/trees.py
index 45d2da59e0..2f0d9e9089 100644
--- a/python/tskit/trees.py
+++ b/python/tskit/trees.py
@@ -10930,12 +10930,92 @@ def impute_unknown_mutations_time(
     def ld_matrix(
         self,
         sample_sets=None,
-        sites=None,
-        positions=None,
         mode="site",
         stat="r2",
+        sites=None,
+        positions=None,
         indexes=None,
     ):
+        r"""
+
+        Returns a matrix of the specified two-locus statistic (default
+        :math:`r^2`) computed from sample allelic states or branch lengths.
+        The resulting linkage disequilibrium (LD) matrix represents either the
+        two-locus statistic as computed between all pairs of specified
+        ``sites`` ("site" mode, producing a num_sites-by-num_sites sized
+        matrix), or as computed from the branch structures at marginal trees
+        between pairs of trees at all specified ``positions`` ("branch" mode,
+        producing a num_positions-by-num_positions sized matrix).
+
+        In the site mode, the sites under consideration can be restricted using
+        the ``sites`` argument. Sites can be passed as a list of lists,
+        specifying the ``[[row_sites], [col_sites]]``, resulting in a
+        rectangular matrix, or by specifying a single list of ``[sites]``, in
+        which a square matrix will be produced (see
+        :ref:`sec_stats_two_locus_site` for examples).
+
+        Similarly, in the branch mode, the ``positions`` argument specifies
+        loci for which the expectation for the two-locus statistic is computed
+        over pairs of trees at those positions. LD stats are computed between
+        trees whose ``[start, end)`` contains the given position (such that
+        repeats of trees are possible). Similar to the site mode, a nested list
+        of row and column positions can be specified separately (resulting in a
+        rectangular matrix) or a single list of a specified positions results
+        in a square matrix (see :ref:`sec_stats_two_locus_branch` for
+        examples).
+
+        Some LD statistics are defined for two sample sets instead of within a
+        single set of samples. If the ``indexes`` argument is specified, at
+        least two sample sets must also be specified. ``indexes`` specifies the
+        indexes of the sample sets in the ``sample_sets`` list
+        between which to compute LD.
+
+        For more on how the ``indexes`` and ``sample_sets`` interact with the
+        output dimensions, see the :ref:`sec_stats_two_locus_sample_sets`
+        section.
+
+        **Available Stats** (use ``Stat Name`` in the ``stat`` keyword
+        argument).
+
+        ======================= ========== ================ ==============
+        Stat                     Polarised Multi Sample Set Stat Name
+        ======================= ========== ================ ==============
+        :math:`r^2`              n          y               "r2"
+        :math:`r`                y          n               "r"
+        :math:`D^2`              n          y               "D2"
+        :math:`D`                y          n               "D"
+        :math:`D'`               y          n               "D_prime"
+        :math:`D_z`              n          n               "Dz"
+        :math:`\pi_2`             n          n               "pi2"
+        :math:`\widehat{D^2}`    n          y               "D2_unbiased"
+        :math:`\widehat{D_z}`    n          n               "Dz_unbiased"
+        :math:`\widehat{\pi_2}`  n          n               "pi2_unbiased"
+        ======================= ========== ================ ==============
+
+        :param list sample_sets: A list of lists of sample Node IDs, specifying
+            the groups of nodes to compute the statistic with. Defaults to all
+            samples grouped by population.
+        :param str mode: A string giving the "type" of the statistic to be
+            computed. Defaults to "site", can be "site" or "branch".
+        :param str stat: A string giving the selected two-locus statistic to
+            compute. Defaults to "r2".
+        :param list sites: A list of sites over which to compute LD. Can be
+            specified as a list of lists to control the row and column sites.
+            Only applicable in site mode. Specify as
+            ``[[row_sites], [col_sites]]`` or ``[all_sites]``.
+            Defaults to all sites.
+        :param list positions: A list of genomic positions where expected LD is
+            computed. Only applicable in branch mode. Can be specified as a list
+            of lists to control the row and column positions. Specify as
+            ``[[row_positions], [col_positions]]`` or ``[all_positions]``.
+            Defaults to the leftmost coordinates of all trees.
+        :param list indexes: A list of 2-tuples or a single 2-tuple, specifying
+            the indexes of two sample sets over which to compute a two-way LD
+            statistic. Only :math:`r^2`, :math:`D^2`, and :math:`\widehat{D^2}`
+            are implemented for two-way statistics.
+        :return: A 2D or 3D array of LD matrices.
+        :rtype: numpy.ndarray
+        """
         one_way_stats = {
             "D": self._ll_tree_sequence.D_matrix,
             "D2": self._ll_tree_sequence.D2_matrix,