
Commit f0d032c

gasse authored and AntoinePrv committed
Switched back to Mathjax, replaced the argmax and nindep commands
1 parent c2f9a31 · commit f0d032c

2 files changed, +10 -24 lines changed


docs/conf.py.in

Lines changed: 0 additions & 14 deletions
@@ -53,20 +53,6 @@ napoleon_google_docstring = False
 napoleon_numpy_docstring = True


-# LaTex configuration (for math)
-extensions += ["sphinx.ext.imgmath"]
-imgmath_image_format = "svg"
-imgmath_latex_preamble = r'''
-\DeclareMathOperator*{\argmax}{arg\,max}
-\DeclareMathOperator*{\argmin}{arg\,min}
-\newcommand\indep{\protect\mathpalette{\protect\independenT}{\perp}}
-\def\independenT#1#2{\mathop{\rlap{$#1#2$}\mkern2mu{#1#2}}}
-\newcommand\nindep{\protect\mathpalette{\protect\nindependenT}{\perp}}
-\def\nindependenT#1#2{\mathop{\rlap{$#1#2$}\mkern2mu{\not#1#2}}}
-\newcommand{\overbar}[1]{\mkern 1.5mu\overline{\mkern-1.5mu#1\mkern-1.5mu}\mkern 1.5mu}
-'''
-
-
 # Preprocess docstring to remove "core" from type name
 def preprocess_signature(app, what, name, obj, options, signature, return_annotation):
     if signature is not None:
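
The commit message says the docs were switched back to MathJax, but this hunk only shows the imgmath setup being removed; in recent Sphinx releases MathJax is the default HTML math renderer, which may be why no replacement configuration appears. As a rough, hypothetical sketch only, re-declaring an equivalent macro through MathJax in conf.py could look like the following (option names assume Sphinx >= 4 with MathJax 3; none of this is part of the commit, which instead inlines the commands in the .rst file):

    # Hypothetical sketch -- none of this appears in the commit.
    # Render math in the browser with MathJax instead of pre-rendered SVG images.
    extensions += ["sphinx.ext.mathjax"]

    # Custom TeX macros could be declared through MathJax itself rather than a
    # LaTeX preamble ("mathjax3_config" assumes Sphinx >= 4, bundling MathJax 3).
    mathjax3_config = {
        "tex": {
            "macros": {
                # One-argument macro: \argmax{\pi} renders "arg max" with \pi underneath.
                "argmax": [r"\underset{#1}{\operatorname{arg\,max}}", 1],
            }
        }
    }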

docs/discussion/theory.rst

Lines changed: 10 additions & 10 deletions
@@ -1,7 +1,7 @@
 Ecole Theoretical Model
 =======================

-The ECOLE API and classes directly relate to the different components of
+The Ecole API and classes directly relate to the different components of
 an episodic `partially-observable Markov decision process <https://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process>`_
 (PO-MDP).

@@ -20,7 +20,7 @@ Consider a regular Markov decision process
 .. note::

    The choice of having deterministic rewards :math:`r_t = R(s_t)` is
-   arbitrary here, in order to best fit the ECOLE API. Note that it is
+   arbitrary here, in order to best fit the Ecole API. Note that it is
    not a restrictive choice though, as any MDP with stochastic rewards
    :math:`r_t \sim p_{reward}(r_t|s_{t-1},a_{t-1},s_{t})`
    can be converted into an equivalent MDP with deterministic ones,
@@ -56,16 +56,16 @@ reward,
 .. math::
    :label: mdp_control

-   \pi^\star = \argmax_{\pi} \lim_{T \to \infty}
-   \mathbb{E}_\tau\left[\sum_{t=0}^{T} r_t\right]
+   \pi^\star = \underset{\pi}{\operatorname{arg\,max}}
+   \lim_{T \to \infty} \mathbb{E}_\tau\left[\sum_{t=0}^{T} r_t\right]
    \text{,}

 where :math:`r_t := R(s_t)`.

 .. note::

    In the general case this quantity may not be bounded, for example for MDPs
-   that correspond to continuing tasks. In ECOLE we garantee that all
+   that correspond to continuing tasks. In Ecole we garantee that all
    environments correspond to **episodic** tasks, that is, each episode is
    garanteed to start from an initial state :math:`s_0`, and end in a
    terminal state :math:`s_{final}`. For convenience this terminal state can
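
The removed line above used \argmax, which was defined by the \DeclareMathOperator* line in the imgmath preamble deleted in the other file of this commit; the replacement spells the operator out inline so it no longer depends on any preamble definition. For reference, a minimal standalone LaTeX sketch (not part of the commit) showing that both spellings place the subscript under the operator in display math:

    \documentclass{article}
    \usepackage{amsmath}
    \usepackage{amssymb}
    % Old approach: custom operator from the removed imgmath preamble.
    \DeclareMathOperator*{\argmax}{arg\,max}
    \begin{document}
    % Both lines typeset "arg max" with \pi placed underneath in display math.
    \[ \pi^\star = \argmax_{\pi} \lim_{T \to \infty} \mathbb{E}_\tau\left[\sum_{t=0}^{T} r_t\right] \]
    \[ \pi^\star = \underset{\pi}{\operatorname{arg\,max}} \lim_{T \to \infty} \mathbb{E}_\tau\left[\sum_{t=0}^{T} r_t\right] \]
    \end{document}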
@@ -95,7 +95,7 @@ non-Markovian nature of those trajectories, that is,

 .. math::

-   o_{t+1},r_{t+1} \nindep o_0,r_0,a_0,\dots,o_{t-1},r_{t-1},a_{t-1} \mid o_t,r_t,a_t
+   o_{t+1},r_{t+1} \mathop{\rlap{\perp}\mkern2mu{\not\perp}} o_0,r_0,a_0,\dots,o_{t-1},r_{t-1},a_{t-1} \mid o_t,r_t,a_t
    \text{,}

 the decision-maker must take into account the whole history of past
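
Likewise, \nindep (the "not conditionally independent" relation) was defined in the deleted preamble via \mathpalette; the new line inlines a simplified overprint that MathJax accepts directly. A standalone sketch of the same construction (illustrative only; note that plain LaTeX needs the \rlap argument wrapped in $...$, unlike the MathJax form in the hunk above):

    \documentclass{article}
    \begin{document}
    % "Not independent": overprint a \perp with a second, negated \perp shifted
    % right by 2mu, and treat the pair as a single operator symbol.
    \[ o_{t+1},r_{t+1} \mathop{\rlap{$\perp$}\mkern2mu{\not\perp}} o_t,r_t,a_t \]
    \end{document}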
@@ -117,14 +117,14 @@ The PO-MDP control problem can then be written identically to the MDP one,
 .. math::
    :label: pomdp_control

-   \pi^\star = \argmax_{\pi} \lim_{T \to \infty}
+   \pi^\star = \underset{\pi}{\operatorname{arg\,max}} \lim_{T \to \infty}
    \mathbb{E}_\tau\left[\sum_{t=0}^{T} r_t\right]
    \text{.}

-ECOLE as PO-MDP components
+Ecole as PO-MDP components
 --------------------------

-The following ECOLE components can be directly translated into PO-MDP
+The following Ecole components can be directly translated into PO-MDP
 components from the above formulation:

 * :py:class:`~ecole.typing.RewardFunction` <=> :math:`R`
@@ -160,6 +160,6 @@ environment.

 As can be seen from :eq:`pomdp_control`, the initial reward :math:`r_0`
 returned by :py:meth:`~ecole.environment.EnvironmentComposer.reset`
-does not affect the control problem. In ECOLE we
+does not affect the control problem. In Ecole we
 nevertheless chose to preserve this initial reward, in order to obtain
 meaningfull cumulated episode rewards (e.g., total running time).
