GSoC 2022 - Part 1
Cheers to another summer of code!
Following my first GSoC, I had such a great experience with the PyMC community that I decided to undertake a second summer of code!
My Project
Last year, my project entailed adding a Dirichlet Process (DP) submodule to PyMC. It’s still a work-in-progress, but hopefully I will have something working by the end of the summer in PyMC experimental! In January, I was able to merge a distribution classes for weights obtained via a stick-breaking process; you can read about the process in this blogpost.
A major focal point of my 2022 GSoC project revolves around improving AePPL’s functionality for mixture models. Notably, I aim to address the following issues:
-
Support mixture graphs defined by
switch
; -
Support mixture graphs defined by
ifelse
; - Extend mixture functionality for multivariate components.
Following my first summer, I decided that it would be good to further invest time diving into a codebase where I know that I need to help to understand. Given that I needed a break from my PhD studies, this time investment into a theoretically challenging project was something that appealed to me but also a good investment for the longer-term, both on a personal and professional level. Aesara, the computational backend to PyMC, is not an easy package to wrap one’s head around; its use is non-trivial and its codebase daunting. On top of that, AePPL provides the log-probability of random graphs written in Aesara and the delineation in package differences is not obvious. As my first blogpost for this summer’s GSoC, it would be good to talk a bit about Aesara and AePPL.
Aesara 101
The best description is the one taken straight from the GitHub repository and Read the Docs: “Aesara is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.” Effectively, Aesara allows users to define mathematical expressions via graph-like structures. A more commonly used tool that permits similar symbolic computation functionalities is the very famous and expensive MATLAB.
Defining Expressions
import aesara
from aesara import tensor as at
x = at.dscalar()
y = at.dscalar()
z = x + y
The example above was taken straight from the frontpage of Aesara. Here, x
, y
and z
are tensor variables of type TensorVariable
which do not have a value associated. In Aesara, such mathematical expressions have a corresponding graph; in order to produce any meaningful numerical computation that humans would understand, we would have to compile the graphs, often optimize them and then feed values. A little more on the optimization of graphs in a few scrolls.
Evaluating Expressions
Any variable in Aesara can be evaluated by calling .eval()
but with provided appropriate inputs.
z.eval() # yields MissingInputError
z.eval({x: 3., y: 4.}) # yields array(7.)
Likewise, we can call upon aesara.function
to do the compilation for us and retrieve a function that Python users are more familiar with.
f = aesara.function([x, y], z)
f(3., 4.) # yields array(7.)
Optimizing Expressions
x_1 = at.log(at.exp(x))
x_2 = at.exp(at.log(x))
f1 = aesara.function([x], x_1)
f2 = aesara.function([x], x_2)
Observe that the exponential function is the inverse of the logarithm function and vice-versa. x_1
is in fact identical to x
, but domain restriction renders x_2
equal to x
only for positive x
. It can be written in a piecewise fashion as follows:
As such, using the very useful aesara.dprint
function to print computational graphs, f1
and f2
.
aesara.dprint(f1)
# DeepCopyOp [id A] 0
# |<TensorType(float64, ())> [id B]
aesara.dprint(f2)
# Elemwise{Composite{Switch(GE(i0, i1), i0, i2)}} [id A] 0
# |<TensorType(float64, ())> [id B]
# |TensorConstant{0} [id C]
# |TensorConstant{nan} [id D]
The printed optimized graph shows that f1
is just the identify function, but f2
has some domain restriction for the input, although the exact restriction is not exactly obvious from the printed graph.
AePPL 201
(201 because there is no 101 due to its complexity...)
Note that, in the description of Aesara, there is no mentioning of random variables, log-probability nor stochastic primitives. This is because Aesara is the computational framework that solely focuses on defining and optimizing the graph-like that underpins mathematical expressions irrespective of the deterministic or stochastic nature of variables.
A good example of AePPL’s utility is its ability to provide log-probabilities is in the case of a two component mixture model. Consider a random variable \(Z\) with a mixture density \(f_Z(z)\) as follows:
\[f_Z(z) = w_1 f_{Z_1}(z) + w_2 f_{Z_2}(z)\]with \(Z_1 \sim \mathcal{N}(-5, 0.1)\) and \(Z_2 \sim \mathcal{N}(5, 0.1)\). Equivalently, the Aesara graph can be constructed using an at.stack
operation:
from aesara import tensor as at
from aeppl import joint_logprob
srng = at.random.RandomStream(seed=2320)
I_rv = srng.bernoulli(0.5, name="I")
X1_rv = srng.normal(loc=-5, scale=0.1, name="X")
X2_rv = srng.normal(loc=5, scale=0.1, name="X")
Z_rv = at.stack([X1_rv, X2_rv])[I_rv]
logp = joint_logprob({Z_rv: Z_rv.copy(), I_rv: I_rv.copy()})
As such, the log-probability function is able to provide the density of the correct component using indexing here:
logp.eval({z_vv: 5, i_vv: 1}) # array(0.69049938)
logp.eval({z_vv: 5, i_vv: 0}) # array(-4999.30950062)