Probula is a Bayesian inference framework implemented in Scala 3 in a purely functional style. It was created as a small alternative to the core of Figaro at a time when Figaro remained on Scala 2, which made Figaro difficult to use for teaching in a Scala 3 based course.
Probula represents hierarchical Bayesian models as multivariate distributions and performs inference via importance sampling over lazy streams.
Example
A Probula model starts from the Probula object, which offers constructors
for independent variables (priors), dependent variables, and likelihoods.
Every model evaluates to a Dist[T] object, where T is a tuple of
all variables in the model (so generatively a model is a generator of tuples).
For example, the following is a generator of pairs:
val gen = Probula
  .uniformC("x")(-100.0, 100.0)
  .probDep("y") { x => Probula.gaussian(2.0 * x + 1.0, 1.0) }
where x is uniformly distributed between -100 and 100, and y is a dependent
variable, a linear transformation of x with some Gaussian noise.
We can generate a sample of 20 values from the gen model as follows:
val data = gen.sample(20.sampleSize).values
The values method extracts a lazy list of samples represented as tuples; in
this case, pairs of doubles, (x, y).
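To make the "generator of tuples" reading concrete, here is a minimal plain-Scala sketch of the same generative process. It does not use Probula and is not how Probula is implemented; the names drawPair, pairs and sketchData are hypothetical, and the Gaussian noise is assumed to have standard deviation 1.

```scala
import scala.util.Random

// Plain-Scala stand-in for the `gen` model above (illustrative only).
// x ~ Uniform(-100, 100); y ~ Normal(2x + 1, 1)
def drawPair(rng: Random): (Double, Double) =
  val x = -100.0 + 200.0 * rng.nextDouble()
  val y = 2.0 * x + 1.0 + rng.nextGaussian() // noise with assumed sd 1
  (x, y)

// A lazy, conceptually unbounded stream of (x, y) pairs.
def pairs(rng: Random): LazyList[(Double, Double)] =
  LazyList.continually(drawPair(rng))

// Force only the first 20 elements of the stream.
val sketchData: List[(Double, Double)] =
  pairs(new Random(42)).take(20).toList
```

Because the stream is lazy, nothing is drawn until take(20).toList forces it, which mirrors the idea of a model as a description that is sampled on demand.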
We use the above sample as an observed data set (in the likelihood) in a
Bayesian linear regression model to recover the model parameters a, b and
σ using Bayesian inference.
val model = Probula
  .gaussian("a")(0.0, 10.0) // prior
  .gaussian("b")(0.0, 3.0) // prior
  .uniformC("σ")(0.0, 3.0) // prior
  .likelihood(data): (x: Double, y: Double) => // likelihood
    (a, b, σ) =>
      Probula.gaussian(a * x + b, σ).observe(y)
val sample = model.sample(10000.sampleSize) // posterior
println(s"E(a) = ${sample._1.mean}")
println(s"E(b) = ${sample._2.mean}")
Nodes "a", "b", "σ" are priors. We use the Gaussian likelihood to observe the
data. Naming columns is not required (the name parameter group can be dropped),
but makes export easier to interpret.
Posterior Inference
Importance Sampling
Probula uses importance sampling: Dist.sample(n) draws n
samples from the prior and weights each one by its likelihood score.
The result is an IData[T] object supporting statistical queries:
- mean, expectedValue: weighted sample mean
- median: weighted sample median
- variance: unbiased weighted variance
- probability(p), pr(p): estimated probability of an event
- percentile(q): weighted q-th percentile ($q \in [0, 1]$)
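The idea behind prior-based importance sampling and the weighted queries listed above can be sketched in plain Scala. This is an illustration under stated assumptions, not Probula's implementation: the toy one-parameter model, the fixed data, and every name below (logNormal, importanceSample, weightedMean, weightedPr, weightedPercentile) are hypothetical, and Normal(m, s) takes s as a standard deviation.

```scala
import scala.util.Random

// Toy model (illustrative, not Probula code):
//   prior:      a ~ Normal(0, 10)
//   likelihood: y_i ~ Normal(a * x_i, 1)
val xs = Vector(1.0, 2.0, 3.0, 4.0, 5.0)
val ys = Vector(2.1, 3.9, 6.2, 7.8, 10.1)

// Log-density of Normal(mean, sd) at y
def logNormal(y: Double, mean: Double, sd: Double): Double =
  val z = (y - mean) / sd
  -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.Pi)

// Log-likelihood of the whole data set for a given slope a
def logLik(a: Double): Double =
  xs.zip(ys).map((x, y) => logNormal(y, a * x, 1.0)).sum

// Draw n values from the prior and pair each with a normalized weight.
def importanceSample(n: Int, rng: Random): Vector[(Double, Double)] =
  val draws = Vector.fill(n)(10.0 * rng.nextGaussian())
  val logW = draws.map(logLik)
  val maxLogW = logW.max // subtract the max to avoid underflow in exp
  val w = logW.map(l => math.exp(l - maxLogW))
  val total = w.sum
  draws.zip(w.map(_ / total))

// Weighted mean: sum of value * weight (weights already sum to 1)
def weightedMean(s: Vector[(Double, Double)]): Double =
  s.map((a, w) => a * w).sum

// Estimated probability of an event: total weight of samples satisfying p
def weightedPr(s: Vector[(Double, Double)])(p: Double => Boolean): Double =
  s.collect { case (a, w) if p(a) => w }.sum

// Smallest sampled value whose cumulative weight reaches q, q in [0, 1]
def weightedPercentile(s: Vector[(Double, Double)], q: Double): Double =
  val sorted = s.sortBy(_._1)
  val cumulative = sorted.scanLeft(0.0)((acc, p) => acc + p._2).tail
  sorted.zip(cumulative).find(_._2 >= q).map(_._1._1).getOrElse(sorted.last._1)

val posterior = importanceSample(10000, new Random(1))
val meanA = weightedMean(posterior)
val medianA = weightedPercentile(posterior, 0.5)
val prAbove = weightedPr(posterior)(_ > 1.5)
```

Most prior draws land far from the data and receive negligible weight, so only a small fraction of the 10000 samples contributes to the estimates. This is the usual cost of sampling from the prior, and it grows as the prior becomes wider relative to the posterior.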
Grid Approximation
For models where all distributions have a known density
(HasDensity – all built-in distributions have it),
grid approximation evaluates the posterior exactly on
a discrete grid. Use the doubles DSL to build grids:
val grid = for
  a <- 50 doubles (-10.0 -> 10.0)
  b <- 50 doubles (-5.0 -> 5.0)
  σ <- 50 doubles (0.01 -> 3.0)
yield (a, b, σ)
val posterior = model.gridApproximation(grid)
println(s"E(a) = ${posterior._1.mean}")
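Grid approximation evaluates the posterior exactly at the grid points, but the number of points grows exponentially in the number of parameters (50 points per axis over three parameters already gives 125,000 points), so it works only for models with a small number of parameters.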
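As a rough illustration of what grid approximation computes, the following plain-Scala sketch evaluates an unnormalized posterior on a two-parameter grid and normalizes it. It is not Probula's implementation: the toy model with a fixed noise level, the data, and the names linspace, logPosterior and weights are all assumptions made for this example, and Normal(m, s) takes s as a standard deviation.

```scala
// Toy model (illustrative, not Probula code):
//   priors:     a ~ Normal(0, 10), b ~ Normal(0, 3)
//   likelihood: y_i ~ Normal(a * x_i + b, 1)
val xs = Vector(1.0, 2.0, 3.0, 4.0, 5.0)
val ys = Vector(3.1, 4.9, 7.2, 8.8, 11.1)

// Log-density of Normal(mean, sd) at y
def logNormal(y: Double, mean: Double, sd: Double): Double =
  val z = (y - mean) / sd
  -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.Pi)

// n evenly spaced points from lo to hi, both ends included
def linspace(lo: Double, hi: Double, n: Int): Vector[Double] =
  Vector.tabulate(n)(i => lo + (hi - lo) * i / (n - 1))

// Cartesian product of the per-parameter grids: 201 * 201 points
val grid: Vector[(Double, Double)] =
  for
    a <- linspace(-10.0, 10.0, 201)
    b <- linspace(-5.0, 5.0, 201)
  yield (a, b)

// Unnormalized log-posterior: log prior + log likelihood
def logPosterior(a: Double, b: Double): Double =
  val logPrior = logNormal(a, 0.0, 10.0) + logNormal(b, 0.0, 3.0)
  val logLik = xs.zip(ys).map((x, y) => logNormal(y, a * x + b, 1.0)).sum
  logPrior + logLik

// Evaluate every grid point, then normalize so the weights sum to 1
val logP = grid.map((a, b) => logPosterior(a, b))
val maxLogP = logP.max // subtract the max to avoid underflow in exp
val unnorm = logP.map(l => math.exp(l - maxLogP))
val weights = unnorm.map(_ / unnorm.sum)

// Posterior means as weighted sums over the grid
val meanA = grid.zip(weights).map { case ((a, _), w) => a * w }.sum
val meanB = grid.zip(weights).map { case ((_, b), w) => b * w }.sum
```

No random numbers are involved, so repeated runs give identical results. The price is the size of the grid: each additional parameter multiplies the number of density evaluations by the number of points on its axis.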
Tutorial
The main tutorial is a runnable scala-cli script that walks through Probula’s core concepts:
- Discrete and continuous distributions
- Mapping and composition (map, ->, detDep)
- Conditioning and Bayesian inference (condition, matching)
- Statistical queries (pr, mean, median)
Run it directly:
scala-cli doc/tutorial/Basic.sc
And read the source:
doc/tutorial/Basic.sc
Interface to ArviZ
Probula can export posterior samples as CSV for analysis in ArviZ (Python) or other tools.
The workflow:
- Build and sample a model in Scala
- Export with IData.csv
- Load into ArviZ as InferenceData
Full example with instructions:
doc/arviz/README.md
Runnable script:
scala-cli doc/arviz/ArvizExample.sc
This generates a posterior.csv file ready for ArviZ analysis.
Using Probula in another project
scala-cli
//> using repository "https://codeberg.org/api/packages/wasowski/maven"
//> using dep "io.codeberg.wasowski::probula:0.2.0"
sbt
resolvers += "Codeberg" at
  "https://codeberg.org/api/packages/wasowski/maven"
libraryDependencies +=
  "io.codeberg.wasowski" %% "probula" % "0.2.0"