Probula

Probula is a Bayesian inference framework implemented in Scala 3 in a purely functional style. It was created as a small alternative to the core of Figaro: Figaro stayed on Scala 2, which made it difficult to use for teaching in a Scala 3-based course.

Probula represents hierarchical Bayesian models as multivariate distributions and performs inference via importance sampling over lazy streams.

Example

Probula models are built starting from the Probula object, which offers constructors for independent variables (priors), dependent variables, and likelihoods. Every model is a Dist[T] object, where T is a tuple of all variables in the model (so, generatively, a model is a generator of tuples).

For example, the following is a generator of pairs:

val gen = Probula
  .uniformC("x")(-100.0, 100.0)
  .probDep("y") { x => Probula.gaussian(2.0 * x + 1.0, 1.0) }

Here x is uniformly distributed between -100 and 100, and y is a dependent variable: a linear transformation of x (2x + 1) with added Gaussian noise of scale 1.0.

We can generate a sample of 20 values from the gen model as follows:

val data = gen.sample(20.sampleSize).values

The values method extracts a lazy list of samples represented as tuples; in this case, pairs of doubles, (x, y).
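To make the generative reading concrete, here is a standalone sketch that simulates the same joint distribution with plain scala.util.Random instead of Probula. It assumes the second argument of gaussian is a standard deviation; the names generate and simulated are hypothetical and not part of Probula.

```scala
import scala.util.Random

// Hypothetical sketch of the model behind `gen`, using only the standard library.
// Assumption: the second argument of Probula.gaussian is a standard deviation.
def generate(rng: Random): LazyList[(Double, Double)] =
  LazyList.continually {
    val x = -100.0 + 200.0 * rng.nextDouble()  // x ~ Uniform[-100, 100)
    val y = 2.0 * x + 1.0 + rng.nextGaussian() // y | x ~ Normal(2x + 1, 1)
    (x, y)
  }

// Take 20 (x, y) pairs, mirroring gen.sample(20.sampleSize).values
val simulated: List[(Double, Double)] = generate(new Random(42)).take(20).toList
```

Like the values of a Probula sample, the stream is lazy: nothing is drawn until take(20).toList forces the first 20 elements.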

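We use this sample as the observed data set (in the likelihood) of a Bayesian linear regression model, and recover the model parameters a, b and σ by Bayesian inference. Since the data were generated with a = 2 and b = 1, we expect the posterior means of a and b to be close to these values.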

val model = Probula
  .gaussian("a")(0.0, 10.0)                    // prior
  .gaussian("b")(0.0,  3.0)                    // prior
  .uniformC("σ")(0.0,  3.0)                    // prior
  .likelihood(data): (x: Double, y: Double) => // likelihood
    (a, b, σ) =>
      Probula.gaussian(a * x + b, σ).observe(y)

val sample = model.sample(10000.sampleSize)    // posterior
println(s"E(a) = ${sample._1.mean}")
println(s"E(b) = ${sample._2.mean}")

Nodes "a", "b", "σ" are priors, and the Gaussian likelihood observes the data. Naming variables is optional (the name parameter group, such as ("a"), can be dropped), but names make exported data easier to interpret.
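Observing data amounts to scoring each prior draw by its likelihood. The following is a standalone sketch of that likelihood-weighting idea for the regression model above, in plain Scala without Probula. It assumes σ is a standard deviation, and the names gaussianLogPdf, Particle, importanceSample, normalizedWeights and posteriorMean are hypothetical, not Probula API.

```scala
import scala.util.Random

// Log-density of Normal(mu, sigma) at v, with sigma a standard deviation.
def gaussianLogPdf(v: Double, mu: Double, sigma: Double): Double =
  val z = (v - mu) / sigma
  -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.Pi)

// One prior draw of (a, b, sigma) together with its log importance weight.
final case class Particle(a: Double, b: Double, sigma: Double, logW: Double)

// Hypothetical sketch: draw parameters from the priors of the regression model
// and weight each draw by the log-likelihood of all observed (x, y) pairs.
def importanceSample(data: Seq[(Double, Double)], n: Int, rng: Random): Vector[Particle] =
  Vector.fill(n) {
    val a = rng.nextGaussian() * 10.0          // a ~ Normal(0, 10)
    val b = rng.nextGaussian() * 3.0           // b ~ Normal(0, 3)
    val sigma = 3.0 * (1.0 - rng.nextDouble()) // sigma ~ Uniform on (0, 3], never 0
    val logW = data.map((x, y) => gaussianLogPdf(y, a * x + b, sigma)).sum
    Particle(a, b, sigma, logW)
  }

// Self-normalized weights; subtracting the largest log-weight avoids underflow.
def normalizedWeights(ps: Vector[Particle]): Vector[Double] =
  val m = ps.map(_.logW).foldLeft(Double.NegativeInfinity)(_ max _)
  val raw = ps.map(p => math.exp(p.logW - m))
  val total = raw.sum
  raw.map(_ / total)

// Weighted posterior mean of any quantity computed from a particle.
def posteriorMean(ps: Vector[Particle], f: Particle => Double): Double =
  ps.zip(normalizedWeights(ps)).map((p, w) => w * f(p)).sum
```

Working with log-weights matters here: with 20 points spread over [-100, 100], raw likelihoods underflow to 0.0 in double precision, while the shifted exponentials stay representable.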

Posterior Inference

Importance Sampling

Probula uses importance sampling: Dist.sample(n) draws n samples from the prior and weights each one by its likelihood score. The result is an IData[T] object that supports statistical queries:

mean, expectedValue — weighted sample mean
median — weighted sample median
variance — unbiased weighted variance
probability(p), pr(p) — estimated probability of an event
percentile(q) — weighted q-th percentile ($q \in [0, 1]$)
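To make these queries concrete, here is a standalone sketch of weighted estimators over normalized weights. It is plain Scala, not Probula code, and the helper names (wMean, wVariance, wProbability, wPercentile, wMedian) are hypothetical. The variance uses the common reliability-weights correction 1 − Σw² as one possible reading of "unbiased", and the percentile returns the smallest value whose cumulative weight reaches q, without interpolation; Probula's own estimators may differ in these details.

```scala
// Weighted statistics over importance samples.
// xs are sample values; ws are their weights (non-negative, summing to 1).

def wMean(xs: Seq[Double], ws: Seq[Double]): Double =
  xs.zip(ws).map((x, w) => x * w).sum

// Variance with the reliability-weights correction: Σ w (x - μ)² / (1 - Σ w²)
def wVariance(xs: Seq[Double], ws: Seq[Double]): Double =
  val mu = wMean(xs, ws)
  val num = xs.zip(ws).map((x, w) => w * (x - mu) * (x - mu)).sum
  num / (1.0 - ws.map(w => w * w).sum)

// Estimated probability of an event: total weight of the samples satisfying p.
def wProbability(xs: Seq[Double], ws: Seq[Double])(p: Double => Boolean): Double =
  xs.zip(ws).collect { case (x, w) if p(x) => w }.sum

// q-th percentile (q in [0, 1]): smallest x whose cumulative weight reaches q.
def wPercentile(xs: Seq[Double], ws: Seq[Double], q: Double): Double =
  require(q >= 0.0 && q <= 1.0, "q must lie in [0, 1]")
  val sorted = xs.zip(ws).sortBy(_._1)(using Ordering.Double.TotalOrdering)
  val cumulative = sorted.map(_._2).scanLeft(0.0)(_ + _).tail
  val idx = cumulative.indexWhere(_ >= q - 1e-12) // tolerance for rounding
  sorted(if idx < 0 then sorted.length - 1 else idx)._1

def wMedian(xs: Seq[Double], ws: Seq[Double]): Double = wPercentile(xs, ws, 0.5)
```

For example, values 1.0, 2.0, 3.0, 4.0 with weights 0.1, 0.2, 0.3, 0.4 give a weighted mean of 3.0 and a weighted median of 3.0.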

Grid Approximation

For models in which every distribution has a known density (HasDensity; all built-in distributions have it), grid approximation evaluates the unnormalized posterior exactly at every point of a discrete grid and normalizes over the grid. Use the doubles DSL to build grids:

val grid = for
  a <- 50 doubles (-10.0 -> 10.0)
  b <- 50 doubles (-5.0 -> 5.0)
  σ <- 50 doubles (0.01 -> 3.0)
yield (a, b, σ)
val posterior = model.gridApproximation(grid)
println(s"E(a) = ${posterior._1.mean}")

Grid approximation is exact at the grid points, but the number of points grows exponentially with the number of parameters (the grid above already has 50³ = 125,000 points), so it works only for models with a small number of parameters.
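The computation behind grid approximation can be sketched in plain Scala for the same regression model: evaluate log prior plus log-likelihood at each grid point, then normalize with a max-shift. This is not Probula's implementation; linspace is a hypothetical stand-in for the doubles DSL (evenly spaced points including both endpoints is an assumption), and σ is again taken to be a standard deviation.

```scala
// Evenly spaced points from lo to hi, both endpoints included (hypothetical
// stand-in for the doubles DSL; its exact spacing rule is an assumption).
def linspace(n: Int, lo: Double, hi: Double): Vector[Double] =
  require(n >= 2, "need at least two points")
  Vector.tabulate(n)(i => lo + (hi - lo) * i / (n - 1))

// Log-density of Normal(mu, sigma) at v, with sigma a standard deviation.
def gaussianLogPdf(v: Double, mu: Double, sigma: Double): Double =
  val z = (v - mu) / sigma
  -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.Pi)

// Unnormalized log-posterior: priors a ~ Normal(0, 10), b ~ Normal(0, 3),
// sigma ~ Uniform(0, 3), plus the Gaussian log-likelihood of each (x, y).
def logPosterior(data: Seq[(Double, Double)])(a: Double, b: Double, sigma: Double): Double =
  if sigma <= 0.0 || sigma > 3.0 then Double.NegativeInfinity
  else
    gaussianLogPdf(a, 0.0, 10.0) + gaussianLogPdf(b, 0.0, 3.0) - math.log(3.0) +
      data.map((x, y) => gaussianLogPdf(y, a * x + b, sigma)).sum

// Evaluate the log-posterior at every grid point and normalize over the grid.
def gridPosterior(
    grid: Seq[(Double, Double, Double)],
    logP: (Double, Double, Double) => Double
): Seq[Double] =
  val lps = grid.map((a, b, s) => logP(a, b, s))
  val m = lps.foldLeft(Double.NegativeInfinity)(_ max _)
  val raw = lps.map(lp => math.exp(lp - m))
  val total = raw.sum
  raw.map(_ / total)
```

Every grid point costs one pass over the whole data set, which is where the exponential growth in the number of parameters becomes expensive.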

Tutorial

The main tutorial is a runnable scala-cli script that walks through Probula’s core concepts:

Discrete and continuous distributions
Mapping and composition (map, ->, detDep)
Conditioning and Bayesian inference (condition, matching)
Statistical queries (pr, mean, median)
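The ideas of mapping, conditioning and probability queries can be previewed without Probula on a small exact discrete distribution. The Discrete type below is a hypothetical teaching sketch, not Probula's Dist: it renormalizes exact probabilities rather than working with samples.

```scala
// A tiny exact discrete distribution: outcomes with probabilities summing to 1.
final case class Discrete[T](probs: Map[T, Double]):
  // Push the distribution through a function, merging outcomes that collide.
  def map[U](f: T => U): Discrete[U] =
    Discrete(probs.toSeq.groupMapReduce((t, _) => f(t))((_, p) => p)(_ + _))

  // Condition on an event: keep outcomes satisfying p and renormalize.
  def condition(p: T => Boolean): Discrete[T] =
    val kept = probs.filter((t, _) => p(t))
    val z = kept.values.sum
    require(z > 0.0, "cannot condition on an event of probability zero")
    Discrete(kept.view.mapValues(_ / z).toMap)

  // Probability of an event: total mass of the outcomes satisfying p.
  def pr(p: T => Boolean): Double =
    probs.iterator.collect { case (t, q) if p(t) => q }.sum

// Two independent fair dice.
val twoDice: Discrete[(Int, Int)] =
  Discrete((for a <- 1 to 6; b <- 1 to 6 yield (a, b) -> 1.0 / 36).toMap)
```

For example, twoDice.condition(t => t._1 == 6).pr(t => t._1 + t._2 >= 10) evaluates to 0.5 up to rounding: once the first die shows 6, the sum reaches 10 exactly when the second die shows 4, 5 or 6.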

Run it directly:

scala-cli doc/tutorial/Basic.sc

You can also read its source: doc/tutorial/Basic.sc

Interface to ArviZ

Probula can export posterior samples as CSV for analysis in ArviZ (Python) or other tools.

The workflow:

Build and sample a model in Scala
Export with IData.csv
Load into ArviZ as InferenceData

Full example with instructions: doc/arviz/README.md

Runnable script:

scala-cli doc/arviz/ArvizExample.sc

This generates a posterior.csv file ready for ArviZ analysis.
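For a rough picture of what a named-column export looks like on disk, here is a minimal standard-library sketch that writes posterior draws as CSV with a header row. It is not IData.csv: toCsv and writeCsv are hypothetical names, and the layout Probula actually writes (for instance, whether it includes weights) should be checked in doc/arviz/README.md and the generated posterior.csv.

```scala
import java.nio.file.{Files, Path}

// Render named columns of posterior draws as CSV: one header line, one line per draw.
def toCsv(names: Seq[String], rows: Seq[Seq[Double]]): String =
  require(rows.forall(_.length == names.length), "each row needs one value per column")
  (names.mkString(",") +: rows.map(_.mkString(","))).mkString("", "\n", "\n")

// Write the CSV text to a file such as "posterior.csv" (UTF-8; needs Java 11+).
def writeCsv(path: String, csv: String): Unit =
  Files.writeString(Path.of(path), csv)
```

Variable names become the header row, which is why naming the model's variables makes the exported file easier to interpret.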

Using Probula in another project

scala-cli

//> using repository "https://codeberg.org/api/packages/wasowski/maven"
//> using dep "io.codeberg.wasowski::probula:0.2.0"

sbt

resolvers += "Codeberg" at
  "https://codeberg.org/api/packages/wasowski/maven"
libraryDependencies +=
  "io.codeberg.wasowski" %% "probula" % "0.2.0"
