Identifiability
Søren Hauberg
Technical University of Denmark
sohau@dtu.dk
This school is about...
Here's what's gonna happen
...functions
Generative models

A generative model consists of two parts:
- A (prior) distribution p(z) over the latent representation z.
- A mapping f from the latent representation to the observation space, x = f(z) (either deterministic or stochastic). We call f the 'decoder'.

Examples: GANs, VAEs, PPCA, GPLVM, ...
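The two parts can be sketched in a few lines of code. This is a minimal illustration, not any particular model from the lecture; the standard-normal prior and the specific decoder are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Part 1: a (prior) distribution over the latent representation z.
def sample_prior(n, d=2):
    return rng.standard_normal((n, d))

# Part 2: a mapping ("decoder") from z to observation space; here an
# illustrative deterministic map from R^2 to R^3.
def decoder(z):
    return np.stack([z[:, 0], np.sin(z[:, 1]), z[:, 0] * z[:, 1]], axis=1)

z = sample_prior(5)   # latent representations
x = decoder(z)        # observations
print(x.shape)        # -> (5, 3)
```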
We often use the latent representation to get a glimpse into the data-generating process, in an effort to understand the mechanisms behind the world we observe through data. Latent representations are, thus, used for humans to learn from ML models.
Identifiability in generative models

When fitting a generative model to data, we often find that different training runs give different latent representations.

Four different VAEs fitted to a selection of proteins from the same family. All are equal in terms of goodness-of-fit.

Which one do I trust? Which one do I analyze? Which one do I learn from?
Identifiability in generative models

To get an understanding of this issue, we should look at the core notion of identifiability.

We say that a model is identifiable if a change in model parameters implies a change in the modeled density. Technically: if p_theta is a density parametrized by theta, then our model is identifiable if the map theta -> p_theta is bijective (one-to-one).

If this is not satisfied, then two different sets of model parameters might be equally good, and we don't know which one we should interpret and learn from.

Sadly, practically all generative models fail to satisfy the identifiability condition. Assume you have a generative model with a decoder f and latent representation z. Then, if h is a bijection,

    f̃ = f ∘ h⁻¹    (alternative decoder)
    z̃ = h(z)       (alternative latent representation)

is a different way of getting the same model, since f̃(z̃) = f(z), but with a potentially very different latent representation. If your model class for f is a very flexible function (e.g. a deep neural network), then you almost surely suffer quite strongly from this issue.
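The argument can be checked numerically. Below is a minimal sketch; the decoder f and the invertible linear map h are illustrative choices, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary (illustrative) decoder f: R^2 -> R^3.
def f(z):
    return np.stack([z[..., 0], z[..., 1], np.sin(z[..., 0]) * z[..., 1]], axis=-1)

# Any bijection h of the latent space; here an invertible linear map.
A = np.array([[2.0, 1.0], [0.5, 1.0]])
h = lambda z: z @ A.T
h_inv = lambda z: z @ np.linalg.inv(A).T

f_tilde = lambda z: f(h_inv(z))  # alternative decoder f_tilde = f o h^{-1}

z = rng.standard_normal((100, 2))  # latent representation
z_tilde = h(z)                     # alternative latent representation

# Both (f, z) and (f_tilde, z_tilde) decode to the same observations...
assert np.allclose(f(z), f_tilde(z_tilde))

# ...yet the two latent representations look quite different:
print(np.max(np.abs(z - z_tilde)))
```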
Identifiability in generative models

Our goal today is to develop the machinery to solve this identifiability problem. The solution will rely on classic differential geometry, which we will extend to cover the stochastic case as well.

Lecture notes, software and more:
www2.compute.dtu.dk/~sohau/weekendwithbernie
Latent space geometry

But what if I don't care about identifiability?
- We'll see how to build adaptive priors for generative models without increasing model capacity...
- We'll see that identifiability allows us to reveal the underpinnings of biology...
- We'll see how identifiable representations lead to excellent motion primitives for robot control...
- Identifiability is essential if we ever are to open the black box of deep learning...
This school is about ...functions -- in particular, distributions over functions.

Here's what I'll be doing (over time):
- I'll completely ignore the distributional part and go have fun with generative models...
- I'll show you how you are fundamentally screwed in this setting.
- I'll show you that geometry unscrews you...
- I'll show you that geometry actually doesn't work (back to being screwed).
- I'll come crawling back to the distributional stuff, and we'll see that geometry works after all! (GPSS for the win!)
A simple generative model
Consider a simple generative model with a Gaussian latent prior,
and a deterministic deocder. The GAN is an example of such a model.
By the same arguments as before, we have an identifiability problem:

    z̃ = h(z)       (new latent representation)
    f̃ = f ∘ h⁻¹    (new decoder)

assuming that h preserves the prior, i.e. h(z) ~ N(0, I) when z ~ N(0, I).

Do such mappings exist? A simple example (rotate each latent point by an angle that depends on the norm of the point):

    h(z) = R(‖z‖) z

Such rotations leave the isotropic Gaussian prior unchanged, but pairwise distances are not identifiable.
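A numeric sketch of the norm-dependent rotation (the specific angle function theta = 2r is an arbitrary illustrative choice): it preserves norms, hence maps samples from the isotropic Gaussian prior to equally valid samples, yet it changes pairwise distances between latent points.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(z):
    """Rotate each 2-D latent point by an angle that depends on its norm."""
    r = np.linalg.norm(z, axis=-1, keepdims=True)
    theta = 2.0 * r  # rotation angle grows with the norm (illustrative choice)
    c, s = np.cos(theta), np.sin(theta)
    return np.concatenate([c * z[..., :1] - s * z[..., 1:],
                           s * z[..., :1] + c * z[..., 1:]], axis=-1)

z = rng.standard_normal((1000, 2))
z_tilde = h(z)

# Rotations preserve norms, so h maps N(0, I) samples to N(0, I) samples...
assert np.allclose(np.linalg.norm(z, axis=1), np.linalg.norm(z_tilde, axis=1))

# ...but pairwise distances between latent points change:
d = np.linalg.norm(z[0] - z[1])
d_tilde = np.linalg.norm(z_tilde[0] - z_tilde[1])
print(d, d_tilde)  # generally different: latent distances are not identifiable
```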
The geometric solution

This identifiability problem was effectively solved in the 19th century by Gauss, Riemann, and colleagues. The solution, like any good solution, is rather simple:

Don't define distances directly in the latent representation space; instead define them in observation space and bring them from there into the latent representation.

(For now) assume that the observation space is Euclidean, X = R^D.

Define the length of a latent curve c to be the length of the decoded curve f ∘ c. From that definition, you just go where the math takes you...

By construction this is invariant to reparametrizations, i.e. this is an identifiable distance measure.
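A small numeric check of this invariance, with an illustrative decoder and latent bijection (neither is from the lecture): lengths measured directly in latent space disagree between two equally good representations, while decoded lengths agree.

```python
import numpy as np

# Illustrative decoder (paraboloid immersion of R^2 in R^3) and latent bijection.
f = lambda z: np.stack([z[:, 0], z[:, 1], z[:, 0] ** 2 + z[:, 1] ** 2], axis=1)
A = np.array([[1.0, 2.0], [0.0, 1.0]])
h = lambda z: z @ A.T
h_inv = lambda z: z @ np.linalg.inv(A).T

def polyline_length(points):
    return np.linalg.norm(np.diff(points, axis=0), axis=1).sum()

t = np.linspace(0, 1, 500)[:, None]
c = np.concatenate([t, np.sin(np.pi * t)], axis=1)  # a latent curve

# Latent-space lengths differ between the two representations...
print(polyline_length(c), polyline_length(h(c)))

# ...but the decoded (observation-space) lengths agree, since
# (f o h^{-1})(h(c)) = f(c): the length measure is identifiable.
assert np.isclose(polyline_length(f(c)), polyline_length(f(h_inv(h(c)))))
```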
Curves

To arrive at a notion of distance, we will need to look at shortest paths; to understand these, we'll need to understand curves in Euclidean space.

Let c: [0, 1] → R^D be a smooth curve in a D-dimensional space. We can then approximate this curve by a set of straight lines connecting the points c(t_0), c(t_1), ..., c(t_N), where t_i = i/N.

We can then approximate the length of the curve as

    Length ≈ Σ_{i=1}^N ‖c(t_i) − c(t_{i−1})‖

Taking the limit N → ∞,

    Length(c) = ∫_0^1 ‖c′(t)‖ dt
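The polyline approximation above is easy to implement and converges to the integral. A quick sanity check on a curve with a known length (the unit circle, length 2π):

```python
import numpy as np

def curve_length(c, n=10_000):
    """Approximate the length of c: [0, 1] -> R^D by a polyline with n segments."""
    t = np.linspace(0.0, 1.0, n + 1)
    pts = c(t)  # c maps an array of parameters to an (n+1, D) array of points
    return np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()

# The unit circle, traversed once, has length 2*pi.
circle = lambda t: np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
print(curve_length(circle))  # -> approximately 6.2832
```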
Manifold distances

Finally, we have the tools for defining distances on manifolds.

First, we define the length of a curve c to be the Euclidean length of its immersion f ∘ c, i.e.

    Length(c) = ∫_0^1 ‖(f ∘ c)′(t)‖ dt

Next, we say that the shortest path between two points z_0 and z_1 is

    c* = argmin_c Length(c),  subject to c(0) = z_0, c(1) = z_1

(technically it should be an infimum...)

Finally, we say that the distance between the points is the length of the shortest connecting path, dist(z_0, z_1) = Length(c*).
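In practice the shortest path can be approximated by discretizing the curve and minimizing the decoded length directly. A rough sketch (the paraboloid decoder and the 20-point discretization are illustrative choices, and a generic optimizer stands in for purpose-built geodesic solvers):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative decoder: immerse R^2 as a paraboloid in R^3.
def f(z):
    z = np.atleast_2d(z)
    return np.stack([z[:, 0], z[:, 1], z[:, 0] ** 2 + z[:, 1] ** 2], axis=1)

def decoded_length(inner, z0, z1):
    """Length of the decoded polyline through fixed endpoints and free inner points."""
    pts = np.vstack([z0, inner.reshape(-1, 2), z1])
    return np.linalg.norm(np.diff(f(pts), axis=0), axis=1).sum()

z0, z1 = np.array([-1.0, -1.0]), np.array([1.0, 1.0])

# Initialize with the straight latent line; optimize the 18 inner points.
init = np.linspace(z0, z1, 20)[1:-1]
res = minimize(decoded_length, init.ravel(), args=(z0, z1))

straight = decoded_length(init.ravel(), z0, z1)
print(straight, res.fun)  # the optimized path is never longer than the straight one
```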
The metric

To get a deeper understanding of distances, let us look at the length of a curve:

    Length(c) = ∫_0^1 ‖(f ∘ c)′(t)‖ dt                                (definition)
              = ∫_0^1 ‖J_f(c(t)) c′(t)‖ dt                            (chain rule)
              = ∫_0^1 √( c′(t)ᵀ J_f(c(t))ᵀ J_f(c(t)) c′(t) ) dt       (definition of norm)
We can also define a notion of angle. Consider two intersecting curves c_1 and c_2, and let the point of intersection be given by z = c_1(t_1) = c_2(t_2). Then we can compute the curve velocities at this point,

    v_1 = c_1′(t_1),    v_2 = c_2′(t_2),

and finally define the angle between the curves as

    cos θ = ⟨v_1, v_2⟩_z / (‖v_1‖_z ‖v_2‖_z)
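A small check that this angle is the Euclidean angle between the decoded velocities, using an illustrative paraboloid decoder whose Jacobian we can write by hand:

```python
import numpy as np

# Illustrative decoder and its Jacobian (paraboloid immersion of R^2 in R^3).
f = lambda z: np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])
J = lambda z: np.array([[1.0, 0.0], [0.0, 1.0], [2 * z[0], 2 * z[1]]])

z = np.array([0.5, -0.3])   # intersection point of the two curves
v1 = np.array([1.0, 0.0])   # velocity of curve 1 at the intersection
v2 = np.array([0.0, 1.0])   # velocity of curve 2 at the intersection

G = J(z).T @ J(z)           # pullback metric at z

def angle(u, v, M):
    return np.arccos(u @ M @ v / np.sqrt((u @ M @ u) * (v @ M @ v)))

# The angle measured with G in the latent space equals the Euclidean
# angle between the decoded velocities J v1 and J v2:
a_latent = angle(v1, v2, G)
a_ambient = angle(J(z) @ v1, J(z) @ v2, np.eye(3))
print(a_latent, a_ambient)
```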
Riemannian metrics

This is the biggest conceptual difference to the standard Euclidean interpretation.
Intermediate summary

Our goal is to define distances on latent representations using curve lengths in the ambient observation space. Minimizing such curve lengths gives a distance measure that is invariant to reparametrizations.

The Riemannian metric G(z) = J_f(z)ᵀ J_f(z) defines a position-dependent inner product, mimicking how infinitesimals behave in the tangent space:

    ⟨u, v⟩_z = uᵀ G(z) v

This allows us to define angles between curves as well as distances. Basically, you get what you expect from inner products -- just keep in mind that it's only locally defined.
We call

    G(z) = J_f(z)ᵀ J_f(z)

the (Riemannian) metric. We can think of this as defining a local (infinitesimal) inner product,

    ⟨u, v⟩_z = uᵀ G(z) v,

and with this notation we can, e.g., define a local norm,

    ‖u‖_z = √(uᵀ G(z) u).
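The metric is straightforward to compute for any smooth decoder. A sketch with an arbitrary illustrative decoder and a finite-difference Jacobian, verifying that the local norm of a velocity equals the Euclidean norm of its decoded image:

```python
import numpy as np

# Illustrative decoder R^2 -> R^3 and a central finite-difference Jacobian.
def f(z):
    return np.array([np.sin(z[0]), z[0] * z[1], np.cos(z[1])])

def jacobian(f, z, eps=1e-6):
    cols = [(f(z + eps * e) - f(z - eps * e)) / (2 * eps) for e in np.eye(len(z))]
    return np.stack(cols, axis=1)

z = np.array([0.3, 1.2])
v = np.array([0.7, -0.2])

Jz = jacobian(f, z)
G = Jz.T @ Jz   # the (Riemannian) pullback metric at z

# Local norm of v agrees with the Euclidean norm of the decoded velocity J v:
print(np.sqrt(v @ G @ v), np.linalg.norm(Jz @ v))
```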