Identifiability
Søren Hauberg
Technical University of Denmark
sohau@dtu.dk

This school is about functions -- in particular, distributions over functions.
Here's what's gonna happen...

Generative models
A generative model consists of two parts:
  * A (prior) distribution p(z) over the latent representation z.
  * A mapping f from the latent space to the observation space, x = f(z),
    which can be either deterministic or stochastic. We call this mapping
    the 'decoder'.
Examples: GANs, VAEs, PPCA, GPLVM, ...

We often use the latent representation to get a glimpse into the data-generating
process, in an effort to understand the mechanisms behind the world we observe
through data. Latent representations are, thus, used for humans to learn from
ML models.

Identifiability in generative models
When fitting a generative model to data, we often find that different training
runs give different latent representations.

[Figure: four different VAEs fitted to a selection of proteins from the same
family. All are equal in terms of goodness-of-fit, yet their latent
representations differ.]

Which one do I trust? Which one do I analyze? Which one do I learn from?

To get an understanding of this issue, we should look at the core notion of
identifiability. We say that a model is identifiable if a change in model
parameters implies a change in the modeled density.
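To make this definition concrete, here is a minimal sketch (a hypothetical toy
model, not one from the lecture): a Gaussian whose mean is redundantly
parametrized as a + b. Two different parameter vectors yield exactly the same
density, so the map from parameters to densities cannot be one-to-one.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical toy model: p_theta(x) = N(x; a + b, 1) with theta = (a, b).
    # Only the sum a + b matters, so the parametrization is redundant.
    def density(theta, x):
        a, b = theta
        return norm.pdf(x, loc=a + b, scale=1.0)

    xs = np.linspace(-5.0, 5.0, 1001)
    theta1 = (2.0, -1.0)   # two *different* parameter vectors ...
    theta2 = (0.5, 0.5)    # ... with the same sum a + b = 1

    # Identical densities everywhere: theta -> p_theta is not one-to-one,
    # so the model is not identifiable.
    assert np.allclose(density(theta1, xs), density(theta2, xs))
    print("different parameters, same density -> not identifiable")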
Technically: if p_\theta is a density parametrized by \theta, then our model is
identifiable if the map \theta \mapsto p_\theta is injective (one-to-one).

If this is not satisfied, then two different sets of model parameters might be
equally good, and then we don't know which one we should interpret and learn
from. Sadly, practically all generative models fail the identifiability
condition. Assume you have a generative model with a decoder f and latent
representation z. If h is a bijection of the latent space, then

    \hat{f} = f \circ h^{-1}        (alternative decoder)
    \hat{z} = h(z)                  (alternative latent representation)

is a different way of getting the same model, but with a potentially very
different latent representation. If your model class for f is a very flexible
function (e.g. a deep neural network), then you almost surely suffer quite
strongly from this issue.
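A minimal sketch of this construction (illustrative code; the decoder below and
the linear map standing in for h are hypothetical): composing the decoder with
h^{-1} while reparametrizing the latents with h reproduces exactly the same
observations, yet the latent representation changes.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical decoder f: R^2 -> R^3 (stands in for any trained decoder).
    def f(z):
        return np.stack([z[..., 0], z[..., 1],
                         np.sin(z[..., 0]) * z[..., 1]], axis=-1)

    # Any bijection h of the latent space will do; here an invertible linear map.
    A = np.array([[2.0, 1.0], [0.0, 0.5]])
    h = lambda z: z @ A.T
    h_inv = lambda z: z @ np.linalg.inv(A).T
    f_hat = lambda z: f(h_inv(z))          # alternative decoder  f o h^{-1}

    z = rng.standard_normal((1000, 2))     # latent representation
    z_hat = h(z)                           # alternative latent representation

    # Same observations, different latents: the models are indistinguishable.
    assert np.allclose(f(z), f_hat(z_hat))
    print("max latent discrepancy:", np.abs(z - z_hat).max())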
Our goal today is to develop the machinery to solve this identifiability
problem. The solution will rely on classic differential geometry, which we
will extend to cover the stochastic case as well.

Lecture notes, software and more:
www2.compute.dtu.dk/~sohau/weekendwithbernie

But what if I don't care about identifiability?
  * We'll see how to build adaptive priors for generative models without
    increasing model capacity...
  * We'll see that identifiability allows us to reveal the underpinnings of
    biology...
  * We'll see how identifiable representations lead to excellent motion
    primitives for robot control...
  * Identifiability is essential if we are ever to open the black box of
    deep learning...

This school is about functions -- in particular, distributions over functions.
Here's what I'll be doing: I'll completely ignore the distributional part and
go have fun with generative models. I'll show you how you are fundamentally
screwed in this setting. I'll show you that geometry unscrews you... and then
that geometry actually doesn't work (back to being screwed). Finally, I'll
come crawling back to the distributional stuff, and we'll see that geometry
works after all! (GPSS for the win!)

A simple generative model
Consider a simple generative model with a Gaussian latent prior,

    p(z) = \mathcal{N}(z \mid 0, I),

and a deterministic decoder x = f(z). The GAN is an example of such a model.
By the same arguments as before, we have an identifiability problem: provided
h is a bijection that leaves the Gaussian prior invariant,

    \hat{z} = h(z)                  (new latent representation)
    \hat{f} = f \circ h^{-1}        (new decoder)

give the same model. Do such mappings exist? A simple example: rotate each
latent point by a number of degrees that depends on the norm of the point.
Norms are unchanged, so the isotropic Gaussian prior is left invariant, yet
the distances between latent points change. Pairwise distances are not
identifiable. (Code sketch below.)
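A minimal numerical sketch of this example (illustrative; the rotation rate of
45 degrees per unit norm is an arbitrary choice): the norm-dependent rotation
leaves norms, and hence the standard Gaussian prior, untouched, while pairwise
distances between latent points change.

    import numpy as np

    rng = np.random.default_rng(1)

    def h(z, degrees_per_unit_norm=45.0):
        """Rotate each 2-D latent point by an angle proportional to its norm."""
        r = np.linalg.norm(z, axis=-1, keepdims=True)
        a = np.deg2rad(degrees_per_unit_norm) * r
        c, s = np.cos(a), np.sin(a)
        return np.concatenate([c * z[..., :1] - s * z[..., 1:],
                               s * z[..., :1] + c * z[..., 1:]], axis=-1)

    z = rng.standard_normal((2000, 2))     # samples from the N(0, I) prior
    z_hat = h(z)

    # Norms are preserved, so z_hat is still distributed as N(0, I) ...
    assert np.allclose(np.linalg.norm(z, axis=1), np.linalg.norm(z_hat, axis=1))
    # ... but pairwise distances between latent points are not preserved.
    d = np.linalg.norm(z[0] - z[1])
    d_hat = np.linalg.norm(z_hat[0] - z_hat[1])
    print(f"distance before: {d:.3f}, after: {d_hat:.3f}")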
The solution, like any good solution,is rather simple:Don't define distances directly in the latent representation space;instead define them in observation space and bring them from thereinto the latent representation.(For now) assume that the observation space is Euclidean,Don't define distances directly in the latent representation space;instead define them in observation space and bring them from thereinto the latent representation.(For now) assume that the observation space is Euclidean,Don't define distances directly in the latent representation space;instead define them in observation space and bring them from thereinto the latent representation.(For now) assume that the observation space is Euclidean,Don't define distances directly in the latent representation space;instead define them in observation space and bring them from thereinto the latent representation.(For now) assume that the observation space is Euclidean,Don't define distances directly in the latent representation space;instead define them in observation space and bring them from thereinto the latent representation.(For now) assume that the observation space is Euclidean,Define length of latent curve to be thelength of the decoded curve.From that definition, you just go where the math takes you...By construction this is invariantto reparametrizations, i.e. this isan identifiable distance measure.To arrive at a notion of distance, we will need to look at shortest paths;to understand these, we'll need to understand curves in Euclidean space.CurvesTo arrive at a notion of distance, we will need to look at shortest paths;to understand these, we'll need to understand curves in Euclidean space.CurvesLet be a smooth curve in a D-dimensional space.To arrive at a notion of distance, we will need to look at shortest paths;to understand these, we'll need to understand curves in Euclidean space.CurvesLet be a smooth curve in a D-dimensional space.We can then approximate this curve by a set of straight lines connectingwhereWe can then approximate the length of the curve asWe can then approximate the length of the curve asTaking the limitWe can then approximate the length of the curve asTaking the limitWe can then approximate the length of the curve asTaking the limitFinally, we have the tools for defining distances on manifolds.Manifold distancesFinally, we have the tools for defining distances on manifolds.Manifold distancesFirst, we define that the length of a curve is the Euclidean length of its immersion , i.e.Next, we say that the shortest path between two points is(technically it should be an infimum...)Finally, we say that the distance between points is the length of the shortest connecting pathTo get a deeper understanding of distances, let us look at the length of a curveThe metric(definition)To get a deeper understanding of distances, let us look at the length of a curveThe metric(definition)(chain rule)To get a deeper understanding of distances, let us look at the length of a curveThe metric(definition)(chain rule)(definition of norm)We can also define a notion of angle. Consider two intersecting curvesLet the point of intersection be given byThen we can curve velocities at this pointWe can also define a notion of angle. 
We can also define a notion of angle. Consider two intersecting curves c_1 and
c_2. Let the point of intersection be given by z = c_1(t_1) = c_2(t_2). Then
the curve velocities at this point are \dot{c}_1(t_1) and \dot{c}_2(t_2), and
we can finally define the angle between the curves as

    \cos \theta = \frac{ \langle \dot{c}_1(t_1), \dot{c}_2(t_2) \rangle_z }
                       { \| \dot{c}_1(t_1) \|_z \, \| \dot{c}_2(t_2) \|_z }.

Riemannian metrics
The position-dependence of the metric is the biggest conceptual difference to
the standard Euclidean interpretation.

Intermediate summary
  * Regarding reparametrizations: our goal is to define distances on latent
    representations using curve lengths in the ambient observation space.
    Minimizing the curve length gives an invariant distance measure (see the
    sketch after this list).
  * The Riemannian metric M(z) = J_f(z)^\top J_f(z) defines a
    position-dependent inner product that mimics how infinitesimals behave in
    the tangent space.
  * This allows us to define angles between curves as well as distances.
    Basically, you can get what you expect from inner products -- just keep in
    mind that it's only locally defined.
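To close the loop on the summary, a minimal sketch of computing a shortest
path (illustrative, reusing the hypothetical paraboloid decoder from the
previous sketch): the lecture does not prescribe a solver, so here we simply
discretize a latent curve into waypoints and minimize its decoded curve energy
(whose minimizers are also length-minimizing) with an off-the-shelf optimizer.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical decoder from the sketch above: f(z) = (z1, z2, z1^2 + z2^2).
    def f(z):
        return np.array([z[0], z[1], z[0]**2 + z[1]**2])

    z_start, z_end = np.array([-1.0, 0.0]), np.array([1.0, 0.0])
    n_interior = 18                         # free waypoints between endpoints

    def energy(w):
        """Discretized curve energy sum ||f(z_{i+1}) - f(z_i)||^2; minimizing
        it yields (approximately) a length-minimizing, evenly spaced curve."""
        pts = np.vstack([z_start, w.reshape(n_interior, 2), z_end])
        diffs = np.diff(np.array([f(p) for p in pts]), axis=0)
        return np.sum(diffs ** 2)

    # Initialize with the straight latent line and optimize the waypoints.
    w0 = np.linspace(z_start, z_end, n_interior + 2)[1:-1].ravel()
    res = minimize(energy, w0, method="L-BFGS-B")
    pts = np.vstack([z_start, res.x.reshape(n_interior, 2), z_end])
    length = np.sum(np.linalg.norm(
        np.diff(np.array([f(p) for p in pts]), axis=0), axis=1))
    print("geodesic distance estimate:", length)

Because the length is measured on the decoded curve, this estimate is
unchanged under any prior-preserving reparametrization of the latent space.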