GPs for Numerics, with applications in DTI
Søren Hauberg, Cognitive Systems, Technical University of Denmark
Work in progress and in collaboration with Michael Schober (MPI-IS), Philipp Hennig (MPI-IS), Aasa Feragen (DIKU), Niklas Kasenburg (DIKU), and Matt Liptrot (DIKU/DTU).
Abstract Goal:
Give you a first taste of a wonderfully simple mindset: "Probabilistic Numerics".
Concrete Goals:
1) Show you how to solve ordinary differential equations (ODEs) using GP regression.
2) Show you a medical example of this for tractography in DTI.
Outline:
The "Whats & Hows of ODEs".
A quick recap of GP regression.
How to build your own ODE solver in N easy steps.
Example: Probabilistic shortest path tractography.
The Whats & Hows of ODEs
You are looking for a curve:
but I won't tell you how the curve looks!
I'm going to give you some "hints" as to how the curve looks:
ODE lingo
The "initial conditions"
The "ordinary differential equation (ODE)"
Q: Can you tell what the curve looks like?
A: In general, NO (it's a really hard question)
Numerical approximations are needed!
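To make this concrete (an illustrative example of mine, not from the slides): the ODE c'(t) = -c(t) with initial condition c(0) = 1 has the closed-form solution c(t) = e^(-t). But already c'(t) = c(t)^2 - t has no elementary closed-form solution (it is Airy's equation in disguise), so in general we must approximate.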
Let's draw a picture
[Figure: the initial condition and a few ODE evaluations plotted over t from 0 to 4, with values roughly between 0.1 and 0.25.]
The initial condition
Evaluations of the ODE
Now, how does the curve look?
A word of warning!
"I'm slowing down the tune,
I never liked it fast.
You want to get there soon,
I want to get there last."
— Leonard Cohen, "Slow"
Classic Numerical ODE solvers
Euler's method
Runge-Kutta (RK4; the method people actually use)
(These equations constitute a landmark achievement of modern science!)
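The update equations themselves did not survive extraction, but both methods are textbook material; here is a minimal Python sketch (function names and the test problem are illustrative, not from the slides):

import numpy as np

def euler_step(f, t, c, h):
    """One Euler step: c_{n+1} = c_n + h * f(t_n, c_n)."""
    return c + h * f(t, c)

def rk4_step(f, t, c, h):
    """One classic fourth-order Runge-Kutta (RK4) step."""
    k1 = f(t, c)
    k2 = f(t + h / 2, c + h * k1 / 2)
    k3 = f(t + h / 2, c + h * k2 / 2)
    k4 = f(t + h, c + h * k3)
    # The new state is a weighted average of the four ODE evaluations.
    return c + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

# Illustration: solve c'(t) = -c(t), c(0) = 1 on [0, 1].
f = lambda t, c: -c
t, c = 0.0, 1.0
for _ in range(10):
    c = rk4_step(f, t, c, 0.1)
    t += 0.1
print(c, np.exp(-1.0))  # RK4 estimate vs. the exact value e^{-1}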
So, what just happened?
1) We evaluate f at select locations to get "observations" k1, k2, k3, and k4.
2) We estimate c_n as a weighted average of the observations.
3) We estimate the curve with a linear operation.
Does any of this remind you of anything?
A quick recap of GP Regression (for details, see Philipp's session)
Gaussians & Linear operators
Now consider
Then
If we let the spacing go to 0, then A becomes the operator which computes the derivative of the curve.
All I'm really saying is that since Gaussians are closed under linear operators, and since differentiation is linear, the derivative of a GP is another GP:
An example (to explain notation):
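The slide's worked example is an equation image that did not survive extraction. As a stand-in, here is a sketch of the closure property in code for a squared-exponential kernel (the kernel choice and all names below are mine, not from the slides):

import numpy as np

def k(t1, t2, ell=1.0, sf=1.0):
    """Squared-exponential covariance: cov(c(t1), c(t2))."""
    d = t1[:, None] - t2[None, :]
    return sf**2 * np.exp(-d**2 / (2 * ell**2))

def dk(t1, t2, ell=1.0, sf=1.0):
    """Differentiate the kernel once: cov(c'(t1), c(t2))."""
    d = t1[:, None] - t2[None, :]
    return (-d / ell**2) * k(t1, t2, ell, sf)

def ddk(t1, t2, ell=1.0, sf=1.0):
    """Differentiate twice: cov(c'(t1), c'(t2))."""
    d = t1[:, None] - t2[None, :]
    return (1 / ell**2 - d**2 / ell**4) * k(t1, t2, ell, sf)

# The curve and its derivative are jointly Gaussian, so one draw
# gives a consistent pair (c, c'):
ts = np.linspace(0, 4, 50)
joint = np.block([[k(ts, ts), dk(ts, ts).T],
                  [dk(ts, ts), ddk(ts, ts)]]) + 1e-9 * np.eye(100)
draw = np.random.default_rng(0).multivariate_normal(np.zeros(100), joint)
curve, deriv = draw[:50], draw[50:]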
GP Regression in One Slide
From independent observations
we make predictions
where
Note that the mean prediction
is merely a weighted average of the observations; the weights are determined by the different covariances.
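In code, the whole slide is a few lines (a minimal sketch of standard GP regression, reusing the squared-exponential k from the previous sketch; the noise level is illustrative):

import numpy as np

def gp_regression(t_train, y_train, t_test, noise=1e-2):
    """Posterior mean and covariance of a zero-mean GP given
    independent noisy observations y_train at inputs t_train."""
    K = k(t_train, t_train) + noise * np.eye(len(t_train))
    Ks = k(t_test, t_train)                  # test/train cross-covariances
    mean = Ks @ np.linalg.solve(K, y_train)  # a weighted average of the observations
    cov = k(t_test, t_test) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov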
We are ready for GP ODE Solvers!
We want to solve
Pick "representer" t's
A straightforward algorithm is then
// evaluate ODE
// estimate solution
// with GP regression
// extrapolate
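Assembled into code, the loop looks roughly as follows (a sketch under my own choices: the kernels k, dk, ddk from earlier, a zero prior mean, and a fixed jitter; not the exact solver from the papers):

import numpy as np

def gp_ode_solve(f, t0, c0, T, n_steps=20, noise=1e-6):
    """Sketch of a GP solver for c'(t) = f(t, c(t)), c(t0) = c0.
    Treats each ODE evaluation as a derivative observation and the
    initial condition as a value observation."""
    ts = np.linspace(t0, T, n_steps + 1)   # "representer" t's
    t_obs, d_obs = [], []
    c, t00 = c0, np.array([t0])
    for n in range(n_steps):
        d_obs.append(f(ts[n], c))          # evaluate ODE at the current estimate
        t_obs.append(ts[n])
        tt = np.array(t_obs)
        # Gram matrix over [c(t0); c'(t_obs)] via the derivative kernels.
        K = np.block([[k(t00, t00), dk(tt, t00).T],
                      [dk(tt, t00), ddk(tt, tt)]]) + noise * np.eye(len(tt) + 1)
        y = np.concatenate(([c0], d_obs))
        # Posterior mean at the next representer point (extrapolate).
        tn = np.array([ts[n + 1]])
        ks = np.concatenate([k(tn, t00), dk(tt, tn).T], axis=1)
        c = (ks @ np.linalg.solve(K, y))[0]
    return ts, c

The posterior covariance (not shown) is what supplies the error bars and solution samples discussed later.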
[Figure: a sequence of solver animation frames; x-axis from -2 to 10, y-axis from -4 to 4.]
// evaluate ODE
This is an observation; what is its variance?
This input is the result of the previous iteration, i.e. it is stochastic.
For consistency, let's assume
One approximation of the variance is to linearize the ODE locally (see the sketch below).
Application-specific choices may also work.
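A sketch of that linearization (the first-order "delta method"; the scalar setting and finite-difference Jacobian are my simplifications):

def obs_variance(f, t, c_mean, c_var, eps=1e-5):
    """First-order variance of f(t, c) when the input is stochastic,
    c ~ N(c_mean, c_var). For vector states this becomes J @ Sigma @ J.T."""
    J = (f(t, c_mean + eps) - f(t, c_mean - eps)) / (2 * eps)  # df/dc at the mean
    return J * c_var * J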
Which covariance function should we pick?
Use application-specific choices when possible (if you have prior knowledge, then use it!)
Otherwise:
Gives RK1
Gives RK2
Gives RK3
(See Schober et al., NIPS 2014 for details)
Enough Already! Show Some Applications!
Observation noise
Path uncertainty is directly related to data noise
Single path distribution
Data uncertainty
Example tracts
Dijkstra in CST
GP in CST
Dijkstra in ILF
GP in ILF
Let's do a Summary
I've shown you how to solve ODEs using the basic tools of GP regression. Simple, isn't it?
A few references (for more, see http://probabilistic-numerics.org/)
Skilling, J. (1991). Bayesian solution of ordinary differential equations. Maximum Entropy and Bayesian Methods, Seattle.
Hennig, P., & Hauberg, S. (2014). Probabilistic solutions to differential equations and their application to Riemannian statistics. AISTATS.
Chkrebtii, O., Campbell, D. A., Girolami, M. A., & Calderhead, B. (2013). Bayesian uncertainty quantification for differential equations. arXiv:1306.2365.
Schober, M., Kasenburg, N., Feragen, A., Hennig, P., & Hauberg, S. (2014). Probabilistic shortest path tractography in DTI using Gaussian process ODE solvers. MICCAI.
Schober, M., Duvenaud, D. K., & Hennig, P. (2014). Probabilistic ODE solvers with Runge-Kutta means. NIPS.
Barber, D. (2014). On solving ordinary differential equations using Gaussian processes. arXiv:1408.3807.
Hauberg, S., Schober, M., Liptrot, M., Hennig, P., & Feragen, A. (2015). A random Riemannian metric for probabilistic shortest-path tractography. MICCAI.
Conrad, P. R., Girolami, M., Särkkä, S., Stuart, A., & Zygalakis, K. (2015). Probability measures for numerical solutions of differential equations. arXiv:1506.04592.
I've shown you how to solve ODEs using the basic tools of GP regression. Simple, isn't it?
This has many benefits:
A useful model of uncertainty (e.g. you can sample from the solution space; see the sketch below).
You can optionally encode prior knowledge (just pick a suitable covariance function).
You can easily represent uncertainty in the ODE itself (no alternative solver really does that).
The approach generalizes to many other numerical tasks (see http://probabilistic-numerics.org/).
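For instance, sampling solution curves is just sampling from the GP posterior (a sketch reusing gp_regression from earlier; the observations and grid below are hypothetical placeholders):

import numpy as np

t_train = np.array([0.0, 0.5, 1.0])   # hypothetical inputs
y_train = np.array([1.0, 0.6, 0.4])   # hypothetical observations
t_grid = np.linspace(0.0, 2.0, 100)

mean, cov = gp_regression(t_train, y_train, t_grid)
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov + 1e-9 * np.eye(100), size=10)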
Seriously, Why, oh Why? Who Cares about Numerics?
We all do numerics all the time (I invert at least one matrix every day). In my experience, once you allow yourself to start thinking about something as "low-level" as numerics, you start to see the problems with your model.
We want to solve
Pick "representer" t's
A straightforward algorithm is then
// evaluate ODE
// estimate solution
// with GP regression
// extrapolate
Thoughts:
Structurally similar to Runge-Kutta.
Gives useful "error bars".
Easy to add prior knowledge.
Extends to boundary value problems.
Can be efficiently implemented as a filter.
Questions?
Do observations have noise?
Which covariance function?
Alternatives include the unscented transform or a Monte Carlo scheme.
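A minimal Monte Carlo version of the observation-variance estimate (a sketch; assumes f is vectorized over its second argument):

import numpy as np

def obs_variance_mc(f, t, c_mean, c_var, n=10_000, seed=0):
    """Empirical variance of f(t, c) under c ~ N(c_mean, c_var)."""
    cs = np.random.default_rng(seed).normal(c_mean, np.sqrt(c_var), size=n)
    return np.var(f(t, cs))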
A local distance is estimated from a Gaussian approximation to the diffusion process.
Shortest paths are found by optimization.
Such paths are solutions to the differential equation
which can be solved numerically.