Modeling molecules

Bayesian approaches to understanding protein biophysics

Keegan Hines, Tom Middendorf, Rick Aldrich

Proteins

  • Fundamental units of computation and signal processing in biological systems
  • Fold into complex structures which determine their function

  • Proteins are dynamic, exploring very many conformational states.
  • What can we do to understand these dynamics?

Jensen et al., 2012, Mechanism of Voltage Gating in Potassium Channels, Science, 336(6078).

Modeling Proteins

Physiological relevance: we only need to account for some of this complexity

Calmodulin

Calcium signaling via calmodulin plays a vital role in many biological processes including ion channel modulation and synaptic plasticity.

  • A sequential binding model is often used to study CaM
  • Current estimates of binding parameters vary wildly

Sequential Binding Model

Large regions of this parameter space can fit any data extremely well
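A minimal sketch of why this happens, assuming an Adair-type sequential model in which site i fills only after site i-1 and each step has its own association constant. The function name `mean_bound`, the two-site example, and all numerical values are illustrative assumptions, not the parameters used in the talk.

```python
def mean_bound(x, stepwise_constants):
    """Mean number of ligands bound per protein at free ligand concentration x.

    Assumes strictly sequential binding with stepwise association
    constants K_1, K_2, ... (a hypothetical parameterization).
    """
    numerator = 0.0
    denominator = 1.0  # statistical weight of the unbound state
    weight = 1.0
    for i, k in enumerate(stepwise_constants, start=1):
        weight *= k * x  # weight of the state with i ligands bound
        numerator += i * weight
        denominator += weight
    return numerator / denominator


# Two-site example: K1 and K2 each change tenfold, but their product is fixed.
concentrations = [10 ** (e / 10) for e in range(-30, 31)]  # 0.001 to 1000
curve_a = [mean_bound(x, [0.01, 100.0]) for x in concentrations]
curve_b = [mean_bound(x, [0.001, 1000.0]) for x in concentrations]
max_gap = max(abs(a - b) for a, b in zip(curve_a, curve_b))
print(f"largest difference between the two curves: {max_gap:.4f}")
```

In this toy case the two curves differ by well under one percent of the saturating value, so realistic measurement noise would not distinguish the two parameter sets: the data constrain mainly the product of the constants, not each one separately.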

Parameter Identifiability

Practical Non-identifiability

Structural Non-identifiability

Parameters cannot be inferred accurately even with noiseless data
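As a toy illustration of the structural case (not a model from the talk), consider a hypothetical observable in which two parameters enter only through their product. The names `signal`, `gain`, and `affinity` are assumptions made for this sketch.

```python
def signal(x, gain, affinity):
    """Hypothetical observable in which two parameters enter only as a product."""
    return (gain * affinity) * x


inputs = [0.0, 0.5, 1.0, 2.0, 4.0]
fit_a = [signal(x, gain=2.0, affinity=3.0) for x in inputs]
fit_b = [signal(x, gain=6.0, affinity=1.0) for x in inputs]

# Both parameter sets reproduce the same noiseless data exactly.
print(fit_a == fit_b)
```

No amount of additional noiseless data of this kind can separate `gain` from `affinity`; only their product is determined.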

Identifying Identifiability

Analytical methods exist, but can only be used in special cases. Worse, such methods can be misleading, as in the case of practical non-identifiability.



We could calculate the error (likelihood) over the whole parameter space, but this is infeasible when the model has many parameters.
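A quick count shows the scale of the problem, assuming a regular grid with the same number of points along every parameter axis. The helper `grid_evaluations` and the choice of 100 points per axis are illustrative assumptions.

```python
def grid_evaluations(n_parameters, points_per_axis):
    """Number of likelihood evaluations needed for a full regular grid."""
    return points_per_axis ** n_parameters


for n in (2, 4, 8):
    print(n, "parameters:", grid_evaluations(n, 100), "evaluations")
```

The count grows exponentially with the number of parameters, which is why exhaustive evaluation is practical only for very small models.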



We need an efficient way to identify the regions of parameter space that lead to good agreement with the data.

Bayesian Inference

The posterior distribution quantifies which regions of the parameter space provide a good explanation of the data.


Bayes' rule specifies how to calculate posterior probability, and Markov chain Monte Carlo provides an efficient method to estimate high-dimensional posterior distributions.
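Bayes' rule can be sketched in code as an unnormalized log posterior: the log prior plus the log likelihood. The example below assumes a flat prior on a bounded interval and Gaussian measurement noise around a toy one-parameter model y = theta * x; the model, bounds, noise level, and data values are all hypothetical.

```python
import math


def log_prior(theta, lower=0.0, upper=10.0):
    """Flat prior on [lower, upper]; the bounds are illustrative assumptions."""
    return 0.0 if lower <= theta <= upper else -math.inf


def log_likelihood(theta, xs, ys, sigma=0.1):
    """Gaussian noise around the toy model y = theta * x, up to a constant."""
    return sum(-0.5 * ((y - theta * x) / sigma) ** 2 for x, y in zip(xs, ys))


def log_posterior(theta, xs, ys):
    """Bayes' rule up to an additive constant: log prior + log likelihood."""
    lp = log_prior(theta)
    if lp == -math.inf:
        return lp  # parameter values outside the prior support are excluded
    return lp + log_likelihood(theta, xs, ys)


xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.0]  # made-up measurements for illustration
for theta in (1.0, 2.0, 3.0):
    print(theta, log_posterior(theta, xs, ys))
```

The normalizing constant is omitted because sampling methods such as MCMC need only ratios of posterior values.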

Markov chain Monte Carlo

Estimate a probability distribution by drawing samples from it.
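One common MCMC variant is the random-walk Metropolis algorithm, sketched below. The talk does not say which sampler was used, so this is only an illustration; the target here is a standard normal density, and the step size, chain length, and burn-in are arbitrary assumptions.

```python
import math
import random


def metropolis(log_target, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_lp = log_target(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected proposal repeats the current value
    return samples


def log_standard_normal(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


chain = metropolis(log_standard_normal, start=3.0, n_samples=20000)
kept = chain[1000:]  # discard an initial burn-in segment
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"sample mean {mean:.2f}, sample variance {variance:.2f}")
```

The histogram of `kept` approximates the target distribution. For a real model, `log_target` would be the log posterior over the model parameters, and broad or ridge-shaped sample clouds would indicate poorly identified parameters.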

Additional Applications

  • Dynamical Systems
  • Dynamical Systems (Non-Identifiable)
  • Hidden Markov Models

Model Selection and Experimental Design

Detecting non-identifiability forces the investigator to consider simpler models or to design new experiments whose data better constrain the parameters.



In the case of calcium binding to calmodulin, novel experiments have been developed to measure site-specific binding curves (Jenni Greeson-Bernier).

Conclusions

  • Mechanistic models of protein systems are important, though merely fitting data to models is insufficient: fits may not be unique
  • Non-identifiability is a concern not only for large and complex models, but also for extremely simple 2- and 3-parameter biophysical systems
  • New methods are required to determine the accuracy and identifiability of nonlinear models
  • Bayesian inference (& MCMC) is well suited to provide accurate parameter estimates and a direct assessment of identifiability
  • This approach will yield more accurate modeling and will force more innovative experimentation