Keith Hughitt
Graphical models provide a way to represent the conditional dependencies among a set of random variables (RVs). They give a visual representation of the joint distribution of the entire set of RVs.
Components:
Types:
Bayesian networks (directed graphs; causal relationships)
Markov random fields / Markov networks (undirected graphs)
See The Elements of Statistical Learning (ESL), Chapter 17, for an overview of undirected graphical models.
Constructing graphical models from data:
Assume that the observations have a multivariate Gaussian distribution with mean \(\mu\) and covariance matrix \(\Sigma\).
The inverse covariance matrix \(\Theta = \Sigma^{-1}\) (also known as the concentration matrix or precision matrix) contains information about the partial covariances between each pair of nodes, conditioned on all other variables (ESL).
If the \((i,j)\)th entry of \(\Theta\) is zero, then variables \(i\) and \(j\) are conditionally independent given the other variables.
This is not necessarily true, however, if the data are not multivariate Gaussian! (See the talk linked at the end of this presentation.)
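A small numeric illustration of the Gaussian case, as a base-R sketch. The four-variable chain below (X1 linked to X2, X2 to X3, X3 to X4) is a hypothetical example defined directly through its precision matrix \(\Theta\), with zeros for non-neighbours. Its covariance \(\Sigma = \Theta^{-1}\) has no zeros, so X1 and X3 are marginally correlated. Their partial correlation, \(-\Theta_{ij} / \sqrt{\Theta_{ii}\Theta_{jj}}\), is nevertheless exactly zero. The simulated data are Gaussian by construction, which is the setting where this zero-pattern correspondence holds. The helper name `partial_cor` is hypothetical.

```r
# Partial correlation from a precision matrix:
#   pcor_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)
partial_cor <- function(Theta) {
  d <- sqrt(diag(Theta))
  P <- -Theta / outer(d, d)
  diag(P) <- 1
  P
}

# Hypothetical chain X1 - X2 - X3 - X4, specified through its precision matrix
Theta <- matrix(c( 2, -1,  0,  0,
                  -1,  2, -1,  0,
                   0, -1,  2, -1,
                   0,  0, -1,  2), nrow = 4, byrow = TRUE)
Sigma <- solve(Theta)          # covariance matrix: dense, no zero entries

round(Sigma, 2)                # X1 and X3 are marginally correlated ...
round(partial_cor(Theta), 2)   # ... but their partial correlation is exactly 0

# Recover the pattern from simulated Gaussian data: rows of Z %*% R have
# covariance t(R) %*% R = Sigma, where R = chol(Sigma).
set.seed(1)
n <- 5000
X <- matrix(rnorm(n * 4), n, 4) %*% chol(Sigma)
Theta_hat <- solve(cov(X))
round(Theta_hat, 2)            # entries (1,3), (1,4), (2,4) should be close to 0
```

With finite samples the estimated zeros are only approximately zero, which is one motivation for the sparsity-inducing \(L_1\) penalty used by the graphical lasso.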
The basic goal of cell signaling is to provide a mechanism for cells to respond to signals by conveying a message from one part of a cell to another.
Cell signaling often begins at the cell membrane ("Signal transduction").
Lipid rafts help to bring together components that function together for a particular signaling pathway: receptors, co-receptors, etc.
It's not quite a 1:1 process...
Figure 1A: Network inference using flow cytometry data (Sachs et al. 2005)
Flow cytometry is a high-throughput single-cell method that is generally used for one of two things:
However, it can also be used to measure many other properties of cells (expression, morphology, enzymatic activity, etc.)
Figure 1B-C: Network inference using flow cytometry data (Sachs et al. 2005)
Other authors have suggested exact solutions:
Interior point optimization is used to find the exact maximum.
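For reference, the graphical lasso paper (Friedman, Hastie and Tibshirani, 2008) states the problem as maximizing the \(L_1\)-penalized Gaussian log-likelihood. In this presentation's notation, \(S\) is the empirical covariance matrix, \(\rho\) is the regularization parameter, and \(\|\Theta\|_1\) is the sum of the absolute values of the entries of \(\Theta\):

\[
\hat{\Theta} = \underset{\Theta \succ 0}{\arg\max}\; \log\det\Theta - \operatorname{tr}(S\Theta) - \rho\,\|\Theta\|_1
\]

With \(\rho = 0\) and \(S\) invertible, the maximizer is the unpenalized estimate \(\hat{\Theta} = S^{-1}\). Larger values of \(\rho\) tend to set more entries of \(\hat{\Theta}\) exactly to zero, which removes edges from the graph.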
Approach
In glasso, the procedure stops when the average absolute change in \(W\) (the current estimate of \(\Sigma\)) is less than \(t \cdot \text{ave} |S^{-\text{diag}}|\), where \(S^{-\text{diag}}\) denotes the off-diagonal elements of the empirical covariance matrix \(S\) and \(t\) is a fixed threshold, set by default at 0.001.
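The block coordinate descent behind this stopping rule can be sketched in base R. This is a minimal illustration under stated assumptions, not the glasso package's Fortran implementation. It assumes that \(W\) starts at \(S + \rho I\), so its diagonal stays fixed. Each column's lasso subproblem is solved by plain coordinate descent with soft-thresholding. The "average absolute change" is taken over off-diagonal entries, which is an assumption. The names `glasso_sketch`, `soft_threshold`, `inner_tol` and `max_inner` are hypothetical.

```r
# Soft-thresholding operator: sign(x) * max(|x| - t, 0)
soft_threshold <- function(x, t) sign(x) * pmax(abs(x) - t, 0)

# Hypothetical sketch of graphical lasso block coordinate descent
glasso_sketch <- function(S, rho, thr = 1e-3, max_iter = 100,
                          inner_tol = 1e-8, max_inner = 1000) {
  p <- nrow(S)
  off_diag <- upper.tri(S)
  W <- S + rho * diag(p)               # diagonal of W stays fixed at s_ii + rho
  B <- matrix(0, p - 1, p)             # column j: lasso coefficients for variable j
  tol <- thr * mean(abs(S[off_diag]))  # t * ave|S^{-diag}|

  for (iter in seq_len(max_iter)) {
    W_old <- W
    for (j in seq_len(p)) {
      idx <- seq_len(p)[-j]
      V <- W[idx, idx, drop = FALSE]   # W11: W without row/column j
      s12 <- S[idx, j]
      beta <- B[, j]                   # warm start from the previous sweep
      # Coordinate descent for: min 1/2 b'Vb - s12'b + rho * ||b||_1
      for (inner in seq_len(max_inner)) {
        beta_old <- beta
        for (k in seq_along(beta)) {
          r <- s12[k] - sum(V[k, -k] * beta[-k])
          beta[k] <- soft_threshold(r, rho) / V[k, k]
        }
        if (max(abs(beta - beta_old)) < inner_tol) break
      }
      B[, j] <- beta
      w12 <- drop(V %*% beta)          # update row/column j of W
      W[idx, j] <- w12
      W[j, idx] <- w12
    }
    # Assumed reading of the rule: average over off-diagonal entries of W
    if (mean(abs(W - W_old)[off_diag]) <= tol) break
  }

  # Recover Theta = W^{-1} column by column from the final coefficients:
  #   theta22 = 1 / (w22 - w12' beta),  theta12 = -beta * theta22
  Theta <- matrix(0, p, p)
  for (j in seq_len(p)) {
    idx <- seq_len(p)[-j]
    theta22 <- 1 / (W[j, j] - sum(W[idx, j] * B[, j]))
    Theta[j, j] <- theta22
    Theta[idx, j] <- -B[, j] * theta22
  }
  # Symmetrize, since the two halves can differ slightly at finite tolerance
  list(W = W, Theta = (Theta + t(Theta)) / 2, iterations = iter)
}

# Usage on a small, well-conditioned covariance matrix
S <- matrix(c(2.0, 0.5, 0.2,
              0.5, 2.0, 0.3,
              0.2, 0.3, 2.0), nrow = 3)
fit <- glasso_sketch(S, rho = 0.25)
round(fit$Theta, 3)
```

Two quick sanity checks follow from the objective. With `rho = 0` the sketch should reproduce `solve(S)`. With `rho` at least as large as every off-diagonal \(|s_{ij}|\), all coefficients are thresholded to zero and \(\hat{\Theta}\) becomes diagonal.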
(Timing results shown separately for sparse and dense problems.)
Compared performance to the COVSEL method from Banerjee et al. (2007).
Result: the graphical lasso is 30-4000 times faster than COVSEL and only 2-10 times slower than the approximate method of Meinshausen and Bühlmann.
Even for dense problems, it finishes in about 1 minute for \(p = 1000\) features (though it is hard to tell from the graph how it will scale to many more features).
To demonstrate the usefulness of the graphical lasso on real-world problems, the method is applied to the flow cytometry data from Sachs et al. (2003).
Figure 2: Original network derived in Sachs et al.
Same network, but from the original Sachs et al. paper: better biological interpretation.
Figure 3: Undirected graphs generated using graphical lasso with various settings for the regularization parameter, \(\rho\).
Figure 4: Profile of coefficients as the total \(L_1\) norm of the coefficient vector increases and \(\rho\) decreases. (The network becomes denser as the \(L_1\) norm increases.)
Figure 5: Tenfold cross-validation (left) and regression sum of squares of the exact graphical lasso vs. the Meinshausen-Bühlmann approximation (right).
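Choosing \(\rho\) by K-fold cross-validation can be sketched in base R. The slide does not state the cross-validation criterion. This sketch assumes held-out Gaussian log-likelihood (up to constants), \(\log\det\hat{\Theta} - \operatorname{tr}(S_{\text{test}}\hat{\Theta})\), which is one common choice. The names `heldout_loglik`, `cv_rho` and `ridge_fit` are hypothetical. `ridge_fit` is a cheap ridge-type stand-in for the estimator, used only so the example runs with base R. It is not the graphical lasso. With the glasso package installed, `function(S, rho) glasso::glasso(S, rho)$wi` could be passed as `fit_fun` instead.

```r
# Held-out Gaussian log-likelihood (up to constants) of a precision estimate
heldout_loglik <- function(Theta, S_test) {
  logdet <- as.numeric(determinant(Theta, logarithm = TRUE)$modulus)
  logdet - sum(diag(S_test %*% Theta))
}

# K-fold cross-validation over a grid of rho values.
# fit_fun(S, rho) must return a precision-matrix estimate (placeholder).
cv_rho <- function(X, rhos, fit_fun, K = 10) {
  n <- nrow(X)
  folds <- sample(rep(seq_len(K), length.out = n))
  scores <- matrix(NA_real_, K, length(rhos))
  for (k in seq_len(K)) {
    train <- X[folds != k, , drop = FALSE]
    test  <- X[folds == k, , drop = FALSE]
    S_train <- cov(train)
    # Test-fold covariance, centered with the training means
    centered <- scale(test, center = colMeans(train), scale = FALSE)
    S_test <- crossprod(centered) / nrow(test)
    for (r in seq_along(rhos)) {
      scores[k, r] <- heldout_loglik(fit_fun(S_train, rhos[r]), S_test)
    }
  }
  mean_scores <- colMeans(scores)
  list(rho = rhos[which.max(mean_scores)], scores = mean_scores)
}

# Usage with a ridge-type stand-in estimator (NOT the graphical lasso)
ridge_fit <- function(S, rho) solve(S + rho * diag(nrow(S)))
set.seed(42)
X <- matrix(rnorm(200 * 5), 200, 5)
res <- cv_rho(X, rhos = c(0.01, 0.1, 0.5, 1), fit_fun = ridge_fit, K = 10)
res$rho
```

Passing the estimator as a function keeps the cross-validation loop independent of how \(\hat{\Theta}\) is computed, so exact and approximate solvers can be compared on the same folds.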
The graphical lasso is available in the R package glasso.
For a more recent (Dec 2012) discussion of this problem, check out this Neural Information Processing Systems (NIPS) talk by Po-Ling Loh:
No voodoo here! Learning discrete graphical models via inverse covariance estimation
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-unknown-linux-gnu (64-bit)
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitcitations_0.4-7 bibtex_0.3-6 knitr_1.5
## [4] igraph_0.6.5-2 slidify_0.3.3 colorout_1.0-0
## [7] vimcom_0.9-9 setwidth_1.0-3
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3 evaluate_0.5.1 formatR_0.9 httr_0.2
## [5] markdown_0.6.3 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.2
## [9] whisker_0.3-2 XML_3.98-1.1 xtable_1.7-1 yaml_2.1.8