InterJournal Complex Systems, 1750
Status: Accepted
Manuscript Number: [1750]
Submission Date: 2006
Remembrance of experiments past: a redescription approach for knowledge discovery in complex systems
Author(s): Samantha Kleinberg, Marco Antoniotti, Satish Tadepalli, Naren Ramakrishnan

Subject(s): CX.30



A complex system creates a “whole that is larger than the sum of its parts” by coordinating many interacting, simpler component processes. Yet each of these processes is difficult to decipher, because its visible signatures are seen only against a syntactic background, devoid of context. Examples of such datasets are time-course descriptions of gene-expression abundance levels, neural spike trains, and click-streams for web pages. It has become nearly effortless to collect voluminous datasets of this nature, but how can we make sense of them and draw significant conclusions? For instance, in the case of time-course gene-expression datasets, rather than following small sets of known genes, can we develop a holistic approach that provides a view of the entire system as it evolves through time? We have developed GOALIE (Gene-Ontology for Algorithmic Logic and Invariant Extraction), a systems biology application that presents global and dynamic perspectives (e.g., invariants) inferred collectively over a gene-expression dataset. Such perspectives are important for obtaining a process-level understanding of the underlying cellular machinery, especially how cells react to, respond to, and recover from environmental changes. GOALIE uncovers formal temporal logic models of biological processes by redescribing time-course microarray data in the vocabulary of biological processes and then piecing these redescriptions together into a Kripke structure. In such a model, possible worlds encode transcriptional states and are connected to future possible worlds by state transitions. An HKM (Hidden Kripke Model) constructed in this manner then supports various query, inference, and comparative assessment tasks, in addition to providing descriptive process-level summaries.
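To make the Kripke-structure idea concrete, the following is a minimal sketch, not GOALIE's implementation. The world names, process labels, and transitions are invented for illustration and are not results from the manuscript; the helper names `reachable`, `exists_finally`, and `always_globally` are likewise hypothetical. Each possible world is labeled with the biological processes assumed active in that transcriptional state, and two simple temporal queries are evaluated by graph reachability.

```python
from collections import deque

# Hypothetical toy model: worlds, labels, and transitions are illustrative only.
LABELS = {
    "w0": {"DNA replication"},
    "w1": {"DNA replication", "mitosis"},
    "w2": {"mitosis"},
}
TRANSITIONS = {"w0": ["w1"], "w1": ["w2"], "w2": ["w0"]}


def reachable(start):
    """Return every world reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        world = queue.popleft()
        for successor in TRANSITIONS.get(world, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen


def exists_finally(start, process):
    """EF-style query: is `process` active in some world reachable from `start`?"""
    return any(process in LABELS[w] for w in reachable(start))


def always_globally(start, process):
    """AG-style query: is `process` active in every world reachable from `start`?

    A process for which this holds behaves like an invariant of the model.
    """
    return all(process in LABELS[w] for w in reachable(start))
```

In this toy model, `exists_finally("w0", "mitosis")` holds because `w1` is reachable from `w0`, while `always_globally("w0", "mitosis")` fails because `w0` itself lacks that label.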
The formal basis for GOALIE is a multi-attribute information bottleneck (IB) formulation, in which we aim to retain the most relevant information about states and their transitions while compressing the number of syntactic signatures used to represent the data. We describe the mathematical formulation, software implementation, and a case study of the yeast (S. cerevisiae) cell cycle.
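For orientation, the standard single-variable IB objective can be written as below. This is the textbook form, not the multi-attribute objective of the manuscript, which is not given in this abstract. The mapping of symbols is an assumption made here for illustration: $X$ stands for the observed syntactic signatures, $T$ for their compressed representation, $Y$ for the relevant variable (for example, state and transition information), and $\beta$ for the trade-off parameter.

```latex
\min_{p(t \mid x)} \; I(X;T) \;-\; \beta \, I(T;Y),
\qquad
I(A;B) \;=\; \sum_{a,b} p(a,b) \,\log \frac{p(a,b)}{p(a)\,p(b)}
```

Minimizing $I(X;T)$ compresses the representation, while the $-\beta\, I(T;Y)$ term rewards keeping information about $Y$; larger $\beta$ favors preserving relevance over compression.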
