From gravitational fluctuations in space-time to signaling events in living cells, natural phenomena are often driven by complex stochastic processes. The ability to disambiguate the underlying physics or biology from realistic finite-time observations, in the absence of a priori system knowledge, is key to unraveling some of the most challenging scientific mysteries of our time. Such investigation depends crucially on the ability to compare and contrast data, to identify connections and spot outliers. The discriminating characteristics to look for in data are often determined by heuristics designed by experts, e.g., distinct shapes of "folded" lightcurves may be used as "features" to classify variable stars, while determination of pathological brain states might require a Fourier analysis of brainwave activity. Finding good features is non-trivial, and presents the key bottleneck in automating the search for novel phenomena. Here, we propose a universal solution to this problem: we delineate a principle for quantifying causal similarity between sources of arbitrary data streams, without a priori knowledge, features or training. We uncover an algebraic structure on a space of symbolic models for quantized data, and show that such stochastic generators may be added and uniquely inverted, and that a model and its inverse always sum to the generator of flat white noise. Therefore, every data stream has an anti-stream: data generated by the inverse model. Similarity between two streams, then, is the degree to which one, when summed with the other's anti-stream, annihilates all statistical structure, leaving only noise. We call this data smashing. We present diverse applications, including disambiguation of brainwaves pertaining to epileptic seizures, detection of anomalous cardiac rhythms, cognitive fingerprinting, and classification of astronomical objects from raw photometry.
In our examples, the data smashing principle, without access to any domain knowledge, meets or exceeds the performance of specialized algorithms tuned by domain experts. Finally, we show how such zero-knowledge techniques lay the framework for seeking out incipient causality networks in complex systems, with primary examples being emerging high-order positional correlations in the molecular evolution of retroviral genomes, causal connections in high-frequency price fluctuations in the financial market, and long-range spatial dependencies in seismic events.
Host: Marian Anghel