Structure theorem

The structure theorem is of central importance to TDA; as commented by G. Carlsson, "what makes homology useful as a discriminator between topological spaces is the fact that there is a classification theorem for finitely generated abelian groups."[3] (see the fundamental theorem of finitely generated abelian groups).

The main argument used in the proof of the original structure theorem is the standard structure theorem for finitely generated modules over a principal ideal domain.[9] However, this argument fails if the indexing set is ℝ.[3]

In general, not every persistence module can be decomposed into intervals.[70] Many attempts have been made at relaxing the restrictions of the original structure theorem. The case of pointwise finite-dimensional persistence modules indexed by a locally finite subset of ℝ was solved based on the work of Webb.[71] The most notable result, due to Crawley-Boevey, settled the case of ℝ: any pointwise finite-dimensional persistence module indexed by ℝ is a direct sum of interval modules.[72]

To understand the statement of this theorem, some concepts need introducing. An interval in ℝ is a subset I ⊆ ℝ with the property that if r, t ∈ I and there is an s ∈ ℝ with r ≤ s ≤ t, then s ∈ I as well. The interval module k_I assigns the vector space k to each element of I and the zero vector space to elements of ℝ ∖ I. All of its structure maps are zero maps, unless s, t ∈ I with s ≤ t, in which case the map k_I(s) → k_I(t) is the identity.[34] Interval modules are indecomposable.[73]
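The two-case structure of an interval module is simple enough to sketch directly. The following minimal illustration uses invented function names (not from any library): indices inside the interval carry a one-dimensional space, indices outside carry the zero space, and a structure map is the identity exactly when both indices stay inside the interval.

```python
# Minimal sketch of an interval module k_I over an interval I = (lo, hi);
# the function names are invented for this illustration.

def interval_module_dim(interval, s):
    # dimension of the vector space assigned at index s: 1 inside I, 0 outside
    lo, hi = interval
    return 1 if lo <= s <= hi else 0

def interval_module_map(interval, s, t):
    # structure map from index s to index t: the identity (represented by 1)
    # when lo <= s <= t <= hi, and the zero map (0) otherwise
    lo, hi = interval
    return 1 if lo <= s <= t <= hi else 0

I = (2, 5)
assert interval_module_dim(I, 3) == 1    # inside the interval
assert interval_module_dim(I, 7) == 0    # outside the interval
assert interval_module_map(I, 2, 5) == 1  # identity within I
assert interval_module_map(I, 1, 3) == 0  # zero map once an index leaves I
```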

Although the result of Crawley-Boevey is a very powerful theorem, it still doesn't extend to the q-tame case.[70] A persistence module is q-tame if the rank of the structure map v_s^t : V_s → V_t is finite for all s < t. There are examples of q-tame persistence modules that fail to be pointwise finite.[74] However, it turns out that a similar structure theorem still holds if the features that exist only at one index value are removed.[73] This holds because the infinite-dimensional parts at each index value do not persist, due to the finite-rank condition.[75] Formally, the observable category Ob is defined as the quotient Pers/Eph, in which Eph denotes the full subcategory of Pers whose objects are the ephemeral modules (those with v_s^t = 0 whenever s < t).[73]

Note that the extended results listed here do not apply to zigzag persistence, since the analogue of a zigzag persistence module over ℝ is not immediately obvious.

Statistics

Real data is always finite, and so its study requires us to take stochasticity into account. Statistical analysis gives us the ability to separate true features of the data from artifacts introduced by random noise. Persistent homology has no inherent mechanism to distinguish between low-probability features and high-probability features.

One way to apply statistics to topological data analysis is to study the statistical properties of topological features of point clouds. The study of random simplicial complexes offers some insight into statistical topology. K. Turner et al.[76] offer a summary of work in this vein.

A second way is to study probability distributions on the persistence space. The persistence space B_∞ is ⨆_n B_n / ∼, where B_n is the space of all barcodes containing exactly n intervals and the equivalences identify a barcode with the barcode obtained by deleting any of its zero-length intervals.[77] This space is fairly complicated; for example, it is not complete under the bottleneck metric. The first attempt made to study it is by Y. Mileyko et al.[78] The space of persistence diagrams D_p in their paper is defined as

D_p := { d | ∑_{x ∈ d} ( inf_{y ∈ Δ} ‖x − y‖ )^p < ∞ },

where Δ is the diagonal line in ℝ². A nice property is that D_p is complete and separable in the Wasserstein metric W_p. Expectation, variance, and conditional probability can be defined in the Fréchet sense. This allows many statistical tools to be ported to TDA. Works on null hypothesis significance tests,[79] confidence intervals,[80] and robust estimates[81] are notable steps.
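For intuition about the Wasserstein metric on diagrams, here is a brute-force sketch: it augments each diagram with the diagonal projections of the other diagram's points and minimizes the total matching cost over all pairings. This is exponential in diagram size and meant only to make the definition concrete; real implementations use optimal-transport solvers.

```python
from itertools import permutations
import math

# Brute-force p-Wasserstein distance between two tiny persistence diagrams.
# Each diagram is a list of (birth, death) points; a point may be matched to
# the diagonal instead of to a point of the other diagram.

def _diag(p):
    m = (p[0] + p[1]) / 2.0
    return (m, m)  # nearest point on the diagonal y = x

def _cost(u, v):
    if u[0] == u[1] and v[0] == v[1]:
        return 0.0  # two diagonal points match for free
    return max(abs(u[0] - v[0]), abs(u[1] - v[1]))  # l-infinity ground metric

def wasserstein_p(d1, d2, p=2):
    a = list(d1) + [_diag(q) for q in d2]  # augment with diagonal projections
    b = list(d2) + [_diag(q) for q in d1]
    best = math.inf
    for perm in permutations(range(len(a))):
        best = min(best, sum(_cost(a[i], b[j]) ** p for i, j in enumerate(perm)))
    return best ** (1.0 / p)

# moving a single death value from 2 to 2.5 costs exactly 0.5
assert abs(wasserstein_p([(0, 2)], [(0, 2.5)]) - 0.5) < 1e-9
```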

A third way is to consider the cohomology of probabilistic spaces or statistical systems directly, called information structures, consisting basically of a triple of sample space, random variables, and probability laws.[82][83] Random variables are considered as partitions of the n atomic probabilities (seen as a probability (n−1)-simplex) on the lattice of partitions. The random variables, or modules of measurable functions, provide the cochain complexes, while the coboundary is considered as the general homological algebra first discovered by Hochschild, with a left action implementing the action of conditioning. The first cocycle condition corresponds to the chain rule of entropy, allowing one to derive, uniquely up to a multiplicative constant, Shannon entropy as the first cohomology class. Considering a deformed left action generalises the framework to Tsallis entropies. Information cohomology is an example of a ringed topos. Multivariate k-mutual informations appear in coboundary expressions, and their vanishing, related to the cocycle condition, gives equivalent conditions for statistical independence.[84] Minima of mutual informations, also called synergies, give rise to interesting independence configurations analogous to homotopical links. Because of its combinatorial complexity, only the simplicial subcase of the cohomology and of the information structure has been investigated on data. Applied to data, these cohomological tools quantify statistical dependences and independences, including Markov chains and conditional independence, in the multivariate case.[85] Notably, mutual informations generalize the correlation coefficient and covariance to non-linear statistical dependences.
These approaches were developed independently and are only indirectly related to persistence methods, but they may be roughly understood in the simplicial case using the Hu Kuo Ting theorem, which establishes a one-to-one correspondence between mutual-information functions and finite measurable functions of a set with the intersection operator, allowing the construction of a Čech complex skeleton. Information cohomology offers direct interpretations and applications in neuroscience (neural assembly theory and qualitative cognition[86]), statistical physics, and deep neural networks, for which the structure and learning algorithm are imposed by the complex of random variables and the information chain rule.[87]
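The entropy chain rule mentioned above (the first cocycle condition) can be checked numerically on a toy distribution. The sketch below computes Shannon entropy and the mutual information I(X;Y) = H(X) + H(Y) − H(X,Y) for two independent binary variables; the mutual information vanishes, matching the statement that vanishing mutual informations characterize statistical independence. The particular probabilities are an arbitrary toy example.

```python
import math
from itertools import product

def entropy(pmf):
    # Shannon entropy (in bits) of a distribution given as {outcome: probability}
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# joint law of two independent binary variables (toy numbers, chosen freely)
joint = {(x, y): px * py
         for (x, px), (y, py) in product([(0, 0.3), (1, 0.7)],
                                         [(0, 0.6), (1, 0.4)])}
# marginals of X and Y
pX = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# mutual information I(X;Y) = H(X) + H(Y) - H(X,Y);
# by the chain rule it vanishes exactly when X and Y are independent
I_XY = entropy(pX) + entropy(pY) - entropy(joint)
assert abs(I_XY) < 1e-12
```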

Persistence landscapes, introduced by Peter Bubenik, are a different way to represent barcodes, more amenable to statistical analysis.[88] The persistence landscape of a persistent module M is defined as a function λ : ℕ × ℝ → ℝ̄, where ℝ̄ denotes the extended real line and λ(k, t) = sup{ m ≥ 0 : β^{t−m, t+m} ≥ k }, with β^{a,b} the rank of the map from H(a) to H(b). The space of persistence landscapes is very nice: it inherits the good properties of the barcode representation (stability, easy representation, etc.), but statistical quantities can be readily defined, and some problems in Y. Mileyko et al.'s work, such as the non-uniqueness of expectations,[78] can be overcome. Effective algorithms for computation with persistence landscapes are available.[89] Another approach is to use revised persistence, namely image, kernel and cokernel persistence.[90]
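For a barcode of finitely many intervals, the k-th landscape function reduces to the k-th largest "tent" value min(t − b, d − t), clipped at zero, over the bars (b, d). A minimal sketch of this evaluation (the function name is invented for the example):

```python
def landscape(barcode, k, t):
    """Evaluate the k-th persistence landscape function (k >= 1) at t,
    using the tent-function formula: the k-th largest value of
    max(0, min(t - b, d - t)) over the bars (b, d)."""
    tents = sorted((max(0.0, min(t - b, d - t)) for b, d in barcode),
                   reverse=True)
    return tents[k - 1] if k <= len(tents) else 0.0

bars = [(0.0, 4.0), (1.0, 3.0)]
assert landscape(bars, 1, 2.0) == 2.0   # dominated by the long bar
assert landscape(bars, 2, 2.0) == 1.0   # second layer: the short bar
assert landscape(bars, 1, 5.0) == 0.0   # t lies outside every bar
```

Because each λ(k, ·) is a piecewise-linear real function, means and variances of landscapes are ordinary pointwise statistics, which is what makes this representation convenient.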

Applications[edit]

Classification of applications[edit]

More than one way exists to classify the applications of TDA. Perhaps the most natural way is by field. A very incomplete list of successful applications includes[91] data skeletonization,[92] shape study,[93] graph reconstruction,[94][95][96][97][98] image analysis,[99][100] materials science,[101][102] progression analysis of disease,[103][104] sensor networks,[66] signal analysis,[105] the cosmic web,[106] complex networks,[107][108][109][110] fractal geometry,[111] viral evolution,[112] propagation of contagions on networks,[113] bacteria classification using molecular spectroscopy,[114] hyperspectral imaging in physical chemistry,[115] remote sensing,[116] feature selection,[117] and early warning signs of financial crashes.[118]

Another way is to distinguish between two kinds of techniques, following G. Carlsson:[77]

one being the study of homological invariants of data on individual data sets, and the other is the use of homological invariants in the study of databases where the data points themselves have geometric structure.

Characteristics of TDA in applications[edit]

There are several notable features of the recent applications of TDA:

  1. Combining tools from several branches of mathematics. Besides the obvious need for algebra and topology, partial differential equations,[119] algebraic geometry,[40] representation theory,[53] statistics, combinatorics, and Riemannian geometry[75] have all found use in TDA.
  2. Quantitative analysis. Topology is considered to be very soft, since many concepts are invariant under homotopy. However, persistent topology records the birth (appearance) and death (disappearance) of topological features, so extra geometric information is embedded in it. One piece of theoretical evidence is a partially positive result on the uniqueness of reconstruction of curves;[120] two applied examples are quantitative analyses of fullerene stability and of self-similarity, respectively.[111][121]
  3. The role of short persistence. Short persistence has also been found to be useful, despite the common belief that noise is the cause of the phenomenon.[122] This is of interest to the mathematical theory.
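The birth/death bookkeeping described in point 2 can be made concrete in the simplest setting: 0-dimensional persistence of the sublevel sets of a function sampled along a line, computed with a union-find structure and the elder rule (when two components merge, the younger one dies). A sketch with an invented function name; zero-persistence pairs are discarded.

```python
def sublevel_persistence_0d(values):
    """0-dimensional persistence pairs (birth, death) of the sublevel sets
    of a function sampled on a line graph; a minimal union-find sketch."""
    n = len(values)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    # sweep vertices in increasing function value (distinct values assumed)
    for i in sorted(range(n), key=lambda i: values[i]):
        for j in (i - 1, i + 1):
            if 0 <= j < n and values[j] <= values[i]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # elder rule: the component with the higher birth dies here
                    young, old = sorted((ri, rj),
                                        key=lambda r: values[r], reverse=True)
                    if values[young] < values[i]:
                        pairs.append((values[young], values[i]))
                    parent[young] = old
    pairs.append((min(values), float("inf")))  # the oldest component survives
    return sorted(pairs)

# two local minima: components born at 0.0 and 1.0; the younger dies at 3.0
assert sublevel_persistence_0d([1.0, 3.0, 0.0, 2.0, 4.0]) == \
    [(0.0, float("inf")), (1.0, 3.0)]
```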

One of the main fields of data analysis today is machine learning. Some examples of machine learning in TDA can be found in Adcock et al.[123] A conference is dedicated to the link between TDA and machine learning. In order to apply tools from machine learning, the information obtained from TDA should be represented in vector form. An ongoing and promising attempt is the persistence landscape discussed above. Another attempt uses the concept of persistence images.[124] However, one problem of this method is the loss of stability, since the hard stability theorem depends on the barcode representation.
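The persistence-image idea cited above can be sketched in a few lines: each diagram point (b, d) is re-coordinatized to (birth, persistence), weighted, and smoothed with a Gaussian sampled on a fixed grid, yielding a fixed-length vector usable by machine-learning methods. The function below illustrates the general recipe only; it is not the API of any library, and the persistence-proportional weight and isotropic Gaussian are simplifying assumptions.

```python
import math

def persistence_image(diagram, grid, sigma=0.5):
    """Toy persistence image: map each (birth, death) to (birth, persistence),
    weight it by its persistence, and sum Gaussians evaluated at grid centres."""
    img = []
    for gx, gy in grid:
        val = 0.0
        for b, d in diagram:
            pers = d - b  # lifetime of the feature
            val += pers * math.exp(-((gx - b) ** 2 + (gy - pers) ** 2)
                                   / (2.0 * sigma ** 2))
        img.append(val)
    return img

# a 4x4 grid of centres over [0, 1.5]^2 gives a 16-dimensional feature vector
grid = [(x * 0.5, y * 0.5) for x in range(4) for y in range(4)]
vec = persistence_image([(0.2, 1.0), (0.5, 0.7)], grid)
assert len(vec) == 16   # fixed length regardless of the number of diagram points
assert max(vec) > 0.0
```

Unlike the raw barcode, this vector lives in a fixed Euclidean space, which is exactly the property the machine-learning pipeline needs; the trade-off, as noted above, is a weaker stability guarantee.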

Impact on mathematics[edit]

Topological data analysis and persistent homology have had impacts on Morse theory. Morse theory has played a very important role in the theory of TDA, including on computation. Some work in persistent homology has extended results about Morse functions to tame functions, or even to continuous functions. A forgotten result of R. Deheuvels, long before the invention of persistent homology, extends Morse theory to all continuous functions.[125]

One recent result is that the category of Reeb graphs is equivalent to a particular class of cosheaf.[126] This is motivated by theoretical work in TDA, since the Reeb graph is related to Morse theory and MAPPER is derived from it. The proof of this theorem relies on the interleaving distance.

Persistent homology is closely related to spectral sequences.[127][128] In particular, the algorithm that brings a filtered complex to its canonical form[10] permits much faster calculation of spectral sequences than the standard procedure of calculating the E_r^{p,q} groups page by page. Zigzag persistence may turn out to be of theoretical importance to spectral sequences.
