Projects

ATS: Adaptive Template Systems

The use of persistent homology in machine learning is one of the main applications of applied topology to data science. One of the biggest challenges in this area is finding a reliable and robust method to extract features from persistence diagrams. Several approaches have been proposed to address this need, one of them being template functions. These are continuous, real-valued, compactly supported functions on $W := \lbrace \left( x_1,x_2 \right) \in \mathbb{R}^2 \vert 0 \leq x_1 < x_2 \rbrace$ which, when integrated against persistence measures (whereby a diagram is replaced by a sum of Dirac deltas), yield continuous functions on $\mathcal{D}$, the space of persistence diagrams.
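As a minimal sketch of this construction, integrating a template function against a sum of Dirac deltas reduces to summing the function over the points of the diagram. The tent-shaped template below, the helper names `tent` and `template_feature`, and the toy diagram are illustrative assumptions, not the reference implementation.

```python
import numpy as np

def tent(points, center, delta):
    """Hypothetical tent template: equals 1 at `center` and decays linearly
    to 0 at sup-norm distance `delta`, so it is continuous with compact support."""
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    dist = np.max(np.abs(points - np.asarray(center, dtype=float)), axis=1)
    return np.maximum(0.0, 1.0 - dist / delta)

def template_feature(diagram, center, delta):
    """Integrate the tent against the persistence measure of `diagram`,
    i.e. sum the template over the (birth, death) points."""
    return float(tent(diagram, center, delta).sum())

# Toy diagram with three (birth, death) pairs; values are made up.
diagram = np.array([[0.3, 1.0], [0.4, 0.9], [0.5, 0.55]])

# Support is [0.05, 0.55] x [0.75, 1.25], which lies inside W.
value = template_feature(diagram, center=(0.3, 1.0), delta=0.25)
print(value)
```

Each choice of `center` and `delta` gives one real-valued feature per diagram; a family of such templates gives a feature vector.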

The work that introduced template functions also shows that one can construct countable families of template functions, called template systems, which give rise to dense subsets of $C(\mathcal{D},\mathbb{R})$ with respect to the compact-open topology. It also provides the theoretical means to use such template systems to generate polynomial features for supervised machine learning problems on persistence diagrams.

Our contribution consists of a reliable methodology for producing template systems that are attuned (adaptive) to the input data set and to the specific supervised learning problem (Polanco & Perea, 18th IEEE ICMLA, 2019). We tested our approach on synthetic data sets, on benchmark data sets such as the SHREC14 shape classification problem, and on real applications such as protein classification, obtaining classification results comparable to or better than the state of the art.

It is worth noting that, for protein classification problems, the state of the art consisted of handpicked features, whereas our methodology provides an automated process for feature extraction. The implementation of our adaptive template systems is available on GitHub for anyone to access and replicate our results.
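The idea of templates adapted to the data can be illustrated with a toy sketch. It is not the published ATS method: as a stand-in, it pools the points of all training diagrams, runs a plain k-means (Lloyd's iteration), and centers one tent-shaped template on each cluster center. The function names, the synthetic diagrams, and the values of `k` and `delta` are all assumptions made for illustration.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's iteration; a stand-in for whatever clustering one prefers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center untouched if its cluster is empty
                centers[j] = points[labels == j].mean(axis=0)
    return centers

def featurize(diagrams, centers, delta):
    """One feature per (diagram, template): sum of a tent of radius `delta`
    centered at each data-driven center, evaluated on the diagram points."""
    feats = np.zeros((len(diagrams), len(centers)))
    for i, dgm in enumerate(diagrams):
        dist = np.abs(dgm[:, None, :] - centers[None, :, :]).max(axis=2)
        feats[i] = np.maximum(0.0, 1.0 - dist / delta).sum(axis=0)
    return feats

# Synthetic (birth, death) diagrams from two made-up classes.
rng = np.random.default_rng(1)
class_a = [np.array([0.2, 0.8]) + 0.02 * rng.standard_normal((5, 2)) for _ in range(3)]
class_b = [np.array([0.5, 1.5]) + 0.02 * rng.standard_normal((5, 2)) for _ in range(3)]
diagrams = class_a + class_b

centers = kmeans(np.vstack(diagrams), k=2)           # templates follow the data
features = featurize(diagrams, centers, delta=0.2)   # shape (6, 2)
print(features)
```

The resulting feature matrix could then be passed to any standard classifier; the point of the sketch is only that template locations are chosen from the data rather than fixed on a grid.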

Lens coordinates

One of the main problems in Topological Data Analysis is how to use topological signatures such as persistent cohomology to help parametrize the spaces underlying a given data set. This question becomes relevant whenever nonlinear dimensionality reduction is needed, since the presence of nontrivial topological features such as loops, voids, non-orientability, torsion, etc. can prevent accurate descriptions with low-dimensional Euclidean coordinates.

Our results (Polanco & Perea, CCCG, 2019) show that tools such as classifying spaces and Brown’s representability theorem can be used to map data sets into lens spaces. Since the coordinatization obtained this way is potentially high dimensional, we developed a dimensionality reduction algorithm on $L_q^n$ inspired by Principal Component Analysis, to better understand the nature of this “data embedding”. We call this algorithm LPCA; it allows us to produce families of point clouds of decreasing dimension that minimize an appropriate notion of distortion. The figure on the left shows an example: the result of computing lens coordinates and then running LPCA to obtain a low-dimensional coordinatization of a sample of the Moore space $M(\mathbb{Z}_3, 1)$ in the lens space $L_3^2$. Our implementation of lens coordinates and LPCA is available in a repository on GitHub.
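To make the target space concrete, the sketch below computes the quotient geodesic distance between two points of a lens space. It assumes the usual model in which a point is the orbit of a unit vector in $\mathbb{C}^n$ under multiplication by $q$-th roots of unity; this convention and the function name `lens_distance` are assumptions. It does not implement lens coordinates or LPCA.

```python
import numpy as np

def lens_distance(x, y, q):
    """Quotient geodesic distance between the classes [x] and [y], assuming the
    lens space is modeled as the unit sphere in C^n modulo multiplication by
    q-th roots of unity. Computed as min over k of arccos(Re<x, zeta^k y>)."""
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    x = x / np.linalg.norm(x)
    y = y / np.linalg.norm(y)
    roots = np.exp(2j * np.pi * np.arange(q) / q)   # all q-th roots of unity
    inner = np.vdot(y, x)                           # sum_i x_i * conj(y_i)
    cosines = np.real(roots * inner)                # one value per representative
    return float(np.arccos(np.clip(cosines.max(), -1.0, 1.0)))

zeta = np.exp(2j * np.pi / 3)
x = np.array([1.0, 0.0])
same_class = lens_distance(x, zeta * x, q=3)        # same orbit, so approximately 0
orthogonal = lens_distance(x, np.array([0.0, 1.0]), q=3)
print(same_class, orthogonal)
```

A distance of this kind is what makes it meaningful to talk about the distortion of a point cloud living in a lens space.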

Cup products and quasi-periodicity detection

Many natural phenomena are characterized by their periodic nature, including animal locomotion, biological processes, pendulums, etc. Part of understanding periodic processes is being able to differentiate them from quasiperiodic ones. Significant advances have been made (Tralie & Perea, SIAM Journal on Imaging Sciences, 2018) in using topological data analysis to classify and understand quasiperiodic signals. These methodologies use 2-dimensional persistent homology to obtain quasiperiodicity scores that indicate the degree of periodicity or quasiperiodicity of a signal. This approach has a significant computational disadvantage, since it requires the often expensive computation of large simplicial complexes.

Our contribution in this area uses the algebraic structure of the cohomology ring to obtain classes in the $2$-dimensional persistence diagram by computing only classes in dimension $1$, which saves valuable computational time and yields more reliable quasiperiodicity scores. On the right we see an example of the persistence diagram and persistent cup product of two $1$-dimensional classes computed from the delay embedding of the quasiperiodic function $f(t) = \cos(5t) + \cos(5\sqrt{3}t)$, which is homeomorphic to a $2$-dimensional torus.
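For readers unfamiliar with delay embeddings, the following sketch builds the point cloud for the function $f$ above. The helper name `sliding_window` and the embedding dimension, delay, and sampling values are arbitrary illustrative choices, not the parameters behind the figure.

```python
import numpy as np

def f(t):
    """Quasiperiodic signal: the two frequencies have irrational ratio sqrt(3)."""
    return np.cos(5 * t) + np.cos(5 * np.sqrt(3) * t)

def sliding_window(signal, times, dim, tau):
    """Delay embedding: row i is (f(t_i), f(t_i + tau), ..., f(t_i + (dim-1)*tau))."""
    offsets = tau * np.arange(dim)
    return signal(times[:, None] + offsets[None, :])

# Illustrative parameters only (assumed, not taken from the original experiments).
t = np.linspace(0.0, 20.0, 400)
cloud = sliding_window(f, t, dim=4, tau=0.3)
print(cloud.shape)
```

The array `cloud` is the kind of input one would pass to a persistent (co)homology package in order to compute the $1$-dimensional classes discussed above.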

For those interested in our implementation of this quasiperiodicity detection score, the base code is available on GitHub.