# The Data Scientist’s Guide to Topological Data Analysis: Preamble

Topological Data Analysis, abbreviated TDA, is a suite of data analytic methods inspired by the mathematical field of algebraic topology. TDA is attractive yet elusive for most data scientists, since its potential as a data exploration tool is often communicated through esoteric terminology unfamiliar to non-mathematicians. The purpose of this guide is to bridge the communication gap between academia and industry, so that non-mathematician data scientists may add current TDA methods to their analytic toolkits and anticipate new developments in the field of TDA.

The guide begins with an overview of Mapper, a TDA algorithm that has recently transitioned from academia to industry with commercial success. We explain the Mapper algorithm, demo open-source software, and present a handful of its commercial use-cases (some of which are original). Then, we switch to persistent homology, a TDA method that has not yet broken through to industry but is supported by a growing body of academic work. We explain the intuition behind homotopy, approximation, homology, and persistence, and demo open-source persistent homology software. We hope this guide inspires data scientists to try Mapper in their own analytic work, and to watch for developments in persistent homology that push it from academia to industry.

## Mapper

*Algorithm*. The Mapper algorithm maps high-dimensional data into smaller networks that retain the main topological features of the data and are easy to visualize.

*Software*. To run the Mapper algorithm on small to medium-size datasets, one can use the open-source R package TDAmapper.

*Use-Cases at Ayasdi*. On a larger scale, Mapper has been used commercially by the company Ayasdi to forecast returns, detect fraud, aid in oil and gas exploration, plan ad campaigns, and discover biomarkers.

*Use-Cases at Aunalytics*. At Aunalytics, Mapper (via R's TDAmapper) provided granular insights on a location tracking dataset, and revealed insights in a sparse call-center dataset even though there was little cohesion in the resulting network.
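To make the *Algorithm* summary more concrete, here is a minimal from-scratch sketch of the usual Mapper recipe: project the data with a filter function, cover the filter range with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. It is an illustration only, not the TDAmapper API. The function names (`mapper`, `connected_components`), the parameters (`n_intervals`, `overlap`, `eps`), and the choice of single-linkage clustering at a fixed distance threshold are all assumptions made for this example.

```python
import math
from itertools import combinations


def connected_components(indices, points, eps):
    """Single-linkage clustering: points chained by hops of length <= eps share a cluster."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_fn, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are sets of point indices, edges link nodes sharing points."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    # Interval length chosen so n_intervals intervals with the given fractional overlap span [lo, hi].
    length = (hi - lo) / (n_intervals - overlap * (n_intervals - 1))
    step = length * (1 - overlap)
    tol = 1e-9 * ((hi - lo) or 1.0)  # guard against floating-point loss at the endpoints
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = a + length
        members = [i for i, v in enumerate(values) if a - tol <= v <= b + tol]
        nodes.extend(connected_components(members, points, eps))
    # Two clusters are linked whenever they have at least one data point in common.
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Toy data (hypothetical): 60 points on the unit circle, filtered by x-coordinate.
circle = [(math.cos(2 * math.pi * k / 60), math.sin(2 * math.pi * k / 60)) for k in range(60)]
nodes, edges = mapper(circle, lambda p: p[0], n_intervals=4, overlap=0.3, eps=0.3)
```

On this toy circle the resulting network should itself form a single loop, which is the sense in which Mapper retains the main topological features of the data. With real data, the choice of filter function, cover, and clustering method strongly affects the output.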

## Persistent Homology

*Homotopy*. Algebraic topology aims to describe the connectivity of an arbitrary space. It does this by computing homotopy groups, which, loosely speaking, capture the "loops" in each dimension.

*Approximation*. In computational topology, datasets can be interpreted as samples taken from an underlying topological space, and for any given margin of error a topological space can be constructed to approximate the underlying space.

*Homology*. Homotopy groups are extremely difficult to compute in high dimensions. Homology is a related concept that is often easier to compute.

*Persistence*. Persistence barcode plots show which topological features persist through many scales of the data, and can be used to calculate similarity between different spaces.

*Software*. To compute persistent homology of small to medium-size datasets, one can use the open-source R package TDA.
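As a rough illustration of the *Persistence* idea, the sketch below computes a barcode for the simplest case only: dimension 0, where the features are connected components. It assumes a distance-threshold construction in which two points become connected once the scale reaches the distance between them, and it is not the API of the R package TDA. The function name `h0_barcode` and the sample points are hypothetical. Each bar records the scale at which a component appears (always 0 here) and the scale at which it merges into another component.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """Dimension-0 persistence bars (birth, death) as the connection scale grows."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairwise distances from smallest to largest scale.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj          # two components merge at scale d
            bars.append((0.0, d))    # one of them "dies" here
    bars.append((0.0, math.inf))     # the final component never dies
    return bars


# Toy data (hypothetical): a tight group of three points and a distant pair.
sample = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1)]
bars = h0_barcode(sample)
```

In this example three bars end at scale 1, where nearby points join their group, and one bar lasts until scale 9, where the two groups finally merge. The long bar is the persistent feature: the data has two clusters across a wide range of scales. Higher-dimensional features such as loops require more machinery than this sketch covers.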