# Chapter 6 Clustering

## 6.1 Clustering algorithm

The clustering algorithm of Seurat is graph-based. Based on the identified principal components, Seurat finds the k nearest neighbours of each cell and builds a new matrix based on it. The edge weight between two cells is computed with the intersection of set of neighbours of each of the two cells. This consideration defines a new metric of similarity between cells. If two cells share the same nearest neighbors, they are more likely to be of the same kind (cell type, cell state, etc). To find clusters on this new metric, an algorithm optimizes the standard modularity function, used to measure the strength of communities inside a larger network.

See Macosko et al. 2015 and Seurat on Github.

`## Computing nearest neighbor graph`

`## Computing SNN`

## 6.2 tSNE and UMAP projections

Seurat utilises both tSNE and UMAP algorithms to help visualize data. Both algorithms take the data computed by the PCA as input (euclidian metric) and projects it into a two or three dimensions space. They are non-linear projection algorithm that try to keep the distance between cells from the input high-dimensional space to fill the reduced output space.

```
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
```

`## 12:00:52 UMAP embedding parameters a = 0.9922 b = 1.112`

`## 12:00:52 Read 2799 rows and found 10 numeric columns`

`## 12:00:52 Using Annoy for neighbor search, n_neighbors = 30`

`## 12:00:52 Building Annoy index with metric = cosine, n_trees = 50`

`## 0% 10 20 30 40 50 60 70 80 90 100%`

`## [----|----|----|----|----|----|----|----|----|----|`

```
## **************************************************|
## 12:00:52 Writing NN index file to temp file /var/folders/87/yvyz5xss5l57q5xhy9cmcgw00000gn/T//RtmpWZHzd5/file3cdd6f9d9ffa
## 12:00:52 Searching Annoy index using 1 thread, search_k = 3000
## 12:00:53 Annoy recall = 100%
## 12:00:53 Commencing smooth kNN distance calibration using 1 thread
## 12:00:54 Initializing from normalized Laplacian + noise
## 12:00:54 Commencing optimization for 500 epochs, with 111782 positive edges
## 12:00:58 Optimization finished
```

## 6.3 Projection results

Each point on this map represents a cell, colored by the cluster to which it belongs. We will now search the most differentially expressed genes between clusters to assign a cluster to a cellular type or state.