# Clustering Panels

`Tutorials` are the best documentation — [<mark style="color:blue;">`Clustering Panels Tutorial`</mark>](https://colab.research.google.com/github/sovai-research/sovai-public/blob/main/notebooks/computational/Clustering%20Notebook.ipynb)

### Introduction

The clustering module can be used to cluster any panel dataset. It is particularly useful for financial analysts, data scientists, and researchers working with time-series data across multiple entities (e.g., stocks, companies) and variables.

#### Initialization

The CustomDataFrame can be initialized using the `sov.data()` function:

```python
import sovai as sov

sov.token_auth(token="your_token_here")
df = sov.data("accounting/weekly")
```

#### Basic Clustering

Perform clustering on all features:

```python
df_cluster = df.cluster()
```

<figure><img src="https://1304136543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCbqQ4ogM0YiEs5Z9Djdn%2Fuploads%2Fgit-blob-9f3998e1c3d51e31fc00c6b4386c84c2f5f59383%2Fclustering_panels_1.png?alt=media" alt=""><figcaption></figcaption></figure>
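
For intuition about what clustering a panel involves, here is a minimal sketch on synthetic data. It is not SovAI's implementation: the algorithm behind `df.cluster()` is not described on this page. The sketch assumes a panel indexed by `(ticker, date)`, summarizes each ticker by its per-feature mean, and groups tickers with a small hand-written k-means. The tickers, the column values, and the `kmeans` helper are all hypothetical.

```python
import numpy as np
import pandas as pd


def kmeans(X, k, n_iter=50, seed=0):
    """Tiny k-means (hypothetical helper): returns labels and centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows picked at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every row to every centroid, shape (n_rows, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Synthetic weekly panel: two "small" and two "large" made-up firms
rng = np.random.default_rng(42)
tickers = ["AAA", "BBB", "CCC", "DDD"]
dates = pd.date_range("2023-01-06", periods=12, freq="W-FRI")
index = pd.MultiIndex.from_product([tickers, dates], names=["ticker", "date"])
levels = np.repeat([1.0, 1.2, 10.0, 10.5], len(dates))
panel = pd.DataFrame(
    {
        "ebit": levels + rng.normal(0, 0.1, len(index)),
        "total_assets": 5 * levels + rng.normal(0, 0.1, len(index)),
    },
    index=index,
)

# One row per ticker, z-scored so no single feature dominates the distance
profile = panel.groupby(level="ticker").mean()
z = (profile - profile.mean()) / profile.std()

labels, _ = kmeans(z.to_numpy(), k=2)
clusters = pd.Series(labels, index=profile.index, name="labels")
print(clusters)
```

The two small firms end up in one cluster and the two large firms in the other. Collapsing each ticker to its mean is only one way to summarize a panel; clustering on the full time series is another option.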

#### Feature-Specific Clustering

Cluster based on specific features:

```python
df_cluster_ebit = df.cluster(features=["ebit"])
df_cluster_multi = df.cluster(features=["total_assets", "total_debt", "ebit"])
```

#### Summary Clustering

Get a quick summary of the last six months of data:

```python
df.cluster("summary")
```

<figure><img src="https://1304136543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCbqQ4ogM0YiEs5Z9Djdn%2Fuploads%2Fgit-blob-2e51671dd80d7a067905d5d93acdfebbd57eee9c%2Fclustering_panels_2.png?alt=media" alt=""><figcaption></figcaption></figure>

### Visualization Methods

#### Line Plot

Visualize cluster centroids and distances:

```python
df.cluster("line_plot")
```

<figure><img src="https://1304136543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCbqQ4ogM0YiEs5Z9Djdn%2Fuploads%2Fgit-blob-e3b1a6583e910d6cfdf7affd847fe8c4f3327fc9%2Fclustering_panels_3.png?alt=media" alt=""><figcaption></figcaption></figure>

#### Scatter Plot

Create a scatter plot of clustered data:

```python
df.cluster("scatter_plot")
```

<figure><img src="https://1304136543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCbqQ4ogM0YiEs5Z9Djdn%2Fuploads%2Fgit-blob-ec8603206a5e07319eadac2375a33adc979a7b8b%2Fclustering_panels_4.png?alt=media" alt=""><figcaption></figcaption></figure>

#### Animation Plot

Generate an animated plot of cluster evolution:

```python
df.cluster("animation_plot")
```

<figure><img src="https://1304136543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCbqQ4ogM0YiEs5Z9Djdn%2Fuploads%2Fgit-blob-0161e5a946c807ea4a0a817629285c133505f5bb%2Fclustering_panels_5.png?alt=media" alt=""><figcaption></figcaption></figure>

### Advanced Analysis

#### Distance Calculation

Calculate distances between ticker-cluster combinations:

```python
df_dist = df_cluster.drop(columns=["labels"]).distance(orient="time-series")
```

<figure><img src="https://1304136543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCbqQ4ogM0YiEs5Z9Djdn%2Fuploads%2Fgit-blob-83ecb2d84b60b2c647dca616a54e27773e0cfee1%2Fclustering_panels_6.png?alt=media" alt=""><figcaption></figcaption></figure>
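
As a rough illustration of what a time-series distance between entities can look like, the sketch below computes pairwise Euclidean distances between three synthetic tickers. Euclidean distance on z-scored series is an assumption chosen for simplicity; the metric used by `.distance(orient="time-series")` is not specified on this page, and the ticker names here are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-06", periods=52, freq="W-FRI")
trend = np.linspace(0.0, 1.0, len(dates))

# Wide frame: one column per (made-up) ticker, one row per date
wide = pd.DataFrame(
    {
        "UP1": trend + rng.normal(0, 0.01, len(dates)),
        "UP2": 2 * trend + rng.normal(0, 0.01, len(dates)),  # same shape, larger scale
        "DOWN": -trend + rng.normal(0, 0.01, len(dates)),
    },
    index=dates,
)

# Standardize each series so the distance reflects shape rather than scale
z = (wide - wide.mean()) / wide.std()

# Pairwise Euclidean distance between columns via broadcasting
values = z.to_numpy().T  # shape (n_tickers, n_dates)
diff = values[:, None, :] - values[None, :, :]
dist = pd.DataFrame(
    np.sqrt((diff ** 2).sum(axis=2)),
    index=wide.columns,
    columns=wide.columns,
)
print(dist.round(2))
```

After standardization, `UP1` and `UP2` sit close together despite their different scales, while `DOWN` is far from both.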

### Examples

#### Basic Clustering and Visualization

```python
import sovai as sov

sov.token_auth(token="your_token_here")
df_accounting = sov.data("accounting/weekly")
df_mega = df_accounting.select_stocks("mega").date_range("2018-01-01")
df_cluster = df_mega.cluster()
df_mega.cluster("line_plot")
```

#### Feature-Specific Clustering and Distance Analysis

```python
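# Reuses df_mega from the previous example
df_cluster_ebit = df_mega.cluster(features=["ebit"])
# Drop the cluster labels before computing distances
df_dist = df_cluster_ebit.drop(columns=["labels"]).distance(orient="time-series")
# Sort ascending so the entries closest to AMZN come first
similar_to_amzn = df_dist.sort_values(["AMZN"])[["AMZN"]].T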
```
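
If the distance frame is square, with tickers on both axes (an assumption; check the shape of your own `df_dist`), the first entry after sorting will usually be the ticker itself at distance zero. The sketch below shows one way to skip it and keep only the nearest peers. The `nearest_peers` helper is hypothetical, and the numbers are illustrative placeholders, not real distances.

```python
import pandas as pd

# Illustrative placeholder values only, not real distances
tickers = ["AMZN", "MSFT", "WMT", "XOM"]
df_dist_demo = pd.DataFrame(
    [
        [0.0, 1.2, 2.5, 4.0],
        [1.2, 0.0, 3.1, 3.8],
        [2.5, 3.1, 0.0, 2.9],
        [4.0, 3.8, 2.9, 0.0],
    ],
    index=tickers,
    columns=tickers,
)


def nearest_peers(dist, ticker, n=2):
    """Return the n smallest distances to `ticker`, excluding the ticker itself."""
    return dist[ticker].drop(ticker).nsmallest(n)


peers = nearest_peers(df_dist_demo, "AMZN")
print(peers)
```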
