Clustering Panels

Clustering designed specifically for multivariate panels of financial and time-series data.

Tutorials are the best documentation: see the Clustering Panels Tutorial.

Introduction

Clustering Panels can be used to cluster any panel dataset. It is particularly useful for financial analysts, data scientists, and researchers working with time-series data that spans multiple entities (e.g., stocks, companies) and multiple variables.
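To make "panel dataset" concrete, here is a minimal sketch of such a dataset built with plain pandas. The tickers, dates, values, and the (ticker, date) MultiIndex layout are all illustrative assumptions; the frame returned by sov.data() may be organized differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: 3 tickers x 4 weekly dates x 2 variables.
tickers = ["AAA", "BBB", "CCC"]
dates = pd.date_range("2024-01-05", periods=4, freq="W-FRI")
index = pd.MultiIndex.from_product([tickers, dates], names=["ticker", "date"])

rng = np.random.default_rng(0)
panel = pd.DataFrame(
    {
        "ebit": rng.normal(100.0, 10.0, size=len(index)),
        "total_assets": rng.normal(1000.0, 50.0, size=len(index)),
    },
    index=index,
)

# One entity's history across all variables (rows are dates).
aaa_history = panel.xs("AAA", level="ticker")

# One variable reshaped to a dates x tickers table.
ebit_wide = panel["ebit"].unstack("ticker")
```

Each row is one entity at one point in time, and each column is one variable. This is the "multiple entities and variables" structure that panel clustering operates on.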

Initialization

The CustomDataFrame can be initialized using the sov.data() function:

import sovai as sov

sov.token_auth(token="your_token_here")
df = sov.data("accounting/weekly")

Basic Clustering

Perform clustering on all features:

df_cluster = df.cluster()

Feature-Specific Clustering

Cluster based on specific features:

df_cluster_ebit = df.cluster(features=["ebit"])
df_cluster_multi = df.cluster(features=["total_assets", "total_debt", "ebit"])
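As a rough mental model of clustering on a chosen subset of features, the sketch below standardizes two hypothetical feature columns and groups the tickers with a plain k-means loop written in NumPy. This is a stand-in for illustration only: the algorithm that cluster() actually uses is not described here, and the helper kmeans_labels, the tickers, and the values are all made up.

```python
import numpy as np
import pandas as pd

def kmeans_labels(X, k, n_iter=50):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    X = np.asarray(X, dtype=float)
    centroids = X[:k].copy()  # simplistic deterministic init: first k rows
    for _ in range(n_iter):
        # Distance from every row to every centroid, shape (n_rows, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

# Hypothetical per-ticker feature table (one row per ticker).
features = pd.DataFrame(
    {"ebit": [1.0, 1.2, 9.0, 9.5], "total_debt": [2.0, 2.1, 8.0, 7.5]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Keep only the features of interest and standardize them so that
# no single column dominates the distance computation.
selected = features[["ebit", "total_debt"]]
z = (selected - selected.mean()) / selected.std(ddof=0)

labels = pd.Series(kmeans_labels(z.to_numpy(), k=2), index=features.index)
```

With these toy values, AAA and BBB end up in one cluster and CCC and DDD in the other. Changing the list of selected columns changes which tickers look similar, which is the point of passing features=[...].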

Summary Clustering

Get a quick summary of the last six months of data:

df.cluster("summary")

Visualization Methods

Line Plot

Visualize cluster centroids and distances:

df.cluster("line_plot")

Scatter Plot

Create a scatter plot of clustered data:

df.cluster("scatter_plot")

Animation Plot

Generate an animated plot of cluster evolution:

df.cluster("animation_plot")

Advanced Analysis

Distance Calculation

Calculate distances between ticker-cluster combinations:

df_dist = df_cluster.drop(columns=["labels"]).distance(orient="time-series")
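The idea of a ticker-to-ticker distance over time can be sketched with NumPy alone. The example below assumes Euclidean distance between per-ticker series arranged as a dates x tickers table; the metric and input layout used by distance(orient="time-series") are not specified here, so treat both as assumptions. All tickers and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dates x tickers table (e.g., one clustering output per ticker over time).
dates = pd.date_range("2024-01-05", periods=4, freq="W-FRI")
wide = pd.DataFrame(
    {
        "AAA": [1.0, 2.0, 3.0, 4.0],
        "BBB": [1.1, 2.1, 2.9, 4.2],
        "CCC": [4.0, 3.0, 2.0, 1.0],
    },
    index=dates,
)

# Pairwise Euclidean distance between ticker series via broadcasting.
values = wide.to_numpy().T  # shape: (tickers, dates)
diff = values[:, None, :] - values[None, :, :]
dist = pd.DataFrame(
    np.sqrt((diff ** 2).sum(axis=2)),
    index=wide.columns,
    columns=wide.columns,
)

# Tickers ranked by closeness to AAA, excluding AAA itself.
nearest_to_aaa = dist["AAA"].drop("AAA").sort_values()
```

The result is a square, symmetric table with zeros on the diagonal. Sorting one column ranks every other ticker by similarity to that ticker; here BBB comes out closest to AAA.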

Examples

Basic Clustering and Visualization

import sovai as sov

sov.token_auth(token="your_token_here")
df_accounting = sov.data("accounting/weekly")
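# Select the "mega" stock group, starting from 2018-01-01.
df_mega = df_accounting.select_stocks("mega").date_range("2018-01-01")
# Cluster on all features, then draw the line plot for the same data.
df_cluster = df_mega.cluster()
df_mega.cluster("line_plot")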

Feature-Specific Clustering and Distance Analysis

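# Continues from the previous example and reuses df_mega.
df_cluster_ebit = df_mega.cluster(features=["ebit"])
# Drop the cluster labels before computing distances.
df_dist = df_cluster_ebit.drop(columns=["labels"]).distance(orient="time-series")
# Sort by distance to AMZN (ascending) and show the AMZN column as a single row.
similar_to_amzn = df_dist.sort_values(["AMZN"])[["AMZN"]].T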
