Feature Importance

The feature importance module in the sovai library offers multiple unsupervised algorithms to quantify the significance of each feature in financial datasets.

Tutorials are the best documentation; start with the Feature Importance Tutorial.

Feature Importance Methods

The module supports several methods for calculating feature importance:

Random Projection

df_mega.importance("random_projection")

Reflects how much each feature contributes to the variance in the randomly projected space.
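The library's internal computation isn't shown here, but the idea can be sketched with scikit-learn. The data, the choice of `n_components`, and the scoring rule (squared loadings weighted by the variance of each projected component) are illustrative assumptions, not the sovai implementation:

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

# Toy data standing in for a financial feature matrix (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

proj = GaussianRandomProjection(n_components=4, random_state=0)
Z = proj.fit_transform(X)  # (200, 4) projected data

# Score each original feature by its squared loadings, weighted by the
# variance of the projected component each loading feeds into.
comp_var = Z.var(axis=0)  # variance per projected dimension
importance = (proj.components_ ** 2 * comp_var[:, None]).sum(axis=0)

print(importance.shape)  # one score per original feature -> (8,)
```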

Random Fourier Features

df_mega.importance("fourier")

Indicates how strongly each feature influences the approximation of non-linear relationships in the Fourier-transformed space.
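One way to make this concrete is scikit-learn's `RBFSampler`, which builds random Fourier features approximating an RBF kernel. The scoring rule below (squared frequency weights, weighted by the variance of each Fourier feature) is an assumed sketch, not the library's actual computation:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler

# Toy data (hypothetical stand-in for a feature matrix).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))

# Random Fourier features approximating an RBF kernel: z ~ cos(X W + b).
rbf = RBFSampler(n_components=100, random_state=0)
Z = rbf.fit_transform(X)

# random_weights_ has shape (n_features, n_components); weight each
# feature's squared frequency weights by the variance of the Fourier
# feature they drive, then sum across components.
W2 = rbf.random_weights_ ** 2
importance = (W2 * Z.var(axis=0)[None, :]).sum(axis=1)
```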

Independent Component Analysis (ICA)

df_mega.importance("ica")

Based on the magnitude of each feature's contribution to the extracted independent components, representing underlying independent signals in the data.
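A minimal sketch of this idea with scikit-learn's `FastICA`: the synthetic mixed sources and the scoring rule (column magnitudes of the unmixing matrix) are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Mix three independent (heavy-tailed) sources into six observed features.
rng = np.random.default_rng(2)
S = rng.laplace(size=(500, 3))
A = rng.normal(size=(3, 6))
X = S @ A

ica = FastICA(n_components=3, random_state=0, max_iter=500)
ica.fit(X)

# components_ is the unmixing matrix (n_components, n_features); the
# magnitude of each column measures a feature's contribution to the
# recovered independent signals.
importance = np.abs(ica.components_).sum(axis=0)
```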

Truncated Singular Value Decomposition (SVD)

df_mega.importance("svd")

Determined by each feature's influence on the principal singular vectors, which represent directions of maximum variance in the data.
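Sketched with scikit-learn's `TruncatedSVD`; the weighting of squared loadings by explained variance is an assumed scoring rule for illustration:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Toy data (hypothetical stand-in for a feature matrix).
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))

svd = TruncatedSVD(n_components=4, random_state=0)
svd.fit(X)

# Weight each feature's squared loading on a singular vector by that
# component's explained variance, then sum across components.
importance = (svd.components_ ** 2 * svd.explained_variance_[:, None]).sum(axis=0)
```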

Sparse Random Projection

df_mega.importance("sparse_projection")

Based on how much each feature contributes to the variance in the sparsely projected space, similar to standard Random Projection but with improved computational efficiency.
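The same scoring idea as the dense sketch above carries over, with scikit-learn's `SparseRandomProjection`, whose projection matrix is mostly zeros (hence the efficiency gain). The scoring rule is again an illustrative assumption:

```python
import numpy as np
from sklearn.random_projection import SparseRandomProjection

# Toy data (hypothetical stand-in for a feature matrix).
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 8))

proj = SparseRandomProjection(n_components=4, random_state=0)
Z = proj.fit_transform(X)

# components_ is stored as a scipy sparse matrix; densify for scoring.
comp = proj.components_.toarray()
importance = (comp ** 2 * Z.var(axis=0)[:, None]).sum(axis=0)
```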

Clustered SHAP Ensemble

df_mega.importance("shapley")

Iteratively clusters the data, trains XGBoost to predict cluster membership, computes SHAP values for that classifier, and averages the results across runs, scoring each feature by its role in defining the natural structure of the data.
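The loop can be sketched as below. To keep the sketch dependency-light it swaps in scikit-learn's gradient boosting for XGBoost and permutation importance for SHAP values; the cluster count, run count, and toy data are all illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(5)
X = rng.normal(size=(150, 5))  # hypothetical feature matrix

n_runs = 2
scores = np.zeros(X.shape[1])
for seed in range(n_runs):
    # 1. Cluster the data to expose its natural structure.
    labels = KMeans(n_clusters=3, n_init=10, random_state=seed).fit_predict(X)
    # 2. Train a classifier to predict cluster membership.
    clf = GradientBoostingClassifier(n_estimators=50, random_state=seed).fit(X, labels)
    # 3. Attribute the classifier's skill back to the input features
    #    (permutation importance standing in for SHAP values here).
    result = permutation_importance(clf, X, labels, n_repeats=3, random_state=seed)
    scores += result.importances_mean

# 4. Average across runs to stabilise the estimate.
importance = scores / n_runs
```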

Global Feature Importance

To calculate global feature importance across all methods:

df_mega.feature_importance()
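Since each method scores features on its own scale, one natural way to combine them is rank averaging. The per-method scores below are made up for illustration, and this aggregation rule is an assumption, not necessarily what `feature_importance()` does internally:

```python
import numpy as np

# Hypothetical per-method scores for 4 features (illustrative only).
scores = {
    "random_projection": np.array([0.9, 0.2, 0.5, 0.1]),
    "svd":               np.array([0.8, 0.1, 0.6, 0.3]),
    "ica":               np.array([0.7, 0.3, 0.4, 0.2]),
}

# Convert each method's scores to ranks (higher score -> higher rank),
# then average ranks so no single method's scale dominates.
ranks = {m: s.argsort().argsort() for m, s in scores.items()}
global_rank = np.mean(list(ranks.values()), axis=0)

print(global_rank.argmax())  # feature 0 tops every method
```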

Feature Selection

Example of selecting the top 25 features by importance score:

feature_importance = df_mega.importance("sparse_projection")
df_select = df_mega[feature_importance["feature"].head(25)]
