SOV.AI

Feature Importance

The feature importance module in the sovai library offers multiple unsupervised algorithms to quantify the significance of each feature in financial datasets.


Last updated 6 months ago


Tutorials are the best documentation: see the Feature Importance Tutorial.

Feature Importance Methods

The module supports several methods for calculating feature importance:

Random Projection

df_mega.importance("random_projection")

Reflects how much each feature contributes to the variance in the randomly projected space.
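As an illustration of the idea only (not sovai's implementation), the NumPy sketch below draws a Gaussian projection matrix `R` and uses the exact identity Var(Y_k) = sum_j R[j, k] * Cov(X_j, Y_k) to split each projected component's variance into per-feature parts. It then averages the absolute parts over several random draws. The function name, its parameters and this particular scoring rule are assumptions.

```python
import numpy as np


def random_projection_importance(X, n_components=8, n_runs=20, seed=0):
    """Hypothetical sketch, not sovai's code: share of projected variance per feature."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                  # center each column
    n, d = Xc.shape
    scores = np.zeros(d)
    for _ in range(n_runs):
        # Gaussian random projection matrix with entry variance 1 / n_components
        R = rng.normal(0.0, 1.0 / np.sqrt(n_components), size=(d, n_components))
        Y = Xc @ R                           # projected data, shape (n, n_components)
        cov_xy = Xc.T @ Y / (n - 1)          # Cov(X_j, Y_k), shape (d, n_components)
        # Var(Y_k) = sum_j R[j, k] * Cov(X_j, Y_k), so each term is feature j's part
        contrib = R * cov_xy
        scores += np.abs(contrib).sum(axis=1)
    return scores / scores.sum()             # normalize so scores sum to 1


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
X[:, 0] *= 10                                # feature 0 carries most of the variance
print(random_projection_importance(X).round(3))
```

Nothing is standardized in this sketch, so high-variance columns dominate. Whether sovai rescales features before projecting is not stated on this page.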

Random Fourier Features

df_mega.importance("fourier")

Indicates how strongly each feature influences the approximation of non-linear relationships in the Fourier-transformed space.
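One possible way to turn "influence in the Fourier-transformed space" into a number is sketched below in NumPy. The data is mapped with random Fourier features that approximate an RBF kernel. The sketch then measures how much that feature map changes when one input column is replaced by its mean. The ablation rule, `gamma` and the function name are assumptions rather than sovai's documented method.

```python
import numpy as np


def fourier_importance(X, n_features=200, gamma=0.5, seed=0):
    """Hypothetical sketch: ablation sensitivity of a random Fourier feature map."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random Fourier features approximating the RBF kernel exp(-gamma * ||x - y||^2)
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def phi(A):
        return np.sqrt(2.0 / n_features) * np.cos(A @ W + b)

    Z = phi(X)
    scores = np.empty(d)
    for j in range(d):
        X_abl = X.copy()
        X_abl[:, j] = X[:, j].mean()             # remove feature j's variation
        # Mean squared change of the feature map when feature j is ablated
        scores[j] = np.mean(np.sum((Z - phi(X_abl)) ** 2, axis=1))
    return scores / scores.sum()


rng = np.random.default_rng(2)
X = np.column_stack([rng.normal(size=300),           # informative spread
                     0.01 * rng.normal(size=300),    # almost constant
                     np.zeros(300)])                 # constant
print(fourier_importance(X).round(3))
```

Because the cosine map is non-linear, this score also captures features whose effect is not a straight-line one. A constant column leaves the map unchanged and scores exactly zero.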

Independent Component Analysis (ICA)

df_mega.importance("ica")

Based on the magnitude of each feature's contribution to the extracted independent components, which represent the underlying independent signals in the data.
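A self-contained NumPy sketch of the idea follows. It assumes PCA whitening followed by symmetric FastICA with a tanh nonlinearity. Each feature is scored by the summed absolute weight it receives in the unmixing matrix that maps centered features to components. sovai's actual ICA settings and scoring rule are not documented here, so the details and the function name are assumptions.

```python
import numpy as np


def ica_importance(X, n_components=2, n_iter=200, seed=0):
    """Hypothetical sketch: feature weights in a FastICA unmixing matrix."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # PCA whitening: keep the n_components largest-variance directions
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(eigval)[::-1][:n_components]
    K = eigvec[:, top] / np.sqrt(eigval[top])    # (d, k) whitening matrix
    Z = Xc @ K                                   # whitened data, identity covariance
    W = np.linalg.qr(rng.normal(size=(n_components, n_components)))[0]
    for _ in range(n_iter):
        # FastICA fixed-point update with g = tanh, g' = 1 - tanh^2
        G = np.tanh(Z @ W.T)
        W_new = G.T @ Z / len(Z) - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        u, _, vt = np.linalg.svd(W_new)
        W = u @ vt
    unmixing = K @ W.T                           # sources = Xc @ unmixing, shape (d, k)
    scores = np.abs(unmixing).sum(axis=1)
    return scores / scores.sum()


rng = np.random.default_rng(3)
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), size=1000)   # sub-Gaussian source
s2 = rng.laplace(scale=1 / np.sqrt(2), size=1000)      # super-Gaussian source
X = np.column_stack([s1 + s2, s1 - s2, 0.001 * rng.normal(size=1000)])
print(ica_importance(X).round(3))
```

In this toy mix, the two mixed columns carry both sources. The near-constant third column is expected to receive almost no weight.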

Truncated Singular Value Decomposition (SVD)

df_mega.importance("svd")

Determined by each feature's influence on the leading singular vectors, which capture the directions of greatest variance in the data (when the data is centered).
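A short NumPy sketch of one natural scoring rule follows. Feature j receives sum_k s_k^2 * V[j, k]^2 over the kept components, meaning its squared loading on each leading right singular vector, weighted by that component's squared singular value. The centering step and the function name are assumptions; scikit-learn's `TruncatedSVD`, for example, does not center its input.

```python
import numpy as np


def svd_importance(X, n_components=2):
    """Hypothetical sketch: variance-weighted squared loadings on top singular vectors."""
    Xc = X - X.mean(axis=0)       # centering is an assumption, see the lead-in
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    # Feature j's score: sum_k s_k^2 * V[j, k]^2 over the kept components
    scores = (s[:n_components, None] ** 2 * vt[:n_components] ** 2).sum(axis=0)
    return scores / scores.sum()


rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4))
X[:, 0] *= 10
print(svd_importance(X, n_components=1).round(3))
```

When every component is kept, the score reduces to each feature's share of total variance, because X^T X = V S^2 V^T. The truncation is what makes the score emphasize the dominant directions.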

Sparse Random Projection

df_mega.importance("sparse_projection")

Based on how much each feature contributes to the variance in the sparsely projected space, similar to standard Random Projection but with improved computational efficiency.
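The sketch below reuses the variance split from the random projection case but draws a sparse matrix. Each entry is +scale or -scale with probability density/2, and 0 otherwise, so every entry still has variance 1/n_components. The density 1/sqrt(d) matches the "auto" setting of scikit-learn's `SparseRandomProjection`. Everything else, including the function name, is an assumption rather than sovai's code.

```python
import numpy as np


def sparse_projection_importance(X, n_components=8, n_runs=20, seed=0):
    """Hypothetical sketch: random-projection scores with a sparse matrix."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    n, d = Xc.shape
    density = 1.0 / np.sqrt(d)                       # fraction of non-zero entries
    scale = np.sqrt(1.0 / (density * n_components))  # keeps entry variance 1 / k
    scores = np.zeros(d)
    for _ in range(n_runs):
        # Entries are +scale or -scale with probability density / 2 each, else 0
        mask = rng.random((d, n_components)) < density
        signs = rng.choice([-1.0, 1.0], size=(d, n_components))
        R = mask * signs * scale
        Y = Xc @ R
        cov_xy = Xc.T @ Y / (n - 1)                  # Cov(X_j, Y_k)
        scores += np.abs(R * cov_xy).sum(axis=1)     # |R[j, k] * Cov(X_j, Y_k)|
    return scores / scores.sum()


rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
X[:, 0] *= 10
print(sparse_projection_importance(X).round(3))
```

For clarity, the matrix is stored densely here. The efficiency gain mentioned above comes from storing `R` as a sparse matrix and skipping the zero entries during the product.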

Clustered SHAP Ensemble

df_mega.importance("shapley")

Iteratively clusters the data, trains an XGBoost model to predict cluster membership, computes SHAP values for that model, and averages the results across multiple runs. The resulting scores measure how much each feature contributes to identifying the natural structure (clusters) in the data.
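A dependency-free sketch of the loop is below. It makes one deliberate substitution: XGBoost and tree SHAP are replaced with a logistic regression, because a linear model's SHAP values have the closed form w_j * (x_j - mean_j) when features are treated as independent. Drawing a fresh row subsample for each run, using two clusters, and all function names are assumptions, not sovai's documented behaviour.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_init=10, n_iter=50):
    """Lloyd's k-means with random restarts; returns the lowest-inertia labels."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Logistic regression by gradient descent (stand-in for XGBoost)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w


def clustered_shap_importance(X, n_runs=5, sample_frac=0.8, seed=0):
    """Hypothetical sketch: cluster, classify clusters, average |SHAP| over runs."""
    rng = np.random.default_rng(seed)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)          # standardize columns
    n, d = Xs.shape
    scores = np.zeros(d)
    for _ in range(n_runs):
        rows = rng.choice(n, size=int(sample_frac * n), replace=False)
        Xr = Xs[rows]                                  # row subsample for this run
        y = kmeans_labels(Xr, 2, rng)                  # two clusters -> binary target
        w = fit_logistic(Xr, y)
        # Linear-model SHAP (independent features): phi_ij = w_j * (x_ij - mean_j)
        phi = (Xr - Xr.mean(axis=0)) * w
        scores += np.abs(phi).mean(axis=0)
    return scores / scores.sum()


rng = np.random.default_rng(5)
X = rng.normal(size=(400, 3))
X[:200, 0] -= 5.0                  # two groups that differ only in feature 0
X[200:, 0] += 5.0
print(clustered_shap_importance(X).round(3))
```

Averaging |SHAP| over several resampled runs smooths out the instability of any single clustering. That stability is the reason for the ensemble step.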

Global Feature Importance

To calculate global feature importance across all methods:

df_mega.feature_importance()
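This page does not say how the per-method scores are combined. One simple scheme is sketched below and is an assumption: normalize each method's scores to sum to 1, then average them with equal weight per method. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def global_importance(method_scores):
    """Hypothetical sketch: equal-weight average of per-method normalized scores.

    method_scores maps a method name to per-feature scores, all in the same
    feature order.
    """
    normalized = [np.asarray(s, dtype=float) / np.sum(s) for s in method_scores.values()]
    combined = np.mean(normalized, axis=0)       # each method gets equal weight
    ranking = np.argsort(combined)[::-1]         # feature indices, most important first
    return combined, ranking


combined, ranking = global_importance({
    "method_a": [1.0, 1.0, 2.0],                 # toy scores, not real output
    "method_b": [2.0, 0.0, 2.0],
})
print(combined, ranking)
```

With these toy inputs, the combined scores are 0.375, 0.125 and 0.5, so the ranking is [2, 0, 1]. Averaging normalized scores lets a method with a very peaked distribution dominate. Averaging ranks instead is a common, more robust alternative.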

Feature Selection

Example of selecting top features based on importance scores:

feature_importance = df_mega.importance("sparse_projection")
# Keep the first 25 listed features (assumes the result is sorted, most important first)
df_select = df_mega[feature_importance["feature"].head(25)]
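The selection example above takes the first 25 rows of the `feature` column, which gives the top features only if the result is already ordered by importance. The pandas sketch below sorts explicitly before selecting. The score column name `importance` and the helper name are hypothetical placeholders, since this page only shows the `feature` column.

```python
import pandas as pd


def select_top_features(df, importance, n=25, score_col="importance"):
    """Hypothetical sketch: keep the n highest-scoring feature columns of df.

    `importance` needs a "feature" column plus a numeric score column;
    the default score column name is an assumption.
    """
    ranked = importance.sort_values(score_col, ascending=False)
    top = ranked["feature"].head(n).tolist()
    return df[top]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
scores = pd.DataFrame({"feature": ["a", "b", "c"], "importance": [0.1, 0.7, 0.2]})
print(select_top_features(df, scores, n=2).columns.tolist())
```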
Feature Importance Tutorial