Dimensionality Reduction

Implements multiple reduction techniques including PCA, SVD, Factor Analysis, Gaussian Random Projection, and UMAP.

Tutorials are the best documentation: see the Dimensionality Reduction Tutorial.

Reduction Techniques

The module supports the following dimensionality reduction methods:

  • PCA (Principal Component Analysis)

  • Truncated SVD (Singular Value Decomposition)

  • Factor Analysis

  • Gaussian Random Projection

  • UMAP (Uniform Manifold Approximation and Projection)

Usage Examples

Authenticate and load data

import sovai as sov
sov.token_auth(token="your_token_here")  # authenticate with your API token

# Weekly accounting data for mega-cap stocks, from 2018-01-01 onward
df_mega = sov.data("accounting/weekly").select_stocks("mega").date_range("2018-01-01")

1. Basic Usage with PCA

# Reduce dimensions using PCA
result = df_mega.reduce_dimensions(method="pca", n_components=10)
print(result.head())

2. Using Gaussian Random Projection

# Reduce dimensions using Gaussian Random Projection
result = df_mega.reduce_dimensions(method="gaussian_random_projection", n_components=10)
print(result.head())

3. UMAP with Verbose Output

# Reduce dimensions using UMAP with verbose output
result = df_mega.reduce_dimensions(method="umap", verbose=True, n_components=10)
print(result.head())

4. Factor Analysis

# Reduce dimensions using Factor Analysis with verbose output
result = df_mega.reduce_dimensions(method="factor_analysis", verbose=True, n_components=10)
print(result.head())
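
Since all four methods share the same calling convention, a quick comparison loop can help you choose between them. The sketch below reuses the df_mega panel loaded above and only the method names shown in these examples; output shapes and timings will vary with your data.

import time

# Run each method on the same panel and report output shape and runtime
for method in ["pca", "factor_analysis", "gaussian_random_projection", "umap"]:
    start = time.time()
    reduced = df_mega.reduce_dimensions(method=method, n_components=10)
    print(f"{method}: shape={reduced.shape}, {time.time() - start:.1f}s")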

Advanced Usage

The underlying dimensionality_reduction function offers more control over the reduction process:

from dimensionality_reduction import dimensionality_reduction

# Assuming df is your input DataFrame
result = dimensionality_reduction(df, method='pca', explained_variance=0.95, verbose=True)
print(result.head())

This form lets you target a proportion of explained variance (here 95%) instead of a fixed component count when n_components is not provided.
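
To see how many components were kept to reach that 95% target, a minimal check is to inspect the width of the returned frame:

# Number of components retained to meet the explained-variance target
print(result.shape[1])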

Performance Considerations

  • The dimensionality reduction process can be computationally intensive, especially for large datasets or when using methods like UMAP.

  • PCA and Truncated SVD are generally faster than UMAP for large datasets.

  • Consider using a smaller number of components or a subset of your data if performance is a concern (see the sketch below).
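
As a sketch of the last point, you can shrink the input before reducing it, for example by loading a shorter history and requesting fewer components (the start date below is an arbitrary later cutoff, following the loading example above):

# Smaller slice: shorter history and fewer components keep runtimes manageable
df_small = sov.data("accounting/weekly").select_stocks("mega").date_range("2022-01-01")
result_small = df_small.reduce_dimensions(method="umap", n_components=5)
print(result_small.shape)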
