Extract Features

The feature extractor module generates features that can be categorized into several types based on the nature of the calculations.

Last updated 6 months ago

Tutorials are the best documentation; see the Extract Features Tutorial.

Feature Extraction Module

This module extracts features from time series data, with a focus on financial and accounting metrics. It uses the sovai library for data retrieval and a custom feature_extractor function to generate a wide range of statistical and time series features.

Feature Categories

The feature_extractor generates features that fall into several categories:

  • Statistical Features

  • Entropy and Complexity Features

  • Frequency and Streak Features

  • Energy and Magnitude Features

  • Statistical and Distributional Features

  • Position Features

Usage Examples

import sovai as sov

# Authenticate and load data
sov.token_auth(token="your_token_here")
df_mega = sov.data("accounting/weekly").select_stocks("mega").date_range("2018-01-01")

1. Basic Usage with Default Parameters

# Extract features with default parameters
result = df_mega.extract_features(every="all")
print(result.head())

2. Weekly Rolling Features

# Extract features with a 12-week lookback, calculated weekly
result = df_mega.extract_features(lookback=12, every='week')
print(result.head())

3. Custom Feature List

# Extract specific features with custom parameters
custom_features = ["operating_working_capital", "cash_short_term"]
result = df_mega.extract_features(lookback=12, every='week', features=custom_features)
print(result.head())

4. Monthly Rolling Features

# Extract monthly rolling features with a 2-month lookback
result = df_mega.extract_features(lookback='2mo', every='month')
print(result.head())
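
The lookback and every arguments in the examples above describe a rolling window: how much history each feature sees, and how often a new value is produced. The sketch below illustrates that idea in plain Python. It is not the sovai implementation: rolling_feature is a hypothetical helper, and it assumes integer step counts rather than the string periods ('week', '2mo') that extract_features accepts.

```python
def rolling_feature(values, lookback, every, func):
    """Apply func to the trailing `lookback` observations, once every `every` steps.

    Hypothetical helper for illustration only; returns (index_of_last_observation, value) pairs.
    """
    out = []
    for end in range(lookback, len(values) + 1, every):
        window = values[end - lookback:end]  # the most recent `lookback` points
        out.append((end - 1, func(window)))
    return out


weekly_values = [10, 12, 11, 15, 14, 18]

# A 3-observation lookback, recomputed at every step, using max() as the feature
rolling_max = rolling_feature(weekly_values, lookback=3, every=1, func=max)
print(rolling_max)
```

A longer lookback smooths the feature but delays the first available value, since no output is produced until a full window of history exists.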

Advanced Usage

The underlying feature_extractor function offers more granular control over the feature extraction process. It can be used directly for more advanced use cases:

import polars as pl
from feature_extractor import feature_extractor

# df is assumed to be your input DataFrame with 'ticker' and 'date' columns
result = feature_extractor(df, entity_col='ticker', date_col='date',
                           lookback='1mo', every='week', verbose=True)
print(result.head())

Calling feature_extractor directly lets you specify the entity and date columns, adjust the lookback period, and enable verbose output for debugging.

Statistical Features

  • Mean and Variance Related:

    • mean_abs_change

    • variation_coefficient

    • mean_change

    • mean_second_derivative_central
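
To make these names concrete, here is a minimal pure-Python sketch of what the four features typically compute. It assumes the definitions follow the common tsfresh-style conventions (population standard deviation, central second difference); the formulas used inside feature_extractor may differ in detail.

```python
from statistics import mean, pstdev


def mean_abs_change(x):
    """Average absolute difference between consecutive values."""
    return mean(abs(b - a) for a, b in zip(x, x[1:]))


def mean_change(x):
    """Average signed difference between consecutive values (telescopes to the endpoints)."""
    return (x[-1] - x[0]) / (len(x) - 1)


def variation_coefficient(x):
    """Population standard deviation divided by the mean (assumes a non-zero mean)."""
    return pstdev(x) / mean(x)


def mean_second_derivative_central(x):
    """Average of the central second difference (x[i+1] - 2*x[i] + x[i-1]) / 2."""
    return mean((x[i + 1] - 2 * x[i] + x[i - 1]) / 2 for i in range(1, len(x) - 1))


series = [1.0, 3.0, 2.0, 6.0]
print(mean_abs_change(series), mean_change(series))
print(variation_coefficient(series), mean_second_derivative_central(series))
```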

Entropy and Complexity Features

  • Entropy:

    • binned_entropy

  • Complexity:

    • lempel_ziv_complexity
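
As an illustration of the entropy feature, the sketch below bins the values into equal-width buckets and computes the Shannon entropy of the bucket frequencies. This is an assumption about how binned_entropy is defined (it matches the usual tsfresh-style definition); the max_bins default of 10 is also assumed. lempel_ziv_complexity is not covered here.

```python
import math
from collections import Counter


def binned_entropy(x, max_bins=10):
    """Shannon entropy (natural log) of the histogram of x over equal-width bins."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0  # a constant series occupies a single bin
    width = (hi - lo) / max_bins
    # Map each value to a bin index; the maximum is clamped into the last bin
    counts = Counter(min(int((v - lo) / width), max_bins - 1) for v in x)
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


print(binned_entropy([0, 0, 1, 1], max_bins=2))  # two equally full bins
print(binned_entropy([5, 5, 5]))                 # constant series
```

Higher values indicate that observations are spread more evenly across the value range; a series concentrated in one bin scores zero.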

Frequency and Streak Features

  • Frequency:

    • number_crossings

    • number_peaks

  • Streak:

    • longest_streak_above_mean

    • longest_losing_streak

    • longest_winning_streak
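
The frequency and streak features can be sketched as simple counting routines. The versions below assume tsfresh-style definitions; the threshold m and the neighbourhood size n are illustrative parameters, not confirmed arguments of feature_extractor. The winning and losing streak features are omitted because the source does not define what counts as a win or a loss.

```python
def number_crossings(x, m=0.0):
    """Count how often consecutive values move from one side of m to the other."""
    above = [v > m for v in x]
    return sum(a != b for a, b in zip(above, above[1:]))


def number_peaks(x, n=1):
    """Count points strictly greater than their n neighbours on each side."""
    return sum(
        all(x[i] > x[i - k] and x[i] > x[i + k] for k in range(1, n + 1))
        for i in range(n, len(x) - n)
    )


def longest_streak_above_mean(x):
    """Length of the longest run of consecutive values above the series mean."""
    mu = sum(x) / len(x)
    longest = current = 0
    for v in x:
        current = current + 1 if v > mu else 0
        longest = max(longest, current)
    return longest


series = [1, -1, 2, 3, -2, 4]
print(number_crossings(series), number_peaks(series), longest_streak_above_mean(series))
```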

Energy and Magnitude Features

  • Energy:

    • absolute_energy

  • Magnitude:

    • absolute_maximum

    • absolute_sum_of_changes

    • max_abs_change
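
The energy and magnitude features reduce to one-line aggregations. The following sketch assumes the usual definitions (sum of squares for energy, absolute first differences for changes); treat it as an illustration rather than the library's exact code.

```python
def absolute_energy(x):
    """Sum of squared values."""
    return sum(v * v for v in x)


def absolute_maximum(x):
    """Largest absolute value in the series."""
    return max(abs(v) for v in x)


def absolute_sum_of_changes(x):
    """Sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def max_abs_change(x):
    """Largest absolute difference between consecutive values."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


series = [1, -4, 2]
print(absolute_energy(series), absolute_maximum(series))
print(absolute_sum_of_changes(series), max_abs_change(series))
```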

Statistical and Distributional Features

  • Statistical:

    • root_mean_square

    • ratio_beyond_r_sigma

  • Distributional:

    • benford_correlation

    • percent_reoccurring_points

    • percent_reoccurring_values
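
A short sketch of several of these features follows, again assuming tsfresh-style definitions: ratio_beyond_r_sigma uses the population standard deviation, percent_reoccurring_values is measured over distinct values, and percent_reoccurring_points is measured over all data points. The parameter r and its default are assumptions. benford_correlation is not covered.

```python
import math
from collections import Counter


def root_mean_square(x):
    """Square root of the mean of squared values."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def ratio_beyond_r_sigma(x, r=1.0):
    """Fraction of values more than r standard deviations away from the mean."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return sum(abs(v - mu) > r * sigma for v in x) / len(x)


def percent_reoccurring_values(x):
    """Share of distinct values that appear more than once."""
    counts = Counter(x)
    return sum(c > 1 for c in counts.values()) / len(counts)


def percent_reoccurring_points(x):
    """Share of data points whose value appears more than once."""
    counts = Counter(x)
    return sum(c for c in counts.values() if c > 1) / len(x)


print(root_mean_square([3, -3]))
print(ratio_beyond_r_sigma([0, 0, 0, 0, 10], r=1.0))
print(percent_reoccurring_values([1, 1, 2, 3, 3, 3]), percent_reoccurring_points([1, 1, 2, 3, 3, 3]))
```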

Position Features

  • Positions:

    • first_location_of_maximum

    • first_location_of_minimum

    • last_location_of_maximum

    • last_location_of_minimum
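
Position features are usually expressed relative to the series length, so that windows of different sizes are comparable. The sketch below assumes that convention (as in tsfresh), where first locations are measured from the start and last locations from the end; the scaling used by feature_extractor is not confirmed by this page.

```python
def first_location_of_maximum(x):
    """Relative position (0 to 1) of the first occurrence of the maximum."""
    return x.index(max(x)) / len(x)


def last_location_of_maximum(x):
    """Relative position (0 to 1) of the last occurrence of the maximum."""
    return 1 - x[::-1].index(max(x)) / len(x)


def first_location_of_minimum(x):
    """Relative position (0 to 1) of the first occurrence of the minimum."""
    return x.index(min(x)) / len(x)


def last_location_of_minimum(x):
    """Relative position (0 to 1) of the last occurrence of the minimum."""
    return 1 - x[::-1].index(min(x)) / len(x)


series = [1, 5, 2, 5]
print(first_location_of_maximum(series), last_location_of_maximum(series))
print(first_location_of_minimum(series), last_location_of_minimum(series))
```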

These categories help organize the wide range of features generated, which capture different aspects of the time series data, making them useful for various analytical and predictive tasks.

Extract Features Tutorial