Extract Features
The feature extractor module generates features that can be categorized into several types based on the nature of the calculations.
Tutorials are the best documentation: see the Extract Features Tutorial.
Feature Extraction Module
This module extracts features from time series data, with a focus on financial and accounting metrics. It uses the sovai library for data retrieval and a custom feature_extractor function to generate a wide range of statistical and time series features.
Feature Categories
The feature_extractor generates features that fall into several categories:
Statistical Features
Entropy and Complexity Features
Frequency and Streak Features
Energy and Magnitude Features
Distributional Features
Position Features
Usage Examples
import sovai as sov
# Authenticate and load data
sov.token_auth(token="your_token_here")
df_mega = sov.data("accounting/weekly").select_stocks("mega").date_range("2018-01-01")
1. Basic Usage with Default Parameters

2. Weekly Rolling Features
3. Custom Feature List
4. Monthly Rolling Features
Advanced Usage
The underlying feature_extractor function offers more granular control over the feature extraction process. It can be used directly for more advanced use cases:
This advanced usage allows for more customization, including specifying entity and date columns, adjusting lookback periods, and enabling verbose output for debugging.
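The idea of a lookback period applied per entity can be illustrated with a small, library-independent sketch. This is not the sovai implementation or the feature_extractor signature; the helper rolling_feature, the tuple layout (entity, date, value), and the tickers are hypothetical names chosen for illustration only.

```python
from collections import defaultdict
from statistics import fmean


def mean_abs_change(window):
    """Average absolute difference between consecutive observations."""
    return fmean(abs(b - a) for a, b in zip(window, window[1:]))


def rolling_feature(rows, feature, lookback):
    """Apply `feature` to the last `lookback` values of each entity.

    `rows` is an iterable of (entity, date, value) tuples (hypothetical layout).
    Dates are ISO strings, so sorting the tuples also sorts chronologically.
    Windows shorter than `lookback` are skipped.
    """
    history = defaultdict(list)
    out = {}
    for entity, date, value in sorted(rows):
        history[entity].append(value)
        window = history[entity][-lookback:]
        if len(window) == lookback:
            out[(entity, date)] = feature(window)
    return out


# Hypothetical weekly observations for two entities.
rows = [
    ("AAA", "2018-01-05", 1.0),
    ("AAA", "2018-01-12", 3.0),
    ("AAA", "2018-01-19", 2.0),
    ("AAA", "2018-01-26", 6.0),
    ("BBB", "2018-01-05", 4.0),
    ("BBB", "2018-01-12", 5.0),
]

features = rolling_feature(rows, mean_abs_change, lookback=3)
```

Only entity AAA has enough history for a three-period window here, so BBB produces no rows. A longer lookback smooths the feature but delays the first available value.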
Statistical Features
Mean and Variance Related:
mean_abs_change
variation_coefficient
mean_change
mean_second_derivative_central
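The names in this group match common time series feature calculators (for example those in tsfresh). The sketch below assumes those conventional definitions; the exact formulas used by feature_extractor are not stated here, so treat this as an illustration rather than the module's implementation.

```python
from statistics import fmean, pstdev


def mean_abs_change(x):
    """Mean of |x[i+1] - x[i]|."""
    return fmean(abs(b - a) for a, b in zip(x, x[1:]))


def mean_change(x):
    """Mean of x[i+1] - x[i], which telescopes to (last - first) / (n - 1)."""
    return (x[-1] - x[0]) / (len(x) - 1)


def variation_coefficient(x):
    """Population standard deviation divided by the mean (assumed definition)."""
    m = fmean(x)
    return pstdev(x) / m if m != 0 else float("nan")


def mean_second_derivative_central(x):
    """Mean of the central second difference (x[i+2] - 2*x[i+1] + x[i]) / 2."""
    return fmean((c - 2 * b + a) / 2 for a, b, c in zip(x, x[1:], x[2:]))


series = [1.0, 3.0, 2.0, 6.0]
print(mean_second_derivative_central(series))  # 0.5
```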
Entropy and Complexity Features
Entropy:
binned_entropy
Complexity:
lempel_ziv_complexity
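As a rough illustration of what these two features measure, the sketch below assumes the usual definitions: binned entropy is the Shannon entropy of an equal-width histogram, and Lempel-Ziv complexity counts the distinct phrases found when parsing a discretised sequence from left to right. The bin count and the parsing variant are assumptions, not confirmed details of feature_extractor.

```python
import math
from collections import Counter


def binned_entropy(x, max_bins=10):
    """Shannon entropy (natural log) of an equal-width histogram of x."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0  # a constant series falls into a single bin
    width = (hi - lo) / max_bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), max_bins - 1) for v in x)
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def lempel_ziv_phrases(symbols):
    """Number of distinct phrases in a left-to-right parse of `symbols`."""
    seen = set()
    start, length = 0, 1
    while start + length <= len(symbols):
        phrase = tuple(symbols[start:start + length])
        if phrase in seen:
            length += 1          # extend the phrase until it is new
        else:
            seen.add(phrase)
            start += length      # begin the next phrase
            length = 1
    return len(seen)


def lempel_ziv_complexity(symbols):
    """Phrase count normalised by sequence length (assumed normalisation)."""
    return lempel_ziv_phrases(symbols) / len(symbols)
```

A repetitive sequence such as [0, 0, 0, 0] parses into fewer phrases than an alternating one such as [0, 1, 0, 1], so it scores lower.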
Frequency and Streak Features
Frequency:
number_crossings
number_peaks
Streak:
longest_streak_above_mean
longest_losing_streak
longest_winning_streak
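A minimal sketch of the streak and crossing features follows. It assumes that a winning streak means consecutive increases, a losing streak means consecutive decreases, and a crossing is a change of side relative to a given level; these interpretations are inferred from the feature names and may differ from the module's definitions.

```python
from itertools import groupby
from statistics import fmean


def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    return max((sum(1 for _ in group) for key, group in groupby(flags) if key),
               default=0)


def longest_streak_above_mean(x):
    m = fmean(x)
    return longest_run(v > m for v in x)


def longest_winning_streak(x):
    """Longest run of consecutive increases (assumed meaning of 'winning')."""
    return longest_run(b > a for a, b in zip(x, x[1:]))


def longest_losing_streak(x):
    """Longest run of consecutive decreases (assumed meaning of 'losing')."""
    return longest_run(b < a for a, b in zip(x, x[1:]))


def number_crossings(x, level=0.0):
    """Number of times consecutive values fall on different sides of `level`."""
    above = [v > level for v in x]
    return sum(a != b for a, b in zip(above, above[1:]))


series = [1, 2, 3, 2, 1, 0, 4, 5]
print(longest_winning_streak(series), longest_losing_streak(series))  # 2 3
```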
Energy and Magnitude Features
Energy:
absolute_energy
Magnitude:
absolute_maximum
absolute_sum_of_changes
max_abs_change
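These features reduce to short sums and maxima. The sketch below assumes the conventional definitions suggested by the names (sum of squares for energy, absolute first differences for the change features); it is illustrative and not taken from the module's source.

```python
def absolute_energy(x):
    """Sum of squared values."""
    return sum(v * v for v in x)


def absolute_maximum(x):
    """Largest absolute value in the series."""
    return max(abs(v) for v in x)


def absolute_sum_of_changes(x):
    """Sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(x, x[1:]))


def max_abs_change(x):
    """Largest absolute difference between consecutive values."""
    return max(abs(b - a) for a, b in zip(x, x[1:]))


series = [1, -4, 2]
print(absolute_energy(series))          # 21
print(absolute_sum_of_changes(series))  # 11
```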
Statistical and Distributional Features
Statistical:
root_mean_square
ratio_beyond_r_sigma
Distributional:
benford_correlation
percent_reoccurring_points
percent_reoccurring_values
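The following sketch shows one plausible reading of four of these features. It assumes that percent_reoccurring_values is the share of distinct values that appear more than once, that percent_reoccurring_points is the share of observations whose value appears more than once, and that ratio_beyond_r_sigma uses the population standard deviation. These definitions are assumptions based on similarly named calculators elsewhere, and benford_correlation is omitted.

```python
import math
from collections import Counter
from statistics import fmean, pstdev


def root_mean_square(x):
    """Square root of the mean of squared values."""
    return math.sqrt(fmean(v * v for v in x))


def ratio_beyond_r_sigma(x, r):
    """Fraction of values further than r standard deviations from the mean."""
    m, s = fmean(x), pstdev(x)
    return sum(abs(v - m) > r * s for v in x) / len(x)


def percent_reoccurring_values(x):
    """Distinct values occurring more than once, over all distinct values."""
    counts = Counter(x)
    return sum(c > 1 for c in counts.values()) / len(counts)


def percent_reoccurring_points(x):
    """Observations whose value occurs more than once, over all observations."""
    counts = Counter(x)
    return sum(c for c in counts.values() if c > 1) / len(x)


series = [1, 1, 2, 3, 3, 3, 10]
print(percent_reoccurring_values(series))  # 0.5
```

In this series two of the four distinct values repeat, while five of the seven observations carry a repeated value, which is why the two percentages differ.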
Position Features
Positions:
first_location_of_maximum
first_location_of_minimum
last_location_of_maximum
last_location_of_minimum
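Position features are often reported as a relative location between 0 and 1 rather than a raw index. The sketch below assumes that convention (index divided by series length, with "last" locations measured from the end); whether feature_extractor normalises the same way is not stated here.

```python
def first_location_of_maximum(x):
    """Relative position of the first occurrence of the maximum."""
    return x.index(max(x)) / len(x)


def first_location_of_minimum(x):
    """Relative position of the first occurrence of the minimum."""
    return x.index(min(x)) / len(x)


def last_location_of_maximum(x):
    """Relative position of the last occurrence of the maximum."""
    return 1 - x[::-1].index(max(x)) / len(x)


def last_location_of_minimum(x):
    """Relative position of the last occurrence of the minimum."""
    return 1 - x[::-1].index(min(x)) / len(x)


series = [1, 5, 2, 5, 0]
print(first_location_of_maximum(series))  # 0.2
```

With this normalisation, a maximum that occurs late in the window gives a value near 1, which makes windows of different lengths comparable.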
These categories help organize the wide range of features generated, which capture different aspects of the time series data, making them useful for various analytical and predictive tasks.