Pandas Extensions

API reference for sovai.extensions.pandas_extensions

Module: sovai.extensions.pandas_extensions

Classes

CustomDataFrame

class CustomDataFrame(pd.DataFrame)

Attributes

  • attrs

Methods

filter()

def filter(
    self,
    conditions: Union[str, List[str]],
    verbose: bool = False,
) -> CustomDataFrame

Filter the DataFrame based on given conditions.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| conditions | Union[str, List[str]] | A string or list of strings describing the filtering conditions. |
| verbose | bool | If True, print detailed information about the filtering process. |

Returns

CustomDataFrame: A filtered CustomDataFrame.
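This reference does not describe the condition grammar that filter() accepts. As a rough illustration of filtering with condition strings, the sketch below uses plain pandas `DataFrame.query`. The helper name `apply_conditions` and the sample columns are hypothetical and are not part of sovai.

```python
from typing import List, Union

import pandas as pd


def apply_conditions(
    df: pd.DataFrame,
    conditions: Union[str, List[str]],
    verbose: bool = False,
) -> pd.DataFrame:
    """Apply one or more query strings in sequence (hypothetical helper, not sovai's code)."""
    if isinstance(conditions, str):
        conditions = [conditions]
    out = df
    for cond in conditions:
        before = len(out)
        out = out.query(cond)  # pandas evaluates the expression against column names
        if verbose:
            print(f"{cond!r}: {before} -> {len(out)} rows")
    return out


# Illustrative data; the column names are assumptions.
prices = pd.DataFrame(
    {
        "ticker": ["AAPL", "MSFT", "TSLA"],
        "close": [190.0, 410.0, 250.0],
        "volume": [50, 20, 90],
    }
)
filtered = apply_conditions(prices, ["close > 200", "volume > 30"])
```

Applying the conditions one at a time makes it easy to report how many rows each condition removes, which is the kind of detail a verbose flag would typically print.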


merge_data()

Merge the current DataFrame with the combined DataFrame based on ticker and a specified column.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| column | str | The column from the combined DataFrame to merge. |

Returns

A new CustomDataFrame with the merged data.
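A merge on ticker that pulls in a single column can be sketched with plain pandas. Here `signals` stands in for the current DataFrame and `combined` for the combined DataFrame; both frames, their columns, and the choice of a left join are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the current and the combined DataFrame.
signals = pd.DataFrame({"ticker": ["AAPL", "MSFT"], "score": [0.7, 0.2]})
combined = pd.DataFrame(
    {
        "ticker": ["AAPL", "MSFT", "TSLA"],
        "sector": ["Tech", "Tech", "Auto"],
        "market_cap": [3.0e12, 3.1e12, 0.8e12],
    }
)

column = "sector"
# Keep every row of `signals` and attach only the requested column.
merged = signals.merge(combined[["ticker", column]], on="ticker", how="left")
```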


cointegration()

Calculate an approximate cointegration proxy using shifted cosine similarity.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| df | | Pandas DataFrame with MultiIndex. |
| level | | The level of the MultiIndex to group by (default 'ticker'). |
| shift | | The number of periods to shift for lagged comparison (default 1 month). |

Returns

DataFrame of shifted cosine similarities.
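One way to read "shifted cosine similarity" is the cosine of the angle between each series and a lagged copy of every other series. The sketch below follows that reading on a small panel. The function name, the sample data, and the choice to measure `shift` in rows rather than calendar months are assumptions, and this is not necessarily how sovai computes the proxy.

```python
import numpy as np
import pandas as pd


def shifted_cosine_similarity(wide: pd.DataFrame, shift: int = 1) -> pd.DataFrame:
    """Cosine similarity between each column (rows) and each lagged column (columns)."""
    current = wide.iloc[shift:].to_numpy(dtype=float)
    # Drop the leading rows that have no lagged counterpart.
    lagged = wide.shift(shift).iloc[shift:].to_numpy(dtype=float)
    current_unit = current / np.linalg.norm(current, axis=0)
    lagged_unit = lagged / np.linalg.norm(lagged, axis=0)
    sim = current_unit.T @ lagged_unit
    return pd.DataFrame(sim, index=wide.columns, columns=wide.columns)


# Illustrative panel indexed by (ticker, date).
dates = pd.date_range("2024-01-01", periods=6, freq="D")
index = pd.MultiIndex.from_product([["AAA", "BBB"], dates], names=["ticker", "date"])
panel = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    index=index,
    name="price",
)

wide = panel.unstack(level="ticker")  # one column per ticker
sim = shifted_cosine_similarity(wide, shift=1)
```

Because cosine similarity ignores scale, the two proportional series in this example produce identical entries in `sim`.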


normalize_min_max()

Apply Min-Max normalization.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| matrix | | |
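Min-Max normalization rescales values to the range [0, 1] using x' = (x - min) / (max - min). The sketch below applies it column by column with NumPy. Whether sovai normalizes per column or over the whole matrix is not stated here, so the per-column choice, the function name, and the handling of constant columns are assumptions.

```python
import numpy as np


def min_max_normalize(matrix: np.ndarray) -> np.ndarray:
    """Rescale each column to [0, 1] (illustrative helper, not sovai's code)."""
    matrix = np.asarray(matrix, dtype=float)
    lo = matrix.min(axis=0)
    span = matrix.max(axis=0) - lo
    # Constant columns would divide by zero; map them to 0 instead.
    span = np.where(span == 0, 1.0, span)
    return (matrix - lo) / span


scaled = min_max_normalize(np.array([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]))
```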


select_features()

Selects features based on importance scores from various methods.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| method | | The method to use for calculating feature importance ('random_projection', 'fourier', 'ica', 'svd', 'sparse_projection'). |
| n_components | | Number of components to keep. If specified, this takes precedence over variability. |
| variability | | The explained variance threshold (default 0.90). |

Returns

CustomDataFrame with selected features.
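The interaction between n_components and variability can be sketched as a small helper: an explicit component count wins, otherwise the count is the smallest number of components whose cumulative explained variance reaches the threshold. The helper name and the use of singular values as the variance source are assumptions for illustration only.

```python
from typing import Optional

import numpy as np


def choose_n_components(
    singular_values: np.ndarray,
    n_components: Optional[int] = None,
    variability: float = 0.90,
) -> int:
    """Pick a component count (hypothetical helper, not sovai's code)."""
    total = len(singular_values)
    if n_components is not None:
        # An explicit count takes precedence over the variance threshold.
        return min(int(n_components), total)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First position where the cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, variability)) + 1
    return min(k, total)


s = np.array([3.0, 1.0, 0.5])
k_default = choose_n_components(s)                 # threshold of 0.90
k_explicit = choose_n_components(s, n_components=1)
```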


ticker()

Orthogonalizes the features of the DataFrame using the Gram-Schmidt process.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| ticker | | Default: 'AAPL' |

Returns

CustomDataFrame with orthogonalized features.


date()

Selects data for a specific date or date range from the DataFrame.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| date_inputs | str, tuple of str, or multiple str | The date(s), in any format. |

Returns

CustomDataFrame with selected data.
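Selecting a single date or a date range from a panel indexed by (ticker, date) can be sketched with plain pandas. The helper `select_dates`, the index level name `date`, and the sample data are assumptions; `pd.Timestamp` is used here only to show how loosely formatted date strings can be parsed.

```python
from typing import Optional

import pandas as pd


def select_dates(df: pd.DataFrame, start: str, end: Optional[str] = None) -> pd.DataFrame:
    """Keep rows whose 'date' level lies in [start, end] (hypothetical helper)."""
    dates = df.index.get_level_values("date")
    start_ts = pd.Timestamp(start)
    # With no end date, select the single day given by `start`.
    end_ts = pd.Timestamp(end) if end is not None else start_ts
    mask = (dates >= start_ts) & (dates <= end_ts)
    return df.loc[mask]


# Illustrative panel: two tickers, four consecutive days.
days = pd.date_range("2024-01-01", periods=4, freq="D")
idx = pd.MultiIndex.from_product([["AAPL", "MSFT"], days], names=["ticker", "date"])
panel_df = pd.DataFrame({"close": range(8)}, index=idx)

one_day = select_dates(panel_df, "2024-01-02")
two_days = select_dates(panel_df, "2024-01-02", "2024/01/03")
```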


select_stocks()

Select stocks based on market capitalization category.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| market_cap | str | Market capitalization category (e.g., "mega", "large", "mid", "small"). |

Returns

CustomDataFrame: Filtered dataframe containing only stocks of the specified market cap


date_range()

Selects data for a specific date range from the DataFrame.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| date_inputs | str or multiple str | The date(s), in any format. |

Returns

CustomDataFrame with selected data.


extract_features()

Extracts features from the CustomDataFrame and returns a new CustomDataFrame with the extracted features.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| entity_col | | Default: 'ticker' |
| date_col | | Default: 'date' |
| lookback | | Default: None |
| features | | Default: None |
| every | | Default: 'all' |
| verbose | | Default: False |


reduce_dimensions()

Perform dimensionality reduction on the CustomDataFrame.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| method | str | Dimensionality reduction method. Options: 'pca', 'truncated_svd', 'factor_analysis', 'gaussian_random_projection', 'umap'. |
| explained_variance | float | Amount of variance to be explained (0 to 1). |
| verbose | bool | If True, print additional information. |

Returns

CustomDataFrame: Reduced data in panel format
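To make the explained_variance parameter concrete, the sketch below performs a PCA-style reduction with NumPy's SVD and keeps just enough components to reach the requested share of variance, returning the result on the original (ticker, date) index. The function name, the sample panel, and the output column names are assumptions, and the other listed methods are not covered.

```python
import numpy as np
import pandas as pd


def pca_reduce(df: pd.DataFrame, explained_variance: float = 0.95) -> pd.DataFrame:
    """Project centered data onto its leading principal components (illustrative only)."""
    centered = (df - df.mean()).to_numpy(dtype=float)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    ratio = s**2 / np.sum(s**2)  # share of variance per component
    k = int(np.searchsorted(np.cumsum(ratio), explained_variance)) + 1
    k = min(k, len(s))
    scores = centered @ vt[:k].T
    cols = [f"component_{i + 1}" for i in range(k)]
    # Reuse the input index so the output stays in panel format.
    return pd.DataFrame(scores, index=df.index, columns=cols)


# Illustrative panel: f2 is a multiple of f1 and f3 is tiny, so one component dominates.
days = pd.date_range("2024-01-01", periods=4, freq="D")
idx = pd.MultiIndex.from_product([["AAA", "BBB"], days], names=["ticker", "date"])
base = np.arange(8.0)
features = pd.DataFrame(
    {"f1": base, "f2": 2.0 * base, "f3": [0.01, -0.01] * 4},
    index=idx,
)

reduced = pca_reduce(features, explained_variance=0.95)
```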


weight_optimization()

Perform dimensionality reduction on the CustomDataFrame.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| method | str | Dimensionality reduction method. Options: 'pca', 'truncated_svd', 'factor_analysis', 'gaussian_random_projection', 'umap'. |
| explained_variance | float | Amount of variance to be explained (0 to 1). |
| verbose | bool | If True, print additional information. |

Returns

CustomDataFrame: Reduced data in panel format


signal_evaluator()

Perform weight optimization on the input multi-index DataFrame.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| verbose | | Default: False |

Returns

SignalEvaluator: A SignalEvaluator object with optimized weights


feature_importance()

Computes feature importance using SHAP values based on multiple simulations.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| num_simulations | | The number of simulations to run (default 4). |
| clustering_method | | The clustering method to use ('OPTICS' or 'KMeans'). |

Returns

A DataFrame with average SHAP values per feature.


