Pairwise Distance
Pairwise statistics for distance and similarity between stocks in cross-section, time-series, and panel orientations.
Tutorials are the best documentation: Pairwise Distance Tutorial
Pairwise Distance Statistics Module
dataframe.distance()
Tutorial for Context.
Features
Cross-sectional distance calculations
Time-series distance calculations
Panel data distance calculations (Tucker decomposition)
Multiple distance metrics and statistical tests
Usage
The module is integrated into a custom DataFrame class, allowing for easy calculation of pairwise distances.
Distance Calculation Methods
1. Cross-Sectional Distance
Calculates distances between stocks based on their attributes at each time point.
Parameters:
orient: Set to "cross-sectional"
distance: Distance metric (e.g., 'cosine', 'euclidean')
calculations: List of features to include in the distance calculation
Available Calculations:
mean: Average value
skew: Skewness
std: Standard deviation
diffm: First difference mean
zcr: Zero crossing rate
mac: Mean absolute change
sc: Spectral centroid
tp: Turning points
acl1: Autocorrelation at lag 1
hjorthm: Hjorth mobility
hurst: Hurst exponent
hist: Histogram mode (5 bins)
timerev: Time reversibility statistic
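As a rough illustration of the idea, the sketch below computes a handful of the listed calculations for each stock and then measures distances between the resulting feature vectors. It uses pandas and SciPy directly rather than the module's own API. The helper names `summary_features` and `feature_distance`, the standardization step, and the exact formulas for `zcr` and `mac` are assumptions made for this example, not the module's implementation.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def summary_features(series: pd.Series) -> pd.Series:
    """Compute a few of the listed calculations for one stock (hypothetical helper)."""
    x = series.to_numpy(dtype=float)
    diffs = np.diff(x)
    return pd.Series({
        "mean": x.mean(),
        "std": x.std(ddof=1),
        "skew": series.skew(),
        "diffm": diffs.mean(),        # first difference mean
        "mac": np.abs(diffs).mean(),  # mean absolute change
        # zero crossing rate: share of consecutive pairs whose sign differs
        # (assumes the series is already centered around zero, e.g. returns)
        "zcr": np.mean(np.signbit(x[:-1]) != np.signbit(x[1:])),
    })


def feature_distance(wide: pd.DataFrame, metric: str = "cosine") -> pd.DataFrame:
    """wide: rows are dates, columns are tickers. Returns ticker-by-ticker distances."""
    feats = wide.apply(summary_features).T              # one feature row per ticker
    feats = (feats - feats.mean()) / feats.std(ddof=0)  # put features on one scale
    dist = squareform(pdist(feats.to_numpy(), metric=metric))
    return pd.DataFrame(dist, index=feats.index, columns=feats.index)


# Synthetic daily returns for four made-up tickers
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=60, freq="D")
returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(60, 4)),
    index=dates,
    columns=["AAA", "BBB", "CCC", "DDD"],
)
dist = feature_distance(returns, metric="cosine")
print(dist.round(3))
```

Standardizing the features before measuring distance keeps a large-scale feature from dominating the metric; whether the module does this internally is not stated in this page.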
2. Time-Series Distance
Computes distances between stocks based on their time-series behavior.
Parameters:
orient: Set to "time-series"
metric: Distance metric to use
Available Metrics:
pearson: Pearson correlation
spearman: Spearman correlation
dtw: Dynamic Time Warping
euclidean: Euclidean distance
euclidean_int: Euclidean distance with interpolation
pec: Power Envelope Correlation
frechet: Fréchet distance
kl_divergence: Kullback-Leibler divergence
wasserstein: Wasserstein distance
jaccard: Jaccard distance
bray_curtis: Bray-Curtis dissimilarity
hausdorff: Hausdorff distance
manhattan: Manhattan distance
chi2: Chi-squared distance
hellinger: Hellinger distance
canberra: Canberra distance
shannon_entropy: Shannon entropy-based distance
sample_entropy: Sample entropy
approx_entropy: Approximate entropy
jensen_shannon: Jensen-Shannon divergence
renyi_entropy: Rényi entropy
tsallis_entropy: Tsallis entropy
mutual_information: Mutual information-based distance
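To make two of these metrics concrete, the following sketch computes a Pearson-based distance and a textbook Dynamic Time Warping distance between every pair of series. It is written against plain pandas and NumPy. The function names, the `1 - correlation` convention, and the absolute-difference step cost in DTW are assumptions for illustration and may differ from what the module does.

```python
import numpy as np
import pandas as pd


def pearson_distance(wide: pd.DataFrame) -> pd.DataFrame:
    """1 - Pearson correlation between every pair of columns (tickers)."""
    return 1.0 - wide.corr(method="pearson")


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping, absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def pairwise(wide: pd.DataFrame, func) -> pd.DataFrame:
    """Apply a symmetric two-series distance to every pair of columns."""
    cols = list(wide.columns)
    out = pd.DataFrame(0.0, index=cols, columns=cols)
    for i, ci in enumerate(cols):
        for cj in cols[i + 1:]:
            d = func(wide[ci].to_numpy(), wide[cj].to_numpy())
            out.loc[ci, cj] = out.loc[cj, ci] = d
    return out


# Synthetic random-walk "prices" for three made-up tickers
rng = np.random.default_rng(1)
prices = pd.DataFrame(
    rng.normal(size=(40, 3)).cumsum(axis=0),
    columns=["AAA", "BBB", "CCC"],
)
corr_dist = pearson_distance(prices)
dtw_dist = pairwise(prices, dtw_distance)
print(corr_dist.round(3))
print(dtw_dist.round(3))
```

DTW can align series that move similarly but with a lag, which a point-by-point Euclidean or correlation measure would penalize; the price is a cost that is quadratic in the series length for each pair.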
3. Panel Data Distance
Utilizes Tucker decomposition to calculate distances considering both cross-sectional and time-series aspects.
Parameters:
orient: Set to "panel"
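The sketch below shows one way a Tucker decomposition can feed a stock-to-stock distance. It uses a truncated higher-order SVD (HOSVD), a standard non-iterative way to obtain a Tucker factorization, implemented with NumPy only. The tensor layout (stocks x dates x features), the chosen ranks, and the use of the stock-mode coordinates as an embedding are assumptions for this example; the module's actual algorithm and rank estimation are not described in this page.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: move `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd(tensor: np.ndarray, ranks):
    """Truncated higher-order SVD: returns (core, factors) of a Tucker model."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors of each unfolding
    core = tensor
    for mode, factor in enumerate(factors):
        # project each mode onto its factor basis
        core = np.moveaxis(np.tensordot(factor.T, core, axes=(1, mode)), 0, mode)
    return core, factors


def reconstruct(core: np.ndarray, factors) -> np.ndarray:
    """Multiply the core by every factor matrix to rebuild the tensor."""
    out = core
    for mode, factor in enumerate(factors):
        out = np.moveaxis(np.tensordot(factor, out, axes=(1, mode)), 0, mode)
    return out


# Synthetic panel: 5 stocks x 30 dates x 3 features
rng = np.random.default_rng(2)
panel = rng.normal(size=(5, 30, 3))

core, factors = hosvd(panel, ranks=(3, 5, 2))
# Coordinates of each stock in the compressed date/feature space
embedding = factors[0] @ unfold(core, 0)
dist = squareform(pdist(embedding, metric="euclidean"))
print(np.round(dist, 3))
```

Because the date and feature factors are orthonormal, Euclidean distances between rows of `embedding` equal the distances between the stocks' slices of the low-rank reconstruction.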
Notes
The module handles missing values by imputing them with the median.
Some distance calculations may be computationally intensive for large datasets, since the number of pairs grows quadratically with the number of stocks (or dates) being compared.
The Tucker decomposition for panel data provides an estimated rank of the decomposition.
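For reference, median imputation of the kind mentioned in the first note can be sketched in pandas as follows. Filling each column with its own median is an assumption here; the page does not say whether the module imputes per column, per date, or globally.

```python
import numpy as np
import pandas as pd

# Small made-up frame with gaps
frame = pd.DataFrame({
    "AAA": [1.0, np.nan, 3.0, 5.0],
    "BBB": [np.nan, 2.0, 4.0, 6.0],
})

# DataFrame.median() skips NaN by default; fillna aligns the medians by column name
imputed = frame.fillna(frame.median())
print(imputed)
```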
Example
Date instead of Ticker
While previous examples focused on calculating distances between stocks, we can also compute distances between dates using the same methods.
This allows for analyzing how market conditions change over time.
Converting to Date
All previous distance calculation functions can be modified to work with dates by specifying on="date". Here are the key functions adapted for date-based analysis:
These functions calculate distances between different dates based on the market conditions or stock behaviors on those dates.
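Conceptually, switching from tickers to dates means changing which axis of the data is treated as the set of entities being compared. The sketch below illustrates that with a hypothetical helper, `distance_between`, built on SciPy. Its name and signature are invented for this example and only mirror the `on` switch described above; it is not the module's function.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def distance_between(wide: pd.DataFrame, on: str = "ticker",
                     metric: str = "euclidean") -> pd.DataFrame:
    """Hypothetical helper. wide: rows are dates, columns are tickers.

    on="ticker" compares columns (stocks); on="date" compares rows (dates).
    """
    if on not in {"ticker", "date"}:
        raise ValueError("on must be 'ticker' or 'date'")
    # pdist compares rows, so transpose when the entities are tickers
    matrix = wide if on == "date" else wide.T
    dist = squareform(pdist(matrix.to_numpy(), metric=metric))
    return pd.DataFrame(dist, index=matrix.index, columns=matrix.index)


# Synthetic returns: 6 dates x 3 made-up tickers
rng = np.random.default_rng(3)
dates = pd.date_range("2024-01-01", periods=6, freq="D")
wide = pd.DataFrame(rng.normal(size=(6, 3)), index=dates, columns=["AAA", "BBB", "CCC"])

by_ticker = distance_between(wide, on="ticker")  # 3 x 3
by_date = distance_between(wide, on="date")      # 6 x 6
print(by_date.round(2))
```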
Sorting and Analyzing Date Distances
To analyze the distances for a specific date:
This code:
Selects the most recent date
Sorts the distances for that date
Displays the results as a transposed row
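The three steps listed above can be sketched as follows, assuming the date-by-date distances are held in a square pandas DataFrame indexed by date. The matrix here is built from synthetic data with SciPy, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

# Synthetic date-by-date distance matrix (8 dates, 5 made-up tickers)
rng = np.random.default_rng(4)
dates = pd.date_range("2024-01-01", periods=8, freq="D")
wide = pd.DataFrame(rng.normal(size=(8, 5)), index=dates, columns=list("ABCDE"))
date_dist = pd.DataFrame(squareform(pdist(wide.to_numpy())), index=dates, columns=dates)

latest = date_dist.index.max()                  # 1. select the most recent date
ranked = date_dist.loc[latest].sort_values()    # 2. sort distances for that date
row = ranked.to_frame(name=latest).T            # 3. show them as a single transposed row
print(row.round(2))
```

The first entry of `ranked` is the selected date itself at distance zero; the entries that follow are the dates whose cross-section looked most like it.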
The output shows how similar or different market conditions on other dates were compared to the selected date, allowing for temporal analysis of market behavior.
This approach can help identify patterns, trends, or anomalous periods in market history by comparing the similarity of market conditions across different dates.