dataframe.distance()
Cross-sectional distance calculations
Time-series distance calculations
Panel data distance calculations (Tucker decomposition)
Multiple distance metrics and statistical tests
The module is integrated into a custom DataFrame class, allowing for easy calculation of pairwise distances.
Calculates distances between stocks based on their attributes at each time point.
Parameters:
orient: Set to "cross-sectional"
distance: Distance metric (e.g., 'cosine', 'euclidean')
calculations: List of features to include in the distance calculation
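As an illustration of what a cross-sectional distance involves, the sketch below summarises each stock with a few of the listed calculations and then compares the stocks pairwise. It is not the module's implementation: the function name `cross_sectional_distance`, the wide layout (rows are dates, columns are tickers), and the single-attribute panel are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def cross_sectional_distance(panel, calculations, metric="cosine"):
    """Illustrative only: summarise each ticker, then compare tickers pairwise.

    `panel` is assumed to be wide: index = dates, columns = tickers.
    """
    # One row per ticker, one column per summary feature (e.g. mean, std, skew).
    features = panel.agg(calculations).T
    x = features.to_numpy(dtype=float)
    if metric == "cosine":
        unit = x / np.linalg.norm(x, axis=1, keepdims=True)
        dist = 1.0 - unit @ unit.T
    elif metric == "euclidean":
        diff = x[:, None, :] - x[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
    else:
        raise ValueError(f"unsupported metric: {metric}")
    return pd.DataFrame(dist, index=panel.columns, columns=panel.columns)

# Hypothetical demo data: 60 dates, 4 made-up tickers.
rng = np.random.default_rng(0)
demo_panel = pd.DataFrame(
    rng.normal(size=(60, 4)), columns=["AAA", "BBB", "CCC", "DDD"]
)
demo_dist = cross_sectional_distance(demo_panel, ["mean", "std", "skew"])
```

The result is a square table with one row and one column per ticker and zeros on the diagonal. Features on very different scales would normally be standardised first; that step is left out here to keep the sketch short.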
Available Calculations:
mean: Average value
skew: Skewness
std: Standard deviation
diffm: First difference mean
zcr: Zero crossing rate
mac: Mean absolute change
sc: Spectral centroid
tp: Turning points
acl1: Autocorrelation at lag 1
hjorthm: Hjorth mobility
hurst: Hurst exponent
hist: Histogram mode (5 bins)
timerev: Time reversibility statistic
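To make a few of the less familiar calculations concrete, here is a minimal NumPy sketch using common textbook definitions. The module's exact definitions may differ (for example, whether zero crossings are counted around zero or around the mean), and the function name `summary_features` is hypothetical.

```python
import numpy as np

def summary_features(x):
    """Textbook versions of a few listed calculations (assumed definitions)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)  # first differences
    negative = np.signbit(x)
    return {
        # diffm: mean of the first differences
        "diffm": float(dx.mean()),
        # zcr: share of consecutive pairs whose sign differs
        "zcr": float(np.mean(negative[1:] != negative[:-1])),
        # mac: mean absolute change between consecutive points
        "mac": float(np.abs(dx).mean()),
        # tp: number of interior points where the direction of change flips
        "tp": int(np.sum(dx[1:] * dx[:-1] < 0)),
        # acl1: correlation between the series and itself shifted by one step
        "acl1": float(np.corrcoef(x[:-1], x[1:])[0, 1]),
        # hjorthm: sqrt(variance of first differences / variance of the series)
        "hjorthm": float(np.sqrt(dx.var() / x.var())),
    }

example = summary_features([1.0, -1.0, 1.0, -1.0, 1.0])
```

For the alternating series above, every consecutive pair changes sign, so `zcr` is 1.0, `mac` is 2.0, and `acl1` is -1.0.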
Computes distances between stocks based on their time-series behavior.
Parameters:
orient: Set to "time-series"
metric: Distance metric to use
Available Metrics:
pearson: Pearson correlation
spearman: Spearman correlation
dtw: Dynamic Time Warping
euclidean: Euclidean distance
euclidean_int: Euclidean distance with interpolation
pec: Power Envelope Correlation
frechet: Fréchet distance
kl_divergence: Kullback-Leibler divergence
wasserstein: Wasserstein distance
jaccard: Jaccard distance
bray_curtis: Bray-Curtis dissimilarity
hausdorff: Hausdorff distance
manhattan: Manhattan distance
chi2: Chi-squared distance
hellinger: Hellinger distance
canberra: Canberra distance
shannon_entropy: Shannon entropy-based distance
sample_entropy: Sample entropy
approx_entropy: Approximate entropy
jensen_shannon: Jensen-Shannon divergence
renyi_entropy: Rényi entropy
tsallis_entropy: Tsallis entropy
mutual_information: Mutual information-based distance
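Two of these metrics are simple enough to sketch directly. The example below shows a correlation-based distance and a basic Dynamic Time Warping distance, applied pairwise to a dictionary of series. The helper names are hypothetical, and converting Pearson correlation to a distance as 1 - r is an assumption; the module may report the correlation itself or use a different DTW variant.

```python
import numpy as np

def correlation_distance(a, b):
    """1 - Pearson correlation: near 0 for co-moving series, near 2 for opposite."""
    return 1.0 - float(np.corrcoef(a, b)[0, 1])

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference step cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def pairwise(series, metric):
    """Apply `metric` to every pair of named series; returns (names, matrix)."""
    names = list(series)
    out = np.zeros((len(names), len(names)))
    for i, p in enumerate(names):
        for j, q in enumerate(names):
            out[i, j] = metric(series[p], series[q])
    return names, out

# Hypothetical tickers with made-up values.
demo_series = {"AAA": [1.0, 2.0, 3.0], "BBB": [2.0, 4.0, 6.0], "CCC": [3.0, 2.0, 1.0]}
names, corr_dist = pairwise(demo_series, correlation_distance)
```

Unlike the correlation distance, DTW accepts series of different lengths, which is why `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero: the repeated 2 is absorbed by the warping path.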
Utilizes Tucker decomposition to calculate distances considering both cross-sectional and time-series aspects.
Parameters:
orient: Set to "panel"
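For readers unfamiliar with Tucker decomposition, the sketch below uses a truncated higher-order SVD (HOSVD), a simple non-iterative way to obtain a Tucker decomposition, on a stocks × features × dates tensor. It then measures Euclidean distances between stocks in the compressed space. This is an assumption-laden illustration: the tensor layout, the hand-picked ranks, and all function names are hypothetical, and the module may fit and use the decomposition differently.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricise a tensor: rows index `mode`, columns index everything else."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along `mode` by `matrix` (shape: new_size x old_size)."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD: one orthonormal factor per mode plus a core."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        # Project each mode onto its leading singular vectors.
        core = mode_product(core, u.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Map the core back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out

def pairwise_euclidean(rows):
    diff = rows[:, None, :] - rows[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stock_distances(core, factors):
    """Distances between stocks (mode 0) in the compressed representation."""
    embedding = factors[0] @ unfold(core, 0)
    return pairwise_euclidean(embedding)

# Hypothetical panel: 6 stocks, 4 features, 10 dates.
rng = np.random.default_rng(1)
demo_tensor = rng.normal(size=(6, 4, 10))
demo_core, demo_factors = hosvd(demo_tensor, ranks=(3, 2, 3))
demo_panel_dist = stock_distances(demo_core, demo_factors)
```

Because the feature and date factors have orthonormal columns, these distances equal the distances between stocks in the low-rank reconstruction, so the smaller core is enough to compute them.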
The module handles missing values by imputing them with the median.
Some distance calculations may be computationally intensive for large datasets.
The Tucker decomposition for panel data provides an estimated rank of the decomposition.
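The median imputation mentioned in the notes can be pictured with plain pandas. This sketch assumes the median is taken per column; the module may compute it over a different grouping. The tickers and values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table with two missing values.
features = pd.DataFrame(
    {"mean": [0.1, np.nan, 0.3, 0.2], "std": [1.0, 2.0, np.nan, 4.0]},
    index=["AAA", "BBB", "CCC", "DDD"],
)

# Replace each missing value with the median of its own column.
imputed = features.fillna(features.median())
```

Here the missing `mean` for BBB becomes 0.2 and the missing `std` for CCC becomes 2.0, the medians of the observed values in each column.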
While previous examples focused on calculating distances between stocks, we can also compute distances between dates using the same methods.
This allows for analyzing how market conditions change over time.
All previous distance calculation functions can be modified to work with dates by specifying on="date". Here are the key functions adapted for date-based analysis:
These functions calculate distances between different dates based on the market conditions or stock behaviors on those dates.
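Conceptually, switching from stocks to dates changes which axis of the panel is compared. The sketch below illustrates that idea with a hypothetical `pairwise_distance` helper whose `on` argument mirrors the option described above. The wide layout (rows are dates, columns are tickers) and the Euclidean metric are assumptions, and this is not the module's code.

```python
import numpy as np
import pandas as pd

def euclidean_matrix(rows):
    """Pairwise Euclidean distances between the rows of a 2-D array."""
    diff = rows[:, None, :] - rows[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pairwise_distance(panel, on="ticker"):
    """`panel` is assumed wide: index = dates, columns = tickers."""
    if on == "ticker":
        # Each ticker is described by its values across all dates.
        labels, values = panel.columns, panel.to_numpy().T
    elif on == "date":
        # Each date is described by the values of all tickers on that date.
        labels, values = panel.index, panel.to_numpy()
    else:
        raise ValueError(f"unsupported value for on: {on}")
    return pd.DataFrame(euclidean_matrix(values), index=labels, columns=labels)

# Hypothetical demo data: 5 dates, 3 made-up tickers.
rng = np.random.default_rng(2)
demo_dates = pd.date_range("2024-01-01", periods=5, freq="D")
demo_wide = pd.DataFrame(
    rng.normal(size=(5, 3)), index=demo_dates, columns=["AAA", "BBB", "CCC"]
)
by_date = pairwise_distance(demo_wide, on="date")
by_ticker = pairwise_distance(demo_wide, on="ticker")
```

With five dates and three tickers, `by_date` is a 5 × 5 table indexed by date, while `by_ticker` is 3 × 3.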
To analyze the distances for a specific date:
This code:
Selects the most recent date
Sorts the distances for that date
Displays the results as a transposed row
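The three steps above can be sketched with plain pandas on a toy date-by-date distance table. The table is invented for illustration, and the variable names are hypothetical; the structure of the module's actual output may differ.

```python
import numpy as np
import pandas as pd

# Toy distance table: one made-up market reading per date, compared pairwise.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
readings = np.array([1.0, 2.0, 4.0, 3.5])
dist = pd.DataFrame(
    np.abs(readings[:, None] - readings[None, :]), index=dates, columns=dates
)

latest = dist.index.max()             # 1. select the most recent date
row = dist.loc[latest].sort_values()  # 2. sort the distances for that date
result = row.to_frame().T             # 3. show them as a single transposed row
```

The columns of `result` run from the most similar date (the selected date itself, at distance 0) to the least similar one.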
The output shows how similar or different market conditions on other dates were compared to the selected date, allowing for temporal analysis of market behavior.
This approach can help identify patterns, trends, or anomalous periods in market history by comparing the similarity of market conditions across different dates.
Pairwise statistics for distance and similarity between stocks in cross-section, time-series, and panel orientations.