# Dimensionality Reduction

`Tutorials` are the best documentation — [<mark style="color:blue;">`Dimensionality Reduction Tutorial`</mark>](https://colab.research.google.com/github/sovai-research/sovai-public/blob/main/notebooks/computational/Dimensionality%20Reduction.ipynb)

### Reduction Techniques

The module supports the following dimensionality reduction methods:

* PCA (Principal Component Analysis)
* Factor Analysis
* Gaussian Random Projection
* UMAP (Uniform Manifold Approximation and Projection)

### Usage Examples.

#### Authenticate and load data

```python
import sovai as sov
sov.token_auth(token="your_token_here")
df_mega = sov.data("accounting/weekly").select_stocks("mega").date_range("2018-01-01") 
```

#### 1. Basic Usage with PCA

```python
# Reduce dimensions using PCA
result = df_mega.reduce_dimensions(method="pca", n_components=10)
print(result.head())
```

#### 2. Using Gaussian Random Projection

```python
# Reduce dimensions using Gaussian Random Projection
result = df_mega.reduce_dimensions(method="gaussian_random_projection", n_components=10)
print(result.head())
```

#### 3. UMAP with Verbose Output

```python
# Reduce dimensions using UMAP with verbose output
result = df_mega.reduce_dimensions(method="umap", verbose=True, n_components=10)
print(result.head())
```

#### 4. Factor Analysis

```python
# Reduce dimensions using Factor Analysis with verbose output
result = df_mega.reduce_dimensions(method="factor_analysis", verbose=True, n_components=10)
print(result.head())
```

### Advanced Usage

The underlying `dimensionality_reduction` function offers more control over the reduction process:

```python
from dimensionality_reduction import dimensionality_reduction

# Assuming df is your input DataFrame
result = dimensionality_reduction(df, method='pca', explained_variance=0.95, verbose=True)
print(result.head())
```

This advanced usage allows for specifying the amount of variance to be explained if `n_components` is not provided.

### Performance Considerations

* The dimensionality reduction process can be computationally intensive, especially for large datasets or when using methods like UMAP.
* PCA and Truncated SVD are generally faster than UMAP for large datasets.
* Consider using a smaller number of components or a subset of your data if performance is a concern.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.sov.ai/feature-processing/dimensionality-reduction.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
