Geometric Analysis

Geometric estimators compute the intrinsic dimension (ID) of a dataset from distances between neighboring points, rather than from the variance captured by global linear projections. This matters for non-linear manifolds (e.g., a Swiss Roll), where a global projection such as PCA overestimates the dimension.

The Swiss Roll Problem

A "Swiss Roll" is a 2D sheet rolled up into a spiral in 3D space.

PCA will see it as 3D (because variance exists in x, y, z).
Geometric ID should see it as 2D (locally, it's a plane).
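
The PCA half of this claim can be checked with scikit-learn alone. The following is a minimal sketch that does not use effdim; the 95% threshold and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# Same kind of data as in the effdim example; random_state fixed for repeatability
X, _ = make_swiss_roll(n_samples=2000, noise=0.01, random_state=0)

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 95%
n_components_95 = int(np.searchsorted(cumulative, 0.95) + 1)

print("Explained variance ratio per component:", pca.explained_variance_ratio_)
print("Components needed for 95% variance:", n_components_95)
```

No single axis dominates the variance of the roll, so all three components are needed to reach the threshold even though the underlying sheet is two-dimensional.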

import numpy as np
import effdim
from sklearn.datasets import make_swiss_roll

# Generate Swiss Roll
X, _ = make_swiss_roll(n_samples=2000, noise=0.01)

# Compute dimensionalities
results = effdim.compute_dim(X)

# PCA
pca_dim = results['pca_explained_variance_95']
print(f"Global PCA Dimension: {pca_dim}")
# Likely 3, because the roll has substantial variance along all three axes.

# kNN Intrinsic Dimension (MLE)
knn_dim = results['mle_dimensionality']
print(f"kNN Intrinsic Dimension: {knn_dim:.2f}")
# Should be close to 2.0

# Two-NN
twonn_dim = results['two_nn_dimensionality']
print(f"Two-NN Intrinsic Dimension: {twonn_dim:.2f}")
# Should be close to 2.0
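
To make clear what the two geometric estimators measure, here is a from-scratch sketch built on scikit-learn's nearest-neighbor search. It is not effdim's implementation: the function names, the choice of k = 10, and the k - 2 normalization (the asymptotically unbiased variant of the Levina-Bickel MLE) are assumptions, and the Two-NN function uses the simple maximum-likelihood form N / sum(log(r2 / r1)) rather than a fitted line.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.neighbors import NearestNeighbors


def knn_distances(X, k):
    """Sorted distances from each point to its k nearest neighbors (self excluded)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, _ = nn.kneighbors(X)
    return dist[:, 1:]  # column 0 is the query point itself at distance 0


def mle_id(X, k=10):
    """kNN maximum-likelihood ID estimate, averaged over all points."""
    dist = knn_distances(X, k)
    # log(T_k / T_j) for j = 1..k-1, where T_j is the distance to the j-th neighbor
    log_ratios = np.log(dist[:, -1:] / dist[:, :-1])
    per_point = (k - 2) / log_ratios.sum(axis=1)
    return float(per_point.mean())


def two_nn_id(X):
    """Two-NN estimate from the ratio of second to first neighbor distance."""
    dist = knn_distances(X, 2)
    mu = dist[:, 1] / dist[:, 0]
    return float(len(mu) / np.log(mu).sum())


X, _ = make_swiss_roll(n_samples=2000, noise=0.01, random_state=0)
mle_estimate = mle_id(X)
two_nn_estimate = two_nn_id(X)
print(f"MLE sketch:    {mle_estimate:.2f}")
print(f"Two-NN sketch: {two_nn_estimate:.2f}")
```

Both functions look only at distances within a small neighborhood of each point, which is why the curvature of the roll has little effect on them. Exact values will differ from effdim's output, since its neighborhood size and averaging scheme may not match the choices made here.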

When to use Geometric Estimators?

Non-linear manifolds: Image datasets (digits, faces) often lie on low-dimensional non-linear manifolds.
Manifold Learning: Checking whether the latent dimension of your autoencoder matches the intrinsic dimension of the data.
Local Analysis: Neighbor-based estimators depend only on local distances, so they can capture local structure that a global projection averages out.
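
As an illustration of the local-analysis case, the sketch below computes a per-point kNN estimate on a synthetic dataset made of a 1D segment and a 2D patch. The dataset, k = 10, and the k - 2 normalization are assumptions made for this example, and the code uses scikit-learn directly rather than effdim.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# 1D segment along x, placed at z = 5 so it is far from the 2D patch
line = np.column_stack([rng.uniform(0, 1, 500), np.zeros(500), np.full(500, 5.0)])
# 2D unit square in the xy-plane at z = 0
plane = np.column_stack([rng.uniform(0, 1, 1500), rng.uniform(0, 1, 1500), np.zeros(1500)])
X = np.vstack([line, plane])

k = 10
dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
dist = dist[:, 1:]  # drop the zero distance from each point to itself

# Per-point MLE estimate: one ID value for every sample
local_id = (k - 2) / np.log(dist[:, -1:] / dist[:, :-1]).sum(axis=1)

line_mean = float(local_id[:500].mean())
plane_mean = float(local_id[500:].mean())
print(f"Mean local ID on the segment: {line_mean:.2f}")
print(f"Mean local ID on the patch:   {plane_mean:.2f}")
```

A single global estimate would blend the two regions into one number, whereas the per-point values keep them apart.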

Limitations

Computational Cost: Requires a nearest-neighbor search, which can be slow for large \(N\). effdim uses the CFaiss implementation under the hood to speed this up.
Curse of Dimensionality: In extremely high dimensions, distances concentrate (the nearest and farthest neighbors of a point become almost equally far away), which can make geometric estimates unstable.
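
Distance concentration is easy to observe numerically. The sketch below uses only NumPy; the Gaussian data, the sample size, and the "relative contrast" measure (max minus min distance, divided by min distance) are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def relative_contrast(dim, n_points=500):
    """(d_max - d_min) / d_min for distances from one random query to n_points samples."""
    X = rng.standard_normal((n_points, dim))
    query = rng.standard_normal(dim)
    d = np.linalg.norm(X - query, axis=1)
    return float((d.max() - d.min()) / d.min())


contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, c in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={c:.3f}")
```

As the ambient dimension grows, the contrast shrinks toward zero, so ratios of neighbor distances carry less and less information. This is the regime in which neighbor-based ID estimates become unreliable.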