API Reference
Main Interface
compute_dim(data)
Compute the effective dimensionality of the given data.
Parameters:
data : Union[np.ndarray, List[np.ndarray]] Input data. Can be a single NumPy array or a list of NumPy arrays.
Returns:
dict A dictionary containing the results of the effective dimensionality computation.
Source code in src/effdim/api.py
Metrics (Spectral)
geometric_mean_eff_dimensionality(spectrum)
Compute the Geometric Mean Effective Dimensionality of the given spectrum.
Parameters:
spectrum : np.ndarray Array of eigenvalues.
Returns:
float Geometric Mean Effective Dimensionality value.
Source code in src/effdim/metrics.py
participation_ratio(spectrum)
Compute the Participation Ratio (PR) of the given spectrum.
Parameters:
spectrum : np.ndarray Array of eigenvalues.
Returns:
float Participation Ratio value.
Source code in src/effdim/metrics.py
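The participation ratio of an eigenvalue spectrum is conventionally defined as (sum of eigenvalues)^2 / (sum of squared eigenvalues). The sketch below illustrates that textbook formula with NumPy; the helper name `participation_ratio_sketch` is hypothetical, and the library's own implementation may differ in input validation or normalization.

```python
import numpy as np

def participation_ratio_sketch(spectrum):
    """Hypothetical stand-in: PR = (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    lam = np.asarray(spectrum, dtype=float)
    return float(lam.sum() ** 2 / np.sum(lam ** 2))

# Four equal eigenvalues: variance is spread evenly over four directions.
flat = participation_ratio_sketch([1.0, 1.0, 1.0, 1.0])
# One non-zero eigenvalue: all variance lies along a single direction.
peaked = participation_ratio_sketch([1.0, 0.0, 0.0, 0.0])
```

Under this definition the value ranges from 1 (one dominant eigenvalue) up to the number of eigenvalues (a perfectly flat spectrum).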
pca_explained_variance(spectrum, threshold=0.95)
Compute the number of principal components required to explain a given threshold of variance.
Parameters:
spectrum : np.ndarray Array of eigenvalues (explained variance) from PCA.
threshold : float The cumulative variance threshold to reach (between 0 and 1).
Returns:
int Number of principal components needed to reach the threshold.
Source code in src/effdim/metrics.py
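One common way to count the components needed for a variance threshold is to normalize the cumulative sum of the eigenvalues and find the first position that reaches the threshold. The following is a minimal sketch of that idea; the name `components_for_threshold` is hypothetical, and sorting the spectrum in descending order is an assumption, since the source does not say whether the library sorts its input.

```python
import numpy as np

def components_for_threshold(spectrum, threshold=0.95):
    """Hypothetical helper: smallest number of components whose cumulative
    share of the total variance reaches `threshold`."""
    # Assumption: largest eigenvalues come first; sort to be safe.
    lam = np.sort(np.asarray(spectrum, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    # Index of the first cumulative share >= threshold, converted to a count.
    count = int(np.searchsorted(cumulative, threshold, side="left")) + 1
    # Guard against rounding leaving the final share marginally below 1.0.
    return min(count, lam.size)

eigenvalues = [5.0, 3.0, 1.0, 1.0]  # cumulative shares: 0.5, 0.8, 0.9, 1.0
n_95 = components_for_threshold(eigenvalues)        # default threshold of 0.95
n_75 = components_for_threshold(eigenvalues, 0.75)
```

With this spectrum, all four components are needed to reach 0.95, while two are enough to reach 0.75.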
renyi_eff_dimensionality(probabilities, alpha)
Compute the Rényi Effective Dimensionality of the given probability distribution.
Parameters:
probabilities : np.ndarray Array of probabilities.
alpha : float Order of the Rényi entropy (alpha > 0 and alpha != 1).
Returns:
float Rényi Effective Dimensionality value.
Source code in src/effdim/metrics.py
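A Rényi effective dimensionality is usually taken to be the exponential of the Rényi entropy of order alpha, which simplifies to (sum of p_i^alpha)^(1 / (1 - alpha)). The sketch below assumes that definition and a natural-log entropy; `renyi_eff_dim_sketch` is a hypothetical name, not the library's function.

```python
import numpy as np

def renyi_eff_dim_sketch(probabilities, alpha):
    """Hypothetical stand-in: exp of the Renyi entropy of order alpha."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be > 0 and != 1")
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]  # zero-probability entries contribute nothing
    entropy = np.log(np.sum(p ** alpha)) / (1.0 - alpha)
    return float(np.exp(entropy))

uniform = renyi_eff_dim_sketch([0.25, 0.25, 0.25, 0.25], alpha=2.0)
skewed = renyi_eff_dim_sketch([0.5, 0.25, 0.25], alpha=2.0)
```

At alpha = 2 this expression reduces to 1 / sum(p_i^2), which equals the participation ratio of a spectrum normalized to sum to one.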
shannon_entropy(probabilities)
Compute the Shannon Entropy of the given probability distribution.
Parameters:
probabilities : np.ndarray Array of probabilities.
Returns:
float Shannon Entropy value.
Source code in src/effdim/metrics.py
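Shannon entropy is H = -sum(p_i * log p_i). The sketch below assumes the natural logarithm (the source does not state the base) and skips zero-probability entries, following the usual convention that 0 * log 0 = 0. The name `shannon_entropy_sketch` is hypothetical.

```python
import numpy as np

def shannon_entropy_sketch(probabilities):
    """Hypothetical stand-in: H = -sum(p * ln p), treating 0 * ln 0 as 0."""
    p = np.asarray(probabilities, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

h_uniform = shannon_entropy_sketch([0.25, 0.25, 0.25, 0.25])
# Exponentiating the entropy gives an effective number of equally likely states.
effective_states = float(np.exp(h_uniform))
```

For a uniform distribution over four outcomes the entropy is ln 4, so the exponentiated value is 4.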
Geometry (Spatial)
compute_knn_distances(data, k)
Compute the distances to the k nearest neighbors of each point in data. Returns squared distances and excludes the point itself (distance 0).
Source code in src/effdim/geometry.py
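To make the output described above concrete, here is a brute-force NumPy sketch that returns squared distances to the k nearest neighbors and excludes each point from its own neighbor list. It is only an illustration: the helper name `knn_sq_distances_bruteforce` and the (n_points, k) ascending-order output layout are assumptions, and it does not reproduce whatever search backend the library uses.

```python
import numpy as np

def knn_sq_distances_bruteforce(data, k):
    """Hypothetical helper: squared Euclidean distances to the k nearest
    neighbors of every row, sorted ascending, self excluded."""
    x = np.asarray(data, dtype=float)
    diff = x[:, None, :] - x[None, :, :]   # pairwise differences, shape (n, n, d)
    d2 = np.sum(diff ** 2, axis=-1)        # pairwise squared distances, shape (n, n)
    np.fill_diagonal(d2, np.inf)           # drop each point's distance to itself
    return np.sort(d2, axis=1)[:, :k]

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
dist_sq = knn_sq_distances_bruteforce(points, k=2)
```

This approach needs O(n^2 * d) memory, so it is suitable only for small inputs.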
danco_dimensionality(data, k=10, precomputed_knn_dist_sq=None)
Estimate intrinsic dimensionality using DANCo (Dimensionality from Angle and Norm Concentration). Exploits the concentration of angles between nearest neighbor vectors. Uses FAISS for fast nearest neighbor search.
Source code in src/effdim/geometry.py
ess_dimensionality(data, k=10, precomputed_knn_dist_sq=None)
Estimate intrinsic dimensionality using ESS (Expected Simplex Skewness). Analyzes the skewness of local simplices formed by nearest neighbors. Uses FAISS for fast nearest neighbor search.
Source code in src/effdim/geometry.py
gmst_dimensionality(data, geodesic=False, random_state=42)
Estimate intrinsic dimensionality using GMST (Geodesic Minimum Spanning Tree). Estimates dimension from the scaling of MST length with sample size.
Source code in src/effdim/geometry.py
mind_mli_dimensionality(data, precomputed_knn_dist_sq=None)
Estimate intrinsic dimensionality using MiND-MLi (Maximum Likelihood on Minimum Distances, single neighbor). Uses the distribution of nearest neighbor distances. Uses FAISS for fast nearest neighbor search.
Source code in src/effdim/geometry.py
mind_mlk_dimensionality(data, k=10, precomputed_knn_dist_sq=None)
Estimate intrinsic dimensionality using MiND-MLk (Maximum Likelihood on Minimum Distances, k neighbors). Returns the median of per-point estimates for robustness. Uses FAISS for fast nearest neighbor search.
Source code in src/effdim/geometry.py
mle_dimensionality(data, k=10, precomputed_knn_dist_sq=None)
Estimate intrinsic dimensionality using Levina-Bickel MLE. Includes protection against duplicate points (distance=0). Uses FAISS for fast nearest neighbor search.
Source code in src/effdim/geometry.py
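The Levina-Bickel estimator computes, for each point, the inverse of the mean log-ratio between the k-th neighbor distance T_k and the closer neighbor distances T_j. The sketch below works from squared distances, using log(T_k / T_j) = 0.5 * log(T_k^2 / T_j^2). The names `knn_sq` and `mle_dim_sketch`, the clipping constant `eps`, and the choice to average the per-point estimates are all assumptions made for illustration, not a description of the library's code.

```python
import numpy as np

def knn_sq(x, k):
    """Hypothetical brute-force helper: ascending squared kNN distances, self excluded."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.sort(d2, axis=1)[:, :k]

def mle_dim_sketch(knn_dist_sq, eps=1e-12):
    """Levina-Bickel style estimate from ascending squared kNN distances."""
    # Clip zeros so duplicate points do not produce log(0).
    d2 = np.maximum(np.asarray(knn_dist_sq, dtype=float), eps)
    k = d2.shape[1]
    # log(T_k / T_j) for j = 1..k-1, computed from squared distances.
    log_ratios = 0.5 * (np.log(d2[:, -1:]) - np.log(d2[:, :-1]))
    mean_log = log_ratios.sum(axis=1) / (k - 1)
    # Skip degenerate points whose neighbors all sit at the same distance.
    valid = mean_log > eps
    per_point = 1.0 / mean_log[valid]
    return float(np.mean(per_point))  # assumption: average the per-point estimates

rng = np.random.default_rng(0)
square = rng.uniform(size=(500, 2))   # points drawn from a 2-D square
estimate = mle_dim_sketch(knn_sq(square, k=10))
```

On data sampled from a two-dimensional square, the estimate should land near 2; the exact value depends on k, the sample size, and the normalization used.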
tle_dimensionality(data, k=10, precomputed_knn_dist_sq=None)
Estimate intrinsic dimensionality using TLE (Tight Localities Estimator). Maximizes likelihood on scale-normalized distances. Uses FAISS for fast nearest neighbor search.
Source code in src/effdim/geometry.py
two_nn_dimensionality(data, precomputed_knn_dist_sq=None)
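Estimate intrinsic dimensionality using Two-NN. Uses -log(1 - F(mu)) as the regression target, where mu is the ratio of each point's second to first nearest-neighbor distance and F is the empirical cumulative distribution of mu. Uses FAISS for fast nearest neighbor search.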
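The Two-NN idea can be sketched as a regression through the origin: under the method's model, -log(1 - F(mu)) grows linearly in log(mu) with a slope equal to the intrinsic dimension. The code below is a minimal illustration under stated assumptions: `knn_sq` and `two_nn_sketch` are hypothetical names, the empirical CDF is taken as rank / n, and the largest 10% of ratios are discarded to avoid log(0) and reduce tail noise. None of these choices is confirmed by the source.

```python
import numpy as np

def knn_sq(x, k):
    """Hypothetical brute-force helper: ascending squared kNN distances, self excluded."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.sort(d2, axis=1)[:, :k]

def two_nn_sketch(knn_dist_sq, keep_fraction=0.9):
    """Two-NN style estimate from squared distances to the two nearest neighbors."""
    d2 = np.asarray(knn_dist_sq, dtype=float)
    r1_sq, r2_sq = d2[:, 0], d2[:, 1]
    ok = r1_sq > 0                                   # skip duplicate points
    # log(mu) = log(r2 / r1) = 0.5 * log(r2^2 / r1^2), sorted ascending.
    log_mu = np.sort(0.5 * np.log(r2_sq[ok] / r1_sq[ok]))
    n = log_mu.size
    cdf = np.arange(1, n + 1) / n                    # empirical CDF at each sorted ratio
    keep = int(keep_fraction * n)                    # assumption: drop the upper tail
    x = log_mu[:keep]
    y = -np.log(1.0 - cdf[:keep])
    # Least-squares slope of a line forced through the origin.
    return float(np.dot(x, y) / np.dot(x, x))

rng = np.random.default_rng(0)
square = rng.uniform(size=(500, 2))   # points drawn from a 2-D square
estimate = two_nn_sketch(knn_sq(square, k=2))
```

On points sampled from a two-dimensional square, the fitted slope should come out close to 2.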