Predictors

pybasin.predictors.hdbscan_clusterer.HDBSCANClusterer

Bases: DisplayNameMixin, BaseEstimator, ClusterMixin

HDBSCAN clustering for basin stability analysis with optional auto-tuning and noise assignment (unsupervised learning).

Functions

__init__

__init__(
    hdbscan: Any = None,
    assign_noise: bool = False,
    nearest_neighbors: NearestNeighbors | None = None,
    auto_tune: bool = False,
)

Initialize HDBSCAN clusterer.

Parameters:

Name Type Description Default
hdbscan Any

A configured sklearn.cluster.HDBSCAN instance, or None to create a default one (min_cluster_size=50, min_samples=10).

None
assign_noise bool

Whether to assign noise points to nearest clusters using KNN.

False
nearest_neighbors NearestNeighbors | None

A configured sklearn.neighbors.NearestNeighbors instance for noise assignment, or None to create a default one (n_neighbors=5). Only used when assign_noise=True.

None
auto_tune bool

Whether to automatically tune min_cluster_size using silhouette score. The tuned value overrides the one on the hdbscan instance.

False

fit_predict

fit_predict(X: ndarray, y: Any = None) -> np.ndarray

Fit and predict labels using HDBSCAN clustering with optional noise assignment.

Parameters:

Name Type Description Default
X ndarray

Feature array to cluster.

required
y Any

Ignored (present for sklearn API compatibility).

None

Returns:

Type Description
ndarray

Cluster labels.
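
As a rough illustration of KNN-based noise assignment, the sketch below relabels each noise point (-1) with the majority label among its nearest clustered neighbors. The majority vote and the helper name assign_noise_to_clusters are assumptions; the library may resolve ties or pick neighbors differently.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def assign_noise_to_clusters(X, labels, n_neighbors=5):
    """Hypothetical helper: give each noise point the majority label of its neighbors."""
    labels = np.asarray(labels).copy()
    noise = labels == -1
    if not noise.any() or noise.all():
        return labels  # nothing to assign, or no clusters to assign to
    k = min(n_neighbors, int((~noise).sum()))
    # Fit the neighbor search on clustered points only.
    nn = NearestNeighbors(n_neighbors=k).fit(X[~noise])
    _, idx = nn.kneighbors(X[noise])
    neighbor_labels = labels[~noise][idx]  # shape (n_noise, k)
    labels[noise] = [np.bincount(row).argmax() for row in neighbor_labels]
    return labels


X = np.array(
    [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [0.5, 0.5], [10.5, 10.5]],
    dtype=float,
)
labels = np.array([0, 0, 0, 1, 1, 1, -1, -1])
relabeled = assign_noise_to_clusters(X, labels, n_neighbors=3)
print(relabeled)
```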


pybasin.predictors.dbscan_clusterer.DBSCANClusterer

Bases: DisplayNameMixin, BaseEstimator, ClusterMixin

DBSCAN clustering for basin stability analysis with optional epsilon auto-tuning (unsupervised learning).

When auto_tune=True, replicates the epsilon search from the MATLAB bSTAB classify_solution.m unsupervised branch:

  1. Precompute the pairwise Euclidean distance matrix.
  2. Build an epsilon grid from the feature ranges.
  3. For each candidate epsilon, run DBSCAN and record the minimum per-sample silhouette score (worst-case cluster quality).
  4. Find the most prominent peak in the silhouette curve above a height threshold.
  5. Fall back to the global maximum if no peak is found.
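
The five steps above can be sketched with scikit-learn and SciPy as follows. This is a minimal sketch under stated assumptions, not the pybasin or bSTAB code: the grid spacing, the treatment of noise as its own group in the silhouette, and the helper name search_eps are all illustrative choices.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_samples


def search_eps(X, n_eps_grid=50, min_samples=10, min_peak_height=0.9):
    """Hypothetical helper following the five documented steps."""
    # Step 1: pairwise Euclidean distances, computed once and reused.
    dist = pairwise_distances(X, metric="euclidean")
    np.fill_diagonal(dist, 0.0)

    # Step 2: epsilon grid derived from the feature ranges (assumed spacing).
    max_range = float(np.ptp(X, axis=0).max())
    eps_grid = np.linspace(max_range / n_eps_grid, max_range / 2, n_eps_grid)

    # Step 3: worst per-sample silhouette for each candidate epsilon.
    scores = np.full(n_eps_grid, -1.0)
    for i, eps in enumerate(eps_grid):
        model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
        labels = model.fit_predict(dist)
        n_labels = len(np.unique(labels))  # noise (-1) counts as its own group here
        if 2 <= n_labels < len(X):
            scores[i] = silhouette_samples(dist, labels, metric="precomputed").min()

    # Step 4: most prominent peak above the height threshold.
    peaks, props = find_peaks(scores, height=min_peak_height, prominence=0)
    if peaks.size > 0:
        best = int(peaks[np.argmax(props["prominences"])])
    else:
        # Step 5: no qualifying peak, so fall back to the global maximum.
        best = int(np.argmax(scores))
    return float(eps_grid[best]), eps_grid, scores


X, _ = make_blobs(n_samples=200, centers=[(0, 0), (10, 10)], cluster_std=0.5, random_state=0)
eps, eps_grid, scores = search_eps(X)
print(f"selected eps: {eps:.3f}")
```

Using the minimum rather than the mean silhouette makes the score sensitive to a single badly placed sample, which is why a candidate only scores well when every point sits comfortably inside its cluster.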

Functions

__init__

__init__(
    dbscan: DBSCAN | None = None,
    auto_tune: bool = False,
    n_eps_grid: int = 200,
    tune_sample_size: int = 2000,
    min_peak_height: float = 0.9,
    assign_noise: bool = False,
    nearest_neighbors: NearestNeighbors | None = None,
)

Initialize DBSCAN clusterer.

Parameters:

Name Type Description Default
dbscan DBSCAN | None

A configured sklearn.cluster.DBSCAN instance, or None to create a default one (eps=0.5, min_samples=10). When auto_tune=True, the tuned epsilon overrides dbscan.eps.

None
auto_tune bool

Whether to automatically find the optimal epsilon using silhouette-based peak analysis (MATLAB bSTAB algorithm).

False
n_eps_grid int

Number of epsilon candidates to evaluate during auto-tuning.

200
tune_sample_size int

Maximum number of samples to use during the epsilon search. If the dataset is larger, a random subsample is drawn to keep the search fast.

2000
min_peak_height float

Minimum silhouette peak height for the peak finder during auto-tuning.

0.9
assign_noise bool

Whether to assign noise points (-1) to the nearest cluster using KNN.

False
nearest_neighbors NearestNeighbors | None

A configured sklearn.neighbors.NearestNeighbors instance for noise assignment, or None to create a default one (n_neighbors=5). Only used when assign_noise=True.

None

fit_predict

fit_predict(X: ndarray, y: Any = None) -> np.ndarray

Fit and predict labels using DBSCAN clustering.

Parameters:

Name Type Description Default
X ndarray

Feature array of shape (n_samples, n_features).

required
y Any

Ignored (present for sklearn API compatibility).

None

Returns:

Type Description
ndarray

Cluster labels (-1 for noise unless assign_noise=True).


pybasin.predictors.dynamical_system_clusterer.DynamicalSystemClusterer

Bases: DisplayNameMixin, BaseEstimator, ClusterMixin

Two-stage hierarchical clustering for dynamical systems.

This clusterer uses physics-based heuristics to classify trajectories into attractor types (Stage 1) and then sub-classifies within each type (Stage 2).

Stage 1: Attractor Type Classification

Fixed Point (FP) Detection: Heuristic: variance < fp_variance_threshold

A trajectory is classified as converging to a fixed point if the variance
of its steady-state values is extremely low. The threshold should be set
based on the expected numerical precision of your integration.

IMPORTANT: If features are normalized/scaled (e.g., StandardScaler), the
variance values will be transformed. For normalized features with unit
variance, use a threshold relative to 1.0 (e.g., 1e-4). For unnormalized
features, use absolute thresholds based on your system's scale.

Limit Cycle (LC) Detection: Heuristic: (periodicity_strength > lc_periodicity_threshold AND variance < chaos_variance_threshold) OR has_drift

A trajectory is classified as a limit cycle if:
1. It shows strong periodic behavior (high autocorrelation periodicity)
   AND has bounded variance (not chaotic), OR
2. It shows monotonic drift (rotating solutions like pendulum rotations)

The periodicity_strength comes from autocorrelation analysis and ranges
from 0 (no periodicity) to 1 (perfect periodicity). Values above 0.5
typically indicate clear periodic behavior.

Chaos Detection: Heuristic: NOT FP AND NOT LC (default fallback)

Trajectories that don't meet FP or LC criteria are classified as chaotic.
High variance combined with low periodicity strength indicates chaos.
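
The three Stage 1 heuristics can be written as a few boolean masks. The sketch below assumes one scalar variance, one periodicity strength, and one drift flag per trajectory; how the clusterer aggregates these across state dimensions is not specified here, and the function name classify_attractor_type is hypothetical.

```python
import numpy as np


def classify_attractor_type(
    variance,
    periodicity_strength,
    has_drift,
    fp_variance_threshold=1e-6,
    lc_periodicity_threshold=0.5,
    chaos_variance_threshold=5.0,
):
    """Hypothetical helper: label each trajectory as "FP", "LC", or "chaos"."""
    variance = np.asarray(variance, dtype=float)
    periodicity_strength = np.asarray(periodicity_strength, dtype=float)
    has_drift = np.asarray(has_drift, dtype=bool)

    is_fp = variance < fp_variance_threshold
    is_lc = (
        (periodicity_strength > lc_periodicity_threshold)
        & (variance < chaos_variance_threshold)
    ) | has_drift

    # Chaos is the fallback; FP is written last so it wins over LC
    # (default tier order: FP, LC, chaos).
    labels = np.full(variance.shape, "chaos", dtype="<U5")
    labels[is_lc] = "LC"
    labels[is_fp] = "FP"
    return labels


types = classify_attractor_type(
    variance=[1e-9, 1.0, 1.0, 10.0, 10.0],
    periodicity_strength=[0.0, 0.9, 0.1, 0.9, 0.2],
    has_drift=[False, False, False, False, True],
)
print(types)
```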

Stage 2: Sub-classification

Within each attractor type, trajectories are further clustered:

  • FP: Clustered by steady-state location (mean values)
  • LC: Hierarchically clustered by period number, then amplitude/mean
  • Chaos: Clustered by spatial mean location

Required Features

Feature names must follow the convention: state_X__feature_name

Required base features:

  • variance: Steady-state variance (FP detection)
  • amplitude: Peak-to-peak amplitude (LC sub-classification)
  • mean: Steady-state mean (FP/chaos sub-classification)
  • linear_trend__attr_slope: Linear drift rate (rotating LC detection)
  • autocorrelation_periodicity__output_strength: Periodicity measure [0-1]
  • autocorrelation_periodicity__output_period: Detected period
  • spectral_frequency_ratio: Ratio for period-n detection

Note: This clusterer requires feature names to be set via set_feature_names() before calling fit_predict(). The BasinStabilityEstimator handles this automatically during the estimation process.
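
To illustrate the naming convention, the sketch below splits each name at the first double underscore and maps every base feature to its column per state. It assumes names look like state_0__variance; the helper build_feature_indices is hypothetical and is not the library's internal index builder.

```python
from collections import defaultdict


def build_feature_indices(feature_names):
    """Hypothetical helper: map base feature -> {state prefix: column index}."""
    indices = defaultdict(dict)
    for column, name in enumerate(feature_names):
        # Split only at the first "__" so nested names such as
        # "linear_trend__attr_slope" stay intact.
        state, separator, feature = name.partition("__")
        if not separator or not state.startswith("state_"):
            raise ValueError(f"Unexpected feature name: {name!r}")
        indices[feature][state] = column
    return dict(indices)


names = [
    "state_0__variance",
    "state_1__variance",
    "state_0__linear_trend__attr_slope",
]
feature_indices = build_feature_indices(names)
print(feature_indices)
```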

Functions

__init__

__init__(
    drift_threshold: float = 0.1,
    drift_fraction: float = 0.3,
    tiers: list[str] | None = None,
    fp_variance_threshold: float = 1e-06,
    fp_sub_classifier: Any = None,
    lc_periodicity_threshold: float = 0.5,
    lc_sub_classifier: Any = None,
    chaos_variance_threshold: float = 5.0,
    chaos_sub_classifier: Any = None,
)

Initialize the dynamical system clusterer.

Parameters:

Name Type Description Default
drift_threshold float

Minimum |slope| to consider a dimension as drifting. Drifting dimensions (e.g., pendulum angle during rotation) are excluded from variance/mean calculations for FP and chaos sub-classification to avoid spurious splits. Also used to detect rotating limit cycles. Units: [state_units / time_units]. Default: 0.1.

0.1
drift_fraction float

Minimum fraction of trajectories with |slope| > drift_threshold for a dimension to be flagged as drifting. Default: 0.3 (i.e., 30% of trajectories must show drift).

0.3
tiers list[str] | None

List of attractor types to detect, in priority order. First matching tier wins. Options: "FP", "LC", "chaos". Default: ["FP", "LC", "chaos"].

None
fp_variance_threshold float

Maximum variance to classify as fixed point. For unnormalized features, set based on expected steady-state fluctuations (e.g., 1e-6 for well-converged integrations). For normalized features (unit variance), use relative threshold (e.g., 1e-4 meaning 0.01% of typical variance). Default: 1e-6.

1e-06
fp_sub_classifier Any

Custom sub-classifier for fixed points. Input: mean values per non-drifting dimension. Default: HDBSCAN with min_cluster_size=50.

None
lc_periodicity_threshold float

Minimum periodicity strength [0-1] to classify as limit cycle. The periodicity strength measures how well the autocorrelation matches periodic behavior (0.0 = no periodic pattern, 0.3-0.5 = weak/noisy, 0.5-0.8 = clear periodic, 0.8-1.0 = strong/clean limit cycle). Default: 0.5.

0.5
lc_sub_classifier Any

Custom sub-classifier for limit cycles. Input: [freq_ratio, amplitude, mean] features. Default: Hierarchical period-based clustering.

None
chaos_variance_threshold float

Maximum variance for a trajectory to be classified as a limit cycle. Trajectories with variance above this AND low periodicity are classified as chaotic. Set based on expected LC amplitude range. For normalized features, typical LC variance is ~0.5-2.0. Default: 5.0.

5.0
chaos_sub_classifier Any

Custom sub-classifier for chaotic attractors. Input: mean values per dimension. Default: HDBSCAN with auto_tune=True.

None
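
The interplay of drift_threshold and drift_fraction can be sketched as below: a state dimension counts as drifting when a large enough share of trajectories has a slope magnitude above the threshold. The input layout (one slope per trajectory and state) and the helper name find_drifting_dimensions are assumptions for illustration.

```python
import numpy as np


def find_drifting_dimensions(slopes, drift_threshold=0.1, drift_fraction=0.3):
    """Hypothetical helper: flag state dimensions where enough trajectories drift."""
    slopes = np.asarray(slopes, dtype=float)  # shape (n_trajectories, n_states)
    # Share of trajectories whose |slope| exceeds the threshold, per dimension.
    drifting_share = (np.abs(slopes) > drift_threshold).mean(axis=0)
    return drifting_share >= drift_fraction


# Column 0: no drift anywhere. Column 1: half of the trajectories rotate.
slopes = np.array([[0.0, 2.0], [0.01, 0.0], [0.0, 1.5], [0.05, 0.0]])
drifting = find_drifting_dimensions(slopes)
print(drifting)
```

Dimensions flagged this way would be left out of the variance and mean inputs used for FP and chaos sub-classification.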

needs_feature_names

needs_feature_names() -> bool

This clusterer requires feature names to parse physics-based features.

set_feature_names

set_feature_names(feature_names: list[str]) -> None

Set feature names and build feature indices.

Parameters:

Name Type Description Default
feature_names list[str]

List of feature names matching the feature array columns.

required

fit_predict

fit_predict(X: ndarray, y: Any = None) -> np.ndarray

Predict labels using two-stage hierarchical clustering.

Parameters:

Name Type Description Default
X ndarray

Feature array of shape (n_samples, n_features).

required
y Any

Ignored (present for sklearn API compatibility).

None

Returns:

Type Description
ndarray

Array of predicted labels with format "TYPE_subcluster".

Raises:

Type Description
RuntimeError

If set_feature_names() was not called before prediction.


pybasin.predictors.unboundedness_meta_estimator.UnboundednessMetaEstimator

Bases: DisplayNameMixin, MetaEstimatorMixin, BaseEstimator

Meta-estimator for separately labeling unbounded trajectories.

This meta-estimator wraps another estimator (classifier or clusterer) and handles unbounded trajectories separately. Unbounded trajectories are identified using a detector function and assigned a special label, while bounded trajectories are processed using the wrapped estimator.

The API adapts to the wrapped estimator type (similar to sklearn.pipeline.Pipeline):

  • If estimator is a clusterer: provides fit(), fit_predict(), predict()
  • If estimator is a classifier: provides fit(), predict(), and potentially predict_proba()

This is particularly useful in basin stability calculations where some trajectories may diverge to infinity (e.g., in the Lorenz system).

from pybasin.predictors import UnboundednessMetaEstimator
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import numpy as np

X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
# Add some "unbounded" samples with extreme values
X[0, :] = 1e10
X[1, :] = -1e10
clf = UnboundednessMetaEstimator(KMeans(n_clusters=3, random_state=42))
clf.fit(X)
labels = clf.predict(X)
print(f"Unbounded samples: {np.sum(labels == 'unbounded')}")

Notes:

  • Only bounded samples are used to fit the base estimator
  • The unbounded label is automatically tracked
  • If all samples are unbounded, the estimator will only predict the unbounded label
  • The estimator type validation ensures only classifiers or clusterers are accepted

Parameters:

Name Type Description Default
estimator Any

The base estimator to use for bounded trajectories. Must be a classifier or clusterer implementing fit and predict methods (or fit_predict for clustering).

required
unbounded_detector Callable[[ndarray], ndarray] | None

Function to detect unbounded trajectories. Should take a feature array of shape (n_samples, n_features) and return a boolean array of shape (n_samples,) where True indicates unbounded. If None, uses default_unbounded_detector, which identifies:

  • Trajectories with NaN values (invalid/undefined trajectories)
  • Trajectories with Inf/-Inf values (from JAX solver)
  • Trajectories with values at ±1e10 (from torch feature extractor)

None
unbounded_label int | str

Label to assign to unbounded trajectories.

'unbounded'

Attributes:

Name Type Description
estimator_ estimator object

The fitted base estimator (only fitted on bounded samples).

classes_ ndarray of shape (n_classes,)

The class labels (only for classifiers), including the unbounded label.

labels_ ndarray of shape (n_samples,)

Cluster labels for each sample from the last fit operation (only for clusterers).

n_features_in_ int

Number of features seen during fit.

bounded_mask_ ndarray of shape (n_samples,)

Boolean mask indicating which training samples were bounded.

Functions

fit

fit(
    X: ndarray, y: ndarray | None = None
) -> UnboundednessMetaEstimator

Fit the meta-estimator.

Detects unbounded samples, then fits the base estimator only on bounded samples.

Parameters:

Name Type Description Default
X ndarray

Training data.

required
y ndarray | None

Target values. Only used if the base estimator is a classifier.

None

Returns:

Type Description
UnboundednessMetaEstimator

Fitted estimator.
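
The masking idea behind fitting can be sketched for the clustering case as follows. This is a simplified stand-in rather than the class's actual code: the helper fit_predict_bounded_only and the toy detector rows_with_non_finite_values are hypothetical names introduced for the example.

```python
import numpy as np
from sklearn.cluster import KMeans


def rows_with_non_finite_values(X):
    """Toy detector: True for rows containing NaN or +/-Inf."""
    return ~np.isfinite(X).all(axis=1)


def fit_predict_bounded_only(X, estimator, detector, unbounded_label="unbounded"):
    """Hypothetical helper: cluster bounded rows, tag the rest with a special label."""
    unbounded = detector(X)
    # Object dtype lets integer cluster ids and a string label share one array.
    labels = np.empty(len(X), dtype=object)
    labels[unbounded] = unbounded_label
    if (~unbounded).any():
        labels[~unbounded] = estimator.fit_predict(X[~unbounded])
    return labels


X = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [np.inf, 0.0]]
)
labels = fit_predict_bounded_only(
    X, KMeans(n_clusters=2, n_init=10, random_state=0), rows_with_non_finite_values
)
print(labels)
```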

predict

predict(X: ndarray) -> np.ndarray

Predict labels for samples in X.

Parameters:

Name Type Description Default
X ndarray

Samples to predict.

required

Returns:

Type Description
ndarray

Predicted labels.

fit_predict

fit_predict(
    X: ndarray, y: ndarray | None = None
) -> np.ndarray

Fit the meta-estimator and predict labels (for clusterers).

Parameters:

Name Type Description Default
X ndarray

Training data.

required
y ndarray | None

Target values (ignored for clusterers, required for classifiers).

None

Returns:

Type Description
ndarray

Predicted labels.

__sklearn_tags__

__sklearn_tags__() -> Any

Provide sklearn tags based on the wrapped estimator type.

The meta-estimator adapts its behavior based on the wrapped estimator, similar to Pipeline.


pybasin.predictors.unboundedness_meta_estimator.default_unbounded_detector

default_unbounded_detector(x: ndarray) -> np.ndarray

Default unbounded trajectory detector.

Detects unbounded trajectories based on:

  • NaN values (invalid/undefined trajectories)
  • Inf or -Inf values (from JAX solver)
  • Values at extreme bounds: 1e10 or -1e10 (from torch feature extractor with imputation)

Parameters:

Name Type Description Default
x ndarray

Feature array of shape (n_samples, n_features).

required

Returns:

Type Description
ndarray

Boolean array of shape (n_samples,) where True indicates unbounded.
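
A detector with the documented behavior can be approximated in a few lines of NumPy. The sketch below is an assumption-laden stand-in, not the library function: it flags a sample when any feature is non-finite or has magnitude of at least 1e10, and the name detect_unbounded and the bound parameter are hypothetical.

```python
import numpy as np


def detect_unbounded(x, bound=1e10):
    """Hypothetical stand-in: True for rows with NaN, +/-Inf, or |value| >= bound."""
    x = np.asarray(x, dtype=float)
    non_finite = ~np.isfinite(x)    # covers NaN, Inf and -Inf
    at_bound = np.abs(x) >= bound   # assumption: ">=" rather than exact equality
    return (non_finite | at_bound).any(axis=1)


x = np.array(
    [[0.0, 1.0], [np.inf, 0.0], [0.0, np.nan], [1e10, 0.0], [0.0, -1e10], [5.0, -3.0]]
)
mask = detect_unbounded(x)
print(mask)
```

A function with this signature could be passed as unbounded_detector to UnboundednessMetaEstimator when a different divergence criterion is needed.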