Predictors

pybasin.predictors.hdbscan_clusterer.HDBSCANClusterer

Bases: DisplayNameMixin, BaseEstimator, ClusterMixin

HDBSCAN clustering for basin stability analysis with optional auto-tuning and noise assignment (unsupervised learning).

Functions

__init__

__init__(
    hdbscan: Any = None,
    assign_noise: bool = False,
    nearest_neighbors: NearestNeighbors | None = None,
    auto_tune: bool = False,
)

Initialize HDBSCAN clusterer.

Parameters:

Name	Type	Description	Default
`hdbscan`	`Any`	A configured `sklearn.cluster.HDBSCAN` instance, or `None` to create a default one (`min_cluster_size=50`, `min_samples=10`).	`None`
`assign_noise`	`bool`	Whether to assign noise points to nearest clusters using KNN.	`False`
`nearest_neighbors`	`NearestNeighbors \| None`	A configured `sklearn.neighbors.NearestNeighbors` instance for noise assignment, or `None` to create a default one (`n_neighbors=5`). Only used when `assign_noise=True`.	`None`
`auto_tune`	`bool`	Whether to automatically tune `min_cluster_size` using silhouette score. The tuned value overrides the one on the `hdbscan` instance.	`False`

fit_predict

fit_predict(X: ndarray, y: Any = None) -> np.ndarray

Fit and predict labels using HDBSCAN clustering with optional noise assignment.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	Feature array to cluster.	required
`y`	`Any`	Ignored (present for sklearn API compatibility).	`None`

Returns:

Type	Description
`ndarray`	Cluster labels.

pybasin.predictors.dbscan_clusterer.DBSCANClusterer

Bases: DisplayNameMixin, BaseEstimator, ClusterMixin

DBSCAN clustering for basin stability analysis with optional epsilon auto-tuning (unsupervised learning).

When auto_tune=True, replicates the epsilon search from the MATLAB bSTAB classify_solution.m unsupervised branch:

1. Precompute the pairwise Euclidean distance matrix.
2. Build an epsilon grid from the feature ranges.
3. For each candidate epsilon, run DBSCAN and record the minimum per-sample silhouette score (worst-case cluster quality).
4. Find the most prominent peak in the silhouette curve above a height threshold.
5. Fall back to the global maximum if no peak is found.
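The peak selection and fallback at the end of this search can be sketched as follows. This is a minimal illustration in plain NumPy, not pybasin's implementation: the helper name `pick_eps_index` and its simplified prominence calculation are assumptions, and a library routine such as `scipy.signal.find_peaks` may be what is used in practice.

```python
import numpy as np


def pick_eps_index(scores: np.ndarray, min_peak_height: float = 0.9) -> int:
    """Return the index of the most prominent peak at or above min_peak_height.

    Falls back to the global maximum when no interior peak qualifies.
    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    best_idx, best_prom = -1, -np.inf
    for i in range(1, len(scores) - 1):
        # A peak is higher than its left neighbor and at least as high as its right one.
        if not (scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]):
            continue
        if scores[i] < min_peak_height:
            continue
        # Walk outward until a higher sample (or the edge) is reached,
        # tracking the lowest value seen on each side.
        left_min = scores[i]
        j = i - 1
        while j >= 0 and scores[j] <= scores[i]:
            left_min = min(left_min, scores[j])
            j -= 1
        right_min = scores[i]
        j = i + 1
        while j < len(scores) and scores[j] <= scores[i]:
            right_min = min(right_min, scores[j])
            j += 1
        prominence = scores[i] - max(left_min, right_min)
        if prominence > best_prom:
            best_idx, best_prom = i, prominence
    if best_idx == -1:
        # No qualifying peak: use the global maximum of the curve.
        return int(np.argmax(scores))
    return best_idx


curve = np.array([0.1, 0.95, 0.2, 0.92, 0.91, 0.3])
chosen = pick_eps_index(curve)
```

Here `curve` stands in for the per-epsilon minimum silhouette scores; the returned index would then be used to look up the corresponding epsilon in the grid.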

Functions

__init__

__init__(
    dbscan: DBSCAN | None = None,
    auto_tune: bool = False,
    n_eps_grid: int = 200,
    tune_sample_size: int = 2000,
    min_peak_height: float = 0.9,
    assign_noise: bool = False,
    nearest_neighbors: NearestNeighbors | None = None,
)

Initialize DBSCAN clusterer.

Parameters:

Name	Type	Description	Default
`dbscan`	`DBSCAN \| None`	A configured `sklearn.cluster.DBSCAN` instance, or `None` to create a default one (`eps=0.5`, `min_samples=10`). When `auto_tune=True`, the tuned epsilon overrides `dbscan.eps`.	`None`
`auto_tune`	`bool`	Whether to automatically find the optimal epsilon using silhouette-based peak analysis (MATLAB bSTAB algorithm).	`False`
`n_eps_grid`	`int`	Number of epsilon candidates to evaluate during auto-tuning.	`200`
`tune_sample_size`	`int`	Maximum number of samples to use during the epsilon search. If the dataset is larger, a random subsample is drawn to keep the search fast.	`2000`
`min_peak_height`	`float`	Minimum silhouette peak height for the peak finder during auto-tuning.	`0.9`
`assign_noise`	`bool`	Whether to assign noise points (-1) to the nearest cluster using KNN.	`False`
`nearest_neighbors`	`NearestNeighbors \| None`	A configured `sklearn.neighbors.NearestNeighbors` instance for noise assignment, or `None` to create a default one (`n_neighbors=5`). Only used when `assign_noise=True`.	`None`

fit_predict

fit_predict(X: ndarray, y: Any = None) -> np.ndarray

Fit and predict labels using DBSCAN clustering.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	Feature array of shape `(n_samples, n_features)`.	required
`y`	`Any`	Ignored (present for sklearn API compatibility).	`None`

Returns:

Type	Description
`ndarray`	Cluster labels (`-1` for noise unless `assign_noise=True`).
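Noise reassignment with `assign_noise=True` can be pictured as a nearest-neighbor vote. The sketch below uses plain NumPy and a majority vote among the `k` closest clustered points. The voting rule and the helper name `assign_noise_knn` are assumptions made for illustration, not a description of pybasin's internals.

```python
import numpy as np


def assign_noise_knn(X: np.ndarray, labels: np.ndarray, k: int = 5) -> np.ndarray:
    """Relabel noise points (-1) by majority vote among their k nearest clustered points."""
    labels = labels.copy()
    noise = labels == -1
    if not noise.any() or noise.all():
        return labels  # nothing to reassign, or no clusters to assign to
    X_clustered, y_clustered = X[~noise], labels[~noise]
    k = min(k, len(X_clustered))
    for i in np.flatnonzero(noise):
        # Euclidean distance from the noise point to every clustered point.
        dists = np.linalg.norm(X_clustered - X[i], axis=1)
        nearest = np.argsort(dists)[:k]
        values, counts = np.unique(y_clustered[nearest], return_counts=True)
        labels[i] = values[np.argmax(counts)]
    return labels


X = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [4.8, 5.2], [0.3, 0.3]]
)
labels = np.array([0, 0, 0, 1, 1, 1, -1])
relabeled = assign_noise_knn(X, labels, k=3)
```

The last point starts as noise and ends up in cluster `0`, since its three nearest clustered neighbors all belong to that cluster.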

pybasin.predictors.dynamical_system_clusterer.DynamicalSystemClusterer

Bases: DisplayNameMixin, BaseEstimator, ClusterMixin

Two-stage hierarchical clustering for dynamical systems.

This clusterer uses physics-based heuristics to classify trajectories into attractor types (Stage 1) and then sub-classifies within each type (Stage 2).

Stage 1: Attractor Type Classification

Fixed Point (FP) Detection: Heuristic: variance < fp_variance_threshold

A trajectory is classified as converging to a fixed point if the variance
of its steady-state values is extremely low. The threshold should be set
based on the expected numerical precision of your integration.

IMPORTANT: If features are normalized/scaled (e.g., StandardScaler), the
variance values will be transformed. For normalized features with unit
variance, use a threshold relative to 1.0 (e.g., 1e-4). For unnormalized
features, use absolute thresholds based on your system's scale.

Limit Cycle (LC) Detection: Heuristic: (periodicity_strength > lc_periodicity_threshold AND variance < chaos_variance_threshold) OR has_drift

A trajectory is classified as a limit cycle if:
1. It shows strong periodic behavior (high autocorrelation periodicity)
   AND has bounded variance (not chaotic), OR
2. It shows monotonic drift (rotating solutions like pendulum rotations)

The periodicity_strength comes from autocorrelation analysis and ranges
from 0 (no periodicity) to 1 (perfect periodicity). Values above 0.5
typically indicate clear periodic behavior.

Chaos Detection: Heuristic: NOT FP AND NOT LC (default fallback)

Trajectories that don't meet FP or LC criteria are classified as chaotic.
High variance combined with low periodicity strength indicates chaos.
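The Stage 1 heuristics can be condensed into a small decision function. This is a sketch under simplifying assumptions: it treats variance, periodicity strength, and drift as single scalars per trajectory (the real features are per state dimension), and `classify_attractor` is a hypothetical name rather than part of pybasin.

```python
def classify_attractor(
    variance: float,
    periodicity_strength: float,
    has_drift: bool,
    fp_variance_threshold: float = 1e-6,
    lc_periodicity_threshold: float = 0.5,
    chaos_variance_threshold: float = 5.0,
) -> str:
    """Apply the FP -> LC -> chaos heuristics to one trajectory's summary features."""
    # Fixed point: steady-state variance is essentially zero.
    if variance < fp_variance_threshold:
        return "FP"
    # Limit cycle: clearly periodic with bounded variance, or a drifting (rotating) solution.
    is_periodic = (
        periodicity_strength > lc_periodicity_threshold
        and variance < chaos_variance_threshold
    )
    if is_periodic or has_drift:
        return "LC"
    # Everything else falls through to chaos.
    return "chaos"


examples = [
    classify_attractor(1e-9, 0.0, False),
    classify_attractor(0.8, 0.9, False),
    classify_attractor(0.8, 0.1, True),
    classify_attractor(12.0, 0.9, False),
]
```

The order of the checks mirrors the default tier priority, so the first matching type wins.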

Stage 2: Sub-classification

Within each attractor type, trajectories are further clustered:

- FP: Clustered by steady-state location (mean values)
- LC: Hierarchically clustered by period number, then amplitude/mean
- Chaos: Clustered by spatial mean location

Required Features

Feature names must follow the convention: state_X__feature_name

Required base features:

- variance: Steady-state variance (FP detection)
- amplitude: Peak-to-peak amplitude (LC sub-classification)
- mean: Steady-state mean (FP/chaos sub-classification)
- linear_trend__attr_slope: Linear drift rate (rotating LC detection)
- autocorrelation_periodicity__output_strength: Periodicity measure [0-1]
- autocorrelation_periodicity__output_period: Detected period
- spectral_frequency_ratio: Ratio for period-n detection
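To make the naming convention concrete, the sketch below splits names of the form state_X__feature_name at the first double underscore and builds a column lookup. The helper `parse_feature_name` and the `index` layout are hypothetical; how pybasin builds its feature indices internally is not shown here.

```python
def parse_feature_name(name: str) -> tuple[str, str]:
    """Split 'state_X__feature_name' into ('state_X', 'feature_name')."""
    # Only the first '__' separates state from feature; feature names may contain '__' too.
    state, sep, feature = name.partition("__")
    if not sep or not state.startswith("state_") or not feature:
        raise ValueError(f"unexpected feature name: {name!r}")
    return state, feature


names = [
    "state_0__variance",
    "state_1__linear_trend__attr_slope",
    "state_1__autocorrelation_periodicity__output_strength",
]

# Map each base feature to the column index it occupies for each state.
index: dict[str, dict[str, int]] = {}
for column, name in enumerate(names):
    state, feature = parse_feature_name(name)
    index.setdefault(feature, {})[state] = column
```

With such a lookup, the variance column of `state_0` would be `index["variance"]["state_0"]`.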

Note: This clusterer requires feature names to be set via set_feature_names() before calling fit_predict(). The BasinStabilityEstimator handles this automatically during the estimation process.

Functions

__init__

__init__(
    drift_threshold: float = 0.1,
    drift_fraction: float = 0.3,
    tiers: list[str] | None = None,
    fp_variance_threshold: float = 1e-06,
    fp_sub_classifier: Any = None,
    lc_periodicity_threshold: float = 0.5,
    lc_sub_classifier: Any = None,
    chaos_variance_threshold: float = 5.0,
    chaos_sub_classifier: Any = None,
)

Initialize the dynamical system clusterer.

Parameters:

Name	Type	Description	Default
`drift_threshold`	`float`	Minimum \|slope\| to consider a dimension as drifting. Drifting dimensions (e.g., pendulum angle during rotation) are excluded from variance/mean calculations for FP and chaos sub-classification to avoid spurious splits. Also used to detect rotating limit cycles. Units: [state_units / time_units]. Default: 0.1.	`0.1`
`drift_fraction`	`float`	Minimum fraction of trajectories with \|slope\| > drift_threshold for a dimension to be flagged as drifting. Default: 0.3 (i.e., 30% of trajectories must show drift).	`0.3`
`tiers`	`list[str] \| None`	List of attractor types to detect, in priority order. First matching tier wins. Options: "FP", "LC", "chaos". Default: ["FP", "LC", "chaos"].	`None`
`fp_variance_threshold`	`float`	Maximum variance for a trajectory to be classified as a fixed point. For unnormalized features, set it based on expected steady-state fluctuations (e.g., 1e-6 for well-converged integrations). For normalized features (unit variance), use a relative threshold (e.g., 1e-4, meaning 0.01% of typical variance). Default: 1e-6.	`1e-06`
`fp_sub_classifier`	`Any`	Custom sub-classifier for fixed points. Input: mean values per non-drifting dimension. Default: HDBSCAN with min_cluster_size=50.	`None`
`lc_periodicity_threshold`	`float`	Minimum periodicity strength [0-1] to classify as limit cycle. The periodicity strength measures how well the autocorrelation matches periodic behavior (0.0 = no periodic pattern, 0.3-0.5 = weak/noisy, 0.5-0.8 = clear periodic, 0.8-1.0 = strong/clean limit cycle). Default: 0.5.	`0.5`
`lc_sub_classifier`	`Any`	Custom sub-classifier for limit cycles. Input: [freq_ratio, amplitude, mean] features. Default: Hierarchical period-based clustering.	`None`
`chaos_variance_threshold`	`float`	Maximum variance for a trajectory to be classified as a limit cycle. Trajectories with variance above this AND low periodicity are classified as chaotic. Set based on expected LC amplitude range. For normalized features, typical LC variance is ~0.5-2.0. Default: 5.0.	`5.0`
`chaos_sub_classifier`	`Any`	Custom sub-classifier for chaotic attractors. Input: mean values per dimension. Default: HDBSCAN with auto_tune=True.	`None`
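The interaction between `drift_threshold` and `drift_fraction` can be sketched as a per-dimension vote over trajectories. This is an illustrative reading of the parameter descriptions, with a hypothetical helper name `drifting_dimensions`; the exact comparison operators used by pybasin are an assumption.

```python
import numpy as np


def drifting_dimensions(
    slopes: np.ndarray, drift_threshold: float = 0.1, drift_fraction: float = 0.3
) -> np.ndarray:
    """Flag state dimensions where enough trajectories have |slope| > drift_threshold.

    slopes has shape (n_trajectories, n_dimensions).
    """
    exceeds = np.abs(slopes) > drift_threshold
    # Fraction of trajectories exceeding the threshold, per dimension.
    return exceeds.mean(axis=0) >= drift_fraction


slopes = np.array(
    [
        [0.00, 2.0],
        [0.01, -1.5],
        [0.02, 0.0],
        [0.00, 0.0],
    ]
)
flags = drifting_dimensions(slopes)
```

In this toy input the second dimension drifts in half of the trajectories, so it is flagged, while the first dimension never exceeds the threshold.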

needs_feature_names

needs_feature_names() -> bool

This clusterer requires feature names to parse physics-based features.

set_feature_names

set_feature_names(feature_names: list[str]) -> None

Set feature names and build feature indices.

Parameters:

Name	Type	Description	Default
`feature_names`	`list[str]`	List of feature names matching the feature array columns.	required

fit_predict

fit_predict(X: ndarray, y: Any = None) -> np.ndarray

Predict labels using two-stage hierarchical clustering.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	Feature array of shape (n_samples, n_features).	required
`y`	`Any`	Ignored (present for sklearn API compatibility).	`None`

Returns:

Type	Description
`ndarray`	Array of predicted labels with format "TYPE_subcluster".

Raises:

Type	Description
`RuntimeError`	If set_feature_names() was not called before prediction.

pybasin.predictors.unboundedness_meta_estimator.UnboundednessMetaEstimator

Bases: DisplayNameMixin, MetaEstimatorMixin, BaseEstimator

Meta-estimator for separately labeling unbounded trajectories.

This meta-estimator wraps another estimator (classifier or clusterer) and handles unbounded trajectories separately. Unbounded trajectories are identified using a detector function and assigned a special label, while bounded trajectories are processed using the wrapped estimator.

The API adapts to the wrapped estimator type (similar to sklearn.pipeline.Pipeline):

- If estimator is a clusterer: provides fit(), fit_predict(), predict()
- If estimator is a classifier: provides fit(), predict(), and potentially predict_proba()

This is particularly useful in basin stability calculations where some trajectories may diverge to infinity (e.g., in the Lorenz system).

from pybasin.predictors import UnboundednessMetaEstimator
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import numpy as np

X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)
# Add some "unbounded" samples with extreme values
X[0, :] = 1e10
X[1, :] = -1e10
clf = UnboundednessMetaEstimator(KMeans(n_clusters=3, random_state=42))
clf.fit(X)
labels = clf.predict(X)
print(f"Unbounded samples: {np.sum(labels == 'unbounded')}")

Notes:

- Only bounded samples are used to fit the base estimator.
- The unbounded label is automatically tracked.
- If all samples are unbounded, the estimator will only predict the unbounded label.
- The estimator type validation ensures only classifiers or clusterers are accepted.
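The masking pattern behind these notes can be sketched without the class itself. Everything in the block below is a stand-in: `label_with_unbounded`, `sign_clusterer`, and `extreme_value_detector` are hypothetical helpers written for illustration, and the real meta-estimator additionally handles fitting state, classifiers, and tag validation.

```python
import numpy as np


def label_with_unbounded(X, cluster_fn, detector, unbounded_label="unbounded"):
    """Cluster only bounded rows and give unbounded rows a dedicated label."""
    unbounded = detector(X)
    labels = np.empty(len(X), dtype=object)  # object dtype allows mixed int/str labels
    labels[unbounded] = unbounded_label
    if (~unbounded).any():
        # The wrapped clusterer never sees the unbounded rows.
        labels[~unbounded] = cluster_fn(X[~unbounded])
    return labels


def sign_clusterer(X_bounded):
    # Stand-in for a real clusterer: split on the sign of the first feature.
    return (X_bounded[:, 0] > 0).astype(int)


def extreme_value_detector(X):
    # Stand-in detector: any value at or beyond +/-1e10 marks the row as unbounded.
    return (np.abs(X) >= 1e10).any(axis=1)


X = np.array([[1.0, 2.0], [-1.0, 0.5], [1e10, 1e10], [3.0, -2.0]])
labels = label_with_unbounded(X, sign_clusterer, extreme_value_detector)
```

If every row is unbounded, the clusterer is skipped entirely and all rows receive the unbounded label, which matches the third note above.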

Parameters:

Name	Type	Description	Default
`estimator`	`Any`	The base estimator to use for bounded trajectories. Must be a classifier or clusterer implementing `fit` and `predict` methods (or `fit_predict` for clustering).	required
`unbounded_detector`	`Callable[[ndarray], ndarray] \| None`	Function to detect unbounded trajectories. Should take a feature array of shape (n_samples, n_features) and return a boolean array of shape (n_samples,) where True indicates unbounded. If None, uses the default detector, which flags trajectories with Inf/-Inf values (from the JAX solver) and trajectories with values at ±1e10 (from the torch feature extractor).	`None`
`unbounded_label`	`int \| str`	Label to assign to unbounded trajectories.	`'unbounded'`

Attributes:

Name	Type	Description
`estimator_`	`estimator object`	The fitted base estimator (only fitted on bounded samples).
`classes_`	`ndarray of shape (n_classes,)`	The class labels (only for classifiers), including the unbounded label.
`labels_`	`ndarray of shape (n_samples,)`	Cluster labels for each sample from the last fit operation (only for clusterers).
`n_features_in_`	`int`	Number of features seen during fit.
`bounded_mask_`	`ndarray of shape (n_samples,)`	Boolean mask indicating which training samples were bounded.

Functions

fit

fit(
    X: ndarray, y: ndarray | None = None
) -> UnboundednessMetaEstimator

Fit the meta-estimator.

Detects unbounded samples, then fits the base estimator only on bounded samples.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	Training data.	required
`y`	`ndarray \| None`	Target values. Only used if the base estimator is a classifier.	`None`

Returns:

Type	Description
`UnboundednessMetaEstimator`	Fitted estimator.

predict

predict(X: ndarray) -> np.ndarray

Predict labels for samples in X.

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	Samples to predict.	required

Returns:

Type	Description
`ndarray`	Predicted labels.

fit_predict

fit_predict(
    X: ndarray, y: ndarray | None = None
) -> np.ndarray

Fit the meta-estimator and predict labels (for clusterers).

Parameters:

Name	Type	Description	Default
`X`	`ndarray`	Training data.	required
`y`	`ndarray \| None`	Target values (ignored for clusterers, required for classifiers).	`None`

Returns:

Type	Description
`ndarray`	Predicted labels.

__sklearn_tags__

__sklearn_tags__() -> Any

Provide sklearn tags based on the wrapped estimator type.

The meta-estimator adapts its behavior based on the wrapped estimator, similar to Pipeline.

pybasin.predictors.unboundedness_meta_estimator.default_unbounded_detector

default_unbounded_detector(x: ndarray) -> np.ndarray

Default unbounded trajectory detector.

Detects unbounded trajectories based on:

- NaN values (invalid/undefined trajectories)
- Inf or -Inf values (from JAX solver)
- Values at extreme bounds: 1e10 or -1e10 (from torch feature extractor with imputation)

Parameters:

Name	Type	Description	Default
`x`	`ndarray`	Feature array of shape (n_samples, n_features).	required

Returns:

Type	Description
`ndarray`	Boolean array of shape (n_samples,) where True indicates unbounded.
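A detector with the behavior described above could look roughly like the following NumPy sketch. The name `detect_unbounded`, the `bound` parameter, and the use of exact equality against ±1e10 are assumptions; the actual function may compare differently.

```python
import numpy as np


def detect_unbounded(x: np.ndarray, bound: float = 1e10) -> np.ndarray:
    """Return True for rows containing NaN, +/-Inf, or values at +/-bound."""
    x = np.asarray(x, dtype=float)
    non_finite = ~np.isfinite(x)  # covers NaN, Inf, and -Inf
    at_bound = np.abs(x) == bound  # imputed extreme values
    return (non_finite | at_bound).any(axis=1)


x = np.array(
    [
        [0.1, 0.2],
        [np.nan, 1.0],
        [np.inf, 0.0],
        [-1e10, 3.0],
    ]
)
mask = detect_unbounded(x)
```

A function with this signature (feature array in, boolean row mask out) is also the shape expected for a custom `unbounded_detector`.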