Predictors
pybasin.predictors.hdbscan_clusterer.HDBSCANClusterer
Bases: DisplayNameMixin, BaseEstimator, ClusterMixin
HDBSCAN clustering for basin stability analysis with optional auto-tuning and noise assignment (unsupervised learning).
Functions
__init__
__init__(
    hdbscan: Any = None,
    assign_noise: bool = False,
    nearest_neighbors: NearestNeighbors | None = None,
    auto_tune: bool = False,
)
Initialize HDBSCAN clusterer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hdbscan | Any | A configured | None |
| assign_noise | bool | Whether to assign noise points to nearest clusters using KNN. | False |
| nearest_neighbors | NearestNeighbors \| None | A configured | None |
| auto_tune | bool | Whether to automatically tune | False |
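The `assign_noise` option can be illustrated with a minimal sketch, assuming noise points carry the label `-1` and are relabeled by a majority vote among their `k` nearest clustered neighbors. The helper `assign_noise_to_nearest` and its `k` parameter are hypothetical names used for illustration; this is not pybasin's implementation.

```python
import numpy as np


def assign_noise_to_nearest(X, labels, k=5):
    """Relabel noise points (-1) by majority vote of their k nearest clustered points."""
    labels = np.asarray(labels).copy()
    noise = labels == -1
    # Nothing to do if there is no noise, or no clustered point to vote with.
    if not noise.any() or noise.all():
        return labels

    X_clustered, y_clustered = X[~noise], labels[~noise]
    k = min(k, len(X_clustered))
    for i in np.flatnonzero(noise):
        # Euclidean distance from this noise point to every clustered point.
        dist = np.linalg.norm(X_clustered - X[i], axis=1)
        nearest = y_clustered[np.argsort(dist)[:k]]
        values, counts = np.unique(nearest, return_counts=True)
        labels[i] = values[np.argmax(counts)]
    return labels


X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # cluster 0
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # cluster 1
    [0.3, 0.3], [4.7, 4.8],               # noise
])
labels = np.array([0, 0, 0, 1, 1, 1, -1, -1])
filled = assign_noise_to_nearest(X, labels, k=3)
```

After the call, no `-1` labels remain: each former noise point carries the label of the cluster it sits closest to.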
fit_predict
Fit and predict labels using HDBSCAN clustering with optional noise assignment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Feature array to cluster. | required |
| y | Any | Ignored (present for sklearn API compatibility). | None |

Returns:
| Type | Description |
|---|---|
| ndarray | Cluster labels. |
pybasin.predictors.dbscan_clusterer.DBSCANClusterer
Bases: DisplayNameMixin, BaseEstimator, ClusterMixin
DBSCAN clustering for basin stability analysis with optional epsilon auto-tuning (unsupervised learning).
When auto_tune=True, replicates the epsilon search from the MATLAB
bSTAB classify_solution.m unsupervised branch:
- Precompute the pairwise Euclidean distance matrix.
- Build an epsilon grid from the feature ranges.
- For each candidate epsilon, run DBSCAN and record the minimum per-sample silhouette score (worst-case cluster quality).
- Find the most prominent peak in the silhouette curve above a height threshold.
- Fall back to the global maximum if no peak is found.
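The steps above can be sketched with scikit-learn and SciPy. This is a rough illustration under stated assumptions, not the pybasin or bSTAB code: the function name `tune_eps`, the `min_samples` value, and the linear grid running up to the largest per-feature range are all assumptions, since the exact grid rule is not given here.

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances, silhouette_samples


def tune_eps(X, n_eps_grid=50, min_samples=5, min_peak_height=0.9):
    """Pick a DBSCAN epsilon from the worst-case silhouette curve."""
    # 1. Pairwise Euclidean distances, computed once and reused.
    D = pairwise_distances(X, metric="euclidean")

    # 2. Epsilon grid from the feature ranges (ASSUMPTION: linear grid up to
    #    the largest per-feature range).
    max_range = float(np.ptp(X, axis=0).max())
    eps_grid = np.linspace(max_range / n_eps_grid, max_range, n_eps_grid)

    # 3. Minimum per-sample silhouette for each candidate epsilon.
    scores = np.full(n_eps_grid, -1.0)
    for i, eps in enumerate(eps_grid):
        model = DBSCAN(eps=eps, min_samples=min_samples, metric="precomputed")
        labels = model.fit_predict(D)
        n_labels = len(np.unique(labels))
        if 2 <= n_labels <= len(X) - 1:  # silhouette is undefined otherwise
            scores[i] = silhouette_samples(D, labels, metric="precomputed").min()

    # 4. Most prominent peak above the height threshold, if any.
    peaks, _ = find_peaks(scores, height=min_peak_height)
    if len(peaks) > 0:
        best = int(peaks[np.argmax(peak_prominences(scores, peaks)[0])])
    else:
        # 5. Otherwise fall back to the global maximum.
        best = int(np.argmax(scores))
    return float(eps_grid[best]), scores


rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(60, 2)),
    rng.normal(10.0, 0.3, size=(60, 2)),
])
eps, scores = tune_eps(X)
labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
n_clusters = len(set(labels) - {-1})
```

Scoring each epsilon by the minimum silhouette rather than the mean penalizes any candidate that leaves even one sample poorly placed, which is why the curve reflects worst-case cluster quality.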
Functions
__init__
__init__(
    dbscan: DBSCAN | None = None,
    auto_tune: bool = False,
    n_eps_grid: int = 200,
    tune_sample_size: int = 2000,
    min_peak_height: float = 0.9,
    assign_noise: bool = False,
    nearest_neighbors: NearestNeighbors | None = None,
)
Initialize DBSCAN clusterer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dbscan | DBSCAN \| None | A configured | None |
| auto_tune | bool | Whether to automatically find the optimal epsilon using silhouette-based peak analysis (MATLAB bSTAB algorithm). | False |
| n_eps_grid | int | Number of epsilon candidates to evaluate during auto-tuning. | 200 |
| tune_sample_size | int | Maximum number of samples to use during the epsilon search. If the dataset is larger, a random subsample is drawn to keep the search fast. | 2000 |
| min_peak_height | float | Minimum silhouette peak height for the peak finder during auto-tuning. | 0.9 |
| assign_noise | bool | Whether to assign noise points (-1) to the nearest cluster using KNN. | False |
| nearest_neighbors | NearestNeighbors \| None | A configured | None |
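The role of `tune_sample_size` can be shown with a short sketch, assuming the subsample is drawn uniformly without replacement. The helper name `subsample_for_tuning` and the `seed` argument are hypothetical and only serve the illustration.

```python
import numpy as np


def subsample_for_tuning(X, tune_sample_size=2000, seed=0):
    """Return at most tune_sample_size rows of X, drawn without replacement."""
    n_samples = X.shape[0]
    if n_samples <= tune_sample_size:
        return X  # small datasets are used in full
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_samples, size=tune_sample_size, replace=False)
    return X[idx]


X = np.arange(10_000, dtype=float).reshape(5000, 2)
X_small = subsample_for_tuning(X)       # 5000 rows reduced to 2000
X_same = subsample_for_tuning(X[:100])  # already small, returned unchanged
```

Capping the sample count matters because the pairwise distance matrix used in the search grows quadratically with the number of samples.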
fit_predict
Fit and predict labels using DBSCAN clustering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Feature array of shape | required |
| y | Any | Ignored (present for sklearn API compatibility). | None |

Returns:
| Type | Description |
|---|---|
| ndarray | Cluster labels ( |
pybasin.predictors.dynamical_system_clusterer.DynamicalSystemClusterer
Bases: DisplayNameMixin, BaseEstimator, ClusterMixin
Two-stage hierarchical clustering for dynamical systems.
This clusterer uses physics-based heuristics to classify trajectories into attractor types (Stage 1) and then sub-classifies within each type (Stage 2).
Stage 1: Attractor Type Classification
Fixed Point (FP) Detection: Heuristic: variance < fp_variance_threshold
A trajectory is classified as converging to a fixed point if the variance
of its steady-state values is extremely low. The threshold should be set
based on the expected numerical precision of your integration.
IMPORTANT: If features are normalized/scaled (e.g., StandardScaler), the
variance values will be transformed. For normalized features with unit
variance, use a threshold relative to 1.0 (e.g., 1e-4). For unnormalized
features, use absolute thresholds based on your system's scale.
Limit Cycle (LC) Detection: Heuristic: (periodicity_strength > lc_periodicity_threshold AND variance < chaos_variance_threshold) OR has_drift
A trajectory is classified as a limit cycle if:
1. It shows strong periodic behavior (high autocorrelation periodicity)
AND has bounded variance (not chaotic), OR
2. It shows monotonic drift (rotating solutions like pendulum rotations)
The periodicity_strength comes from autocorrelation analysis and ranges
from 0 (no periodicity) to 1 (perfect periodicity). Values above 0.5
typically indicate clear periodic behavior.
Chaos Detection: Heuristic: NOT FP AND NOT LC (default fallback)
Trajectories that don't meet FP or LC criteria are classified as chaotic.
High variance combined with low periodicity strength indicates chaos.
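The three Stage 1 heuristics can be combined into one vectorized decision. The sketch below assumes a single scalar variance, periodicity strength, and drift flag per trajectory, which simplifies the per-dimension features the clusterer actually works with; the function name `classify_attractor_type` is hypothetical.

```python
import numpy as np


def classify_attractor_type(variance, periodicity_strength, has_drift,
                            fp_variance_threshold=1e-6,
                            lc_periodicity_threshold=0.5,
                            chaos_variance_threshold=5.0):
    """Assign "FP", "LC" or "chaos" to each trajectory (FP has priority)."""
    variance = np.asarray(variance, dtype=float)
    periodicity_strength = np.asarray(periodicity_strength, dtype=float)
    has_drift = np.asarray(has_drift, dtype=bool)

    # Fixed point: steady-state variance is essentially zero.
    is_fp = variance < fp_variance_threshold
    # Limit cycle: periodic with bounded variance, or monotonically drifting.
    is_lc = ((periodicity_strength > lc_periodicity_threshold)
             & (variance < chaos_variance_threshold)) | has_drift

    # Chaos is the fallback; later assignments override earlier ones,
    # so FP wins over LC when both conditions hold.
    types = np.full(variance.shape, "chaos", dtype=object)
    types[is_lc] = "LC"
    types[is_fp] = "FP"
    return types


types = classify_attractor_type(
    variance=[1e-9, 0.8, 12.0, 30.0],
    periodicity_strength=[0.0, 0.9, 0.2, 0.1],
    has_drift=[False, False, False, True],
)
```

The last trajectory has high variance and weak periodicity, yet it is labeled as a limit cycle because of its drift, matching the rotating-solution case described above.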
Stage 2: Sub-classification
Within each attractor type, trajectories are further clustered:
- FP: Clustered by steady-state location (mean values)
- LC: Hierarchically clustered by period number, then amplitude/mean
- Chaos: Clustered by spatial mean location
Required Features
Feature names must follow the convention: state_X__feature_name
Required base features:
- variance: Steady-state variance (FP detection)
- amplitude: Peak-to-peak amplitude (LC sub-classification)
- mean: Steady-state mean (FP/chaos sub-classification)
- linear_trend__attr_slope: Linear drift rate (rotating LC detection)
- autocorrelation_periodicity__output_strength: Periodicity measure [0-1]
- autocorrelation_periodicity__output_period: Detected period
- spectral_frequency_ratio: Ratio for period-n detection
Note: This clusterer requires feature names to be set via set_feature_names() before calling fit_predict(). The BasinStabilityEstimator handles this automatically during the estimation process.
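To make the naming convention concrete, here is a minimal sketch of how names of the form `state_X__feature_name` could be parsed into a column lookup. The helper `build_feature_index` is hypothetical and is not the code behind `set_feature_names()`; it only assumes that the first double underscore separates the state prefix from the feature name.

```python
from collections import defaultdict


def build_feature_index(feature_names):
    """Map each base feature name to {state prefix: column index}."""
    index = defaultdict(dict)
    for column, name in enumerate(feature_names):
        # Split on the first "__" only: feature names may contain "__" themselves.
        state, separator, feature = name.partition("__")
        if not separator or not state.startswith("state_"):
            raise ValueError(f"Unexpected feature name: {name!r}")
        index[feature][state] = column
    return dict(index)


names = [
    "state_0__variance",
    "state_0__mean",
    "state_1__variance",
    "state_1__linear_trend__attr_slope",
]
index = build_feature_index(names)
```

With such a lookup, `index["variance"]` gives the variance column of every state, which is the kind of access the heuristics above need.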
Functions
__init__
__init__(
    drift_threshold: float = 0.1,
    drift_fraction: float = 0.3,
    tiers: list[str] | None = None,
    fp_variance_threshold: float = 1e-06,
    fp_sub_classifier: Any = None,
    lc_periodicity_threshold: float = 0.5,
    lc_sub_classifier: Any = None,
    chaos_variance_threshold: float = 5.0,
    chaos_sub_classifier: Any = None,
)
Initialize the dynamical system clusterer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| drift_threshold | float | Minimum \|slope\| for a dimension to count as drifting. Drifting dimensions (e.g., pendulum angle during rotation) are excluded from variance/mean calculations for FP and chaos sub-classification to avoid spurious splits. Also used to detect rotating limit cycles. Units: [state_units / time_units]. Default: 0.1. | 0.1 |
| drift_fraction | float | Minimum fraction of trajectories with \|slope\| > drift_threshold for a dimension to be flagged as drifting. Default: 0.3 (i.e., 30% of trajectories must show drift). | 0.3 |
| tiers | list[str] \| None | List of attractor types to detect, in priority order. The first matching tier wins. Options: "FP", "LC", "chaos". Default: ["FP", "LC", "chaos"]. | None |
| fp_variance_threshold | float | Maximum variance for a trajectory to be classified as a fixed point. For unnormalized features, set it based on expected steady-state fluctuations (e.g., 1e-6 for well-converged integrations). For normalized features (unit variance), use a relative threshold (e.g., 1e-4, meaning 0.01% of typical variance). Default: 1e-6. | 1e-06 |
| fp_sub_classifier | Any | Custom sub-classifier for fixed points. Input: mean values per non-drifting dimension. Default: HDBSCAN with min_cluster_size=50. | None |
| lc_periodicity_threshold | float | Minimum periodicity strength [0-1] for a trajectory to be classified as a limit cycle. The periodicity strength measures how well the autocorrelation matches periodic behavior (0.0 = no periodic pattern, 0.3-0.5 = weak/noisy, 0.5-0.8 = clear periodic, 0.8-1.0 = strong/clean limit cycle). Default: 0.5. | 0.5 |
| lc_sub_classifier | Any | Custom sub-classifier for limit cycles. Input: [freq_ratio, amplitude, mean] features. Default: Hierarchical period-based clustering. | None |
| chaos_variance_threshold | float | Maximum variance for a limit cycle. Trajectories with variance above this AND low periodicity are classified as chaotic. Set it based on the expected LC amplitude range. For normalized features, typical LC variance is ~0.5-2.0. Default: 5.0. | 5.0 |
| chaos_sub_classifier | Any | Custom sub-classifier for chaotic attractors. Input: mean values per dimension. Default: HDBSCAN with auto_tune=True. | None |
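The interplay of `drift_threshold` and `drift_fraction` can be sketched as follows, assuming an array of fitted slopes with one row per trajectory and one column per state dimension. The helper name `find_drifting_dims` is hypothetical, and treating the fraction as an inclusive lower bound is an assumption based on the word "minimum" in the description.

```python
import numpy as np


def find_drifting_dims(slopes, drift_threshold=0.1, drift_fraction=0.3):
    """Flag dimensions where enough trajectories have |slope| above the threshold."""
    slopes = np.asarray(slopes, dtype=float)
    exceeds = np.abs(slopes) > drift_threshold      # shape (n_samples, n_dims)
    fraction_drifting = exceeds.mean(axis=0)        # share of trajectories per dim
    return fraction_drifting >= drift_fraction


# Rows: trajectories, columns: state dimensions (e.g., velocity, angle).
slopes = np.array([
    [0.00, 1.20],
    [0.01, -0.90],
    [0.02, 0.00],
    [0.00, 0.05],
])
drifting = find_drifting_dims(slopes)
```

Here the second dimension is flagged because half of the trajectories drift in it, while the first dimension never exceeds the threshold.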
needs_feature_names
This clusterer requires feature names to parse physics-based features.
set_feature_names
Set feature names and build feature indices.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| feature_names | list[str] | List of feature names matching the feature array columns. | required |
fit_predict
Predict labels using two-stage hierarchical clustering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Feature array of shape (n_samples, n_features). | required |
| y | Any | Ignored (present for sklearn API compatibility). | None |

Returns:
| Type | Description |
|---|---|
| ndarray | Array of predicted labels with format "TYPE_subcluster". |

Raises:
| Type | Description |
|---|---|
| RuntimeError | If set_feature_names() was not called before prediction. |
pybasin.predictors.unboundedness_meta_estimator.UnboundednessMetaEstimator
Bases: DisplayNameMixin, MetaEstimatorMixin, BaseEstimator
Meta-estimator for separately labeling unbounded trajectories.
This meta-estimator wraps another estimator (classifier or clusterer) and handles unbounded trajectories separately. Unbounded trajectories are identified using a detector function and assigned a special label, while bounded trajectories are processed using the wrapped estimator.
The API adapts to the wrapped estimator type (similar to sklearn.pipeline.Pipeline):
- If estimator is a clusterer: provides fit(), fit_predict(), predict()
- If estimator is a classifier: provides fit(), predict(), and potentially predict_proba()
This is particularly useful in basin stability calculations where some trajectories may diverge to infinity (e.g., in the Lorenz system).
```python
from pybasin.predictors import UnboundednessMetaEstimator
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import numpy as np

X, _ = make_blobs(n_samples=100, n_features=10, centers=3, random_state=42)

# Add some "unbounded" samples with extreme values
X[0, :] = 1e10
X[1, :] = -1e10

clf = UnboundednessMetaEstimator(KMeans(n_clusters=3, random_state=42))
clf.fit(X)
labels = clf.predict(X)
print(f"Unbounded samples: {np.sum(labels == 'unbounded')}")
```
Notes:
- Only bounded samples are used to fit the base estimator
- The unbounded label is automatically tracked
- If all samples are unbounded, the estimator will only predict the unbounded label
- The estimator type validation ensures only classifiers or clusterers are accepted
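The mask-then-delegate pattern behind these notes can be sketched without pybasin. In the illustration below, `label_with_unbounded`, `simple_detector`, and `toy_clusterer` are hypothetical stand-ins: the detector flags rows with non-finite values, and the toy clusterer splits bounded rows by the sign of the first feature.

```python
import numpy as np


def label_with_unbounded(X, detector, cluster_fn, unbounded_label="unbounded"):
    """Label unbounded rows separately and cluster only the bounded rows."""
    X = np.asarray(X, dtype=float)
    unbounded = detector(X)
    # Start with every sample marked unbounded, then overwrite the bounded ones.
    labels = np.full(X.shape[0], unbounded_label, dtype=object)
    if (~unbounded).any():
        labels[~unbounded] = cluster_fn(X[~unbounded])
    return labels


def simple_detector(X):
    # True where any feature in the row is NaN or infinite.
    return ~np.isfinite(X).all(axis=1)


def toy_clusterer(X):
    # Stand-in for a real clusterer: two groups by sign of the first feature.
    return (X[:, 0] > 0).astype(int)


X = np.array([[-1.0, 0.2], [2.0, 0.1], [np.inf, 0.0], [-3.0, 0.4]])
labels = label_with_unbounded(X, simple_detector, toy_clusterer)
```

The wrapped clusterer never sees the diverged row, so extreme values cannot distort the clusters found for the bounded trajectories.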
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| estimator | Any | The base estimator to use for bounded trajectories. Must be a classifier or clusterer implementing | required |
| unbounded_detector | Callable[[ndarray], ndarray] \| None | Function to detect unbounded trajectories. Should take a feature array of shape (n_samples, n_features) and return a boolean array of shape (n_samples,) where True indicates unbounded. If None, uses the default detector, which identifies trajectories with Inf/-Inf values (from JAX solver) and trajectories with values at ±1e10 (from torch feature extractor). | None |
| unbounded_label | int \| str | Label to assign to unbounded trajectories. | 'unbounded' |
Attributes:
| Name | Type | Description |
|---|---|---|
| estimator_ | estimator object | The fitted base estimator (only fitted on bounded samples). |
| classes_ | ndarray of shape (n_classes,) | The class labels (only for classifiers), including the unbounded label. |
| labels_ | ndarray of shape (n_samples,) | Cluster labels for each sample from the last fit operation (only for clusterers). |
| n_features_in_ | int | Number of features seen during fit. |
| bounded_mask_ | ndarray of shape (n_samples,) | Boolean mask indicating which training samples were bounded. |
Functions
fit
Fit the meta-estimator.
Detects unbounded samples, then fits the base estimator only on bounded samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Training data. | required |
| y | ndarray \| None | Target values. Only used if the base estimator is a classifier. | None |

Returns:
| Type | Description |
|---|---|
| UnboundednessMetaEstimator | Fitted estimator. |
predict
Predict labels for samples in X.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Samples to predict. | required |

Returns:
| Type | Description |
|---|---|
| ndarray | Predicted labels. |
fit_predict
Fit the meta-estimator and predict labels (for clusterers).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | ndarray | Training data. | required |
| y | ndarray \| None | Target values (ignored for clusterers, required for classifiers). | None |

Returns:
| Type | Description |
|---|---|
| ndarray | Predicted labels. |
pybasin.predictors.unboundedness_meta_estimator.default_unbounded_detector
Default unbounded trajectory detector.
Detects unbounded trajectories based on:
- NaN values (invalid/undefined trajectories)
- Inf or -Inf values (from JAX solver)
- Values at extreme bounds: 1e10 or -1e10 (from torch feature extractor with imputation)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray | Feature array of shape (n_samples, n_features). | required |

Returns:
| Type | Description |
|---|---|
| ndarray | Boolean array of shape (n_samples,) where True indicates unbounded. |
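A detector with the behavior described above could look roughly like the sketch below. It is not the pybasin source: the name `sketch_unbounded_detector` and the `bound` argument are hypothetical, and treating any magnitude at or beyond 1e10 as unbounded (rather than testing for exact equality) is an assumption.

```python
import numpy as np


def sketch_unbounded_detector(x, bound=1e10):
    """Return True for rows containing NaN, +/-Inf, or values with |value| >= bound."""
    x = np.asarray(x, dtype=float)
    # Silence warnings that some NumPy versions emit when comparing NaN.
    with np.errstate(invalid="ignore"):
        flagged = ~np.isfinite(x) | (np.abs(x) >= bound)
    # A trajectory is unbounded if any of its features is flagged.
    return flagged.any(axis=1)


x = np.array([
    [0.1, 0.2],      # bounded
    [np.nan, 0.0],   # NaN
    [np.inf, 1.0],   # Inf
    [-1e10, 3.0],    # lower extreme bound
    [5.0, 1e10],     # upper extreme bound
])
mask = sketch_unbounded_detector(x)
```

A function with this signature (feature array in, boolean array of shape (n_samples,) out) matches what the `unbounded_detector` parameter of `UnboundednessMetaEstimator` expects.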