Samplers
Samplers generate initial conditions for basin stability estimation. All samplers return PyTorch tensors (float32) and support GPU acceleration.
Tensor Precision
All samplers use float32 precision for GPU efficiency (5-10x faster than float64). Samples are returned as torch.Tensor on the configured device.
Available Samplers
| Class | Description | Returns Exact N? | Deterministic? |
|---|---|---|---|
UniformRandomSampler |
Uniform random in hypercube | ✓ | With set_seed() |
GridSampler |
Evenly spaced regular grid | ✗ (scales up) | ✓ |
GaussianSampler |
Gaussian around midpoint | ✓ | With set_seed() |
CsvSampler |
Load from CSV file | ✓ | ✓ |
Common Parameters
All samplers share these constructor parameters:
| Parameter | Type | Description |
|---|---|---|
min_limits |
list[float] |
Minimum value for each state dimension |
max_limits |
list[float] |
Maximum value for each state dimension |
device |
str or None |
"cuda", "cpu", or None for auto-detect |
Fixed Dimensions
To fix a state variable to a constant value (e.g., initial velocity = 0), set the same value for both min and max in that dimension. All samplers handle this correctly:
# 3D state space (x, y, z) with z fixed at 0
sampler = UniformRandomSampler(
min_limits=[-10.0, -20.0, 0.0],
max_limits=[10.0, 20.0, 0.0], # z is fixed at 0
)
CsvSampler Exception
CsvSampler does not accept min_limits/max_limits—bounds are computed from the CSV data.
UniformRandomSampler
Generates random samples uniformly distributed within the bounding hypercube.
from pybasin.sampler import UniformRandomSampler
import numpy as np
sampler = UniformRandomSampler(
min_limits=[-np.pi, -2.0],
max_limits=[np.pi, 2.0],
device="cuda", # optional, auto-detects GPU
)
# Generate 10,000 samples (returns exactly 10,000)
samples = sampler.sample(n=10000)
For reproducible results, call set_seed() before sampling. See Reproducibility below.
GridSampler
Generates evenly spaced samples in a regular grid pattern. Ideal for 2D visualizations and deterministic sampling.
from pybasin.sampler import GridSampler
import numpy as np
sampler = GridSampler(
min_limits=[-np.pi, -2.0],
max_limits=[np.pi, 2.0],
)
samples = sampler.sample(n=10000) # Returns 10,000 samples (100×100 grid)
Sample Count Scaling
The grid sampler rounds up the requested sample count to form a complete grid. For a d-dimensional space, it computes:
The actual number of samples returned is \(n_{\text{per dim}}^d\).
2D Examples:
| Requested N | Points per Dimension | Actual Samples |
|---|---|---|
| 50 | ⌈50^0.5⌉ = 8 | 8² = 64 |
| 100 | ⌈100^0.5⌉ = 10 | 10² = 100 |
| 1,000 | ⌈1000^0.5⌉ = 32 | 32² = 1,024 |
| 20,000 | ⌈20000^0.5⌉ = 142 | 142² = 20,164 |
Fixed Dimensions and Sample Count
When using fixed dimensions (see Fixed Dimensions), only the varying dimensions contribute to the grid calculation.
Given \(d\) varying dimensions and requested \(n\) samples, the points per varying dimension is \(\lceil n^{1/d} \rceil\). Fixed dimensions always contribute exactly 1 point, so the total number of samples is:
Example: 3D space with 1 fixed dimension and \(n = 20000\):
- \(d = 2\) varying dimensions → \(20000^{1/2} = 141.42...\), so \(\lceil 141.42 \rceil = 142\) points per dimension
- Total: \(142 \times 142 \times 1 = 20164\) samples
GaussianSampler
Generates samples from a Gaussian distribution centered at the midpoint of each dimension. Samples are clamped to stay within bounds.
from pybasin.sampler import GaussianSampler
sampler = GaussianSampler(
min_limits=[-np.pi, -2.0],
max_limits=[np.pi, 2.0],
std_factor=0.2, # σ = 20% of the range (default)
)
samples = sampler.sample(n=10000)
The distribution parameters are computed as:
CsvSampler
Loads pre-defined samples from a CSV file. Essential for reproducing results from MATLAB or other reference implementations.
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
csv_path |
str or Path |
Required | Path to the CSV file containing samples |
coordinate_columns |
list[str] |
Required | Column names to use as state coordinates |
label_column |
str or None |
None |
Column name for ground truth labels |
device |
str or None |
None |
"cuda", "cpu", or None for auto-detect |
from pybasin.sampler import CsvSampler
sampler = CsvSampler(
csv_path="data/initial_conditions.csv",
coordinate_columns=["x1", "x2"], # Column names for state variables
label_column="attractor", # Optional: ground truth labels
device="cuda", # Optional: auto-detects GPU if None
)
# Get all samples from the file
samples = sampler.sample()
# Or get the first n samples
samples = sampler.sample(n=1000)
# Access ground truth labels (if provided)
labels = sampler.labels # numpy array or None
Bounds Auto-Detection
Unlike other samplers, CsvSampler does not require min_limits and max_limits. These are automatically computed from the data in the CSV file.
Exceptions
| Exception | Condition |
|---|---|
FileNotFoundError |
CSV file does not exist at the specified path |
ValueError |
Coordinate columns not found in CSV |
ValueError |
Label column not found in CSV (when specified) |
ValueError |
Requested n samples exceeds available data |
Properties
| Property | Type | Description |
|---|---|---|
labels |
np.ndarray or None |
Ground truth labels from CSV |
n_samples |
int |
Total number of samples in the file |
Reproducibility
Because UniformRandomSampler and GaussianSampler draw from the global PyTorch random state, calling sampler.sample() twice gives different results by default. Fix this by calling set_seed() once at the top of your script before any sampling or estimation:
from pybasin import set_seed
from pybasin.sampler import UniformRandomSampler
set_seed(42)
sampler = UniformRandomSampler(min_limits=[-1.0, -1.0], max_limits=[1.0, 1.0])
samples = sampler.sample(n=10000) # always the same
set_seed() seeds PyTorch (CPU and CUDA), NumPy, and Python's random module in one call. This covers every stochastic step in the pipeline -- sampling, feature extraction, and HDBSCAN clustering.
GridSampler and CsvSampler are always deterministic and do not require a seed.
Creating Custom Samplers
Inherit from Sampler and implement the sample method:
from pybasin.sampler import Sampler
import torch
class LatinHypercubeSampler(Sampler):
"""Latin Hypercube sampling for better space coverage."""
display_name: str = "Latin Hypercube Sampler"
def __init__(
self,
min_limits: list[float],
max_limits: list[float],
device: str | None = None,
):
super().__init__(min_limits, max_limits, device)
def sample(self, n: int) -> torch.Tensor:
# Your implementation here
# Must return tensor of shape (n, self.state_dim)
...
Base Class Attributes
After calling super().__init__(), these attributes are available:
| Attribute | Type | Description |
|---|---|---|
device |
torch.device |
Target device (cuda:0 or cpu) |
min_limits |
torch.Tensor |
Minimum bounds as float32 tensor on device |
max_limits |
torch.Tensor |
Maximum bounds as float32 tensor on device |
state_dim |
int |
Number of state dimensions (length of limits) |
Requirements
- Call
super().__init__()withmin_limits,max_limits, anddevice - Return a
torch.Tensorof shape(n, self.state_dim) - Use
self.devicewhen creating tensors to ensure GPU compatibility - Use
float32dtype for consistency with the base class