Sweeps & Suite API Reference
Run a sweep over corruptor severity levels with independent seeds.
Constructs a fresh corruptor instance for each severity level and runs
a BenchPipeline with an independent child seed derived via
an internal reproducible seeding strategy.
The global NumPy RNG state is never touched.
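The reference does not say which seeding strategy this sweep uses internally. As a minimal sketch, assuming it resembles the SeedSequence.spawn scheme described for the grid sweep, child seeds could be derived as shown here. The helper `child_seeds` and the severity names are hypothetical; the point is only that every seed depends solely on the master seed and that the global NumPy RNG is left alone.

```python
import numpy as np


def child_seeds(random_state: int, n_children: int) -> list[int]:
    """Derive n_children independent integer seeds from one master seed.

    Hypothetical helper: the library's real seeding strategy is internal.
    """
    root = np.random.SeedSequence(random_state)
    return [int(child.generate_state(1)[0]) for child in root.spawn(n_children)]


severities = ["low", "medium", "high"]  # illustrative severity names

# Snapshot the global (legacy) NumPy RNG state before deriving seeds.
before = np.random.get_state()
before_key = before[1].copy()

seeds = dict(zip(severities, child_seeds(0, len(severities))))

# Compare the global state afterwards: SeedSequence never touches it.
after = np.random.get_state()
global_rng_untouched = before[2] == after[2] and np.array_equal(before_key, after[1])
```

Because the child seeds are a pure function of `random_state` and the number of levels, repeating the call with the same arguments yields the same seeds.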
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dgp | | A pre-constructed DGP instance (e.g. …) | required |
| corruptor_cls | | Corruptor class (not instance). Instantiated fresh per severity level as … | required |
| severities | list[str] | Ordered list of severity strings, e.g. … | required |
| n_samples | int | Number of samples passed to … | 500 |
| n_features | int | Number of features passed to … | 10 |
| random_state | int | Master integer seed. Child seeds are derived via an internal reproducible seeding strategy, so results are bit-identical across calls with the same arguments. | 0 |
| **corruptor_kwargs | | Extra keyword arguments forwarded to … | {} |
Returns:
| Type | Description |
|---|---|
| list[BenchResult] | One … |
Run a sweep over DGP complexity levels with independent seeds.
Constructs a fresh DGP instance for each complexity level (avoiding any state mutation across iterations) and derives independent child seeds via an internal reproducible seeding strategy.
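The fresh-instance-per-level pattern can be sketched as follows. Everything here is hypothetical: `ToyDGP`, its `generate` method, the `complexity=` constructor argument, and `sweep_complexities` are illustrative stand-ins, and the sketch returns a plain dict rather than BenchResult objects. SeedSequence.spawn is assumed for the child seeds.

```python
import numpy as np


class ToyDGP:
    """Hypothetical stand-in for a DGP class; not part of the documented library."""

    def __init__(self, complexity: str, noise: float = 0.1):
        self.complexity = complexity
        self.noise = noise
        self.calls = 0  # mutable state that would leak if one instance were reused

    def generate(self, n_samples: int, n_features: int, seed: int):
        self.calls += 1
        rng = np.random.default_rng(seed)
        X = rng.normal(size=(n_samples, n_features))
        y = X[:, 0] + self.noise * rng.normal(size=n_samples)
        return X, y


def sweep_complexities(dgp_cls, complexities, n_samples=500, n_features=10,
                       random_state=0, **dgp_kwargs):
    """Build one fresh DGP per complexity level, each with its own child seed."""
    children = np.random.SeedSequence(random_state).spawn(len(complexities))
    out = {}
    for level, child in zip(complexities, children):
        dgp = dgp_cls(complexity=level, **dgp_kwargs)  # fresh instance per level
        seed = int(child.generate_state(1)[0])
        out[level] = (dgp, dgp.generate(n_samples, n_features, seed))
    return out


runs = sweep_complexities(ToyDGP, ["low", "high"], n_samples=50, n_features=3, noise=0.5)
```

Since no instance is shared between levels, state accumulated while generating one level (here the `calls` counter) cannot influence another.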
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dgp_cls | | DGP class (not instance). Instantiated per complexity level as … | required |
| complexities | list[str] | Ordered list of complexity strings, e.g. … | required |
| corruptors | list \| None | Optional list of pre-constructed corruptor instances forwarded to each … | None |
| label_corruptors | list \| None | Optional list of pre-constructed label corruptor instances. Defaults to … | None |
| n_samples | int | Number of samples passed to … | 500 |
| n_features | int | Number of features passed to … | 10 |
| random_state | int | Master integer seed. Child seeds are derived via an internal reproducible seeding strategy. | 0 |
| **dgp_kwargs | | Extra keyword arguments forwarded to … | {} |
Returns:
| Type | Description |
|---|---|
| list[BenchResult] | One … |
Run a full factorial grid over (n_samples, complexity, severity).
Seeds are derived via a nested three-level SeedSequence.spawn hierarchy so that every cell receives a statistically independent seed. For the cell at index (i, j, k) of (n_samples_list, complexities, severities):

    root = SeedSequence(random_state)
    n_branch = root.spawn(len(n_samples_list))[i]
    c_branch = n_branch.spawn(len(complexities))[j]
    s_branch = c_branch.spawn(len(severities))[k]
    seed = int(s_branch.generate_state(1)[0])

As a result, adjacent cells (e.g. (200, "low", "low") vs (200, "low", "medium")) receive different seeds, and therefore different data, even though they share all other parameters.
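The hierarchy above can be written as a runnable loop over the whole grid. This is a sketch rather than the library's code: `grid_seeds` is a hypothetical helper name, and only `numpy.random.SeedSequence`, its `spawn` method, and `generate_state` are real NumPy API.

```python
import numpy as np


def grid_seeds(n_samples_list, complexities, severities, random_state=0):
    """Return {(n_samples, complexity, severity): seed} via nested spawning.

    Hypothetical helper that mirrors the three-level hierarchy described above.
    """
    root = np.random.SeedSequence(random_state)
    seeds = {}
    n_branches = root.spawn(len(n_samples_list))
    for n, n_branch in zip(n_samples_list, n_branches):
        c_branches = n_branch.spawn(len(complexities))
        for c, c_branch in zip(complexities, c_branches):
            s_branches = c_branch.spawn(len(severities))
            for s, s_branch in zip(severities, s_branches):
                seeds[(n, c, s)] = int(s_branch.generate_state(1)[0])
    return seeds


seeds = grid_seeds([200, 500], ["low", "high"], ["low", "medium"])
```

Each branch is spawned once per level, so a cell's seed depends only on `random_state` and the cell's position in the three lists.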
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dgp_cls | | DGP class. A fresh instance is constructed per cell as … | required |
| corruptor_cls | | Corruptor class. A fresh instance is constructed per cell as … | required |
| n_samples_list | list[int] | List of sample counts to include in the grid. | required |
| complexities | list[str] | List of complexity strings to include in the grid. | required |
| severities | list[str] | List of severity strings to include in the grid. | required |
| n_features | int | Number of features passed to … | 10 |
| random_state | int | Master integer seed for the root … | 0 |
| **dgp_kwargs | | Extra keyword arguments forwarded to … | {} |
Returns:
| Type | Description |
|---|---|
| dict[tuple[int, str, str], BenchResult] | Mapping from … |
Named collection of curated benchmark datasets.
Provides a single-call interface to run a named set of benchmark datasets that share a common theme (e.g. all easy classification problems or all hard regression problems). Results are fully reproducible: the same suite name always produces bit-identical BenchResult objects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of a bundled suite (e.g. …) | required |
Examples:
>>> suite = BenchSuite("easy-classification")
>>> results = suite.run()
>>> list(results.keys())
['linear_low', 'tree_low', 'friedman_low']
list_suites()
staticmethod
Return a sorted list of all bundled suite names.
Returns:
| Type | Description |
|---|---|
| list[str] | Sorted list of available suite names. |
run()
Run all entries in the suite and return their results.
Each entry in the suite spec is run through a BenchPipeline with the DGP, corruptors, and run parameters specified in the JSON spec.
Returns:
| Type | Description |
|---|---|
| dict[str, BenchResult] | Mapping from entry label (str) to BenchResult. The dict is ordered according to the entry order in the spec. |
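The ordering guarantee follows naturally if results are built by iterating over the spec entries in order. The sketch below assumes a hypothetical JSON layout (an `entries` list with `label`, `dgp`, and `complexity` keys) and a placeholder `run_entry`; the real spec schema and pipeline call are not shown in this reference.

```python
import json

# Hypothetical spec layout; the bundled suites' real JSON schema may differ.
SPEC_JSON = """
{
  "entries": [
    {"label": "linear_low", "dgp": "linear", "complexity": "low"},
    {"label": "tree_low", "dgp": "tree", "complexity": "low"},
    {"label": "friedman_low", "dgp": "friedman", "complexity": "low"}
  ]
}
"""


def run_entry(entry: dict) -> str:
    """Placeholder for running one entry through a pipeline."""
    return f"{entry['dgp']}:{entry['complexity']}"


def run_suite(spec_json: str) -> dict[str, str]:
    """Run every entry and key the results by label, preserving spec order."""
    spec = json.loads(spec_json)
    # Python dicts keep insertion order, so iterating the list in order is enough.
    return {entry["label"]: run_entry(entry) for entry in spec["entries"]}


results = run_suite(SPEC_JSON)
```

Iterating `results` then yields the labels in the same order as the spec's entry list.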