BenchSuite

BenchSuite provides named, curated collections of benchmark datasets. A single .run() call generates every dataset in the collection and returns them keyed by human-readable labels. Results are reproducible: for fixed versions of synthbench, numpy, and scikit-learn, the same suite name always produces bit-identical BenchResult objects.

Bundled suites

Name                 Description
easy-classification  Low-complexity classification datasets across LinearDGP, TreeDGP, and FriedmanDGP
hard-regression      High-complexity regression datasets across PolynomialDGP, FriedmanDGP, and AdditiveDGP

Usage

Running a suite

from synthbench import BenchSuite

suite = BenchSuite("easy-classification")
results = suite.run()

print(list(results.keys()))
# ['linear_low', 'tree_low', 'friedman_low']

result = results["linear_low"]
print(result.X.shape)                          # (500, 10)
print(result.metadata["dgp_class"])            # "LinearDGP"
print(result.metadata["dgp_params"]["complexity"])  # "low"

Listing available suites

from synthbench import BenchSuite

print(BenchSuite.list_suites())
# ['easy-classification', 'hard-regression']

BenchSuite reference

BenchSuite(name)

Loads a bundled suite spec by name.

Parameter  Type  Description
name       str   Name of a bundled suite, e.g. "easy-classification".

Raises ValueError if name is not found; the message lists the available suite names. The exception is raised without a chained inner exception, which keeps the traceback clean.
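In Python, one common way to raise an error without a chained inner exception is `raise ... from None`. The sketch below only illustrates that idiom. `_SUITES` and `load_spec` are hypothetical names invented for this example, and nothing here is taken from the synthbench source.

```python
# Hypothetical registry standing in for the bundled suite specs.
# The empty dicts are placeholders, not real spec contents.
_SUITES = {"easy-classification": {}, "hard-regression": {}}


def load_spec(name: str) -> dict:
    """Return the spec for `name`, or raise ValueError listing valid names."""
    try:
        return _SUITES[name]
    except KeyError:
        available = ", ".join(sorted(_SUITES))
        # `from None` hides the internal KeyError from the printed traceback.
        raise ValueError(
            f"Unknown suite {name!r}. Available suites: {available}"
        ) from None


try:
    load_spec("medium-classification")
except ValueError as exc:
    print(exc)
```

With the real library, the equivalent caller-side pattern would be wrapping BenchSuite(name) in a try/except ValueError block.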

BenchSuite.run() → dict[str, BenchResult]

Generates all datasets in the suite via BenchPipeline.run().

Each entry in the returned dict is keyed by the entry's label field from the internal JSON spec. The dict is ordered according to the entry order in the spec.
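Because the returned dict preserves spec order, it can be iterated directly to build a quick overview. The following is a minimal sketch: `summarize` is a hypothetical helper, and the stand-in objects (including the values for "tree_low") are placeholders so the example runs without synthbench installed. It relies only on the X and metadata["dgp_class"] attributes shown in the usage example.

```python
from types import SimpleNamespace


def summarize(results):
    """Return one (label, shape, dgp_class) tuple per entry, in dict order."""
    return [
        (label, tuple(result.X.shape), result.metadata["dgp_class"])
        for label, result in results.items()
    ]


# Stand-ins for BenchResult objects; with the real library, pass the
# dict returned by BenchSuite.run() instead.
fake_results = {
    "linear_low": SimpleNamespace(
        X=SimpleNamespace(shape=(500, 10)),
        metadata={"dgp_class": "LinearDGP"},
    ),
    "tree_low": SimpleNamespace(
        X=SimpleNamespace(shape=(500, 10)),
        metadata={"dgp_class": "TreeDGP"},
    ),
}

for row in summarize(fake_results):
    print(row)
```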

Results are reproducible: running the same suite twice, in the same session or in different sessions, always returns bit-identical data, provided the same versions of synthbench, numpy, and scikit-learn are used.
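Since reproducibility depends on package versions, it can help to record them alongside any stored results. The sketch below uses only the standard library. `environment_versions` is a hypothetical helper, and the distribution name "synthbench" is an assumption (the installed distribution may be named differently from the import).

```python
from importlib.metadata import PackageNotFoundError, version


def environment_versions(packages=("synthbench", "numpy", "scikit-learn")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


print(environment_versions())
```

Comparing this mapping between two sessions is a cheap first check when results unexpectedly differ.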

BenchSuite.list_suites() → list[str]

Class method. Returns a sorted list of all bundled suite names.

BenchSuite.list_suites()   # ['easy-classification', 'hard-regression']

Adding custom suites

Custom suites are not supported in the current release. To assemble your own curated collection, build a list of (label, BenchPipeline) pairs and run each pipeline directly, following the same pattern as the bundled suite JSON specs:

from synthbench import BenchPipeline, LinearDGP, PolynomialDGP

entries = [
    ("linear_custom", BenchPipeline(LinearDGP(task_type="regression"))),
    ("poly_custom",   BenchPipeline(PolynomialDGP(task_type="regression"))),
]
results = {label: pipeline.run(n_samples=500, n_features=10, random_state=0)
           for label, pipeline in entries}