GeometricDGP
GeometricDGP generates classification datasets based on geometric manifold structures: moons, circles, or spirals. It is classification-only and useful for benchmarking methods that exploit spatial structure.
Quick Start
import synthbench
from synthbench import BenchPipeline, GeometricDGP
dgp = GeometricDGP(
    geometry="moons",
    complexity="medium",
    task_type="classification",
    random_state=0,
)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)
print(result.X.shape) # (500, 10)
print(result.y.shape) # (500,)
print(list(result.metadata.keys()))
# Signal importances sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values())) # 1.0
Parameters
| Parameter | Default | Description |
|---|---|---|
| `geometry` | `"moons"` | Geometric structure: `"moons"`, `"circles"`, or `"spirals"` |
| `complexity` | `"medium"` | Controls noise level and separability of the classes |
| `task_type` | `"classification"` | Must be `"classification"`. GeometricDGP does not support regression. |
| `random_state` | `0` | Integer seed for reproducibility |
| `class_weight` | `0.5` | Fraction of samples in the positive class |
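
To make the `class_weight` row concrete, here is a minimal sketch of how a fraction could translate into per-class sample counts, using the `int(n_samples * class_weight)` formula from the Notes. The helper name `split_counts`, the range check, and the assumption that the negative class receives the remainder are illustrative and not part of synthbench.

```python
def split_counts(n_samples: int, class_weight: float = 0.5) -> tuple:
    """Return (n_positive, n_negative) for a given class_weight.

    Hypothetical helper for illustration; not a synthbench function.
    """
    if not 0.0 <= class_weight <= 1.0:
        raise ValueError("class_weight must be between 0 and 1")
    # int() truncates, so an odd n_samples leaves the extra sample
    # in the negative class (an assumption of this sketch).
    n_positive = int(n_samples * class_weight)
    return n_positive, n_samples - n_positive


print(split_counts(500, 0.5))   # (250, 250)
print(split_counts(101, 0.5))   # (50, 51)
print(split_counts(500, 0.25))  # (125, 375)
```

Because of the truncation, the realized positive fraction can fall slightly below `class_weight` when `n_samples * class_weight` is not a whole number.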
Notes
- GeometricDGP generates each structure directly with NumPy trigonometric functions and has no scikit-learn dependency.
- Feature importances are structural constants: `0.5` for each of the two geometric dimensions. Remaining features are noise with `0.0` importance.
- `class_weight` controls the fraction assigned to the positive class: `n_half = int(n_samples * class_weight)`.
- The `"moons"` geometry places two interleaving half-circles; `"circles"` creates concentric rings; `"spirals"` produces interleaving spiral arms.