
GeometricDGP

GeometricDGP generates classification datasets from one of three geometric manifold structures: moons, circles, or spirals. It does not support regression, and it is useful for benchmarking methods that exploit spatial structure.

Quick Start

from synthbench import BenchPipeline, GeometricDGP

dgp = GeometricDGP(
    geometry="moons",
    complexity="medium",
    task_type="classification",
    random_state=0,
)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)   # (500, 10)
print(result.y.shape)   # (500,)
print(list(result.metadata.keys()))

# Signal importances sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # 1.0

Parameters

Parameter      Default             Description
geometry       "moons"             Geometric structure: "moons", "circles", or "spirals"
complexity     "medium"            Controls noise level and separability of the classes
task_type      "classification"    Must be "classification". GeometricDGP does not support regression.
random_state   0                   Integer seed for reproducibility
class_weight   0.5                 Fraction of samples in the positive class

Notes

  • GeometricDGP builds the geometric structures directly with NumPy trigonometric functions; it has no scikit-learn dependency.
  • Feature importances are structural constants: 0.5 for each of the two geometric dimensions. Remaining features are noise with 0.0 importance.
  • class_weight controls the fraction assigned to the positive class: n_half = int(n_samples * class_weight).
  • The "moons" geometry places two interleaving half-circles; "circles" creates concentric rings; "spirals" produces interleaving spiral arms.
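
The notes above can be illustrated with a minimal sketch of the "moons" case. This is not synthbench's implementation: the function sketch_moons, its noise parameter, the half-circle offsets, and the feature names x0, x1, ... are all hypothetical, and the mapping from complexity to a noise level is not shown because the source does not specify it. The sketch only demonstrates the documented ideas: the class split n_half = int(n_samples * class_weight), two interleaving half-circles built with NumPy trigonometry, noise features filling the remaining columns, and importances of 0.5 on the two geometric dimensions.

```python
import numpy as np


def sketch_moons(n_samples=500, n_features=10, class_weight=0.5, noise=0.1, seed=0):
    """Hypothetical stand-in for the "moons" geometry; not synthbench's actual code."""
    rng = np.random.default_rng(seed)

    # Class split as described in the notes; int() truncates toward zero.
    n_half = int(n_samples * class_weight)  # positive class
    n_rest = n_samples - n_half             # negative class (assumed remainder)

    # Two interleaving half-circles from plain NumPy trigonometry.
    t_neg = np.linspace(0.0, np.pi, n_rest)
    t_pos = np.linspace(0.0, np.pi, n_half)
    upper = np.column_stack([np.cos(t_neg), np.sin(t_neg)])
    lower = np.column_stack([1.0 - np.cos(t_pos), 0.5 - np.sin(t_pos)])

    # Jitter the two geometric dimensions with Gaussian noise (assumed noise model).
    signal = np.vstack([upper, lower])
    signal += rng.normal(scale=noise, size=signal.shape)

    # All remaining columns are pure noise features.
    filler = rng.normal(size=(n_samples, n_features - 2))
    X = np.hstack([signal, filler])
    y = np.concatenate([np.zeros(n_rest, dtype=int), np.ones(n_half, dtype=int)])

    # Structural importances: 0.5 per geometric dimension, 0.0 for noise features.
    importances = {f"x{i}": (0.5 if i < 2 else 0.0) for i in range(n_features)}
    return X, y, importances


X, y, importances = sketch_moons(n_samples=501, class_weight=0.3)
print(X.shape)                    # (501, 10)
print(int(y.sum()))               # 150, because int(501 * 0.3) == 150
print(sum(importances.values()))  # 1.0
```

Because int() truncates, the positive class can end up one sample short of n_samples * class_weight when that product is not a whole number, as in the 501-sample call above.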