PolynomialDGP

PolynomialDGP generates datasets with polynomial interactions between features. The complexity parameter controls the interaction degree: low complexity uses main effects only, medium adds adjacent-pair cross-terms, and high introduces higher-order interactions.
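The three complexity levels can be pictured with a small standalone sketch. It does not use synthbench and is not the library's implementation: the function name `polynomial_signal`, the unit coefficients, and the choice of adjacent triple products for "high" are all assumptions made for illustration.

```python
def polynomial_signal(x, complexity="medium"):
    """Hypothetical noise-free target for one sample x (a list of floats)."""
    if complexity not in ("low", "medium", "high"):
        raise ValueError(f"unknown complexity: {complexity!r}")

    # "low": main effects only (unit coefficients are an assumption).
    y = sum(x)

    if complexity in ("medium", "high"):
        # "medium": add adjacent-pair cross-terms x[i] * x[i+1].
        y += sum(x[i] * x[i + 1] for i in range(len(x) - 1))

    if complexity == "high":
        # "high": one assumed form of higher-order interaction,
        # adjacent triple products x[i] * x[i+1] * x[i+2].
        y += sum(x[i] * x[i + 1] * x[i + 2] for i in range(len(x) - 2))

    return y


sample = [1.0, 2.0, 3.0]
print(polynomial_signal(sample, "low"))     # 6.0
print(polynomial_signal(sample, "medium"))  # 14.0
print(polynomial_signal(sample, "high"))    # 20.0
```

Each level adds terms on top of the previous one, so in this sketch a model that only captures main effects fits "low" exactly but misses part of the signal at "medium" and "high".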

Quick Start

import synthbench
from synthbench import BenchPipeline, PolynomialDGP

dgp = PolynomialDGP(complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)   # (500, 10)
print(result.y.shape)   # (500,)
print(list(result.metadata.keys()))

# Signal importances sum to 1.0 (up to floating-point rounding)
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # approximately 1.0

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| complexity | "medium" | Controls interaction degree: "low" = main effects, "medium" = cross-terms, "high" = higher-order |
| task_type | "regression" | "regression" for continuous target, "classification" for binary labels |
| random_state | 0 | Integer seed for reproducibility |
| class_weight | 0.5 | (Classification only) Fraction of samples in the positive class |
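To make the meaning of class_weight concrete, here is a minimal sketch of one way a continuous score could be turned into binary labels with a fixed positive fraction. How PolynomialDGP actually derives its labels is not documented here, so the helper `labels_from_scores` and its rank-based cutoff are purely hypothetical.

```python
def labels_from_scores(scores, class_weight=0.5):
    """Label the top `class_weight` fraction of scores as 1, the rest as 0.

    Hypothetical helper: rank-based thresholding is an assumption,
    not the documented behaviour of PolynomialDGP.
    """
    if not 0.0 <= class_weight <= 1.0:
        raise ValueError("class_weight must be between 0 and 1")

    n_pos = round(class_weight * len(scores))
    # Indices ordered from highest score to lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    labels = [0] * len(scores)
    for i in order[:n_pos]:
        labels[i] = 1
    return labels


scores = [0.2, 1.5, -0.7, 3.1, 0.9, -1.2, 2.4, 0.0, 1.1, -0.3]
labels = labels_from_scores(scores, class_weight=0.3)
print(labels)                     # [0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(sum(labels) / len(labels))  # 0.3
```

With the default of 0.5 this sketch yields balanced classes; lowering the value produces an imbalanced dataset with fewer positives.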

Notes

  • Feature importances are structural equal-weight values: each informative feature receives 1/n_informative rather than a weight derived from its squared coefficient. This reflects the polynomial structure of the ground truth.
  • Medium complexity uses adjacent-pair cross-terms (feature i × feature i+1).
  • Noise features receive exactly 0.0 importance in signal_feature_importances.
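The importance scheme described in these notes can be reproduced in a few lines. This is a sketch under stated assumptions rather than the library's code: the helper `equal_weight_importances` and the feature names `x0` to `x9` are hypothetical, and only the 1/n_informative and 0.0 rules come from the notes above.

```python
import math


def equal_weight_importances(feature_names, informative):
    """Give each informative feature 1/n_informative and each noise feature 0.0."""
    informative = set(informative)
    if not informative:
        raise ValueError("at least one informative feature is required")

    weight = 1.0 / len(informative)
    return {
        name: (weight if name in informative else 0.0)
        for name in feature_names
    }


# Hypothetical layout: 10 features, the first 6 informative, the rest noise.
names = [f"x{i}" for i in range(10)]
importances = equal_weight_importances(names, informative=names[:6])

print(importances["x9"])                               # 0.0
print(math.isclose(sum(importances.values()), 1.0))    # True
```

Comparing the total with `math.isclose` rather than `==` avoids spurious failures, because repeated addition of 1/n_informative may not sum to exactly 1.0 in floating point.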