
AdditiveDGP

AdditiveDGP generates datasets using a generalized additive model (GAM) structure: the target is a sum of univariate functions, each applied independently to one informative feature. Because the model contains no cross-feature interaction terms, each feature's contribution to the target can be isolated and measured on its own.
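The additive structure described above can be illustrated with a minimal NumPy sketch. It does not use synthbench and is not the library's implementation; the component functions, the weights, and the choice of columns 0 to 2 as informative features are all hypothetical, picked only to show the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 500, 10
X = rng.normal(size=(n_samples, n_features))

# Hypothetical univariate components for three informative features (columns 0-2).
components = {
    0: np.sin,
    1: lambda x: x ** 2,
    2: lambda x: (x > 0).astype(float),  # step function
}
weights = {0: 1.0, 1: 0.5, 2: 2.0}  # hypothetical component weights

# Additive target: each informative column is transformed on its own, then summed.
y = np.zeros(n_samples)
for i, f in components.items():
    y += weights[i] * f(X[:, i])

# Columns 3-9 never enter y, so they act as pure noise features.
print(X.shape, y.shape)
```

No term in the loop ever combines two columns, which is what makes the per-feature contributions separable.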

Quick Start

import synthbench
from synthbench import BenchPipeline, AdditiveDGP

dgp = AdditiveDGP(complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)   # (500, 10)
print(result.y.shape)   # (500,)
print(list(result.metadata.keys()))

# Signal importances are normalized, so they sum to 1.0 (up to floating-point rounding)
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # approximately 1.0

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| complexity | "medium" | Controls the number of additive components and their nonlinearity |
| task_type | "regression" | "regression" for a continuous target, "classification" for binary labels |
| random_state | 0 | Integer seed for reproducibility |
| class_weight | 0.5 | (Classification only) Fraction of samples in the positive class |
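To make the meaning of class_weight concrete, here is one plausible way a continuous signal could be turned into binary labels with a chosen positive fraction: thresholding at a quantile. This is an assumption for illustration only; the source does not say how AdditiveDGP derives its labels, and the names `score`, `threshold`, and `labels` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
score = rng.normal(size=1000)  # stand-in for a continuous additive signal

class_weight = 0.3  # desired fraction of samples in the positive class

# Assumed mechanism: label the top `class_weight` fraction of scores as positive.
threshold = np.quantile(score, 1.0 - class_weight)
labels = (score > threshold).astype(int)

print(labels.mean())  # close to class_weight
```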

Notes

  • Feature importances are empirical and variance-based: each is Var(w_i * f_i(X[:, i])), divided by the sum of that quantity over all informative components.
  • This measures how much each component actually contributes to the output variance on the generated sample, not just its structural weight w_i.
  • Noise features receive exactly 0.0 importance in signal_feature_importances.
  • Each univariate component can be a different function type (sin, polynomial, step, etc.) depending on complexity.
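The variance-based importance described in the notes above can be sketched in plain NumPy. This is an illustrative reconstruction under stated assumptions, not the library's code: the components, the weights, and the `feature_N` key names are hypothetical, and population variance (`np.var` with its default `ddof=0`) is an arbitrary choice here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))

# Hypothetical weighted components w_i * f_i(X[:, i]); columns 3 and 4 are noise.
contributions = {
    "feature_0": 1.0 * np.sin(X[:, 0]),
    "feature_1": 0.5 * X[:, 1] ** 2,
    "feature_2": 2.0 * (X[:, 2] > 0),
}

# Empirical variance of each weighted component, normalized to sum to 1.
variances = {name: float(np.var(c)) for name, c in contributions.items()}
total = sum(variances.values())
importances = {name: v / total for name, v in variances.items()}

# Noise features contribute nothing to the target, so they get exactly 0.0.
importances.update({"feature_3": 0.0, "feature_4": 0.0})

print(importances)
```

Note that a component with a large weight can still receive a small importance if its function varies little over the sampled inputs, which is the distinction between empirical contribution and structural weight.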