AdditiveDGP
AdditiveDGP generates datasets with a generalized additive model (GAM) structure: the target is a weighted sum of univariate functions, each applied independently to one informative feature. Because the model has no cross-feature interaction terms, each feature's contribution to the target can be isolated.
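To make the additive structure concrete, here is a minimal NumPy sketch of a GAM-style target. It does not use synthbench and is not the library's implementation; the specific functions, weights, and the choice of three informative columns are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 500, 10
X = rng.normal(size=(n_samples, n_features))

# Hypothetical univariate components for three informative features;
# the remaining seven columns are noise and never touch the target.
components = {
    0: np.sin,
    1: np.square,
    2: lambda x: (x > 0).astype(float),  # step function
}
weights = {0: 1.0, 1: 0.5, 2: 2.0}  # illustrative weights, not library defaults

# Additive structure: y = sum_i w_i * f_i(X[:, i]), with no interaction terms.
y = sum(weights[i] * f(X[:, i]) for i, f in components.items())
```

Each term depends on exactly one column, so changing a noise column leaves `y` unchanged.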
Quick Start
import synthbench
from synthbench import BenchPipeline, AdditiveDGP
dgp = AdditiveDGP(complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)
print(result.X.shape) # (500, 10)
print(result.y.shape) # (500,)
print(list(result.metadata.keys()))
# Signal importances are normalized, so they sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values())) # 1.0 (up to floating-point rounding)
Parameters
| Parameter | Default | Description |
|---|---|---|
| `complexity` | `"medium"` | Controls number of additive components and their nonlinearity |
| `task_type` | `"regression"` | `"regression"` for continuous target, `"classification"` for binary labels |
| `random_state` | `0` | Integer seed for reproducibility |
| `class_weight` | `0.5` | (Classification only) Fraction of samples in the positive class |
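The source does not say how binary labels are derived from the additive score. One plausible way to reach a target positive-class fraction is to threshold a continuous score at the matching quantile. The sketch below illustrates that idea only; the thresholding rule and the variable names are assumptions, not the documented behavior of `class_weight`.

```python
import numpy as np

rng = np.random.default_rng(0)
score = rng.normal(size=1000)  # stand-in for a continuous additive score
class_weight = 0.3             # desired fraction of positive labels

# Assumed rule: label the top `class_weight` fraction of scores as positive.
threshold = np.quantile(score, 1.0 - class_weight)
y = (score > threshold).astype(int)

positive_fraction = y.mean()
print(positive_fraction)
```

With continuous scores and no ties, the resulting positive fraction lands very close to the requested value.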
Notes
- Feature importances are empirical variance-based: `Var(w_i * f_i(X[:,i]))`, normalized over all informative components.
- This captures how much each component actually contributes to output variance, not just its structural weight.
- Noise features receive exactly `0.0` importance in `signal_feature_importances`.
- Each univariate component can be a different function type (sin, polynomial, step, etc.) depending on complexity.
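The variance-based normalization described in these notes can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the library's code: the weights, the component functions, and the choice of columns 0 to 2 as informative are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))

# Hypothetical weights and univariate functions for informative columns 0-2;
# columns 3 and 4 play the role of noise features.
weights = {0: 1.0, 1: 0.5, 2: 2.0}
functions = {0: np.sin, 1: np.square, 2: lambda x: (x > 0).astype(float)}

# Empirical variance of each weighted component, Var(w_i * f_i(X[:, i])).
variances = {
    i: float(np.var(weights[i] * functions[i](X[:, i]))) for i in weights
}
total = sum(variances.values())

# Normalize over informative components; noise columns get exactly 0.0.
importances = {i: variances.get(i, 0.0) / total for i in range(X.shape[1])}
print(importances)
```

Because the variance depends on both the weight and the shape of the function over the sampled data, the resulting ranking can differ from the ranking of the raw weights.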