# FriedmanDGP

`FriedmanDGP` implements the classic Friedman (1991) benchmark functions. Three variants are available via the `friedman_function` parameter. These functions are widely used in the machine learning literature to benchmark nonlinear regression methods.
## Quick Start

```python
import synthbench
from synthbench import BenchPipeline, FriedmanDGP

dgp = FriedmanDGP(friedman_function=1, complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)  # (500, 10)
print(result.y.shape)  # (500,)
print(list(result.metadata.keys()))

# Signal importances sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # 1.0
```
## Parameters

| Parameter | Default | Description |
|---|---|---|
| `friedman_function` | `1` | Friedman function variant: 1, 2, or 3 |
| `complexity` | `"medium"` | Complexity level (affects noise scaling) |
| `task_type` | `"regression"` | `"regression"` for a continuous target, `"classification"` for binary labels |
| `random_state` | `0` | Integer seed for reproducibility |
| `class_weight` | `0.5` | (Classification only) Fraction of samples in the positive class |
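To make the `class_weight` parameter concrete, here is an illustrative NumPy sketch (not synthbench's actual implementation) of one common way a continuous Friedman target can be turned into binary labels with a requested positive fraction: threshold `y` at the `1 - class_weight` quantile.

```python
import numpy as np

# Generate a Friedman #1 target directly (formula from the Notes below).
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 5))
y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1])
     + 20 * (X[:, 2] - 0.5) ** 2
     + 10 * X[:, 3]
     + 5 * X[:, 4])

# Hypothetical labeling rule: samples above the (1 - class_weight) quantile
# of y become the positive class, so roughly class_weight of them are 1.
class_weight = 0.5
threshold = np.quantile(y, 1 - class_weight)
labels = (y > threshold).astype(int)
print(labels.mean())  # close to class_weight
```

Quantile thresholding is only one plausible mechanism; consult the library source if the exact label assignment matters for your benchmark.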
## Notes

- Function 1: `y = 10*sin(pi*x1*x2) + 20*(x3 - 0.5)^2 + 10*x4 + 5*x5`, with features `x1`-`x5` drawn from `[0, 1]`.
- Function 2: uses non-unit feature ranges as per the original Friedman (1991) paper (e.g., `x2 in [40*pi, 560*pi]`).
- Function 3: also uses non-unit ranges following the original paper.
- Feature importances are equal-weight across formula features; the ground truth is the formula structure, not empirical contribution.
- Extra features beyond those used in the formula are noise with exactly `0.0` importance.
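As a sanity check outside the library, the Function 1 generator and its equal-weight ground-truth importances can be reproduced directly with NumPy. This is a standalone sketch of the formula above, not synthbench's internals (e.g., it uses a fixed noise scale rather than `complexity`-based scaling):

```python
import numpy as np

def friedman1(n_samples=500, n_features=10, noise=1.0, random_state=42):
    """Friedman #1 data: only the first five features carry signal."""
    rng = np.random.default_rng(random_state)
    X = rng.uniform(size=(n_samples, n_features))
    y = (10 * np.sin(np.pi * X[:, 0] * X[:, 1])
         + 20 * (X[:, 2] - 0.5) ** 2
         + 10 * X[:, 3]
         + 5 * X[:, 4]
         + noise * rng.normal(size=n_samples))
    return X, y

X, y = friedman1()
print(X.shape, y.shape)  # (500, 10) (500,)

# Equal-weight importances over the five formula features; the five
# extra noise features get exactly 0.0, and the weights sum to 1.0.
importances = {f"x{i+1}": (0.2 if i < 5 else 0.0) for i in range(10)}
print(round(sum(importances.values()), 10))  # 1.0
```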