FriedmanDGP

FriedmanDGP implements the classic Friedman (1991) benchmark functions. Three variants are available via the friedman_function parameter. These functions are widely used in the machine learning literature to benchmark nonlinear regression methods.

Quick Start

from synthbench import BenchPipeline, FriedmanDGP

dgp = FriedmanDGP(friedman_function=1, complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)   # (500, 10)
print(result.y.shape)   # (500,)
print(list(result.metadata.keys()))

# Signal feature importances sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # 1.0 (up to floating-point rounding)

Parameters

Parameter          Default        Description
friedman_function  1              Friedman function variant: 1, 2, or 3
complexity         "medium"       Complexity level (affects noise scaling)
task_type          "regression"   "regression" for a continuous target; "classification" for binary labels
random_state       0              Integer seed for reproducibility
class_weight       0.5            (Classification only) Fraction of samples in the positive class
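For classification tasks, class_weight sets the target positive-class fraction. One common way a data generator turns a continuous target into binary labels with a given positive fraction is quantile thresholding; the sketch below illustrates that mechanism in plain Python (an assumption about the approach, not synthbench's actual implementation):

```python
import random

def binarize(y, positive_fraction=0.5):
    """Threshold a continuous target at its (1 - positive_fraction)
    empirical quantile so roughly that fraction of samples get label 1.
    (Illustrative only; FriedmanDGP may derive labels differently.)"""
    cutoff = sorted(y)[int(len(y) * (1 - positive_fraction))]
    return [1 if v > cutoff else 0 for v in y]

rng = random.Random(0)
y = [rng.random() for _ in range(1000)]
labels = binarize(y, positive_fraction=0.3)
print(sum(labels) / len(labels))  # close to 0.3
```

Thresholding at a quantile guarantees the requested class balance regardless of the target's distribution, which is why it is a natural fit for a parameter like class_weight.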

Notes

  • Function 1: y = 10*sin(pi*x1*x2) + 20*(x3-0.5)^2 + 10*x4 + 5*x5. Features x1-x5 in [0, 1].
  • Function 2: Uses non-unit feature ranges as in the original Friedman (1991) paper (e.g., x2 in [40*pi, 560*pi] and x4 in [1, 11]).
  • Function 3: Also uses non-unit ranges following the original paper.
  • Feature importances are equal-weight across formula features; the ground truth is the formula structure, not each feature's empirical contribution.
  • Extra features beyond those used in the formula are noise with exactly 0.0 importance.
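The noise-free versions of the three benchmark formulas can be written out directly. A standalone plain-Python sketch (independent of synthbench), using the standard Friedman (1991) feature ranges:

```python
import math

def friedman1(x):
    """Friedman #1: five signal features x1..x5, each in [0, 1]."""
    return (10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2 + 10 * x[3] + 5 * x[4])

def friedman2(x1, x2, x3, x4):
    """Friedman #2: x1 in [0, 100], x2 in [40*pi, 560*pi],
    x3 in [0, 1], x4 in [1, 11]."""
    return math.sqrt(x1 ** 2 + (x2 * x3 - 1.0 / (x2 * x4)) ** 2)

def friedman3(x1, x2, x3, x4):
    """Friedman #3: same feature ranges as #2."""
    return math.atan((x2 * x3 - 1.0 / (x2 * x4)) / x1)

print(round(friedman1([0.5] * 5), 3))  # 14.571
print(friedman2(1.0, 1.0, 1.0, 1.0))   # 1.0
print(friedman3(1.0, 1.0, 1.0, 1.0))   # 0.0
```

Any features passed beyond those consumed by the formula do not enter the target at all, which matches the 0.0 importance reported for noise features above.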