FriedmanDGP

FriedmanDGP implements the classic Friedman (1991) benchmark functions. Three variants are available via the friedman_function parameter. These functions are widely used in the machine learning literature to benchmark nonlinear regression methods.

Quick Start

from synthbench import BenchPipeline, FriedmanDGP

dgp = FriedmanDGP(friedman_function=1, complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)   # (500, 10)
print(result.y.shape)   # (500,)
print(list(result.metadata.keys()))

# Signal feature importances sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # 1.0 (up to floating-point rounding)

Parameters

Parameter          Default        Description
friedman_function  1              Friedman function variant: 1, 2, or 3
complexity         "medium"       Complexity level (affects noise scaling)
task_type          "regression"   "regression" for a continuous target; "classification" for binary labels
random_state       0              Integer seed for reproducibility
class_weight       0.5            (Classification only) Fraction of samples in the positive class
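For classification tasks, class_weight sets the target positive-class fraction. One common way a data generator turns a continuous target into binary labels with a given positive fraction is quantile thresholding; the sketch below illustrates that mechanism in plain Python (an assumption about the approach, not synthbench's actual implementation):

```python
import random

def binarize(y, positive_fraction=0.5):
    """Threshold a continuous target at its (1 - positive_fraction)
    empirical quantile so roughly that fraction of samples get label 1.
    (Illustrative only; FriedmanDGP may derive labels differently.)"""
    cutoff = sorted(y)[int(len(y) * (1 - positive_fraction))]
    return [1 if v > cutoff else 0 for v in y]

rng = random.Random(0)
y = [rng.random() for _ in range(1000)]
labels = binarize(y, positive_fraction=0.3)
print(sum(labels) / len(labels))  # close to 0.3
```

Thresholding at a quantile guarantees the requested class balance regardless of the target's distribution, which is why it is a natural fit for a parameter like class_weight.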

Notes

  • Function 1: y = 10*sin(pi*x1*x2) + 20*(x3-0.5)^2 + 10*x4 + 5*x5. Features x1-x5 in [0, 1].
  • Function 2: Uses non-unit feature ranges as in the original Friedman (1991) paper (e.g., x2 in [40*pi, 560*pi] and x4 in [1, 11]).
  • Function 3: Also uses non-unit ranges following the original paper.
  • Feature importances are equal-weight across formula features; the ground truth is the formula structure, not each feature's empirical contribution.
  • Extra features beyond those used in the formula are noise with exactly 0.0 importance.
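The noise-free versions of the three benchmark formulas can be written out directly. A standalone plain-Python sketch (independent of synthbench), using the standard Friedman (1991) feature ranges:

```python
import math

def friedman1(x):
    """Friedman #1: five signal features x1..x5, each in [0, 1]."""
    return (10 * math.sin(math.pi * x[0] * x[1])
            + 20 * (x[2] - 0.5) ** 2 + 10 * x[3] + 5 * x[4])

def friedman2(x1, x2, x3, x4):
    """Friedman #2: x1 in [0, 100], x2 in [40*pi, 560*pi],
    x3 in [0, 1], x4 in [1, 11]."""
    return math.sqrt(x1 ** 2 + (x2 * x3 - 1.0 / (x2 * x4)) ** 2)

def friedman3(x1, x2, x3, x4):
    """Friedman #3: same feature ranges as #2."""
    return math.atan((x2 * x3 - 1.0 / (x2 * x4)) / x1)

print(round(friedman1([0.5] * 5), 3))  # 14.571
print(friedman2(1.0, 1.0, 1.0, 1.0))   # 1.0
print(friedman3(1.0, 1.0, 1.0, 1.0))   # 0.0
```

Any features passed beyond those consumed by the formula do not enter the target at all, which matches the 0.0 importance reported for noise features above.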