RandomNeuralDGP

RandomNeuralDGP generates datasets using a randomly initialized neural network as the signal function. The random network maps input features to a nonlinear target, producing complex high-dimensional relationships that are difficult to approximate with simple models.
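The mechanism can be illustrated with a minimal NumPy sketch (not synthbench's actual implementation; the layer sizes and tanh activation are illustrative):

```python
import numpy as np

# A randomly initialized two-layer network used as a data-generating process:
# the weights are drawn once and never trained, so the mapping X -> y is a
# fixed but complex nonlinear function.
rng = np.random.default_rng(0)
n_samples, n_features, hidden = 500, 10, 32

X = rng.normal(size=(n_samples, n_features))
W1 = rng.normal(size=(n_features, hidden))   # random input-to-hidden weights
W2 = rng.normal(size=(hidden, 1))            # random hidden-to-output weights
y = (np.tanh(X @ W1) @ W2).ravel()           # nonlinear target, shape (500,)
print(X.shape, y.shape)  # (500, 10) (500,)
```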

Optional dependency

RandomNeuralDGP requires PyTorch. Install it with:

pip install synthbench[neural]

Quick Start

import synthbench
from synthbench import BenchPipeline, RandomNeuralDGP

dgp = RandomNeuralDGP(complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)   # (500, 10)
print(result.y.shape)   # (500,)
print(list(result.metadata.keys()))

# Signal importances sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # 1.0
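The sum-to-one property can be sketched as follows (a hedged illustration of the normalization step, not the library's code; the raw magnitudes here are random stand-ins for the gradient magnitudes described in the Notes):

```python
import numpy as np

# Raw per-feature magnitudes (stand-ins for mean |input gradient| values).
rng = np.random.default_rng(0)
raw = np.abs(rng.normal(size=10)).astype(np.float32)

# Normalize in Python floats (double precision) rather than float32.
vals = [float(v) for v in raw]
total = sum(vals)
importances = {f"x{i}": v / total for i, v in enumerate(vals)}
print(round(sum(importances.values()), 12))  # 1.0
```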

Parameters

Parameter      Default        Description
complexity     "medium"       Controls network width, depth, and activation nonlinearity
task_type      "regression"   "regression" for a continuous target, "classification" for binary labels
random_state   0              Integer seed for reproducibility
class_weight   0.5            (Classification only) Fraction of samples in the positive class
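If class_weight controls the positive-class fraction, one way to realize it (an assumed mechanism, not necessarily what synthbench does internally) is to threshold the continuous network output at the matching quantile:

```python
import numpy as np

# Assumed mechanism for class_weight: choose the threshold so that the
# requested fraction of samples falls in the positive class.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)     # continuous network output
class_weight = 0.5                 # desired positive-class fraction

threshold = np.quantile(scores, 1.0 - class_weight)
y = (scores > threshold).astype(int)
print(y.mean())  # 0.5
```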

Notes

  • Importing synthbench does not load PyTorch into sys.modules. PyTorch is imported lazily only when RandomNeuralDGP is first accessed.
  • Feature importances are computed using input gradient magnitudes, normalized to sum to 1.0. The normalization is done in Python (not torch float32) to guarantee exact sum == 1.0.
  • The network architecture (width, depth) is determined by the complexity parameter.
  • Reproducibility: the same random_state always produces an identical network and dataset.
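The lazy-import behavior described in the first note can be sketched with the module-level __getattr__ hook (PEP 562); fake_pkg and the flag below are illustrative stand-ins, not synthbench internals:

```python
import types

# Track when the "heavy" import would run; the real package would import
# torch at this point instead of flipping a flag.
state = {"imported": False}

def lazy_getattr(name):
    if name == "RandomNeuralDGP":
        state["imported"] = True        # torch import would happen here
        return type("RandomNeuralDGP", (), {})
    raise AttributeError(name)

fake_pkg = types.ModuleType("fake_pkg")
fake_pkg.__getattr__ = lazy_getattr     # PEP 562 module-level __getattr__

print(state["imported"])                # False: nothing loaded at import time
cls = fake_pkg.RandomNeuralDGP          # first access triggers the lazy path
print(state["imported"])                # True
```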