# RandomNeuralDGP
RandomNeuralDGP generates datasets using a randomly initialized neural network as the signal function. The random network maps input features to a nonlinear target, producing complex high-dimensional relationships that are difficult to approximate with simple models.
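Conceptually, the idea can be sketched as follows. This is an illustrative toy in NumPy, not synthbench's actual implementation: a two-layer MLP with random, fixed (never trained) weights defines the signal function mapping features to the target.

```python
import numpy as np

# Sketch of a random-network data-generating process (assumption: the
# real implementation uses PyTorch; this NumPy toy only shows the idea).
rng = np.random.default_rng(0)

n_samples, n_features, hidden = 500, 10, 32
X = rng.normal(size=(n_samples, n_features))

# Random, fixed weights -- never trained; they *define* the signal.
W1 = rng.normal(size=(n_features, hidden))
W2 = rng.normal(size=(hidden, 1))

y = (np.tanh(X @ W1) @ W2).ravel()  # nonlinear continuous target

print(X.shape, y.shape)  # (500, 10) (500,)
```

Because the weights are random rather than fitted, the resulting input-target relationship is nonlinear in a way no simple parametric family matches by construction.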
## Optional dependency
RandomNeuralDGP requires PyTorch. Install it with:
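For a standard CPU environment, the usual PyPI command is (see pytorch.org for platform-specific wheels):

```shell
pip install torch
```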
## Quick Start
```python
from synthbench import BenchPipeline, RandomNeuralDGP

dgp = RandomNeuralDGP(complexity="medium", task_type="regression", random_state=0)
pipeline = BenchPipeline(dgp)
result = pipeline.run(n_samples=500, n_features=10, random_state=42)

print(result.X.shape)  # (500, 10)
print(result.y.shape)  # (500,)
print(list(result.metadata.keys()))

# Signal importances sum to 1.0
importances = result.metadata["signal_feature_importances"]
print(sum(importances.values()))  # 1.0
```
## Parameters
| Parameter | Default | Description |
|---|---|---|
| `complexity` | `"medium"` | Controls network width, depth, and activation nonlinearity |
| `task_type` | `"regression"` | `"regression"` for a continuous target, `"classification"` for binary labels |
| `random_state` | `0` | Integer seed for reproducibility |
| `class_weight` | `0.5` | (Classification only) Fraction of samples in the positive class |
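One way a `class_weight` fraction can be realized — a hypothetical sketch, not necessarily synthbench's mechanism — is to threshold the network's continuous output at the matching quantile, so the requested fraction of samples lands in the positive class:

```python
import numpy as np

# Assumption for illustration: binary labels come from thresholding a
# continuous score at a quantile chosen from the desired class balance.
rng = np.random.default_rng(0)
scores = rng.normal(size=1000)   # stand-in for the network's raw output
class_weight = 0.25              # desired positive-class fraction

threshold = np.quantile(scores, 1.0 - class_weight)
labels = (scores > threshold).astype(int)

print(labels.mean())  # close to 0.25
```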
## Notes
- Importing `synthbench` does not load PyTorch into `sys.modules`. PyTorch is imported lazily only when `RandomNeuralDGP` is first accessed.
- Feature importances are computed from input gradient magnitudes, normalized to sum to 1.0. The normalization is done in Python floats (not torch float32) to guarantee an exact `sum == 1.0`.
- The network architecture (width, depth) is determined by the `complexity` parameter.
- Reproducibility: the same `random_state` always produces an identical network and dataset.
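The lazy-import behavior described above is typically achieved with a module-level `__getattr__` (PEP 562). The sketch below builds a toy module to demonstrate the pattern; `json` stands in for the heavy `torch` dependency, and the module name `toypkg` is invented for the example:

```python
import sys
import types

# Toy module mimicking lazy attribute-triggered imports (PEP 562).
pkg = types.ModuleType("toypkg")

def _lazy_getattr(name):
    if name == "RandomNeuralDGP":
        import json  # stand-in for the deferred `import torch`
        pkg.RandomNeuralDGP = json.dumps  # cache so __getattr__ runs once
        return pkg.RandomNeuralDGP
    raise AttributeError(name)

pkg.__getattr__ = _lazy_getattr
sys.modules["toypkg"] = pkg

import toypkg

print("RandomNeuralDGP" in vars(toypkg))  # False: nothing imported yet
_ = toypkg.RandomNeuralDGP                # first access triggers the import
print("RandomNeuralDGP" in vars(toypkg))  # True: now cached on the module
```

The payoff is that `import toypkg` (like `import synthbench`) stays cheap; the expensive dependency loads only when the attribute is actually used.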