Best-of-N Sampling in Tabular GANs

Improving GAN-generated tabular data quality via best-of-N selection

This project investigates Best-of-N sampling strategies for Generative Adversarial Networks (GANs) applied to tabular data synthesis, with the goal of improving the quality and utility of synthetic datasets.

Motivation

Tabular GANs often produce low-quality synthetic data due to mode collapse and training instability. Best-of-N sampling generates N candidates and keeps only the highest-scoring ones under a chosen quality metric, with the aim of improving performance on downstream ML tasks.

Approach

  • Systematic evaluation of quality metrics (column-wise stats, ML efficacy) for best-of-N selection
  • Benchmarking across CTGAN, TVAE (a VAE-based baseline), and CTAB-GAN on real-world tabular datasets
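
The selection step described above can be illustrated with a minimal sketch. It is not the project's implementation. It assumes that candidates are whole synthetic tables, that quality is scored by a simple column-wise statistics gap (lower is better), and that the generator is exposed as a function returning a NumPy array. The names column_stat_gap, best_of_n, and toy_sampler are hypothetical, and toy_sampler is only a stand-in for a trained model such as CTGAN.

```python
import numpy as np


def column_stat_gap(real, synth):
    """Average normalized gap between per-column means and stds (lower is better).

    Assumption: a stand-in for the "column-wise stats" metric, numeric columns only.
    """
    scale = real.std(axis=0) + 1e-8  # avoid division by zero on constant columns
    mean_gap = np.abs(real.mean(axis=0) - synth.mean(axis=0)) / scale
    std_gap = np.abs(real.std(axis=0) - synth.std(axis=0)) / scale
    return float((mean_gap + std_gap).mean())


def best_of_n(sample_fn, real, n_candidates, n_rows):
    """Draw n_candidates synthetic tables and keep the one with the lowest gap."""
    candidates = [sample_fn(n_rows) for _ in range(n_candidates)]
    scores = [column_stat_gap(real, c) for c in candidates]
    best_index = int(np.argmin(scores))
    return candidates[best_index], scores


rng = np.random.default_rng(0)

# Toy "real" table with two numeric columns.
real = rng.normal(loc=[0.0, 5.0], scale=[1.0, 2.0], size=(1000, 2))


def toy_sampler(n_rows):
    """Hypothetical generator: each call drifts randomly away from the real means."""
    shift = rng.normal(scale=0.5, size=2)
    return rng.normal(loc=np.array([0.0, 5.0]) + shift, scale=[1.0, 2.0], size=(n_rows, 2))


best, scores = best_of_n(toy_sampler, real, n_candidates=8, n_rows=500)
print("candidate scores:", [round(s, 3) for s in scores])
print("selected score:", round(min(scores), 3))
```

Swapping column_stat_gap for an ML-efficacy score (for example, the validation accuracy of a classifier trained on each candidate, with the argmin changed to an argmax) would leave the selection loop unchanged, which is one reason to keep the scoring function separate from the sampler.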