Description
This work explores whether synthetic data can be used to train deep learning models for real-time classification of astronomical radio signals. In previous research, lightweight convolutional neural networks (CNNs) operating on DM-time representations (dispersion measure versus time) showed promising performance in detecting transient signals. We now ask whether synthetic datasets can serve as a reliable substitute for real observational data during training.
Synthetic data offers the advantage of full control over signal characteristics, allowing us to simulate a wide range of astrophysical phenomena and noise conditions. In this study, we generate a set of synthetic DM-time images designed to replicate realistic signal dispersion, varying signal-to-noise ratios (SNRs), receiver noise patterns, and other instrumental effects. The synthetic dataset is informed by parameters derived from well-known sources such as the Crab Pulsar and is intended to reflect the diversity and complexity of real-world radio observations.
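As a rough illustration of how such a DM-time image could be produced, the sketch below simulates a dispersed Gaussian pulse in white noise and dedisperses it over a few trial DMs. It is not the generator used in this work: the function names (`delay_ms`, `synth_dm_time`), the 1.2 to 1.5 GHz band, the channel count, the sampling time, and the circular time shift are all assumptions chosen for brevity, and receiver noise patterns or other instrumental effects are not modelled. The only physical inputs are the standard cold-plasma dispersion relation and an approximate Crab Pulsar DM of about 56.77 pc cm^-3.

```python
import math
import random

K_DM_MS = 4.148808  # dispersion constant in ms GHz^2 cm^3 / pc


def delay_ms(dm, freq_ghz, ref_ghz):
    """Cold-plasma dispersion delay (ms) of freq_ghz relative to ref_ghz."""
    return K_DM_MS * dm * (freq_ghz ** -2 - ref_ghz ** -2)


def synth_dm_time(true_dm, snr, trial_dms, n_time=256, n_chan=16,
                  f_lo=1.2, f_hi=1.5, dt_ms=1.0, sigma_ms=2.0, seed=0):
    """Return a DM-time image as a list of rows, one row per trial DM.

    All default parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Channel centre frequencies in GHz.
    freqs = [f_lo + (f_hi - f_lo) * (c + 0.5) / n_chan for c in range(n_chan)]
    t0_ms = 0.25 * n_time * dt_ms          # pulse arrival at the top of the band
    amp = snr / math.sqrt(n_chan)          # per-channel amplitude for the target SNR

    # Dynamic spectrum: unit-variance Gaussian noise plus a dispersed Gaussian pulse.
    spectrum = []
    for f in freqs:
        arrival = t0_ms + delay_ms(true_dm, f, f_hi)
        spectrum.append([
            rng.gauss(0.0, 1.0)
            + amp * math.exp(-0.5 * ((t * dt_ms - arrival) / sigma_ms) ** 2)
            for t in range(n_time)
        ])

    # Dedisperse at each trial DM (circular shift, an assumption for simplicity).
    image = []
    for dm in trial_dms:
        shifts = [round(delay_ms(dm, f, f_hi) / dt_ms) for f in freqs]
        image.append([
            sum(spectrum[c][(t + shifts[c]) % n_time] for c in range(n_chan))
            / math.sqrt(n_chan)            # keep the noise at unit variance
            for t in range(n_time)
        ])
    return image


crab_dm = 56.77  # approximate Crab Pulsar DM in pc cm^-3
trials = [crab_dm + offset for offset in (-20, -10, 0, 10, 20)]
image = synth_dm_time(crab_dm, snr=20.0, trial_dms=trials)
# Row whose time series has the highest peak, i.e. the best-matching trial DM.
best_row = max(range(len(trials)), key=lambda i: max(image[i]))
```

Varying `snr`, `sigma_ms`, and `true_dm` across many calls would give a labelled training set with controlled signal properties; noise-only examples can be produced by setting `snr=0.0`.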
We train minimalist CNN architectures, optimized for low-latency and low-resource environments, exclusively on synthetic data. We then evaluate these models on real pulsar observations to assess how well they generalize, using classification accuracy, precision, recall, and sensitivity to weak signals as the key performance metrics.
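The metrics listed above can be computed from binary labels and predictions as in the following minimal sketch. The helper names (`classification_metrics`, `recall_by_snr`), the SNR bin edges, and the toy labels are hypothetical and carry no results from this study; recall per SNR bin is used here as one possible way to express sensitivity to weak signals.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = signal, 0 = noise)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def recall_by_snr(y_true, y_pred, snrs, bin_edges):
    """Recall of true signals within each half-open SNR bin [lo, hi)."""
    result = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        # Predictions for true signals whose SNR falls in this bin.
        hits = [p for t, p, s in zip(y_true, y_pred, snrs)
                if t == 1 and lo <= s < hi]
        result[(lo, hi)] = sum(hits) / len(hits) if hits else None
    return result


# Toy example: four signals followed by four noise-only candidates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0]
snrs = [12.0, 9.0, 5.0, 15.0, 0.0, 0.0, 0.0, 0.0]

metrics = classification_metrics(y_true, y_pred)
weak_vs_strong = recall_by_snr(y_true, y_pred, snrs, bin_edges=[0.0, 8.0, 100.0])
```

Reporting recall separately for low-SNR and high-SNR bins makes it visible when a model trained on synthetic data performs well overall but misses faint pulses.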
By comparing models trained on synthetic data against baselines trained on real data, we aim to quantify the effectiveness and limitations of simulated data in machine learning pipelines for radio astronomy. This work seeks to clarify the role synthetic data can play in accelerating model development, especially where annotated real datasets are scarce or difficult to obtain.