The SKA Observatory is an international endeavour to build the world’s largest radio telescopes, a €2 billion project aimed at transforming humanity’s understanding of the universe. Behind this monumental effort lies an equally groundbreaking challenge: processing petabytes of raw data every day into high-resolution astronomical images.
nAG’s High-Performance Computing (HPC) engineers play a pivotal role in making this possible. Their work ensures the seamless transformation of vast volumes of data into science-ready data products, supporting the SKAO’s mission to accelerate discovery in radio astronomy and gravitational wave science.
My team and I are responsible for a range of tools that are needed to process petabytes (yes, with a P!) of raw data into detailed pictures of the sky. Among our contributions, we’ve optimised key software components — most notably the imaging and “cleaning” pipeline that converts calibrated data into high-fidelity astronomical images, as well as the tools that detect and catalogue the radio emission from black holes captured in these images.
Personally, I’ve focused heavily on data engineering, specifically helping to design and define the next-generation data format for radio astronomy: Measurement Set version 4 (MSv4). This new format is poised to become a global standard for radio telescopes worldwide. I’ve also been instrumental in developing tools to ensure its smooth adoption by the broader radio astronomy community.
In December 2024, I represented the SKA project as the lead expert on I/O performance and scalability at an international review spanning 24 institutions, including input from Oak Ridge National Laboratory — the epicentre of HPC in the United States. During this review, I advised the panel on the hardware strategies and software architectures that will unlock the full potential of MSv4, pushing the limits of what’s possible in large-scale data processing for radio astronomy.
Image Copyright: SKAO · Author: SKAO
My focus is on science data processing — the stage of the pipeline where we run large-scale batch processing on HPC systems to transform raw telescope data into science-ready images and catalogues. This is where the heavy lifting happens, as we apply sophisticated algorithms to turn petabytes of data into images ranging from 1 megapixel to over 17,000 megapixels in size.
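To put those pixel counts in perspective, here is a quick back-of-the-envelope calculation; the 4-bytes-per-pixel figure is an illustrative assumption (single-precision floats), not a fixed SKAO specification.

```python
# Rough size of a single image plane at the quoted pixel counts,
# assuming 4 bytes (a 32-bit float) per pixel. Multiple frequency
# channels and polarisations multiply these figures further.
BYTES_PER_PIXEL = 4  # illustrative assumption

for name, megapixels in [("1-megapixel image", 1), ("17,000-megapixel image", 17_000)]:
    size_bytes = megapixels * 1_000_000 * BYTES_PER_PIXEL
    print(f"{name}: ~{size_bytes / 1e6:,.0f} MB per image plane")
```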
While my primary role is within this area, I collaborate closely with colleagues across other critical parts of the SKAO software ecosystem. For example, the real-time processing team, who use FPGA arrays to hunt for pulsars in real-time before any data even touches a disk, and the SRCNet team — architects of a global network of data centres designed to receive, distribute, and further process the outputs from the Science Data Processor (SDP). Together, our efforts ensure that data flows smoothly from telescope to scientist, no matter where in the world the science happens.
The concept for the SKA telescopes dates all the way back to the late 1980s. But for decades, it remained an ambitious idea, waiting for computer hardware to catch up. Astronomers always knew that to unlock the SKA’s potential, we’d need computing systems fast enough — and affordable enough — to process the staggering volumes of data produced by its two massive telescopes.
That’s why the SKA telescopes are often described as “software telescopes”. Rather than relying on a single, gigantic dish, the SKA combines data from hundreds of dishes and thousands of antennas through software, creating images far superior to anything achievable with even the largest single-dish telescope.
Without HPC systems, the SKA telescopes simply wouldn’t be possible. HPC is the beating heart of the project, enabling us to turn torrents of raw signals into precise, high-resolution snapshots of the universe.
Image Copyright: SKAO · Collage of simulated images of future SKA-Low observations, showing what the telescope is expected to be able to produce as it grows in size. The images depict the same area of sky as that observed in the first image from a working version of the telescope, released in March 2025. Top left: By 2026/2027, SKA-Low will have more than 17,000 antennas and will become the most sensitive radio telescope of its kind in the world. It will be able to detect over 4,500 galaxies in this same patch of sky. Top right: By 2028/2029, SKA-Low will count over 78,000 antennas and be able to detect more than 23,000 galaxies in this field. Bottom: The full SKA-Low telescope will count more than 130,000 antennas spread over 74 km. Similar observations of this area will be able to detect some 43,000 galaxies, while deep surveys performed of this area of the sky from 2030 will be able to reveal up to 600,000 galaxies.
Right now (May 2025), I’m focused on integrating MSv4 support into a key software tool called DP3 (pronounced “DP cubed”). DP3 plays a crucial role in the SKA data pipeline: it calibrates the raw signals we receive from the telescopes and flags corrupted data, ensuring the data is scientifically accurate. By adding MSv4 support, we’re enabling distributed calibration — allowing us to spread the workload across multiple computing nodes, with recent benchmarks showing a 20× improvement in throughput over the widely used MSv2 format.
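To make the idea of distributed calibration concrete, here is a minimal sketch of the pattern: independent chunks of visibility data are spread across Dask workers and solved separately. The chunking scheme, the placeholder calibrate_chunk routine and the data shapes are my own illustrative assumptions; they do not reflect DP3’s internals or the MSv4 API.

```python
# Minimal sketch of spreading a calibration step across workers with Dask.
# The chunking and the "solve" are illustrative placeholders only.
import numpy as np
from dask.distributed import Client


def calibrate_chunk(vis: np.ndarray) -> np.ndarray:
    """Placeholder per-chunk 'gain solve': normalise by mean amplitude."""
    gain = np.abs(vis).mean()
    return vis / gain


if __name__ == "__main__":
    client = Client()  # in production this would point at a multi-node cluster

    # Fake visibility data with shape (time, baseline, channel).
    rng = np.random.default_rng(42)
    vis = rng.standard_normal((64, 351, 256)) + 1j * rng.standard_normal((64, 351, 256))

    # Partition along the time axis so each block can be solved independently.
    chunks = np.array_split(vis, 8, axis=0)
    futures = client.map(calibrate_chunk, chunks)
    calibrated = np.concatenate(client.gather(futures), axis=0)
    print(calibrated.shape)  # (64, 351, 256)
```

The appeal of a chunk-friendly format is that partitions like these can be handed straight to workers, without first untangling a monolithic on-disk table.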
Day to day, my work is hands-on software development. I spend much of my time writing and optimising code, but in a project of this scale and complexity, collaboration is just as critical as coding. With such a diverse software ecosystem, we work closely as a team to stay aligned and ensure we’re all moving towards our shared goals.
Beyond being part of the SDP software team, I’m also part of a collaborative working group that includes key contributors from the US National Radio Astronomy Observatory (NRAO) and the South African Radio Astronomy Observatory (SARAO). We meet weekly to align our software solutions, share technical insights, and ensure tight integration across our international efforts. This close collaboration is vital to ensure our tools work seamlessly across observatories and deliver the best possible outcomes for the global radio astronomy community.
And it’s not just about coding — I also work alongside teams contributing to the SDP roadmap and long-term vision. Together, we’re constantly refining our priorities and strategies to realise the ambition of building the SKA telescopes.
I’m a physicist by training — my PhD focused on writing software to simulate magnetic systems at the tiniest length scales. Interestingly, I was never particularly drawn to astronomy during my studies. But once I joined the SKA project, I quickly discovered that radio astronomy comes with its own rich history of specialised data processing techniques, as well as some truly unique challenges. One of the first hurdles I faced was the sheer breadth of new terminology and concepts specific to the field, especially those arising from the difficulties of observing the universe amidst a world saturated with wireless signals.
Once I found my footing in the landscape of radio astronomy, the real technical challenge emerged: squeezing every last ounce of performance from the hardware we have. This is because, despite being a €2 billion flagship project, the SKAO operates with a relatively modest budget for HPC infrastructure. My colleagues and I are deeply immersed in the fine details of both the data and the hardware, constantly engineering robust and scalable solutions to push the limits of what’s possible. It’s a challenge that demands not just technical skill, but creativity — one of the things that makes working on the SKA project so rewarding.
Image Credit and Copyright: SKAO/Cassandra Cavallaro · Author: Cassandra Cavallaro
The historic Lovell Telescope reflected in the window of SKAO Global HQ, UK.
As part of the SDP development, we continually test our software and hardware against progressively larger volumes of data, all with the goal of hitting the required performance and scalability targets by the time the telescope comes online towards the end of the decade. Among all the challenges, the most demanding HPC bottleneck is unquestionably I/O performance. To realise the full SKA vision, our Science Processing Centres in Cape Town and Perth will need to sustain average read and write speeds of around 8 terabytes per second, 24 hours a day — a staggering figure.
While that might sound achievable in comparison to high-end data centres, the SKA project’s challenge lies in the complexity of our data access patterns. Unlike workloads that are “embarrassingly parallel”, where tasks can be distributed independently across compute nodes, our workflows require tightly coordinated data movement governed by the physics of radio astronomy. This means we have to choreograph data access across many nodes with precision. Achieving this level of orchestration is an ongoing frontier in HPC, and it’s pushing us right up against the limits of today’s hardware and software architectures.
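For a sense of what sustaining that 8 terabytes per second actually implies, a quick back-of-the-envelope calculation is useful; the node count below is a purely illustrative assumption, not the actual SDP design.

```python
# Rough arithmetic behind the SDP I/O target (illustrative numbers only).
TARGET_TB_PER_S = 8              # sustained aggregate read/write rate
SECONDS_PER_DAY = 24 * 3600

daily_volume_pb = TARGET_TB_PER_S * SECONDS_PER_DAY / 1000
print(f"Data moved per day at that rate: ~{daily_volume_pb:,.0f} PB")

# If the load were spread evenly across, say, 1,000 compute nodes,
# each node would still have to sustain:
ASSUMED_NODES = 1_000
per_node_gb_s = TARGET_TB_PER_S * 1000 / ASSUMED_NODES
print(f"Per-node bandwidth with {ASSUMED_NODES} nodes: ~{per_node_gb_s:.0f} GB/s, around the clock")
```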
Wherever possible, we aim to use battle-tested, off-the-shelf solutions. This allows our developers to focus their efforts on the truly unique challenges of radio astronomy, rather than reinventing the wheel. Importantly, the software we build for the SKA telescopes isn’t just for internal use — it’s made available to the global radio astronomy community. That means there’s a strong emphasis on modernising and optimising existing tools to make them more robust, sustainable, and accessible for a wide range of users.
That said, we’re always keeping a close eye on advances in software, hardware, and infrastructure, and we actively adopt the latest technologies in data storage and numerical computing where they offer real benefits. Some of our most complex challenges centre around managing data dependencies efficiently. We work hard to minimise inter-process communication (IPC) and avoid data access contention, both of which are critical for scaling our software effectively across an entire HPC cluster. Solving these problems often requires creative, novel approaches to data orchestration — it’s at this intersection of proven technologies and innovative problem-solving where much of the SDP software progress happens.
I’ve been a strong advocate for integrating object storage technologies — widely adopted in cloud computing — into the SKAO’s Science Data Processor (SDP). This approach is still quite unconventional in HPC environments, but I believe it represents the start of a paradigm shift. As HPC increasingly converges with data science and “big data” workloads, the required performance characteristics of these systems are evolving. Our choice of underlying technologies needs to evolve too.
Object storage offers flexibility and scalability that align well with the data-intensive nature of the SKA telescopes. However, introducing new technologies into an established domain like HPC is never straightforward. From experience, I know it can be challenging to build consensus and overcome natural resistance to change. But, by demonstrating the tangible benefits of these approaches, I hope to help pave the way for a new generation of HPC systems that are better suited to the data challenges of modern scientific research.
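As a deliberately simplified illustration of the access pattern that object storage suits well, the sketch below writes and then reads back a single data chunk through an S3-compatible interface using boto3. The endpoint, credentials, bucket and key naming are hypothetical placeholders rather than SKAO infrastructure, and the real SDP storage layer is considerably more involved.

```python
# Sketch: storing a data chunk as an object in an S3-compatible store.
# Endpoint, credentials, bucket and key layout are placeholders.
import io
import numpy as np
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://object-store.example:9000",  # e.g. a MinIO/Ceph endpoint
    aws_access_key_id="...",
    aws_secret_access_key="...",
)

chunk = np.random.standard_normal((512, 512)).astype(np.float32)

# Serialise the chunk and upload it under a key that encodes its position,
# so many workers can read/write disjoint chunks without contention.
buffer = io.BytesIO()
np.save(buffer, chunk)
s3.put_object(Bucket="sdp-staging", Key="image/chunk_t0012_f0034.npy",
              Body=buffer.getvalue())

# Any node can later fetch exactly the chunk it needs.
obj = s3.get_object(Bucket="sdp-staging", Key="image/chunk_t0012_f0034.npy")
restored = np.load(io.BytesIO(obj["Body"].read()))
assert np.array_equal(chunk, restored)
```

The design point is that each chunk is an independent, addressable object, so many workers can read and write disjoint pieces of a dataset concurrently without fighting over a shared file and its locks.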
Not all of my work is about writing code to make software faster — a significant part of my role involves running hardware optimisation experiments to guide critical infrastructure decisions. When you’re dealing with data at the scale of the SKA project, the amount of system memory (RAM) in each machine can have an outsized impact on performance.
In one case, my team and I observed a 4× performance boost by doubling the amount of RAM available. With the larger memory footprint, the operating system was able to cache all of the data accessed by other processes in memory, dramatically reducing the number of read operations from high-latency storage like our Lustre partitions. Crucially, this also increases the bandwidth available for ingesting data from the telescope. What makes this particularly exciting is that it’s a performance improvement that doesn’t require huge investment. By carefully matching memory configurations to our projected data volumes, we can achieve significant gains cost-effectively.
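The underlying effect is easy to reproduce at small scale. The micro-benchmark below (the file path and size are arbitrary choices) reads the same file twice; the second pass is typically far faster because the data is served from the operating system's page cache rather than from storage.

```python
# Micro-benchmark: cold(ish) vs. warm reads of the same file.
# On the second pass the data is usually served from the OS page cache,
# provided enough free RAM is available to hold it.
# For a genuinely cold first read, drop the cache beforehand
# (on Linux, as root: sync; echo 3 > /proc/sys/vm/drop_caches).
import os
import time

PATH = "/tmp/cache_demo.bin"      # arbitrary test location
SIZE = 2 * 1024**3                # 2 GiB of test data
BLOCK = 64 * 1024**2              # read/write in 64 MiB blocks

# Create the test file once.
if not os.path.exists(PATH) or os.path.getsize(PATH) != SIZE:
    with open(PATH, "wb") as f:
        for _ in range(SIZE // BLOCK):
            f.write(os.urandom(BLOCK))

def read_all(path):
    with open(path, "rb") as f:
        while f.read(BLOCK):
            pass

for label in ("first (cold-ish) read", "second (warm) read"):
    start = time.perf_counter()
    read_all(PATH)
    elapsed = time.perf_counter() - start
    print(f"{label}: {SIZE / 2**30:.1f} GiB in {elapsed:.2f} s "
          f"({SIZE / 2**30 / elapsed:.1f} GiB/s)")
```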
A huge part of a radio astronomer’s work today involves painstaking data processing — something the SKAO must fully automate because the vast data volumes are too large to transmit over the internet. By taking this burden off researchers, the SKAO will accelerate the pace of discovery, freeing scientists to focus on new ideas and groundbreaking research rather than manual data wrangling. It’s a transformation that will be a tremendous benefit to the global radio astronomy community.
The work I do directly enables the SKA telescopes to image structures in the universe with far greater sensitivity than ever before. By combining extreme sensitivity with unprecedented resolution, we’re opening a window into the faintest structures in the cosmos, allowing us to probe the physics of the distant past. With the SKA telescopes, we’ll be able to study the evolution of dark matter, investigate processes happening at the atomic scale in black hole jets, and even detect the very first light emitted in the universe.
One particularly exciting frontier is pulsar research. Pulsars — rapidly spinning neutron stars, the collapsed remnants of massive stars, with some of the strongest magnetic fields known in the universe — are a relatively new focus in astrophysics. The SKA telescopes will vastly expand our ability to discover, characterise, and track pulsars — providing unparalleled opportunities to test the limits of Einstein’s theory of General Relativity.
Image: A pulsar neutron star, a source of radio emission in space.
Perhaps most thrilling of all, the SKA telescopes’ extraordinary sensitivity will play a crucial role in advancing gravitational wave astronomy. Experiments like LIGO, among the most sensitive scientific instruments ever built, can detect ripples in space-time but struggle to differentiate between genuine gravitational waves and local interference — even something as mundane as a microwave oven being switched on in a nearby town. The SKA telescopes will act as a verification tool, confirming the astrophysical origins of LIGO’s signals. When a candidate gravitational wave is detected, the SKAO would be alerted; its telescopes could then be put into an emergency observation mode and rapidly pointed at the source. I’m convinced that when we first observe two black holes merging through a telescope, rather than only detecting their gravitational waves, it will be thanks to the SKAO.
What excites me most about working on the SKA project is the chance to learn from, and collaborate with, some of the brightest minds in radio astronomy. I get a unique, behind-the-scenes view of how groundbreaking science is done, and I have the privilege of helping to develop new techniques that will directly contribute to published research and future discoveries.
There’s something incredibly rewarding about knowing that the tools and systems I’m helping to build could play a role in answering some of the biggest questions about our universe. And one day, when a Nobel Prize in Physics is awarded for discoveries made with SKA data, I’ll be able to say: I helped make that possible.
In building automated, near-real-time data pipelines for the SKA telescopes, there’s simply no room for error. Unlike traditional data processing workflows, we can’t afford to stop and restart if something goes wrong — the data is flowing continuously, and the telescopes need to be operational almost 100% of the time. That means the software I’m developing has to work flawlessly, first time, every time. It’s a level of precision and reliability that pushes us to write the most robust and dependable code of our careers.
I’m helping pave the way for scalable, high-performance storage systems that will power radio telescopes for decades to come.
Image Copyright: SKAO