South African Population Research Infrastructure Network


Processing Longitudinal Population Data at the Centre for High-Performance Computing

2 December 2021

“SAPRIN’s vision aligns with that of the National Department of Science and Innovation, which is to foster high-quality science beneficial to South Africa and its people. To facilitate this public benefit of science, an interconnected network of organisations helps one another work towards that goal. One organisation that has been highly beneficial for SAPRIN is the Centre for High Performance Computing (CHPC) in the National Integrated Cyber-Infrastructure System (NICIS), located in the Council for Scientific and Industrial Research (CSIR) in Pretoria.

This week, Kobus Herbst presented on behalf of SAPRIN at the 2021 National Conference of the Centre for High Performance Computing. His presentation was titled ‘Processing longitudinal population data using CHPC’.

A fundamental principle for a national research infrastructure is accessibility. At SAPRIN we receive anonymised data from the HDSS node operations and prepare them, through harmonisation and integration, for multicentre longitudinal analysis, at which point they are released for public access. The requisite data processing is done through the CHPC. In Kobus’ presentation you will see a breakdown of some impressive numbers. A typical SAPRIN dataset covers 570 000 individuals who have ever resided in a SAPRIN node, representing 4,5 million person-years of observation, or 1,6 billion person-days. The observed person-days are re-aggregated computationally into periods of time (or episodes) during which a person’s residence and other status variables do not change. The final database for public release contains over a million episodes belonging to over half a million people. Thanks, CHPC, for helping us process this vast amount of information at manageable speeds.”
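To make the re-aggregation step concrete, here is a minimal sketch of how daily observations could be collapsed into episodes. It is an illustration only, not SAPRIN's actual pipeline: the names `PersonDay`, `Episode` and `to_episodes`, and the idea of holding the status variables in a tuple, are assumptions made for this example.

```python
from datetime import date, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class PersonDay(NamedTuple):
    """One observed day for one person (hypothetical record layout)."""
    person_id: int
    day: date
    status: Tuple[str, ...]  # e.g. (residence, other status variables)


class Episode(NamedTuple):
    """A run of consecutive days over which the status does not change."""
    person_id: int
    start: date
    end: date  # inclusive
    status: Tuple[str, ...]


def to_episodes(person_days: Iterable[PersonDay]) -> List[Episode]:
    """Collapse person-days into episodes of unchanged status."""
    episodes: List[Episode] = []
    current = None
    # Sort so that each person's days are adjacent and in date order.
    for rec in sorted(person_days, key=lambda r: (r.person_id, r.day)):
        extends_current = (
            current is not None
            and rec.person_id == current.person_id
            and rec.status == current.status
            and rec.day == current.end + timedelta(days=1)
        )
        if extends_current:
            # Same person, same status, next calendar day: lengthen the episode.
            current = current._replace(end=rec.day)
        else:
            # Anything else (new person, changed status, gap) starts a new episode.
            if current is not None:
                episodes.append(current)
            current = Episode(rec.person_id, rec.day, rec.day, rec.status)
    if current is not None:
        episodes.append(current)
    return episodes
```

In this sketch a gap in observation also closes an episode, which is a design choice of the example rather than something stated above. The quoted figures are consistent with one another: 4,5 million person-years multiplied by roughly 365 days per year gives about 1,6 billion person-days.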