At DUG, keeping our compute environment at peak performance is essential to the smooth operation of our high-performance computers and the huge workloads they handle each day. We work around the clock to keep your jobs running.
Recently we made two considerable upgrades that would ordinarily require significant system downtime.
Did anyone notice? We don’t think so!
First, we made significant power distribution changes for ‘Bruce’, our supercomputer in Perth. The upgrade took weeks of planning and careful process management so that it could be completed without interrupting compute cycles. Not a single piece of hardware went down!
Second, we made a major config change to the batch scheduler. This is a complicated procedure that normally requires the loss of all state, meaning all the jobs in the queue would be lost. However, with some fit-for-purpose software development and careful testing, we made this improvement without any downtime and without losing a single job.