[Service Disruption] Unexpected Outage at NCI

A cooling problem in the data centre for the National Computational Infrastructure (NCI) has required their compute hosts and other infrastructure to be powered off for the time being. This is expected to disrupt all Nectar Research Cloud services being run out of NCI, including all access to virtual machine instances running in the NCI Availability Zone.


At the moment it is unclear how long the problem will last. This post will be updated with more news as it arises.


If you need support as a result of this outage, please contact support@ehelp.edu.au.


We thank you for your patience.


EDIT 2019-09-19: All services at NCI were restored at approximately 17:30 last night, 2019-09-18. If you raised a ticket regarding the outage, an NCI representative will aim to respond to you today.


Since the outage, the NCI Node has been suffering ongoing hardware issues, which manifest as apparently random lockups on hypervisors. We are restarting the affected hypervisors as we identify them, and we will attempt to restart the affected instances.


Because of the apparently random nature of these failures, we cannot predict what may be affected next. However, we currently believe that the failures will continue until we can identify the underlying cause. As such, users with instances at the NCI Node should be prepared for intermittent service, and should ensure that their backups and recovery plans are up to date.


The NCI Node Cinder storage is not affected, so data on Cinder volumes is not at risk.


NCI Cloud Team
