Major Network Upgrade - Service Outage on the Nectar Cloud 29 March 2022

Posted over 2 years ago by Jo Morris

Jo Morris Admin

Upgrade Progress Update: 5.50pm AEDT 29 March 2022

Our SDN migration didn't work out as planned, so we've rolled back. Services should be operating as before. Please get in touch with our helpdesk (support@nectar.org.au) if you're experiencing any issues. Thanks for your patience!


Upgrade Progress Update: 12.49pm AEDT 29 March 2022

The Software Defined Networking (SDN) upgrade is underway and continuing. We encountered an issue with the migration from the old system to the new one, which required us to restart part of the process and added several hours. The Nectar Cloud Dashboard was down and is now back up, but much of its functionality is still unavailable.


Original Notification:


We are migrating the backend of Nectar’s Software Defined Network (SDN), which is the software that provides Advanced Networking capabilities such as Tenant Networks, Routers and Floating IP addresses.

We are planning for the migration to occur on 29 March 2022. We will email all Nectar Cloud users 4 weeks prior to the upgrade to notify them of the upcoming service disruption and freeze.


There will be disruptions for all users. Please read below for more details.

What will be affected

We are expecting a one-day (0900 to 1800 AEDT) disruption for this migration. During the migration, the following will happen:


Minor Disruptions

  • OpenStack APIs will be stopped, or put into a read-only mode.

  • Compute Instances, Volumes and other resources (except Advanced Networking) that are working will continue to work (with caveats, read more below)

  • Instances with Classic Networking will continue to function, but modifications will not be allowed. This means that instances will continue to run, but you cannot stop or reboot them, or boot new instances


Major Disruptions

  • All Container Orchestration Service Clusters will lose networking

  • Network traffic for Advanced Networking will cease, while Classic Networking will continue to function

  • Instances with Tenant Networks will need to be rebooted after the migration is completed. We recommend that you save your work and power down your virtual machine before the migration, then start it up again once the migration is completed. Instances that are still running will be hard rebooted when the migration completes.

What we are doing

We are migrating Nectar’s SDN provider from MidoNet to Open Virtual Network (OVN). MidoNet has been in use since 2017. Over the years, new SDN projects have emerged with better reliability and features. One such project is Open Virtual Network (OVN), which has since become OpenStack’s default backend.

Why do we need to migrate

We are migrating from MidoNet to OVN so that the Nectar Research Cloud will benefit from the increased reliability, support and features that OVN provides. Staying with MidoNet is not an option as MidoNet is unmaintained and will be removed from OpenStack.


There are a few known bugs in MidoNet which cause degradation of stability over time, particularly with the usage of Floating IPs. As MidoNet is currently unmaintained, these bugs will not be fixed. MidoNet also prevents us from upgrading the Nectar Cloud and substantial effort would have to be made to reintegrate MidoNet with each new version of OpenStack.


Moving to OVN will allow us to rely on upstream effort in the development, testing and integration of an SDN within the OpenStack space. As OVN is now OpenStack’s default backend, we expect it to improve over time as it attracts more developer attention.

How do I check if I am using Classic Networking or Advanced Networking?

You can check what kind of network you are using by logging in to the Dashboard and navigating to Project > Compute > Overview (this should be the default page).


Under Network quota, if you do not see quota for Floating IPs or Networks, you are only using Classic Networking.


If you see quota for Floating IPs or Networks, you are using Advanced Networking in addition to Classic Networking. You then have to check which of your instances are on tenant networks, because these instances will be hard rebooted.


Note: You have to check this for each project.
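The check above can be written down as a small planning aid. The following is only a minimal sketch in Python, not a Nectar or OpenStack tool: the function names, the data shapes and the example network names are all hypothetical, and the inputs are assumed to be transcribed by hand from the Dashboard for one project at a time.

```python
# Hypothetical helpers for planning around the migration. Nothing here calls
# the Nectar Cloud or any OpenStack API; all inputs are entered by hand.

# Quota labels named in this notice as indicating Advanced Networking.
ADVANCED_QUOTA_LABELS = {"Floating IPs", "Networks"}


def uses_advanced_networking(visible_quota_labels):
    """Return True if the project shows a Floating IPs or Networks quota.

    `visible_quota_labels` is the collection of quota labels you can see
    under Network quota on the Overview page of one project.
    """
    return not ADVANCED_QUOTA_LABELS.isdisjoint(visible_quota_labels)


def instances_to_power_down(instance_networks, tenant_networks):
    """Return the sorted names of instances attached to a tenant network.

    `instance_networks` maps an instance name to the list of network names
    it is attached to. `tenant_networks` is the set of network names that
    are tenant networks in this project (an assumption: you supply this
    set yourself after inspecting the project).
    """
    return sorted(
        name
        for name, networks in instance_networks.items()
        if any(net in tenant_networks for net in networks)
    )


if __name__ == "__main__":
    # Example values are made up for illustration only.
    visible = ["Floating IPs", "Networks", "Security Groups"]
    instances = {
        "web-1": ["my-private-net"],
        "batch-1": ["classic-net"],
    }
    if uses_advanced_networking(visible):
        print(instances_to_power_down(instances, {"my-private-net"}))
    else:
        print("Classic Networking only: no reboot expected for this project")
```

Repeat the check for each project. Any instance the second function returns is a candidate for saving work and powering down before the migration.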
