[Known Issue] Nova snapshot failures
The research cloud is currently experiencing reliability issues with uploads to the Glance image service. This has been ongoing for a couple of months, but the full extent of the problem was only recently discovered. We apologise to users who have been affected and encourage you to report issues such as snapshot errors to the helpdesk whenever they occur. An upstream fix is in progress, and Nectar Core Services is backporting it for deployment across the research cloud.
A small percentage (less than 0.1%) of Glance API upload requests are failing. Unfortunately, each Nova snapshot involves many Glance uploads because the snapshot image is uploaded in chunks. There is no retry logic around these uploads, so the failure of any single chunk causes the whole snapshot operation to fail and leaves the snapshot in the "Error" state.
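To illustrate why such a small per-request failure rate matters, the following minimal sketch estimates the chance that a whole snapshot fails. It assumes that chunk uploads fail independently and uses 0.1% as an upper bound on the per-upload failure rate. The chunk counts are hypothetical and chosen only for illustration; the actual number of uploads per snapshot depends on image size and configuration.

```python
def snapshot_failure_probability(per_upload_failure_rate: float, uploads: int) -> float:
    """Probability that at least one of `uploads` uploads fails.

    Assumes each upload fails independently with the same probability,
    which is a simplification made for this illustration.
    """
    if not 0.0 <= per_upload_failure_rate <= 1.0:
        raise ValueError("per_upload_failure_rate must be between 0 and 1")
    if uploads < 0:
        raise ValueError("uploads must not be negative")
    # The snapshot succeeds only if every single upload succeeds.
    return 1.0 - (1.0 - per_upload_failure_rate) ** uploads


if __name__ == "__main__":
    rate = 0.001  # upper bound: less than 0.1% of requests fail
    for uploads in (10, 100, 1000):  # hypothetical chunk counts
        chance = snapshot_failure_probability(rate, uploads)
        print(f"{uploads:>5} uploads: up to {chance:.1%} chance the snapshot fails")
```

Under these assumptions the failure chance grows quickly with the number of chunks, so larger snapshots are the most likely to end up in the "Error" state.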
Until the fix is in place across all Nectar zones, users may need to retry a snapshot several times before it succeeds. If snapshots continue to fail, please open a support ticket so that we can track your issue and notify you when to retry.
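Users who script their snapshots may find it convenient to automate the retries. The following is a minimal sketch of a generic retry helper, not an official Nectar or OpenStack tool. The `retry` function and its parameters are hypothetical, and `operation` stands for any callable you supply that creates a snapshot and raises an exception when the snapshot ends up in the "Error" state.

```python
import time


def retry(operation, attempts=5, initial_delay=30.0, backoff=2.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    `operation` is a caller-supplied callable (hypothetical here) that
    raises an exception on failure. The wait between attempts starts at
    `initial_delay` seconds and is multiplied by `backoff` after each failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                # Out of attempts: let the caller see the last error.
                raise
            sleep(delay)
            delay *= backoff
```

A caller would pass its own snapshot function, for example `retry(my_snapshot_function, attempts=5)`, where `my_snapshot_function` is a placeholder name. If every attempt fails, the last exception is re-raised, which is a good point at which to open a support ticket.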
- OpenStack Blueprint for buffered-reader https://blueprints.launchpad.net/glance/+spec/buffered-reader-for-swift-driver
- Upstream patch https://review.openstack.org/#/c/120866/