CentOS 6 not booting with kernel-2.6.32-696.30.1.el6.x86_64

If you run CentOS 6 and keep your system up to date, you may have recently installed a kernel update to version 2.6.32-696.30.1. This kernel version fails to boot on instances running on compute nodes with AMD CPUs. The known workaround for this bug causes a massive performance hit and is therefore not viable.


A newly installed kernel does not actually run until the instance is rebooted. By default, multiple versions of the kernel are kept, so if there is an issue with one, another can be booted instead. Unfortunately, when using the instance console in the dashboard, the menu for choosing a different kernel (the "GRUB menu") disappears before a user can make a choice. This is because the default timeout is only 5 seconds, which can and should be increased.


You can check if your instance is running on an AMD or an Intel CPU by running:


lscpu | grep "Model name"


You can check your current kernel version by running:


uname -r


You can check what kernels you have currently installed by running:


rpm -qa 'kernel*' | grep -v firmware


You can check what kernel will be booted into next by default:


grep -E "(title|default)" /boot/grub/grub.conf


The above command lists the bootable kernels and the default value. Please note that the default value is indexed from 0. For example, the following output indicates that the next kernel to boot by default is 4.17.1-1.el6.elrepo.x86_64:


default=1

title CentOS (2.6.32-696.30.1.el6.x86_64)

title CentOS (4.17.1-1.el6.elrepo.x86_64)

title CentOS 6 (2.6.32-696.20.1.el6.x86_64)
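Counting entries from 0 by hand is easy to get wrong. As a rough sketch, the lookup can be scripted; default_boot_title is a hypothetical helper name, and it assumes a GRUB legacy grub.conf with a "default=N" line and "title" lines starting in the first column:

```shell
# Hypothetical helper: print the title that "default=N" points to.
# Entries are counted from 0, in the order the "title" lines appear.
default_boot_title() {
    conf="${1:-/boot/grub/grub.conf}"
    awk '
        BEGIN { want = 0; n = 0 }                 # index 0 is used if no default= line exists
        /^default=/ { want = substr($0, 9) + 0 }  # the text after "default="
        /^title[ \t]+/ {
            t = $0
            sub(/^title[ \t]+/, "", t)            # keep only the entry name
            titles[n++] = t
        }
        END {
            if (want >= n) exit 1                 # index points past the last entry
            print titles[want]
        }
    ' "$conf"
}
```

Running default_boot_title with no argument reads /boot/grub/grub.conf; a different path can be passed as the first argument.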


To adjust the timeout for the GRUB menu, edit the file /boot/grub/grub.conf using your text editor of choice:


#timeout=5

timeout=30


This increases the time given to the user from 5 seconds to 30. Please note that, without user intervention via the dashboard console, this also increases your instance's boot time by 25 seconds.
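If you would rather not open an editor, the same change can be sketched with sed. set_grub_timeout is a hypothetical helper name; it assumes GNU sed (as shipped with CentOS) and that grub.conf already contains a "timeout=" line, and it overwrites that line instead of commenting out the old value:

```shell
# Hypothetical helper: set the GRUB menu timeout, keeping a backup copy first.
# Usage: set_grub_timeout SECONDS [path to grub.conf]
set_grub_timeout() {
    seconds="$1"
    conf="${2:-/boot/grub/grub.conf}"
    cp -p "$conf" "$conf.bak" &&
    sed -i "s/^timeout=.*/timeout=$seconds/" "$conf"
}
```

Run as root, set_grub_timeout 30 would apply the value used above and leave the previous file at /boot/grub/grub.conf.bak.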


SOLUTION:


If you have already rebooted, you are likely unable to boot the instance back up; please submit a ticket. If you have installed kernel 2.6.32-696.30.1 and are running on an AMD compute node, but have not yet rebooted your instance, you have several options:


a) Upgrade to CentOS 7, which is not known to have this issue

b) Remove 2.6.32-696.30.1 and pin your kernel to version 2.6.32-696.29.1 (not recommended, as you are opting out of further kernel security updates)

c) Install and use a mainline kernel (version 4.x.x-x.el6.elrepo.x86_64) from ELRepo. To do this, please follow this procedure:


#Remove kernel 2.6.32-696.30.1

yum remove -y kernel-2.6.32-696.30.1

#Install mainline kernel

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

rpm -Uvh http://www.elrepo.org/elrepo-release-6-8.el6.elrepo.noarch.rpm

yum --enablerepo=elrepo-kernel install -y kernel-ml

#Exclude non-mainline kernel from further updates (otherwise every non-mainline kernel update will set itself as default, and you will need to change the default every time)

echo "exclude=kernel-2.6*" >> /etc/yum.conf

#Ensure the mainline kernel is the default boot kernel, as described above in the "You can check what kernel will be booted into next by default" example

#If you followed the above procedure correctly and in order, the default next-boot kernel version will be 4.x.x-x and not 2.x.x-x.x.x
#If for some reason the default next-boot kernel is not version 4.x.x-x, simply change the "default=x" entry in /boot/grub/grub.conf using your preferred text editor


Bug Report: https://bugs.centos.org/view.php?id=14871#c32055




17/08/2018 UPDATE:


kernel-2.6.32-754.3.5.el6.x86_64 is currently the latest 2.* kernel, and the above issue does not occur when running this kernel.


This means if you've pinned your kernel as described above, you can unpin it and upgrade to kernel-2.6.32-754.3.5.el6.x86_64.
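How to unpin depends on how the kernel was pinned. The sketch below covers only one case, and that is an assumption: the exclude line appended to /etc/yum.conf by the echo command in the procedure above. remove_kernel_exclude is a hypothetical helper name, and GNU sed is assumed:

```shell
# Hypothetical helper: delete the exact line "exclude=kernel-2.6*" from yum.conf,
# keeping a backup copy first. Other exclude lines are left untouched.
# Usage: remove_kernel_exclude [path to yum.conf]
remove_kernel_exclude() {
    conf="${1:-/etc/yum.conf}"
    cp -p "$conf" "$conf.bak" &&
    sed -i '/^exclude=kernel-2\.6\*$/d' "$conf"
}
```

After running remove_kernel_exclude as root, yum update kernel should offer the newer 2.6.32 kernel again. Check the default boot entry in /boot/grub/grub.conf before rebooting.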
