I meant to post this last week, but we know how that goes…
The original title was going to be “iSCSI Boot w/ an HP DL380 G5”.
We first purchased three of these units back in … < SEE PO DB > and got them racked and powered. We allocated one of them to replace NISC40, an image server for a Linux lab. The second we christened CP3 (a replacement for CP2), which will run CPanel for our Academic Faculty, Staff, & Depts. We never really got around to configuring the third one. I believe I spent some time trying to get iSCSI boot working, but I’m not really certain.
Several years later, it’s time for me to revisit the project. I began by downloading the latest binary from HP and reading the updated documentation. The process is fairly straightforward. You configure the Option ROM on the NIC by providing the necessary iSCSI parameters, then tell the BIOS you want to boot iSCSI from the NIC. You reboot and hope that you establish a connection, that somehow your system recognizes the iSCSI LUN as a boot disk, and off you go.
I was able to update all of the firmware and whatnot on the box using a USB key and an HP Firmware CD image. I was able to find out what IP address we had configured the iLO2 port for and uploaded the iSCSI Boot configuration file. I then modified the BIOS to enable iSCSI Boot and to load the Option ROM, but I wasn’t able to establish a connection. I tried changing a few parameters in the iSCSI Boot configuration file but was unsuccessful. To top it off, there weren’t any error messages displayed on the console, so I didn’t know what exactly was going on.
So, to find out what was happening, I configured an old Mac laptop w/ Wireshark (a tedious task that ended up consuming a day and a half, but thank you MacPorts) and sniffed the network traffic. I believe I discovered two major issues.
1. It appears that the Option ROM doesn’t support iSCSI Load Balancing. That is to say, it cannot communicate w/ our VIP and then change its connection to the specific module, as told by the VIP. No problem, we’ll just tell it to communicate w/ a specific module. That workaround, however, raises a question: if redirection has to occur after initialization, once the OS has booted, will the OS comply? There are a few issues with this, but for the moment I have a workaround.
2. I suspect that network initialization between the NIC and the port on the switch doesn’t happen fast enough for the Option ROM to establish a connection and log in to the iSCSI target. The problem doesn’t occur if I have a small hub as an intermediary, since the link is already active. Aaron suspects that we might be able to hard code the port to Gig speeds @ full duplex to get around this issue. Originally I wanted to get iSCSI Boot working before trying that. However, I’m now thinking that this might be more important to verify now, rather than later. It would be a great pain to come all this way to get iSCSI Boot working, only to find out that there is no way to delay the initialization of the Option ROM. Being stuck at 10 Base-T @ half duplex would not be ideal.
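To put a number on the link-timing suspicion in item 2, one option is to plug a laptop into the same switch port and time how long it takes before the target answers at all. The following is a minimal sketch, not what the Option ROM does: the function name `wait_for_portal` and the address in the usage comment are made up for illustration, and it only assumes the target listens on the standard iSCSI port (3260).

```python
import socket
import time

ISCSI_PORT = 3260  # IANA-registered default port for iSCSI targets


def wait_for_portal(host, port=ISCSI_PORT, deadline=60.0, interval=0.5,
                    connect=socket.create_connection,
                    clock=time.monotonic, sleep=time.sleep):
    """Retry a TCP connect until it succeeds or `deadline` seconds pass.

    Returns the elapsed seconds on success, or None if the deadline expired.
    `connect`, `clock` and `sleep` are parameters only so they can be swapped
    out when experimenting; the defaults use the real network and clock.
    """
    start = clock()
    while True:
        try:
            conn = connect((host, port), timeout=interval)
        except OSError:
            # Link not up yet (or target unreachable): give up or try again.
            if clock() - start >= deadline:
                return None
            sleep(interval)
            continue
        conn.close()
        return clock() - start


# Hypothetical usage (192.0.2.10 is a placeholder, not our target):
#   print(wait_for_portal("192.0.2.10"))
```

Start it, plug in the cable, and compare the result on the switch port against the same run through the hub. If the switch port takes noticeably longer, that would support the idea that the Option ROM simply gives up before the link is usable.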
Well, with those two out of the way, my attempts to configure iSCSI boot have so far failed. My first attempt was to simply copy the existing OS installation onto the iSCSI LUN, make the appropriate changes, and hopefully boot that way. Wishful thinking… There is still far too much that is unknown for that to successfully work.
My next step was to configure iSCSI Boot and try a clean OS install, as described in HP’s documentation. There is a KickStart file that helps configure the installation to support an iSCSI LUN from the Option ROM. I managed to get that initial portion to work; however, at the end of the install the post scripts failed and brought the entire thing to a screeching halt. It looks as if the installation didn’t properly create a kernel image on the /boot LUN (no initrd file to be found). I spent some time mucking about in the install trying to see if I could create said kernel image, but w/o a working install I didn’t get too far.
I decided that I would try again w/ a more recent OS release (RHEL 5.4). So, at the end of the day on Friday (10/2/09) I downloaded a newer install image, and on Monday I burnt it to DVD. My hope was that the post install scripts in the KS.cfg file would work w/ this release; if they didn’t, I would be forced to post to HP and give their support line a call to see if they could help. Thankfully, the install completed and the post install scripts of the KS.cfg file ran. Unfortunately, iSCSI boot still didn’t work. The iSCSI connection was initialized at boot by the Option ROM, but GRUB failed to load. All I was able to get was a black screen w/ the letters GRUB in the upper left hand corner. No prompt (GRUB>), just GRUB. I troubleshot this for a bit on Monday. I thought that perhaps the MBR wasn’t properly installed on the iSCSI LUN. Somewhere in the process of trying to fix that I hosed the /boot partition on the physical disks installed in the server and had to spend a few hours getting that back to normal.
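For the "is the MBR even there?" question, a read-only look at the first sector of the LUN is a lot safer than poking at it with install tools. Here is a rough sketch of such a check. The function names are mine, the device path in the comment is a placeholder, and the search for the string GRUB in the boot code area is only a heuristic, not a guarantee that a working GRUB stage1 is installed.

```python
def inspect_mbr(sector):
    """Summarize the first 512 bytes of a disk without modifying anything."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    boot_code = sector[:446]  # bytes 446-509 hold the partition table
    return {
        # A valid MBR ends with the two-byte signature 0x55 0xAA.
        "boot_signature": sector[510:512] == b"\x55\xaa",
        # Heuristic: GRUB's first stage carries its name as a message string.
        "grub_string": b"GRUB" in boot_code,
        # All zeros means no boot loader was ever written here.
        "boot_code_blank": not any(boot_code),
    }


def read_first_sector(device):
    """Read 512 bytes from a block device or image file (read-only)."""
    with open(device, "rb") as f:
        return f.read(512)


# Hypothetical usage, as root, with /dev/sdX standing in for the iSCSI LUN:
#   print(inspect_mbr(read_first_sector("/dev/sdX")))
```

Seeing the word GRUB on screen at all hints that some boot code was loaded from somewhere, so this is only a first sanity check to confirm which disk actually carries it.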
If I can get iSCSI boot to work, that would be great. It would mean that any future RHEL builds (although none are planned and it is highly unlikely that we will continue to run RHEL unless specific software requires it) can be booted from the SAN and have backups (via SAN snapshots) taken care of. The reality is that we will probably employ Ubuntu Server on most new Linux boxes. The unfortunate aspect of this is that even if I do manage to get iSCSI boot working, I’ll have to perform a new clean install of CPanel in order to begin migrating existing Academic hosts from our older install. Regardless, it is a worthwhile project to spend my time on.
The good news is that since the RHEL install worked and the post scripts ran, I had a somewhat capable initrd image to work with. I tried copying that to the /boot partition of the internal disks and using a local /boot to mount an iSCSI root (/). That failed, but at least it got to a certain point before failing. I decided that I could probably take a look at the initrd image and hack it in order to initialize an iSCSI connection and possibly mount root (/) from iSCSI. I would have gotten to it sooner except for the fact that a backup volume on my backup server filled and I was forced to repair the volume / server for the entire day on Tuesday.
Which brings me up to today. I’m happy to report that I have successfully managed to get this system booting root (/) from iSCSI. Now, if I can modify one of the original initrd files to include the additional drivers and scripts of the latest RHEL install, I might be able to copy the local root to another iSCSI LUN and try to boot root off of iSCSI w/ the OS and our CPanel build installed and configured.
Joy,
–Raf