VMware ESXi on Dell PowerEdge internal dual SD module may become unresponsive

Last week I ran into a situation in my homelab where one of my VMware ESXi hosts was unresponsive and disconnected from the vCSA, even though the virtual machines running on it were not affected.

When I tried to open the host’s web interface, I received this error:

[Screenshot: error message shown by the host’s web interface]

I connected to the host via SSH and restarted the management services, since that was my first idea for troubleshooting this behavior, but it did not change the availability of the host. Then I checked hostd.log and vpxa.log:

cat /var/log/hostd.log

cat /var/log/vpxa.log
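Both logs can be very long, so dumping them with cat is rarely the fastest way to spot a problem. A minimal sketch of a filter, assuming the interesting entries contain words such as "error", "warning" or "fail" (the keyword list and the function name recent_log_issues are illustrative choices, not something taken from VMware documentation):

```shell
# Print suspicious entries from the last N lines of a log file.
# Usage: recent_log_issues <file> [lines]   (lines defaults to 200)
recent_log_issues() {
  log_file=$1
  lines=${2:-200}
  # Case-insensitive match on a few assumed keywords.
  tail -n "$lines" "$log_file" | grep -iE 'error|warning|fail'
}

# Example calls on the ESXi host:
#   recent_log_issues /var/log/hostd.log
#   recent_log_issues /var/log/vpxa.log 500
```

The function returns a non-zero status when nothing matches, which makes it easy to tell "no hits" apart from "hits found" in a script.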

The next step was to determine whether the partitions were running out of space by using:

df -h
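If the boot device has stopped answering, a filesystem query like this can block for a long time and take the SSH session with it. A minimal sketch of a guard in plain POSIX sh, so it does not depend on any particular timeout utility being present; the helper name run_with_timeout and the exit code 124 are assumptions made for this example:

```shell
# Run a command in the background and stop waiting for it after <secs> seconds.
# Usage: run_with_timeout <secs> <command> [args...]
run_with_timeout() {
  secs=$1
  shift
  "$@" &
  cmd_pid=$!
  elapsed=0
  # Poll once per second instead of blocking in wait, so the caller
  # gets control back even if the command never finishes.
  while kill -0 "$cmd_pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$secs" ]; then
      # Best effort only: a process blocked on storage I/O may not react.
      kill "$cmd_pid" 2>/dev/null
      echo "timed out after ${secs}s: $*" >&2
      return 124
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  # The command ended on its own; hand back its exit status.
  wait "$cmd_pid"
}

# Example call on the ESXi host:
#   run_with_timeout 10 df -h
```

The polling loop is deliberate: a plain wait would hang together with the command, which is exactly the situation the guard is meant to avoid.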

The df command did not respond and the SSH session timed out. So I tried to check the status of my storage devices using the following commands (KB1014953):

esxcli storage core path list

esxcli storage core device world list

[Screenshot of the command output]

ls -alh /vmfs/devices/disks


So I tried to list the devices using the localcli command, which worked and gave me output about the devices attached to my host. I then looked for the SD card on which ESXi is installed and reviewed its state:

[Screenshot: state of the SD card device as reported by localcli]
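Reviewing the state by eye works for a handful of devices; with more of them a small filter helps. The sketch below assumes the listing came from localcli storage core device list (the post does not name the exact subcommand) and that each device block starts with an unindented identifier followed by indented attributes, one of them being "Status:". The function name list_unhealthy_devices is made up for this example.

```shell
# Read a device listing on stdin and print "<device><TAB><status>"
# for every device whose Status is anything other than "on".
list_unhealthy_devices() {
  awk '
    /^[^ \t]/ { dev = $0 }            # unindented line: device identifier
    /^[ \t]+Status:/ {
      status = $0
      sub(/^[ \t]+Status:[ \t]*/, "", status)
      if (status != "on") print dev "\t" status
    }
  '
}

# Only runs on a host that actually has localcli (assumed subcommand).
if command -v localcli >/dev/null 2>&1; then
  localcli storage core device list | list_unhealthy_devices
fi
```

An empty result means every listed device reports "on"; any line that is printed points at a device worth a closer look.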

I rescanned the adapter to bring the device back online, using the command:

esxcfg-rescan vmhba32

After the command completed successfully, I was able to restart the management services, and the host’s web interface was available again!
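The recovery can be collected into one small helper. This is only a sketch: the post does not show how the management services were restarted, so the two /etc/init.d calls below are an assumption (they are the usual way to restart hostd and vpxa from the ESXi shell), and the names run and recover_boot_adapter as well as the DRY_RUN switch are invented for this example.

```shell
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Rescan the given adapter, then restart the management agents.
# Usage: recover_boot_adapter <adapter>   (vmhba32 in this post)
recover_boot_adapter() {
  adapter=$1
  run esxcfg-rescan "$adapter" || return 1
  run /etc/init.d/hostd restart || return 1   # assumed restart command
  run /etc/init.d/vpxa restart                # assumed restart command
}

# Example: preview the steps without touching the host
#   DRY_RUN=1 recover_boot_adapter vmhba32
```

Stopping at the first failing step is intentional: restarting the agents while the rescan has not succeeded would most likely leave the host in the same state as before.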

Of course I checked the installed firmware and the compatibility matrix twice, and everything seemed to be fine. But there is a potential bug in the internal dual SD module (IDSDM) 1.6 firmware on Dell 13G/14G servers, which I ran into while testing some scenarios in my homelab.

Fixes & Enhancements

– Resolved defect where IDSDM fails to be detected when server is rebooted after running VMWare ESXi certification tests
– Disabled reporting of ERROR5 (Secondary Missing) in non-RAID mode
– Disallows writes to certain registers while a rebuild is happening, correcting misleading information reported to IDRAC
– Fixed potential defect in write protect function for 14G