Check the time. Like… really! Check it! VMware PSC queries Windows Server NTP Service

I won’t forget to double, double, double check a local NTP service ever again. Everyone knows, if something does not work, it’s DNS. I agree. But almost as important as the reverse/forward DNS lookup is the correct time given by a local NTP service. I learned it the really hard way and will probably tell my grandkids about this.

But let’s start from the beginning. I was installing a new virtual environment running on DellEMC VxRail nodes with a separate VMware Platform Services Controller (PSC) and VMware vCenter Server Appliance (vCSA), both version 6.5.

Collecting all the needed information was no big deal. Neither was creating the new DNS records for forward and reverse lookup queries. I tested this step twice to make sure it was working. Then I moved on to the configuration wizard and validated the details with no errors. Everything seemed to be just fine. The installation began…

wizard_validate

Shortly afterwards it stopped with an error during the first boot configuration of the VMware Platform Services Controller.

wizard_fail

I checked the log, checked DNS (again!), and checked the time on the nodes using “esxcli hardware clock get” and “esxcli system time get”, both of which showed the local time. So what was the deal?

I dug like a fanatic through the VxRail Manager installation log looking for a clue. And then it showed up.

2018-01-01 12:00:00,000 – DEBUG – Running: ntpq -p 192.168.1.1

2018-01-01 12:00:00,000 – DEBUG – Output of command ntpq -p 192.168.1.1: 192.168.1.1: timed out, nothing received

***Request timed out

2018-01-01 12:00:00,000 – DEBUG – Output of command ntpq -p 192.168.1.2: 192.168.1.2: timed out, nothing received

***Request timed out
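
Digging through a long installation log by eye is tedious. Here is a minimal sketch, assuming the log lines look like the excerpt above, of how one might pull out the NTP servers whose `ntpq -p` query timed out. The function name `find_ntp_timeouts` and the `sample` text are illustrative only and are not part of VxRail Manager.

```python
import re

# Matches lines such as:
#   ... DEBUG – Output of command ntpq -p 192.168.1.1: 192.168.1.1: timed out, nothing received
# Assumption: the log format is the one shown in the excerpt above.
TIMEOUT_RE = re.compile(r"Output of command ntpq -p (\S+): .*timed out")


def find_ntp_timeouts(log_text):
    """Return the NTP servers whose ntpq query timed out, in order, without duplicates."""
    servers = []
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match and match.group(1) not in servers:
            servers.append(match.group(1))
    return servers


# Sample text copied from the (anonymized) log excerpt in this post.
sample = """\
2018-01-01 12:00:00,000 – DEBUG – Running: ntpq -p 192.168.1.1
2018-01-01 12:00:00,000 – DEBUG – Output of command ntpq -p 192.168.1.1: 192.168.1.1: timed out, nothing received
***Request timed out
2018-01-01 12:00:00,000 – DEBUG – Output of command ntpq -p 192.168.1.2: 192.168.1.2: timed out, nothing received
***Request timed out
"""

print(find_ntp_timeouts(sample))
```

For a real log you would read the file contents and pass them to `find_ntp_timeouts` instead of the sample string.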

It looked like, even though I had added not one but two NTP servers during the validation, neither of them answered. At this point my colleague @JörnRusch joined me to test all network settings made during the implementation and helped make sure the provided NTP servers were working as expected. We took a closer look at the commands used and queried the NTP services directly.

cmd_ntp_fail

w32tm /stripchart /computer:<IPADDRESSORDNSNAME>

shell_ntp_fail

ntpq -p <IPADDRESSORDNSNAME>

This exact issue is described in VMware KB 1035833, which states that an NTP service running on a Microsoft Windows Server with its default configuration may not respond with a time that VMware services accept.

Symptoms

An ESXi/ESX host configured to use a Microsoft Windows 2003 or newer Domain Controller as a time source never synchronizes its clock with a default configuration.

Resolution


Workaround

[…]
ESXi/ESX support synchronization of time with an external NTPv3 or NTPv4 server compliant with RFC 5905 and RFC 1305. Microsoft Windows 2003 and newer use the W32Time service to synchronize time for windows clients and facilitate the Kerberos v5 protocol. For more information, see the Microsoft Knowledge Base article 939322 and How the Windows Time Service Works.
By default, an unsynced Windows server chooses a 10-second dispersion and adds to the dispersion on each poll interval that it remains in sync. An ESXi/ESX host, by default, does not accept any NTP reply with a root dispersion greater than 1.5 seconds.
[…]

Configure Windows NTP Client

ESXi/ESX requires an accurate time source to synchronize with. To use a Windows 2003 or newer server, it should be configured to get its time from an accurate upstream NTP server. For more information, see the Microsoft Knowledge Base article 816042.
[…]
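
To make the root dispersion limit from the KB excerpt more tangible, here is a minimal sketch of the comparison. It assumes the standard NTP header layout from RFC 5905, where root dispersion is a 32-bit fixed-point value (16 bits seconds, 16 bits fraction) at byte offset 8. The helper names `root_dispersion_seconds`, `acceptable`, `query_ntp` and `fake_reply` are made up for this post. The real check happens inside the NTP daemon on the host, so this only illustrates the idea.

```python
import socket
import struct

# ESXi/ESX default limit as quoted in the KB excerpt above.
MAX_ROOT_DISPERSION = 1.5  # seconds


def root_dispersion_seconds(packet):
    """Extract the root dispersion (in seconds) from a raw NTP packet."""
    if len(packet) < 48:
        raise ValueError("an NTP packet has at least 48 bytes")
    # Bytes 8..11: root dispersion, unsigned 16.16 fixed point, network byte order.
    (raw,) = struct.unpack("!I", packet[8:12])
    return raw / 65536.0


def acceptable(packet, limit=MAX_ROOT_DISPERSION):
    """True if the reply's root dispersion does not exceed the limit."""
    return root_dispersion_seconds(packet) <= limit


def query_ntp(server, timeout=5.0):
    """Send a plain client request (NTPv3, mode 3) and return the raw reply."""
    request = b"\x1b" + 47 * b"\x00"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
    return reply


def fake_reply(dispersion):
    """Hypothetical helper: build a synthetic server reply for offline testing."""
    header = struct.pack("!BBbbII", 0x1C, 2, 6, -20, 0, int(dispersion * 65536))
    return header + 36 * b"\x00"


# A reply advertising a 10-second dispersion, as in the KB text, is rejected.
print(acceptable(fake_reply(10.0)))
# A reply with a small dispersion passes the check.
print(acceptable(fake_reply(0.5)))
```

Against a live server you could call `acceptable(query_ntp("<IPADDRESSORDNSNAME>"))`, which needs UDP port 123 to be reachable from where the script runs.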

So we tried the firewall, which luckily also provides an NTP service.

cmd_ntp_success

shell_ntp_success

That looked much more like a successful query against an NTP service! Reverting the validation for the DellEMC VxRail installation and changing the NTP settings to the firewall service worked like a charm. The cluster was up with no errors in the blink of an eye!

Interestingly, I posted about a very similar NTP / vCSA situation a while ago. I hope that, if someone runs into the same situation, this post will be a hint to double-check the name resolution AND the time service. Both services are mandatory for your virtual infrastructure! Keep that in mind.

Kudos to @JörnRusch!

If you feel something needs to be added or edited, feel free to contact me in the comments or by mail.

Share the knowledge!

 

_____

Logs and times were changed for this post.