HA errors after updating to ESX 3.5 U2
Yesterday I updated my whole ESX farm to 3.5 U2 and suddenly ran into a strange problem: a few minutes after the update, one of my test clusters went red, telling me the HA agent had an error. I checked the communities and found that I was not alone; many people had the same issue. There are several guides online on how to fix it, but most of them didn't solve it for me, so here is how I solved it:
First, it seems like this is some kind of DNS issue. So check the hostname in the VI Client. Let's say it's esx1.domain.local
Now log in to the service console and check
/etc/hosts
and make sure the entry matches the name shown in VirtualCenter exactly. Check especially for upper/lowercase mismatches. If your /etc/hosts shows ESX1.domain.local, change it to esx1.domain.local.
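If you have many hosts to check, the case comparison above can be scripted. This is only a minimal sketch: the function name find_case_mismatches and the IP address in the sample are made up for illustration, and it assumes you run it with a recent Python against a copy of the file's contents rather than on the service console itself.

```python
def find_case_mismatches(hosts_text, expected_fqdn):
    """Return (line_number, name) pairs for names that equal
    expected_fqdn only when upper/lowercase is ignored."""
    mismatches = []
    for line_number, line in enumerate(hosts_text.splitlines(), start=1):
        # Drop comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        # The first field is the IP address; the rest are names and aliases.
        for name in entry.split()[1:]:
            if name != expected_fqdn and name.lower() == expected_fqdn.lower():
                mismatches.append((line_number, name))
    return mismatches


# Sample contents of /etc/hosts (the address is a placeholder).
sample = """127.0.0.1 localhost
192.168.0.11 ESX1.domain.local esx1
"""

print(find_case_mismatches(sample, "esx1.domain.local"))
```

Any pair it prints is a line in /etc/hosts where the name differs from the VI Client name only by case and should be corrected.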
That's not all, though. Also check
/etc/opt/vmware/aam/FT_HOSTS.
Your cluster members should be in this file, but only the first part of the DNS name. If your DNS name is esx1.domain.local, only esx1 should be in FT_HOSTS.
If there is any other entry, or you are not sure, simply delete FT_HOSTS and reconfigure your cluster. Then reboot the ESX hosts.
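The FT_HOSTS rule (short names only, nothing else) can be sketched the same way. Note the assumptions: I have not described the exact layout of FT_HOSTS here, so this sketch simply treats the file as whitespace-separated names, and the function names short_name and unexpected_ft_hosts_entries are invented for the example.

```python
def short_name(fqdn):
    """Return the first label of a DNS name: esx1.domain.local -> esx1."""
    return fqdn.split(".", 1)[0]


def unexpected_ft_hosts_entries(ft_hosts_text, cluster_fqdns):
    """Return entries that are not the short name of a cluster member.

    Assumes the file holds whitespace-separated names (an assumption,
    not a documented format). The comparison is case-sensitive on purpose.
    """
    expected = set(short_name(name) for name in cluster_fqdns)
    return [entry for entry in ft_hosts_text.split() if entry not in expected]


members = ["esx1.domain.local", "esx2.domain.local"]
print(unexpected_ft_hosts_entries("esx1\nesx2.domain.local\n", members))
```

If the list is not empty, that is the situation where I would delete FT_HOSTS and reconfigure the cluster as described above.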
In most cases these steps seemed to solve the problem, but not in my test lab. The next day I encountered the error again. Here is what I did in addition, which finally seemed to solve the problem.
Put your ESX hosts in maintenance mode, remove them from the cluster, delete the cluster, and create a new one with a different name. Take your ESX hosts out of maintenance mode and add them to the new cluster. Finally, to be 100% sure, right-click each host and reconfigure HA. All of that together solved the problem for me (I have been keeping an eye on it for a few hours now).