Useful post-installation utility
EsxDiag for VMware ESX Server by Veeam
Author: Anton Petrov
Applies to: ESX Server, EsxDiag
Date: August 08, 2007
The other day the software maker Veeam released EsxDiag, a free open-source tool for post-installation configuration checks on ESX Server 3.x. Quote from the website: “…the product checks service console configuration, the TCP and DNS settings, and checks which scheduled services failed to launch. It also checks the NTP settings (including firewall) and shows you the time difference between the public NTP server and the host.”
Yesterday I gave it a try (see my results below) and would like to share what I learned from it. I have also listed some ESX commands that might help you resolve any FAILED results.
Each ESX subsystem check consists of several sub-checks, each of which is either “critical” or “non-critical”. A subsystem check gets the overall status FAILED if at least one of its critical sub-checks fails.
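To make the rule concrete, here is a minimal Python sketch. SubCheck and overall_status are hypothetical names, not EsxDiag's code, and marking the ntpd process sub-check as non-critical is an assumption chosen to match the Services results in my output further down.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubCheck:
    name: str
    passed: bool
    critical: bool = True  # assumption: sub-checks are critical unless marked otherwise


def overall_status(subchecks: List[SubCheck]) -> str:
    """Return 'FAILED' if any critical sub-check failed, otherwise 'PASSED'.

    Failed non-critical sub-checks are still reported, but they do not
    change the overall result.
    """
    for check in subchecks:
        if check.critical and not check.passed:
            return "FAILED"
    return "PASSED"


# Mirrors the Services section of the output: the ntpd process sub-check
# fails, but it is treated as non-critical there.
services = [
    SubCheck("sshd processes running", passed=True),
    SubCheck("ntpd processes running", passed=False, critical=False),
]
print(overall_status(services))  # PASSED
```

The same failed ntpd process check is critical in the Time synchronization section, which is why that section ends up FAILED.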
Service console network interface
This set of tests checks whether the service console interface (vswif) is configured, operational and connected to a virtual switch (vswitch). It also checks that this vswitch is connected to at least one operational physical network interface (vmnic).
The service console interface IP should be neither broadcast nor multicast.
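A minimal Python sketch of this sub-check, using the standard ipaddress module. The function name is hypothetical, and it assumes the netmask is known (for example from ifconfig); EsxDiag's exact rule may differ.

```python
import ipaddress

LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_valid_sc_ip(address: str, netmask: str) -> bool:
    """Return False if address is multicast or a broadcast address for its subnet."""
    iface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    ip = iface.ip
    if ip.is_multicast or ip == LIMITED_BROADCAST:
        return False
    # /31 and /32 subnets have no separate directed-broadcast address.
    if iface.network.prefixlen < 31 and ip == iface.network.broadcast_address:
        return False
    return True


print(is_valid_sc_ip("192.168.0.32", "255.255.255.0"))   # True
print(is_valid_sc_ip("192.168.0.255", "255.255.255.0"))  # False
```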
If the overall status is FAILED, you may not be able to connect to the ESX host through the service console; in that case refer to the recommendations in the VMware knowledge base at:
http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=1339
Commands used:
ifconfig – to check the service console NIC
esxcfg-vswif – to retrieve the portgroup of the service console and to check that its IP address matches the one in /etc/sysconfig/network-scripts/ifcfg-vswif<SC NIC number>.
esxcfg-vswitch – to see the vswitch associated with the service console portgroup.
esxcfg-nics – to retrieve uplink information.
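The "same as in sysconfig" comparison can be sketched in Python as below. It assumes the ifcfg file uses plain KEY=value lines with an IPADDR key, as Red Hat style network scripts do; how the live address is obtained (ifconfig or esxcfg-vswif) is left out, and the function names are hypothetical.

```python
def parse_ifcfg(text: str) -> dict:
    """Parse Red Hat style KEY=value lines, ignoring blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return settings


def sc_ip_matches_sysconfig(live_ip: str, ifcfg_text: str) -> bool:
    """Compare the live service console address with IPADDR from the ifcfg file."""
    return parse_ifcfg(ifcfg_text).get("IPADDR") == live_ip


# Illustrative contents of /etc/sysconfig/network-scripts/ifcfg-vswif0.
sample = """DEVICE=vswif0
BOOTPROTO=static
IPADDR=192.168.0.32
NETMASK=255.255.255.0
ONBOOT=yes
"""
print(sc_ip_matches_sysconfig("192.168.0.32", sample))  # True
```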
Routing
To pass the routing check, a default gateway must be configured and reachable from the ESX host: EsxDiag treats a reply from the gateway to a ping (ICMP) or ARP request as a successful result. To fix your routing, edit /etc/sysconfig/network and then run /etc/init.d/network restart.
Commands used:
ip route – to show the current routing configuration.
ping and arping – to check whether the default gateway replies to ICMP or ARP requests.
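Here is a Python sketch of the "default gateway is set" and "gateway device is set" sub-checks, parsing the text printed by ip route. The sample output is illustrative and the helper name is hypothetical; the reachability part (ping/arping) is not shown.

```python
from typing import Optional, Tuple


def default_route(ip_route_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (gateway, device) from the 'default' line of 'ip route' output."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != "default":
            continue
        gateway = device = None
        # Look at each keyword that is followed by a value.
        for i, token in enumerate(fields[:-1]):
            if token == "via":
                gateway = fields[i + 1]
            elif token == "dev":
                device = fields[i + 1]
        return gateway, device
    return None, None


sample = """192.168.0.0/24 dev vswif0  proto kernel  scope link  src 192.168.0.32
default via 192.168.0.1 dev vswif0
"""
print(default_route(sample))  # ('192.168.0.1', 'vswif0')
```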
Name resolution
This set of tests checks whether the ESX Server name resolution subsystem and the related network environment (DNS servers) are properly configured and operational. For host lookups, /etc/nsswitch.conf and /etc/host.conf must be configured to use DNS. The list of DNS servers and the default domain (the ‘search’ or ‘domain’ parameter in /etc/resolv.conf) must also be set, and an ‘A’ record must exist for the server’s hostname.
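A Python sketch of these configuration checks, working on the file contents as strings. It assumes the usual glibc syntax (a "hosts:" line in nsswitch.conf, an "order" line in host.conf where "bind" means DNS); the function names are hypothetical and EsxDiag may apply different rules.

```python
def resolver_settings(resolv_conf: str) -> dict:
    """Collect 'nameserver' entries and the default domain from resolv.conf text."""
    nameservers, domain = [], None
    for line in resolv_conf.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver" and len(fields) > 1:
            nameservers.append(fields[1])
        elif fields[0] in ("domain", "search") and len(fields) > 1:
            # First entry of the last domain/search line (the last line wins in glibc).
            domain = fields[1]
    return {"nameservers": nameservers, "domain": domain}


def nsswitch_uses_dns(nsswitch_conf: str) -> bool:
    """True if the 'hosts:' line of nsswitch.conf lists the dns source."""
    for line in nsswitch_conf.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            return "dns" in line[len("hosts:"):].split()
    return False


def host_conf_uses_dns(host_conf: str) -> bool:
    """True if the 'order' line of host.conf includes 'bind' (DNS)."""
    for line in host_conf.splitlines():
        fields = line.split("#", 1)[0].replace(",", " ").split()
        if fields and fields[0] == "order":
            return "bind" in fields[1:]
    return False
```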
The host name is read from the following files, and all of them must contain the same name:
- /etc/vmware/esx.conf
- /etc/sysconfig/network
- /etc/hosts
In addition, /etc/hosts must contain a localhost entry.
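The consistency checks can be sketched in Python as follows. Reading the name out of esx.conf and /etc/sysconfig/network is not shown (the caller passes in what it found), and all names here are hypothetical.

```python
from typing import Dict, Set


def hosts_file_names(hosts_text: str) -> Set[str]:
    """All names (canonical names and aliases) listed in /etc/hosts text."""
    names = set()
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            names.update(fields[1:])
    return names


def check_hostnames(found: Dict[str, str], hosts_text: str) -> Dict[str, bool]:
    """found maps a source file to the host name read from it."""
    names_in_hosts = hosts_file_names(hosts_text)
    distinct = set(found.values())
    hostname = next(iter(distinct)) if len(distinct) == 1 else ""
    return {
        "in_sync": len(distinct) == 1,
        "fqdn": "." in hostname,
        "in_hosts_file": hostname in names_in_hosts,
        "localhost_in_hosts_file": "localhost" in names_in_hosts,
    }


hosts = "127.0.0.1   localhost.localdomain localhost\n192.168.0.32   esx2.company_name.local esx2\n"
found = {
    "/etc/vmware/esx.conf": "esx2.company_name.local",
    "/etc/sysconfig/network": "esx2.company_name.local",
}
print(check_hostnames(found, hosts))
```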
If the status is FAILED, make sure the DNS servers are configured correctly, reachable and answering DNS requests, and check the zone configuration on your DNS servers.
Commands used:
dig – to send DNS requests and check the replies.
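dig does the real work here. To show what a "DNS servers reply to DNS request" check boils down to, this Python sketch builds a minimal A-record query by hand as described in RFC 1035 and sends it over UDP port 53. dns_server_answers is a hypothetical helper, not part of EsxDiag, and it only looks at the reply header, not at the returned address.

```python
import random
import socket
import struct


def build_a_query(hostname: str, query_id: int) -> bytes:
    """Build a minimal DNS query for the A record of hostname (RFC 1035)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def dns_server_answers(server: str, hostname: str, timeout: float = 2.0) -> bool:
    """True if server returns a matching NOERROR response with at least one answer."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_a_query(hostname, query_id), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    if len(reply) < 12:
        return False
    reply_id, flags, _, ancount, _, _ = struct.unpack(">HHHHHH", reply[:12])
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return reply_id == query_id and is_response and rcode == 0 and ancount > 0
```

Usage would look like dns_server_answers("192.168.0.100", "esx2.company_name.local").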
Services
This set of tests checks whether all important services (such as crond, vmware-vmkauthd, xinetd, pegasus, syslog, ntpd, vmware-webAccess, sshd, mgmt-vmware) are enabled at startup for the current runlevel (3 by default) and whether all processes each service depends on are running. Some services are considered non-critical (e.g. ntpd, vmware-webAccess, pegasus), so a FAILED result for them is for information only.
For sshd an additional check is performed, because the init script run with the ‘status’ parameter sometimes reports “sshd is running” even when the main process /usr/sbin/sshd is not running.
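One way to do such an extra check, sketched in Python under the assumption of a Linux /proc filesystem: look for a process whose first command-line argument is /usr/sbin/sshd. The helper names are hypothetical, and EsxDiag may use a different method.

```python
import os
from typing import Iterable, Iterator


def process_running(exe_path: str, cmdlines: Iterable[str]) -> bool:
    """True if any NUL-separated command line has exe_path as argv[0]."""
    for cmdline in cmdlines:
        argv = cmdline.split("\0")
        if argv and argv[0] == exe_path:
            return True
    return False


def read_proc_cmdlines(proc_root: str = "/proc") -> Iterator[str]:
    """Yield the command line of every process listed under /proc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                yield f.read().decode("utf-8", "replace")
        except OSError:  # the process exited while we were scanning
            continue
```

On the host this would be called as process_running("/usr/sbin/sshd", read_proc_cmdlines()). Per-session sshd children rename themselves (e.g. "sshd: root@pts/0"), so only the listening master process matches.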
If the overall status is FAILED, check which services failed to start. You can start them with the commands below.
Commands used:
chkconfig – to check whether a service is enabled at startup for the current runlevel.
service – to start, stop or restart a service;
alternatively, you can run /etc/init.d/<service name> start (or stop/restart).
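A Python sketch of the "enabled on current runlevel" sub-check, parsing the usual chkconfig --list table (service name followed by columns such as 3:on). The sample lines are illustrative, the function name is hypothetical, and runlevel 3 is assumed, matching the default mentioned above.

```python
def enabled_at_runlevel(chkconfig_output: str, service: str, runlevel: int = 3) -> bool:
    """Report whether service is 'on' at runlevel in 'chkconfig --list' output."""
    for line in chkconfig_output.splitlines():
        fields = line.split()
        if not fields or fields[0] != service:
            continue
        for field in fields[1:]:
            level, _, state = field.partition(":")
            if level == str(runlevel):
                return state == "on"
    return False  # service not listed, or no column for this runlevel


sample = (
    "sshd           \t0:off\t1:off\t2:on\t3:on\t4:on\t5:on\t6:off\n"
    "ntpd           \t0:off\t1:off\t2:off\t3:off\t4:off\t5:off\t6:off\n"
)
print(enabled_at_runlevel(sample, "sshd"))  # True
print(enabled_at_runlevel(sample, "ntpd"))  # False
```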
Time synchronization
Many critical applications rely on accurate time synchronization, including Microsoft Active Directory and VMware DRS. With EsxDiag you can check, modify and apply the correct NTP configuration for all ESX Servers within your VI3 environment.
Outgoing UDP port 123 must be open. The remaining checks are executed even if the port appears to be closed, because ‘esxcfg-firewall’ can generate an invalid ‘iptables’ configuration. NTP servers must be specified both in /etc/ntp.conf and in /etc/ntp/step-tickers.
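The "NTP servers are specified" part can be sketched in Python like this. It assumes ntp.conf "server" lines and a step-tickers file that simply lists host names; skipping 127.127.x.x entries (ntpd's reference-clock pseudo-addresses, such as the local clock) is my assumption about what should count. The function names are hypothetical.

```python
def ntp_conf_servers(ntp_conf: str) -> list:
    """Hosts on 'server' lines of ntp.conf, skipping 127.127.x.x reference clocks."""
    servers = []
    for line in ntp_conf.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "server" and not fields[1].startswith("127.127."):
            servers.append(fields[1])
    return servers


def step_tickers(text: str) -> list:
    """Whitespace-separated host names from /etc/ntp/step-tickers."""
    hosts = []
    for line in text.splitlines():
        hosts.extend(line.split("#", 1)[0].split())
    return hosts


def ntp_servers_configured(ntp_conf: str, step_tickers_text: str) -> bool:
    """Both files must name at least one NTP server."""
    return bool(ntp_conf_servers(ntp_conf)) and bool(step_tickers(step_tickers_text))
```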
The specified NTP servers must be reachable over the NTP protocol. The time on the server should be synchronized with the configured NTP servers or with the VMware-recommended pool (vmware.pool.ntp.org).
The current time offset should be reasonable: its absolute value should be less than 15 seconds.
The ntpd service should be enabled for the current runlevel and currently running.
The time synchronization daemon (ntpd) should be operational.
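To illustrate the offset rule, here is a Python sketch of a single SNTP query (version 3, client mode) and the standard NTP offset formula from RFC 4330. It is not necessarily how EsxDiag measures the offset; the 15-second limit is taken from the text, and the function names are hypothetical.

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01
MAX_OFFSET_SECONDS = 15.0      # limit quoted in the text


def clock_offset(t1: float, t2: float, t3: float, t4: float) -> float:
    """NTP offset: t1/t4 = client send/receive, t2/t3 = server receive/transmit."""
    return ((t2 - t1) + (t3 - t4)) / 2.0


def ntp_to_unix(packet: bytes, pos: int) -> float:
    """Convert the 64-bit NTP timestamp at packet[pos:pos+8] to Unix seconds."""
    seconds, fraction = struct.unpack(">II", packet[pos:pos + 8])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server: str, timeout: float = 2.0) -> float:
    """Send one SNTP request (LI=0, VN=3, Mode=3) to UDP port 123 and return the offset."""
    request = b"\x1b" + bytes(47)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t1 = time.time()
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(1024)
        t4 = time.time()
    if len(reply) < 48:
        raise ValueError("short NTP reply")
    # Server receive timestamp at byte 32, transmit timestamp at byte 40.
    return clock_offset(t1, ntp_to_unix(reply, 32), ntp_to_unix(reply, 40), t4)


def offset_is_reasonable(offset: float) -> bool:
    return abs(offset) < MAX_OFFSET_SECONDS
```

For example, offset_is_reasonable(query_offset("192.168.0.100")) would correspond to the "time offset is reasonable" line in my output.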
If the overall status is FAILED, refer to the recommendations in the VMware knowledge base at: http://www.vmware.com/support/kb/enduser/std_adp.php?p_faqid=1339
Commands used:
esxcfg-firewall -q ntpClient – to check whether outgoing UDP port 123 is open.
chkconfig – to check whether the ntpd service is enabled at startup.
service ntpd status – to check whether ntpd is running.
ntpdate -q, or plain ntpdate if the first command fails – to query the NTP servers. The two invocations use different outgoing port numbers, so a UDP datagram from one of them may be blocked by an external firewall while the other gets through.
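A Python sketch for reading ntpdate -q output. It assumes the common "server <addr>, stratum <n>, offset <sec>, delay <sec>" line format, which can vary between ntpdate versions, and treats stratum 0 (no reply) and 16 (unsynchronized) as unusable. The sample output and function name are illustrative.

```python
import re
from typing import Dict

# Assumed format: "server 192.168.0.100, stratum 2, offset 0.002916, delay 0.02573"
SERVER_LINE = re.compile(
    r"^server\s+([^,\s]+),\s*stratum\s+(\d+),\s*offset\s+(-?\d+(?:\.\d+)?)"
)


def parse_ntpdate_query(output: str) -> Dict[str, float]:
    """Map each usable server to its reported offset in seconds."""
    offsets = {}
    for line in output.splitlines():
        match = SERVER_LINE.match(line.strip())
        if not match:
            continue
        stratum = int(match.group(2))
        if 0 < stratum < 16:  # 0: no reply, 16: unsynchronized
            offsets[match.group(1)] = float(match.group(3))
    return offsets


sample = """server 192.168.0.100, stratum 2, offset 0.002916, delay 0.02573
server 192.168.0.200, stratum 0, offset 0.000000, delay 0.00000
"""
print(parse_ntpdate_query(sample))  # {'192.168.0.100': 0.002916}
```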
Network storage
In the current version of EsxDiag only NAS (NFS) and iSCSI storage are tested.
To pass the remote network storage test, all remote NAS (NFS) mounts must be accessible from the ESX host, and it must be possible to list their files.
iSCSI targets must be specified in /etc/vmkiscsi.conf (‘DiscoveryAddress’ parameter); this file is generated when you configure iSCSI with the Virtual Infrastructure Client.
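A Python sketch that pulls the target addresses out of /etc/vmkiscsi.conf. It assumes lines of the form DiscoveryAddress=<host>[:<port>] and falls back to the standard iSCSI port 3260 when no port is given; the exact file syntax on your ESX build may differ, and the function name is hypothetical.

```python
from typing import List, Tuple

ISCSI_DEFAULT_PORT = 3260


def discovery_addresses(conf_text: str) -> List[Tuple[str, int]]:
    """Return (host, port) pairs from DiscoveryAddress=... lines."""
    targets = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and indentation
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "DiscoveryAddress":
            continue
        host, colon, port = value.strip().partition(":")
        targets.append((host, int(port) if colon and port else ISCSI_DEFAULT_PORT))
    return targets


sample = "DiscoveryAddress=192.168.0.58:3260\n# DiscoveryAddress=10.0.0.9\n"
print(discovery_addresses(sample))  # [('192.168.0.58', 3260)]
```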
The ‘/usr/sbin/vmkiscsid’ process must have created sockets for all discovered targets.
To check availability, EsxDiag creates a file with a random name on each iSCSI LUN mounted under the VMFS path.
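The idea behind this probe, sketched in Python: list the mount, then create and delete a randomly named file in it. The .esxdiag-probe- prefix and the function name are hypothetical; on ESX the path would be a datastore under /vmfs/volumes, but the sketch works on any directory.

```python
import os
import uuid


def datastore_writable(mount_path: str) -> bool:
    """List mount_path, then create and remove a randomly named probe file."""
    probe = os.path.join(mount_path, ".esxdiag-probe-" + uuid.uuid4().hex)
    try:
        os.listdir(mount_path)       # the mount must be listable
        with open(probe, "wb") as f:  # and writable
            f.write(b"probe")
        os.remove(probe)              # clean up after ourselves
        return True
    except OSError:
        return False
```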
All remote iSCSI paths (adapter:target:LUN) must be configured correctly and accessible.
If the overall status is FAILED, check that all remote storage hosts/devices are up and reachable over the network. For iSCSI storage adapters, try editing the adapter settings in the Virtual Infrastructure Client (Configuration tab -> Hardware pane -> Storage Adapters), or click Rescan.
You can also rescan iSCSI devices from the service console with esxcfg-swiscsi -s (see the commands below).
Commands used:
esxcfg-nas -l – to get the list of NAS mounts.
esxcfg-vmhbadevs -m – to get the list of iSCSI devices.
vmkfstools – to get additional information on a VMFS device.
vmkiscsi-tool – to get additional information on an iSCSI device.
lsof – to get the list of open sockets.
esxcfg-swiscsi -s – to rescan iSCSI LUNs.
My EsxDiag output:
Prechecking ESX ......................................................... PASSED
Testing ESX subsystems
--------------------- Service Console network interface -----------------------
Check if at least one Service Console NIC is up (vswif0) ................ PASSED
Check if at least one uplink attached to SC NIC is up (vmnic0) .......... PASSED
Check if SC NIC IP is valid (192.168.0.32) .............................. PASSED
Check if SC NIC IP is the same as in sysconfig .......................... PASSED
---------------- Service Console network interface test ----------------- PASSED
---------------------------------- Routing ------------------------------------
Check if the default gateway is set (192.168.0.1) ....................... PASSED
Check if the gateway device is set (vswif0) ............................. PASSED
Check if the default gateway replies to ICMP or ARP requests ............ PASSED
----------------------------- Routing test ------------------------------ PASSED
------------------------------ Name resolution --------------------------------
Check if host.conf is configured to use DNS ............................. PASSED
Checking DNS servers (192.168.0.100, 192.168.0.101) ..................... PASSED
Check if the domain is set (company_name.local) ......................... PASSED
Check if DNS servers reply to ping (192.168.0.100, 192.168.0.101) ....... PASSED
Check if DNS servers reply to DNS request (192.168.0.100, 192.168.0.101) PASSED
Check if the server hostname is in esx.conf (esx2.company_name.local) ... PASSED
Check if the server hostname is in sysconfig (esx2.company_name.local) .. PASSED
Check if the server hostname is in hosts file (esx2.company_name.local) . PASSED
Check if the localhost is in hosts file ................................. PASSED
Check if all discovered hostnames are in sync ........................... PASSED
Check if the server hostname has fully qualified domain name (fqdn) ..... PASSED
Check if the server hostname is in DNS (resolves to 192.168.0.32) ....... PASSED
------------------------- Name resolution test -------------------------- PASSED
---------------------------------- Services -----------------------------------
Check if service is enabled on current runlevel crond ................... PASSED
|_Check if service processes are running ................................ PASSED
Check if service is enabled on current runlevel vmware-vmkauthd ......... PASSED
|_Check if service processes are running ................................ PASSED
Check if service is enabled on current runlevel xinetd .................. PASSED
|_Check if service processes are running ................................ PASSED
Check if service is enabled on current runlevel pegasus ................. PASSED
|_Check if service processes are running ................................ PASSED
Check if service is enabled on current runlevel syslog .................. PASSED
|_Check if service processes are running ................................ PASSED
Check if service is enabled on current runlevel ntpd .................... PASSED
|_Check if service processes are running ................................ FAILED
Check if service is enabled on current runlevel vmware-webAccess ........ PASSED
|_Check if service processes are running ................................ PASSED
Check if service is enabled on current runlevel sshd .................... PASSED
|_Check if service processes are running ................................ PASSED
Check if service is enabled on current runlevel mgmt-vmware ............. PASSED
|_Check if service processes are running ................................ PASSED
---------------------------- Services test ------------------------------ PASSED
---------------------------- Time synchronization -----------------------------
Check if firewall config allows outgoing NTP packets .................... PASSED
Check if ntpd (time synchronization daemon) is enabled at startup ....... PASSED
Check if ntpd is running ................................................ FAILED
Checking NTP servers (192.168.0.100) .................................... PASSED
Trying to reach found NTP servers: 192.168.0.100 ........................ PASSED
Check if time offset is reasonable (0.002916 sec.) ...................... PASSED
---------------------- Time synchronization test ------------------------ FAILED
------------------------------ Network storage --------------------------------
Check if NAS storage is configured (192.168.0.80) ....................... PASSED
Check if NAS hosts reply to ping ........................................ PASSED
Check if NAS mounts are available ....................................... PASSED
Checking iSCSI targets (192.168.0.58:3260) .............................. PASSED
Pinging discovered iSCSI targets ........................................ PASSED
Check if initiator is connected to iSCSI targets ........................ PASSED
Check if iSCSI LUNs are available ....................................... PASSED
------------------------- Network storage test -------------------------- PASSED
SUMMARY:
Service Console network interface test .................................. PASSED
Routing test ............................................................ PASSED
Name resolution test .................................................... PASSED
Services test ........................................................... PASSED
Time synchronization test ............................................... FAILED
Network storage test .................................................... PASSED
As you can see, this script can definitely add value if you want to quickly check a fresh ESX 3.x installation. I’d recommend giving it a try.
EsxDiag is in fact part of Veeam Configurator, a product that helps you manage and control the configuration of your entire Virtual Infrastructure from a single Windows interface. You can read more at http://www.veeam.com/veeam_configurator.asp
The free EsxDiag utility is available at http://www.veeam.com/free-script/.