Fixing The Ever-Crashing VMware Virtual Center Service

This post is about fixing the irritating unreliability of our VMware VirtualCenter Service (VPXD).  We are running the latest and greatest from VMware (vSphere 4.0 Update 1), and our Virtual Center still cannot ride out a SQL database failover.  A brief loss of connectivity between the VPXD process and its remote SQL database, and the service faults.  Sadly, it does not stop outright, in which case we could configure it to auto-restart… it just stops working and never reconnects to the database.

You would think that there would be gripes about this all over the Internet, but complaints are surprisingly rare.  Fortunately I found a lead today:

http://communities.vmware.com/message/1332356

The solution proposed by “embo500” is to trigger a PowerShell script when an “EventID 1000” gets registered by the VPXD service.  This is more or less what I thought we were going to have to do.  I was hoping there were some data source or Virtual Center service settings we could change that would mitigate the problem, but apparently not.

FWIW, here is the code snippet provided by embo in the thread above:

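# Grab the most recent EventID 1000 entry logged by the VirtualCenter service
$logentry = Get-EventLog -LogName Application | Where {$_.EventId -eq "1000"} | Where {$_.Source -eq "VMWare VirtualCenter Server"} | Select -First 1
# Act only when the fault is an ODBC error caused by the SQL Server shutting down
if ($logentry.Message -match "ODBC error")
{
    if ($logentry.Message -match "SHUTDOWN is in progress")
    {
        # Give the database time to come back, then start VPXD
        Start-Sleep -s 30
        Start-Service vpxd
    }
}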

The code provides a good starting place.  However, I think a better approach might be to run this command:

get-viserver

If you get a successful connection result, all is well.  If not, then you need to cycle VPXD.  The VMware PowerShell cmdlets must be loaded for this to work.  However, it turns out that none of this scripting is likely to be needed, as work on our SQL infrastructure has changed the game.
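
For the record, the connection-check idea could be sketched roughly as follows.  This is untested against our environment and rests on a few assumptions: it calls Connect-VIServer (which I believe is the cmdlet that get-viserver maps to in current VMware toolkit releases, so check your version), it reuses the “vpxd” service name from embo's script, and the function name, parameter names and default server name are made up for illustration.

```powershell
# Sketch only: probe Virtual Center and cycle VPXD if the probe fails.
# Assumptions: the VMware cmdlets are already loaded, the service is named
# 'vpxd', and Test-VCenterAndRecover is a hypothetical name.
function Test-VCenterAndRecover {
    param(
        [string]$Server = 'localhost',
        # The probe must return normally on success and throw on failure.
        [scriptblock]$Probe = {
            param($s)
            Connect-VIServer -Server $s -ErrorAction Stop | Out-Null
            Disconnect-VIServer -Server $s -Confirm:$false
        },
        # Action to take when the probe fails.
        [scriptblock]$Recover = { Restart-Service -Name vpxd -Force }
    )
    try {
        & $Probe $Server | Out-Null
        return $true       # Virtual Center answered; nothing to do
    }
    catch {
        & $Recover | Out-Null
        return $false      # probe failed; the recovery action was run
    }
}
```

Run on a schedule (for example, every few minutes from Task Scheduler), a call such as Test-VCenterAndRecover -Server 'localhost' would restart the service only when a real login attempt fails, rather than relying on a particular event log message.  The probe and recovery steps are parameters so that they can be swapped out or dry-run without touching the service.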

We had an additional problem as well… I just converted from a SQL Server 2005 failover cluster based on MSCS to a SQL Server 2008 mirrored database model.  Unfortunately, I could not get the Virtual Center service to respect the failover node specified in the data source selector.  Aargh!  I tried enabling the SQL Server Browser service on the database servers, and opening the firewall ports required for VC to reach it, to see if that would help the VC Server connect.  This was ineffective, as was disabling the named pipe SQL Client transport, as suggested in other forums.

In the end the problem was resolved by using the SQL Native Client Data Source setup tool to test the data source during a mirror failover.  The connectivity test failed with a permissions error!  Why?  Well, VMware Virtual Center requires the use of SQL authentication.  When we set this up on our original SQL 2005 failover cluster, the SQL login used by the VC Service was mapped to the database's “dbo” user (this is the default config for vCenter).  Guess what?  That does not work in a mirror config.  The mapping from SQL login to database user is per server, so the mapping that worked on the principal did not exist on the mirror.  Once I set up a separate user in the database for the Virtual Center login (and assigned it the “dbo” role), the data source started working.
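
If you would rather script the data source test than click through the setup tool, something like the following should do the same job.  It is only a sketch: the function name and its parameters are hypothetical, it assumes a DSN that uses SQL authentication, and it builds the connection string naively (a password containing a semicolon would need escaping).

```powershell
# Sketch only: open an ODBC connection through an existing DSN and report
# the result.  Test-OdbcDsn and its parameter names are hypothetical.
function Test-OdbcDsn {
    param(
        [Parameter(Mandatory = $true)][string]$Dsn,
        [Parameter(Mandatory = $true)][string]$User,
        [Parameter(Mandatory = $true)][string]$Password
    )
    $conn = $null
    try {
        $conn = New-Object System.Data.Odbc.OdbcConnection
        $conn.ConnectionString = "DSN=$Dsn;UID=$User;PWD=$Password"
        $conn.Open()
        # DataSource and Database show which server and database answered
        "OK: connected to $($conn.DataSource), database $($conn.Database)"
    }
    catch {
        # A login or permissions problem on the mirror surfaces here
        "FAILED: $($_.Exception.Message)"
    }
    finally {
        if ($conn) { $conn.Dispose() }
    }
}
```

Running it once before and once after swinging the mirror roles shows whether the login still works on the new principal.  If the DSN is a 32-bit one, run this from a 32-bit PowerShell session so that the DSN is visible.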

Even better news… we now find that Virtual Center rides out mirrored database failover events.  I was able to swing primary/mirror roles between our data centers several times without Virtual Center even noticing.  So much for the old problems of Microsoft failover clusters.
