Fixing The Ever-Crashing VMware Virtual Center Service

Here is a  post on fixing the irritatingly unreliability of our VMware VirtualCenter Service (VPXD).  Here we are running the latest-and-greatest from VMware (vSphere 4.0 Update 1), and our Virtual Center still cannot ride out a SQL database failover.  A tiny loss in connectivity between the VPXD process and it’s remote SQL database, and the service faults.  It does not outright stop, sadly, since we could configure it to auto-restart… it just stops working and never reconnects to the database.

You would think that there would be gripes about this all over the Internet, but it is not so commonly complained about as you might think.  Fortunately I found a lead today:

http://communities.vmware.com/message/1332356

The solution proposed by “embo500” is to trigger a PowerShell script when an “EventID 1000” gets registered by the VPXD service.  This is more or less what I though we were going to have to do.  I was hoping there were some data source or virtual center service settings we could throw that would mitigate the problem, but apparently not.

FWIW, here is the code snippet provided by embo in the thread above:

$logentry = Get-EventLog -LogName Application | Where {$_.EventId -eq "1000"} | Where {$_.Source -eq "VMWare VirtualCenter Server"} | Select -First 1
if ($logentry.Message -match "ODBC error")
{
if ($logentry.Message -match "SHUTDOWN is in progress")
{
Start-Sleep -s 30
Start-Service vpxd
}
}

The code provides a good starting place.  However, I think a better approach might be to run this command:

get-viserver

If you get a successful connection result, all is well.  If not, then you need to cycle VPXD.  You have to have the VMware PowerShell modules loaded for this to work.  However, it turns out that none of this scripting will likely be needed as work on our SQL infrastructure has changed the game.

We had an additional problem as well… I just converted from a SQL Server 2005 failover cluster based on MSCS to a SQL Server 2008 mirrored database model.  Unfortunately, I could get the Virtual Center service to respect the failover node specified in the data source selector.  Aargh!  I tried enabling the SQL Server Browser service on the database servers to see if it would help the VC Server make connectivity (and also the firewall ports required for VC to reach the browser service).  This was ineffective, as was disabling the named pipe SQL Client transport, as suggested in other forums.

In the end the problem was resolved by using the SQL Native Client Data Source setup tool to test the datasource during a mirror failover.  The connectivity test failed with a permissions error!  Why?  Well, the VMware Virtual Center requires the use of SQL authentication.  When we set this up on our original SQL 2005 failover cluster, the database account used by the VC Service was mapped to the local database “dbo” user (this is the default config for vCenter).  Guess what?  That does not work in a mirror config.  The SQL logon accout to database account mapping is per server.  Once I set up a separate account in the database for the virtual center account (and assigned it the “dbo” role), the data source started working.

Even better news… we now find that Virtual Center rides out mirrored database failover events.  I was able to swing primary/mirror roles between our data centers several times without Virtual Center even noticing.  So much for the old problems of Microsoft failover clusters.

Getting cruft objects out of Operations Manager

Recently I had to downgrade a SQL Express instance from the 2008 version back to 2005.  The downgrade solved my DB performance problems, but created a monitoring problem.  Operations Manager continued to believe that this server was running SQL 2008!

So, how do you get rid of a monitored object that is part of a dynamically discovered group?  The answer lies (as with most OpsMgr problems) in overrides:

http://blogs.msdn.com/boris_yanushpolsky/archive/2007/11/20/opsmgr-sp1-removing-instances-for-which-discovery-is-disabled.aspx

Boris of OpsMgr++ fame tells us to use the “Authoring” view in the OpsMgr console to find the “Object Discovery” rule that found your SQL instance (probably “SQL 2008 DB Engine”).  You then generate on override which will disable discovery for your named computer.  Since SQL discovery runs fairly infrequently, you may also want to override the same rule for all computers, forcing discovery to a more frequent interval (say… 300-600 seconds).

After discovery completes, open the OpsMgr PowerShell console, and run the “Remove-DisabledMonitoringObject” cmdlet (with no arguments).  If you are exceptionally lucky, your undesired object will disappear from the OpsMgr Monitoring view in short order.

SQL Server 2008 – Configuration Notes For A New Mirror

With the impending release of SharePoint 2010, I have decided to try to get our backing SQL infrastructure up-to-date, and also to re-architect our system to use SQL mirroring.  This project has been easier than I expected, with a few inevitable bumps along the road:

  • The SQL documentation states that when setting up a new mirrored database connection, you must take a backup of the existing database, and restore it to the mirror server before configuring the mirror.  It is a bit vague on the particulars.  I found out the hard way that you must perform the recovery and not restore the database to a running state, otherwise the mirror setup will fail.  The worst part is that the mirroring wizard will not give a helpful error… it will claim that ‘no network endpoint is available for the mirror connection’, or some such rot, leading you to believe that you have a problem with your firewall.  So, make sure you do the following when setting up the database for mirroring:
    1. Backup the database and transaction logs separately
    2. Restore the database backup with the “WITH NORECOVER” option.
    3. Restore the transaction logs with the “WITH NORECOVER” option.
  • Concerning SQL Express:  SQL Express is approved for use as a Witness server in a SQL mirroring configuration.  However, SQL Express will not work as a Witness out of the box.  To get things running, you will need to:
    • enable the “TCP” protocol in the “SQL Server Configuration Manager” tool.
    • You also may need to configure the instance to use the static TCP port “1433” for SQL Services (also in the Configuration Manager.  On my installation, SQL Express defaulted to the use of dynamic ports for SQL, which made for a good deal of trouble in configuring the Windows firewall.
    • You may need to enable the SQL Browser service in “Configuration Manager”, and open UDP port 1434 to your management station.  Without the browser service, I could not get SS Management Studio to talk to SQL Express.  Boo
  • If configuring databases on Windows Server 2008 or 2008 R2, you will have the option to use the much-improved Windows firewall to secure your SQL instance.  I have chosen to restrict TCP access to SQL services to only those hosts that need connectivity (at least for now).  Unfortunately, SQL 2008 does not create any firewall rules or firewall rule templates for you, so you will need to do it yourself.  Here are the required ports:
    • TCP Port 1433 – For basic client/server connectivity, and remote administration
    • UDP Port 1434 – For the SQL Browser service… not needed in most cases, but required to enable connectivity to SQL Express instances in my case (see above).
    • TCP Port 5022 – The Mirroring Endpoint default port… required open on all SQL servers that will be used in a mirror configuration.  SQL clients do not require connectivity to this port… only primary, mirror, and witness servers in a cluster.
    • TCP Port 135 and Program-based access to “%ProgramFiles%Microsoft SQL Server100DTSBinnMsDtsSrvr.exe” – Required to establish SQL Server Integration Services remote connectivity.  These are DCOM-based services that use those annoying Dynamic RPC ports.  SQL clients should not be allowed to connect to this port.  I have it open only to the systems from which I plan to plan to perform SQL management operations.
  • Concerning Maintenance Plans:  MPs are made somewhat more complicated on systems using mirroring.  Clearly you want your Maintenance Plans to run regardless of where your primary mirror happens to be at any given time.  However, DBCC and backup operations can be performed only on the primary database in a mirror set.  So how do you ensure that your maintenance plans will always run on the primary?  A common solution seems to be to create identical maintenance plans for both mirror servers, but to use the “Ignore databases where the state is not online” flag in the SS Management Studio database selector dialog (“WITH SKIP” T-SQL argument).  This also brings up the question of how you can replicate those tediously configured Maintenance Plans between your different mirrors.  The answer involves SSIS packages and the SQL Server Business Intelligence Development Studio (an optional component in a SQL Server installation):
    • Use SS Management Studio to connect to “Integration Services” on the SQL instance where you have configured your maintenance plans.
    • Navigate to “Stored PackagesMSDBMaintenance Plans”… hey look… all of your maintenance plans are just SSIS packages!
    • Right-click the plan you want to copy, then select “Export Package”.
      • Select “File System” for your package location
      • Browse to the location where you want to save your package and click “OK”.
    • You could now use the BI Development Studio to edit the package file, but the smarter thing to do is to open the file using a text editor.  I like “Notepad++”.  Find and replace all instance of the name of your source SQL server with the name of your destination SQL server.  If you fail to complete this step, your Maintenance Plans will fail with connection errors.  Why?  Because the SSIS export function preserves the local server connection strings, causing your destination system to attempt to backup databases on a remote computer instead of itself.  Boo!
    • Connect to Integration Services on the destination SQL server.  Navigate to the same “Maintenance Plans” directory, right-click and select “Import”.  Again select “File System” for the package location.  Browse to the file you just edited, and click “OK” to import.
    • Verify that the imported package is configured correctly using the Maintenance Plan editor.
    • You will need to update the schedules for your plans because schedules are not an integral part of an MP.  This is “metadata” that does not get exported.
    • Remember to repeat this procedure every time you update your MPs on either server in your mirror configuration.