Tag Archives: OpsMgr

Improving Notifications in System Center Operations Manager 2012

Anyone who depends on System Center Operations Manager 2012 (or any earlier version of SCOM, back to MOM) likely has noticed that notifications are a bit of a weak spot in the product.

To address this, we have use the “command channel” to improve the quality of messages coming out of SCOM.  Building on the backs of giants, we implemented a script that takes an AlertID from SCOM, and generated nicely formatted email and alpha-numeric pager messages with relevant alert details.

More recently, we have identified the need to generate follow-up notifications when an initial alert does not get addressed.  I went back to our original script, and updated it to use a new, custom Alert ResolutionState (“Notified”), and I have added logic to update the Alert CustomField1 and CustomField2 with data that is useful in determining whether or not an alert should get a new notification, and how many times follow-up notifications have been sent.

Heart-felt appreciation goes out to Tao Yang for his awesome work on his “SCOMEnhancedEmailNotification.ps1” script, which served as the core for my work here.

Here is my version… let me know if you have any questions about how it works:

Operations Manager – Discovered Entity Cleanup

We have been tracking a problem with some of our Operations Manager Server 2008 R2 agents. We have a pool of single CPU VMs that have been reporting “Operations Manager Agent CPU too high” alerts every ten hours or so (give or take a few hours). Unfortunately, I am not able to catch the agents while the CPU spike is taking place. Maybe I could set up a “Data Collector Set” to gather lots of process information when a CPU spike condition occurs, but I am feeling lazy and don’t want to do it.

So instead, I am taking a different approach… disabling non-essential discoveries to see if this lightens the load on the agents enough to stop the CPU spikes.  I thought I knew how to do this already, but my fist pass failed, and I had to learn something new (gasp!).  My thanks to Jonathan Almquist for his post on this subject:
http://blogs.technet.com/b/jonathanalmquist/archive/2008/09/14/remove-disabledmonitoringobject.aspx
Without that one, I would still be foundering.

I our case, I wanted to suppress discovery of System Center Configuration Manager 2007 Clients in the SCCM 2007 Management Pack.  To accomplish this, we need to identify the pertinent discovery rules, create a group that contains the agents that we want to exclude from discovery, then override the discovery for this new group.  We then can speed cleanup of the now obsolete discovered objects using the PowerShell “remove-disabledMonitoringObject” cmdlet.

  1. Go to the OpsMgr console, change to the the Authoring->Management Pack Objects->Object Discoveries view.  Use the “change scope” option to limit the displayed discovery rules to only those in the Configuration Manager management packs.  In this instance, we see there are rules for “Microsoft ConfigMgr 2007 Clients Discovery” and “Microsoft ConfigMgr 2007 Advanced Client Discovery”.  I will disable discovery for both of these.  Before moving on, take careful note of the “target” column.  In this case the target is “MOM 2005 Backward Compatibility Computer”, not “Windows Computer”, as you might expect.
  2. Change to the Authoring->Groups view.  Create a group that includes only objects of the type you identified in the first step.  I used dynamic inclusion rules to add all entities that do not match the naming convention of our Configuration Manager servers.
  3. Now go back to the Object Discoveries view, find the rules you want to override again, and add an override for objects in your new group.
  4. You could wait a few discovery cycles for the discovered entities to go away, or just pop into the OpsMgr PowerShell console, and run “remove-DisabledMonitoringObject”.  If you did your override rules properly, your undesirable objects should disappear right away.

I now have removed discovery and monitoring of the SCCM Client on all of the Windows Servers in my monitored environment.  We now shall see if this makes the OpsMgr Agent CPU utilization alerts go away.

Automatic Maintenance Mode after Windows Update

Since our original deployment of Microsoft Operations Manager 2005 (now System Center Operations Manager 2007 R2), we have struggled with handling of alerting during maintenance windows for our servers.  As many a SCOM admin can tell you, your management server can do a whole lot of squawking if you fail to set maintenance mode before a server reboot.

In the past, we configured notification blackout windows during scheduled Windows Update times.  This worked, sort of… While we did not get paged during system reboots, we none-the-less had a lot of alerts that needed to be closed out every time a system rebooted.  Why?  Because suppressing notification does not suppress alerting.  What we really needed to do was to put systems into maintenance mode before the Windows Update triggered reboot.  This always seemed to difficult to implement, so we never did it.  Scripting of Maintenance Mode requires that any account attempting to start MM have "admin" rights on the RMS.  We did not want to go there with distributed scripts, and so we were at an impasse.  The other big disadvantage to this approach was that it suppressed notifications for all systems in the infrastructure during scheduled maintenance windows, not just the ones that needed a reboot at that particular time.  If there were no Windows Update-induced reboots that evening, too bad.

But now, thanks to Group Policy Client-Side Extensions, PowerShell, various improvements in SCOM 2007 R2 Maintenance Mode, and the fine work of blogger and SCOM developer Derek Harkin, we have a workable solution to this problem.

Prerequisites:

  • GP Client-Side Extensions with item-level targeting (requires that CSE update be applied to any Server 2003 systems under management, and also use of a Vista-later system running a current Group Policy Management Console)
  • Derek Harkin’s Maintenance Mode Management Pack for SCOM 2007 R2:
    http://derekhar.blogspot.com/2009/11/new-agent-maintenance-mode.html
  • Our additional VBScript "WUTriggerMaintMode.vbs" to create events triggered by Windows Updates
  • "eventtriggers.exe" on Server 2003 (should be included with the OS)
  • "schtasks.exe" on Server 2008 (definately included with the OS)
  • Windows Scripting Host to run VBScripts on all managed targets (should be there by default…)
  • PowerShell on the RMS (required to install SCOM 2007 R2 anyway…) 

How it Works:

  • Group Policy Preferences run every day to refresh the scripts and Scheduled Tasks on all managed systems.  "MaintModeTrigger2008.xml", "WUTriggerMaintMode.vbs" and "MaintMode.vbs" are distributed to all systems, and tasks are created which perform the following task:
    Watch the "SYSTEM" event log for event ID 22 from source "WindowsUpdateClient".  When this event is seen, run the "MaintMode.vbs script with the arguments "ON 20M"
  • The MaintMode.vbs script will create a WSH entry in the Application event log which will state that the system needs to enter Maintenance Mode for 20 minutes.
  • A trigger set by the Maintenance Mode management pack will detect this event, and run a PowerShell script on the Root Management Server (RMS) that will put the desired system into Maintenance Mode.

Installation:

  1. Configure your Windows Update policy to delay at five minutes after update application before initiating system reboot.
  2. Install the Maintenance Mode management pack following the included directions.
  3. Create a Group Policy object which is linked to your managed servers for the purpose of configuring Windows Update-triggered Maintenance Mode settings.
  4. Copy the MaintMode.vbs script and, our "WUTriggerMaintMode.vbs", and custom XML file to all managed Windows servers using Group Policy Preferences.  I copied my scripts to “%SystemDrive%localscripts” on all managed system, but you could just plan to run the scripts off of a trusted, highly-available network share.
    NOTE: If you will be copying the files from a network share to a local directory using GP Preferences, then you must grant “Domain Computers” read/execute/traverse rights to the parent directory that contains your files.  If you will be executing the files directly from the file server, you need grant “Domain computers” rights to only the files themselves.
  5. Create a Scheduled Task preference targeted to Server 2003 and Server 2003 R2 systems.  This task will use cscript.exe to run "WUTriggerMaintMode.vbs".  I configured this task to run daily, but you could adjust the interval to suit your needs (weekly, monthly, on next reboot, run once, etc.)
  6. Create a Scheduled Task preference targeted to Server 2008 and Server 2008 R2 systems.  This taks will run schtasks.exe to create a scheduled task from the "MaintModeTrigger2008.xml" task specification file.  I used the command line:
    schtasks.exe /Create /RU SYSTEM /TN "WUTriggerMaintMode" /XML [pathToFile]MaintModeTrigger2008.xml /F

    Note:  I was unable to use the "Vista and later" version of the Scheduled Task preference, which is exposed when running GPMC on a Windows 7 client.  I am unclear why this did not work, but just to be safe you probably will need to use the "legacy" Scheduled Task tool.
    Again, you will need to set a schedule for this task that suits your environment.

  7. Force a Group Policy update on one of each OS type that you are managing to ensure that the GP Preferences are being distributed.
  8. Run the Scheduled Tasks that are created by the the policy to make sure that they work (at least on Server 2003, you can create synthetic events in the SYSTEM event log using "EVENTCREATE.exe".  You can set of the event triggers using this tool.).
  9. Enjoy the silence during your next system maintenance.

Required Files:

MaintModeTrigger2008.xml –

the file which creates a Server 2008 scheduled task that will run the MaintMode.vbs script after Windows Update signals that the server will be rebooted (an Event ID 22).

Contents:



  
    2009-12-04T13:05:53.1637625
    CAMPUSadmin-jgm
  
  
    
      PT30M
      true
      <QueryList><Query Id="0" Path="System"><Select Path="System">*[System[Provider[@Name='Microsoft-Windows-WindowsUpdateClient'] and EventID=22]]</Select></Query></QueryList>
    
  
  
    
      HighestAvailable
      S-1-5-18
    
  
  
    
      true
      false
    
    IgnoreNew
    false
    true
    true
    false
    false
    true
    true
    false
    false
    false
    PT1H
    7
  
  
    
      cscript.exe
      %SystemDrive%localscriptsMaintMode.vbs ON 20M
      %systemroot%system32
    
  

WUTriggerMaintMode.vbs –

Uses "eventtriggers.exe" on Server 2003 systems to create an event trigger (and corresponding Scheduled Task) which will watch the system event log for Windows Update-triggered reboots (event ID 22) and run MaintMode.vbs when this happens.  By the way, I know that this is crumby code… it could/shold be updated to provide better error handling and return codes.

Contents:

' WUTriggerMaintMode.vbs - Trigger Maintenance Mode based on pending Windows
' Update-triggered reboot, as seen in the System event log
Option Explicit

Dim bSetTrig
Dim fLog
Dim oShell, oExec, oExec2, oFile
Dim sLine, sSub, sSysDrive

Set oShell = WScript.CreateObject("WScript.Shell")
sSysDrive = oShell.ExpandEnvironmentStrings("%SystemDrive%")

' Create Log File
Set oFile = CreateObject("Scripting.FileSystemObject")
Set fLog = oFile.CreateTextFile(sSysDrive & "localscriptsWUTriggerMaintMo" _
	& "de.log", true)

' Query for existing triggers:
Set oExec = oShell.Exec("eventtriggers.exe /query")

' If Existing triggers previously created by this script are present,
' delete them:
Do While Not oExec.StdOut.AtEndOfStream
	sLine = oExec.StdOut.ReadLine
	fLog.WriteLine(sLine)
	if InStr(sLine,"Windows Update - Reboot") then
		fLog.WriteLine("Existing trigger detected")
		sSub = Left(LTrim(sLine), 1)
		fLog.WriteLine("Trigger ID is: " & sSub)
		set oExec2 = oShell.Exec("eventtriggers.exe /delete /tid " & sSub)
		Do while oExec2.Status = 0
			WScript.sleep 100
		Loop
		fLog.WriteLine("Deletion of Event trigger attempted.")
	end if
Loop

' Create latest event trigger:
set oExec2 = oShell.Exec("eventtriggers.exe /create /tr ""Windows Update - R" _
	& "eboot Impending"" /l SYSTEM /eid 22 /tk ""cscript.exe %SystemDrive%l" _
	& "ocalscriptsMaintMode.vbs ON 20M"" /ru ""System"" ")
Do while oExec2.Status = 0
	WScript.sleep 100
Loop
fLog.WriteLine("Creation of event trigger attempted.")

Getting cruft objects out of Operations Manager

Recently I had to downgrade a SQL Express instance from the 2008 version back to 2005.  The downgrade solved my DB performance problems, but created a monitoring problem.  Operations Manager continued to believe that this server was running SQL 2008!

So, how do you get rid of a monitored object that is part of a dynamically discovered group?  The answer lies (as with most OpsMgr problems) in overrides:

http://blogs.msdn.com/boris_yanushpolsky/archive/2007/11/20/opsmgr-sp1-removing-instances-for-which-discovery-is-disabled.aspx

Boris of OpsMgr++ fame tells us to use the “Authoring” view in the OpsMgr console to find the “Object Discovery” rule that found your SQL instance (probably “SQL 2008 DB Engine”).  You then generate on override which will disable discovery for your named computer.  Since SQL discovery runs fairly infrequently, you may also want to override the same rule for all computers, forcing discovery to a more frequent interval (say… 300-600 seconds).

After discovery completes, open the OpsMgr PowerShell console, and run the “Remove-DisabledMonitoringObject” cmdlet (with no arguments).  If you are exceptionally lucky, your undesired object will disappear from the OpsMgr Monitoring view in short order.

OpsMgr Severity/Priority Levels: “What ‘IS’ is.”

When working with OpsMgr overrides, I am always forgetting the mappings between alert severities and their corresponding numeric values in the database.  It is important to keep this straight, because if you set your overrides incorrectly, you risk either suppressing all notification for an alert, or even worse… increasing the number of notifications that you receive!

Marius provide the following mapping info in his fine blog on MSDN:

Mapping:

Alert Severity – Its corresponding integer value

Critical – 2
Warning – 1
Information – 0

Alert Priority – Its corresponding integer value

High – 2
Medium – 1
Low – 0

Read more here:

http://blogs.msdn.com/mariussutara/archive/2007/12/17/alert-severity-and-priority-use-with-override.aspx

So remember, when downgrading an alert from "Critical" to "Warning", change in from "Severity 2" to "Severity 1".  "Severity 3" will just cause more paging… TWTTTTH!