Microsoft Data Protection Manager 2007 – Evaluating the MS Solution

Ever have a disaster with one of your servers?  No?  You lucky bastard…

Recently we had corruption of a number of our Virtual Machines (caused by a fault in the firmware of our “enterprise” storage system from a Nameless Mainstream Vendor, which was triggered by unexpected filesystem behavior from an Evil Mainstream Company’s virtualization platform).  This event forced us to exercise our system disaster recovery tools (also from a Nameless Evil Mainstream Company), and led us to discover that some backup products just don’t do DR.  There is much that could be said about that, but I will leave it there.

Anyway, we thought we would have a look at Microsoft DPM 2007 SP1 to see if the DR story there is any better.

Here are some sticking points I have hit while evaluating the product:

  • Server Recovery Tool (SRT) – This is the Bare Metal Recovery component of DPM.  As it turns out, it can protect only Server 2003 and XP systems.  Server 2000 is a no go (no tears here…), as are Server 2008 and Server 2008 R2 (aargh!).  SRT is pretty easy to set up and configure, but keep in mind that it must run on a Server 2003 OS (not Server 2008!).
  • DPM Reporting Services – The DPM installer creates an IIS instance on your machine, and configures/installs SQL 2005 with Reporting Services.  Very nice!  Unfortunately, the installer misses one critical IIS setting when installed on Server 2008:
    http://scdpm.blogspot.com/2009/07/reporting-does-not-work-with-dpm-2007.html
    You need to edit the feature permissions on the “HTTP Handler Mappings” feature on the Reporting Services IIS site to allow “Script” access (not script execution, just script access).  After that, you should be able to run reports from the DPM console.
  • Updates – In addition to SP1, there are numerous hotfix rollup packages available, of which you should take advantage. 
  • SharePoint Services Backup – The documentation on WSS backup is actually very good, but it is quite scattered.  A few sticking points for me were:
    • You must configure your WSS VSS Agent to run as an account that has both local Administrator rights on the SharePoint WFE, and Farm Administrator rights.  The writer must be configured after installing the DPM agent and before attempting backup.  Use “ConfigureSharePoint.exe” in the DPM bin directory to make these changes.
    • You also can configure backup of MOSS Search, which is accomplished using a different switch to the “ConfigureSharePoint.exe” tool.
    • The VSS updates mentioned above are required before backing up a WSS farm.
  • Server 2008 DR – Documentation on this really bites.  I sent the following feedback to the DPM whitepaper team for their document on Server 2008 Bare Metal recovery (How to do Bare Metal Recovery of WS08 with DPM 2007 SP1):
    • There are a few points in the white paper that require some clarification as they will confuse most readers.  I also have a few questions about the reasoning behind some of the steps in the document.

      1. In the section "Before you create the protection group" under "Configuring Backups for BMR" it is unclear on which system you should be performing these steps.  A seasoned DPM admin will be able to figure out that these steps need to be performed on every system which will be backed up, but a newbie will not know this.  The instructions should be more explicit. 
      2. The instructions have us create a share on the local server rather than simply backing up the WSB image to a named volume.  Why?  It would seem that backing up to a local share adds unnecessary complexity to this operation.  Using a local volume will be simpler and more secure. 
      3. Similarly, in the recovery instructions we are told to restore the system image to a local share.  Why?  In a bare metal recovery scenario, there is no local share to recover to!  The server referred to as "%computername%" in the recovery will likely be offline, and thus not available as a recovery target.
      4. In step 3 of "Configuring Backups…" we are instructed to add the PreBackupScript commands to the "PSDataSourceConfig.xml" file, but we are not told within which XML tags to insert the code.  I was unable to make Pre-backup scripts run when following these instructions.  Instead, I placed the code snippet into "ScriptConfig.xml" (where other documentation suggests that this code actually belongs), and my backup jobs then started to run.
      5. There is no guidance here about the frequency with which BMR sets should be created.  Unfortunately, I can find very little in the way of best-practices on this subject (Server 2008 disaster recovery, as a general topic).  It seems that weekly (or perhaps even monthly) BMR sets followed by daily "standard" DPM backups would be adequate to protect most operating systems, but it would be nice to have some verification of this.  Can you point me to any additional documentation on this subject?
    • Server 2008 DR needs some improvement, to be blunt about it.  The team promises better BMR integration under DPM 2010, but details are not yet available.

    SharePoint – farm build procedure

    After a semi-disaster with SharePoint earlier this week, I have been forced into the view that I really should have our SharePoint infrastructure hosted on more than one web server.  To that end, I am planning the deployment of a new, 2+ node Windows SharePoint Services farm. 

    Initial architecture will be something like this:

    • Host: SharePoint2
      • Roles:  Web front end, Search Server query and crawl, ECTS ADAM Instance
    • Host: SharePoint3
      • Roles:  Web front end, Search Server query and index, ECTS ADAM Instance
    • Hosts: WinDB1 and WinDB2
      • Roles:  Back-end SQL Database failover cluster
    • F5 Big-IP Local Traffic Manager (hardware load balancer)

    Once initial rollout is complete, we likely will want to add:

    • Host: SharePoint3
      • Roles: Dedicated Search Server Index and crawl engine.
    • Hosts: WinDB1 and WinDB2
      • Reconfigured in a SQL mirrored configuration

    Here is an outline of the SharePoint2/3 build procedure:

    1. Install Server 2008 x64 Standard OS
      1. Activate Roles:  IIS (with ASP.NET support), AD Lightweight Directory Services (AKA AD LDS, AKA ADAM).
      2. Activate Features:  .Net Framework 3.0, PowerShell
    2. Install Search Server Express x64 bits:
      1. Perform “complete” install (Search Server will not install a SQL 2005 instance, as is the case with WSS installer).  Under “file location”, specify “E:Office12.0Data” as the index storage location.
      2. Skip running of the Configuration Wizard after install.
    3. Install SharePoint Administration Kit v2.0
      1. Exclude Profile replicator component as it will not work on WSS
    4. Clone the server as many times as deemed necessary. (At present, make one clone!).  Any cloned systems must be sysprep-ed before joining the domain.  Once prepped, join the computers to the domain and configure networking.
    5. If planning to add this server to a load balanced cluster, install NLB feature:
      • from “administrator” cmd shell, run “ocsetup NetworkLoadBalancingFullServer”
      • Don’t join to a production NLB cluster until SharePoint configuration is complete!
    6. Replicate AD LDS (ADAM) instance to new machine, if required.
      1. In Server Manager, Click on “AD Lightweight Directory Services” Role,
      2. Click “AD LDS Setup Wizard”
        1. Select “A replica of an existing instance”
        2. Name the instance “ECTSInstance”
        3. Accept standard LDAP ports
        4. Specify a partnerpoint server to replicate from, use standard LDAP ports.
        5. Select the “OU=ects,…” partition set for replication (this should be the only partition!)
        6. Select secondary (non-system) volume as target for AD LDS data… generally this will be “E:\Microsoft ADAM\ECTSInstance\data”
        7. Specify domain service account to run the AD LDS instance.
        8. Add “domain admins” to the AD LDS Administrators list.  Finish the wizard.
      3. Run the campus…bat file located in E:\Microsoft ADAM\ECTSInstance\data.  This will register the Kerberos Service Principal Names required for LDP replication mutual authentication.
      4. Open the “Local Security Policy” Admin tool.  Add the domain service account to the “generate security audits” User Rights Assignment branch.
      5. Open the AD Users and Computers tool, locate the computer object on which you installed the Instance.  Give the LDS service account “create all child objects” to the computer object.
      6. Add the cluster load balanced SSL cert into the Personal certificate store of the ECTSInstance service account.
        1. Request wildcard certificate using the procedure outlined here:
          http://erlend.oftedal.no/blog/?blogid=7
          (We use the web interface for requesting a certificate.  Make sure we use the RSA SChannel crypto provider to generate the request, use the “SHA-1” hash, use PKCS10 format, and use the “UVM – Web Server” request template.  For load-balanced LDAP servers, we must request a wildcard certificate (*.uvm.edu).)
          NOTE: This step will not have to be repeated again until the current cert expires.  To add another AD LDS server, export the cert from a current server, import into the new server
        2. Export the requested cert to file, selecting the “export all extended attributes” and “export private key” options.
        3. Import the cert into the “Personal” branch of the service account’s certificate store on the target server.  Make sure that you import “all extended attributes”, and the private key.  Do not select the use of advanced encryption password.
        4. Restart AD LDS and test SSL connections.
        5. If all is not working (as is the case with one of my two servers), here is where we get into undocumented territory.  Here are some helpful resources for debugging:
          1. I set SChannel diag logging to verbose :
            • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\Schannel
              REG_DWORD EventLogging, value 0x7
            • Restart ECTSInstance, look for “SChannel” entries in the server “application” event logs.  These logs will tell you which certificate the system attempted to use, and why access failed.
          2. You may need to add the wildcard cert to the Local Computer Certificate Store as well… run MMC, add the “Certificates” snap-in for “Service Account”, using the “ECTS Instance” service.  Navigate to the “Personal” branch, run an import action, import the wildcard with all extended attributes and the private key.
          3. Now locate the physical copy of this cert in C:\ProgramData\Microsoft\Crypto\RSA\MachineKeys (it will be the file with the most recently modified time stamp).  Add “read/execute” permissions to this file for the AD LDS service account, then restart the LDS instance.
      7. Force mutual authentication for replication traffic:
        1. Run ADSI Edit
        2. “Connect to”, enter the AD LDS server name in the Computer field, select the “Configuration” well-known naming context.  As documented in http://technet.microsoft.com/en-us/library/cc794841.aspx, get “properties” on the “CN=Configuration…” partition, and change the value of “msDS-ReplAuthenticationMode” to “2”.
      8. Set local password policy – this controls password policy of AD LDS accounts:
        1. Add the Sharepoint server computer account to the “ETS – SharePoint Password Policy” GP Object.  After running “gpupdate /target:computer /force”, verify the settings by doing the following:
          1. Open Local Security Policy control panel
          2. Expand “Account Policies”->“Password Policy”
          3. Settings applied should follow the 24/365/0/8/Disabled/Disabled format.  (we may want to revisit this policy later).
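The SChannel diagnostic logging change described under the SSL troubleshooting notes in step 6 is a single registry value, so it can be captured as a .reg file and imported on each AD LDS node rather than edited by hand.  Below is a minimal Python sketch that only builds the file text.  The function name is hypothetical, and the backslashes in the key path are assumed (they are the standard form of this key).

```python
# Sketch only: builds .reg text for the SChannel EventLogging value.
# The helper name is hypothetical; nothing here touches the registry.
SCHANNEL_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\Control\SecurityProviders\SCHANNEL"
)


def schannel_logging_reg(level: int = 0x7) -> str:
    """Return .reg file text that sets EventLogging to `level` (a DWORD)."""
    if not 0 <= level <= 0xFFFFFFFF:
        raise ValueError("level must fit in a 32-bit DWORD")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{SCHANNEL_KEY}]",
        # .reg files store DWORDs as eight hex digits
        f'"EventLogging"=dword:{level:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(schannel_logging_reg())
```

Saving the output to a file and importing it with "reg import" should have the same effect as the manual edit.  Remember to turn logging back down once the certificate problem is found.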
    7. Run the SharePoint Products and Technologies Configuration Wizard:
      1. Connect to an existing Farm
      2. Enter “WINBD” as the database server.  The wizard will correctly select “SharePoint_FarmConfig” as the configuration database.  The correct service account username will be provided… you need to enter the password.
      3. Click “Advanced Settings”, specify that you wish the server to host the Central Administration site.
        1. If setup fails with the error:
          "SharePoint Configuration Wizard failed with an exception "Error during encryption or decryption. System error code 997"
          A solution can be found here:
          http://blogs.msdn.com/priyo/archive/2007/08/11/add-new-sharepoint-server-to-existing-server-farm-an-unhandled-exception-occurred-in-the-user-interface-exception-information-unable-to-connect-to-the-remote-server.aspx
          Essentially we just run “stsadm -o updatefarmcredentials -userlogin "domain\service_account" -password <thePassword>” on the first SharePoint server, then re-run the wizard.
      4. Update the “Central Admin” shortcut to point to the local Central Admin site by doing the following registry hack:
        http://blogs.technet.com/wbaer/archive/2007/08/30/sharepoint-3-0-central-administration-url-on-multiple-web-front-end-servers.aspx
        Essentially, edit the key:
        HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\WSS
        Then locate CentralAdministrationURL and change it to point to the local server.
    8. Configure Search Service:
      1. When Search is run in an environment where SharePoint services are accessed from a FQDN which is different from the physical host name (i.e. our environment, or any other environment with load balancers), you will need to work around the “loopback security check” feature of Windows.  Failing to do so will result in “access denied” errors in the crawl logs.  My thanks to Shawn Feldman for discovering this:
        http://blogs.msdn.com/fledman/archive/2008/09/18/access-denied-with-windows-server-2008-and-moss-when-crawling.aspx
        The relevant work-around is documented here (see “Method 2”):
        http://support.microsoft.com/kb/896861
        We simply need to add the public FQDN of our SharePoint server to:
        Key: HKLM\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0
        Value: BackConnectionHostNames, REG_MULTI_SZ, sharepoint.uvm.edu
        And then restart the IISAdmin service.
      2. Open the search admin page from SharePoint Central Administration:
        1. Access Crawling –> Content Sources
          1. Click the “Local Office SharePoint Server sites” default source.
          2. Define a crawling schedule for the SharePoint application
          3. Click “new content source” to add any additional content sources that are desired (i.e. our production file servers).
          4. Define additional crawl schedules for these new content sources.
      3. Add Search Center to our SharePoint landing page:
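The loopback-check work-around in step 8 has to be repeated on every web front end, so it is a natural candidate for scripting.  Here is a rough Python sketch that builds the argument list for the standard "reg add" command.  The value name BackConnectionHostNames comes from KB 896861, and the helper name is hypothetical.  The extra host names in the usage example are the other public names listed in the IIS binding step.

```python
# Sketch only: builds (does not run) a `reg add` command line.
# Value name per KB 896861; the helper name is hypothetical.
LSA_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Lsa\MSV1_0"


def back_connection_cmd(hostnames):
    """Return the `reg add` arguments that set BackConnectionHostNames."""
    names = [h.strip() for h in hostnames if h.strip()]
    if not names:
        raise ValueError("at least one host name is required")
    return [
        "reg", "add", LSA_KEY,
        "/v", "BackConnectionHostNames",
        "/t", "REG_MULTI_SZ",
        # reg.exe splits REG_MULTI_SZ data on the literal two characters \0
        "/d", r"\0".join(names),
        "/f",  # overwrite without prompting (replaces any existing list)
    ]


if __name__ == "__main__":
    cmd = back_connection_cmd(
        ["sharepoint.uvm.edu", "sharepointlite.uvm.edu", "partnerpoint.uvm.edu"]
    )
    print(" ".join(cmd))
```

Because "/f" replaces the whole value, any names already present on the server would need to be included in the list.  The IISAdmin restart from the procedure is still required afterwards.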
    9. Install Infrastructure Update for WSS3 x64:
      1. Initiate the update on the first node in the cluster. 
      2. When prompted, start the install on the second cluster node.
      3. When the configuration wizard completes on the second node, go back to the first and allow configuration to complete.
    10. Install Infrastructure Update for Search Server x64:
      1. Initiate the update on the first node in the cluster. 
      2. When prompted, start the install on the second cluster node.
      3. When the configuration wizard completes on the second node, go back to the first and allow configuration to complete.
    11. Clean up IIS settings for the newly created Web Sites – configure binding, authentication and SSL:
      1. SSL Cert Installation: 
        Install SSL certs into “Personal” Store of the Computer account using the “Certificates” MMC snapin.
      2. Binding: 
        Open the IIS Manager MMC snapin.  On each site, right-click and select “edit bindings”:
        1. For site “SharePoint – 443” (which represents the traditional “sharepoint.uvm.edu” URL), bind http and https to ports 80 and 443, using the IP address for “sharepoint.uvm.edu” (132.198.102.12).  When binding SSL, select the appropriate cert from the “SSL Certificate” drop down menu.
        2. For “SharePoint – Internet” (which represents SharePointLite), bind https and http, ports 443 and 80, to “sharepointlite.uvm.edu”, IP 132.198.102.36.  Again, select the correct SSL cert for this site.
        3. For “SharePoint – Extranet” (which represents PartnerPoint), bind https and http, ports 443 and 80, to “partnerpoint.uvm.edu”, IP 132.198.102.49, selecting the matching SSL cert once again.
      3. SSL Configuration:  (Note that these procedures are only accurate when using Windows-native load balancers… when we transition to f5 load balancing, it will not be necessary to return custom errors from IIS as the f5 will handle HTTP-to-HTTPS redirections.)
        1. In IIS Manager, open the “features view” for each site.
        2. Double-click “SSL Settings”
        3. Check “Require SSL”, leaving the default “ignore Client certificates” setting.
        4. Now double-click the “Error Pages” item for the server root.  Add a custom error for 403.4 (SSL required), pointing to our custom “redirect.html” javascript file.  We will need to have copied this file into “C:\inetpub\custerr\en-US” before completing this step.
        5. Now find the applicationHost.config file for the IIS server.  This should be located in “C:\Windows\system32\inetsrv\config”.  Locate the section for each site that serves SharePoint content (i.e. <location path=”SharePoint – 443”>), then locate the <httpErrors> tag under <system.webServer>.  In the httpErrors tag, change the value for “existingResponse” from “PassThrough” to “Replace” (response “Auto” also seems to work, but may produce inconsistent results).  This will prevent ASP.NET from replacing the 403.4 error response from the IIS server.  I am much indebted to this forum thread for this breakthrough:
          http://forums.iis.net/t/1113734.aspx
          Also helpful was the new “failed request tracing” module in IIS7:
          http://learn.iis.net/page.aspx/266/troubleshooting-failed-requests-using-tracing-in-iis7/
          More information on the meaning of the various existingResponse values can be found here:
          http://blogs.iis.net/ksingla/archive/2008/02/18/what-to-expect-from-iis7-custom-error-module.aspx
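The applicationHost.config change in step 11 is easy to get wrong by hand when there are several SharePoint sites, so here is a small Python sketch of the same edit using the standard library XML parser.  It works on a trimmed-down sample string, not the real file.  The site names in the sample and the function name are assumptions for illustration.  Note also that ElementTree drops comments and re-flows whitespace, so back up the real file and diff the result before trusting anything like this.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for applicationHost.config (illustrative only).
SAMPLE = """<configuration>
  <location path="SharePoint - 443">
    <system.webServer>
      <httpErrors existingResponse="PassThrough" />
    </system.webServer>
  </location>
  <location path="Default Web Site">
    <system.webServer>
      <httpErrors existingResponse="PassThrough" />
    </system.webServer>
  </location>
</configuration>"""

ALLOWED = {"Auto", "Replace", "PassThrough"}


def set_existing_response(config_xml, site_path, value="Replace"):
    """Set httpErrors/@existingResponse for one <location path=...> block."""
    if value not in ALLOWED:
        raise ValueError(f"existingResponse must be one of {sorted(ALLOWED)}")
    root = ET.fromstring(config_xml)
    changed = 0
    for location in root.iter("location"):
        if location.get("path") != site_path:
            continue  # leave every other site untouched
        for web_server in location.iter("system.webServer"):
            for http_errors in web_server.iter("httpErrors"):
                http_errors.set("existingResponse", value)
                changed += 1
    if changed == 0:
        raise LookupError(f"no httpErrors element found for {site_path!r}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_existing_response(SAMPLE, "SharePoint - 443"))
```

Raising an error when no matching section is found is deliberate: a silently unchanged config would look like success, but would leave the 403.4 redirect broken.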
    12. Install the MS FilterPack 1.0 (Search Server can already index most Office 2007 documents, but this adds ability to index inside of One Note files and ZIP archives):
      1. Follow instructions at:
        http://support.microsoft.com/?id=946336
    13. Install Adobe iFilter, with 64-bit “thunking” DCOM service:
    14. Install MindManager extensions.
      1. DEPRECATED – We will discontinue this extension with the new upgrade as it does not work with MM v7 or v8
    15. Install ECTS components on each web front end server.
      • Having problems with installation script… what if we try the ECTS update available through CodePlex???
        1. If using updated ECTS files, it will be necessary to update the PartnerAdmin and PartnerConfig pages, as well as the self-service Site Collection Manager.  The existing pages will not work because the GUIDs on the Web Parts have changed. 
        2. The ects_setup_sharepoint.vbs script still fails using the updated code… Since the codeplex team has not documented their changes, I think we will skip this option.
      • Troubleshooting issues:
        1. The ects_setup_sharepoint.vbs script succeeds in installing the ECTS solution, but fails when activating site features.  I suspect that “cscript” on Server 2008 is not processing return codes from stsadm.exe correctly, and thus is reporting failure to install features (I am not positive about the reason for the script failure, although it certainly is not a result of stsadm.exe being broken). 
          I was able to work around this problem by opening the ects_setup_sharepoint.vbs file in a text editor, searching for the error string that was sent to the console when the script failed, then running all of the operations in the script manually from that point forward.  Fortunately, all of the stsadm commands in the script are successful when run from the command line.
        2. ECTS is not compatible with MS Load Balancing out of the box.  I switched to an F5 load balancer before working through the problem.  It is possible that the problem I was having could have been fixed with the work-around for the same “loopback security check” that caused problems during our F5 configuration:
          http://support.microsoft.com/kb/896861
          In fact, we may have had the problem even with the f5 in place, but I would not know because I applied the loopback fix before implementing the F5.
          The error codes suggest that a login failure is occurring between the IIS application and the AD LDS LDAP instance.  When I try to connect to the load-balanced LDAP DNS name using the “LDP.exe” LDAP client, I also get an authentication error.  However, when I connect to the local server address, authentication works. 
        3. As was the case when I first installed ECTS, the web.config files required a bit of hand-tuning to get services working correctly:
          http://www.uvm.edu/~jgm/wordpress/?p=112
          Once again, I had to modify the “ADAMConnectionString” in the web.config of each IIS site to reflect the actual DNS name of the load-balanced AD LDS servers.  I had installed ECTS using a different name initially, and the ECTS un-installation script did not clear out these values.
        4. I did find it necessary to deactivate all ECTS site collection features, re-activate them, then perform an IIS reset before my existing ECTS management pages would work again.  This seems pretty par for the course when removing and re-installing SharePoint solutions.
    16. Install Globally-deployable solutions from the “fab 40” application template.  If you deploy a web front end into an existing farm, the files required by these features will get transferred automatically.  However, when building a new farm, we need to install them manually.  Currently required “server admin” templates are:
      • ApplicationTemplateCore
      • ChangeRequest
      • ContactsManagement
      • DocumentLibraryReview
      • EventPlanning
      • HelpDesk
      • InventoryTracking
      • ITTeamWorkspace
      • Knowledgebase
      • LendingLibrary
      • PhysicalAssetTracking
      • ProjectTrackingWorkspace
      • RoomEquipmentReservations
      • Procedure:
        • stsadm -o addsolution -filename <file_path>\<template_name>.wsp
        • stsadm -o deploysolution -name <template_name>.wsp -allowgacdeployment
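Thirteen templates times two commands is a lot of typing, so the procedure in step 16 can be generated instead.  The Python sketch below only produces the command lines, mirroring the two stsadm calls exactly as written above.  The folder "D:\fab40" and the function name are hypothetical.  Check "stsadm -help deploysolution" on your farm for any scheduling switches it may also require.

```python
from pathlib import PureWindowsPath

# Template names copied from the list in step 16, in the same order
# (ApplicationTemplateCore first).
TEMPLATES = [
    "ApplicationTemplateCore", "ChangeRequest", "ContactsManagement",
    "DocumentLibraryReview", "EventPlanning", "HelpDesk",
    "InventoryTracking", "ITTeamWorkspace", "Knowledgebase",
    "LendingLibrary", "PhysicalAssetTracking",
    "ProjectTrackingWorkspace", "RoomEquipmentReservations",
]


def solution_commands(wsp_dir, templates=TEMPLATES):
    """Yield the add/deploy stsadm argument lists for each template."""
    for name in templates:
        wsp = f"{name}.wsp"
        # PureWindowsPath joins with backslashes on any operating system
        yield ["stsadm", "-o", "addsolution",
               "-filename", str(PureWindowsPath(wsp_dir) / wsp)]
        yield ["stsadm", "-o", "deploysolution",
               "-name", wsp, "-allowgacdeployment"]


if __name__ == "__main__":
    # "D:\fab40" is a hypothetical folder holding the extracted .wsp files
    for cmd in solution_commands(r"D:\fab40"):
        print(" ".join(cmd))
```

Printing the commands for review, rather than running them directly, keeps a human in the loop.  That seems wise given how the ECTS install script behaved.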
    17. Install radEditor:
      1. Install ASP.NET Ajax for .NET 2.0, version 1.0
      2. Follow the Ajax configuration for SharePoint configuration guide found here:
         http://sharepoint.microsoft.com/blogs/mike/Lists/Posts/Post.aspx?ID=3
      3. Install radEditor using the included instructions.
      4. Copy radEditor configuration files from an existing production server to the new server:
        1. In the directory:
          “C:\Program Files\Common Files\Microsoft Shared\web server extensions\wpresources\RadEditorSharePoint\[versionString]\RadControls\Editor”
          Back up the existing ListConfigFile.xml, ConfigFile.xml, ListToolsFile.xml, and ToolsFile.xml files.  Replace with versions customized for UVM.  Note that the MOSS LinkManager tool does not work in WSS.  Also note that when editing list content that does not support “Enhanced Content”, the first toolbar in the ListToolsFile.xml will be removed… in past versions, the toolbar named “enhancedTools” was removed.
        2. Copy the files ListConfigFile.xml, ConfigFile.xml, ListToolsFile.xml, and ToolsFile.xml to all other nodes in the cluster.
        3. perform an IISRESET.
        4. Update ONET.xml files in the “12” hive to activate the radEditor feature by default in all new sites (see ONET.xml template files on the prod web front end for examples).
          NOTE that the “RadEditor for non-IE browsers” and “RadEditor for IE” features have been collapsed into one unified feature.  Update the ONET.XML files accordingly!  (Note that the feature ID for the main RadEditor List editor has not changed… only its name is different.  We did not have to insert a new default feature ID, but we did need to remove the “RadEditor for IE” feature because it is no longer present in RadEditor MOSS.)
        5. Run:
          stsadm -o uninstallfeature -name RadEditorFeatureRichHtml
          This “Web Content Management” feature is not supported in WSS, so we may as well remove it to avoid confusion.
        6. Deactivate and then re-activate the radEditor features on at least one existing site, and test functionality.
    18. Install “Smiling Goat” Feed Reader (RSS/ATOM subscriber web part)
      1. This will require Feed Reader users to update their web parts!
    19. Install SharePoint Training Kit:
    20. Tune web application settings to match production server:
      1. Set upload limits for files (also need to set IIS “maxAllowedContentLength” in each web.config to be larger than the SharePoint upload limit.  See http://support.microsoft.com/kb/944981/en-us for details.)
      2. Set time zone
      3. Set allowed/disallowed MIME types
      4. Set quota templates for new sites
      5. Config incoming/outgoing email settings
      6. Config site expiration/auto-deletion.
      7. Edit the footer of the “welcome” email message starting at line 5219 of “core.en-US.resx” in the 12-hive “resources” folder.  Replicate on all web front ends in the farm.
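One trap with the upload limit setting in step 20 is units: the SharePoint limit is entered in megabytes, while IIS “maxAllowedContentLength” is in bytes.  Here is a tiny Python sketch of the arithmetic.  The one-megabyte headroom is my own assumption to cover request overhead, not a documented figure, and both function names are hypothetical.

```python
# Sketch only: unit conversion between the SharePoint upload limit (MB)
# and IIS maxAllowedContentLength (bytes). Headroom is an assumption.
BYTES_PER_MB = 1024 * 1024
UINT32_MAX = 0xFFFFFFFF  # maxAllowedContentLength is an unsigned 32-bit value


def min_max_allowed_content_length(upload_limit_mb, headroom_mb=1):
    """Smallest byte value for IIS that clears the SharePoint limit."""
    value = (upload_limit_mb + headroom_mb) * BYTES_PER_MB
    if value > UINT32_MAX:
        raise ValueError("limit does not fit in maxAllowedContentLength")
    return value


def iis_limit_is_sufficient(upload_limit_mb, max_allowed_content_length):
    """True when the IIS byte limit is larger than the SharePoint limit."""
    return max_allowed_content_length > upload_limit_mb * BYTES_PER_MB


if __name__ == "__main__":
    print(min_max_allowed_content_length(50))
```

With the IIS default of 30000000 bytes, iis_limit_is_sufficient(50, 30000000) comes back False, which is exactly the mismatch the KB article describes.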
    21. Configure f5 load balancers:
    22. TEST TEST TEST:
      1. Test each feature on both web front ends by alternately disabling the nodes in the load balancer configuration.
      2. Test again with both nodes enabled… watch for authentication and session persistence issues.
      3. Test all features in each access mapping – SP, SPLite, and Partner… web.config file variations could cause problems!
    23. Consider Deployment of “Group Board 2007” and “Sample Master Pages”: