Reporting & Maintaining MSP & MSSP tooling (Plus KPIs)

Introduction

If you're in the MSP / MSSP world, you know there are a million tools we use on a daily basis. We may have people or departments responsible for making sure these tools are deployed, maintained, and configured, but how can we be sure the job is actually being done? How do we bring visibility and accountability instead of just trust? How are those tools being verified today? Would we know if we had broken, undeployed, or misconfigured tools? Do we have visibility and process?

First we measure, but what do we measure?

Tool deployments come down to three areas of focus:

Deployment status (installed or not)
Health (services running / up to date)
Configuration

If we have systems of reporting, verification, and process around these three areas, we can have command over our tooling. Let's start with deployment!

1: Deployment

At the most basic level of tool verification, we need to know if the tool is installed or not. This may be your EDR, DNS protection, SIEM agent, whatever...point is, a thing should be installed, and we need to make sure it is.

RMMs / Intune give us a list of installed applications, but they rarely come with the concept of what is not installed. There are various methods across RMMs and dashboarding / reporting tools to get to the answer here, but in my experience, the easiest and most consistent way is to record a value in a custom field per tool. For example, maybe we want to ensure Cisco Umbrella is installed on all endpoints, so we'd have a custom field in the RMM labeled ciscoUmbrellaStatus and it would equal 1 (true) or 0 (false).

Almost all RMMs have some concept of a custom field that then allows us to take action depending on the value. For example, maybe we make a script to install the missing application on the endpoint if the value is 0. I'm not covering install scripts in this article but it may be a topic I cover later.

The important part here is to record the value, then output it in a live dashboard (popular options are Brightgauge or MSPBots, or any other platform that allows you to display custom fields in a live dashboard). If you don't have live dashboarding, then your next step is to go get it.

Following our scenario from above, we'd have a new reporting screen for Tool Verification, and we'd have a big number for Missing Cisco Umbrella (the count of endpoints where ciscoUmbrellaStatus is 0). This should always be 0, so we make 0 green and anything more than 0 red. This means green good, red bad.
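
As a rough illustration of the logic behind that tile, here is a minimal sketch in PowerShell. The hostnames and field values are made-up sample data standing in for whatever your RMM exports, and in practice your dashboarding platform will normally do this count for you.

```powershell
# Made-up sample data standing in for an RMM export of the custom field.
$endpoints = @(
    [pscustomobject]@{ Hostname = 'PC-001'; ciscoUmbrellaStatus = 1 }
    [pscustomobject]@{ Hostname = 'PC-002'; ciscoUmbrellaStatus = 0 }
    [pscustomobject]@{ Hostname = 'PC-003'; ciscoUmbrellaStatus = 1 }
)

# The big number on the tile: endpoints where the tool is NOT installed.
$missing = @($endpoints | Where-Object { $_.ciscoUmbrellaStatus -eq 0 })
$missingCount = $missing.Count

# 0 is green, anything above 0 is red.
$tileColor = if ($missingCount -eq 0) { 'Green' } else { 'Red' }

"Missing Cisco Umbrella: $missingCount ($tileColor)"

# The hostnames behind the number, so the tile is actionable.
$missing | Select-Object -ExpandProperty Hostname
```

With this sample data, one endpoint (PC-002) is missing the tool, so the tile would be red.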

Rinse and repeat for ALL essential tools.

Deployment: how do I verify if the tool is installed (technical implementation)?

I wrote it for you in PowerShell, and as a bonus it also looks for applications installed as user only (not system), which RMMs generally do not show.

Function Get-ApplicationInstallStatus {
    <#
    .SYNOPSIS
    Get-ApplicationInstallStatus

    .DESCRIPTION
    By default, this will search both the system (all users) and user install paths to verify an application
    is installed. RMMs (at the time of this writing) do not report user only installed applications, so this
    is handy to find those user installed applications!

    .PARAMETER AppName
    Use the name of the application exactly as seen in add/remove programs inside of single quotes

    .PARAMETER SystemInstallsOnly
    This is the equivalent of only applications that were installed for "all users" or "everyone" at install

    .PARAMETER UserInstallsOnly
    Many modern applications install directly to the user registry hive. Without defining this param, it will
    not be obvious if the application was found to be installed at the system level, or the user level. Use
    this parameter if you need to determine if an app is installed for this user only.

    .EXAMPLE
    C:\PS> Get-ApplicationInstallStatus -AppName 'Google Chrome'
    C:\PS> Get-ApplicationInstallStatus -AppName 'Google Chrome' -SystemInstallsOnly $true
    C:\PS> Get-ApplicationInstallStatus -AppName 'Microsoft Teams' -UserInstallsOnly $true
    #>


    [CmdletBinding()]

    param(
        [Parameter(Mandatory = $true)]
        [string]$AppName,
        [Parameter(Mandatory = $false)]
        [boolean]$SystemInstallsOnly,
        [Parameter(Mandatory = $false)]
        [boolean]$UserInstallsOnly
    )

    $installed = @()

    if (!$UserInstallsOnly) {
        # Check the system location, also known as "install for everyone" locations
        $installed += New-Object psobject -prop @{
            sys32 = Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*" | Where-Object { $_.DisplayName -eq $AppName }
            sys64 = Get-ItemProperty "HKLM:\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*" | Where-Object { $_.DisplayName -eq $AppName }
        }
    }

    if (!$SystemInstallsOnly) {
        # Check the single user install locations
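        # Map HKEY_USERS only if it isn't already mapped, so repeat runs in one session don't error
        if (-not (Get-PSDrive -Name HKU -ErrorAction SilentlyContinue)) {
            New-PSDrive -PSProvider Registry -Name HKU -Root HKEY_USERS | Out-Null
        }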
        $installed += New-Object psobject -prop @{
            user32 = Get-ItemProperty "HKU:\*\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall\*" | Where-Object { $_.DisplayName -eq $AppName }
            user64 = Get-ItemProperty "HKU:\*\SOFTWARE\WOW6432Node\Microsoft\Windows\CurrentVersion\Uninstall\*" | Where-Object { $_.DisplayName -eq $AppName }
        }
    }
        

    # If any of these are populated, the app was found to be installed, so output $true
    if ($installed.sys32 -or $installed.sys64 -or $installed.user32 -or $installed.user64) {
        return $true
    } else {
        return $false
    }
}

Just have your RMM engineer use this for each of your tools and output to the respective custom field you made.
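
One way that wiring could look, as a sketch only: the script returns $true or $false, so it needs converting to the 1 or 0 the custom field expects. Set-RmmCustomField below is a hypothetical placeholder, not a real RMM command; every RMM has its own way of writing a custom field from a script, so swap in yours. ConvertTo-FieldValue is likewise just an illustrative helper name.

```powershell
# Turn the $true / $false result of a check into the 1 / 0 stored in the custom field.
function ConvertTo-FieldValue {
    param([Parameter(Mandatory = $true)][bool]$Result)
    if ($Result) { return 1 } else { return 0 }
}

# HYPOTHETICAL placeholder: swap the body for your RMM's own command for writing a custom field.
function Set-RmmCustomField {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][int]$Value
    )
    "$Name=$Value"
}

# Stand-in result; on an endpoint this would be the output of Get-ApplicationInstallStatus.
$isInstalled = $false

Set-RmmCustomField -Name 'ciscoUmbrellaStatus' -Value (ConvertTo-FieldValue -Result $isInstalled)
```

Keeping the conversion in its own small function means the same two lines work for every tool; only the field name and the check change.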

Deployment: summary

Custom field for each tool
Fill each field with 1 or 0 value based on output from script
Make a number graph on your dashboarding platform counting endpoints where the field is 0; 0 is green and greater than 0 is red

2: Health

I've seen lots of MSPs / MSSPs make the mistake of only monitoring the installed status of a tool. Unfortunately that's just one piece of the puzzle, and if this is you...I can almost guarantee you have inoperable tools you don't know about (probably a lot). Almost always, by the time you have tools installed across thousands of endpoints, you end up with a list of endpoints that are missing services or are so wildly out of date that the tool isn't functioning correctly, or at all, and can't be doing its job. Let's get some systems in place to verify these conditions in a way that is easy to manage and actionable.

Health: version control

The version number of every application is already available via your RMM or Intune, so let's not over-engineer. Use the data available, and make a pie graph in your dashboarding platform where each slice of the pie is a version, limited to the single tool we're verifying health for. I like this visual because we don't necessarily know what the latest version is at any given point, but we do know that a healthy state is at most 2 slices of pie (representing 2 total versions in our environment). That would mean everything is on the latest version or the second to latest, which is absolutely reasonable. As soon as we see more than 2 slices, we know we have tools not being updated.
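
If you want to sanity check the pie outside of the dashboard, the slice count is just a group-by on the version. A minimal sketch, assuming an export with Hostname and Version columns (the data and version numbers below are made up):

```powershell
# Made-up sample data standing in for an RMM / Intune export of one application's versions.
$inventory = @(
    [pscustomobject]@{ Hostname = 'PC-001'; Version = '3.0.110' }
    [pscustomobject]@{ Hostname = 'PC-002'; Version = '3.0.110' }
    [pscustomobject]@{ Hostname = 'PC-003'; Version = '3.0.22' }
    [pscustomobject]@{ Hostname = 'PC-004'; Version = '2.2.5' }
)

# Each group is one slice of the pie: Name is the version, Count is the number of endpoints.
$slices = @($inventory | Group-Object -Property Version)
$sliceCount = $slices.Count

# Healthy means 2 slices or fewer.
$versionsHealthy = $sliceCount -le 2

$slices | Sort-Object -Property Count -Descending | Select-Object -Property Name, Count
"Slices: $sliceCount, healthy: $versionsHealthy"
```

This sample has three distinct versions, so it reports as unhealthy.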

Health: services (technical implementation)

We need to make sure all services exist for the tool that's installed. Maybe you're thinking this sounds a little over the top and it's just a given that of course all of the services are there...but this has kicked me in the butt enough times over the years that I had to add it to my go-to verification processes. Most of the time this seems to happen during a botched update or partially failed install (the same thing mostly), but no matter the case, we have to watch for it! I wrote a script for that too:

Function Get-InstalledService {
    <#
    .SYNOPSIS
    Get-InstalledService

    .DESCRIPTION
    This is to verify that a given list of services exists, and optionally that they are all running.
    If all services are present, and running (if you specify to check that), it will output $true. If
    any services are missing, or in the stopped state (if you specify to check that), it will output $false

    .PARAMETER ServiceNameArray
    Use a single service name, or a list of services in single quotes separated by commas. Note this is
    the name of the service, not the display name of the service.

    .PARAMETER VerifyServiceRunning
    Define this as $true if you wish to also verify the services don't only exist, but are also running.

    .EXAMPLE
    C:\PS> Get-InstalledService -ServiceNameArray 'swprv'
    C:\PS> Get-InstalledService -ServiceNameArray 'swprv','uhssvc' -VerifyServiceRunning $true
    #>


    [CmdletBinding()]

    param(
        [Parameter(Mandatory = $true)]
        [array]$ServiceNameArray,
        [Parameter(Mandatory = $false)]
        [boolean]$VerifyServiceRunning
    )

    if (!$VerifyServiceRunning) {
        try {
            # If any service isn't found, throw
            Get-Service -Name $ServiceNameArray -ErrorAction Stop | Out-Null
            return $true
        } catch {
            return $false
        }
    } else {
        try {
            # If any service isn't found, throw
            Get-Service -Name $ServiceNameArray -ErrorAction Stop | Out-Null
            $ServiceNameArray | ForEach {
                # Verify all services are running
                $status = Get-Service -Name $_ | Where { $_.Status -ne 'Running' }
                if ($status) {
                    # If any are not running, throw
                    throw
                }
            }
            return $true
        } catch {
            return $false
        }
    }
}

If all of the services exist it will return $true (record this as 1); if any are missing it will return $false (record this as 0). I also built in an optional check of each service's running status, but I'd argue that belongs to general "service monitoring", which is a different kind of monitoring than this health check of the tool's installed component state. It also gets dicey, because the status can be collected during an endpoint shutdown, startup, upgrade, or various other conditions that are (in my opinion) outside of this scope (but still important!).

Record the output of this script into a custom field labeled ciscoUmbrellaHealth.

At this point it would be a good idea to trigger a reinstall script for the tool when services are missing, whether that's an in-place upgrade or a rip and replace (depending on the tool), but don't forget to consider the potential reboot consequences!

Health: summary

Versions:

Make a pie graph in your dashboarding platform
Each slice represents a version
Limit to endpoints that have been online in the last 30 days
Limit to the single application we're looking for (Cisco Umbrella for example)
More than 2 slices is bad, 2 or fewer is good

Services:

Run my script once a day and record the value to ciscoUmbrellaHealth (obviously change the name per tool)
Make a number graph on your dashboarding tool counting endpoints where the value is 0; 0 is green and greater than 0 is red

3: Configuration

This is the hardest part and I won't be able to make all of these for you, but I can give you examples of what they should look like. Let's start with a framework:

Daily
Weekly
Monthly
Quarterly

All tools need some kind of configuration / general health verification, but how often depends on the type of tool, and even on your specific use of it. In general, I find it best to categorize maintenance by frequency and keep a list of things to be done under each. Keep in mind lots of tools do not need daily checks, but some do (backups for example), so it's okay to leave some of these blank! Let's make a rough outline for our Cisco Umbrella example:

Daily
- None
Weekly
- Ensure Roaming Client is installed to all endpoints (use your dashboard)
- Ensure Roaming Client version is no more than 2 pie slices (on your dashboard)
- Ensure all Roaming Client health statuses are healthy (on your dashboard)
Monthly
- None
Quarterly
- Review global whitelist to ensure no client specific requests have made their way here (apply them to the single client that requested them instead)
- Review per client whitelist for suspicious or unexpected items then verify with client
- Ensure policies are applied to all tenants
- Ensure all client WAN IPs have been added to the Cisco Umbrella portal and are showing good status (green meaning DNS forwarders have been configured)
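
If you'd rather keep this outline as data than as a document, here is one possible sketch. The structure and the title format are my own assumptions, not something your PSA requires; the idea is just that one list per frequency can be turned into recurring ticket or calendar titles.

```powershell
# The Cisco Umbrella outline as data; an empty list means nothing is due at that frequency.
$maintenance = [ordered]@{
    Daily     = @()
    Weekly    = @(
        'Ensure Roaming Client is installed on all endpoints'
        'Ensure Roaming Client version is no more than 2 pie slices'
        'Ensure all Roaming Client health statuses are healthy'
    )
    Monthly   = @()
    Quarterly = @(
        'Review global whitelist for client specific requests'
        'Review per client whitelist for suspicious or unexpected items'
        'Ensure policies are applied to all tenants'
        'Ensure all client WAN IPs are added and showing good status'
    )
}

# One line per task, usable as a recurring ticket or calendar event title.
$ticketTitles = foreach ($frequency in $maintenance.Keys) {
    foreach ($task in $maintenance[$frequency]) {
        "[Cisco Umbrella][$frequency] $task"
    }
}

$ticketTitles
```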

You can put these on recurring calendar events that block specific chunks of time to get it done, or create recurring tickets in your PSA. The important part is to make sure it's scheduled, that the person responsible is aware, familiar, and assigned, and that it gets done!

Important KPIs

Number of endpoints missing xx tool, should be 0
Number of endpoints with unhealthy status, should be 0
Number of endpoints outside of the 2 highest slices of the version pie, should be 0
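
The third KPI takes slightly more work than a count of 0s, so here is a minimal sketch of how it could be calculated. I'm assuming "2 highest slices" means the two highest version numbers, and that the version strings parse as [version] (ones with extra suffixes will not). The data is made up.

```powershell
# Made-up sample data standing in for an RMM / Intune export of one application's versions.
$inventory = @(
    [pscustomobject]@{ Hostname = 'PC-001'; Version = '3.0.110' }
    [pscustomobject]@{ Hostname = 'PC-002'; Version = '3.0.110' }
    [pscustomobject]@{ Hostname = 'PC-003'; Version = '3.0.22' }
    [pscustomobject]@{ Hostname = 'PC-004'; Version = '2.2.5' }
    [pscustomobject]@{ Hostname = 'PC-005'; Version = '3.0.110' }
)

# Cast to [version] so 3.0.110 ranks above 3.0.22; a plain string sort would get that backwards.
$topTwo = @(
    $inventory |
        ForEach-Object { [version]$_.Version } |
        Sort-Object -Unique -Descending |
        Select-Object -First 2
)

# KPI: endpoints running anything other than the two highest versions. Should be 0.
$outdated = @($inventory | Where-Object { [version]$_.Version -notin $topTwo })

"Endpoints outside the 2 highest versions: $($outdated.Count)"
$outdated | Select-Object -Property Hostname, Version
```

In this sample only PC-004 falls outside the two highest versions, so the KPI is 1.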

Wrap up

By the time you're done here you should have a dashboard for each of your tools that shows:

Tool Missing (should be 0)
Tool Unhealthy (should be 0)
Tool versions (no more than 2 pie slices)

Track these week over week as you adopt them as KPIs for deploying and maintaining these tools. Once you have all of this in place, you can finally have a real sense of control and command over your tool deployment, maintenance, and ongoing configuration processes.

If this was helpful please let me know in the comments!