How to monitor Microsoft Dynamics NAV 2017 services via PowerShell scripting, Telegraf logparser, InfluxDB and Grafana

This post explains how to monitor a set of Microsoft Dynamics NAV 2017 service instances using a nifty custom PowerShell script built around the “Get-NAVServerInstance” command, custom log parsing with the Telegraf tail input and a cool GROK pattern, InfluxDB of course, and the sexy Grafana dashboard tool.

Prerequisites:

  1. Grafana installed.
  2. InfluxDB installed.
  3. InfluxDB datasource created in Grafana and connected successfully to your InfluxDB instance.
  4. A server with the Telegraf agent installed (either your existing Dynamics server, or a central monitoring/log parsing server).

Note: I assume you will use the same server for point 4 above. However, it’s recommended to write the log files to a central location and ship them to a separate monitoring server for parsing, to avoid impacting your production server.

Steps

Get your Microsoft Dynamics NAV instance states written to a log file:

  1. Check that the command “Get-NAVServerInstance” works in your “Dynamics NAV 2017 Administration Shell” on your server.
  • Open your NAV admin PowerShell: search for “power” in your Start menu and “Dynamics NAV 2017 Administration Shell” should come up. Click on it.
  • Type "Get-NAVServerInstance" in the terminal and press Enter.
  • You should see a list of all your Dynamics instances and “Running” or “Stopped” for each one.
  • Only continue if you are sure this command works.
  2. Create an empty file c:\navcheck\scripts\navcheck.ps1. This is the PowerShell script that will check your Dynamics service states. (You can place this script anywhere; this is just my recommended path and naming.)

  3. Populate the script with the pre-loading code from the “Dynamics NAV 2017 Administration Shell”. To do this, copy the contents of C:\Program Files\Microsoft Dynamics NAV\100\Service\NavAdminTool.ps1 into your navcheck.ps1 file. This ensures that all the required prerequisites are loaded so that your script can run Get-NAVServerInstance, which does not work in a standard PowerShell terminal.

  • If this path doesn’t work for you, find the location of your NavAdminTool.ps1 file: right-click the NAV admin shell icon, go to the file location, right-click the shortcut again and check which script is being passed to PowerShell. It will show you the path.

After you have copied the contents of NavAdminTool.ps1 to navcheck.ps1, delete the lines you don’t need such as the command list and signature block. You should be left with this:

#navcheck version 0.0.3
$errorVariable = $null

# Import-Module or register Snap-in, that will enable side-by-side registrations
function RegisterSnapIn($snapIn, $visibleName)
{
  if(Get-Module $snapIn)
  {return}

  $nstPath = "HKLM:\SOFTWARE\Microsoft\Microsoft Dynamics NAV\100\Service"

  $snapInAssembly = Join-Path (Get-ItemProperty -path $nstPath).Path "\$snapIn.psd1"
  if(!(Test-Path $snapInAssembly)) { $snapInAssembly = Join-Path (Get-ItemProperty -path $nstPath).Path "\$snapIn.dll" }

  # First try to import the module
  Import-Module $snapInAssembly -ErrorVariable errorVariable -ErrorAction SilentlyContinue
  
  if (Check-ErrorVariable -eq $true)
  {
    # fallback to add the snap-in
    if ((Get-PSSnapin -Name $snapIn -ErrorAction SilentlyContinue) -eq $null)
    {
        if ((Get-PSSnapin -Registered $snapIn -ErrorAction SilentlyContinue) -eq $null)
        {write-host -fore Red "Couldn't register $visibleName"
            write-host -fore Red "Some cmdlets may not be available`n"}
        else
        {Add-PSSnapin $snapIn}
    }
  }
}

# Check if there is any error in the ErrorVariable
function Check-ErrorVariable
{return ($errorVariable -ne $null -and $errorVariable.Count -gt 0)}

# Register Microsoft Dynamics NAV Management Snap-in
RegisterSnapIn "Microsoft.Dynamics.Nav.Management" "Microsoft Dynamics NAV Management Snap-in"

# Register Microsoft Dynamics NAV Apps Management Snap-in
RegisterSnapIn "Microsoft.Dynamics.Nav.Apps.Management" "Microsoft Dynamics NAV App Management Snap-in"

  4. Append the following lines to the navcheck.ps1 script. This is the part that checks the status of your Dynamics services and writes it to a log file.
  • The script writes two log files: a snapshot of the latest check (hostname.log, overwritten on every run) and an appended history of all service states over time (services.log).
  • The Get-NAVServerInstance command returns an array that we can loop through to process each instance separately.

#Set the correct character encoding, otherwise Telegraf inputs.tail can't read our text file format:
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'

#Find and store this computer's hostname:
$thisHost = $env:computername

#Set up the paths we will write the log files to (and create the folder if it is missing):
$log_root = "c:\navcheck\log"
$services_logfile = "$log_root\services.log"
$thisHost_logfile = "$log_root\$thisHost.log"
New-Item -ItemType Directory -Force -Path $log_root | Out-Null

#Timestamp format for Telegraf's ts-httpd layout (02/Jan/2006:15:04:05 -0700):
#two-digit day, English month names (invariant culture) and a +hhmm offset.
#Note: "zz00" prints the hour offset followed by a literal "00", so it is only correct for whole-hour time zones.
$tsFormat = 'dd/MMM/yyyy:HH:mm:ss zz00'
$invariant = [System.Globalization.CultureInfo]::InvariantCulture

#Get the current date and time including the timezone:
$dateTimeNow = (Get-Date).ToString($tsFormat, $invariant)

#Start checks: append to the central log and overwrite this host's snapshot log
echo "LOG,$dateTimeNow,$thisHost,Starting checks on $thisHost" >> $services_logfile
echo "LOG,$dateTimeNow,$thisHost,Starting checks on $thisHost" > $thisHost_logfile

#Get the results of the NAV instance check as an array:
$list = Get-NAVServerInstance

#Loop through the results:
foreach ($navRes in $list)
{
    #Get the current time
    $dateTimeNow = (Get-Date).ToString($tsFormat, $invariant)

    #Get the SHORT service name: the text between "[" and "]" in the display name
    $displayName = $navRes.DisplayName
    $nameArray = $displayName.Split("[")
    $serviceName = $nameArray[1].Split("]")[0]

    #Get the state from the result:
    $navState = $navRes.State

    #Write to the central log for tail purposes:
    echo "RESULT,$dateTimeNow,$thisHost,$serviceName,$navState" >> $services_logfile

    #Write to the host snapshot log for quick manual checks:
    echo "RESULT,$dateTimeNow,$thisHost,$serviceName,$navState" >> $thisHost_logfile
}

#Write the end of the checks to both logs:
echo "LOG,$dateTimeNow,$thisHost,Finished checks on $thisHost" >> $thisHost_logfile
echo "LOG,$dateTimeNow,$thisHost,Finished checks on $thisHost" >> $services_logfile
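The following Python sketch shows exactly what the script should append to services.log by rebuilding one RESULT line the same way. It is an offline illustration you can run anywhere Python 3 is installed and is not part of the NAV setup. The host name NAVSRV01 and the display name "Microsoft Dynamics NAV Server [DynamicsNAV100]" are made-up examples, and the bracketed display-name format is assumed from the script's own split logic. The sketch also shows why the day should be two digits. Telegraf's ts-httpd layout is 02/Jan/2006:15:04:05 -0700, and Go's time parser rejects a single-digit day in that position. Unlike the PowerShell "zz00" trick, the sketch also handles half-hour offsets such as -0530.

```python
from datetime import datetime, timedelta, timezone

# English month abbreviations, as Telegraf's ts-httpd layout expects
# regardless of the server's regional settings.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def ts_httpd(dt: datetime) -> str:
    """Format a timezone-aware datetime as 02/Jan/2006:15:04:05 -0700."""
    offset_minutes = int(dt.utcoffset().total_seconds() // 60)
    sign = "+" if offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return (f"{dt.day:02d}/{MONTHS[dt.month - 1]}/{dt.year:04d}:"
            f"{dt.hour:02d}:{dt.minute:02d}:{dt.second:02d} "
            f"{sign}{hours:02d}{minutes:02d}")


def short_service_name(display_name: str) -> str:
    """Same split as the script: the text between '[' and ']'."""
    return display_name.split("[")[1].split("]")[0]


def result_line(dt: datetime, host: str, display_name: str, state: str) -> str:
    """One RESULT line in the CSV shape navcheck.ps1 appends to services.log."""
    return f"RESULT,{ts_httpd(dt)},{host},{short_service_name(display_name)},{state}"


if __name__ == "__main__":
    # Hypothetical host and instance names, for illustration only.
    now = datetime(2020, 1, 5, 9, 3, 7, tzinfo=timezone(timedelta(hours=2)))
    # prints RESULT,05/Jan/2020:09:03:07 +0200,NAVSRV01,DynamicsNAV100,Running
    print(result_line(now, "NAVSRV01",
                      "Microsoft Dynamics NAV Server [DynamicsNAV100]", "Running"))
```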

  5. Schedule the script to run on a regular schedule with “Task Scheduler” (I chose every 10 minutes):
  • Give your task a name, e.g. “navcheck”.
  • Set “Run whether user is logged on or not”.
  • Create a “Daily” trigger: choose a start time, recur every 1 day, repeat the task every 10 minutes for a duration of 1 day, and stop the task if it runs longer than 30 minutes.
  • Make sure “Enabled” is ticked (it is by default).
  • Set “Program/Script” to:

C:\Windows\system32\WindowsPowerShell\v1.0\PowerShell.exe

  • Set “Add arguments (optional)” to your script (it is important to include the quotes):

"& 'c:\navcheck\scripts\navcheck.ps1'"

  • Set “Start in (optional)” to:

c:\navcheck\scripts

  • Finish your task. Select it in the task list and click “Run” so that you can test it.
  6. Go to the log path you set in the script (c:\navcheck\log) and check that services.log contains your script’s latest log lines.
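The following Python sketch automates the manual check above. It reads services.log and reports whether the newest RESULT line is recent. It assumes the 10-minute schedule and allows 15 minutes before calling the log stale, which is an arbitrary margin you can adjust. It reads the file as utf-8-sig to skip the byte-order mark that Windows PowerShell 5.1 writes for UTF-8 output, if one is present. The function names are hypothetical helpers, not part of any tool. Parsing the month with %b assumes an English or C locale.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

# Python equivalent of Telegraf's ts-httpd layout (English/C locale assumed for %b).
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def last_result_time(lines: Iterable[str]) -> Optional[datetime]:
    """Timestamp of the last RESULT line, or None if there is none."""
    last = None
    for line in lines:
        parts = line.strip().split(",")
        # RESULT,timestamp,host,service,state -> exactly five columns
        if len(parts) == 5 and parts[0] == "RESULT":
            last = datetime.strptime(parts[1], TS_FORMAT)
    return last


def is_fresh(last: Optional[datetime], now: datetime, max_age_minutes: int = 15) -> bool:
    """True if the newest RESULT line is at most max_age_minutes old."""
    return last is not None and now - last <= timedelta(minutes=max_age_minutes)


def check_services_log(path: str, max_age_minutes: int = 15) -> bool:
    # utf-8-sig drops a leading byte-order mark if the file has one.
    with open(path, encoding="utf-8-sig") as f:
        last = last_result_time(f)
    return is_fresh(last, datetime.now(timezone.utc), max_age_minutes)


if __name__ == "__main__":
    print(check_services_log(r"c:\navcheck\log\services.log"))
```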

NOW YOUR LOG FILE SHOULD BE WRITING AND YOU CAN SETUP YOUR TELEGRAF LOG PARSER.

SETUP TELEGRAF INPUT AGENT:

  7. Create a new Telegraf config file (telegraf_log.conf) for your new input agent. Use your normal agent defaults, and to start with use these input and output plugins:

Output Agent:

[[outputs.file]]
files = ["stdout"]
data_format = "influx"

Input Agent:

[[inputs.tail]]
  ## file(s) to tail (must match $services_logfile in navcheck.ps1):
  files = [
  "c:\\navcheck\\log\\services.log"
  ]
  from_beginning = false
  name_override = "navcheck"
  data_format = "grok"

  grok_patterns = ["%{CUSTOM_LOG}"]
  grok_custom_patterns = '''
CUSTOM_TZ [+-][0-9]{4}
CUSTOM_TIMESTAMP %{MONTHDAY}/%{MONTH}/%{YEAR}:%{HOUR}:%{MINUTE}:%{SECOND} %{CUSTOM_TZ}
CUSTOM_LOG RESULT,%{CUSTOM_TIMESTAMP:timestamp:ts-httpd},%{WORD:log_entry_hostname},%{WORD:log_entry_service},%{WORD:log_entry_state}
'''
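Before starting Telegraf, you can test the CUSTOM_LOG pattern against a few lines offline. The Python sketch below is a hand-written approximation of what the grok pattern expands to, not Telegraf's actual implementation. MONTH and YEAR are simplified to three letters and four digits.

The sketch makes two behaviours visible:

* Grok's WORD only matches letters, digits and underscores. A host or instance name containing a hyphen (NAV-SRV01 here is hypothetical) will not match. In that case the pattern would need adjusting, for example with a custom [^,]+ pattern (an untested suggestion).
* A single-digit day passes the grok match even though the later ts-httpd conversion would reject it.

```python
import re
from typing import Dict, Optional

# Hand-written approximation of the CUSTOM_LOG grok pattern; not generated by Telegraf.
# Grok's WORD is \b\w+\b and Go's \w only covers [0-9A-Za-z_], so it is spelled out here.
WORD = r"[0-9A-Za-z_]+"
MONTHDAY = r"(?:0[1-9]|[12][0-9]|3[01]|[1-9])"
CUSTOM_TZ = r"[+-][0-9]{4}"
CUSTOM_TIMESTAMP = (MONTHDAY + r"/[A-Za-z]{3}/[0-9]{4}:"
                    r"(?:2[0-3]|[01]?[0-9]):[0-5][0-9]:(?:[0-5]?[0-9]|60) " + CUSTOM_TZ)
CUSTOM_LOG = re.compile(
    r"RESULT,(?P<timestamp>" + CUSTOM_TIMESTAMP + r"),"
    r"(?P<log_entry_hostname>" + WORD + r"),"
    r"(?P<log_entry_service>" + WORD + r"),"
    r"(?P<log_entry_state>" + WORD + r")"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Captured fields for a RESULT line, or None if the pattern would not match."""
    m = CUSTOM_LOG.match(line.strip())
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_line("RESULT,05/Jan/2020:09:03:07 +0200,NAVSRV01,DynamicsNAV100,Running"))
```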
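  8. Run the new Telegraf agent from the command line for testing. It’s important to use the --console and --debug parameters when testing. Because from_beginning = false, only lines written after Telegraf starts are parsed, so run the navcheck task again while the agent is running.

c:
cd "\Program Files\telegraf"
telegraf --console --debug --config "c:\Program Files\telegraf\conf\telegraf_log.conf"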
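When the agent picks up a new RESULT line, the stdout output should contain a navcheck point with three string fields. The point's timestamp comes from the log line, because the ts-httpd modifier sets the metric time, not from the moment Telegraf read the line. The Python sketch below builds the approximate line-protocol text for a parsed line so you know what to look for. It is an illustration only. Telegraf will typically add tags such as host (and path for the tail input), and its field order may differ.

```python
from datetime import datetime
from typing import Dict

TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # ts-httpd equivalent (English/C locale assumed)


def quote_string_field(value: str) -> str:
    """Quote a line-protocol string field value, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line_protocol(fields: Dict[str, str], timestamp: str,
                     measurement: str = "navcheck") -> str:
    """Approximate the point for one parsed RESULT line, without Telegraf's extra tags."""
    # Line protocol timestamps default to nanoseconds since the Unix epoch.
    ns = int(datetime.strptime(timestamp, TS_FORMAT).timestamp()) * 1_000_000_000
    body = ",".join(f"{key}={quote_string_field(val)}"
                    for key, val in sorted(fields.items()))
    return f"{measurement} {body} {ns}"


if __name__ == "__main__":
    print(to_line_protocol({"log_entry_hostname": "NAVSRV01",
                            "log_entry_service": "DynamicsNAV100",
                            "log_entry_state": "Running"},
                           "05/Jan/2020:09:03:07 +0200"))
```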

  9. Watch the output. If everything works, switch the output plugin to InfluxDB (if it didn’t work, go back to debugging). With skip_database_creation = true, the monitoring_results database must already exist:

[[outputs.influxdb]]
urls = ["http://yourinfluxdbhost:8086"]
database = "monitoring_results"
skip_database_creation = true
username = "yourusername"
password = "yourpassword"

  10. Run the Telegraf test command above again and check your InfluxDB measurements (for example with “SHOW MEASUREMENTS”).

  11. Install the new Telegraf configuration (telegraf_log) as a separate Windows service, then start it from the Services console (or with net start telegraf_log):

telegraf --service install --service-name=telegraf_log --service-display-name="My Telegraf" --config "c:\Program Files\telegraf\conf\telegraf_log.conf"
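The following Python sketch scripts the SHOW MEASUREMENTS check instead of typing it into the influx shell. It targets the InfluxDB 1.x HTTP API, calling /query with the db, q, u and p parameters. This matches the database/username/password style of the outputs.influxdb settings above. The host, database and credentials are the placeholders from that config. Passing credentials as query parameters is only reasonable for a quick test on a trusted network.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def show_measurements_url(base_url: str, database: str,
                          username: Optional[str] = None,
                          password: Optional[str] = None) -> str:
    """Build an InfluxDB 1.x /query URL for SHOW MEASUREMENTS."""
    params = {"db": database, "q": "SHOW MEASUREMENTS"}
    if username is not None:
        params["u"] = username
        params["p"] = password or ""
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


def measurement_names(response_body: str) -> List[str]:
    """Extract measurement names from the JSON returned by /query."""
    names = []
    for result in json.loads(response_body).get("results", []):
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names


def navcheck_exists(base_url: str, database: str,
                    username: Optional[str] = None,
                    password: Optional[str] = None) -> bool:
    url = show_measurements_url(base_url, database, username, password)
    with urlopen(url, timeout=10) as resp:
        return "navcheck" in measurement_names(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder host and credentials from the outputs.influxdb example.
    print(navcheck_exists("http://yourinfluxdbhost:8086", "monitoring_results",
                          "yourusername", "yourpassword"))
```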

NOW TELEGRAF IS SENDING THE DATA TO YOUR INFLUXDB INSTANCE AND YOU CAN SETUP YOUR NEW DYNAMICS MONITORING DASHBOARD:

SETUP GRAFANA DASHBOARD:

  12. Create the Grafana dashboard that will show the Dynamics state (link to an example to be added soon).

  13. To see all your log entries, create a new panel and choose the “Table” visual style. Select your InfluxDB data source and pull in your data from the “navcheck” measurement using a query similar to the following. This will show your log history.


Example Grafana/InfluxDB Query

SELECT log_entry_hostname as "Hostname", log_entry_service as "Service", log_entry_state as "State" FROM "navcheck" WHERE $timeFilter ORDER BY time DESC

  14. To see a GREEN/RED block per Dynamics service, do the following for each service:
  • Create a Stat widget for a specific Dynamics service.
  • Create a new panel, resize it to be small and select the “Stat” visual style.
  • Make the title the hostname and name of your service, something like myhostname.myDynamicsService001.


Example Grafana/InfluxDB Query

SELECT count("log_entry_state") FROM "autogen"."navcheck" WHERE "log_entry_hostname" = 'YOURHOSTNAME' AND "log_entry_service" = 'YOURDYNAMICSSERVICENAME' AND "log_entry_state" = 'Running' AND $timeFilter

  • Set the “Query Options / Relative time” value to “35m” so that, with a 10-minute schedule, you get at least three checks’ worth of data for the Stat.
  • Set the query for the stat to count the number of “Running” states in the selected time range.
  • Remember to replace YOURHOSTNAME and YOURDYNAMICSSERVICENAME with your values.
  • Now you can tell the Stat widget what to do based on the count. Add the following “Value Mappings” to the Panel: 3 to 10 = “OK”, 1 to 2 = “TO CHECK”, 0 = “STOPPED”.


  • Setup the “Thresholds” to show correct colours for your block: Set colour for value 3 = green, 1 = orange, and base = red.
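The Stat panel logic can be modelled in a few lines to show why a 35-minute window suits a 10-minute schedule. This Python sketch is a plain model, not Grafana code. runs_in_window counts how many scheduled runs land in the window for every possible phase of the schedule. stat_status applies the same value mappings and thresholds as the panel. A healthy service always yields 3 or 4 "Running" rows, so it never drops into "TO CHECK". With a 25-minute window, some phases would yield only 2.

```python
def runs_in_window(window_minutes: int, interval_minutes: int, phase_minutes: int) -> int:
    """Runs at phase + k*interval minutes ago that fall inside [0, window)."""
    return len(range(phase_minutes, window_minutes, interval_minutes))


def stat_status(running_count: int) -> tuple:
    """Text and colour the Stat panel shows for a given count of 'Running' rows."""
    if running_count >= 3:
        return "OK", "green"
    if running_count >= 1:
        return "TO CHECK", "orange"
    return "STOPPED", "red"


if __name__ == "__main__":
    # 35-minute relative time, 10-minute schedule: try every possible phase.
    for phase in range(10):
        n = runs_in_window(35, 10, phase)
        print(phase, n, stat_status(n))
```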


You should now have a Dynamics status widget.

Future:

  • Adding log file rotation parameters for services.log and Telegraf.

Disclaimer: There are some plugins for PowerShell that can push updates directly from PowerShell to Prometheus or another time-series database, for example Prometheus PowerShell Script 0.0.2. I chose not to use any untested third-party plugins since this runs on our production server, and went with this log file solution instead.

Note: Any recommendations on how to better, or remotely, check that a Microsoft Dynamics NAV 2017 service is up and running are welcome. Please drop suggestions in the comments.