Inputs.win_eventlog to capture unexpected shutdown events, from_beginning parameter

I noticed the win_eventlog input plugin doesn’t catch events from the first ~2 minutes after unexpected computer shutdowns (event IDs like 6006 or 41), which makes sense, as Telegraf isn’t running immediately after the restart.

In my setup the plugin runs at the same “60s” interval as the agent.

Reviewing the win_eventlog plugin’s config options, I noticed the from_beginning parameter:

  ## When true, event logs are read from the beginning; otherwise only future
  ## events will be logged.
  # from_beginning = false

Assuming this might give access to older events, I tested setting the parameter to true, but got this output:

2023-11-28T16:28:56Z E! [telegraf] Error running agent: Error loading config file C:\Apps\Telegraf\telegraf_windows.conf: plugin inputs.win_eventlog: line 37: configuration specified the fields ["from_beginning"], but they weren't used

Anyone ever used from_beginning? What am I doing wrong?

Or is there another method that would allow telegraf to capture these 6006/41 events?

Telegraf 1.23.0 (git: HEAD 806dc283)

This usually means you are running an older version that does not have this configuration option.

You are right. Using v1.28 instead of v1.23 I could get it to recognize the parameter. And it really does mean that it will go through ALL past Windows events (an endless scroll while doing a --test run).
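
(Side note: the backfill can probably be narrowed down. The plugin also takes an xpath_query option, so an untested sketch like the one below should limit from_beginning to the shutdown-related IDs instead of replaying the whole System log. Option names are from the plugin’s sample config; the filter values are just an illustration.)

[[inputs.win_eventlog]]
  ## XML query selecting only the shutdown/restart related events
  xpath_query = '''
  <QueryList>
    <Query Id="0" Path="System">
      <Select Path="System">*[System[(EventID=41 or EventID=1074 or EventID=6006 or EventID=6008)]]</Select>
    </Query>
  </QueryList>
  '''

  ## Replay existing events on startup (the parameter that 1.23 rejected above)
  from_beginning = true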

So, that won’t do what I want, unless the plugin has a memory of which past events it has already uploaded?

There is bookmarking built into the plugin, but I think that is used to ensure you only get future events after reading things.

@srebhan is state persistence something available in windows event log? not clear to me from the readme.
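
(For reference: if the plugin does implement state persistence, which is exactly the open question here, it would hook into the agent-level statefile setting available in recent Telegraf versions. The path below is just an example.)

[agent]
  ## File used to load/store plugin state (e.g. bookmarks) across restarts
  statefile = "C:/Apps/Telegraf/telegraf.state"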

For anyone interested: I have meanwhile solved this with a dedicated Python script, scheduled to run after a computer restart, that looks for all the shutdown/restart related events (41, 1074, 6006, 6008) and uploads them directly to InfluxDB.

from datetime import datetime
import win32evtlog  # requires pywin32 pre-installed

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# You can generate an API token from the "API Tokens Tab" in the UI
token = "******"
org = "******"
bucket = "******"

server = 'localhost'  # name of the target computer to read event logs from
logtype = 'System'    # 'Application' # 'Security'
hand = win32evtlog.OpenEventLog(server, logtype)
flags = win32evtlog.EVENTLOG_BACKWARDS_READ | win32evtlog.EVENTLOG_SEQUENTIAL_READ
total = win32evtlog.GetNumberOfEventLogRecords(hand)
GO_BACK = 200  # only inspect roughly the most recent 200 records
cnt = 0

# System events related to shutdown/restart:
# 41 = unexpected reboot, 1074 = initiated shutdown/restart,
# 6006 = event log service stopped (clean shutdown), 6008 = unexpected shutdown
shutdown_eventids = [41, 1074, 6006, 6008]


with InfluxDBClient(url="******", token=token, org=org) as client:

    write_api = client.write_api(write_options=SYNCHRONOUS)

    while cnt < GO_BACK:
        # reads newest-first, one batch of records per call
        events = win32evtlog.ReadEventLog(hand, flags, GO_BACK)
        if not events:
            break  # reached the end of the log before GO_BACK records

        for event in events:
            cnt = cnt + 1
            e_num = event.EventID & 0x1fff  # the low bits hold the event number itself
            if e_num in shutdown_eventids:

                msg = "python script pulled event after restart, EventTime: " + str(event.TimeGenerated)

                # append the event's insertion strings (user, reason, process, ...)
                data = event.StringInserts
                if data:
                    for d in data:
                        msg = msg + " / " + d

                point = Point("win_eventlog") \
                    .tag("host", "******") \
                    .tag("Computer", "******") \
                    .tag("Channel", logtype) \
                    .tag("Source", event.SourceName) \
                    .tag("EventID", e_num) \
                    .tag("Level", event.EventType) \
                    .field("Message", msg) \
                    .time(datetime.utcnow(), WritePrecision.NS)

                write_api.write(bucket, org, point)

                print('EventID:', e_num)

# the with-block closes the InfluxDB client; release the event log handle as well
win32evtlog.CloseEventLog(hand)

Hello, my 2 cents:

The events are tagged with their timestamp, so each time you reboot the machine Telegraf will send the events to InfluxDB again.

But that won’t generate multiple cloned entries, because the entries are identical: the incident is simply rewritten in the DB.
Good point: the data is neither lost nor duplicated.
Bad point: if you restart all your machines at the same moment there will be an I/O storm on InfluxDB, depending on the volume of data.
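
To make the overwrite point concrete, a rough sketch with placeholder credentials: writing the exact same point twice (same measurement, same tags, same timestamp) leaves a single entry in InfluxDB; the second write just overwrites the field values.

from datetime import datetime

from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# fixed timestamp, e.g. the event's TimeGenerated rather than "now"
ts = datetime(2023, 11, 28, 16, 28, 56)

with InfluxDBClient(url="******", token="******", org="******") as client:
    write_api = client.write_api(write_options=SYNCHRONOUS)

    point = Point("win_eventlog") \
        .tag("host", "******") \
        .tag("EventID", 6008) \
        .field("Message", "unexpected shutdown") \
        .time(ts, WritePrecision.S)

    write_api.write("******", "******", point)
    write_api.write("******", "******", point)  # same series + timestamp: overwritten, not duplicated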

If you have 10 machines, your DB will handle it; if you have 10k machines, that will be harder to absorb.

Then again, rebooting 10 machines at the same moment is one thing, but being able to restart 10k machines at the same moment is far more complex (usually you take a few hours to do that).

As for the data in the DB: InfluxDB has a retention period, so every reboot older than the retention won’t stay. If I’m not wrong, it will write the data and the retention process will remove it again.

That’s not optimised, for sure, but if you have data in the event log older than 60d and your retention is 30d, you will in the end keep only the last 30d.
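
For completeness, retention is a property of the bucket. A rough sketch with the Python client (bucket name and credentials are placeholders), creating a bucket that keeps only the last 30 days:

from influxdb_client import InfluxDBClient, BucketRetentionRules

with InfluxDBClient(url="******", token="******", org="******") as client:
    buckets_api = client.buckets_api()
    # points older than 30 days are expired by the retention service
    retention = BucketRetentionRules(type="expire", every_seconds=30 * 24 * 3600)
    buckets_api.create_bucket(bucket_name="******", retention_rules=retention, org="******")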