I’m trying to design an alert to tell me if cron activity deviates from normal. I know this sounds super vague, so let me try to explain how my system works:
We have numerous clients that we receive FTP files from on different schedules: some daily, some multiple times per day. We collect things like the number of files downloaded and the time taken to process the files.
I want an alert that can tell me things like:
- normally we receive 50 files at 8am from client X every day, but it’s 10am and we haven’t received any files.
- normally we receive 10 files at 10am, 12pm, and 2pm, but it’s 4pm and we’ve only seen the 10am and 12pm files.
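To make those two examples concrete, here is a rough Python sketch of the check I have in mind. It is not Kapacitor code, just the logic: each expected delivery gets a deadline (usual time plus a grace period), and a delivery only counts as missed once the deadline has passed. The names `ExpectedDelivery`, `missed_deliveries`, and the 2-hour grace period are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class ExpectedDelivery:
    client: str
    due: time            # time of day the files usually arrive
    min_files: int       # number of files we normally receive
    grace: timedelta     # how late a delivery may be before we alert


def missed_deliveries(schedule, received, now):
    """Return (client, due, seen, expected) for every overdue delivery.

    `received` maps (client, due) to the number of files seen today
    for that scheduled slot; a missing key means nothing arrived.
    """
    missed = []
    for exp in schedule:
        deadline = datetime.combine(now.date(), exp.due) + exp.grace
        if now < deadline:
            continue  # not due yet, so a count of 0 is not anomalous
        seen = received.get((exp.client, exp.due), 0)
        if seen < exp.min_files:
            missed.append((exp.client, exp.due, seen, exp.min_files))
    return missed
```

With client X due at 8am and a 2-hour grace period, this stays quiet at 9am and reports the delivery as missed at 10am, which is the behaviour I am after.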
The struggle I’m having is that it’s not a constant stream of data: for most of the day we receive no files, then at a very specific time (different for every client), we download files. So I’m struggling to see how this can work with something like deadman, because a count of 0 without any context is not anomalous, but for certain clients a count of 0 after 10am is.
All the ideas I’ve come up with so far involve aggregating the entire day’s worth of data and using sigma / stddev to calculate whether it’s less or more than normal. The problem with this approach is that, at worst, we don’t find out for an entire day, which is way too late for us to take action before our client notices.
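One variation I have considered, sketched below in plain Python rather than TICKscript, is to keep the sigma idea but compare against the same time of day instead of the whole day: take the number of files seen by, say, 10am today and compare it with the counts seen by 10am on previous days. The function name `is_anomalous` and the 3-sigma threshold are my own assumptions, not anything Kapacitor provides.

```python
from statistics import mean, stdev


def is_anomalous(history, current, sigmas=3.0):
    """Flag `current` if it is far from the historical counts.

    `history` holds the file counts observed by the same time of day
    on previous days; `current` is today's count at that time.
    """
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mu = mean(history)
    sd = stdev(history)
    if sd == 0:
        # Perfectly regular history: any difference is a deviation.
        return current != mu
    return abs(current - mu) > sigmas * sd
```

This would run at each checkpoint during the day rather than once at midnight, so a missing 8am delivery could show up at 10am instead of the next day.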
My goal is to develop an entire series of alerts in this style: Have we missed any scheduled FTP downloads? Have we failed to process as many files as we usually do? Is the processing taking longer or shorter than normal?
Maybe these are all slightly different. But the real alert I’m trying to focus on is to be notified when a client’s files have not been processed when usually they have. I fully believe Kapacitor has the capability to do what I want, but I’m drawing a blank on how to design the alert.