Do we have a deadman alert for when the entire stream stops, for all hosts?

Hello,

Do we have a deadman alert that fires when the entire stream stops, for all hosts and not just one?

Thanks

Hello @Ashish_Sikarwar,
No, I don’t believe we do. However, maybe we could write a custom deadman check? I’m not sure, though… @mhall119, do you have any suggestions or ideas?

Hello @Ashish_Sikarwar,

Assuming you mean a stream into Kapacitor: Kapacitor has a deadman function, but it is placed inside a TICKscript and is therefore tied to a particular stream/measurement, not to all streams.

I am considering using two Telegrafs: a remote Telegraf that sends the data we need to Kapacitor, and a local Telegraf whose purpose is to check whether the remote end(s) are alive. Because the local one is co-located with Kapacitor, it can also detect problems such as network issues. This probably doesn't catch every scenario in which streams go silent, or give very detailed information, but it's a step, and it's simple to implement.
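For comparison, here is a minimal sketch of the usual per-host deadman, which shows why it is scoped to one measurement and fires per host. It assumes Telegraf writes a `cpu` measurement into the `telegraf` database; the measurement name, alert id, and log path are hypothetical placeholders.

```
dbrp "telegraf"."autogen"

// Hypothetical: 'cpu' stands in for any measurement Telegraf writes
stream
    |from()
        .measurement('cpu')
        // One group per host, so the deadman tracks each host separately
        .groupBy('host')
    // CRITICAL for a host when no 'cpu' points arrive from it within 5 minutes
    |deadman(0.0, 5m)
        .id('deadman_{{ index .Tags "host" }}')
        .message('{{ .ID }} is {{ .Level }}')
        // Hypothetical log path
        .log('/tmp/kapacitor_deadman.log')
```

Because of `.groupBy('host')`, if every host stops at once you get one alert per host rather than a single "whole stream is down" alert.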

Potential scenarios to check are: host down, remote Telegraf down, or a network issue.
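One way the local "is the remote end alive?" check could look is sketched below. This is an assumption, not part of the suggestion above: it supposes the local Telegraf runs the `[[inputs.ping]]` plugin against the remote hosts, writes a `ping` measurement tagged with `url` into the same database, and reports a `percent_packet_loss` of 100 when a host does not answer. The alert id and log path are hypothetical.

```
dbrp "telegraf"."autogen"

// Assumes the local Telegraf runs [[inputs.ping]] against the remote hosts,
// writing a 'ping' measurement tagged with 'url'
stream
    |from()
        .measurement('ping')
        // One group per pinged host
        .groupBy('url')
    |alert()
        .id('reachability_{{ index .Tags "url" }}')
        .message('{{ index .Tags "url" }} is {{ .Level }}')
        // CRITICAL when none of the ping packets to this host came back
        .crit(lambda: "percent_packet_loss" == 100.0)
        .stateChangesOnly()
        // Hypothetical log path
        .log('/tmp/kapacitor_reachability.log')
```

Combined with a deadman on the remote Telegraf's data, this helps separate "host unreachable" (ping fails) from "remote Telegraf down" (ping succeeds but no data arrives).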

Best Regards,
Menno

Hello @Anaisdg, sorry for the delay and thank you for your response.
Thank you for the suggestion, @Menno.
I’ve also found another method, which turned out to be simpler than I thought.
To check whether the entire stream is dead, I did not group the data by host, and it worked :slight_smile:

Not grouping by host avoids a separate alert for each host. It also works as a global agent-status check: an alert indicates either a network outage or a global issue affecting all agents.
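A minimal sketch of the "don't group by host" idea, assuming a hypothetical `cpu` measurement that every agent writes (the alert id and log path are also placeholders). With no `.groupBy()`, points from all hosts fall into a single group, so the deadman only trips when the combined stream goes silent:

```
dbrp "telegraf"."autogen"

// Hypothetical: 'cpu' stands in for any measurement every agent writes.
// No .groupBy() here, so points from all hosts share a single group.
stream
    |from()
        .measurement('cpu')
    // CRITICAL only when no 'cpu' points arrive from ANY host within 5 minutes;
    // a single host going silent does not trip it
    |deadman(0.0, 5m)
        .id('all_agents_deadman')
        .message('All agents: {{ .Level }}')
        .stateChangesOnly()
        // Hypothetical log path
        .log('/tmp/kapacitor_deadman.log')
```

The trade-off is that this check stays silent when only some hosts stop reporting, so it complements a per-host deadman rather than replacing it.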

Thanks
Ashish


Hello @Ashish_Sikarwar,
Can you please share your TICKscript with us? I think other users could really benefit.
Thanks.

Sorry for the delay!

The following TICKscript sends an alert when all the agents are down, i.e. when there is no incoming stream.

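```
// Database and retention policy the task reads from
dbrp "telegraf"."autogen"

// Group by 'objectname' only, NOT by 'host': points from every host land
// in the same group, so the deadman fires when the whole stream goes
// silent rather than when a single host stops reporting
var groupBy = ['objectname']

var message = '{{ if eq .Level "OK" }}All Telegraf Agents are Up and Running{{else}}All Telegraf Agents are Down{{ end }}.'

var messageField = 'message'

var env_class = 'Production'

var data = stream
    |from()
        // Placeholder: replace with a measurement every agent reports
        .measurement('any_measurement')
        .groupBy(groupBy)
        .where(lambda: "Env_Class" == env_class)
    // Collect points into clock-aligned 5-minute windows
    |window()
        .period(5m)
        .every(5m)
        .align()
    // Placeholder: replace with a field every agent reports
    |where(lambda: isPresent("your_fieldname"))
    |last('your_fieldname')
        .as('value')
    // CRITICAL when the node above emits no points during a 10-minute interval
    |deadman(0.0, 10m)
        .details('Stream_Deadman')
        .message(message)
        .id('Telegraf_Agent_Health')
        .idTag('alertID')
        .levelTag('level')
        .messageField(messageField)
        // Notify only when the level changes (e.g. OK -> CRITICAL and back)
        .stateChangesOnly()
        // POST to the [[httppost]] endpoint named 'My_Stream' in kapacitor.conf
        .post()
        .endpoint('My_Stream')
        .captureResponse()
```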