Duplicate Data Points in InfluxDB

Hello All,

My use case is to read from a JSON output file written by k6 (an open-source load testing tool) and write every new JSON line to InfluxDB via Telegraf. I have deployed two pods in a k8s cluster for this purpose: one is a k6 + Telegraf pod and the other is an InfluxDB pod. Below is my Telegraf conf file, followed by a sample of the k6 output it tails:

  [agent]
    interval      = "$POLL_INTERVAL"
    omit_hostname = true
    metric_batch_size = 1000
    metric_buffer_limit = 10000
    flush_interval = "5s"

  [[inputs.tail]]
    files = ["/outputs/result.json"]
    data_format = "json"
    from_beginning = false
    path_tag = ""
    json_name_key = "metric"
    json_string_fields = ["data_value", "data_tags_name", "data_tags_scenario", "data_tags_testrun", "data_tags_workflow", "type", "data_time"]

  [[processors.starlark]]
  source='''
  def apply(metric):
    if metric.fields["type"] == "Point":
        return metric
    return None
  '''

  [[outputs.influxdb]]
    urls = ["$INFLUX_HOST"]
    database = "$INFLUX_DATABASE"
    username = "$INFLUX_USERNAME"
    password = "$INFLUX_PASSWORD"
    timeout = "5s"

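For reference, each line that k6 appends to result.json with its JSON output looks roughly like the sample below (the values are illustrative, and the custom tags depend on the test script). Telegraf's JSON parser flattens the nested keys with underscores, which is where names like data_value and data_tags_name come from.

    {"metric":"http_req_duration","type":"Point","data":{"time":"2021-12-13T10:16:22.030651234Z","value":123.45,"tags":{"name":"https://example.com","scenario":"default","testrun":"run-1","workflow":"checkout"}}}
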
I observed that Telegraf is writing the data points twice in InfluxDB. I was following this issue raised earlier here, but I have not been able to find any viable solution to my problem.
Here is the InfluxDB output for one of the measurements; this is the case for every measurement:


I have tested similar cases and they worked perfectly (like writing to an example.out file and pushing it to InfluxDB using tail); it is only this case that fails.
Happy to get some help regarding this.

Hello @Nilesh786,
Hmm, the points don’t look like exact duplicates since they have different timestamps. But I’m assuming you want to be overwriting the point? I’m not sure why it’s happening, though. Are you running two Telegraf agents, one deployed in each pod, with the same config reading the same JSON? Apologies, I’m a little confused about your architecture.

Do you get this error outside of k6?

No, I am using only one Telegraf agent, and this happens with the k6 scenario only. I tried with a sample .out file and there were no issues, but for the k6 output Telegraf writes the data points from the tailed file twice.

My use case is simple: I am writing the k6 metrics to a JSON file and using a Telegraf agent to tail the file and push the data to InfluxDB. I run the load test with k6 and simultaneously push all the k6 metrics to InfluxDB; I want to avoid using the k6 InfluxDB output because of its limitations.

I’m guessing what’s happening here is that Telegraf is sending the data with no timestamps, so the time assigned to each point is the time at which it is actually written. That would explain the microsecond differences in the timestamps. In InfluxDB, each point is uniquely identified by its measurement, tag set, and timestamp; since the timestamps differ, InfluxDB treats each write as a unique point.

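To illustrate with made-up values, the two writes end up looking something like this in line protocol: identical except for the trailing timestamp, so InfluxDB keeps both rows instead of overwriting one with the other.

    http_req_duration data_tags_name="https://example.com",data_value="123.45",type="Point" 1639390582030000000
    http_req_duration data_tags_name="https://example.com",data_value="123.45",type="Point" 1639390582030417000
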
You need to tell Telegraf to use the JSON data_time field as the time value when writing the point to InfluxDB. I’d also recommend writing some of these as tags instead of everything as string fields. The one column in there that should be a field (data_value) is numeric, so it shouldn’t be included as a string field.

Try this inputs.tail config:

[[inputs.tail]]
    files = ["/outputs/result.json"]
    data_format = "json"
    from_beginning = false
    path_tag = ""
    json_name_key = "metric"
    tag_keys = ["data_tags*", "type"]
    json_time_key = "data_time"
    json_time_format = "2006-01-02T15:04:05Z07:00"    

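With this config the measurement keeps the k6 timestamp from data_time, so if the same line is read again it overwrites the existing point instead of creating a new one. One thing to double-check: since "type" becomes a tag rather than a field here, the Starlark filter would likely need to test metric.tags["type"] instead of metric.fields["type"]. The resulting point should look roughly like this (hypothetical values):

    http_req_duration,data_tags_name=https://example.com,data_tags_scenario=default,data_tags_testrun=run-1,data_tags_workflow=checkout,type=Point data_value=123.45 1639390582030651234
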
Thanks, this works perfectly.