Timestamp not being recorded - JSON v2

Hi,
I am having a very hard time getting Telegraf to recognise a timestamp in JSON data fetched over HTTP GET. I get the data from an endpoint that stores it, timestamps included, so I also need to save that time to put each point in the right place, as this is past data published after the measurement occurred.

The data looks like this (I omitted unnecessary fields to keep things readable):

{
    "count": 131,
    "next": null,
    "previous": "https://api.com/?page_size=1000",
    "results": [
        {
            "id": 94670,
            "url": "https://api.com/94670/",
            "name": "",
            "reference": "728732-7",
            "version": "V6",
            "latest_voltage": 222.85000610351562,
            "latest_voltage_timestamp": "2021-12-04T10:30:01Z",
        }
    ]
}

and my Telegraf config looks like this (again, just the relevant part):

[[inputs.http]]
  ## One or more URLs from which to read formatted metrics
  urls = [
    "https://api.com/page_size=1000&page=1"
  ]
  
  #interval = "15s"
  
  ## HTTP method
  method = "GET"

  ## Optional HTTP headers
  headers = {"Authorization" = "Token ${TOKEN}"}

  ## Data format to consume.
  data_format = "json_v2"

  [[inputs.http.json_v2]]
    measurement_name = "voltage_data"
    [[inputs.http.json_v2.object]]
      path = "results"
      timestamp_path = "latest_voltage_timestamp"
      timestamp_format = "2006-01-02T15:04:05Z"
      #timestamp_timezone = "UTC"
      included_keys = [
        "latest_voltage"
      ]
      tags = [
        "id",
        "reference"
      ]
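For reference, json_v2's timestamp_format uses Go's reference-time layout, so 2006-01-02T15:04:05Z plays the role of a format string. As a sanity check that the layout matches the sample value (a Python sketch using the equivalent strptime format, not Telegraf itself):

```python
from datetime import datetime, timezone

# The Go layout "2006-01-02T15:04:05Z" corresponds to this strptime format.
fmt = "%Y-%m-%dT%H:%M:%SZ"

# Sample value from the payload above; it parses cleanly, so the format
# string itself is not the problem.
ts = datetime.strptime("2021-12-04T10:30:01Z", fmt).replace(tzinfo=timezone.utc)
print(ts.isoformat())  # 2021-12-04T10:30:01+00:00
```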

Despite all the above, InfluxDB (InfluxDB Cloud) saves the data with the current timestamp, so the timestamp coming from the data is ignored.
Does anybody see what I am doing wrong?

Thanks a lot in advance for any support!
fabio

I went and simplified this even further using the following input file:

{
    "results": [
        {
            "id": 123,
            "last_reported": "2020-12-04T10:30:01Z",
            "latest_voltage": 222.85000610351562

        },
        {
            "id": 124,
            "last_reported": "2010-01-01T12:00:01Z",
            "latest_voltage": 222.85000610351562
        }
    ]
}

and used the following configuration file:

[agent]
    omit_hostname = true

[[inputs.file]]
    files = ["config.log"]
    data_format = "json_v2"

        [[inputs.file.json_v2]]
            [[inputs.file.json_v2.object]]
                path = "results"
                tags = ["id"]
                timestamp_key = "last_reported"
                timestamp_format = "2006-01-02T15:04:05Z"

[[outputs.file]]

Which produced:

> file,id=123 latest_voltage=222.85000610351562 1607077801000000000
> file,id=124 latest_voltage=222.85000610351562 1262347201000000000
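The trailing integers in that output are nanosecond epoch timestamps, and they do match the last_reported values; a quick check independent of Telegraf:

```python
from datetime import datetime, timezone

def to_ns(s: str) -> int:
    """Convert an RFC 3339 UTC timestamp to the nanosecond epoch used by line protocol."""
    dt = datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000

print(to_ns("2020-12-04T10:30:01Z"))  # 1607077801000000000
print(to_ns("2010-01-01T12:00:01Z"))  # 1262347201000000000
```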

That looks like it parses the timestamp just fine. However, as soon as I move the timestamp field to be the last field in the JSON, as you have in your example, it stops working. For example, using the following input:

{
    "results": [
        {
            "id": 123,
            "latest_voltage": 222.85000610351562,
            "last_reported": "2020-12-04T10:30:01Z"
        },
        {
            "id": 124,
            "latest_voltage": 222.85000610351562,
            "last_reported": "2010-01-01T12:00:01Z"
        }
    ]
}

With this I get the current timestamp:

> file,id=123 latest_voltage=222.85000610351562 1638804713000000000
> file,id=124 latest_voltage=222.85000610351562 1638804713000000000
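Per the JSON spec, object members are unordered, so key position should not affect parsing; an order-independent lookup finds the timestamp in either layout, which is why this reads like a parser bug rather than a config problem. A minimal illustration:

```python
import json

# Same object, timestamp in the middle vs. timestamp last.
first = '{"id": 123, "last_reported": "2020-12-04T10:30:01Z", "latest_voltage": 222.85}'
last = '{"id": 123, "latest_voltage": 222.85, "last_reported": "2020-12-04T10:30:01Z"}'

# A key lookup is order-independent: both layouts yield the same timestamp.
same = json.loads(first)["last_reported"] == json.loads(last)["last_reported"]
print(same)  # True
```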

@sspaink have you seen something like this before? Is there something preventing the timestamp field from being the very last field?

Looks like this is actually a bug; I put up a PR here with a fix.

Hi @jpowers
thanks so much for this!
I have been banging my head against the wall on this for two days. I am very new to InfluxDB and Telegraf and totally assumed I was doing it all wrong.

Also thanks for submitting the PR so quickly, this is great. Do you have any idea when the fix will be reflected on Docker Hub? We use the Docker Hub release as-is. Unfortunately, we have no control over the incoming data format, so until we can update our Telegraf Docker image we are unable to ingest this data.

Thanks again, this has definitely exceeded expectations

fabio


Hi @fabio,

Our next release, 1.21, is scheduled for December 22. That is the next time we will update the docker images.

Thanks!

Hi @jpowers, I think I might have bumped into the same problem; however, I have control over what I send to Telegraf (I am using the HTTP listener plugin), so I have tested moving the position of the timestamp key, with no success.

Here is what I am sending:

{
    "data" : [
        {
            "created_at" : "2021-12-06T18:19:03Z",
            "meter_id" : 14,
            "meter_external_reference": "1002848",
            "meter_external_id" : 12,
            "kwh" : 490,
            "kwh_delta" : 13,
            "core_external_reference" : "1231231",
            "core_external_id" : 0,
            "grid_id" : 52,
            "grid_name" : "Test",
            "meter_type_id" : 2,
            "meter_type_name" : "Essential Service"
        }
    ]
}

and here is my telegraf config:

[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will cache metric_buffer_limit metrics for each output, and will
  ## flush this buffer on a successful write.
  metric_buffer_limit = 10000
  ## Flush the buffer whenever full, regardless of flush_interval.
  flush_buffer_when_full = true

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## Run telegraf in debug mode
  debug = true
  ## Run telegraf in quiet mode
  quiet = false
  ## Override default hostname, if empty use os.Hostname()
  hostname = ""

# Read formatted metrics from one or more HTTP endpoints
[[inputs.http_listener_v2]]
  #name_override = "Supplier"

  ## Address and port to host HTTP listener on
  service_address = ":8080"

  ## Path to listen to.
  path = "/telegraf1"

  ## HTTP methods to accept.
  methods = ["POST", "PUT"]

  ## maximum duration before timing out read of the request
  # read_timeout = "10s"
  ## maximum duration before timing out write of the response
  # write_timeout = "10s"

  ## Maximum allowed http request body size in bytes.
  ## 0 means to use the default of 524,288,000 bytes (500 mebibytes)
  # max_body_size = "500MB"

  ## Set one or more allowed client CA certificate file names to
  ## enable mutually authenticated TLS connections
  # tls_allowed_cacerts = ["/etc/telegraf/clientca.pem"]

  ## Add service certificate and key
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"

  ## Optional username and password to accept for HTTP basic authentication.
  ## You probably want to make sure you have TLS configured above for this.
  basic_username = "****"
  basic_password = "****"

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  #data_format = "json"

  #Parse `stationBeanList` array only
  #json_query = "data"

  #Exclude url and host items from tags
  #tagexclude = ["url", "host"]

  #Set station metadata as tags
  #tag_keys = [
  #  "meter_id",
  #  "meter_external_reference",
  #  "meter_external_id",
  #  "kwh",
  #  "kwh_delta",
  #  "grid_id",
  #  "grid_name",
  #  "meter_type_id",
  #  "meter_type_name"
  #]

 #Latest station information reported at `lastCommunicationTime`
  #json_time_key = "created_at"

  #Time is reported in Golang "reference time" format
  #json_time_format = "2006-01-02T15:04:05Z07:00"

  #Time is reported in UTC
  #json_timezone = "UTC"

  data_format = "json_v2"

  [[inputs.http_listener_v2.json_v2]]
    measurement_name = "steamaco_v2"
    [[inputs.http_listener_v2.json_v2.object]]
      path = "data"
      timestamp_path = "created_at"
      timestamp_format = "2006-01-02T15:04:05Z"
      #2006-01-02T15:04:05Z
      #timestamp_timezone = "UTC"
      included_keys = [
        "meter_external_reference",
        "kwh",
        "grid_name"
      ]

      #tags = [
      #  "meter_id",
      #  "meter_external_reference",
      #  "meter_external_id",
      #  "kwh",
      #  "kwh_delta",
      #  "grid_id",
      #  "grid_name",
      #  "meter_type_id",
      #  "meter_type_name"
      #]
  #    [inputs.http.json_v2.object.renames]
  #      id = "meter_id"
  #      bit_harvester = "bit_harvester_id"

[[outputs.influxdb_v2]]
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
urls = ["${INFLUX_HOST}"]
token = "${INFLUX_TOKEN}"
bucket = "mybucket"

When testing the above, I keep getting the current timestamp; the value passed in the created_at key is ignored by Telegraf. Any advice?

Thanks in advance!

@tommaso_girotto can you please file an issue with your example and a reference to this post, and let us know the issue number?

It seems that with your example, I can get the timestamp set correctly when the timestamp field is added to included_keys. Also, note you should be using timestamp_key when inside a json_v2.object.

Here is what was working:

[agent]
    omit_hostname = true

[[inputs.file]]
    files = ["config.log"]
    data_format = "json_v2"

    [[inputs.file.json_v2]]
        [[inputs.file.json_v2.object]]
            path = "data"
            timestamp_key = "created_at"
            timestamp_format = "2006-01-02T15:04:05Z"
            included_keys = [
                "meter_external_reference",
                "kwh",
                "grid_name",
                "created_at"
            ]

[[outputs.file]]

Results in:

file grid_name="Test",kwh=490,meter_external_reference="1002848" 1638814743000000000
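The trailing timestamp in that output matches the created_at value from the posted payload (2021-12-06T18:19:03Z); checking the conversion independently:

```python
from datetime import datetime, timezone

# created_at from the posted payload, converted to a nanosecond epoch.
dt = datetime.strptime("2021-12-06T18:19:03Z", "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
ns = int(dt.timestamp()) * 1_000_000_000
print(ns)  # 1638814743000000000
```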

Hi @jpowers,
Thanks for the quick reply. I have tested your solution, and it does improve things, but the issue does not seem to be completely fixed. By including "created_at" in included_keys and changing timestamp_path to timestamp_key, the date, hours, and minutes are now correct, but not the seconds (and, I assume, milliseconds). Here is my new config:

[agent]
  ## Default data collection interval for all inputs
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will cache metric_buffer_limit metrics for each output, and will
  ## flush this buffer on a successful write.
  metric_buffer_limit = 10000
  ## Flush the buffer whenever full, regardless of flush_interval.
  flush_buffer_when_full = true

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "0s"

  ## Run telegraf in debug mode
  debug = true
  ## Run telegraf in quiet mode
  quiet = false
  ## Override default hostname, if empty use os.Hostname()
  hostname = ""

# Read formatted metrics from one or more HTTP endpoints
[[inputs.http_listener_v2]]
  #name_override = "Test"

  ## Address and port to host HTTP listener on
  service_address = ":8080"

  ## Path to listen to.
  path = "/telegraf1"

  ## HTTP methods to accept.
  methods = ["POST", "PUT"]

  ## maximum duration before timing out read of the request
  # read_timeout = "10s"
  ## maximum duration before timing out write of the response
  # write_timeout = "10s"

  ## Maximum allowed http request body size in bytes.
  ## 0 means to use the default of 524,288,000 bytes (500 mebibytes)
  # max_body_size = "500MB"

  ## Set one or more allowed client CA certificate file names to
  ## enable mutually authenticated TLS connections
  # tls_allowed_cacerts = ["/etc/telegraf/clientca.pem"]

  ## Add service certificate and key
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"

  ## Optional username and password to accept for HTTP basic authentication.
  ## You probably want to make sure you have TLS configured above for this.
  basic_username = "test"
  basic_password = "test"

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  #data_format = "json"

  #Parse `stationBeanList` array only
  #json_query = "data"

  #Exclude url and host items from tags
  #tagexclude = ["url", "host"]

  #Set station metadata as tags
  #tag_keys = [
  #  "meter_id",
  #  "meter_external_reference",
  #  "meter_external_id",
  #  "kwh",
  #  "kwh_delta",
  #  "grid_id",
  #  "grid_name",
  #  "meter_type_id",
  #  "meter_type_name"
  #]

 #Latest station information reported at `lastCommunicationTime`
  #json_time_key = "created_at"

  #Time is reported in Golang "reference time" format
  #json_time_format = "2006-01-02T15:04:05Z07:00"

  #Time is reported in UTC
  #json_timezone = "UTC"

  data_format = "json_v2"

  [[inputs.http_listener_v2.json_v2]]
    measurement_name = "steamaco_v2"
    [[inputs.http_listener_v2.json_v2.object]]
      path = "data"
      timestamp_key = "created_at"
      timestamp_format = "2006-01-02T15:04:05Z"
      #timestamp_timezone = "UTC"
      included_keys = [
        "kwh",
        "created_at"
      ]

      #tags = [
      #  "meter_id",
      #  "meter_external_reference",
      #  "meter_external_id",
      #  "kwh",
      #  "kwh_delta",
      #  "grid_id",
      #  "grid_name",
      #  "meter_type_id",
      #  "meter_type_name"
      #]
  #    [inputs.http.json_v2.object.renames]
  #      id = "meter_id"
  #      bit_harvester = "bit_harvester_id"

[[outputs.influxdb_v2]]
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
urls = ["${INFLUX_HOST}"]
token = "${INFLUX_TOKEN}"
bucket = "test"

And here is what I am sending:

{
    "data" : [
        {
            "created_at" : "2021-12-07T15:19:02Z",
            "meter_id" : 14,
            "meter_external_reference": "1002848",
            "meter_external_id" : 12,
            "kwh" : 499,
            "kwh_delta" : 13,
            "core_external_reference" : "1231231",
            "core_external_id" : 0,
            "grid_id" : 52,
            "grid_name" : "Ogheye",
            "meter_type_id" : 2,
            "meter_type_name" : "Essential Service"
        }
    ]
}

This is the resulting time I am getting:

2021-12-07T15:19:10.000Z
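For what it's worth, the created_at value sent was 2021-12-07T15:19:02Z, so the stored time above is 8 seconds later; a quick check of the gap (a sketch independent of Telegraf):

```python
from datetime import datetime, timezone

fmt = "%Y-%m-%dT%H:%M:%SZ"
# Timestamp sent in the payload vs. timestamp stored in InfluxDB.
sent = datetime.strptime("2021-12-07T15:19:02Z", fmt).replace(tzinfo=timezone.utc)
stored = datetime.strptime("2021-12-07T15:19:10Z", fmt).replace(tzinfo=timezone.utc)
print((stored - sent).total_seconds())  # 8.0
```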

I have opened an issue here.