Two timestamps per event?

TomW · July 10, 2023, 5:00pm

Hi, I’m new here and new to InfluxDB. Just started reading about it recently.

I’m looking for a way to store and make sense of logs from an HLS VoD service that provides on-demand access to a two-week rolling buffer of recordings of live streams. Each event in the time series corresponds to a request/response for an HLS segment in the web access log.

What’s interesting about this is that each segment in the access log has two timestamps:

access time, and
program time, which can lag access time by up to two weeks,

and I want to be able to draw graphs of the frequency of segment requests on both time axes.

I guess I would represent those as two different measurements in InfluxDB. The access timestamps appear nearly monotonically in web logs so that should present no problem. But the program timestamps are completely out of order. The purpose of the VoD service is to allow people to listen back over the last two weeks wherever they choose. But I want to display the program time-series before the two weeks are over. This is the challenge.
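As a rough illustration of the two-measurement idea, here is a minimal Python sketch that turns one segment request into two InfluxDB line protocol records, one stamped with the access time and one with the program time. The measurement names (`segment_by_access`, `segment_by_program`), the `stream` tag, and the `requests` field are assumptions made up for this example, not anything the thread or InfluxDB prescribes.

```python
import calendar
from datetime import datetime, timezone


def to_ns(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer nanoseconds since the epoch."""
    # Integer arithmetic avoids float rounding at nanosecond scale.
    seconds = calendar.timegm(dt.utctimetuple())
    return seconds * 1_000_000_000 + dt.microsecond * 1_000


def segment_event_lines(stream: str, access_time: datetime, program_time: datetime) -> list[str]:
    """Build one line protocol record per time axis for a single segment request.

    Measurement, tag, and field names are hypothetical.
    """
    tags = f"stream={stream}"
    return [
        # Access axis: arrives nearly in order, straight from the web log.
        f"segment_by_access,{tags} requests=1i {to_ns(access_time)}",
        # Program axis: may be up to two weeks older and arrives out of order.
        f"segment_by_program,{tags} requests=1i {to_ns(program_time)}",
    ]


lines = segment_event_lines(
    "radio1",
    datetime(2023, 7, 12, 19, 22, tzinfo=timezone.utc),
    datetime(2023, 7, 10, 9, 0, tzinfo=timezone.utc),
)
print("\n".join(lines))
```

One thing to watch with a layout like this: InfluxDB treats points with the same measurement, tag set, and timestamp as the same point, so several listeners requesting the same segment would overwrite each other unless a tag distinguishes them or the counts are aggregated before writing.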

Docs say to sort keys before writing to InfluxDB. What about timestamps? Is there a problem with writing radically out of order timestamps?

And if I were to write the program timestamps to InfluxDB in the order they appear in the web server access logs, will that screw anything up with performance later?

TomW · July 12, 2023, 3:12pm

I’ll try to explain in less technical terms.

Imagine there’s a live stream that runs 24x7 that you’re interested in. Our service gives you access to the last 14 days of that live stream.

Let’s say that on Wed July 12 at 19:22 you start watching, not what’s currently going out live, but what went out on Mon July 10 at 09:00. Then

July 12 19:22 is the access timestamp, and
July 10 09:00 is the program timestamp

of the first segment of the stream our server sends to your player. Subsequent segments tick up together until you interact with your player somehow (e.g. seeking or skipping will change the relationship between the timestamps).

So the infrastructure specialists are going to take an interest in the aggregate frequency of segment request events on the access time axis, since that’s what shows physical resource demand.

But the program directors behind the live streams are interested in the same events but on the program time axis to show what programs the audience played most.

In our HLS (HTTP Live Streaming) server the media segments are sent by nginx, so we use the timestamp in the nginx access log as the access timestamp, and we can compute the program timestamp from the URL of the segment GET requests.
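To make that concrete, here is a minimal Python sketch of pulling both timestamps out of one access log line. It assumes nginx's default `combined` log format, and it assumes a made-up URL scheme, `/hls/<stream>/<YYYYMMDDTHHMMSS>Z.ts`, in which the segment's start time is encoded in the file name. The real URL layout isn't given in this thread, so `SEGMENT_RE` is purely hypothetical and would need to be adapted.

```python
import re
from datetime import datetime, timezone

# Matches the [time_local] and "GET <path> HTTP/x" parts of a combined-format line.
LOG_RE = re.compile(r'\[(?P<time>[^\]]+)\] "GET (?P<path>\S+) HTTP/[\d.]+"')

# Hypothetical segment URL scheme: /hls/<stream>/<YYYYMMDDTHHMMSS>Z.ts
SEGMENT_RE = re.compile(r"^/hls/(?P<stream>[^/]+)/(?P<start>\d{8}T\d{6})Z\.ts$")


def parse_segment_request(line: str):
    """Return (stream, access_time, program_time), or None if not a segment GET."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    path = m.group("path").split("?", 1)[0]  # ignore any query string
    seg = SEGMENT_RE.match(path)
    if seg is None:
        return None
    # Access timestamp: nginx $time_local, e.g. 12/Jul/2023:19:22:00 +0000
    access_time = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    # Program timestamp: taken from the (assumed) UTC time in the file name.
    program_time = datetime.strptime(seg.group("start"), "%Y%m%dT%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return seg.group("stream"), access_time, program_time


example = (
    '203.0.113.7 - - [12/Jul/2023:19:22:00 +0000] '
    '"GET /hls/radio1/20230710T090000Z.ts HTTP/1.1" 200 183412 "-" "player/1.0"'
)
print(parse_segment_request(example))
```

Lines that aren't segment requests (playlists, other paths, non-GET methods) simply return `None` and can be skipped.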

So you see, we can’t sort segment events by their program timestamp for insertion into InfluxDB until 14 days later, when the live stream recordings have expired from the service.

Anaisdg · July 12, 2023, 4:14pm

Hello,
You don’t have to worry about cardinality and tags/fields in 3.0. If I were you I’d try using InfluxDB Cloud 3.0 or wait for v3.0.

There shouldn’t be a problem with writing timestamps that are out of order in 2.x.

I don’t think it should screw up anything with performance, but again look into 3.0 first.

TomW · July 12, 2023, 5:15pm

Thanks @Anaisdg

My preference isn’t to copy all our data into someone’s cloud (we’re a bare metal shop) but I’ll be sure to look at tech docs on what’s changing in 3.0. Is there a public schedule for OSS 3.0?

Anaisdg · July 12, 2023, 6:28pm

Hello @TomW,
Unfortunately there isn’t.
But you can sign up for updates here:
