Hi, I’m new here and new to InfluxDB. Just started reading about it recently.
I’m looking for a way to store and make sense of logs from an HLS VoD service that provides on-demand access to a two-week rolling buffer of recordings of live streams. Each event in the time series corresponds to a request/response for an HLS segment in the web access log.
What’s interesting about this is that each segment in the access log has two timestamps:
- access time, and
- program time, which can lag access time by up to two weeks,
and I want to be able to draw graphs of the frequency of segment requests on both time axes.
I guess I would represent those as two different measurements in InfluxDB. The access timestamps appear nearly monotonically in web logs, so that should present no problem. But the program timestamps are completely out of order. The purpose of the VoD service is to allow people to listen back over the last two weeks wherever they choose. But I want to display the program time series before the two weeks are over. This is the challenge.
Docs say to sort keys before writing to InfluxDB. What about timestamps? Is there a problem with writing radically out of order timestamps?
And if I were to write the program timestamps to InfluxDB in the order they appear in the web server access logs, will that screw anything up with performance later?
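To make the two-measurement idea concrete, here is a minimal sketch of how one segment request could become two line protocol points, one per time axis. The measurement names (`segment_access`, `segment_program`), the `stream` tag, and the `count` field are made up for illustration, and it assumes stream names contain no spaces, commas, or equals signs (those would need escaping in line protocol).

```python
from datetime import datetime, timezone


def to_ns(ts: datetime) -> int:
    """Convert an aware datetime to integer nanoseconds since the Unix epoch."""
    return int(ts.timestamp()) * 1_000_000_000 + ts.microsecond * 1_000


def segment_event_lines(stream: str, access_time: datetime, program_time: datetime) -> list[str]:
    """Build two line protocol points for one segment request.

    Measurement, tag, and field names here are hypothetical.
    """
    tags = f"stream={stream}"
    return [
        # Point on the access time axis (nearly monotonic in the log).
        f"segment_access,{tags} count=1i {to_ns(access_time)}",
        # Point on the program time axis (arrives out of order).
        f"segment_program,{tags} count=1i {to_ns(program_time)}",
    ]


if __name__ == "__main__":
    access = datetime(2023, 7, 12, 19, 22, tzinfo=timezone.utc)
    program = datetime(2023, 7, 10, 9, 0, tzinfo=timezone.utc)
    for line in segment_event_lines("stream1", access, program):
        print(line)
```

One thing to watch with this layout: in InfluxDB 1.x and 2.x a point is identified by measurement, tag set, and timestamp, so two listeners requesting the same segment would produce the same `segment_program` point and the later write would replace the earlier one. Pre-aggregating counts per segment before writing, or adding a distinguishing tag, are possible ways around that.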
I’ll try to explain in less technical terms.
Imagine there’s a live stream that runs 24x7 that you’re interested in. Our service gives you access to the last 14 days of that live stream.
Let’s say on Wed July 12 at 19:22 you start watching, not what’s currently going out live but what went out on Mon July 10 at 09:00. Then
- July 12 19:22 is the access timestamp, and
- July 10 09:00 is the program timestamp
of the first segment of the stream our server sends to your player. Subsequent segments tick up together until you interact with your player somehow (e.g. seeking or skipping will change the relationship between the timestamps).
So the infrastructure specialists are going to take an interest in the aggregate frequency of segment request events on the access time axis, since that’s what shows physical resource demand.
But the program directors behind the live streams are interested in the same events but on the program time axis to show what programs the audience played most.
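As a rough illustration of the two views, the sketch below counts the same events into hourly buckets on either axis. The event tuples and the 10-second segment spacing are invented for the example; in practice this aggregation would more likely be a query in the database than Python code.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_counts(events, axis):
    """Count events per hour.

    events: iterable of (access_time, program_time) tuples.
    axis:   0 to bucket by access time, 1 to bucket by program time.
    """
    counts = Counter()
    for event in events:
        # Truncate the chosen timestamp to the start of its hour.
        bucket = event[axis].replace(minute=0, second=0, microsecond=0)
        counts[bucket] += 1
    return counts


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical events: (access time, program time)
events = [
    (utc(2023, 7, 12, 19, 22, 0), utc(2023, 7, 10, 9, 0, 0)),
    (utc(2023, 7, 12, 19, 22, 10), utc(2023, 7, 10, 9, 0, 10)),
    (utc(2023, 7, 12, 20, 5, 0), utc(2023, 7, 10, 9, 43, 0)),
    (utc(2023, 7, 12, 20, 6, 0), utc(2023, 7, 11, 14, 0, 0)),
]

by_access = hourly_counts(events, axis=0)   # what infrastructure cares about
by_program = hourly_counts(events, axis=1)  # what program directors care about
```

The same four requests give two buckets of two on the access axis, but three in one program hour and one in another on the program axis.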
In our HLS (HTTP live stream) server the media segments are sent by nginx so we use the timestamp in the nginx access log for access timestamp and we can compute the program timestamp from the URL of the segment GET requests.
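For anyone following along, extracting both timestamps from one log line might look roughly like this. It assumes nginx's default `combined` log format and a purely hypothetical URL layout, `/hls/<stream>/<YYYYMMDD>T<HHMMSS>Z.ts`, where the file name is the segment's program start time in UTC. Our real URL scheme is different, so treat the regex as a placeholder.

```python
import re
from datetime import datetime, timezone

# Timestamp and request path from an nginx "combined" format line.
LOG_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "GET (?P<path>\S+) HTTP')

# Hypothetical segment URL layout: /hls/<stream>/<YYYYMMDD>T<HHMMSS>Z.ts
SEGMENT_RE = re.compile(r"/(?P<stream>[^/]+)/(?P<start>\d{8}T\d{6})Z\.ts$")


def parse_segment_request(line):
    """Return (stream, access_time, program_time), or None if the line
    is not a GET request for a media segment."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    path = m.group("path").split("?", 1)[0]  # ignore any query string
    seg = SEGMENT_RE.search(path)
    if seg is None:
        return None  # e.g. a playlist request
    # Access time: nginx $time_local, e.g. 12/Jul/2023:19:22:00 +0000
    access = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    # Program time: taken from the (assumed) file name, interpreted as UTC
    program = datetime.strptime(seg.group("start"), "%Y%m%dT%H%M%S").replace(
        tzinfo=timezone.utc
    )
    return seg.group("stream"), access, program


sample = (
    '203.0.113.7 - - [12/Jul/2023:19:22:00 +0000] '
    '"GET /hls/stream1/20230710T090000Z.ts HTTP/1.1" 200 512000 "-" "player/1.0"'
)
result = parse_segment_request(sample)
```

Here `result` holds the stream name plus the access time (July 12 19:22) and the program time (July 10 09:00) from the example above.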
So you see we can’t sort segment events by their program timestamp for insertion into InfluxDB until 14 days later, when the live stream recordings have expired from the service.
You don’t have to worry about cardinality and tags/fields in 3.0. If I were you I’d try using InfluxDB Cloud 3.0 or wait for v3.0.
There shouldn’t be a problem with writing timestamps that are out of order in 2.x.
I don’t think it should screw up anything with performance, but again look into 3.0 first.
My preference isn’t to copy all our data into someone’s cloud (we’re a bare metal shop) but I’ll be sure to look at tech docs on what’s changing in 3.0. Is there a public schedule for OSS 3.0?
Unfortunately there isn’t.
But you can sign up for updates here: