I’m just getting started with InfluxDB and am not sure if my data is a good fit for it. I’m not sure I understand the system well enough to figure that out.
I am logging phase data comparing frequency standards (a device under test vs. a reference) over time, generated from a single measurement setup. Each data run for a DUT may last hours to weeks with a datapoint typically once per second. I need to identify each run so I can link it to the device that’s being tested and export the associated data for further processing. In my manual system, I included comment lines at the top of the file identifying the DUT, the reference, the start date/time, the measurement rate, and any other special info. Then each measurement writes a line to the log with (typically) timestamp and phase value.
What’s the best way with InfluxDB to handle data like that? Is there a way to associate metadata to a time series without including it all along with each measurement (which would take much more space than the data itself)?
Hi @n8ur ,
Great to hear you are checking out InfluxDB! I am not an expert in your domain, so my apologies in advance if I butcher your field. Let’s see if we can sort your data into tags and fields:
Tags:
DeviceID - identifies the device the samples are coming from
reference - the reference standard the DUT is being compared against
start-time - the time your test started
measurement rate - note that if you store this as a tag, it will be treated as a string value. If you want to use the value in calculations, you might want to make this a field instead. Up to you.
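To make the tags/fields split concrete, here is a minimal sketch of what one measurement could look like in InfluxDB line protocol with the run metadata attached as tags. The measurement name ("phase") and all of the tag and field values are illustrative assumptions, not anything your tooling actually produces:

```python
# Sketch: one phase reading as an InfluxDB line-protocol record, with the
# run metadata as tags and the reading itself as a field. All names and
# values below are hypothetical examples.

def to_line_protocol(device_id, reference, start_time, phase, ts_ns):
    """Build one line-protocol record: measurement,tags fields timestamp."""
    tags = f"DeviceID={device_id},reference={reference},start-time={start_time}"
    return f"phase,{tags} value={phase} {ts_ns}"

line = to_line_protocol("DUT-42", "Cs-ref", "2021-03-01T00:00:00Z",
                        1.23e-9, 1614556800000000000)
print(line)
# → phase,DeviceID=DUT-42,reference=Cs-ref,start-time=2021-03-01T00:00:00Z value=1.23e-09 1614556800000000000
```

Note that in line protocol the tag set travels with every point on the wire, which is why the tags-versus-relational question below matters for large runs.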
That said, I would not consider your data high velocity, so you could also consider a hybrid approach: store the metadata in a relational DB and your time series data in InfluxDB. Flux can query a relational database and pull that data into InfluxDB. Check out the docs for a great example with an IoT sensor: Query SQL data sources with InfluxDB | InfluxDB OSS 2.0 Documentation
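As a rough sketch of that hybrid layout (using SQLite for illustration; all table and column names here are assumptions), the relational side holds one row of metadata per run, keyed by a run ID that would then be the only tag stored on each InfluxDB point:

```python
# Hybrid-layout sketch: run metadata in a relational DB (SQLite here),
# keyed by a run ID. InfluxDB points would then carry just run_id as a tag.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE runs (
    run_id TEXT PRIMARY KEY,
    dut TEXT, reference TEXT,
    start_time TEXT, sample_rate_hz REAL)""")
conn.execute("INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
             ("run-0001", "DUT-42", "Cs-ref", "2021-03-01T00:00:00Z", 1.0))
conn.commit()

# Joining a run's points back to its metadata is a single keyed lookup:
row = conn.execute("SELECT dut, sample_rate_hz FROM runs WHERE run_id = ?",
                   ("run-0001",)).fetchone()
print(row)  # → ('DUT-42', 1.0)
```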
Thank you, Jay, for the quick and friendly reply! To give you an idea of what I’m doing, below is an example of the most complex metadata I’m likely to have (this is generated by one of the tools I use; others have much less header info).
The semicolon lines are comments that provide helpful context. The STR lines are metadata with the parameters of the run, and those fields should always follow the same format. The data are the values on the lines that follow; in this case, the individual lines don’t need to be timetagged because the metadata tells us the start time and the sample rate (here 1000 samples/second, but that’s unusually high; more often it’s once per second).
If we can handle this format, anything else I need will be simpler.
If this is too complex for tags, your idea of a relational database could work. For example, I could have the filename be a unique serial number and build a record in the other database with the metadata, and then in InfluxDB just include that serial number to link the data to the metadata.
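To illustrate the idea, here is a hypothetical parser for a header format like the one described above: ";" lines are comments, "STR"-prefixed lines carry run parameters, and the remaining lines are bare phase values whose timestamps are derived from the start time and sample interval. The exact STR syntax assumed here ("STR key value") is a guess for illustration only, since the real file layout isn't shown:

```python
# Hypothetical parser sketch for the described format. Comment lines start
# with ";", metadata lines start with "STR", and everything else is a bare
# phase value timestamped from start time + sample interval.

def parse_run(lines, start_epoch_s, sample_interval_s):
    meta, samples = {}, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";"):
            continue                      # comment line, skip
        if line.startswith("STR"):
            _, key, value = line.split(None, 2)
            meta[key] = value             # run parameter
        else:
            samples.append(float(line))   # phase value
    # Derive a timestamp for each sample from the metadata alone.
    points = [(start_epoch_s + i * sample_interval_s, x)
              for i, x in enumerate(samples)]
    return meta, points
```

Each `(timestamp, phase)` pair could then be written to InfluxDB, with the metadata going to tags or to the relational side as discussed.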
Hi @n8ur ,
I am so sorry for the late reply; I did not get notified about this post. Anytime you don’t hear from me, just @ me in the post. I will start to convert this now.
Now it’s my turn to apologize for slow response! My excuse is that I’ve been working with a simpler dataset and trying to get that going. Thanks so much for putting the schema together. To answer your questions:
The “Time/Date” indicates either the start or the finish time – I need to double-check that. It’s not generated on every sample.
The TIC field indicates the total number of samples to follow. The Sample Interval gives the interval between samples. So here, the data set is 864,000 points generated 0.1 seconds apart – i.e., one day of data at a 10 Hz rate.
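That arithmetic is exactly why the data lines need no per-sample timetags: everything can be reconstructed from the header. A small sketch (the start value is a hypothetical epoch, purely for illustration):

```python
# Deriving per-point timestamps from the header alone (start time, TIC,
# Sample Interval). The start epoch here is a made-up example value.
start_ns = 1_614_556_800_000_000_000   # hypothetical run start, epoch ns
tic = 864_000                          # total number of samples (TIC)
interval_ns = 100_000_000              # 0.1 s Sample Interval, in ns

timestamps = [start_ns + i * interval_ns for i in range(tic)]
span_s = tic * interval_ns / 1e9
print(span_s)  # → 86400.0, i.e. exactly one day at a 10 Hz rate
```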
Naively we can just plot that data and get some visually interesting graphs, but the primary use of a data set like this is to subject it to a specialized statistical algorithm (Allan Deviation) that provides something like a standard deviation for samples taken at different intervals. At some point, I’ll learn how to use Influx and/or Grafana to do that analysis and generate plots. In the short term, I’ll pull the data from the database to reconstruct the input file so it can be fed into the existing analysis tools.
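For readers unfamiliar with it, the overlapping Allan deviation from phase data can be sketched with the standard textbook second-difference estimator; this is a generic illustration, not the implementation used by any of the tools mentioned above:

```python
# Overlapping Allan deviation sketch using the standard estimator:
#   AVAR(tau) = sum((x[i+2m] - 2*x[i+m] + x[i])**2) / (2 * tau**2 * (N - 2m))
# where x is phase data in seconds, tau = m * tau0, and N = len(x).
import math

def overlapping_adev(x, tau0, m):
    """Overlapping Allan deviation of phase data x at averaging time m*tau0."""
    n = len(x)
    tau = m * tau0
    diffs = [x[i + 2 * m] - 2 * x[i + m] + x[i] for i in range(n - 2 * m)]
    avar = sum(d * d for d in diffs) / (2.0 * tau * tau * (n - 2 * m))
    return math.sqrt(avar)

# A pure frequency offset (linear phase ramp) has zero Allan deviation:
ramp = [i * 2.0**-30 for i in range(1000)]
print(overlapping_adev(ramp, tau0=1.0, m=1))  # → 0.0
```

In practice one evaluates this at a range of `m` values to get the familiar sigma-tau plot.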
Yes, everything down to and including the TIC line is metadata and I suppose would go into tags. Here my lack of Influx familiarity shows – as you can see, the data sets can be quite large; I’ve routinely had 5 million or more points. If all those tags are literally included with each data record, it seems like the storage required would be enormous. In my limited knowledge, this seems like something that a simple relational database could do well – use a key field that links all the data points in one measurement run to a table containing all the metadata for that run. I don’t know if I’m barking up the wrong tree about this…
Thanks so much for your help. I have one other question, but I’ll post it as a new topic (if I can’t find an answer) rather than have scope creep here.