Newbie needs guidance on weather data schema

mugginsjm · May 14, 2024, 8:12am

I have the classic weather… wind, rain, humi, temp etc.
I have the measurement as weather and then the fields. The problem is that while experimenting (I use node red as a transition bed and use local storage) I cannot delete specific field data. I’ve tried using tags but to no avail. Is there a better way to add and delete specific data in this learning stage?
thanks

scott · May 14, 2024, 2:31pm

@mugginsjm Really, the best way to do this as you’re learning is to be willing to throw all the data you’ve collected away. Deleting specific points can be a difficult process in InfluxDB and, depending on what version of InfluxDB you’re using, may not be possible.

The main recommendation that I’d give is to write your experimental data to a database/bucket, but when you want to change the schema of the data, delete the database/bucket and write the newly structured data to a new database/bucket.

What version of InfluxDB are you using?

mugginsjm · May 14, 2024, 2:50pm

Hi Scott. I’m using InfluxDB v2.7.6
I also plan to add measurements for electric and gas etc. I imagine that there are times when I would want to delete rogue values from, say, a rain reading. As it stands I think I cannot delete individual data from rain, for example.

scott · May 14, 2024, 3:10pm

That’s correct, you won’t be able to do this. InfluxDB v2 supports deleting data by time range, measurement, and tag values. You cannot delete data by field. More information in the Delete data documentation.
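To make that concrete, here is a rough Python sketch of the request body a v2 delete takes: a start time, a stop time, and a predicate on the measurement and tag values. It only builds the body and does not send anything. The `build_delete_body` helper and the `station` tag are made up for illustration, and the body shape is my understanding of the `/api/v2/delete` endpoint, so check it against the Delete data documentation before relying on it.

```python
import json
from datetime import datetime, timezone


def build_delete_body(start, stop, measurement, tags=None):
    """Build a JSON-ready dict: time range plus a measurement/tag predicate.

    start and stop must be timezone-aware datetimes.
    Hypothetical helper; it builds the body only and sends nothing.
    """
    # The predicate can name the measurement and tag values, but not fields.
    clauses = [f'_measurement="{measurement}"']
    for key, value in sorted((tags or {}).items()):
        clauses.append(f'{key}="{value}"')
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "start": start.astimezone(timezone.utc).strftime(fmt),
        "stop": stop.astimezone(timezone.utc).strftime(fmt),
        "predicate": " AND ".join(clauses),
    }


body = build_delete_body(
    datetime(2024, 5, 14, 8, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 14, 9, 0, tzinfo=timezone.utc),
    "weather",
    tags={"station": "garden"},  # "station" is an assumed tag key
)
print(json.dumps(body, indent=2))
```

Note that every field of the matching points in that hour is removed, which is exactly the limitation described above.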

To remove rogue values, you’d have to “sanitize” your data in some way. There are a couple of different ways you could approach this.

- If you’re using Telegraf to collect the data, I think you can filter out points that exceed a threshold before writing the data to InfluxDB.
- If the data is already written to InfluxDB, you can use a task to query the raw, unsanitized data, filter out the rogue values, and then write the filtered dataset to a new bucket.
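Whichever tool does the filtering, the idea is the same threshold check. Here is a minimal sketch of it in plain Python (not Telegraf config and not Flux). The field names follow the ones used earlier in this thread, and the limits are illustrative guesses rather than recommended values.

```python
# Plausible range per field; these limits are illustrative assumptions.
LIMITS = {
    "temp": (-40.0, 60.0),
    "humi": (0.0, 100.0),
    "rain": (0.0, 500.0),
}


def sanitize(fields, limits=LIMITS):
    """Return only the fields whose values fall inside their allowed range.

    Fields with no configured limit are passed through unchanged.
    """
    clean = {}
    for name, value in fields.items():
        low, high = limits.get(name, (float("-inf"), float("inf")))
        if low <= value <= high:
            clean[name] = value
    return clean


reading = {"temp": 12.3, "humi": 71.0, "rain": 9999.0}
print(sanitize(reading))  # the rogue rain value is dropped
```

Dropping only the bad field, rather than the whole point, keeps the good temperature and humidity readings taken at the same time.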

mugginsjm · May 14, 2024, 3:14pm

Would it make sense for me to put all the different fields (e.g. rain, temp, humi, electric, gas, water-pressure) in their own measurements, so that I could then delete rogue values by date?

scott · May 14, 2024, 3:35pm

It’s totally up to you how you want to structure these. If you’re querying the data using Flux, querying multiple measurements isn’t a problem. InfluxQL can query multiple measurements as well, but you won’t be able to join data across measurements like you can with Flux.

So it really comes down to what your query workload is going to be. If you’re going to query each type of data exclusively, it doesn’t really matter if they’re in separate measurements. But if you’re going to perform any type of calculations across each of these metrics, it’ll be easier if they are in the same measurement.

The best guidance I can give is to structure your data so that each measurement is “homogenous,” meaning all points in that measurement have the same tag set. Not the same tag values, just the same tag keys.
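As a quick way to check that rule before writing anything, here is a small Python sketch that reports measurements whose points do not all share the same tag keys. The point layout (dicts with `measurement`, `tags`, `fields`) and the `station` and `meter` tag keys are assumptions made up for the example.

```python
from collections import defaultdict


def tag_key_sets(points):
    """Map each measurement to the distinct sets of tag keys seen in its points."""
    seen = defaultdict(set)
    for point in points:
        # frozenset(dict) keeps the keys only, so tag values are ignored.
        seen[point["measurement"]].add(frozenset(point.get("tags", {})))
    return seen


def non_homogeneous(points):
    """Return the measurements whose points do not all share the same tag keys."""
    return sorted(m for m, keysets in tag_key_sets(points).items() if len(keysets) > 1)


points = [
    {"measurement": "weather", "tags": {"station": "garden"}, "fields": {"temp": 12.3}},
    {"measurement": "weather", "tags": {"station": "roof"}, "fields": {"temp": 11.8}},
    {"measurement": "electricity", "tags": {"meter": "main"}, "fields": {"kwh": 0.4}},
    {"measurement": "electricity", "tags": {}, "fields": {"kwh": 0.5}},
]
print(non_homogeneous(points))
```

Here `weather` passes because both points carry the `station` key even though the values differ, while `electricity` is flagged because one point has no tags at all.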

How prone do you think your data will be to rogue values?

mugginsjm · May 14, 2024, 3:46pm

I simply want to graph each of these fields over time. So I’m thinking
measurement: temperature
field (temperature: 12.3) etc

I’ve been trying this for a while and sometimes a sensor fails and outputs a very high or low value and I’ve had to delete all the data in that measurement in a time space.

mugginsjm · May 14, 2024, 3:49pm

Just thinking… if I add a timestamp field to each reading, can I zap that measurement using that timestamp?

Pooh · May 14, 2024, 4:04pm

InfluxDB is a Time Series DataBase, so every value put into it automatically
has a timestamp. It’s either the value you specify when inserting the data,
or the current timestamp when the data gets inserted.

I think you would probably only cause confusion by adding a “timestamp” field
which may then easily be different from the Timestamp value.
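A small Python sketch of how that looks in line protocol, assuming numeric fields only and no escaping of special characters. The `to_line` helper is hypothetical. The timestamp goes at the end of the line, and leaving it off lets the server assign the time of insertion.

```python
def to_line(measurement, fields, timestamp_ns=None):
    """Format one point as a line-protocol string (numeric fields only).

    Hypothetical helper: no tags, no escaping, no string fields.
    """
    field_part = ",".join(f"{key}={value}" for key, value in sorted(fields.items()))
    line = f"{measurement} {field_part}"
    if timestamp_ns is not None:
        # Explicit timestamp in nanoseconds since the Unix epoch.
        line += f" {timestamp_ns}"
    return line


# No timestamp given: the server uses the time the point is written.
print(to_line("temperature", {"temperature": 12.3}))
# Timestamp given: the point is stored at exactly this time.
print(to_line("temperature", {"temperature": 12.3}, 1715700000000000000))
```

Either way the point already has a time, so a separate “timestamp” field would just be a second, possibly different, copy of it.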

Antony.

mugginsjm · May 14, 2024, 4:31pm

That’s true. I have been rebuilding my RaspPI with docker and before I continue I really want to get the schema correct. I’ve had to dump my last 18 months of weather data because I couldn’t export/import it.

NickN · May 18, 2024, 8:07pm

Another route that you may find easier (I did) is to route your data through node-RED first. If you’re not familiar with node-RED, it’s well suited to processing flows of data, and it’s largely a visual/no-code tool, so it’s quite easy to work with once you understand its approach.

When I was first using InfluxDB I made a lot of mistakes (and probably still do), but I found two helpful use cases for node-RED.

1. Eliminating bad/erroneous values. It’s pretty easy to check the data in a node-RED flow and delete, modify or just flag values that look odd. A node-RED flow for this might load/read the data from whatever source, check the value against a range, discard the value if it is outside the given range, structure the data for a database write, and pass the data to Influx. I currently have a flow that reads weather data from an API, throws away bad values, structures the data and writes it to Influx.
2. Restructuring previously captured data. You can export a text-based format, like CSV, from Influx and easily create a flow in node-RED that will read values from the CSV and restructure them before writing to Influx. Let me give you an example: in the beginning I didn’t understand cardinality or the use of tags. I put all of my data in a single measurement and was storing some data as a field that should have been a tag. When I realized my mistake, I dumped out the data I had collected and used node-RED to reformat it to the better structure I had come up with (multiple measurements and a standard set of tags). I personally found this to be a lot easier than trying to do things in Telegraf, but keep in mind that I’m really not a database guy.
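For anyone who would rather see the restructuring step as code than as a flow, here is a rough Python sketch of the same idea (not my actual node-RED flow). It reads a simplified CSV and rewrites each row as line protocol, promoting a column that used to be a field to a tag. The column names (`time_ns`, `location`, `temp`) and the `restructure` helper are made up, and a real Influx export has a different, more verbose layout.

```python
import csv
import io

# Simplified stand-in for an exported file; a real export has more columns.
EXPORT = """time_ns,location,temp
1715700000000000000,garden,12.3
1715700060000000000,roof,11.8
"""


def restructure(csv_text, measurement="temperature", tag_columns=("location",)):
    """Turn CSV rows into line-protocol strings, moving tag_columns into tags."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        tags = ",".join(f"{col}={row[col]}" for col in tag_columns)
        # Everything that is neither a tag nor the time column is a numeric field.
        fields = ",".join(
            f"{col}={float(value)}"
            for col, value in row.items()
            if col not in tag_columns and col != "time_ns"
        )
        lines.append(f"{measurement},{tags} {fields} {row['time_ns']}")
    return lines


for line in restructure(EXPORT):
    print(line)
```

The original timestamps are carried over, so the rewritten points land at the same times as the old ones.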

I’m primarily using Influx 1.8, not 2.x, but I think the above will still work. There are plenty of node-RED examples/installs for Raspberry Pi, including Docker containers, so it should be easy to try.

mugginsjm · May 19, 2024, 7:16am

Hi Nick
Thank you for the comprehensive response. As it happens I do use Nodered and have just recently migrated to Docker. Prior to Docker I had 18 months’ worth of data, but despite my best efforts I could not export/import the old data. Yes, I could trap rogue data in Nodered, but I’m constantly adding stuff and injecting test data. I miss the older Influx where you could just use simple SQL and delete data by field and timestamp. I’m not sure I understand tags. I have e.g. measurement as weather and then fields for temp, humi, rain etc… no tags. I have measurement electricity and then fields for hour-leccy and day-leccy. Should I be using tags?
