Wide rows vs short rows



Hi there, I have a question about choosing between two new schemas that we want to put into a production database. We are using version 1.2 of InfluxDB with the tsm1 engine.

Should we store fewer, wider rows or more, shorter rows? The same data would be in both, and the only difference would be that the shorter rows would make queries easier to write.


I have been struggling with this, too. For a while I used really wide rows, but my measurements don’t all come in at the same time so each one of my rows would be one real value and 30 null values. This seemed bad to me (though I don’t know for sure that it is - do the nulls take up space?).

I switched it to a design where I have one ‘value’ field and you use tags to differentiate what value is what. That’s what you are calling ‘short’ rows (I would use the word ‘narrow’). That does indeed make querying easier for simple queries at least. But I could not figure out how to combine values the way I want any more.
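
To make the two layouts concrete, here is a rough Python sketch of the same readings shaped both ways. The source names (main_meter, pv, fuel_cell, wind), the dict layout, and the sample numbers are made up for illustration; this is not InfluxDB client code, just a picture of what ends up in each row.

```python
from collections import defaultdict

# Hypothetical readings: (timestamp in seconds, source name, watts).
readings = [
    (0, "main_meter", 5000.0),
    (0, "pv", 1200.0),
    (10, "wind", 300.0),
]

# Assumed set of fields for the wide layout.
FIELDS = ["main_meter", "pv", "fuel_cell", "wind"]


def to_narrow(readings):
    """Narrow layout: one row per reading, a 'source' tag and one 'value' field."""
    return [
        {"time": t, "tags": {"source": source}, "fields": {"value": value}}
        for t, source, value in readings
    ]


def to_wide(readings):
    """Wide layout: one row per timestamp, one field per source, None where nothing arrived."""
    by_time = defaultdict(dict)
    for t, source, value in readings:
        by_time[t][source] = value
    return [
        {"time": t, "fields": {name: values.get(name) for name in FIELDS}}
        for t, values in sorted(by_time.items())
    ]


narrow_rows = to_narrow(readings)
wide_rows = to_wide(readings)
```

With these three readings the narrow layout has three rows of one value each, while the wide layout has two rows that are mostly None, which is the pattern I was seeing.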

Let me give you a concrete example. I am fetching power data from a main building power meter. Within the building there are several generators (PV arrays, fuel cell, wind). To figure out what the net usage of the building is I have to subtract all the generators from the main meter. With wide tables that’s relatively easy, assuming I’m OK with grouping by a time range.
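
Here is a minimal Python sketch of that calculation done client-side over wide rows, to show what I mean by grouping by a time range. The field names, the 60-second bucket, and the sample values are assumptions for illustration, and treating a generator with no reading in a bucket as zero is my own simplification rather than anything the database does.

```python
from collections import defaultdict

# Assumed generator field names; "main_meter" is the building's main meter.
GENERATORS = ["pv", "fuel_cell", "wind"]

# Hypothetical wide rows in watts; None means no reading arrived at that time.
rows = [
    {"time": 0, "main_meter": 5000.0, "pv": None, "fuel_cell": None, "wind": None},
    {"time": 20, "main_meter": None, "pv": 1200.0, "fuel_cell": None, "wind": None},
    {"time": 40, "main_meter": None, "pv": None, "fuel_cell": 500.0, "wind": 300.0},
    {"time": 60, "main_meter": 4000.0, "pv": 1000.0, "fuel_cell": None, "wind": None},
]


def net_usage(rows, bucket_seconds):
    """Average each field per time bucket, then subtract the generators from the main meter."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        bucket = row["time"] - row["time"] % bucket_seconds
        for field, value in row.items():
            if field == "time" or value is None:
                continue  # skip the timestamp and the empty cells
            sums[bucket][field] += value
            counts[bucket][field] += 1

    result = {}
    for bucket in sorted(sums):
        means = {f: sums[bucket][f] / counts[bucket][f] for f in sums[bucket]}
        if "main_meter" not in means:
            continue  # no main meter reading in this bucket, so no net value
        # Assumption: a generator with no reading in the bucket counts as 0 W.
        generated = sum(means.get(g, 0.0) for g in GENERATORS)
        result[bucket] = means["main_meter"] - generated
    return result


net = net_usage(rows, bucket_seconds=60)
```

The bucketing is what lines up values that arrived at different timestamps; without it, no single row has both the main meter and the generators filled in.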

So I think in cases where you might reasonably want to do aggregations or math across multiple values (within a single measurement) you really are better off going with the wide schema.


I will post the answer here if anyone replies on Stack Overflow.