1 database vs many


#1

Hi,

I’m working on a deployment where, for initial sizing purposes, we are estimating 10,000 devices, each reporting time series data once per second. Each device has a unique ID, and we are retaining all 1-second data for 1 month, then 1-per-minute rollups for 6 months. I’m generally planning on 1 database, with each series carrying a tag that holds the device’s unique identifier. With this amount of data, does 1 database make sense (with the tag on the unique ID), or should I plan for a separate database per device?

A secondary question: would a single Telegraf instance typically be used to batch all these requests, or should I look into running multiple instances as well?

Thank you for any advice.


#2

Hi! I would stick with one instance of the database. A single node is open source, but the clustered version is InfluxDB Enterprise (a paid product). Beyond that, I would look at the hardware recommendations, as hardware is more likely to be what throttles the db performance. I would also keep cardinality in mind. As the number of unique tag values grows, you will want to enable TSI (the Time Series Index).
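
To put rough numbers on the sizing question, here is a minimal back-of-the-envelope sketch. It assumes a 30-day month, 180 days for the six-month rollup window, and a single measurement with only the device ID as a tag; none of those details come from the original post, so adjust them to your setup.

```python
# Rough sizing arithmetic for the workload described in post #1.
# Assumptions (not from the thread): 30-day month, 180-day rollup window,
# one measurement, and the device ID as the only tag.

DEVICES = 10_000
SECONDS_PER_DAY = 86_400

RAW_INTERVAL_S = 1          # one point per device per second
RAW_RETENTION_DAYS = 30     # "1 month" of raw data (assumed 30 days)
ROLLUP_INTERVAL_S = 60      # one rollup point per device per minute
ROLLUP_RETENTION_DAYS = 180 # "6 months" of rollups (assumed 180 days)
MEASUREMENTS = 1            # assumed; more measurements multiply the series count


def points_retained(devices: int, interval_s: int, retention_days: int) -> int:
    """Number of points held at steady state for one retention window."""
    points_per_device = retention_days * SECONDS_PER_DAY // interval_s
    return devices * points_per_device


writes_per_second = DEVICES // RAW_INTERVAL_S
raw_points = points_retained(DEVICES, RAW_INTERVAL_S, RAW_RETENTION_DAYS)
rollup_points = points_retained(DEVICES, ROLLUP_INTERVAL_S, ROLLUP_RETENTION_DAYS)

# With the device ID as the only tag, there is one tag set per device
# in each measurement.
tag_sets = DEVICES * MEASUREMENTS

print(f"writes per second:      {writes_per_second:,}")
print(f"raw points retained:    {raw_points:,}")
print(f"rollup points retained: {rollup_points:,}")
print(f"distinct tag sets:      {tag_sets:,}")
```

Under these assumptions the device-ID tag contributes about 10,000 distinct tag sets per measurement, which is the number to watch when thinking about cardinality.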

You can almost always run a single instance of Telegraf per system; running multiple instances really complicates upgrades.


#3

Thanks, very helpful. Another question on Telegraf: is influxdb_listener with line protocol better/faster for high volume than JSON via http_listener_v2? I’m having good results with http_listener_v2, but I want to be sure I’m not missing a performance impact or some other reason why I should be using influxdb_listener to write to InfluxDB via Telegraf.


#4

We don’t have much data to confirm which is faster or more efficient. I recommend using the influxdb_listener to let Telegraf serve as a proxy/router for the InfluxDB HTTP write endpoint. Any general metrics you want to send via HTTP should go through http_listener_v2.
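
For anyone comparing the two inputs, it may help to see the same reading in both encodings. The sketch below is only an illustration: the measurement name `device_metrics`, the tag `device_id`, the field `temperature`, and the JSON layout are all made up for the example, and the simple formatter assumes names and values contain no spaces, commas, or equals signs (which line protocol would require escaping). It says nothing about which listener is faster.

```python
import json

# Hypothetical reading; every name and value here is an assumption
# made for illustration, not something taken from the thread.
measurement = "device_metrics"
tags = {"device_id": "dev-0001"}
fields = {"temperature": 21.5}
timestamp_ns = 1_700_000_000 * 1_000_000_000  # nanoseconds since the epoch


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one point as: measurement,tag=value field=value timestamp.

    Assumes float field values and no characters that need escaping.
    """
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    return f"{measurement},{tag_part} {field_part} {ts_ns}"


# What you might send to influxdb_listener.
line = to_line_protocol(measurement, tags, fields, timestamp_ns)

# One possible JSON body for http_listener_v2. The actual shape depends on
# how the JSON parser is configured in Telegraf, so treat this layout as
# an assumption.
json_body = json.dumps({**tags, **fields, "timestamp": timestamp_ns})

print(line)
print(json_body)
```

With line protocol, the measurement, tags, fields, and timestamp are all explicit in the payload. With JSON, Telegraf has to be told which keys are tags and which key holds the timestamp.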