Best way to deal with "large" data sets


I have a streaming app that generates statsd-like statistics. Every couple of minutes it will probably generate a data set of around 50,000 rows. It's essentially a bunch of tags and a couple of values with a timestamp, so it's easy to convert to Influx line protocol.

My question is: what's the best way to get this into Influx? I figure my options are:

  1. Send one unbatched web request per row across the network (probably too slow)
  2. Send batches of 5000 across the network
  3. Send UDP messages to telegraf (1 per row) and let telegraf deal with batching (can it keep up?)
  4. Copy everything to a file and use influx's -import command

Is there a best practice for this?
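For context, option 2 could be sketched roughly like this: render each row as line protocol and split into fixed-size batches. This is a minimal sketch, not from the original post; the measurement name, tag keys, and endpoint are illustrative assumptions.

```python
from itertools import islice

def to_line(measurement, tags, fields, ts_ns):
    """Render one row as InfluxDB line protocol:
    measurement,tag1=v1 field1=v1 timestamp"""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {ts_ns}"

def batches(lines, size=5000):
    """Yield successive batches of at most `size` lines."""
    it = iter(lines)
    while chunk := list(islice(it, size)):
        yield chunk

# Each batch would then be joined with "\n" and POSTed to InfluxDB's
# /write endpoint, e.g. http://localhost:8086/write?db=mydb
# (hypothetical host and database name).
```

Note: line protocol requires escaping of spaces, commas, and quotes in tag/field values, which this sketch omits for brevity.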


@David_Cohen options 2 and 3 are your best bets. We advise batches of 5k to 10k field values per batch. Also, Telegraf can definitely keep up. I would suggest using the socket_listener input if you go that route.
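For reference, a socket_listener setup for option 3 could look something like the fragment below. The port, URL, and database name are illustrative assumptions; adjust them for your environment.

```toml
# Hypothetical telegraf.conf fragment

[agent]
  metric_batch_size = 5000   # Telegraf batches writes on your behalf
  flush_interval = "10s"

[[inputs.socket_listener]]
  service_address = "udp://:8094"
  data_format = "influx"     # accept line protocol directly

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "stats"         # hypothetical database name
```

With this in place, the app can fire one UDP datagram per row (or several rows per datagram) at port 8094 and let Telegraf handle batching and retries toward InfluxDB.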