Optimizing query for all data


TL;DR: the time spent retrieving all data associated with a series seems strangely long, and I wonder whether there is something fishy in the way data is handled internally.

Context:

  • I am creating a database with one measurement and one series, containing half a million points with one integer value each, using minute precision.
  • I am querying all the data with a single HTTP request.
  • The time spent is about 2 seconds, which seems like a lot given the small amount of data.

In order to get more information, I used the profiling tool (pprof) as described on the InfluxDB contributing page. I discovered that:

  • around 20% of the elapsed time was due to HTTP transfer
  • nearly all the remaining time was spent in the stream function of one (or more) iterators
  • it seemed that one iterator was a MergeSortIterator, and that a significant amount of time was spent building this MergeSortIterator using the Go heap interface
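To make sure I am describing the same technique, here is my understanding of what a heap-based merge iterator does, as a generic sketch (this is my own illustration of k-way merging with container/heap, not InfluxDB's actual MergeSortIterator code):

```go
package main

import (
	"container/heap"
	"fmt"
)

// point is a time-stamped value; each input is assumed already sorted by time.
type point struct{ t, v int64 }

// cursor reads sequentially from one sorted input.
type cursor struct {
	pts []point
	idx int
}

// mergeHeap orders cursors by the timestamp of their current point,
// so the heap root always holds the globally smallest next point.
type mergeHeap []*cursor

func (h mergeHeap) Len() int           { return len(h) }
func (h mergeHeap) Less(i, j int) bool { return h[i].pts[h[i].idx].t < h[j].pts[h[j].idx].t }
func (h mergeHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *mergeHeap) Push(x interface{}) {
	*h = append(*h, x.(*cursor))
}
func (h *mergeHeap) Pop() interface{} {
	old := *h
	c := old[len(old)-1]
	*h = old[:len(old)-1]
	return c
}

// merge drains all inputs in global time order.
func merge(inputs ...[]point) []point {
	h := &mergeHeap{}
	for _, in := range inputs {
		if len(in) > 0 {
			heap.Push(h, &cursor{pts: in})
		}
	}
	var out []point
	for h.Len() > 0 {
		c := (*h)[0]
		out = append(out, c.pts[c.idx])
		c.idx++
		if c.idx == len(c.pts) {
			heap.Pop(h) // cursor exhausted
		} else {
			heap.Fix(h, 0) // re-sift root after advancing its cursor
		}
	}
	return out
}

func main() {
	a := []point{{0, 1}, {2, 3}}
	b := []point{{1, 2}}
	fmt.Println(merge(a, b)) // points interleaved in time order
}
```

With a single input this degenerates into a plain pass-through, which is why the heap-building cost surprised me in the single-series case.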

There is of course the (highly plausible) possibility that I am misreading the code. But if not, here is why I am puzzled: series are supposed to be stored already sorted with respect to time, right? So why is there a sort operation when I am simply retrieving all the data of the series?

I hope this is clear enough; if not, I can provide exact scripts to reproduce my observations.

I’d be pleased to get some answers, or advice on how to investigate the matter further.