Slower query time compared to SQL Server

Hello everyone,

My team has the following issue. We currently have a relational database with around 40 million entries - they are all entries from tags (TagId, TagValue, TimeStamp). We decided to move to a TS database to improve performance as the numbers keep going up.

After importing the 40m values to Influx, we have tried the following query:

SELECT * FROM dump WHERE time > 'some_date' AND TagName = 'tag_name'

This query runs 3 times slower in InfluxDB than in SQL Server. I tried a lot of different queries, but all of them come slower in InfluxDB and I can’t figure out why. The database has a default retention policy.
The measurement has two fields - Int and Float and one tag - TagName.

The SQL Server and InfluxDB are running on the same hardware in the same network.
Any suggestions are welcome. Thank you.

EDIT: The data I am querying is for the last month.

Hi Gele,

We’d be quite interested to understand the outcome of your research, because it seems that we are encountering just the same issue in our environment.

Some assumptions here:

  • From your comment I assume that you are using the free tier of Influx DB (as you said - you are using the same machine).
  • Another thing I assume is that by saying “one tag” you actually mean “tag key”, right?
  • Storage type shall be irrelevant. We all know that InfluxData is optimized for SSDs, but I do not believe it is fair for anyone to point that out, provided that your(and our) goal is to measure the performance Influx/SQL on equal hardware.

Clarifications necessary:

  1. First of all - it is interesting how do you read the InfluxData. With Elasticsearch we have had the case long ago where the response from the engine was fast, but most of the time was spent in deserialization of the JSON to objects. What we did was to evaluate a couple of engines and picked the best, which made the difference.

  2. Second - how many records/rows does that query return for you? Obviously SQL returns the same number, but it is good to know here

  3. Third - Where do you measure the time (Code, CLI, REST service directly) and what is the time measured (if anyone has had similar amount of data retrieval, it’d be easy for him to compare)

  4. Fourth - How many tag values do you have?

  5. Fifth - Have you tried running performance monitor on the machine while the query is running? Can you identify a bottleneck - RAM, CPU, Disk? Please note the amount of RAM consumed by the SQL server. We all know it is quite hungry for RAM.

@jackzampolin, Man do you have an idea?

Hello Ivo_Andreev,
Thank you for your response.

We are using the free tier and yes, by tag I meant exactly the tag key, thanks for the heads up.

Clarifications:

  1. I have done a couple more tests. Using C# to query the Influx database gives me around the same time as using the CLI. I have too thought that the JSON serialization could be an issue, but as of my last tests I do not think that is it. Using curl to query the database results in a lot slower query time.

  2. They both return the same amount, around 3.5 million.

  3. I mentioned a little of this in (1), I will clarify a little more. Measuring using C# library called InfluxData.NET and using the CLI gives me around the same query time, with C# being a little slower, but having in mind that it’s using the REST api and needs serializing and deserializing and network traffic I think it seems OK.

  4. The number of the tag values are around 250.

  5. The machine on which the DBMS’ are running has 16GB of RAM and a Quad-Core Xeon CPU. The SQL Server uses a lot of RAM, yes, but there’s plenty of it left. As I am looking at its usage, it does not seem to struggle, so I don’t think the hardware specifications are an issue here.

Thank you for your interest.
Looking forward to more suggestions.

Hello, it’s me again.

I’ve came up with an explanation and as I think, also a solution:
I was having 3 times slower query in influx than in SQL Server.

I managed to reduce it to 2 times when I used chunking using the HTTP api (chunk size of 10000 is working good so far).

Still, it’s 2 times slower!

What I didn’t mention earlier is I was using a HDD. I read Influx requires an SSD, but I didn’t have a SSD at hand at that time. I thought the performance ratio would be around the same using a HDD.

Here is where I was wrong.
The SQL Server has a very little performance impact when moving it to a SSD, but the Influx times have reduced dramatically. The Influx queries now run around the same time as the SQL Server (1:1).

Seems the SQL Server relies more on the RAM than on the hard drive.

TL;DR: Don’t do benchmarks with influx on a spinning drive, use a SSD - always! Query the HTTP api in chunks.

1 Like

This is a very hard comparison to make. The query cost models for SQLServer and InfluxDB are very different. The storage models are different, the indexing models are different, the compression models are different…

At the base, though, InfluxDB assumes SSDs and does not work as hard to eliminate random disk I/O as SQLServer typically does. (SQLServer includes a few different internal storage engines - so even this is hard to generalize).

Your performance on InfluxDB being different on HDD vs. SSD is very much expected.

Thanks for sharing your experiences and your performance tips.