Issue in loading the data

Amin_Mohebi · July 31, 2018, 8:28am

We are using Spark to write CSV files into InfluxDB, with 5,000 to 10,000 points per batch. The total dataset is about 500 GB and contains 1 billion records. Our InfluxDB is the community edition, installed on an Azure VM with 56 GB of memory. I can confirm that writing all records takes 1-2 hours. However, we have run into issues when querying the data.
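For context, here is a minimal sketch of turning rows into InfluxDB line protocol and grouping them into batches of the size mentioned above. It is plain Python with no client library. The measurement name `meter_readings`, the field `usage` and the helper names are hypothetical. All fields are assumed to be floats, and timestamps are assumed to already be in nanoseconds (the default precision of the InfluxDB 1.x `/write` endpoint).

```python
from typing import Dict, Iterable, Iterator, List


def escape_key_or_tag(value: str) -> str:
    # Line protocol: tag keys, tag values and field keys must escape
    # commas, equals signs and spaces with a backslash.
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, float], ts_ns: int) -> str:
    # Measurement names only need commas and spaces escaped.
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    # Tags sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(f",{escape_key_or_tag(k)}={escape_key_or_tag(v)}"
                       for k, v in sorted(tags.items()))
    # Assumption: every field is numeric and written as a float.
    field_part = ",".join(f"{escape_key_or_tag(k)}={float(v)!r}"
                          for k, v in fields.items())
    return f"{name}{tag_part} {field_part} {ts_ns}"


def batches(lines: Iterable[str], size: int = 5000) -> Iterator[List[str]]:
    # Group lines into fixed-size batches; the last batch may be smaller.
    batch: List[str] = []
    for line in lines:
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

For example, `to_line("meter_readings", {"meter_id": "2001007868"}, {"usage": 1.5}, 1533024000000000000)` returns `meter_readings,meter_id=2001007868 usage=1.5 1533024000000000000`. Each batch would then be joined with newlines and sent as one write request; how the actual Spark job does this is not shown in the post.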

First, we used two tags, Tarrif_Code and Tarrif_description. The Spark job took two hours and finished with a successful status. I monitored the job throughout and can confirm that all the data was written to the database, but when I ran select count(*), there were only 2 million rows.
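One behavior worth ruling out here (not confirmed as the cause in this thread): InfluxDB identifies a point by its measurement, tag set and timestamp, and a later write with the same three values merges into the existing point instead of adding a new row, with the newer field values winning. If many CSV rows share a Tarrif_Code and a timestamp, they would collapse into one point each. The sketch below models that rule in plain Python; it does not talk to InfluxDB, and the measurement `readings`, the field `usage` and the sample rows are made up.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical point shape: (measurement, tags, fields, timestamp).
Point = Tuple[str, Dict[str, str], Dict[str, float], int]
PointKey = Tuple[str, FrozenSet[Tuple[str, str]], int]


def store(points: Iterable[Point]) -> Dict[PointKey, Dict[str, float]]:
    """Model the duplicate-point rule: one point per (measurement, tag set, timestamp)."""
    db: Dict[PointKey, Dict[str, float]] = {}
    for measurement, tags, fields, ts in points:
        key = (measurement, frozenset(tags.items()), ts)
        # A repeated key merges the field sets; the newer value wins on ties.
        db.setdefault(key, {}).update(fields)
    return db


# Made-up sample: three meters reporting at the same timestamp.
rows = [
    {"meter_id": "2001007868", "Tarrif_Code": "T1", "usage": 1.5, "ts": 1000},
    {"meter_id": "2001007869", "Tarrif_Code": "T1", "usage": 2.5, "ts": 1000},
    {"meter_id": "2001007870", "Tarrif_Code": "T2", "usage": 0.7, "ts": 1000},
]

by_tariff = store(("readings", {"Tarrif_Code": r["Tarrif_Code"]},
                   {"usage": r["usage"]}, r["ts"]) for r in rows)
by_meter = store(("readings", {"meter_id": r["meter_id"]},
                  {"usage": r["usage"]}, r["ts"]) for r in rows)

print(len(by_tariff), len(by_meter))  # 2 3
```

With Tarrif_Code as the only tag, the two T1 readings share a key and the second value (2.5) silently replaces the first, so three rows become two points; keying on meter_id keeps all three. Adding Tarrif_description as a second tag would not change this if it maps one-to-one to the code.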

Second, I changed the tag key to a single, different column: the national meter identifier, for example “2001007868”. I ran the Spark job again and then tried the same query, as well as select * from measurement limit 5. The query failed with this error: ERR: %!s()

Third, I wrote a smaller volume of data, about 40 million rows, in the same way as in step two. This time I could see the expected number of rows and my queries worked properly.

My questions are:

1- Why did the number of rows drop so drastically, to 2 million instead of 1 billion, when I chose Tarrif_Code as a tag?
2- Do you think this is a memory issue, bad database design, or a limitation of the community edition, given that we cannot run any query against the 500 GB of data?
Please note that we have only 10k series.
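Before reloading, it may help to measure two numbers from the CSV itself for each candidate tag set: the series cardinality (distinct combinations of tag values) and the number of distinct (tag values, timestamp) keys, which is how many points would remain if all rows go into one measurement. A sketch, assuming a hypothetical timestamp column `ts`; the sample data is made up.

```python
import csv
import io
from typing import List, Sequence

# Made-up sample in the shape assumed for the real CSV.
SAMPLE = """meter_id,Tarrif_Code,Tarrif_description,ts,usage
2001007868,T1,Peak,1000,1.5
2001007869,T1,Peak,1000,2.5
2001007870,T2,Off-peak,1000,0.7
2001007868,T1,Peak,2000,1.1
"""


def series_cardinality(rows: List[dict], tag_keys: Sequence[str]) -> int:
    # A series is one distinct combination of tag values in a measurement.
    return len({tuple(row[k] for k in tag_keys) for row in rows})


def surviving_points(rows: List[dict], tag_keys: Sequence[str], time_col: str) -> int:
    # Rows sharing tag values and timestamp would overwrite each other,
    # so only distinct (tag values, timestamp) keys remain as points.
    return len({tuple(row[k] for k in tag_keys) + (row[time_col],) for row in rows})


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for tags in (["Tarrif_Code", "Tarrif_description"], ["meter_id"]):
    # Prints: tag keys, row count, series count, surviving point count.
    print(tags, len(rows), series_cardinality(rows, tags),
          surviving_points(rows, tags, "ts"))
# ['Tarrif_Code', 'Tarrif_description'] 4 2 3
# ['meter_id'] 4 3 4
```

If the surviving count is far below the row count for a tag choice, rows are overwriting each other. On the full 500 GB file the set of keys could grow very large, so running this on a sample, or expressing the same count in Spark with `df.select(...).distinct().count()`, is more realistic.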
