Hello, I have a problem I would like to understand: sometimes an InfluxDB read query takes a long time.
I wrote this small script to illustrate the problem, but I see the same behavior in production with my real data:
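```python
import time

import matplotlib.pyplot as plt
import pandas as pd
from influxdb import DataFrameClient

db_name = 'testBase'
inf_client = DataFrameClient(database=db_name)
inf_client.drop_database(db_name)
inf_client.create_database(db_name)

# 100 hourly points starting 2014-11-16, written to a measurement named 'testBase'
df = pd.DataFrame(data=list(range(100)),
                  index=pd.date_range(start='2014-11-16', periods=100, freq='H'),
                  columns=['0'])
inf_client.write_points(df, db_name)

# Run the same 10-hour range query 50 times and record how long each one takes
duration_data = {}
for i in range(0, 50):
    start_time = time.time()
    data = inf_client.query("select * from testBase where time>='2014-11-16 00:00:00' and time<='2014-11-16 10:00:00'")
    duration = time.time() - start_time
    duration_data[i] = duration

# Plot duration (seconds) against iteration number
data_df = pd.DataFrame(list(duration_data.items()), columns=["index", "duration"])
del data_df['index']
data_df.plot()
plt.show()
```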
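Besides the plot, it can help to summarize the durations numerically, for example to see whether the first (possibly cold-cache) query or some later one is the outlier. A minimal sketch, assuming the durations are a plain list of seconds such as `list(duration_data.values())` from the script above; `summarize_durations` is a hypothetical helper, not part of the `influxdb` client:

```python
import statistics


def summarize_durations(durations):
    """Summarize per-iteration query durations given in seconds (hypothetical helper)."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    # Index of the slowest call, 0-based like `i` in the loop above
    slowest = max(range(len(durations)), key=durations.__getitem__)
    return {
        "slowest_iteration": slowest,
        "slowest_s": durations[slowest],
        "median_s": statistics.median(durations),
        # If the page-cache explanation holds, the first query is the likeliest to be slow
        "first_s": durations[0],
        "median_rest_s": statistics.median(durations[1:]),
    }


if __name__ == "__main__":
    # Made-up example numbers, not measurements from this thread
    print(summarize_durations([0.012, 0.010, 0.011, 0.010, 0.048, 0.011]))
```

Comparing `first_s` with `median_rest_s` separates a one-off warm-up cost from a slow query that shows up later in the run.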
Hi @spyfox, welcome to the community! It is hard to give a specific answer without more detail, but in general, InfluxDB needs to load memory-mapped files. It takes time for the OS to pull these into memory, but once they are resident, accessing the same data again is faster. There is also no guarantee that the OS keeps the data resident, because of other demands on memory. It is odd that iteration ~4 is the slow one; how consistent is your graph if you run it hundreds of times? I assume the y-axis units are seconds, so ~50 ms for the slowest query? While this example query is interesting, I think we would be better served trying to optimize your production query, if you're willing to share it.
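The page-cache effect described above is not specific to InfluxDB and can be observed with any memory-mapped file. A minimal, generic sketch (it does not touch InfluxDB): it writes a scratch file, asks the kernel to drop that file's cached pages with `os.posix_fadvise(..., os.POSIX_FADV_DONTNEED)` where that call exists, and then times two passes over a memory map of the file. The eviction is only advisory, `posix_fadvise` is not available on every platform (Windows, for one), and files on a RAM-backed filesystem such as tmpfs cannot be evicted this way, so the first pass is not guaranteed to be slower. The helper names are hypothetical.

```python
import mmap
import os
import tempfile
import time


def touch_pages(path):
    """Map `path` read-only and read one byte per page; return elapsed seconds."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start = time.perf_counter()
        checksum = 0
        for offset in range(0, len(mm), mmap.PAGESIZE):
            checksum += mm[offset]  # first touch of a page may have to fault it in
        return time.perf_counter() - start


def try_evict(path):
    """Best effort: advise the kernel to drop cached pages of `path`."""
    if not hasattr(os, "posix_fadvise"):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True


def cold_vs_warm(size_bytes=16 * 1024 * 1024):
    """Write a scratch file, then time a (possibly) cold pass and a warm pass."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size_bytes))
            f.flush()
            os.fsync(f.fileno())  # dirty pages cannot be dropped, so write them out first
        try_evict(path)
        cold = touch_pages(path)
        warm = touch_pages(path)  # same pages, now likely resident in the page cache
        return cold, warm
    finally:
        os.remove(path)


if __name__ == "__main__":
    cold, warm = cold_vs_warm()
    print(f"first pass: {cold * 1000:.2f} ms, second pass: {warm * 1000:.2f} ms")
```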
Sorry, I meant: if you run ~25 iterations quickly, then wait, say, 60 minutes and run another 25 iterations of the query, how consistently is the 4th query in particular the slowest one? I did not mean 200 iterations in quick succession.
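The procedure suggested here (a quick batch of ~25 queries, a long pause, then another batch) can be scripted so that the position of the slowest query is counted across batches. A minimal sketch: `run_batches` and `slowest_positions` are hypothetical helpers, and `query_fn` is any zero-argument callable, for instance a lambda wrapping `inf_client.query(...)` from the original script. The pause only gives the OS a chance to evict cached data; it does not guarantee a cold cache.

```python
import time
from collections import Counter


def run_batches(query_fn, n_batches=2, n_iters=25, pause_s=3600.0):
    """Time query_fn n_iters times per batch, sleeping pause_s between batches."""
    batches = []
    for b in range(n_batches):
        if b > 0:
            time.sleep(pause_s)  # idle gap so other activity may push cached data out
        durations = []
        for _ in range(n_iters):
            start = time.perf_counter()  # monotonic, high-resolution interval timer
            query_fn()
            durations.append(time.perf_counter() - start)
        batches.append(durations)
    return batches


def slowest_positions(batches):
    """Count, across batches, which 0-based iteration index was the slowest."""
    return Counter(max(range(len(d)), key=d.__getitem__) for d in batches)
```

For example, with `q` holding the query string from the original post, `slowest_positions(run_batches(lambda: inf_client.query(q), n_batches=3))` returns a `Counter`; if one index (such as `3`, the 4th query) dominates it across batches, the slow position is reproducible rather than random.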