CQ query consumes 27 gb of ram and restarts over and over

Hi, we experimented with the following CQ:

CREATE CONTINUOUS QUERY "hist_data_365d" ON "benchmark_hist_data" RESAMPLE EVERY 15m FOR 365d BEGIN
  SELECT mean("Grd_Prod_Pwr_Avg") AS "mean_Grd_Prod_Pwr_Avg",
         mean("Prod_TotAccumulated_TotActPwr") AS "mean_Prod_TotAccumulated_TotActPwr",
         mean("Nac_Temp_Avg") AS "mean_Nac_Temp_Avg",
         mean("Amb_WindSpeed_Avg") AS "mean_Amb_WindSpeed_Avg",
         mean("Amb_WindSpeed_Max") AS "mean_Amb_WindSpeed_Max",
         mean("Amb_WindSpeed_Min") AS "mean_Amb_WindSpeed_Min",
         mean("Blds_BladeA_BLPitchAngle_Avg") AS "mean_Blds_BladeA_BLPitchAngle_Avg",
         mean("Blds_BladeA_BLPitchRef_Avg") AS "mean_Blds_BladeA_BLPitchRef_Avg",
         mean("Blds_BladeA_CntlVolt_Avg") AS "mean_Blds_BladeA_CntlVolt_Avg",
         mean("Blds_BladeB_BLPitchAngle_Avg") AS "mean_Blds_BladeB_BLPitchAngle_Avg",
         mean("Blds_BladeB_BLPitchRef_Avg") AS "mean_Blds_BladeB_BLPitchRef_Avg",
         mean("Blds_BladeB_CntlVolt_Avg") AS "mean_Blds_BladeB_CntlVolt_Avg",
         mean("Blds_BladeC_BLPitchAngle_Avg") AS "mean_Blds_BladeC_BLPitchAngle_Avg",
         mean("Blds_BladeC_BLPitchRef_Avg") AS "mean_Blds_BladeC_BLPitchRef_Avg",
         mean("Blds_BladeC_CntlVolt_Avg") AS "mean_Blds_BladeC_CntlVolt_Avg",
         mean("Cont_Hub_Temp_Avg") AS "mean_Cont_Hub_Temp_Avg",
         mean("Cont_Top_Temp_Avg") AS "mean_Cont_Top_Temp_Avg",
         mean("Cont_VCP_Temp_Avg") AS "mean_Cont_VCP_Temp_Avg",
         mean("Count_ConsumedPower") AS "mean_Count_ConsumedPower",
         mean("Count_ConsumedReactivePow") AS "mean_Count_ConsumedReactivePow"   
  INTO "downsampled-365d"   
  FROM "signals"   
  GROUP BY time(996m),"AssetId" 

We want to downsample existing data into smaller views…
Are we on the wrong path here?

@Fredrik_Skeel_Lokke Yup! You are pulling the entire year of data every 15m and doing aggregations over 20 fields. I would advise running this query less frequently and only running it over the GROUP BY time() interval. Can you explain what you think this is going to do?

We are trying to run the query on data ingested a year back and not just newly ingested data

@Fredrik_Skeel_Lokke For that you should use a bare SELECT ... INTO ... query without the CQ. The CQ is meant to be a way to continuously run INTO queries so that you are computing those new values as they come in. I would also suggest splitting up the query into multiple queries, perhaps running over a month of data at a time.

Thank you, I understand that. But since the language is declarative I was hoping that the query/runtime might be smarter e.g by doing the work in smaller batches…

A problem with only looking a month back for us is updates to older than a month timestamped data…

@Fredrik_Skeel_Lokke So what I’m saying is to do the initial processing of historical data with SELECT ... INTO ... and then process the current data with a continuous query.

Yes, and thank you very much for the feedback! I’m trying it out right now. It looks like the initial processing has to be broken own into multiple queries to avoid excessive memory usage…

1 Like

@Fredrik_Skeel_Lokke Yeah thats what I’ve seen doing large migrations. Because we compress the data so thoroughly it can expand to be quite large even though on disk it is small.