I have a messaging system where I track three events: message received, message sent, and new user created.
I have billions of records in each of the message measurements and hundreds of millions in the user-creation one.
All three measurements are pretty similar. Example:

messages_received
time                 appCodename  host      received  type  user
----                 -----------  ----      --------  ----  ----
1494000481645999900  demo1        demo.com  1         text  321000
...
What I need
I need to display the cumulative_sum of every measurement.
What I’m doing
I first use a continuous query (CQ) to downsample all the data into 1-hour buckets to reduce the number of records, and then compute the cumulative sum with a second CQ.
Group received messages by 1h:
CREATE CONTINUOUS QUERY "cq_messages_received_hourly" ON "stats"
RESAMPLE EVERY 10m FOR 120m
BEGIN
SELECT sum("received") AS "message_received"
INTO "stats"."autogen"."cq_messages_received_hourly"
FROM "stats"."autogen"."stats_message_received"
GROUP BY time(1h), "appCodename", "host"
FILL(none)
END
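For reference, this is roughly how I spot-check the downsampled output (the time window here is just an example):

```sql
-- Inspect the last 6 hours of the hourly rollup, per app and host
SELECT "message_received"
FROM "stats"."autogen"."cq_messages_received_hourly"
WHERE time > now() - 6h
GROUP BY "appCodename", "host"
```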
Do the cumulative_sum over the entire measurement. I had to use RESAMPLE ... FOR 10000d (a duration reaching back before my first record) because it only seems to work that way: if I resample for just a few hours, the cumulative sum keeps resetting instead of carrying over.
CREATE CONTINUOUS QUERY "cq_total_messages_received_hourly" ON "stats"
RESAMPLE EVERY 10m FOR 10000d
BEGIN
SELECT cumulative_sum(sum("message_received")) AS "total_messages"
INTO "stats"."autogen"."cq_total_messages_received_hourly"
FROM "stats"."autogen"."cq_messages_received_hourly"
GROUP BY time(1h), "appCodename", "host"
FILL(none)
END
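One alternative I'm wondering about is skipping the second CQ entirely and computing the cumulative sum at query time from the hourly rollup. A sketch (the 30-day window is just an example; it reuses the downsampled measurement from the first CQ):

```sql
-- Cumulative total computed on demand instead of being stored by a CQ
SELECT cumulative_sum(sum("message_received")) AS "total_messages"
FROM "stats"."autogen"."cq_messages_received_hourly"
WHERE time > now() - 30d
GROUP BY time(1h), "appCodename", "host"
FILL(none)
```

The obvious downside is that the sum only starts from the beginning of the WHERE window, so it is not a true all-time total.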
The thing is that it seems to write a lot of records every 10 minutes.
Am I doing this right? Do you know a better solution?
For now it doesn't seem to be a problem, but I expect the numbers to quadruple in the coming months.