Hello!
Let’s say, I have the following data structure and following state of data
shop_measurement
TIMESTAMP, CUSTOMER_NAME
('March 1', 'Mike'),
('March 1', 'Yusuf'),
('March 1', 'John'),
('March 2', 'Ajay'),
('March 2', 'Mike'),
('March 2', 'Anna')
('March 5', 'Emily')
('March 5', 'Anna')
('March 5', 'Emily')
Let’s say, it represents a simple counter that calculates the fact that a person with a certain name entered a shop. Each time a name enters a shop I get a record in InfluxDB.
What I would like to build is a histogram showing how many unique names entered a shop each date. However, the definition of “unique” is not within the date only, but the definition of “unique” is as well within the scope of all previous dates before the date in question.
To explain better the result I would like should be following:
('March 1', 3) -- because Mike, Yusuf, John entered a shop for the first time ever, no previous entries are available
('March 2', 2) -- even though we have 3 entries for March 2 we have seen Mike already on March 1, that is why his entry is not unique, we do not calculate him
('March 3', 0)
('March 4', 0)
('March 5', 1) -- even though we have 3 entries for March 5 we have seen Anna before (on March 2) and Emily entered a shop twice same day
I know how to do in in “SQL” with ROW_NUMBER() and PARTITION tricks. But in InfluxQL I am stuck. The best I could do so far
SELECT
DATE_BIN_GAPFILL(INTERVAL '24 hours', time, '1970-01-01T00:00:00Z'::TIMESTAMP) AS time,
COUNT(*)
FROM 'shop_measurement'
WHERE time >= now() - interval '7 days' AND time <= now()
GROUP BY 1
ORDER BY 1
This gives me histogram with calculation of all shop entries each date in the interval. But how to move from here to what I need I have no idea.
Any help, please? : )