Hey,
I have a lot of financial data at minute resolution, and I want to resample it to daily and save it in another bucket for quick access. I have a query that does just that, and it works great as long as I run it with filters that cap the amount of data it touches, such as restricting it to a few “tickers” (stock symbols, stored as a tag) or to a small time range.
But when I let it run over all the data, memory consumption maxes out and InfluxDB eventually crashes.
What would be the best practice to get around this? Do I need to write a script that runs the query over smaller chunks of the date range, or can I improve the query itself?
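To make the question concrete, this is roughly the chunking script I have in mind if the query itself can't be improved. It's just a sketch: the Python client (influxdb-client), the URL and token, the earliest-data date, and the chunk size are all placeholders, and FLUX_BODY would be the query below pasted in without its vars line.

from datetime import datetime, timedelta, timezone

from influxdb_client import InfluxDBClient

# The full query below, with its `vars = {...}` line removed so each
# chunk can supply its own start/stop.
FLUX_BODY = """
// paste the query here, minus the vars line
"""

client = InfluxDBClient(url="http://localhost:8086", token="<token>", org="Tlng")
query_api = client.query_api()

fmt = "%Y-%m-%dT%H:%M:%SZ"
chunk = timedelta(days=7)                          # one week of minute data per run
start = datetime(2022, 1, 1, tzinfo=timezone.utc)  # earliest data, placeholder
stop = datetime(2022, 8, 15, tzinfo=timezone.utc)

while start < stop:
    end = min(start + chunk, stop)
    # Prepend a vars record for this chunk; the query writes its own
    # output via to(), so the result can be ignored.
    flux = f"vars = {{start: {start.strftime(fmt)}, stop: {end.strftime(fmt)}}}\n" + FLUX_BODY
    query_api.query(flux)
    start = end

client.close()

The idea is that each run only ever holds about a week of minute data in memory, but it feels like a workaround, which is why I'm asking whether the query can be fixed instead.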
Here’s my query:
vars = {start: 2022-08-01T00:00:00Z, stop: 2022-08-15T00:00:00Z}
DATA =
from(bucket: "stock_data_collector_v2")
|> range(start: vars.start, stop: vars.stop)
|> filter(
fn: (r) => r._field == "o" or r._field == "h" or r._field == "l" or r._field == "c" or r._field == "v",
)
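// EndTime: timestamp of the last close in each 1d window, joined back below so every daily record gets a real _time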
EndTime =
DATA
|> filter(fn: (r) => r._field == "c")
|> window(every: 1d)
|> max(column: "_time")
|> keep(columns: ["_start", "_time", "_measurement", "ticker"])
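// Daily aggregates per ticker: open = first, high = max, low = min, close = last, volume = sum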
o =
DATA
|> filter(fn: (r) => r._field == "o")
|> window(every: 1d)
|> first()
|> keep(columns: ["_field", "_value", "_start", "ticker"])
h =
DATA
|> filter(fn: (r) => r._field == "h")
|> window(every: 1d)
|> max()
|> keep(columns: ["_field", "_value", "_start", "ticker"])
l =
DATA
|> filter(fn: (r) => r._field == "l")
|> window(every: 1d)
|> min()
|> keep(columns: ["_field", "_value", "_start", "ticker"])
c =
DATA
|> filter(fn: (r) => r._field == "c")
|> window(every: 1d)
|> last()
|> keep(columns: ["_field", "_value", "_start", "ticker"])
vol =
DATA
|> filter(fn: (r) => r._field == "v")
|> window(every: 1d)
|> sum()
|> keep(columns: ["_field", "_value", "_start", "ticker"])
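// Join each daily aggregate with EndTime to attach the closing timestamp, regroup by ticker, and write to the daily bucket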
j1 =
join(tables: {EndTime, o}, on: ["_start", "ticker"])
|> drop(columns: ["_start"])
|> group(columns: ["ticker"])
|> to(bucket: "resample_test", org: "Tlng")
j2 =
join(tables: {EndTime, h}, on: ["_start", "ticker"])
|> drop(columns: ["_start"])
|> group(columns: ["ticker"])
|> to(bucket: "resample_test", org: "Tlng")
j3 =
join(tables: {EndTime, l}, on: ["_start", "ticker"])
|> drop(columns: ["_start"])
|> group(columns: ["ticker"])
|> to(bucket: "resample_test", org: "Tlng")
j4 =
join(tables: {EndTime, c}, on: ["_start", "ticker"])
|> drop(columns: ["_start"])
|> group(columns: ["ticker"])
|> to(bucket: "resample_test", org: "Tlng")
j5 =
join(tables: {EndTime, vol}, on: ["_start", "ticker"])
|> drop(columns: ["_start"])
|> group(columns: ["ticker"])
|> to(bucket: "resample_test", org: "Tlng")
Thanks!