Limiting number of series returned with flux

Milan_K · March 15, 2024, 9:18am

Hello, I am building an analytics platform that uses influx as it’s main db.
There is a dsl layer in front of the data that lets users query the db easily and render and customize their charts.
The issue I am having currently is that there is no way to limit the number of series returned by
the query.

I saw that there is SLIMIT for influxQL, and need something similar or a workaround.
There is no way to limit this on the API, since it’s an analytics platform, and users can’t know what data is available.
Additionally, I would like to return the max number of series if user’s query would go over the limit.
So they would not get an error message like too many series, but get the max configured.

I am using influx cloud v2 hosted on AWS.

Any help would be appreciated.

Anaisdg · March 21, 2024, 8:15pm

Hello @Milan_K
Welcome! And I apologize for the delay I was out of office :(((
Thanks for asking your question.
That’s so cool that you’re building an analytics platform! I’d love to learn more about it. What are your anticipated use cases and developer personas? Is this for fun or for work?

I’m assuming you need a limit for Flux?
If so you can use:
limit() function | Flux Documentation.

In flux data is automatically grouped by series so it works exactly like SLIMIT.

Milan_K · March 25, 2024, 11:08pm

It is for work. The idea is that we aggregate some raw data and optionally preprocess it to provide some useful metrics for analytics. We expose a mini dsl layer that is used for query building, so the users don’t really know what’s behind it.
This allows easy customization of charts and doing EDA.
We have a pipeline that is constantly pushing new data into the system.

Since users are querying the db as they like, we need to have some limits in terms of number of series that. Maybe I misunderstand what SLIMIT does, but limit in flux allows us to limit the number of data points in each series, but ideally I would like to limit the number of the series themself.
So the use case is, if for some query the users would get 100k lines, I would like to limit it to 500 max per query, so the data is still rendered, but we don’t stress the db that much.
We can handle this on the application side, but ideally I would like to avoid it for now.

For better understanding:
We look at some events to create our db:

tags [shop: s1, branch_office: b1, item: i1], fields[price: 10, quantity: 5] 
tags[shop: s1, branch_office: b1, item: i2], fields[price: 100, quantity: 1]

tags[shop: s1, branch_office: b2, item: i1], fields[price: 9.5, quantity: 5]

tags[shop: s2, branch_office: b1, item: i1], fields[price: 8, quantity: 10] 
tags[shop: s2, branch_office: b1, item: i3], fields[price: 1, quantity: 10]

We would aggregate this data based on time, so that we can get metrics on different levels:

metrics for items aggregated (avg price, total quantity, avg quantity…)
metrics for branch offices (total sales for branch office…)
metrics for shops

With the dsl, you can easily define what you want to render on the charts:

Price of item1 → a line for each item1 price (in each branch office of a shop)
Quantity of item1 in all shops → a line for each shop
Sales for all branches of a shop → a line for each branch of the specified shop
…

Allows great flexibility on different levels of granularity. The example is made up, so might not make complete sense, but I hope it paints a clear enough picture.
Is there a way to achieve this?

Anaisdg · March 26, 2024, 10:49pm

Hello @Milan_K
In InfluxQL
SLIMIT <N> returns every point from N series in the specified measurement.

LIMIT <N> returns the first N points from the specified measurement.

I’m not sure that there is a clean way to limit the number of series other than to do the following:

query your data here 
|> limit(n: 1) limit the results from each series so you just return the last point from each series
|> group() group all your series into one table
|> limit(n: the number of series you want to return)

This will effectively limit your series but also limit the result from each series to 1 result.

Milan_K · March 27, 2024, 1:42pm

Yeah, is it possible to filter the series returned by the query based on their table id?
So if I have 10 tables, and they have a specific id, I would add another filter at the end of my query to filter out all points based on that id?

|> filter(fn: (r) => r._table_id < 5)

Couldn’t make it work, wondering if it’s possible?

Anaisdg · March 29, 2024, 7:29pm

Hello @Milan_K,
What error are you getting? Yes you can filter by any column value.
If you have a column _table_id with int values then that should work. You might want to check types maybe you need 5.0?

Milan_K · April 2, 2024, 11:25am

I don’t have a table_id column, I was hoping the tables in flux would have something I could filter on.
In the end, I had to group all data and limit the number of points and split them up in the application later on.
Having the ability to limit the number of series natively would be very beneficial in our use case.
Thanks.

Anaisdg · April 4, 2024, 9:28pm

Hello @Milan_K
Thanks for sharing your solution. Unfortunately flux is maintenance mode

Topic		Replies	Views
Buckets list capped at page limit InfluxDB 2 influxdb , time-series	1	943	May 26, 2020
Sum to Influx query InfluxQL vs Flux InfluxDB 2 flux	1	311	May 17, 2021
Series limit issue	0	641	June 26, 2019
Better control of query size limit in Influx? Store influxdb	0	1964	October 4, 2018
InfluxDB: Limit select/into query response influxdb	0	1011	May 8, 2018

Limiting number of series returned with flux

Related topics