Hi, I am trying to build candlestick charts from data I ingest from Binance; the raw data are 1m candles.
Querying the raw data takes no more than 0.07 s.
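(For reference, the fast query is just the raw fetch, something like this; a minimal sketch using the same bucket and filters as the code below:)

from(bucket: "crypto_app")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "candles")
  |> filter(fn: (r) => r["exchange"] == "Binance")
  |> filter(fn: (r) => r["symbol"] == "BTC-USDT")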
But when I run some of the queries I found on this website to aggregate the 1m candles into 5m ones, for example this code, CPU usage goes way up:
data = from(bucket: "crypto_app")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "candles")
  |> filter(fn: (r) => r["exchange"] == "Binance")
  |> filter(fn: (r) => r["symbol"] == "BTC-USDT")
  |> window(every: 5m)
  |> reduce(
      fn: (r, accumulator) => ({
        // counters so the first "low"/"open" row in each window can be detected
        indexLow: if r._field == "low" then accumulator.indexLow + 1 else accumulator.indexLow,
        indexOpen: if r._field == "open" then accumulator.indexOpen + 1 else accumulator.indexOpen,
        // open: keep the first "open" value in the window
        open: if r._field == "open" and accumulator.indexOpen == 0 then float(v: r._value) else accumulator.open,
        // high: running maximum over "high" values
        high: if r._field == "high" and r._value > accumulator.high then float(v: r._value) else accumulator.high,
        // low: running minimum over "low" values, seeded by the first one
        low: if r._field == "low" and (r._value < accumulator.low or accumulator.indexLow == 0) then float(v: r._value) else accumulator.low,
        // close: the last "close" value wins
        close: if r._field == "close" then float(v: r._value) else accumulator.close,
        // volume: sum of "volume" values
        volume: if r._field == "volume" then float(v: r._value) + accumulator.volume else accumulator.volume
      }),
      identity: {indexLow: 0, indexOpen: 0, open: 0.0, high: 0.0, low: 0.0, close: 0.0, volume: 0.0}
    )
  |> drop(columns: ["indexOpen", "indexLow"])
  |> group(columns: ["symbol"])
  |> yield(name: "candle")
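(One thing I am not sure about in this version: after window() + reduce() the tables only carry _start/_stop, not _time, so if the chart needs a _time column something like the usual Flux pattern below would have to be appended before yield(); this is a sketch, not part of the original snippet:)

  |> duplicate(column: "_stop", as: "_time")
  |> window(every: inf)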
Another query I found, which builds the same candles from one aggregateWindow per field plus joins:
data = from(bucket: "crypto_app")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "candles")
  |> filter(fn: (r) => r["symbol"] == "BTC-USDT")

open = data |> filter(fn: (r) => r._field == "open") |> aggregateWindow(every: 5m, fn: first) |> drop(columns: ["_start", "_stop", "_measurement", "_field"])
close = data |> filter(fn: (r) => r._field == "close") |> aggregateWindow(every: 5m, fn: last) |> drop(columns: ["_start", "_stop", "_measurement", "_field"])
low = data |> filter(fn: (r) => r._field == "low") |> aggregateWindow(every: 5m, fn: min) |> drop(columns: ["_start", "_stop", "_measurement", "_field"])
high = data |> filter(fn: (r) => r._field == "high") |> aggregateWindow(every: 5m, fn: max) |> drop(columns: ["_start", "_stop", "_measurement", "_field"])

OC = join(tables: {o: open, c: close}, on: ["_time", "symbol"])
HL = join(tables: {h: high, l: low}, on: ["_time", "symbol"])
OCLH = join(tables: {OC: OC, HL: HL}, on: ["_time", "symbol"])
  |> group(columns: ["symbol"])
  |> rename(columns: {_value_c: "c", _value_h: "h", _value_l: "l", _value_o: "o"})
  |> yield(name: "mean")
This is how I ingested the data: a Python script reads the Binance kline CSV export and writes one point per row:
from influxdb_client import Point

point = (
    Point("candles")
    .tag("exchange", "Binance")
    .tag("symbol", "BTC-USDT")
    .tag("interval", "1m")
    .tag("topic", "crypto/candles/Binance/BTC-USDT/1m")
    .field("start", row[0].astype('int64'))  # kline open time, ms epoch
    .field("stop", row[6].astype('int64'))   # kline close time, ms epoch
    .field("open", row[1])
    .field("high", row[2])
    .field("minimum", row[3])                # the kline "low" column
    .field("close", row[4])
    .field("volume", row[5])
    .time(row[0].astype('int64'), "ms")
)
A sample row from the CSV (standard Binance kline layout: open time, open, high, low, close, volume, close time, quote volume, trade count, taker buy base volume, taker buy quote volume, ignore):
1722470400000,64628.01000000,64670.01000000,64601.00000000,64634.01000000,36.77990000,1722470459999,2377477.74237530,3399,13.19798000,853026.66151230,0
Sometimes the aggregation query takes more than 50 s, or never finishes.
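(To see where the time goes I can prepend Flux's built-in profiler to either query; a minimal sketch, assuming the client displays the extra profiler tables:)

import "profiler"

option profiler.enabledProfilers = ["query", "operator"]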
I have no idea what I am doing wrong.