Data Cleanup Strategies

#1

My sensor data contains periodic anomalies. I could not find a way to delete specific datapoints, and I’m not sure if there is a command that solves my problem.

I want something like:

delete from <measurement> where <tag_name> = "<mytag>" and <field_name> > <myval>

AFAIK you can delete by tag or time range, but not by field.

I’ve thought about things like:

  • dump the database, filter and reload
  • dump the series, filter and reload
  • adding a “status” tag (valid|invalid) and re-tagging invalid datapoints
  • deleting the time-range with invalid data

What is the best strategy to remove out-of-range datapoints from my Influx database?

1 Like
#2

@andyl Nonperformant deletes are a design decision so these types of operations are a littel messy.

We do have a DROP SERIES and DROP MEASUREMENT in the query language that could help with this. However if you are looking to delete individual field values the only way to do that is with your dumping the database and rewriting strategy.

#3

Thanks for the feedback ! (even though the dump and rewrite strategy seems awful and surprising there is not a better way…)

#4

So - a favour - for the complete INFLUX beginner… clearly this is NOT what those of us who use SQLITE and MYSQL etc are used to and it is a pain.

Can someone describe simply the process of extracting the data - being able to modify it and putting it into a replacement database?

#5

SELECT INTO lets you extract existing data into a new measurement or database. Be sure to read through the examples and the common issues in the docs.

#6

FYI here is a little bash script to remove zero-value temperatures from a weather database. It seems to work, but I hate how obtuse the code is and wonder if there is a better way…

#!/usr/bin/env bash
tmp="tmp1"
user="tbd"
pass="tbd"
dbase="weather"
metric="weatherpoint"
tag_clause="station = 'orange'"
field_clause="temp_f != 0"
influx -username $user -password $pass -database $dbase -execute "drop measurement $tmp"
influx -username $user -password $pass -database $dbase -execute "select * into \"$tmp\" from \"$metric\" where $tag_clause and $field_clause"
influx -username $user -password $pass -database $dbase -execute "drop series from \"$metric\" where $tag_clause"
influx -username $user -password $pass -database $dbase -execute "select * into \"$metric\" from \"$tmp\""
influx -username $user -password $pass -database $dbase -execute "drop measurement $tmp"
#7

Here’s another version, re-written in Ruby.

#!/usr/bin/env ruby

TMP_METRIC   = "tmp9"               # name of temporary measurement
USER         = "user"               # username
PASS         = "pass"               # password
DBASE        = "weather"            # database name
TGT_METRIC   = "weatherpoint"       # measurement name
TAG_CLAUSE   = "station = 'orange'" # selector to narrow scope of copy/replace
FIELD_CLAUSE = "temp_f != 0"        # field values to remove

script = <<-EOF
drop measurement  #{TMP_METRIC}
select * into    "#{TMP_METRIC}" from "#{TGT_METRIC}" where #{TAG_CLAUSE} and #{FIELD_CLAUSE}
drop series from "#{TGT_METRIC}" where #{TAG_CLAUSE}
select * into    "#{TGT_METRIC}" from "#{TMP_METRIC}"
drop measurement  #{TMP_METRIC}
EOF

lines = script.split("\n")         # split the script into an array of lines

lines.each do |line|               # iterate over each line
  eline = line.gsub('"', '\"')     # escape all double-quotes in the line
  cmd   = %Q[influx -username #{USER} -password #{PASS} -database #{DBASE} -execute "#{eline}"]
  puts cmd                         # print the command
  puts `#{cmd}`                    # execute the command and print the results
end