Drop corrupted measurements


We use Influxdb to store system metrics collected with Telegraf. Unfortunately some systems cause corrupted data to be send/inserted, which causes corrupted measurements to be created. (Similar to Influxdb corrupted measurements created and https://github.com/influxdata/telegraf/issues/2854)
We are still trying to pinpoint the exact cause, but that’s not the point of this topic.

I’m trying to find a way to drop these corrupted measurements. Using the influx cli client I get errors like ERR: shard 6: proto: invalid UTF-8 string or ERR: error parsing query: found \u, expected identifier at line 1, char 21. I tried escaping the slashes, (i.e. DROP MEASUREMENT "�\\u0000\u0000\u0000\u0000\u0006;diskio"), but resulted in the same “invalid UTF-8 string” error.

some examples of these measurement names:


Is there a way to deleted these measurements/series while keeping the rest?



Here here is what I did to remove the unwanted series.

First, delete all series whose name does not start with a char:

# echo "show series from /^[^a-z]/" | influx -username xxx -password xxxx  -database telegraf

check and delete them:

# echo "drop series from /^[^a-z]/" | influx -username xxx -password xxxx  -database telegraf

next, I have removed all series with less than 100 entries:

# echo "show measurements" |  influx -username xxx -password xxxx  -database telegraf >/tmp/lej1

edit file, remove header line and last blank line

# while read m; do echo -n "$m "; echo "select * from \"$m\" limit 100" | influx -username xxx -password xxxx  -database telegraf | wc -l; done </tmp/lej1 >/tmp/lej2


# diff /tmp/lej1 <(awk '{print $1}' /tmp/lej2)
# grep "[0-9][0-9][0-9]$" /tmp/lej2

you must see all your good measurements

# grep "[0-9][0-9][0-9]$" /tmp/lej2 >/tmp/lej3
# while read m; do if (! grep -q "^$m " /tmp/lej3); then echo $m; fi; done </tmp/lej1 > /tmp/lej4

# wc -l /tmp/lej1 /tmp/lej3 /tmp/lej4

and check that number of lines of lej1 = lej3 + lej4

now, you can delete all measurements in lej4:

# while read m; do echo "drop measurement \"$m\"" | influx -username xxx -password xxxx  -database telegraf; done </tmp/lej4

now, it must be good!

# echo "show measurements" |  influx -username xxx -password xxxx  -database telegraf

Thanks for the extensive guide. I ended up using the the drop series with regex feature to remove the unwanted measurements/series.

drop series from /[^a-z_]/

This cleaned up everything nicely. (Including some measurements that started with a character, but had some unicode stuff in the middle)