Suddenly a duplicated _1 tag - apparently I messed up my database

A few days ago one of my sensors sent completely wrong data which caused my statistics to be completely wrong. I looked for a way to remove the wrong data and since dropping doesn’t work when filtering by values I found a workaround that suggested inserting the whole table into a new one, dropping the original table and inserting it back. It seemed to have worked at first as all my data without the wrong values was back. But since then all new data that was written got a _1 appended to the tag name and I can not find a way to read that data from the database.

Unfortunately I noticed this way too late and even though I made a backup before I tried the “fix”, going back to that would cause several days of data being lost and I would like to avoid that.

I am running InfluxDB v1.7.6

Hello @Qetesh,

Thanks for your question. This problem sounds similar to this post. Is there a fix in there for you?

It does sound similar to my problem but also is not solved so unfortunately it’s not helping me in any way.

@Qetesh whoops! I thought it was solved. Why can’t you read from the database even with the name change? Can you please provide an example of a query that isn’t working?

@Anaisdg Thank you for the reply. The behavior actually changed a little after updating to influx 1.7.9:
Now the query works as long as I only query data after the incident. As soon as I start a query that includes data from before the accident only the old data can be read. I pinpointed the actual time it happened and this is the data directly around the incident:

> SELECT * FROM "temperature" WHERE time > 1572036412756096000 and time < 1572037373055757000
name: temperature
time                location location_1 value
----                -------- ---------- -----
1572036532738133000 living              22.6
1572036553715978000 outdoor             11.7
1572036581190571000 work                22.7
1572036641196348000 work                22.6
1572036643036868000 bed                 19.7
1572036703094991000 bed                 19.9
1572036763123191000 bed                 19.8
1572036821191581000 work                22.7
1572037003275377000 bed                 19.9
1572037061201475000 work                22.6
1572037063093457000 bed                 19.8
1572037181203827000          work       22.7
1572037182982241000          bed        19.8
1572037192747410000          living     22.6
1572037213934115000          outdoor    11.8
1572037241180611000          work       22.7
1572037243092490000          bed        19.9
1572037252952408000          living     22.6
1572037273970923000          outdoor    11.8
1572037312771283000          living     22.6
1572037333805383000          outdoor    11.8

So as soon as a query for “location” contains data from both periods the data stored in “location_1” is not returned anymore. When querying only before or after this timeframe everything now (after the update) works as expected. I would still like to fix this though if somehow possible because grafana is not able to display long time queries correct anymore.

Hi @Qetesh, if I understand correctly you want to continue to use the tag value location not location_1, but the values for a certain time range are logged as location_1 and new values are being written with the correct tag location ? You should be able to fix this by reading all measurement points witch have tag=location_1, then rewriting them with tag=location. Your database will show two tags for those measurements (location and location_1) but so long as your application only looks for tag=location it should continue to work as expected.

Unfortunately the only way to completely remove a bad tag value or a bad field name is to delete the measurement data in question and rewrite it. I and many other Influx users consider this to be a bug but it appears none of the Influx developers seem to think so :frowning:

Hey @JeremySTX, it is a bit more confusing…

both the old and the new values have the tag “location” - only when I do a query on a time range that includes data from before and after the incident the new values are marked as “location_1”.

When I for example just select the last 10 minutes, I get correct results:

> SELECT * FROM "temperature" WHERE time > now() - 10m
name: temperature
time                location value
----                -------- -----
1572966335068309000 living   22.8
1572966421395334000 bed      18.8
1572966455055101000 living   22.9
1572966481376263000 bed      18.9
1572966515070746000 living   22.8
1572966535734505000 outdoor  11.6
1572966588357757000 work     22.4
1572966715757556000 outdoor  11.5
1572966775713981000 outdoor  11.4
1572966781383296000 bed      19
1572966841388294000 bed      18.9

But when I do a bigger time range it looks like in my last post… getting just older data also works correct.

It looks like the database has two tags “location” that each work on their own but when I have a query with results from both in them the newer one gets renamed?

@Qetesh,

I’m having trouble getting help to help you. In the meantime, could you please create an issue and share the link here? Perhaps @rawkode, has some solutions for you? Thank you. I’m sorry this is happening, and I hope I can give you a concrete solution soon.

Right now the only suggestion I have is for you to delete the shard, unfortunately.

UPDATE:
I got the following advice from support, typically they do:
-A select into – for the entire measurement.
-You articulate all the fields and then you group by all the tags (less the one you are trying to get rid of)
-Then you drop the measurement
-Then you re-run the select into and move the data back.
-If you have massive amounts of data already, this can be painful. The select into requires memory. The guidance is to typically chunk this into data sizes that will fit within memory.

Hi @Anaisdg sorry for the late reply…
A select into was what caused the problem at first so I am a bit reluctant here…
Also you say to group by the tags less the one I want to get rid off - how do I do that if it has the same name? I don’t want to lose data… Sorry if my questions might seem stupid but I am not very experienced with using Influx unfortunately.

Besides that if there’s anything I can do to narrow the problem and the cause down let me know.

@Qetesh,
No problem, I haven’t dealt with this problem personally either, so we’re both new to this. That direction was passed on to me by coworkers. So you can’t do a select * or a group by * because you’re trying to get rid of a tag. So you you need to select each field into a target measurement. Then once all of your good data is moved into a new measurement you can drop the old one. You need to expand * to explicitly select the fields you want to keep and group by the tags. Does that help?

I have the same problem and I cannot find a way to fix this, I did a select into too and after that all points written to that collection get a _1 suffix in tag names. I also tried to drop and recreate the collection (with a select into with group_by) but no luck.

Any help?