Cardinality not lining up

#1

I have a weird issue. I'm importing data and setting the timestamps on the imported data myself. When I do not specify a time precision, the series cardinality lines up properly, but the RFC3339 times reported back are in 1970. When I specify the time precision, the times come back correctly in RFC3339, but the reported cardinality is much higher than the number of series returned by 'show series'. This is using 1.7.4-1 from InfluxDB's xenial repository.

Here's a script I wrote to test this:

DATABASE        = 'cardi'
HOST            = '127.0.0.1'
PORT            = 8086
USER            = 'root'
PASSWORD        = 'root'

from influxdb import InfluxDBClient
from random import randint

def influxdb_insert(db, json):
    if not db:
        raise Exception("pass in a db object")
    if not json:
        raise Exception("pass in a json object")
    # Will cause the cardinality to be correct but rfc3339 times to be 1970
    # db.write_points(json)
    # Will cause the cardinality to be higher than it should but rfc3339 to be correct
    db.write_points(json, time_precision='s')


if __name__ == "__main__":
    client = InfluxDBClient(host=HOST, port=PORT, username=USER, password=PASSWORD, database=DATABASE)
    client.drop_database(DATABASE)
    client.create_database(DATABASE)
    client.switch_database(DATABASE)

    json_list = []
    epoch = 1545441300
    time_interval = 60
    meter_start = 10005
    meters = 100
    records_per_meter = 5000
    record = 0
    total_records = meters * records_per_meter

    for meter in range(meter_start, meter_start + meters):
        while record <= records_per_meter:
            json_list.append({
                "measurement": "meter",
                "tags": {
                    "MeterNumber": meter
                },
                "time": epoch,
                "fields": {
                    "MeterValue": randint(1, 100)
                }})
            epoch += time_interval
            record += 1

            if len(json_list) == records_per_meter:
                influxdb_insert(client, json_list)
                json_list = []

        record = 0

Running with time_precision='s', I get the following results:

influx -database cardi -execute 'show series' |grep meter |wc -l;
100

influx -execute 'show series exact cardinality on cardi';
150

influx -execute 'select COUNT(MeterValue) FROM cardi.autogen.meter'
500000

influx -precision 'rfc3339' -database 'cardi' -execute 'select * from meter limit 10'

time MeterNumber MeterValue


2018-12-22T01:15:00Z 10005 72
2018-12-22T01:16:00Z 10005 33
2018-12-22T01:17:00Z 10005 56
2018-12-22T01:18:00Z 10005 54
2018-12-22T01:19:00Z 10005 8
2018-12-22T01:20:00Z 10005 72
2018-12-22T01:21:00Z 10005 30
2018-12-22T01:22:00Z 10005 54
2018-12-22T01:23:00Z 10005 84
2018-12-22T01:24:00Z 10005 36

Running this with no time_precision set gives the following results:

influx -database cardi -execute 'show series' |grep meter |wc -l;
100

influx -execute 'show series exact cardinality on cardi';
100

influx -execute 'select COUNT(MeterValue) FROM cardi.autogen.meter'
500000

influx -precision 'rfc3339' -database 'cardi' -execute 'select * from meter limit 10'

time MeterNumber MeterValue


1970-01-01T00:00:01.5454413Z 10005 96
1970-01-01T00:00:01.54544136Z 10005 70
1970-01-01T00:00:01.54544142Z 10005 53
1970-01-01T00:00:01.54544148Z 10005 31
1970-01-01T00:00:01.54544154Z 10005 28
1970-01-01T00:00:01.5454416Z 10005 52
1970-01-01T00:00:01.54544166Z 10005 70
1970-01-01T00:00:01.54544172Z 10005 59
1970-01-01T00:00:01.54544178Z 10005 37
1970-01-01T00:00:01.54544184Z 10005 14

#2

Hi T,
If you don't use time_precision, the following two variables must be specified in nanoseconds:

epoch = 1545441300
time_interval = 60

That will bring your dates into the 21st century and will probably solve the other issue as well.
Hope this helps :sunny:
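
To illustrate why the unscaled value lands in 1970, here is a minimal sketch. It assumes, as the output in post #1 suggests, that a bare integer written without time_precision is read as nanoseconds since the Unix epoch. The helper names `seconds_to_ns` and `describe_as_ns` are made up for this example and are not part of the influxdb client.

```python
from datetime import datetime, timezone

NS_PER_SECOND = 1_000_000_000


def seconds_to_ns(seconds):
    """Scale an epoch value (or an interval) from seconds to nanoseconds."""
    return seconds * NS_PER_SECOND


def describe_as_ns(value_ns):
    """Render an integer as an RFC3339-style UTC string, reading it as nanoseconds."""
    whole_seconds, nanos = divmod(value_ns, NS_PER_SECOND)
    stamp = datetime.fromtimestamp(whole_seconds, tz=timezone.utc)
    return f"{stamp.strftime('%Y-%m-%dT%H:%M:%S')}.{nanos:09d}Z"


# The raw value from the script, read as nanoseconds: about 1.5 s after the epoch.
print(describe_as_ns(1545441300))
# The same value scaled to nanoseconds first: the intended December 2018 date.
print(describe_as_ns(seconds_to_ns(1545441300)))
```

With these helpers, the two variables in the script would become `epoch = seconds_to_ns(1545441300)` and `time_interval = seconds_to_ns(60)`.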

#3

I tried fiddling with the numbers and got stranger results. Uncomment the line with db.write_points(json) and comment out the line with db.write_points(json, time_precision='s'). This is using epoch timestamps as Influx recommends.

Set these values and everything works fine. This gives 1-second intervals in the data:
epoch = 1545441300000000000
time_interval = 1000000000

However, if you run 60-second intervals with these values, you'll see the high cardinality:

epoch = 1545441300000000000
time_interval = 60000000000

So, how does time have anything to do with the cardinality?

#4

Hi,

Perhaps by coincidence, the difference (50) in cardinality between your 1s-interval test and your 60s-interval test is almost the same as the difference in the number of shards.
With a 1-second interval the result is 2 shards,
and with a 60-second interval the number of shards is 51.
I don't think time itself has anything to do with cardinality,
but maybe there is a relation between the number of shards and the cardinality?

If I run your test with 10 meters:

(1s interval): series = 10, shards = 1, exact cardinality = 10 (= 10 series + 1 shard - 1)
(60s interval): series = 10, shards = 6, exact cardinality = 15 (= 10 series + 6 shards - 1)

I did some more tests, and my conclusion for your example is that
exact cardinality = series + shards - 1 (or - 2 if the cardinality is higher?)

Thanks for this interesting question :slight_smile:
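
The shard counts above can be estimated from the script's parameters alone. The script never resets epoch between meters, so the written points cover roughly meters * records_per_meter * time_interval seconds in one continuous run. The sketch below rests on two assumptions that are not confirmed in this thread: that the autogen retention policy uses 7-day shard groups, and that those groups start on Monday 00:00 UTC. The function `shard_groups_spanned` is a hypothetical helper, not an InfluxDB API.

```python
WEEK_S = 7 * 24 * 60 * 60
# Assumption: 7-day shard groups begin on Mondays at 00:00 UTC.
# 1970-01-05 was the first Monday after the Unix epoch (4 days in).
MONDAY_OFFSET_S = 4 * 24 * 60 * 60


def shard_groups_spanned(start_s, interval_s, points,
                         group_s=WEEK_S, offset_s=MONDAY_OFFSET_S):
    """Count the shard groups touched by `points` evenly spaced timestamps."""
    if points <= 0:
        return 0
    end_s = start_s + (points - 1) * interval_s
    first_group = (start_s - offset_s) // group_s
    last_group = (end_s - offset_s) // group_s
    return last_group - first_group + 1


start = 1545441300      # epoch from the script, in seconds
written = 100 * 5000    # points actually written (matches the COUNT of 500000)

for interval in (1, 60):
    shards = shard_groups_spanned(start, interval, written)
    print(f"{interval}s interval: {shards} shard groups")
```

Under these assumptions the sketch gives 2 and 51 shard groups for the 100-meter runs, and 1 and 6 for 10 meters (50000 points), which agrees with the shard counts reported above. For the 60s run, series + shards - 1 = 100 + 51 - 1 = 150, the exact cardinality seen in post #1. This only supports the suspicion that the extra cardinality tracks the number of shards; it does not explain why the server counts it that way.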