Store Python dictionary as field using InfluxDB 2

How can I store a Python dictionary in a field using InfluxDB 2? Example:

write_client.write(bucket, org, {
    "measurement": measurement,
    "fields": {
        "mydict": {'a': 1, 'b': 2}
    },
    "time": timestamp
})
A straightforward way is to convert the dictionary (mydict) to JSON and store it as a string. However, for binary data this is very space inefficient. Are there more efficient / compact ways? Thanks!
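For illustration, here is a minimal sketch of the JSON-as-string round-trip (just the serialization step, no InfluxDB client involved; the variable names are made up):

```python
import json

mydict = {'a': 1, 'b': 2}

# Serialize the dict to a JSON string so it can be stored in a string field
encoded = json.dumps(mydict)

# On read-back, decode the string field back into a dict
decoded = json.loads(encoded)
assert decoded == mydict
```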

Hello @karl,
Welcome! May I ask, why do you want to store the json as a field? You won’t be able to visualize that data in a meaningful way. Can you provide me with a larger context about what you’re trying to accomplish?

Thanks, Anaisdg! Most of the fields in my use case contain time series data (just plain numbers). Additionally, I want to store unstructured / varying auxiliary data (Python objects) for some points. That data need not be visualized and is only used for script-based postprocessing purposes. For simplicity reasons, I would prefer having them in the same database instead of storing them somewhere else. The only way to achieve this seems to be to serialize (pickle) the Python objects and encode them as base64 strings, which is not very efficient, however.
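A sketch of that pickle-plus-base64 round-trip, assuming the object is picklable (the example object is made up):

```python
import base64
import pickle

obj = {'a': 1, 'payload': b'\x00\x01\x02'}  # arbitrary picklable Python object

# Serialize with pickle, then base64-encode so the raw bytes survive as a string field
encoded = base64.b64encode(pickle.dumps(obj)).decode('ascii')

# Reverse the two steps on read-back
restored = pickle.loads(base64.b64decode(encoded))
assert restored == obj
```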

Hello @karl,
Thanks for explaining. I’ve asked someone on the storage team to help. That’s cool you’re building on top of InfluxDB. What specifically? I’m curious about your postprocessing purposes. Care to share more?

Hi @karl -

I’m not sure if you’re using influxv2 OSS or cloud but my advice here should apply to both. The storage engine uses gzip compression for strings already. If you convert the dict to a base64 string, you won’t get the compression advantage (since you already compressed it essentially). I suspect you are storing many of these dicts with overlapping key names. I would not compress in advance to allow compression across dicts.
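To see that effect, here is a rough sketch using Python's gzip/zlib as a stand-in for the storage engine's compression: gzipping plain JSON strings with overlapping keys compresses far better than gzipping strings that were each compressed and base64-encoded in advance (the sample dicts are made up):

```python
import base64
import gzip
import json
import zlib

# Many dicts with overlapping key names, as the storage engine would see them
dicts = [{"temperature": i, "humidity": i * 2, "status": "ok"} for i in range(100)]

# Option A: store each dict as a plain JSON string
plain = "\n".join(json.dumps(d) for d in dicts).encode()

# Option B: pre-compress and base64-encode each dict individually
pre_compressed = "\n".join(
    base64.b64encode(zlib.compress(json.dumps(d).encode())).decode()
    for d in dicts
).encode()

# The engine's gzip pass can exploit the repeated keys across plain JSON strings,
# but gains almost nothing on already-compressed base64 data
assert len(gzip.compress(plain)) < len(gzip.compress(pre_compressed))
```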

If these dicts become very long strings, you may run into the length limit (~64k roughly I believe). Additionally, I would store the string as a field so that it is not indexed (i.e. don’t use a tag for this value). Indexing long strings is expensive, and you said you don’t need to query for these directly.
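If you want to guard against that limit before writing, a minimal check on the serialized payload might look like this (the 64 KiB figure is approximate, as noted above, so verify it for your InfluxDB version):

```python
import json

# Assumption: approximate string-field length limit; confirm for your deployment
MAX_FIELD_BYTES = 64 * 1024

payload = json.dumps({"key%d" % i: i for i in range(10)})

# Measure the encoded byte length, not the character count
if len(payload.encode("utf-8")) > MAX_FIELD_BYTES:
    raise ValueError("serialized dict exceeds the field length limit")
```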

This is my expectation; depending on the characteristics of your data and how it serializes, your mileage may vary. If you’re on influx oss, you’ll be able to A/B test different approaches and measure the impact on on-disk data volume and query time. My general suggestion is to stick the serialized value into a field and continue until the performance is unsatisfactory, and then you can iterate on improving it.

Let us know how it works out! Storing metadata like this is a nice use case.
