Hi! I’m new to this community and I think this is the perfect platform to ask my question. Our current system uses an RPI Zero W and sends sensor data to our InfluxDB database deployed on AWS via REST. We’re trying to transition from HTTPS to MQTT without totally disrupting our current architecture. I would like to ask what the best architecture would be for this particular use case, taking into consideration sampling rates, costs, and potential bottlenecks. Here’s what I have considered so far:
1) RPI Zero W ==> send data via MQTT ==> AWS IoT Core ==> AWS Lambda ==> InfluxDB
2) RPI Zero W ==> send data via MQTT ==> Telegraf ==> InfluxDB (not sure if this is possible)
3) RPI Zero W ==> Node-RED ==> send data via MQTT ==> InfluxDB (not sure about this either)
The thing is, Option 1 is what I am used to doing, but as you may have noticed, our team might incur extra costs for the reads/writes and execution time, and it also has a lot of “jumps”. I am probably missing a really simple architecture here, since I am relatively new to Telegraf.
I’ve done something similar to Option 3, but the RPI 3 was the MQTT broker and everything was done locally, from converting the MQTT data in Node-RED to storing it in our InfluxDB database.
Any form of help would be much appreciated! I hope the InfluxDB Gods hear my plea!
My personal preference would be Option 2. Telegraf will give you a whole load of convenience for very little investment - things like caching, batching and data marshalling, for example, just by editing config rather than writing code.
What I do is:
- Have all my various IoT things report JSON via MQTT topics.
- Run an MQTT broker somewhere on the network. I use Mosquitto on the same Pi.
- Run a Telegraf instance somewhere on the network. I also run this on the Pi.
- Configure Telegraf to listen to the relevant topics. You can have multiple MQTT inputs in a single Telegraf config.
- Configure Telegraf to expect JSON on those topics.
- Configure Telegraf to handle any string fields (the JSON parser only picks up numeric values as fields by default).
Here’s some extracts from my Telegraf config for talking to Cloud 2.0. This might get you going.
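Roughly, the important parts look like this - a minimal sketch, where the broker address, topics, token, org and bucket are placeholders rather than my real values:

```toml
# Pull JSON readings off local MQTT topics and push them to InfluxDB Cloud 2.0.
[agent]
  interval = "300s"

[[inputs.mqtt_consumer]]
  servers = ["tcp://127.0.0.1:1883"]   # Mosquitto on the same Pi
  topics = ["sensors/#"]
  data_format = "json"
  json_string_fields = ["state"]       # any fields that arrive as strings

[[outputs.influxdb_v2]]
  urls = ["https://eu-central-1-1.aws.cloud2.influxdata.com"]
  token = "$INFLUX_TOKEN"
  organization = "my-org"
  bucket = "sensors"
```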
Thank you so much @willcooke ! Your insights are well-appreciated. I read through your code and I just have a few questions, if you don’t mind me asking.
I am really new to Telegraf, so am I correct in saying that Option 2 basically becomes: RPI Zero W Sensor Data ==> Mosquitto ==> Telegraf ==> InfluxDB?
I also noticed you had an interval variable set to 300s - what is this for?
Lastly, am I right in assuming that your data-gathering scripts are written in Python, that they publish to the broker on the same Pi, and that Telegraf serves as the middleman so you can send the data to your InfluxDB instance in the cloud?
I apologize for the ton of questions, I am just really interested in trying out Telegraf! The code snippets you’ve sent me are more than enough for me to get on the right track, thank you so much for this!
Hmm, this is a good question. Intervals are described here:
but… for MQTT that doesn’t really make sense, since the data is pushed to Telegraf rather than polled on a schedule. I’m not actually sure what difference it would make. Maybe try it out - I’d be interested to know what you find.
My sensors are all Zigbee now, so I use Zigbee2MQTT which is a Node app. I have used Python to send data as well, and I find that Paho is really easy to use.
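For example, publishing a JSON reading with Paho is only a few lines - a minimal sketch, where the broker host, topic and values are just placeholders:

```python
# Minimal publish sketch with Paho (paho-mqtt); broker host, topic
# and readings are placeholders.
import json

import paho.mqtt.publish as publish

reading = {"temperature": 21.5, "humidity": 40.2}
publish.single(
    "sensors/livingroom",          # topic Telegraf will subscribe to
    payload=json.dumps(reading),
    hostname="localhost",          # Mosquitto on the same Pi
)
```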
By using MQTT as the transport layer you can easily abstract away all the complicated network sockets, HTTP servers etc. By using Telegraf you get a lot of features “for free” and as you say, it can upload to Cloud 2.0 very easily. I haven’t actually used it to upload to an Open Source instance of InfluxDB but I fully expect it to be just as painless.
No apology needed, I’m excited to be able to help and interested to hear more about how you get on with your project.
I would recommend that you just Go For It - once you get the plumbing set up you’ll see it’s really quite easy. You can’t really break anything.
Edit: I can only post two links in a post, so I had to remove the links to Zigbee2MQTT and Paho, but I’m sure you can find them.
Thank you so much for this @willcooke! I’m currently reading Paho’s documentation, and I think I’ll be able to integrate it with our current code. I initially used AWSIoTPythonSDKMQTTLib for the pub/sub functions of our code, but now that the architecture won’t be using AWS IoT Core, I think I’ll use Paho instead.
Thank you so much! I’ll look into the interval explanation you’ve sent me; overall your replies pointed me in the right direction. Thank you so, so much! I’ll be posting my insights here after trying things out!
Maybe you could also tell me how to implement Option 1 with AWS IoT? I want to use this in my master’s thesis, but I can’t find any information on this topic.
At this point, I have been able to send sensor data from my Raspberry Pi to AWS IoT using a simple script written in Python.
I’m not really clear on what you’re trying to achieve here, but I assume something along the lines of sending data to an AWS hosted MQTT server (AWS IoT Core) and then having an AWS Lambda Python function pick up that data and send it in to InfluxDB?
If so, then what I would do (and I might well be wrong) is:
Create a Python virtualenv and include the InfluxDB Python client library package and the Paho MQTT package. Write a Python script using Paho to subscribe to the topic, extract the data you want, and then write it into InfluxDB using the client library. You can keep your credentials out of your code by using Lambda’s ability to pass environment variables to your script.
You can then package that whole virtual env up as a zip file and upload it to Lambda.
This tutorial is useful: Amazon Pinpoint
I would disagree with that.
It doesn’t work like that, because AWS Lambda is not meant to run permanently - but it would have to, if you had a Paho client subscribed inside it. Unless you trigger the Lambda function regularly with a cron trigger.
If it absolutely has to be AWS services, I would do it differently:
You don’t need the Paho library; the AWS IoT Core service can trigger an AWS Lambda function directly.
I am very grateful for the quick response and for trying to help.
I probably expressed myself poorly and was not able to explain my intentions. I am trying to transfer telemetry data this way: Temperature sensor → Raspberry Pi 4 → AWS IoT Core + AWS Greengrass → AWS Lambda → InfluxDB + Grafana.
At the moment I have a script running on my Raspberry Pi that sends data from the sensor to AWS IoT; next I need to transfer this data to InfluxDB using a Lambda function.
I don’t know exactly what AWS Greengrass does and how it works.
Haven’t dealt with it yet.
Basically, you can trigger AWS Lambda functions from many other AWS services.
Furthermore, @willcooke has already outlined the basic procedure - except that I don’t think you need the Paho library, because the AWS services can talk to each other directly.
If you are comfortable with Python, you can use a Python runtime for AWS Lambda.
When the Lambda function is triggered by IoT events, you take the data and push it to InfluxDB using the influxdb-client-python library.
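A minimal sketch of such a handler - assuming the IoT rule forwards the device’s JSON payload as the event, and with all field names and environment variables as placeholders:

```python
# Hypothetical Lambda handler: an AWS IoT rule invokes it with the
# device's JSON payload as `event`; it writes one point to InfluxDB.
import os

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

def lambda_handler(event, context):
    # e.g. event == {"device": "rpi-4", "temperature": 21.5}
    point = (
        Point("environment")
        .tag("device", event.get("device", "unknown"))
        .field("temperature", float(event["temperature"]))
    )
    with InfluxDBClient(
        url=os.environ["INFLUX_URL"],
        token=os.environ["INFLUX_TOKEN"],
        org=os.environ["INFLUX_ORG"],
    ) as client:
        client.write_api(write_options=SYNCHRONOUS).write(
            bucket=os.environ["INFLUX_BUCKET"], record=point
        )
    return {"status": "ok"}
```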
I don’t know the general conditions of your project.
Are the AWS services mandatory?
How stable is the client’s internet connection?
How stable are the clients themselves?
How much data do the clients send?
How many clients?
I think it could be done more easily, but that depends on the general conditions of the project.
For example, you could also run the Telegraf agent on the Raspberry Pi, which pushes the data directly to InfluxDB. Then you would save yourself all the AWS stuff…
Or use a client library or the influx CLI on the Raspberry Pi to push directly to InfluxDB.
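For instance, a direct write from the Pi with influxdb-client-python is only a few lines - a minimal sketch, where the URL, token, org and bucket are placeholders:

```python
# Minimal sketch: push a reading straight from the Pi to InfluxDB,
# no AWS in between. URL, token, org and bucket are placeholders.
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

with InfluxDBClient(url="http://localhost:8086",
                    token="MY_TOKEN", org="my-org") as client:
    client.write_api(write_options=SYNCHRONOUS).write(
        bucket="sensors",
        record="environment,host=rpi-zero temperature=21.5",  # line protocol
    )
```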