Build a data lake with InfluxDB (using Telegraf, Chronograf ...)

#1

Hi
I want to build a data lake based on InfluxDB functionality. Do you have any case studies?

Thanks

#2

Hi @garba_moussa,

I’m not sure what you’re asking. A data lake generally involves storing data in their native formats. A data lake might store “structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video)”. I assume you are not trying to build a data lake using InfluxDB as the backing store.

Are you looking for ways to integrate InfluxDB with an existing data lake? For example, pulling data from the data lake to add to a measurement? One way of doing this would be using Kapacitor’s sideload node to load in data from external sources, but it’s hard to make specific recommendations without better understanding what you’re trying to do.

#3

Thanks for the answers.
I have 4 different databases containing time series and logs. My goal is to use a collector to put all of that data into one data lake, using InfluxDB for storage.

After building the data lake, I want to use Kapacitor for machine learning (anomaly detection).
So the first step is to build the data lake.
Thanks

#4

Logs are a type of time series, as they have a timestamp and a value (the log line). In the case of a log, the data type is a string, or a set of fields, compared to a metric where the data is usually a number; but it is still a time series.

Since all of the data is the same format, time series, you don’t need a data lake, you just need a time series database.
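
To make that concrete, here is a minimal sketch of how a metric and a log line can both be written as points in InfluxDB line protocol. The measurement, tag, and field names (`cpu`, `app_log`, `host`, `level`, `message`) and the helper functions are hypothetical, chosen only for illustration, and the escaping is simplified.

```python
def escape_key(text: str) -> str:
    """Backslash-escape commas, equals signs and spaces in tag/field keys and tag values."""
    for char in (",", "=", " "):
        text = text.replace(char, "\\" + char)
    return text


def format_field(value) -> str:
    """Render a field value: booleans, integers (with 'i' suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_ns: int) -> str:
    """Build one point: measurement,tags fields timestamp (hypothetical helper)."""
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


# A metric: the value is a number.
metric_line = to_line_protocol(
    "cpu", {"host": "web01"}, {"usage_idle": 87.5}, 1546300800000000000
)

# A log entry: the value is a string, but the shape of the point is the same.
log_line = to_line_protocol(
    "app_log",
    {"host": "web01", "level": "error"},
    {"message": 'disk "sda" full'},
    1546300801000000000,
)

print(metric_line)
print(log_line)
```

Both records end up as a timestamp plus tags and fields; the only difference is the field type.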

Telegraf has both logparser and syslog inputs, so you can easily feed logs into InfluxDB, and has even more inputs for collecting metrics from a variety of sources and applications.

If your goal is to use Kapacitor for anomaly detection, you’ll probably want to use some form of structured logging, where your logs are output in a machine-readable format. The syslog protocol (RFC) includes a specification for structured data that you can use in conjunction with the syslog input plugin.
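
As a rough illustration of what "machine-readable" means here, the sketch below parses a syslog structured-data string in the RFC 5424 style into a nested dictionary. This is not how the Telegraf plugin is implemented; the function name `parse_structured_data` is hypothetical, and the parser is deliberately lenient (it does not enforce every rule of the specification).

```python
import re

# One SD-ELEMENT: [SD-ID name="value" name="value" ...]
SD_ELEMENT = re.compile(
    r'\[([^\s=\]"]+)((?:\s+[^\s=\]"]+="(?:\\.|[^"\\\]])*")*)\]'
)
# One SD-PARAM inside an element: name="value"
SD_PARAM = re.compile(r'([^\s=\]"]+)="((?:\\.|[^"\\\]])*)"')


def parse_structured_data(sd: str) -> dict:
    """Map each SD-ID to a dict of its parameters (hypothetical helper).

    A nil value ("-") contains no elements and yields an empty dict.
    """
    result = {}
    for element in SD_ELEMENT.finditer(sd):
        sd_id, params = element.group(1), element.group(2)
        result[sd_id] = {
            # Undo the backslash escaping of '"', '\' and ']' in values.
            name: re.sub(r'\\(["\\\]])', r"\1", value)
            for name, value in SD_PARAM.findall(params)
        }
    return result


example = '[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]'
parsed = parse_structured_data(example)
print(parsed)
```

Once the log fields are available as key/value pairs like this, they can be stored as tags and fields and queried or alerted on in the same way as metrics.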

#5

Hello @noahcrowley, thanks for your help; it has been very useful for my internship. Now I want to run a UDF on Kapacitor, specifically an ARIMA model implemented as a Kapacitor UDF. Many posts suggest that nobody has done this yet, and I would like some help with it.

Thanks