Is InfluxDB a good fit for a big data analytics use case?

I want to know whether the use case below is a good fit for InfluxDB.

Requirements:

  1. We generate 200+ GB of data daily. We need to produce rollups of the daily cubes.
  2. Rollups can be either Standard or Custom:
    Standard - Weekly, Monthly, Quarterly, Yearly, Week to Date, Month to Date, Quarter to Date, Year to Date, Last 90 Days. (We can add more as clients require.)
    Custom - any user-defined range, e.g., February to November.
  3. Read operations must be extremely fast.
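To make the standard windows concrete, here is a minimal Python sketch of how each to-date window and the last-90-days window could be computed, and how daily cube values could be summed over any range, a custom February-to-November range included. The names `window_start` and `rollup` are hypothetical, and the sketch assumes Monday-start weeks, calendar quarters, and SUM as the rollup aggregate; none of this is taken from InfluxDB itself.

```python
from datetime import date, timedelta
from typing import Dict


def window_start(kind: str, today: date) -> date:
    """First day (inclusive) of a standard window ending on `today` (hypothetical helper).

    Assumes Monday-start weeks and calendar quarters.
    """
    if kind == "WTD":
        return today - timedelta(days=today.weekday())  # back to Monday
    if kind == "MTD":
        return today.replace(day=1)
    if kind == "QTD":
        first_month = 3 * ((today.month - 1) // 3) + 1  # Jan, Apr, Jul or Oct
        return today.replace(month=first_month, day=1)
    if kind == "YTD":
        return today.replace(month=1, day=1)
    if kind == "LAST_90":
        return today - timedelta(days=89)  # 90 days counting today
    raise ValueError(f"unknown window: {kind}")


def rollup(daily: Dict[date, float], start: date, end: date) -> float:
    """Sum daily cube values over [start, end]; works for custom ranges too.

    Assumes SUM is the aggregate; other measures may need AVG, MAX, etc.
    """
    return sum(v for d, v in daily.items() if start <= d <= end)


if __name__ == "__main__":
    today = date(2017, 11, 15)  # a Wednesday
    daily = {today - timedelta(days=i): 1.0 for i in range(365)}  # dummy values
    for kind in ("WTD", "MTD", "QTD", "YTD", "LAST_90"):
        start = window_start(kind, today)
        print(kind, start, rollup(daily, start, today))
    # Custom range, e.g. February through the latest available day in November:
    print("Feb-Nov", rollup(daily, date(2017, 2, 1), today))
```

Weekly, monthly, quarterly and yearly rollups are the same `rollup` call over closed calendar periods; only the window boundaries differ.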

Current technical stack:
We are using Snowflake (previously, we used Redshift).

Questions:

  1. Is the above use case a good fit for InfluxDB?
  2. Are there any bottlenecks if we store TBs of data? (Is it horizontally scalable?)

Thanks,
Hitesh

Hitesh,

Based on what you’ve said, your analytics run on time series data, which is what the TICK stack is designed for. InfluxDB itself has rollup capabilities, but I would point you to Kapacitor for more sophisticated rollups and true analytics.

InfluxEnterprise and InfluxCloud are our clustered offerings; they provide horizontal scalability and high availability. TBs of data will most likely take up less space in InfluxDB, because the data is compacted.

@Hitesh_Jhamb You can do this with InfluxDB! When you say 200+ GB of data, how many data points does that represent? We achieve excellent compression; you should see that number drop to ~1-2 GB on disk per day. As for rollups, there is a native way to do this in the database. And as @Sam said, there is a clustered version as well.

Hope that helps!

The data points number in the hundreds of billions.

@jackzampolin, @Sam: Currently all of my data is in AWS Redshift and Snowflake (https://www.snowflake.net/).

Is there any way to migrate all of the data from these locations to InfluxDB/InfluxCloud?

Each table is hundreds of GBs in size.

@Hitesh_Jhamb We have a wide selection of client libraries that should help you complete this.

@jackzampolin I went through the Java API, but I didn’t find any API or command to import a table (data points) from an external source (S3/Redshift/Snowflake).

@Hitesh_Jhamb I’m pretty sure there are Java SDKs for each of those technologies as well. You would need to write an ETL job to move the data into InfluxDB.

@jackzampolin Thanks for all your help.

@jackzampolin Here is a similar example: Redirecting to … (copying from AWS S3)

When can I expect a similar feature in InfluxDB?

@Hitesh_Jhamb I’m not sure I would call that example similar. It is about copying into a completely different DBMS with a different storage model. InfluxDB is a specialized time series database with its own storage engine and data model. You could copy data from S3 very quickly if it were already in our import format, but you would first need to convert your data into that format.

The best way to do that is by using the client libraries.
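If the “import format” here means the file format read by the InfluxDB 1.x CLI (`influx -import`), converted data could be written as such a file: a `# DDL` section with statements such as `CREATE DATABASE`, then a `# DML` section whose `# CONTEXT-DATABASE:` header tells the CLI which database the line-protocol lines below it belong to. This is a minimal sketch under that assumption; the function name `write_import_file` and the database name `cubes` are hypothetical, and the input lines are assumed to already be valid line protocol.

```python
import io
from typing import Iterable, TextIO


def write_import_file(out: TextIO, database: str, lines: Iterable[str]) -> int:
    """Write an `influx -import` style file; returns the number of points written.

    Assumes the InfluxDB 1.x CLI import format and line-protocol input lines.
    """
    # DDL section: statements run before any data is written.
    out.write("# DDL\n")
    out.write(f'CREATE DATABASE "{database}"\n\n')
    # DML section: the context header routes the lines below to `database`.
    out.write("# DML\n")
    out.write(f"# CONTEXT-DATABASE: {database}\n\n")
    count = 0
    for line in lines:
        out.write(line + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()  # in practice, e.g. open("cubes.txt", "w")
    n = write_import_file(buf, "cubes", ["daily_cube,region=us clicks=42i 1510704000000000000"])
    print(n)
    print(buf.getvalue())
```

A file produced this way could then be loaded with something like `influx -import -path=cubes.txt -precision=ns`, where `-precision` has to match the timestamps in the file.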

@Hitesh_Jhamb, our current implementation ingests about 750k lines/sec (each line carrying ~10-20 points) on a standalone OSS instance (i.e., the free version). We also use Druid as a separate data source for histogram-like information, with Grafana as the central visualization tool.

If you’re concerned about throughput, I can attest to the performance of the OSS instance and hopefully put you at ease. Happy to provide more detail if you’re curious.
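To put the 750k lines/sec figure in the same units as the original question, here is the back-of-envelope arithmetic as a small Python snippet. It assumes the rate is sustained for a full 24 hours, which the post does not claim, so treat the result as an upper bound for that setup rather than a measured daily total; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def points_per_day(lines_per_sec: int, points_per_line: int) -> int:
    """Points ingested per day, assuming the rate is sustained around the clock."""
    return lines_per_sec * SECONDS_PER_DAY * points_per_line


if __name__ == "__main__":
    for ppl in (10, 20):  # the ~10-20 points per line quoted above
        print(f"{ppl} points/line -> {points_per_day(750_000, ppl):,} points/day")
```

At 10-20 points per line this works out to roughly 648 billion to 1.3 trillion points per day, the same order of magnitude as the hundreds of billions of points mentioned earlier in the thread, if that figure is a daily one.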
