InfluxDB and Kapacitor: An Enhanced Data Model and Functional Query Language

Originally published at: InfluxDB and Kapacitor: An Enhanced Data Model and Functional Query Language | InfluxData

The current InfluxDB data model of measurements, tags, and fields came out of a major change we made near the end of 2014. The query language has largely been the same since the initial introduction of the project in November of 2013. It looks kind of like SQL, but we have a few bits of syntactic sugar to make some things easier. Based on what we’ve learned from new and advanced users working with the system over the last three years, we’re going to make some big additions in the coming months. In short, we’re going to simplify the data model and release a new query language, functional in nature, that can be used in both InfluxDB and Kapacitor. These changes will be additive in a future InfluxDB 1.x and Kapacitor release, and it’s important to note that we will continue to support the SQL-style InfluxQL. Read on for a bit of the history and thinking behind these new approaches. Or jump to the pull request to review some initial thoughts and ideas on what the new query language looks like.

funcA(
    argOne: "some thing",
    argTwo: ['foo', 'bar'])
.funcB(
    arg: /some regex/)
.funcC(
    arg: `here "is a" string`)

The most controversial part of this proposal is that the new query language doesn’t look like SQL. It looks more like a JavaScript-style function chaining syntax like you’d see in jQuery or D3. After working with users and time series data for the last 4 years, I’m firmly convinced that a functional style makes more sense for the problem space of time series. In fact, in the fall of 2014 when I was thinking about changing the data model to measurements, tags, and fields, I was also considering changing the query language to something functional. The feedback I received from the community was basically split in half. Half of our users loved the idea of a functional language and the other half insisted that the SQL style was what made the project great.
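To make the idea of function chaining concrete, here is a minimal Python sketch. It is not the proposed query language and not an InfluxDB or Kapacitor API: the `Query` class, its `select`, `where`, and `window_mean` methods, and the sample points are all hypothetical names and data made up for illustration. It only shows how each step takes the output of the previous one and hands a new result to the next.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Point:
    measurement: str
    tags: Dict[str, str]
    time: int      # seconds since some epoch
    value: float


class Query:
    """Hypothetical chainable query over in-memory points (illustration only)."""

    def __init__(self, points: Iterable[Point]):
        self._points = list(points)

    def select(self, measurement: str) -> "Query":
        # Keep only points from one measurement; return a new Query so calls chain.
        return Query(p for p in self._points if p.measurement == measurement)

    def where(self, predicate: Callable[[Point], bool]) -> "Query":
        # Keep only points for which the predicate is true.
        return Query(p for p in self._points if predicate(p))

    def window_mean(self, every: int) -> List[Tuple[int, float]]:
        # Group points into fixed windows of `every` seconds and average each window.
        buckets: Dict[int, List[float]] = {}
        for p in self._points:
            start = p.time - p.time % every
            buckets.setdefault(start, []).append(p.value)
        return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Made-up sample data.
POINTS = [
    Point("cpu", {"host": "a"}, 0, 10.0),
    Point("cpu", {"host": "a"}, 30, 20.0),
    Point("cpu", {"host": "b"}, 45, 90.0),
    Point("cpu", {"host": "a"}, 60, 40.0),
    Point("mem", {"host": "a"}, 10, 55.0),
]

result = (
    Query(POINTS)
    .select(measurement="cpu")
    .where(lambda p: p.tags["host"] == "a")
    .window_mean(every=60)
)
print(result)  # [(0, 15.0), (60, 40.0)]
```

Each call reads top to bottom in the order the data flows, which is the property the chaining style is meant to offer for time series work.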

However, I think that many users don’t end up writing queries. Or if they do, they’re doing it from the CLI for some basic tasks. Most users are working in Grafana or Chronograf to build queries for their dashboards. I think with the right language and user interface, the functional-style query interface will give us more power and flexibility while still keeping InfluxDB easy to use. Ease of use is always our highest priority (even though we don’t always achieve it). I take inspiration from things like R’s Tidyverse and Python’s Pandas. They have clear concepts and are very powerful for working with data. Over time I’d like to have many analytical functions implemented in this new query language.

Our goal is to roll this out in the InfluxDB 1.x and Kapacitor 1.x line as new endpoints. Users will be able to use both the old and new query languages at the same time. The initial prototype implementations that we roll into releases will not be locked down in terms of their API or the query language. We’d like to work with people in the community to ensure that things make sense for building their applications and user interfaces with this new query language. Once we’ve had a little time to iterate on it and confirm that it’s a real win for the community, we’ll lock down the API and roll it out as an officially supported release.

We’ll also be implementing a CLI with autocomplete functionality and a UI that helps users build queries. Ideally, new users won’t have to write queries at all. They’ll simply point their web browser to Chronograf and will be able to explore and visualize their data by clicking. For users that want to write queries, we’ll try to make things easier with autocompletion, help, and detailed documentation and examples.

We’re very excited about this new direction for the platform. Our focus is not just on the database, but on the platform as a whole for storing, querying, monitoring, processing, and building applications around time series data. We’d love your feedback on the InfluxQL 2.0 pull request.


I’ll add my 2 cents. You are correct that most of my queries are for creating dashboards in Grafana, but I do not use the picker. I generally craft the query in Chronograf and take portions of it into Grafana, or I reuse known good queries already in Grafana by capturing the dashboard JSON. However, Kapacitor uses a more functional language that I sometimes have difficulty replicating in a Grafana dashboard using the SQL-style queries. The idea there is that I want a dashboard that shows me when Kapacitor is likely to throw a warning or critical alert.

One concern would be ensuring that Grafana can still operate with InfluxDB without having to go into each dashboard and change the queries. Defining these dashboards is a stepwise refinement over time and would not be something I would want to completely redo. I have stayed on top of each Chronograf release and typically install each release as soon as it comes out, but I still don’t see a product there that I would use every day. The latest release brought back the host lists but really slowed down my dashboards to the point that they are unusable. I still like running both Grafana and Chronograf side by side for comparison purposes.

I would say that the query language should be the same for Kapacitor and InfluxDB.
