Hello,
I am going to show the max selector time attribute and follow Select a tag key when the tag and field conflict · Issue #6519 · influxdata/influxdb · GitHub to get the syntax and run it in influxDB v1.2.0. However it throws exception:
> select max(water_level), max(water_level)::time from h2o_feet where time > 1439856000000000000 and time < 1439859240000000000
ERR: error parsing query: found ::, expected FROM at line 1, char 42
I notice the issue is closed in 1.0.0 version, so InfluxDB v1.2.0 contains such feature? If yes, what InfluxQL should I write to get the time attribute? Thanks!
References:
opened 04:16PM - 07 Mar 16 UTC
area/queries
kind/feature-request
flux/triaged
1.x
### Background
InfluxQL includes, at least, two different types of function cal… ls: aggregates and selectors. `mean()`, `median()`, and `percentile()` are examples of aggregates because they take an interval and create a new point value that represents the interval. `max()`, `min()`, `first()`, and `last()` are examples of selectors that have a relevant time field that may be useful to those querying for those points.
### What was done previously (<= 0.10.x)
If there was no `GROUP BY time(...)` in the select, the time for selectors would equal the column that was chosen. If multiple selectors were used, the time would be zero. If there was a `GROUP BY time(...)` interval, the time was always supposed to be the time associated with the start time of the interval, but I wasn't able to confirm that with 0.10.2.
### What will be done in 0.11
In 0.11, selectors and aggregators have been normalized to return zero for the time no matter what. This creates greater parity with the aggregate version and removes the weird behavior where two selectors would return a zero time. The time column will always be the start time of the interval or zero if there is no interval.
This is better because the time column always plays a consistent role and doesn't implicitly change what it means depending on the query. Unfortunately, it means that getting the time of the first point in an interval is no longer possible with any query.
### Proposal
We need to make a modification to the syntax so a person can request the time of a selector. One proposal is to have syntax like this:
```
SELECT max(value) AS max, max(value).time AS max_time FROM cpu
```
This syntax is clear and describes exactly what the query would do. I think that this syntax may be harder to implement than we like though. The `max(value)` call will make an iterator that includes the time in it, but with this method we either have to make two iterators for the same selector or try to match these two together in the query. Matching the queries can end up becoming even more difficult since the two are not guaranteed to be next to each other because they would be acting as separate fields in the selector. The benefit of this is it would be possible to only select the time rather than being forced to select both the value and the time by doing something like `SELECT max(value).time FROM cpu`.
An alternative syntax could also be:
```
SELECT max(value) AS max WITH time AS max_time FROM cpu`
```
This syntax reuses the `WITH` keyword to declare that the aggregate should include the time as an extra column. A slight negative of this approach is it would no longer be possible to get the number of columns that will be output from looking at the length of the fields slice in the select statement, but that should be a negligible problem. It also isn't possible with this proposal to omit the value itself without some additional syntax.
/cc @pauldix for comment.
opened 08:51PM - 29 Apr 16 UTC
closed 04:30PM - 16 May 16 UTC
area/queries
area/influxql
### Problem
This is a single issue to discuss #4630.
When points are written t… o the same measurement where a tag key and a field key conflict with each other, it is impossible to query the tag key with InfluxQL. The following examples (rendered to different measurements to demonstrate these are different examples) leaves the tag not queryable.
#### Example 1
```
example1,host=server01 value=2
example1 host='server02',value=3
```
#### Example 2
```
example2,host=server01 host='server02',value=2
```
The following queries become ambiguous and cannot be resolved correctly, so they default to selecting the field rather than the tag.
```
SELECT host, value FROM example1
SELECT host, value FROM example2
```
### Background
There is a bit of a debate on exactly what a tag is. Before talking about what a tag is, here is a list of exactly what capabilities a tag has:
- They are indexed.
- They are always strings.
- Points that have the same tag keys/values make up a series.
- They are associated with the series, not the individual point.
Since a tag is associated with the series and not the individual point, tags do _not_ have a time associated with them. This makes it impossible to query for only a tag in a measurement since a select works by finding the time associated with a field and returning the value at that time. Since tags are not associated with points, they also do not have a time and cannot be iterated over. To query a tag with a select, they are wholly dependent on being queried with a field in the same query.
There is also no way to differentiate between a field and a tag in InfluxQL at the moment. The search path for how to treat a variable reference is:
- Check if the measurement (not series) has a field with the value of the key.
- Check if the series has a tag with the value of the key.
- Pretend the cursor is a tag with a blank value.
Because of the first step, it makes it impossible to reference a tag once any series in a measurement writes uses a key as a field even if there was previous data with that tag.
### Possible Solutions
#### Solution 1
Change the prioritization so it will read a tag over a field (if present) and restrict points from being written with a conflicting field and tag key as described and implemented in #6410.
This solution makes it so example 1 is possible and works correctly when querying so that the first series will return the value of the tag and the second will return the value in the field. It makes example 2 into an error so it becomes impossible to get into this situation to begin with.
Pros:
- Tags become equivalent to indexed fields that are always strings.
- No additional complexity is added to the query language or query engine beyond a small fix to how it creates cursors.
- Makes `SELECT value, host FROM cpu GROUP BY host` use a consistent value (currently it will use the tag for the `GROUP BY` and the field for the actual selection).
Cons:
- Tags only sometimes act as fields. As described earlier, you cannot query a tag by itself. If you do `SELECT host FROM example1` you will get a weird result where it will only return the second point.
- Old data written where the field and the tag were written as part of the same series will still be unreadable, but you won't be able to write any new data.
- Breaking change.
#### Solution 2
Introduce new syntax to specify whether you meant to grab a tag or a field to InfluxQL as described and implemented in #6509 or as proposed in #4823.
This solution makes it so both examples are queryable and doesn't require any change to existing data or the point writer, but does require users to use new syntax to reference the correct value.
Pros:
- Backwards compatible.
- Keeps tags and fields as completely separate entities rather than tags being a special type of field.
- May allow us to detect when someone tries to select only a tag as we will be able to notify the user that they must select at least one field. This benefit is only if this new syntax is made required to access tags (not backwards compatible anymore) or if somebody actually uses it.
Cons:
- The implementation is more complex and InfluxQL will have to be modified.
- To maintain backwards compatibility, you still have to allow tags to be referenced with the same variable reference as fields. If you write `host` as a tag and then write it once as a field to one series in the measurement, all of your previous queries will break. It is much more optimal to require that the new syntax is always used for tags, but then we're not backwards compatible anymore.
##### Proposed Syntax
###### Solution 1
Prepend the variable reference with `tag` or `field`.
```
SELECT tag.host, field.value FROM cpu
```
The syntax for this looks pretty ugly in my personal opinion and I dislike that it adds more reserved keywords to a language that already has a lot of them. It becomes impossible to have a key with these names which makes this a backwards incompatible change. We then also might have to add code that strips the front of the variable reference from the ident as the query engine passes around auxiliary fields as a raw string. That means the identifier cannot just be stripped at parse time since we need to send it to the underlying engine which doesn't accept the AST structs as arguments. We can modify the iterator options struct to include references for the auxiliary fields, but this changes the wire protocol for RPC and might make things more complicated with protobuf.
###### Solution 2
```
SELECT @host, value FROM cpu
```
Prepend the input with a special sigil to signify that a tag is being referenced. The syntax for this is common in programming languages like Perl and Ruby. Some programmers may be offput by the syntax and so it might not be desirable (Perl and Ruby's use of sigils is a holy war among programmers). The positive benefit is that it is one character (very easy to check and strip efficiently) and so it is easier to pass around as a string without involving complex AST structures. This is also a backwards compatible change since the `@` symbol is not used like this anywhere else.
What sigil is used specifically is debatable. I would like to keep `$` reserved for future InfluxQL support for a Template node in the AST since I think this will make some things, like Chronograf or Influx Stress, easier to implement using the influxql package. I also avoided `#` since that's a comment character even though I was very tempted to use it because tag and hashtag both reference tags (even if its real name is the pound sign).
###### Solution 3
```
SELECT host::tag, value::string FROM cpu
```
This syntax alleviates one of the problems from solution 1 because it is less likely to conflict with a real measurement name, but it still may be difficult to strip off the end in an efficient manner when passing it through auxiliary fields. Likely, the best solution would be to change the function signature of IteratorOptions and just accept the wire protocol will change. The syntax is supposed to correspond to the casting syntax from PostreSQL, but we're not technically doing a cast, but a selection. While true, we can extend this to also allow specifying the field type of a field. If a field type is given, that will be considered a cast for a field and will override the default field type chosen. Tag would be treated as a separate type.