Config Change for InfluxDB 1.3.0 - Comment/Discuss

#1

We are considering changing one of the default configuration options starting with InfluxDB 1.3.0. Essentially, the current behavior is that we enable the storage of database statistics into _internal. However, we see that many customers deploy into production with this still enabled. Our recommendation is to turn off the internal storage of these statistics for your production system.

Therefore, the proposed change is to ship with store-enabled = false as the default. In pre-production environments, you would need to turn it on explicitly.

Example:
Current

[monitor]
  # Whether to record statistics internally.
  # store-enabled = true

Proposed

[monitor]
  # Whether to record statistics internally.
  # store-enabled = false

Looking for feedback on whether folks think this causes issues/concerns with our community. Please share your thoughts.

#2

Could you clarify the reasoning for this recommendation? Does store-enabled = true result in a noticeable performance hit on a production system? What are the alternatives for database health/performance monitoring in production, then?

Disk space usage by _internal statistics is small, and the monitor RP can be adjusted as needed, right?

#3

The alternative to having the database capture and store information about itself is described here:
https://www.influxdata.com/influxdb-debugvars-endpoint/

The recommendation is to set up an open source instance of InfluxDB along with Telegraf to monitor InfluxDB Enterprise edition. Capturing system stats along with /debug/vars (as outlined in the blog post above) should be sufficient, and it eliminates any monitoring overhead (CPU, storage, memory, etc.) on the production instance.

There are situations where your Enterprise Edition may reach limits in terms of system resources – if you are also attempting to access that same instance for monitoring purposes to triage what is causing the issue (and when…), you simply will be unable to do so. This is not advisable for a production setup.
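
For anyone not running Telegraf, here is a rough Python sketch of what an external poller for /debug/vars could look like. It assumes the endpoint returns a JSON object in which each statistic is an entry with "name", "tags" and "values" keys, and that other top-level entries (such as "cmdline" or "memstats") can be skipped. The function names and the example host are made up for illustration, so treat this as a starting point rather than a reference implementation.

```python
import json
from urllib.request import urlopen


def fetch_debug_vars(base_url, timeout=5.0):
    """Fetch and decode the /debug/vars JSON document from an InfluxDB node."""
    with urlopen(f"{base_url.rstrip('/')}/debug/vars", timeout=timeout) as resp:
        return json.load(resp)


def extract_stats(payload):
    """Return (name, tags, values) for every entry shaped like a statistic.

    Assumption: a statistic is a JSON object with a string "name" and an
    object "values". Anything else at the top level is ignored.
    """
    stats = []
    for entry in payload.values():
        if not isinstance(entry, dict):
            continue
        name = entry.get("name")
        values = entry.get("values")
        if isinstance(name, str) and isinstance(values, dict):
            stats.append((name, entry.get("tags") or {}, values))
    return stats


if __name__ == "__main__":
    # Hypothetical host name; point this at the node you want to watch.
    payload = fetch_debug_vars("http://influxdb.example.internal:8086")
    for name, tags, values in extract_stats(payload):
        print(name, tags, values)
```

Run it from a machine other than the production node and write the results to your separate monitoring instance, so the monitoring data is still reachable when production is struggling.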

#4

It’s also best practice to monitor your production infrastructure with systems outside of it. Otherwise it’s like running MySQL and using a monitoring system that stores its data in that same MySQL instance. The time you most need your monitoring is when your production instance isn’t working properly. People get caught up on the idea that InfluxDB is the monitoring stack, so it shouldn’t need a separate monitor. That’s not true: you always need to think about who’s watching the watchers.

#5

We’re actually doing both at the moment: collecting _internal and using the inputs.influxdb Telegraf plugin. To be honest I haven’t done an analysis, but if the data is indeed the same I think it’s fine to drop _internal for prod environments.

The only possible angle I could see is if someone isn’t using Telegraf at all (shocker, I know!). They would need to enable the feature to get any sense of local measurements, but then they might have a /debug/vars scraper of their own anyway.

Appreciate you all bringing this up again. I remember we had the impact discussion when we dropped the _internal interval to 1s way back when, and it just seems to have fallen onto the back burner.

#6

Yep, the data is the same. If _internal is on it just writes those periodically to that DB.

#7

I think there is a risk that disabling this by default will lead to people running the OSS not having enough stats to debug common issues.

Maybe we state in the _internal documentation that it should be disabled in production?

#8

I’m only contemplating that change for the Enterprise Edition.

#9

I wasn’t aware of the recommendation to turn it off for production, and since we will be going to Enterprise in Q3, this is worth knowing about. (5 nodes, three in prod, 2 in dev).

Currently we have OSS deployed in two environments (prod & dev), but dev is a misnomer, since we require it to be as performant as the prod environment.

Standing up a small OSS influx instance to provide external monitoring is an easy thing to do, given the scale of our infrastructure.

Is @sebito91 correct? Are the inputs.influxdb and _internal stats the same?

Perhaps a set of best practices or other advice based on existing data from OSS _internal stats would be helpful.

#10

@bayendor There’s actually a bit more data in the Telegraf plugin. The full schema for each is reproduced below.

[[inputs.influxdb]]:

> show measurements
influxdb
influxdb_cq
influxdb_database
influxdb_httpd
influxdb_memstats
influxdb_queryExecutor
influxdb_runtime
influxdb_shard
influxdb_subscriber
influxdb_tsm1_cache
influxdb_tsm1_engine
influxdb_tsm1_filestore
influxdb_tsm1_wal
influxdb_write

> show field keys
name: influxdb
fieldKey	fieldType
--------	---------
n_shards	integer

name: influxdb_cq
fieldKey	fieldType
--------	---------
queryFail	float
queryOk		float

name: influxdb_database
fieldKey	fieldType
--------	---------
numMeasurements	float
numSeries	float

name: influxdb_httpd
fieldKey		fieldType
--------		---------
authFail		float
clientError		float
pingReq			float
pointsWrittenDropped	float
pointsWrittenFail	float
pointsWrittenOK		float
queryReq		float
queryReqDurationNs	float
queryRespBytes		float
req			float
reqActive		float
reqDurationNs		float
serverError		float
statusReq		float
writeReq		float
writeReqActive		float
writeReqBytes		float
writeReqDurationNs	float

name: influxdb_memstats
fieldKey	fieldType
--------	---------
alloc		integer
buck_hash_sys	integer
frees		integer
gc_sys		integer
gcc_pu_fraction	float
heap_alloc	integer
heap_idle	integer
heap_inuse	integer
heap_objects	integer
heap_released	integer
heap_sys	integer
last_gc		integer
lookups		integer
mallocs		integer
mcache_inuse	integer
mcache_sys	integer
mspan_inuse	integer
mspan_sys	integer
next_gc		integer
num_gc		integer
other_sys	integer
pause_total_ns	integer
stack_inuse	integer
stack_sys	integer
sys		integer
total_alloc	integer
pause_ns	integer

name: influxdb_queryExecutor
fieldKey	fieldType
--------	---------
queriesActive	float
queriesExecuted	float
queriesFinished	float
queryDurationNs	float

name: influxdb_runtime
fieldKey	fieldType
--------	---------
Alloc		float
Frees		float
HeapAlloc	float
HeapIdle	float
HeapInUse	float
HeapObjects	float
HeapReleased	float
HeapSys		float
Lookups		float
Mallocs		float
NumGC		float
NumGoroutine	float
PauseTotalNs	float
Sys		float
TotalAlloc	float

name: influxdb_shard
fieldKey		fieldType
--------		---------
diskBytes		float
fieldsCreate		float
seriesCreate		float
writeBytes		float
writePointsDropped	float
writePointsErr		float
writePointsOk		float
writeReq		float
writeReqErr		float
writeReqOk		float

name: influxdb_subscriber
fieldKey	fieldType
--------	---------
createFailures	float
pointsWritten	float
writeFailures	float

name: influxdb_tsm1_cache
fieldKey		fieldType
--------		---------
WALCompactionTimeMs	float
cacheAgeMs		float
cachedBytes		float
diskBytes		float
memBytes		float
snapshotCount		float
writeDropped		float
writeErr		float
writeOk			float

name: influxdb_tsm1_engine
fieldKey			fieldType
--------			---------
cacheCompactionDuration		float
cacheCompactionErr		float
cacheCompactions		float
cacheCompactionsActive		float
tsmFullCompactionDuration	float
tsmFullCompactionErr		float
tsmFullCompactions		float
tsmFullCompactionsActive	float
tsmLevel1CompactionDuration	float
tsmLevel1CompactionErr		float
tsmLevel1Compactions		float
tsmLevel1CompactionsActive	float
tsmLevel2CompactionDuration	float
tsmLevel2CompactionErr		float
tsmLevel2Compactions		float
tsmLevel2CompactionsActive	float
tsmLevel3CompactionDuration	float
tsmLevel3CompactionErr		float
tsmLevel3Compactions		float
tsmLevel3CompactionsActive	float
tsmOptimizeCompactionDuration	float
tsmOptimizeCompactionErr	float
tsmOptimizeCompactions		float
tsmOptimizeCompactionsActive	float

name: influxdb_tsm1_filestore
fieldKey	fieldType
--------	---------
diskBytes	float
numFiles	float

name: influxdb_tsm1_wal
fieldKey		fieldType
--------		---------
currentSegmentDiskBytes	float
oldSegmentsDiskBytes	float
writeErr		float
writeOk			float

name: influxdb_write
fieldKey	fieldType
--------	---------
pointReq	float
pointReqLocal	float
req		float
subWriteDrop	float
subWriteOk	float
writeDrop	float
writeError	float
writeOk		float
writeTimeout	float

> show tag keys
name: influxdb
tagKey
------
host

name: influxdb_cq
tagKey
------
host
url

name: influxdb_database
tagKey
------
database
host
url

name: influxdb_httpd
tagKey
------
bind
host
url

name: influxdb_memstats
tagKey
------
host
url

name: influxdb_queryExecutor
tagKey
------
host
url

name: influxdb_runtime
tagKey
------
host
url

name: influxdb_shard
tagKey
------
database
engine
host
id
path
retentionPolicy
url
walPath

name: influxdb_subscriber
tagKey
------
database
destination
host
mode
name
retention_policy
url

name: influxdb_tsm1_cache
tagKey
------
database
engine
host
id
path
retentionPolicy
url
walPath

name: influxdb_tsm1_engine
tagKey
------
database
engine
host
id
path
retentionPolicy
url
walPath

name: influxdb_tsm1_filestore
tagKey
------
database
engine
host
id
path
retentionPolicy
url
walPath

name: influxdb_tsm1_wal
tagKey
------
database
engine
host
id
path
retentionPolicy
url
walPath

#11

@bayendor And here is the schema for _internal:

> show measurements
name: measurements
name
----
cq
database
httpd
queryExecutor
runtime
shard
subscriber
tsm1_cache
tsm1_engine
tsm1_filestore
tsm1_wal
write

> show field keys
name: cq
fieldKey	fieldType
--------	---------
queryFail	integer
queryOk		integer

name: database
fieldKey	fieldType
--------	---------
numMeasurements	integer
numSeries	integer

name: httpd
fieldKey		fieldType
--------		---------
authFail		integer
clientError		integer
pingReq			integer
pointsWrittenDropped	integer
pointsWrittenFail	integer
pointsWrittenOK		integer
queryReq		integer
queryReqDurationNs	integer
queryRespBytes		integer
req			integer
reqActive		integer
reqDurationNs		integer
serverError		integer
statusReq		integer
writeReq		integer
writeReqActive		integer
writeReqBytes		integer
writeReqDurationNs	integer

name: queryExecutor
fieldKey	fieldType
--------	---------
queriesActive	integer
queriesExecuted	integer
queriesFinished	integer
queryDurationNs	integer

name: runtime
fieldKey	fieldType
--------	---------
Alloc		integer
Frees		integer
HeapAlloc	integer
HeapIdle	integer
HeapInUse	integer
HeapObjects	integer
HeapReleased	integer
HeapSys		integer
Lookups		integer
Mallocs		integer
NumGC		integer
NumGoroutine	integer
PauseTotalNs	integer
Sys		integer
TotalAlloc	integer

name: shard
fieldKey		fieldType
--------		---------
diskBytes		integer
fieldsCreate		integer
seriesCreate		integer
writeBytes		integer
writePointsDropped	integer
writePointsErr		integer
writePointsOk		integer
writeReq		integer
writeReqErr		integer
writeReqOk		integer

name: subscriber
fieldKey	fieldType
--------	---------
createFailures	integer
pointsWritten	integer
writeFailures	integer

name: tsm1_cache
fieldKey		fieldType
--------		---------
WALCompactionTimeMs	integer
cacheAgeMs		integer
cachedBytes		integer
diskBytes		integer
memBytes		integer
snapshotCount		integer
writeDropped		integer
writeErr		integer
writeOk			integer

name: tsm1_engine
fieldKey			fieldType
--------			---------
cacheCompactionDuration		integer
cacheCompactionErr		integer
cacheCompactions		integer
cacheCompactionsActive		integer
tsmFullCompactionDuration	integer
tsmFullCompactionErr		integer
tsmFullCompactions		integer
tsmFullCompactionsActive	integer
tsmLevel1CompactionDuration	integer
tsmLevel1CompactionErr		integer
tsmLevel1Compactions		integer
tsmLevel1CompactionsActive	integer
tsmLevel2CompactionDuration	integer
tsmLevel2CompactionErr		integer
tsmLevel2Compactions		integer
tsmLevel2CompactionsActive	integer
tsmLevel3CompactionDuration	integer
tsmLevel3CompactionErr		integer
tsmLevel3Compactions		integer
tsmLevel3CompactionsActive	integer
tsmOptimizeCompactionDuration	integer
tsmOptimizeCompactionErr	integer
tsmOptimizeCompactions		integer
tsmOptimizeCompactionsActive	integer

name: tsm1_filestore
fieldKey	fieldType
--------	---------
diskBytes	integer
numFiles	integer

name: tsm1_wal
fieldKey		fieldType
--------		---------
currentSegmentDiskBytes	integer
oldSegmentsDiskBytes	integer
writeErr		integer
writeOk			integer

name: write
fieldKey	fieldType
--------	---------
pointReq	integer
pointReqLocal	integer
req		integer
subWriteDrop	integer
subWriteOk	integer
writeDrop	integer
writeError	integer
writeOk		integer
writeTimeout	integer

> show tag keys
name: cq
tagKey
------
hostname

name: database
tagKey
------
database
hostname

name: httpd
tagKey
------
bind
hostname

name: queryExecutor
tagKey
------
hostname

name: runtime
tagKey
------
hostname

name: shard
tagKey
------
database
engine
hostname
id
path
retentionPolicy
walPath

name: subscriber
tagKey
------
database
destination
hostname
mode
name
retention_policy

name: tsm1_cache
tagKey
------
database
engine
hostname
id
path
retentionPolicy
walPath

name: tsm1_engine
tagKey
------
database
engine
hostname
id
path
retentionPolicy
walPath

name: tsm1_filestore
tagKey
------
database
engine
hostname
id
path
retentionPolicy
walPath

name: tsm1_wal
tagKey
------
database
engine
hostname
id
path
retentionPolicy
walPath

name: write
tagKey
------
hostname
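
To make the difference between the two listings easier to see, here is a small Python sketch that compares the measurement names. The names are copied from the two "show measurements" outputs above. The "influxdb_" prefix mapping is an assumption inferred from those listings, and the helper names are made up for illustration.

```python
# Measurement names copied from the two "show measurements" listings above.
TELEGRAF_MEASUREMENTS = [
    "influxdb", "influxdb_cq", "influxdb_database", "influxdb_httpd",
    "influxdb_memstats", "influxdb_queryExecutor", "influxdb_runtime",
    "influxdb_shard", "influxdb_subscriber", "influxdb_tsm1_cache",
    "influxdb_tsm1_engine", "influxdb_tsm1_filestore", "influxdb_tsm1_wal",
    "influxdb_write",
]
INTERNAL_MEASUREMENTS = [
    "cq", "database", "httpd", "queryExecutor", "runtime", "shard",
    "subscriber", "tsm1_cache", "tsm1_engine", "tsm1_filestore", "tsm1_wal",
    "write",
]

# Assumption: the plugin names each measurement "influxdb_" + the _internal name.
PREFIX = "influxdb_"


def telegraf_only(telegraf, internal, prefix=PREFIX):
    """Telegraf measurements with no prefixed _internal counterpart."""
    expected = {prefix + name for name in internal}
    return sorted(set(telegraf) - expected)


def internal_only(telegraf, internal, prefix=PREFIX):
    """_internal measurements that the Telegraf listing does not cover."""
    present = set(telegraf)
    return sorted(name for name in internal if prefix + name not in present)


if __name__ == "__main__":
    print(telegraf_only(TELEGRAF_MEASUREMENTS, INTERNAL_MEASUREMENTS))
    # prints ['influxdb', 'influxdb_memstats']
    print(internal_only(TELEGRAF_MEASUREMENTS, INTERNAL_MEASUREMENTS))
    # prints []
```

So, going by these listings, every _internal measurement has a Telegraf counterpart, and the plugin adds influxdb and influxdb_memstats on top. Two other differences are visible in the output above: most plugin fields are typed float where _internal uses integer, and the plugin tags points with host and url where _internal uses hostname.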
#12

@jackzampolin Thanks, much appreciated.
