Influx Data backup and restore

#1

I have launched InfluxDB in docker container and then i tried to backup and restore the data. When i tried to backup individual databases i can see meta.00 and telegraf.autogen file in my tmp/backup directory and then i have cleared data in InfluxDB UI stopped the daemon and tried to restore and it shows unpacking backup files but when i go into the influxDB and see i am not able to find any data.

Steps implemented after logging into instance:

  1. influxd backup -database
    2.clear the data and docker stop InfluxDB
    3.influxd restore -metadir /var/lib/influxdb/meta /tmp/backup
  2. influxd restore -database telegraf -datadir /var/lib/influxdb/data /tmp/backup
  3. docker start InfluxDB

After this i cannot find any data restored.can you suggest where am i going wrong. Is there any separate process for backing up and restoring InfluxDB data which launched in docker container.

#2

@rajesh Are you launching the docker container? Are you mounting any host volumes?

#3

Yes,i have pulled an docker image from private registry and also mounting volumes through CFT.

#4

@rajesh Can you share your docker run command as well as the Dockerfile?

#5

@jackzampolin The image which i am using is pushed by other teams in dockyard where i can only inspect the docker file and i did not find any backup and restore written in the file and then what i have tried is logging into container and tried to do “influxd backup -database telegraf -datadir /tmp/telegraf/” where i am able to backup but it is mentioned as we should stop the daemon when restoring. How can i stop the daemon and restore it when i am inside the container?? When i am trying to restore from VM it is showing as Influxd Command not found because i haven’t installed any Influxd packages on VM. Even i have tried to use restore command from inside the container without stopping daemon then it shows unpacking backup files and restore complete but in GUI i cannot find files.Can you please clarify me is the only way i can do backup and restore when Influxd launched in docker container is by editing docker file and entering backup and restore scripts in it??

#6

@rajesh I wont be able to help you debug this unless you share the docker run command and the Dockerfile as I requested in the previous post.

Containers are definitely a bit different than other environments. My guess would be that the process for container backup and restore is going to be a little different from that of a VM. You should be able to backup the database from outside the running container:

$ influxd backup -h
Downloads a snapshot of a data node and saves it to disk.

Usage: influxd backup [flags] PATH

    -host <host:port>
            The host to connect to snapshot. Defaults to 127.0.0.1:8088.
    -database <name>
            The database to backup.
    -retention <name>
            Optional. The retention policy to backup.
    -shard <id>
            Optional. The shard id to backup. If specified, retention is required.
    -since <2015-12-24T08:12:23>
            Optional. Do an incremental backup since the passed in RFC3339
            formatted time.

As far as restore you would need to bring down the container running influxd, run the influxd restore command on the host pointing to the directory you have mounted store the database from the container:

$ influxd restore -h
Uses backups from the PATH to restore the metastore, databases,
retention policies, or specific shards. The InfluxDB process must not be
running during a restore.

Usage: influxd restore [flags] PATH

    -metadir <path>
            Optional. If set the metastore will be recovered to the given path.
    -datadir <path>
            Optional. If set the restore process will recover the specified
            database, retention policy or shard to the given directory.
    -database <name>
            Optional. Required if no metadir given. Will restore the database
            TSM files.
    -retention <name>
            Optional. If given, database is required. Will restore the retention policy's
            TSM files.
    -shard <id>
            Optional. If given, database and retention are required. Will restore the shard's
            TSM files.

Then you would restart your container on top of the newly restored data. Does this help?

#7

When i am using Following command i am facing error $influxd backup -database telegraf /tmp/backup
Error : backup failed: dial tcp 127.0.0.1:8088: getsockopt: connection refused
I know i am somewhere miss configuring ports can you check my docker run command and suggest how can i modify my port configuration to support backup and restore.

Docker Commands which i am using now:

docker pull dockyard.cloud.capitalone.com/cassinimonitoringinflux/version1.2.2

docker run -d --restart=always --name InfluxDB -p 8081:8086 -p 127.0.0.1:8083:8083 --link=Kapacitor -v /mnt/influxdb/etc:/etc/influxdb -v /mnt/influxdb/log:/var/log/influxdb -v /mnt/influxdb/lib:/var/lib/influxdb dockyard.cloud.capitalone.com/cassinimonitoringinflux/version1.2.2

Telegraf :[[outputs.influxdb]]
urls = [“http://localhost:8086”]

InfluxDB Conf:

### Welcome to the InfluxDB configuration file.

# The values in this file override the default values used by the system if
# a config option is not specified. The commented out lines are the configuration
# field and the default value used. Uncommentting a line and changing the value
# will change the value used at runtime when the process is restarted.

# Once every 24 hours InfluxDB will report usage data to usage.influxdata.com
# The data includes a random ID, os, arch, version, the number of series and other
# usage data. No data from user databases is ever transmitted.
# Change this option to true to disable reporting.
# reporting-disabled = false

# we'll try to get the hostname automatically, but if it the os returns something
# that isn't resolvable by other servers in the cluster, use this option to
# manually set the hostname
# hostname = "localhost"

###
### [meta]
###
### Controls the parameters for the Raft consensus group that stores metadata
### about the InfluxDB cluster.
###

[meta]
  # Where the metadata/raft database is stored
  dir = "/var/lib/influxdb/meta"

  # Automatically create a default retention policy when creating a database.
  # retention-autocreate = true

  # If log messages are printed for the meta service
  # logging-enabled = true

###
### [data]
###
### Controls where the actual shard data for InfluxDB lives and how it is
### flushed from the WAL. "dir" may need to be changed to a suitable place
### for your system, but the WAL settings are an advanced configuration. The
### defaults should work for most systems.
###

[data]
  # The directory where the TSM storage engine stores TSM files.
  dir = "/var/lib/influxdb/data"

  # The directory where the TSM storage engine stores WAL files.
  wal-dir = "/var/lib/influxdb/wal"

  # The amount of time that a write will wait before fsyncing.  A duration
  # greater than 0 can be used to batch up multiple fsync calls.  This is useful for slower
  # disks or when WAL write contention is seen.  A value of 0s fsyncs every write to the WAL.
  # Values in the range of 0-100ms are recommended for non-SSD disks.
  # wal-fsync-delay = "0s"


  # The type of shard index to use for new shards.  The default is an in-memory index that is
  # recreated at startup.  A value of "tsi1" will use a disk based index that supports higher
  # cardinality datasets.
  # index-version = "inmem"

  # Trace logging provides more verbose output around the tsm engine. Turning
  # this on can provide more useful output for debugging tsm engine issues.
  # trace-logging-enabled = false

  # Whether queries should be logged before execution. Very useful for troubleshooting, but will
  # log any sensitive data contained within a query.
  # query-log-enabled = true

  # Settings for the TSM engine

  # CacheMaxMemorySize is the maximum size a shard's cache can
  # reach before it starts rejecting writes.
  # cache-max-memory-size = 1048576000

  # CacheSnapshotMemorySize is the size at which the engine will
  # snapshot the cache and write it to a TSM file, freeing up memory
  # cache-snapshot-memory-size = 26214400

  # CacheSnapshotWriteColdDuration is the length of time at
  # which the engine will snapshot the cache and write it to
  # a new TSM file if the shard hasn't received writes or deletes
  # cache-snapshot-write-cold-duration = "10m"

  # CompactFullWriteColdDuration is the duration at which the engine
  # will compact all TSM files in a shard if it hasn't received a
  # write or delete
  # compact-full-write-cold-duration = "4h"

  # The maximum series allowed per database before writes are dropped.  This limit can prevent
  # high cardinality issues at the database level.  This limit can be disabled by setting it to
  # 0.
  # max-series-per-database = 1000000

  # The maximum number of tag values per tag that are allowed before writes are dropped.  This limit
  # can prevent high cardinality tag values from being written to a measurement.  This limit can be
  # disabled by setting it to 0.
  # max-values-per-tag = 100000

###
### [coordinator]
###
### Controls the clustering service configuration.
###

[coordinator]
  # The default time a write request will wait until a "timeout" error is returned to the caller.
  # write-timeout = "10s"

  # The maximum number of concurrent queries allowed to be executing at one time.  If a query is
  # executed and exceeds this limit, an error is returned to the caller.  This limit can be disabled
  # by setting it to 0.
  # max-concurrent-queries = 0

  # The maximum time a query will is allowed to execute before being killed by the system.  This limit
  # can help prevent run away queries.  Setting the value to 0 disables the limit.
  # query-timeout = "0s"

  # The the time threshold when a query will be logged as a slow query.  This limit can be set to help
  # discover slow or resource intensive queries.  Setting the value to 0 disables the slow query logging.
  # log-queries-after = "0s"

  # The maximum number of points a SELECT can process.  A value of 0 will make the maximum
  # point count unlimited.
  # max-select-point = 0

  # The maximum number of series a SELECT can run.  A value of 0 will make the maximum series
  # count unlimited.

  # The maximum number of series a SELECT can run. A value of zero will make the maximum series
  # count unlimited.
  # max-select-series = 0

  # The maxium number of group by time bucket a SELECt can create.  A value of zero will max the maximum
  # number of buckets unlimited.
  # max-select-buckets = 0

###
### [retention]
###
### Controls the enforcement of retention policies for evicting old data.
###

[retention]
  # Determines whether retention policy enforcment enabled.
  # enabled = true

  # The interval of time when retention policy enforcement checks run.
  # check-interval = "30m"

###
### [shard-precreation]
###
### Controls the precreation of shards, so they are available before data arrives.
### Only shards that, after creation, will have both a start- and end-time in the
### future, will ever be created. Shards are never precreated that would be wholly
### or partially in the past.

[shard-precreation]
  # Determines whether shard pre-creation service is enabled.
  # enabled = true

  # The interval of time when the check to pre-create new shards runs.
  # check-interval = "10m"

  # The default period ahead of the endtime of a shard group that its successor
  # group is created.
  # advance-period = "30m"

###
### Controls the system self-monitoring, statistics and diagnostics.
###
### The internal database for monitoring data is created automatically if
### if it does not already exist. The target retention within this database
### is called 'monitor' and is also created with a retention period of 7 days
### and a replication factor of 1, if it does not exist. In all cases the
### this retention policy is configured as the default for the database.

[monitor]
  # Whether to record statistics internally.
  # store-enabled = true

  # The destination database for recorded statistics
  # store-database = "_internal"

  # The interval at which to record statistics
  # store-interval = "10s"

###
### [admin]
###
### Controls the availability of the built-in, web-based admin interface. If HTTPS is
### enabled for the admin interface, HTTPS must also be enabled on the [http] service.
###
### NOTE: This interface is deprecated as of 1.1.0 and will be removed in a future release.

[admin]
  # Determines whether the admin service is enabled.
    enabled = true

  # The default bind address used by the admin service.
  # bind-address = ":8083"

  # Whether the admin service should use HTTPS.
  # https-enabled = false

  # The SSL certificate used when HTTPS is enabled.
  # https-certificate = "/etc/ssl/influxdb.pem"

###
### [http]
###
### Controls how the HTTP endpoints are configured. These are the primary
### mechanism for getting data into and out of InfluxDB.
###

[http]
  # Determines whether HTTP endpoint is enabled.
  # enabled = true

  # The bind address used by the HTTP service.
     bind-address = ":8088"

  # Determines whether HTTP authentication is enabled.
  # auth-enabled = false

  # The default realm sent back when issuing a basic auth challenge.
  # realm = "InfluxDB"

  # Determines whether HTTP request logging is enable.d
  # log-enabled = true

  # Determines whether detailed write logging is enabled.
  # write-tracing = false

  # Determines whether the pprof endpoint is enabled.  This endpoint is used for
  # troubleshooting and monitoring.
  # pprof-enabled = true

  # Determines whether HTTPS is enabled.
  # https-enabled = false

  # The SSL certificate to use when HTTPS is enabled.
  # https-certificate = "/etc/ssl/influxdb.pem"

  # Use a separate private key location.
  # https-private-key = ""

  # The JWT auth shared secret to validate requests using JSON web tokens.
  # shared-secret = ""

  # The default chunk size for result sets that should be chunked.
  # max-row-limit = 0

  # The maximum number of HTTP connections that may be open at once.  New connections that
  # would exceed this limit are dropped.  Setting this value to 0 disables the limit.
  # max-connection-limit = 0

  # Enable http service over unix domain socket
  # unix-socket-enabled = false

  # The path of the unix domain socket.
  # bind-socket = "/var/run/influxdb.sock"

###
### [subscriber]
###
### Controls the subscriptions, which can be used to fork a copy of all data
### received by the InfluxDB host.
###

[subscriber]
  # Determines whether the subscriber service is enabled.
  # enabled = true

  # The default timeout for HTTP writes to subscribers.
  # http-timeout = "30s"

  # Allows insecure HTTPS connections to subscribers.  This is useful when testing with self-
  # signed certificates.
  # insecure-skip-verify = false

  # The path to the PEM encoded CA certs file. If the empty string, the default system certs will be used
  # ca-certs = ""

  # The number of writer goroutines processing the write channel.
  # write-concurrency = 40

  # The number of in-flight writes buffered in the write channel.
  # write-buffer-size = 1000


###
### [[graphite]]
###
### Controls one or many listeners for Graphite data.
###

[[graphite]]
  # Determines whether the graphite endpoint is enabled.
  # enabled = false
  # database = "graphite"
  # retention-policy = ""
  # bind-address = ":2003"
  # protocol = "tcp"
  # consistency-level = "one"

  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Batching
  # will buffer points in memory if you have many coming in.

  # Flush if this many points get buffered
  # batch-size = 5000

  # number of batches that may be pending in memory
  # batch-pending = 10

  # Flush at least this often even if we haven't hit buffer limit
  # batch-timeout = "1s"

  # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
  # udp-read-buffer = 0

  ### This string joins multiple matching 'measurement' values providing more control over the final measurement name.
  # separator = "."

  ### Default tags that will be added to all metrics.  These can be overridden at the template level
  ### or by tags extracted from metric
  # tags = ["region=us-east", "zone=1c"]

  ### Each template line requires a template pattern.  It can have an optional
  ### filter before the template and separated by spaces.  It can also have optional extra
  ### tags following the template.  Multiple tags should be separated by commas and no spaces
  ### similar to the line protocol format.  There can be only one default template.
  # templates = [
  #   "*.app env.service.resource.measurement",
  #   # Default template
  #   "server.*",
  # ]

###
### [collectd]
###
### Controls one or many listeners for collectd data.
###

[[collectd]]
  # enabled = false
  # bind-address = ":25826"
  # database = "collectd"
  # retention-policy = ""
  #
  # The collectd service supports either scanning a directory for multiple types
  # db files, or specifying a single db file.
  # typesdb = "/usr/local/share/collectd"
  #
  # security-level = "none"
  # auth-file = "/etc/collectd/auth_file"

  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Batching
  # will buffer points in memory if you have many coming in.

  # Flush if this many points get buffered
  # batch-size = 5000

  # Number of batches that may be pending in memory
  # batch-pending = 10

  # Flush at least this often even if we haven't hit buffer limit
  # batch-timeout = "10s"

  # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
  # read-buffer = 0

###
### [opentsdb]
###
### Controls one or many listeners for OpenTSDB data.
###

[[opentsdb]]
  # enabled = false
  # bind-address = ":4242"
  # database = "opentsdb"
  # retention-policy = ""
  # consistency-level = "one"
  # tls-enabled = false
  # certificate= "/etc/ssl/influxdb.pem"

  # Log an error for every malformed point.
  # log-point-errors = true

  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Only points
  # metrics received over the telnet protocol undergo batching.

  # Flush if this many points get buffered
  # batch-size = 1000

  # Number of batches that may be pending in memory
  # batch-pending = 5

  # Flush at least this often even if we haven't hit buffer limit
  # batch-timeout = "1s"

###
### [[udp]]
###
### Controls the listeners for InfluxDB line protocol data via UDP.
###

[[udp]]
  # enabled = false
  # bind-address = ":8089"
  # database = "udp"
  # retention-policy = ""

  # These next lines control how batching works. You should have this enabled
  # otherwise you could get dropped metrics or poor performance. Batching
  # will buffer points in memory if you have many coming in.

  # Flush if this many points get buffered
  # batch-size = 5000

  # Number of batches that may be pending in memory
  # batch-pending = 10

  # Will flush at least this often even if we haven't hit buffer limit
  # batch-timeout = "1s"

  # UDP Read buffer size, 0 means OS default. UDP listener will fail if set above OS max.
  # read-buffer = 0

###
### [continuous_queries]
###
### Controls how continuous queries are run within InfluxDB.
###

[continuous_queries]
  # Determiens whether the continuous query service is enabled.
  # enabled = true

  # Controls whether queries are logged when executed by the CQ service.
  # log-enabled = true

  # interval for how often continuous queries will be checked if they need to run
  # run-interval = "1s"
#8

@rajesh Does the procedure I outlined in the previous post make sense to you. That would be the correct way to backup and restore under docker.

#9

@rajesh It sounds like the :8088 port on the container is not exposed. Can you please share the Dockerfile? If EXPOSE 8088 is in there you would also need -p 8088:8088 in your docker run command.

Why are you not using our official docker images?

#10

We are not authorized to pull the docker images from public repository. So we have to pull it from our private repository. Any way i can only pull images from there but we cannot see what docker file they used to create image and pushed it.

1 Like
#11

@jackzampolin Can we take the backup explicitly (without exposing 8088 in docker file) during runtime dynamically? Because in order to take the influxdb backup port 8088 needs to be binded in influx.conf. could we get the copy or replicated docker file locally as i couldn’t see in dockyard.

#12

So the issue is that you need to get the backup out of the container. You could exec into the container and then use the influxd backup in the container and then just save the backup file to the path that is mounted to the host file system.

#13

@jackzampolin Yes,i am able to backup inside the container but when i am trying to restore from inside the container it’s showing me restoring and unpacking the files but i cannot see anything restored in GUI and also in InfluxDB CLI. I have tried even by changing ports to -p 8088:8088 both inside the container and outside the container. I am able to backup but not restore.

#14

@rajesh I cannot help you further without the dockerfile.