Hello, and welcome to the SHIELD Operator's Manual, an in-depth look at all things SHIELD. This manual aims to be an exhaustive guide to the installation and operation of the SHIELD Data Protection solution.
If you are looking for a more easy-going start-up guide, you may want to check out the Getting Started guide.
If you are interested in contributing to SHIELD itself, or wish to write a plugin to extend the capabilities of your SHIELD installation, head on over to the SHIELD Developer Documentation.
What is SHIELD?
SHIELD is a data protection solution. It is designed to run scheduled tasks to back up your important data systems to off-site cloud storage solutions, and facilitate the restoration of backup archives in the event of outages or data loss.
SHIELD supports lots of different data systems, through its flexible and modular plugin architecture. We currently support:
- PostgreSQL databases, via the
- MySQL / MariaDB, via the mysql plugin, or the
- Redis key-value store, via the
- ... and many more.
Cloud Storage systems are likewise pluggable. Out of the box, SHIELD supports:
- Amazon S3 (and S3 work-alikes) via the
- Microsoft Azure Blobstore via the
- GCP Blobstore via the
- On-premise WebDAV endpoints, via the
- ... and several others.
SHIELD is a distributed system. The SHIELD core leverages a network of agents to do the heavy lifting of data backup and restore. When you deploy SHIELD into your infrastructure, you get to choose how many agents you want to provision, and where in the network they sit.
Multi-tenancy is baked right into SHIELD, via a robust role-based access control (RBAC) system that helps isolate different subsets of users from one another. People in one tenant are unable to see configurations made by people in another tenant. This allows a single SHIELD to support multiple, independent teams.
SHIELD also supports a sophisticated authentication system. You can hook up to your external Cloud Foundry UAA server, a BOSH UAA instance, or even Github (both public and on-premise). As users log in via their external credentials, SHIELD will automatically create the necessary tenants and assign roles based on the SHIELD configuration.
We believe strongly in encryption. Whenever SHIELD communicates across the network, it does so over encrypted channels (SSH, TLS/HTTPS), with endpoint identity verification (host keys, mutual TLS, etc.). All backup archives are encrypted with unique key material, to ensure that data at rest is also resistant to snooping and tampering.
Planning Your Installation
Before you start installing the software, it's worthwhile to take a step back and plan out your installation.
SHIELD operates best on flat network topologies, without NAT devices or HTTP(S) proxy services.
SHIELD requires mutual network visibility between the core and all cooperating agents. Each agent issues a small HTTP request to the core, to inform the core that it is alive, and ready to be inventoried. This is called the registration ping. For each registration ping received, the core records the name and port given and the remote address of the connecting TCP socket. At some later time, the SHIELD core will initiate an SSH connection to the recorded agent IP address, and gather agent information.
Because of this, NAT devices tend to confound SHIELD. The registration ping originates (at the TCP level) from the NAT gateway, not the host running the SHIELD agent software. When the core attempts to connect back to the agent, it initiates a connection to the NAT device on the agent port, which generally fails.
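The effect of NAT on registration can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not SHIELD's actual code: the `Registration` type, the `record_ping` function, the IP addresses, the agent name and the port number are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    name: str
    address: str  # "ip:port" that the core will later dial over SSH


def record_ping(peer_ip: str, name: str, port: int) -> Registration:
    """Record a registration ping as the text describes: the name and port
    come from the agent, the IP comes from the TCP socket's remote end."""
    return Registration(name=name, address=f"{peer_ip}:{port}")


# Flat network: the socket's remote address is the agent host itself.
flat = record_ping("10.0.1.17", "db-agent", 5444)

# Behind NAT: the socket's remote address is the gateway, so the core
# would later try to reach 203.0.113.1:5444, where no agent is listening.
natted = record_ping("203.0.113.1", "db-agent", 5444)

print(flat.address)    # 10.0.1.17:5444
print(natted.address)  # 203.0.113.1:5444
```

In the second case the recorded address belongs to the gateway, which is why the follow-up SSH connection generally fails.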
Flat networks with HTTP(S) proxies are not impossible, but they can be
unruly. When configuring proxy clients (via
no_proxy environment variables, or similar mechanisms), you will want to
be especially cognizant of the HTTP(S) connections needed by the SHIELD
software itself. Often, these connections will need to bypass an
Internet-bound proxy (e.g. one in a DMZ) in order to function.
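When auditing a proxy configuration, it can help to check which hosts a given no_proxy value would exempt. The sketch below uses simple exact-or-suffix matching; real HTTP clients differ in how they interpret no_proxy, so treat it as an approximation. The `bypasses_proxy` function and the host names are hypothetical.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if `host` matches an entry in a comma-separated no_proxy
    value, using simple exact-or-suffix matching."""
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if entry == "*":
            return True
        entry = entry.lstrip(".")
        if entry and (host == entry or host.endswith("." + entry)):
            return True
    return False


# Hypothetical values: the SHIELD core lives on an internal domain.
no_proxy = "localhost,127.0.0.1,.internal.example.com"

print(bypasses_proxy("shield.internal.example.com", no_proxy))  # True
print(bypasses_proxy("s3.amazonaws.com", no_proxy))             # False
```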
Likewise, if your cloud storage solution is to be dealt with over HTTP(S),
you will need to make sure that either your proxy server can contact it on
behalf of each SHIELD agent, or that each agent blacklists the domains
and/or IP addresses of the storage endpoints in something like
Where to Colocate SHIELD Agents
Depending on the data systems you wish to back up, and their configuration with respect to access control, you may be able to get away without colocating any SHIELD agents in your infrastructure.
There are really only two reasons for colocating a SHIELD agent on a data system installation: plugin requirements and host access control configuration.
Most SHIELD plugins stream their data through, without relying on any temporary local storage. This removes a throughput bottleneck (the disk), as well as a capacity concern (how much temporary space do you need?). Some plugins, however, require local disk. If the target system doesn't require local access (more on that in a moment), you may want to spin up some machines with large ephemeral disks just to handle these backup / restore operations. The conversation for backing up data then looks like this:
Some plugins absolutely cannot be executed across the network. The Filesystem Plugin, for example, can only deal with files on the local filesystem (networked filesystems notwithstanding). Therefore, if you need to back up files on a host, you will need to deploy a SHIELD agent to run on that host.
BOSH is a cloud-agnostic deployment and orchestration tool that excels at lifecycle management of software at all scales. SHIELD has a BOSH release that can be used to deploy both the SHIELD core, and SHIELD agents into new and existing BOSH deployments.
If you're already using BOSH (for example, if you are deploying Cloud Foundry), adding SHIELD into your infrastructure should be easy. If you are still looking for a great release engineering framework, you can get your feet wet with a SHIELD deployment or three.
The SHIELD BOSH release can be found on Github.
Deploying the SHIELD Core
Usually, the SHIELD core is a standalone, self-contained deployment. To deploy, you'll need to find or create a deployment manifest. A good starting point can be found here.
Save that file locally, as shield.yml, and then run:
    $ bosh -e my-bosh deploy \
        -d shield \
        -v static_ip=192.0.2.5 \
        -v domain=shield.example.com \
        shield.yml
Replace domain with the FQDN of your SHIELD management console, and
192.0.2.5 with a static IP that you want to deploy SHIELD on. You may
need to consult your BOSH cloud-config to find a suitable IP in the
default network. Optionally, you may modify the deployment manifest
to specify a different network.
NOTE: The provided deployment manifest assumes that your BOSH director
has been deployed with a config-server that can generate the necessary
certificates and keys for securing SHIELD's communications. If that is not
the case, you will need to provide additional command-line options to the
bosh deploy command to store the generated credentials locally. See
the BOSH documentation for more information.
Once BOSH has finished deploying SHIELD, you should be able to access the
SHIELD management console at https://$IP. The default login will be
admin (username) and
Deploying SHIELD Agents
If you need to colocate agents on other BOSH deployments, you have a few
options. The fastest method is to modify those deployment manifests to
include the shield-agent job in the appropriate BOSH instance groups, like
this:
    instance_groups:
      - name: some-database
        jobs:
          # ... other jobs ....
          - name: shield-agent
            release: shield
        # ... rest of configuration ...
This can get out of hand fast. A more elegant solution is to use BOSH runtime configs and inject the SHIELD agent job into other deployments without mucking about with their deployment manifests.
Here's a working runtime config:
    ---
    releases:
      - name: shield
        version: 8.0.8
    addons:
      - name: shield
        jobs:
          - name: shield-agent
            release: shield
To use this, update your existing runtime-config:
$ bosh update-runtime-config addons.yml
Then, bosh deploy your pre-existing manifests, without changing them.
For more information, including how to limit the
shield addon to just
specific deployments / VMs, read the
BOSH runtime-configs documentation.
The SHIELD Core Image
The SHIELD Agent Standalone Image
Embedding the SHIELD Agent
This section contains detailed descriptions of all configuration options for the SHIELD core, and SHIELD agents.
SHIELD Core Configuration File Reference
The SHIELD core configuration file is a YAML file, read at startup by the
listen_addr - The IP address and TCP port that the SHIELD core daemon should bind to and listen on for incoming API (HTTP) requests. Defaults to
* is interpreted to mean all interfaces.
workers - How many worker threads the SHIELD core should spin up. This defaults to
2, but you should increase this to 1.5 times the number of concurrent backup tasks you expect to see, at peak.
Low worker counts can cause the SHIELD scheduler to "stall out" and not execute scheduled tasks in a timely fashion. The 1.5x multiplier accounts for purge operations, cloud storage tests, and other background tests.
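As a quick way to apply the 1.5x rule, the following sketch computes a worker count from an expected peak. The function name is hypothetical, and both rounding up and never going below the default of 2 are assumptions made for this example rather than documented SHIELD behavior.

```python
import math

DEFAULT_WORKERS = 2  # the default stated in the reference above


def recommended_workers(peak_concurrent_backups: int) -> int:
    """Apply the 1.5x rule of thumb, rounding up to a whole worker."""
    if peak_concurrent_backups < 0:
        raise ValueError("peak_concurrent_backups must not be negative")
    return max(DEFAULT_WORKERS, math.ceil(1.5 * peak_concurrent_backups))


print(recommended_workers(10))  # 15
print(recommended_workers(3))   # 5
```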
debug - Whether or not to enable verbose debug logging. This is a boolean, and it defaults to
no, which is a sane choice for any production or staging environment. Debug logging is verbose, and very low-level. It is of primary value to SHIELD developers.
data_directory - The absolute path to the directory where SHIELD will store all of its persistent data. Important files stored here include:
$data_directory/shield.db - The SHIELD metadata database.
$data_directory/vault/* - The encrypted files that back the vault.
$data_directory/vault.crypt - The encrypted file which stores the seal keys to the SHIELD Vault. This file is encrypted with the SHIELD master password.
$data_directory/bootstrap.log - A log of what occurred during a SHIELD from-nothing recovery.
fast_loop - The frequency, in seconds, of the SHIELD scheduler's "fast loop." On every iteration of the fast loop, SHIELD will schedule backup jobs that ought to run, execute pending tasks (if it has workers available), and handle inbound agent registration pings.
By default, the fast loop executes once a second. Unless you have an urgent need otherwise, you shouldn't change this.
slow_loop - The frequency, in seconds, of the SHIELD scheduler's "slow loop." The slow loop handles administrative tasks for the SHIELD core, including archive expiration and purgation, session clearing, data analytics, and cloud storage testing.
By default, the slow loop executes once every 300 seconds (5 minutes). Turning up the frequency will result in higher load on external cloud storage systems. Decreasing the frequency will cause expired archives to remain in cloud storage for longer.
web_root - The root path to the SHIELD web management UI assets. Defaults to the relative path
web, which is probably not what you want.
environment - A name for the environment that SHIELD will pass through to clients accessing its API and web management console. This can be useful for differentiating your staging SHIELD from your production SHIELD. By default, no environment is set.
color - You can color code your SHIELD Web User Interfaces! Set a hex value here (e.g.
#003300) or other CSS-compatible color identifier, and the web UI will use it to colorize the environment name.
motd - A (hopefully) short message to display to operators on the login screen. You can use this for compliance messages, important notices, an explanation of which authentication method people should use, who to contact for help, etc. By default, there is no MOTD.
vault_address - The URL of the SHIELD Vault. This should almost always be
https://127.0.0.1:8200. If you are using the BOSH release, this cannot be configured.
vault_ca_cert - The X.509 Certificate Authority certificate, PEM-encoded, for validating the Vault certificate. If you are using the BOSH release, this cannot be configured (nor does it need to be).
encryption_type - Which encryption algorithm and chaining mode to use for encrypting backup archives. Supported values are:
aes256-ctr - 256-bit AES, in Counter (CTR) mode.
We plan to introduce more types as the need arises.
Each backup archive tracks which encryption type was in force when it was taken, to allow operators to change this value without rendering previous backup archives unusable.
session_timeout - How long (in hours) before idle authenticated sessions are invalidated. Defaults to 720 (about a month).
failsafe - When the SHIELD core starts up, it checks the local users table. If it is empty (there are no local users), it creates a failsafe account, using these parameters. This is designed to assist in a safe and secure bootstrap.
username - The username of the failsafe account.
password - The (cleartext) password of the failsafe account.
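The failsafe check can be pictured with a small sketch. It models the local users table as a plain list and is not SHIELD's implementation; the `ensure_failsafe` function, the record layout and the example credentials are hypothetical.

```python
from typing import Dict, List


def ensure_failsafe(local_users: List[Dict[str, str]],
                    failsafe: Dict[str, str]) -> List[Dict[str, str]]:
    """Return the users table after the startup check described above."""
    if local_users:
        # At least one local user exists, so the failsafe settings are ignored.
        return local_users
    # Empty table: create the failsafe account from the configured parameters.
    return [{"username": failsafe["username"], "password": failsafe["password"]}]


# Hypothetical failsafe parameters, as they might be read from the config file.
failsafe = {"username": "failsafe-admin", "password": "change-me"}

fresh = ensure_failsafe([], failsafe)
existing = ensure_failsafe([{"username": "alice", "password": "x"}], failsafe)

print(fresh[0]["username"])     # failsafe-admin
print(existing[0]["username"])  # alice
```

Because the account is only created when the table is empty, changing the failsafe settings later has no effect on an installation that already has local users.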
SHIELD Agent Configuration File Reference
The SHIELD agent configuration file is a YAML file, read at startup by the
name (required) - The name of this agent, for registration with the SHIELD core. This name will appear in web and CLI interfaces, to people configuring backup jobs, and should describe the role this agent installation plays in the overall topology.
authorized_keys_file (required) - The path to an SSH authorized keys file, which should contain the public component of the agent private key that the SHIELD core will use to authenticate to the agent for remote orchestration.
listen_address - The IP address and TCP port that the SHIELD agent should bind to and listen on for incoming orchestration (via SSH). Defaults to
* is interpreted to mean all interfaces.
plugin_paths (required) - A YAML list of paths that the agent will use when attempting to resolve plugin names to binaries. This is kind of like the canonical UNIX
$PATH environment variable, except it does not apply to any programs that the plugins themselves attempt to execute.
You should list all of your plugin binary directories here.
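The $PATH-like lookup can be approximated as follows. This is a minimal sketch of a first-match search, assuming the agent walks plugin_paths in order and picks the first executable file with the plugin's name; the `resolve_plugin` function and the directories shown are hypothetical.

```python
import os
from typing import List, Optional


def resolve_plugin(name: str, plugin_paths: List[str]) -> Optional[str]:
    """Return the first executable file called `name` found in plugin_paths,
    searching the directories in the order given, or None if none matches."""
    for directory in plugin_paths:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Hypothetical directories; on a machine without them this prints None.
print(resolve_plugin("postgres", ["/opt/shield/plugins", "/usr/local/lib/shield"]))
```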
registration - This subsection governs how this agent will register with its SHIELD core. While technically optional, registration is highly recommended, from an ease-of-use standpoint.
The following keys exist underneath
url - The HTTP(S) URL of the upstream SHIELD core. This will normally be something like
interval - How often (in seconds) the agent should ping the SHIELD core and provide registration details. The SHIELD core determines when it validates agent registrations and extracts metadata information, so this setting cannot be used to increase the frequency of such updates.
shield_ca_cert - Path to a file containing the PEM-encoded CA certificate that issued the SHIELD core's X.509 TLS certificate. This allows operators to validate self-signed certificates, or custom, in-house CA-issued certificates.
This has no effect if
skip_verify is set to true.
skip_verify - Whether or not to disable verification of the SHIELD core X.509 TLS certificate. This defaults to false, since certificate verification is generally A Good Thing™.
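Before starting an agent, it can be handy to confirm that the keys marked as required above are present. The sketch below works on an already-parsed configuration (a plain dictionary); the `missing_required` helper and every value in the example are hypothetical.

```python
from typing import Any, Dict, List

# Keys marked "(required)" in the agent reference above.
REQUIRED_KEYS = ("name", "authorized_keys_file", "plugin_paths")


def missing_required(config: Dict[str, Any]) -> List[str]:
    """List the required agent keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# A hypothetical parsed agent configuration.
agent_config = {
    "name": "postgres-prod-agent",
    "authorized_keys_file": "/etc/shield/authorized_keys",
    "plugin_paths": ["/opt/shield/plugins"],
}

print(missing_required(agent_config))            # []
print(missing_required({"name": "lonely-agent"}))  # ['authorized_keys_file', 'plugin_paths']
```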
SHIELD features a beautiful web user interface and a robust command-line interface. We like to think of the web UI as providing more visibility into the configuration of SHIELD, while the CLI provides more flexibility in terms of automation.
The Web UI
You can access the web UI by pointing your browser at the IP address of your SHIELD core installation. SHIELD forces all HTTP traffic over TLS, via port 443, for security reasons.
Before you can interact with SHIELD, you must log in.
On the right is the login form for local authentication. On the left is a list of the configured authentication providers. These allow SHIELD administrators to integrate SHIELD authentication with 3rd-party, external identity systems like Github, or Cloud Foundry UAA.
Note: You may not have any authentication providers listed.
The Heads-up Display
All logged in? Great!
At the top of the screen, you should see the heads-up display:
To the left is identifying information about this SHIELD core, including the configured SHIELD environment name, the IP address and/or fully-qualified domain, and the version of SHIELD.
The first pane summarizes the overall health of SHIELD and the current tenant's configuration.
SHIELD is ... - Reports the current status of the SHIELD API. If all is well, this will say SHIELD is up, in a reassuring green hue. If the SHIELD core is not responding to API calls, this will say SHIELD is DOWN, in red. Sometimes, it may report that the SHIELD is locked, in which case an administrator needs to intervene to unlock it.
Cloud Storage is ... - Reports the health of all global cloud storage systems, as well as the health of all cloud storage specific to the currently selected tenant.
Jobs are ... - Reports the status of all backup jobs for the current tenant. It considers only the most recent execution of each job, whether it was scheduled or run manually (ad hoc).
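The "most recent execution only" rule for job status can be illustrated with a short sketch. It is not the HUD's actual logic: the run tuples, the function names and the "healthy" / "failing" labels are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

# (job name, start time as a Unix timestamp, whether the run succeeded)
Run = Tuple[str, int, bool]


def latest_results(runs: List[Run]) -> Dict[str, bool]:
    """Keep only the outcome of the most recent run of each job."""
    latest: Dict[str, Tuple[int, bool]] = {}
    for job, started_at, ok in runs:
        if job not in latest or started_at > latest[job][0]:
            latest[job] = (started_at, ok)
    return {job: ok for job, (_, ok) in latest.items()}


def jobs_summary(runs: List[Run]) -> str:
    """Summarize all jobs from their latest runs (labels are hypothetical)."""
    return "healthy" if all(latest_results(runs).values()) else "failing"


runs = [
    ("nightly-db", 100, False),  # an older failure ...
    ("nightly-db", 200, True),   # ... superseded by a later success
    ("hourly-files", 150, True),
]

print(jobs_summary(runs))  # healthy
```

An earlier failure does not count against a job once a later run, scheduled or ad hoc, has succeeded.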
The second pane, titled Data Protection Summary, provides some numbers for your consideration. All of these are per-tenant.
Scheduled Backup Jobs - How many total jobs are scheduled to run.
Backup Archives - How many backup archives exist.
Cloud Storage Used - How much cloud storage is being used by the backup archives for this tenant's jobs.
Daily Storage Increase - A simple linear projection of the amount of additional cloud storage that will be used, each day, given the current schedules, retention policies, and archive sizes.
The heads-up display is partially dependent on the current tenant, so if you switch to a different tenant, you might get different numbers / statuses.
The Task Sidebar
To the left of the screen is a sidebar with links to the common tasks you may want to perform:
- Run an ad hoc backup
- Restore data from a backup
- Configure a new backup job
The Top Bar
The Navigation Bar
The black navigation bar (immediately under the heads-up display) will stick to the top of the viewport as you scroll. It provides top-level navigation, including:
Systems - Your data systems are the things that SHIELD protects, by making copies of the important data contained within them, on a scheduled and recurring basis. This page lets you review and manage those systems.
Storage - Cloud storage is where SHIELD keeps the backup copies of your data. You can configure however many storage systems you want, in whatever configuration you deem appropriate. This page lets you access both global (shared) cloud storage systems, and those specific to your tenant.
Retention - SHIELD doesn't keep backup archives forever. Every backup job needs a retention policy so that SHIELD knows when it is okay to delete older archives, to conserve space. This page lets you review and manage those policies.
Admin - This one is for SHIELD administrators only. It provides access to a host of administrative functions like tenant and user management, global cloud storage management, etc.
The Systems Page
Adding a second schedule
Running Backups / Restores
Ad hoc vs. Scheduled
The Ad hoc backup Wizard
The Timeline View
Restoring from the Timeline Page
The Restore Wizard
How SHIELD Uses Cloud Storage
How the HUD interacts
The Storage Display Page
What is a Tenant?
Role Assignments (and what they mean)
Authentication Providers and Tenants
The default tenant
What is Encryption
Why do I Care?
How does SHIELD use encryption?
At-rest vs. in-flight encryption
Initializing A SHIELD Core
The Master Password
The Administrative Backend
Retention Policy Templates
Local User Management
How Do I Backup X?
The Problem With Encryption
The Important Bits of SHIELD
Cloud Foundry UAA
Cloud Foundry CCDB
Using the HUD
API Access for Monitoring
Metrics of interest
Log messages to watch