SHIELD Operator's Manual

Hello, and welcome to the SHIELD Operator's Manual, an in-depth look at all things SHIELD. This manual aims to be an exhaustive guide to the installation and operation of the SHIELD Data Protection solution.

If you are looking for a more easy-going start-up guide, you may want to check out the Getting Started guide.

If you are interested in contributing to SHIELD itself, or wish to write a plugin to extend the capabilities of your SHIELD installation, head on over to the SHIELD Developer Documentation.

What is SHIELD?

SHIELD is a data protection solution. It is designed to run scheduled tasks that back up your important data systems to off-site cloud storage solutions, and to facilitate the restoration of backup archives in the event of outages or data loss.

SHIELD supports lots of different data systems, through its flexible and modular plugin architecture. We currently support:

Cloud Storage systems are likewise pluggable. Out of the box, SHIELD supports:

SHIELD is a distributed system. The SHIELD core leverages a network of agents to do the heavy lifting of data backup and restore. When you deploy SHIELD into your infrastructure, you get to choose how many agents you want to provision, and where in the network they sit.

Multi-tenancy is baked right into SHIELD, with a robust role-based access control (RBAC) system in place to help isolate different subsets of users from one another. People in one tenant are unable to see configurations made by people in another tenant. This allows a single SHIELD to support multiple, independent teams.

SHIELD also supports a sophisticated authentication system. You can hook up to your external Cloud Foundry UAA server, a BOSH UAA instance, or even Github (both public and on-premise). As users log in via their external credentials, SHIELD will automatically create the necessary tenants and assign roles based on the SHIELD configuration.

We believe strongly in encryption. Whenever SHIELD communicates across the network, it does so over encrypted channels (SSH, TLS/HTTPS), with endpoint identity verification (host keys, mutual TLS, etc.). All backup archives are encrypted with unique key material, to ensure that data at rest is also resistant to snooping and tampering.

Installation

Before you can begin to use SHIELD to protect your important data, you're going to need to install it. You have two options: via BOSH (ideal for Cloud Foundry users), or via Docker.

Planning Your Installation

Before you start installing the software, it's worthwhile to take a step back and plan out your installation.

Network Topology

SHIELD operates best on flat network topologies, without NAT devices or HTTP(S) proxy services.

SHIELD requires mutual network visibility between the core and all cooperating agents. Each agent issues a small HTTP request to the core, to inform the core that it is alive, and ready to be inventoried. This is called the registration ping. For each registration ping received, the core records the name and port given and the remote address of the connecting TCP socket. At some later time, the SHIELD core will initiate an SSH connection to the recorded agent IP address, and gather agent information.

Because of this, NAT devices tend to confound SHIELD. The registration ping originates (at the TCP level) from the NAT gateway, not the host running the SHIELD agent software. When the core attempts to connect back to the agent, it initiates a connection to the NAT device on the agent port, which generally fails.
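
Here's a minimal sketch, in Python, of why the connect-back goes astray. It is not SHIELD's actual code: the RegistrationPing class, its field names, the callback_address function, and the example addresses are all made up for illustration. The only thing taken from the description above is that the core pairs the port from the ping with the remote address of the TCP socket.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationPing:
    """What the agent reports about itself (hypothetical field names)."""
    name: str
    port: int


def callback_address(remote_ip: str, ping: RegistrationPing) -> str:
    """Pair the TCP peer address seen by the core with the port from the ping."""
    return f"{remote_ip}:{ping.port}"


ping = RegistrationPing(name="db-agent", port=5444)

# Flat network: the TCP peer address is the agent's own address.
flat = callback_address("10.0.0.7", ping)

# Behind NAT: the TCP peer address is the gateway, so the core would
# later try to reach the gateway on the agent port.
natted = callback_address("203.0.113.1", ping)

print(flat)    # 10.0.0.7:5444
print(natted)  # 203.0.113.1:5444
```

In the second case nothing is listening on the gateway at that port (unless you have arranged a port forward), which is why the connection generally fails.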

Flat networks with HTTP(S) proxies are not impossible, but they can be unruly. When configuring proxy clients (via http_proxy / https_proxy, no_proxy environment variables, or similar mechanisms), you will want to be especially cognizant of the HTTP(S) connections needed by the SHIELD software itself. Often, these connections will need to bypass an Internet-bound proxy (e.g. one in a DMZ) in order to function.

Likewise, if your cloud storage solution is accessed over HTTP(S), you will need to make sure that either your proxy server can contact it on behalf of each SHIELD agent, or that each agent exempts the domains and/or IP addresses of the storage endpoints from proxying, via something like no_proxy.
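
To reason about which hosts will skip the proxy, a rough model of no_proxy matching can help. The sketch below is a simplified illustration in Python, not the logic of any particular HTTP client (real clients differ on wildcards, ports, and CIDR ranges). The function name bypasses_proxy and the host names are assumptions made up for the example.

```python
def bypasses_proxy(host: str, no_proxy: str) -> bool:
    """Return True if host matches an entry in a comma-separated no_proxy list.

    Simplified rule: an entry matches the host itself and any subdomain of it.
    A leading dot on the entry is ignored.
    """
    host = host.lower()
    for entry in no_proxy.split(","):
        entry = entry.strip().lower().lstrip(".")
        if not entry:
            continue
        if host == entry or host.endswith("." + entry):
            return True
    return False


no_proxy = "localhost,127.0.0.1,.internal.example.com"

# The SHIELD core lives on the internal network, so agents should reach it directly.
print(bypasses_proxy("shield.internal.example.com", no_proxy))  # True

# An external storage endpoint is not listed, so it would go through the proxy.
print(bypasses_proxy("storage.example.net", no_proxy))  # False
```

If the second case is not what you want, either add the storage endpoint to no_proxy, or confirm that the proxy can reach it.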

Where to Colocate SHIELD Agents

Depending on the data systems you wish to back up, and their configuration with respect to access control, you may be able to get away without colocating any SHIELD agents in your infrastructure.

There are really only two reasons for colocating a SHIELD agent on a data system installation: plugin requirements and host access control configuration.

Most SHIELD plugins stream their data through, without relying on any temporary local storage. This removes a throughput bottleneck (the disk), as well as a capacity concern (how much temporary space do you need?). Some plugins, however, require local disk. If the target system doesn't require local access (more on that in a moment), you may want to spin up some machines with large ephemeral disks just to handle these backup / restore operations. The conversation for backing up data then looks like this:

A Dedicated SHIELD Agent

Some plugins absolutely cannot be executed across the network. The Filesystem Plugin, for example, can only deal with files on the local filesystem (networked filesystems notwithstanding). Therefore, if you need to back up files on a host, you will need to deploy a SHIELD agent to run on that host.

Using BOSH

BOSH is a cloud-agnostic deployment and orchestration tool that excels at lifecycle management of software at all scales. SHIELD has a BOSH release that can be used to deploy both the SHIELD core, and SHIELD agents into new and existing BOSH deployments.

If you're already using BOSH (for example, if you are deploying Cloud Foundry), adding SHIELD into your infrastructure should be easy. If you are still looking for a great release engineering framework, you can get your feet wet with a SHIELD deployment or three.

The SHIELD BOSH release can be found on Github.

Deploying the SHIELD Core

Usually, the SHIELD core is a standalone, self-contained deployment. To deploy, you'll need to find or create a deployment manifest. A good starting point can be found here.

Save that file locally, as shield.yml, and then run:

$ bosh -e my-bosh deploy \
    -d shield \
    -v static_ip=192.0.2.5 \
    -v domain=shield.example.com \
    shield.yml

Replace shield.example.com with the FQDN of your SHIELD management console, and 192.0.2.5 with the static IP address on which you want to deploy SHIELD. You may need to consult your BOSH cloud-config to find a suitable IP in the default network. Optionally, you may modify the deployment manifest to specify a different network.

NOTE: The provided deployment manifest assumes that your BOSH director has been deployed with a config-server that can generate the necessary certificates and keys for securing SHIELD's communications. If that is not the case, you will need to provide additional command-line options to the bosh deploy command to store the generated credentials locally. See the BOSH documentation for more information.

Once BOSH has finished deploying SHIELD, you should be able to access the SHIELD management console at https://$IP. The default login will be admin (username) and shield (password).

Deploying SHIELD Agents

If you need to colocate agents on other BOSH deployments, you have a few options. The fastest method is to modify those deployment manifests to include the shield-agent job in the appropriate BOSH instance groups, like this:

instance_groups:
  - name: some-database
    jobs:
      # ... other jobs ....
      - name:    shield-agent
        release: shield

    # ... rest of configuration ...

This can get out of hand fast. A more elegant solution is to use BOSH runtime configs and inject the SHIELD agent job into other deployments without mucking about with their deployment manifests.

Here's a working runtime config:

---
releases:
  - name:    shield
    version: 8.0.8

addons:
  - name: shield
    jobs:
      - name:    shield-agent
        release: shield

To use this, update your existing runtime-config:

$ bosh update-runtime-config addons.yml

Then, bosh deploy your pre-existing manifests, without changing them. For more information, including how to limit the shield addon to just specific deployments / VMs, read the BOSH runtime-configs documentation.

Using Docker

The SHIELD Core Image

TBD

The SHIELD Agent Standalone Image

TBD

Embedding the SHIELD Agent

TBD

Configuration Reference

This section contains detailed descriptions of all configuration options for the SHIELD core, and SHIELD agents.

SHIELD Core Configuration File Reference

The SHIELD core configuration file is a YAML file, read at startup by the shieldd binary.

  • listen_addr - The IP address and TCP port that the SHIELD core daemon should bind to and listen on for incoming API (HTTP) requests. Defaults to *:8888. * is interpreted to mean all interfaces.

  • workers - How many worker threads the SHIELD core should spin up. This defaults to 2, but you should increase it to 1.5 times the number of concurrent backup tasks you expect to see at peak.

    Low worker counts can cause the SHIELD scheduler to "stall out" and not execute scheduled tasks in a timely fashion. The 1.5x multiplier accounts for purge operations, cloud storage tests, and other background tests.

  • debug - Whether or not to enable verbose debug logging. This is a boolean, and it defaults to no, which is a sane choice for any production or staging environment. Debug logging is verbose, and very low-level. It is of primary value to SHIELD developers.

  • data_directory - The absolute path to the directory where SHIELD will store all of its persistent data. Important files stored here include:

    • $data_directory/shield.db - The SHIELD metadata database.
    • $data_directory/vault/* - The encrypted files that back the vault.
    • $data_directory/vault.crypt - The encrypted file which stores the seal keys to the SHIELD Vault. This file is encrypted with the SHIELD master password.
    • $data_directory/bootstrap.log - A log of what occurred during a SHIELD from-nothing recovery.

  • fast_loop - The frequency, in seconds, of the SHIELD scheduler's "fast loop." On every iteration of the fast loop, SHIELD will schedule backup jobs that ought to run, execute pending tasks (if it has workers available), and handle inbound agent registration pings.

    By default, the fast loop executes once a second. Unless you have an urgent need otherwise, you shouldn't change this.

  • slow_loop - The frequency, in seconds, of the SHIELD scheduler's "slow loop." The slow loop handles administrative tasks for the SHIELD core, including archive expiration and purgation, session clearing, data analytics, and cloud storage testing.

    By default, the slow loop executes once every 300 seconds (5 minutes). Turning up the frequency will result in higher load on external cloud storage systems. Decreasing the frequency will cause expired archives to remain in cloud storage for longer.

  • web_root - The root path to the SHIELD web management UI assets. Defaults to the relative path web, which is probably not what you want.

  • environment - A name for the environment, which SHIELD will pass through to clients accessing its API and web management console. This can be useful for differentiating your staging SHIELD from your production SHIELD. By default, no environment is set.

  • color - You can color code your SHIELD Web User Interfaces! Set a hex value here (e.g. #003300) or another CSS-compatible color identifier, and the web UI will use it to colorize the environment name.

  • motd - A (hopefully) short message to display to operators on the login screen. You can use this for compliance messages, important notices, an explanation of which authentication method people should use, who to contact for help, etc. By default, there is no MOTD.

  • vault_address - The URL of the SHIELD Vault. This should almost always be https://127.0.0.1:8200. If you are using the BOSH release, this cannot be configured.

  • vault_ca_cert - The X.509 Certificate Authority certificate, PEM-encoded, for validating the Vault certificate. If you are using the BOSH release, this cannot be configured (nor does it need to be).

  • encryption_type - Which encryption algorithm and chaining mode to use for encrypting backup archives. Supported values are:

    • aes256-ctr - 256-bit AES, in Counter (CTR) mode.

    We plan to introduce more types as the need arises.

    Each backup archive tracks which encryption type was in force when it was taken, to allow operators to change this value without rendering previous backup archives unusable.

  • session_timeout - How long (in hours) before idle authenticated sessions are invalidated. Defaults to 720 (about a month).

  • failsafe - When the SHIELD core starts up, it checks the local users table. If it is empty (there are no local users), it creates a failsafe account, using these parameters. This is designed to assist in a safe and secure bootstrap.

    • username - The username of the failsafe account.

    • password - The (cleartext) password of the failsafe account.
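
A few of the numbers above are easier to sanity-check with a little arithmetic. The following Python sketch is only a back-of-the-envelope helper, not part of SHIELD. The function name suggested_workers is made up, and treating the default of 2 as a floor is an assumption on our part. The defaults themselves are the ones quoted in this reference.

```python
import math

# Defaults quoted from the reference above.
DEFAULT_WORKERS = 2
DEFAULT_SLOW_LOOP_SECONDS = 300
DEFAULT_SESSION_TIMEOUT_HOURS = 720


def suggested_workers(peak_concurrent_backups: int) -> int:
    """Apply the 1.5x rule of thumb, rounding up.

    Assumption: never suggest fewer workers than the default.
    """
    return max(DEFAULT_WORKERS, math.ceil(1.5 * peak_concurrent_backups))


print(suggested_workers(10))               # 15
print(DEFAULT_SLOW_LOOP_SECONDS / 60)      # 5.0 (minutes)
print(DEFAULT_SESSION_TIMEOUT_HOURS / 24)  # 30.0 (days)
```

So a SHIELD core that expects ten concurrent backups at peak would want workers set to 15, leaving headroom for purges and storage tests.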

SHIELD Agent Configuration File Reference

The SHIELD agent configuration file is a YAML file, read at startup by the shield-agent binary.

  • name (required) - The name of this agent, for registration with the SHIELD core. This name will appear in web and CLI interfaces, to people configuring backup jobs, and should describe the role this agent installation plays in the overall topology.

  • authorized_keys_file (required) - The path to an SSH authorized keys file, which should contain the public component of the agent private key that the SHIELD core will use to authenticate to the agent for remote orchestration.

  • listen_address - The IP address and TCP port that the SHIELD agent should bind to and listen on for incoming orchestration (via SSH). Defaults to *:5444. * is interpreted to mean all interfaces.

  • plugin_paths (required) - A YAML list of paths that the agent will use when attempting to resolve plugin names to binaries. This is kind of like the canonical UNIX $PATH environment variable, except it does not apply to any programs that the plugins themselves attempt to execute.

    You should list all of your plugin binary directories here.

  • registration - This subsection governs how this agent will register with its SHIELD core. While technically optional, registration is highly recommended, from an ease-of-use standpoint.

    The following keys exist underneath registration:

    • url - The HTTP(S) URL of the upstream SHIELD core. This will normally be something like https://$ip_or_hostname/.

    • interval - How often (in seconds) the agent should ping the SHIELD core and provide registration details. The SHIELD core determines when it validates agent registrations and extracts metadata information, so this setting cannot be used to increase the frequency of such updates.

    • shield_ca_cert - Path to a file containing the PEM-encoded CA certificate that issued the SHIELD core's X.509 TLS certificate. This allows operators to validate self-signed certificates, or custom, in-house CA-issued certificates.

    This has no effect if skip_verify is set to true.

    • skip_verify - Whether or not to disable verification of the SHIELD core X.509 TLS certificate. This defaults to false, since certificate verification is generally A Good Thing™.
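
If you generate agent configurations from a script, a quick pre-flight check can catch the most common omissions. Here's a small sketch in Python that works on an already-parsed configuration (a plain dict). It is not something the shield-agent binary provides: the function name check_agent_config, the wording of the messages, and the example values are all hypothetical. The key names and the required / optional split come from the reference above.

```python
# Keys marked (required) in the reference above.
REQUIRED_KEYS = ("name", "authorized_keys_file", "plugin_paths")


def check_agent_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config
    ]

    registration = config.get("registration")
    if registration is None:
        # Optional, but recommended from an ease-of-use standpoint.
        problems.append("no registration section; this agent will not register itself")
    elif registration.get("skip_verify", False) and "shield_ca_cert" in registration:
        problems.append("shield_ca_cert has no effect while skip_verify is true")

    return problems


example = {
    "name": "prod-postgres",
    "authorized_keys_file": "/etc/shield/authorized_keys",
    "plugin_paths": ["/usr/local/shield/plugins"],
    "registration": {
        "url": "https://shield.example.com/",
        "interval": 60,
        "skip_verify": True,
        "shield_ca_cert": "/etc/shield/ca.pem",
    },
}

for problem in check_agent_config(example):
    print(problem)
```

Run against the example above, this flags the redundant shield_ca_cert. Dropping skip_verify (so that it falls back to its default of false) clears the warning.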

Using SHIELD

SHIELD features a beautiful web user interface and a robust command-line interface. We like to think of the web UI as providing more visibility into the configuration of SHIELD, while the CLI provides more flexibility in terms of automation.

The Web UI

You can access the web UI by pointing your browser at the IP address of your SHIELD core installation. SHIELD forces all HTTP traffic over TLS, via port 443, for security reasons.

Logging In

Before you can interact with SHIELD, you must log in.

The SHIELD Web UI Login Screen

On the right is the login form for local authentication. On the left is a list of the configured authentication providers. These allow SHIELD administrators to integrate SHIELD authentication with 3rd-party, external identity systems like Github, or Cloud Foundry UAA.

Note: You may not have any authentication providers listed.

The Heads-up Display

All logged in? Great!

At the top of the screen, you should see the heads-up display:

The SHIELD Heads-Up Display

To the left is identifying information about this SHIELD core, including the configured SHIELD environment name, the IP address and/or fully-qualified domain, and the version of SHIELD.

The first pane summarizes the overall health of SHIELD and the current tenant's configuration.

  • SHIELD is ... - Reports the current status of the SHIELD API. If all is well, this will say SHIELD is up, in a reassuring green hue. If the SHIELD core is not responding to API calls, this will say SHIELD is DOWN, in red. Sometimes, it may report that SHIELD is locked, in which case an administrator needs to intervene to unlock it.

  • Cloud Storage is ... - Reports the health of all global cloud storage systems, as well as the health of all cloud storage specific to the currently selected tenant.

  • Jobs are ... - Reports the status of all backup jobs for the current tenant. It considers only the most recent execution of each job, whether it was scheduled or run manually (ad hoc).

The second pane, titled Data Protection Summary, provides some numbers for your consideration. All of these are per-tenant.

  • Scheduled Backup Jobs - How many total jobs are scheduled to run.

  • Backup Archives - How many backup archives exist.

  • Cloud Storage Used - How much cloud storage is being used by the backup archives for this tenant's jobs.

  • Daily Storage Increase - A simple linear projection of the amount of additional cloud storage that will be used, each day, given the current schedules, retention policies, and archive sizes.
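
To get a feel for what a linear projection like Daily Storage Increase might look like, here's a rough Python sketch. This is our own simplification, not the formula SHIELD uses: it multiplies how often each job runs by its average archive size, and it ignores retention policies entirely (so it only describes growth before any archives expire). The JobEstimate class and its fields are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEstimate:
    """Hypothetical summary of one backup job."""
    runs_per_day: float
    average_archive_bytes: int


def projected_daily_increase(jobs) -> float:
    """Sum of (runs per day x average archive size) across all jobs, in bytes."""
    return sum(job.runs_per_day * job.average_archive_bytes for job in jobs)


jobs = [
    JobEstimate(runs_per_day=1, average_archive_bytes=500),  # one daily job
    JobEstimate(runs_per_day=4, average_archive_bytes=100),  # one job, every six hours
]

print(projected_daily_increase(jobs))  # 900
```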

The heads-up display is partially dependent on the current tenant, so if you switch to a different tenant, you might get different numbers / statuses.

The Task Sidebar

To the left of the screen is a sidebar with links to the common tasks you may want to perform:

  • Run an ad hoc backup
  • Restore data from a backup
  • Configure a new backup job

The Top Bar

TBD

The Navigation Bar

The black navigation bar (immediately under the heads-up display) will stick to the top of the viewport as you scroll. It provides top-level navigation, including:

  • Systems - Your data systems are the things that SHIELD protects, by making copies of the important data contained within them, on a scheduled and recurring basis. This page lets you review and manage those systems.

  • Storage - Cloud storage is where SHIELD keeps the backup copies of your data. You can configure however many storage systems you want, in whatever configuration you deem appropriate. This page lets you access both global (shared) cloud storage systems, and those specific to your tenant.

  • Retention - SHIELD doesn't keep backup archives forever. Every backup job needs a retention policy so that SHIELD knows when it is okay to delete older archives, to conserve space. This page lets you review and manage those policies.

  • Admin - This one is for SHIELD administrators only. It provides access to a host of administrative functions like tenant and user management, global cloud storage management, etc.

The CLI

TBD

Configuring Backups

TBD

Wizard Walkthrough

TBD

The Systems Page

TBD

Adding a second schedule

TBD

Running Backups / Restores

TBD

Ad hoc vs. Scheduled

TBD

The Ad hoc backup Wizard

TBD

The Timeline View

TBD

Annotating Tasks

TBD

Restoring from the Timeline Page

TBD

The Restore Wizard

TBD

Cloud Storage

TBD

How SHIELD Uses Cloud Storage

TBD

Retention Policies

TBD

How the HUD interacts

TBD

Storage Thresholds

TBD

The Storage Display Page

TBD

Shared Storage

TBD

Multi-Tenancy

TBD

What is a Tenant?

TBD

Switching Tenants

TBD

Role Assignments (and what they mean)

TBD

Authentication Providers and Tenants

TBD

UAA

TBD

Github

TBD

The default tenant

TBD

Encryption

TBD

What is Encryption

TBD

Why do I Care?

TBD

How does SHIELD use encryption?

TBD

At-rest vs. in-flight encryption

TBD

Administration

TBD

Initializing A SHIELD Core

TBD

The Master Password

TBD

The Administrative Backend

TBD

Tenants

TBD

Shared Storage

TBD

Retention Policy Templates

TBD

Managing Agents

TBD

Authentication Providers

TBD

Local User Management

TBD

Rekeying SHIELD

TBD

Session Management

TBD

How Do I Backup X?

SHIELD Itself

The Problem With Encryption

TBD

The Important Bits of SHIELD

TBD

Configuring the fs Plugin

TBD

"Normal-mode" Restores

TBD

Cloud Foundry UAA

Cloud Foundry CCDB

BOSH

Monitoring SHIELD

TBD

Using the HUD

TBD

API Access for Monitoring

TBD

Metrics of interest

TBD

Log messages to watch

TBD