Secure Compressed Data


SRB Data Security and Compression System

Last updated May 29, 2003
Wayne Schroeder

This is a brief description of the SRB Data Security/Encryption/Compression System which was implemented for SRB 2.1 for release in late May 2003.

Currently, this system has had limited testing on Linux, AIX, and SunOS. It should also work on other versions of Unix, and a Windows version would be feasible if needed. Of course, you should thoroughly test it in your own environment before putting it into production, but we expect it to be as reliable as OpenSSL, which is an excellent and highly reliable package.

Our initial plan was to use GSI in the SRB to provide network encryption but, as outlined below, we revised the plan in a manner that provides more security, better performance, and compression as well as encryption.

A key idea for this system is to encrypt (and/or compress) on the client side and store the files in that form. George Kremenek originally conceived of this months ago; I later thought of the same approach, and we then convinced the rest of the SRB team and our first customers for this, BIRN.

We were planning to implement changes in Sput and Sget for this, but later decided to implement it as scripts on top of Sput and Sget (and other commands). This script approach vastly simplified the implementation, making it feasible for the 2.1 timeframe. It also creates a much more flexible system. Performing the compression and/or encryption in Sput itself (and the decompression and/or decryption in Sget) would give slightly higher performance by avoiding some disk I/O, and can be done in the future if needed. However, that approach would also have a higher chance of encountering problems, since it would use the OpenSSL libraries instead of the OpenSSL commands.

We chose to include the compression capability because, when encrypting data, it is often quicker overall to compress it first, since there is then less data to encrypt. Also, there may be many cases in which compression would be effective but encryption is not needed, so this system can support either one or both.
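
The reason compression has to come before encryption can be illustrated with a small sketch. It is not part of the SRB scripts; the sample data is made up, and random bytes are used as a stand-in for ciphertext on the assumption that good cipher output looks random.

```python
import gzip
import os


def gzip_size(data: bytes, level: int = 6) -> int:
    """Return the size in bytes of `data` after gzip compression."""
    return len(gzip.compress(data, compresslevel=level))


# Repetitive text stands in for a typical compressible file (made-up content).
plain = b"subject=0001 scan=T1 voxel=1.0mm status=ok\n" * 2000

# Random bytes stand in for ciphertext of the same length (an assumption:
# the output of a good cipher is statistically close to random data).
cipher_like = os.urandom(len(plain))

print("original bytes:         ", len(plain))
print("gzip of plain text:     ", gzip_size(plain))
print("gzip of ciphertext-like:", gzip_size(cipher_like))
```

The plain text shrinks to a small fraction of its size, so the encryption step has far less data to process, while the ciphertext-like data does not shrink at all. Compressing after encrypting would therefore gain nothing.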

The Sput.pl script stores one or more files after encrypting and/or compressing. Sget.pl retrieves one or more files that were stored with Sput.pl, getting the attributes that Sput.pl had stored, and decompressing and decrypting if needed. An MD5 hash of the original file is verified.

By default, both Sput.pl and Sget.pl will display timing and other information as the sequence progresses. To disable that, use the -q (quiet) option. For more messages, use the -v (verbose) option. As with the Sput and Sget utilities, users need to run Sinit first to establish an SRB session.

By default, Sput.pl will both compress and encrypt. Use -c none to override this and only encrypt, or -e none to only compress.


The Sput.pl command line is:
   Sput.pl [-h] [-v] [-q] [-c compress_type] [-e encrypt_type] file(s)
   Allowed encryption types are rc5 rc5-cbc rc4 bf none
   Allowed compression types are gzip gzip-1 gzip-9 none
   By default, encryption is: bf
   By default, compression is: gzip
   -h help
   -v verbose
   -q quiet
   gzip-1 is fast, gzip-9 compresses better (but is slower), gzip is in between
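
As a rough illustration of the three compression settings, the sketch below maps the Sput.pl compression names onto gzip levels. The mapping is an assumption (in particular, plain "gzip" is mapped to level 6, the default of the gzip command-line tool), and the function names are hypothetical; the actual script may invoke gzip differently.

```python
import gzip

# Assumed mapping from Sput.pl compression names to gzip levels.
# "gzip" is mapped to 6, the default level of the gzip command-line tool.
COMPRESS_LEVELS = {"gzip-1": 1, "gzip": 6, "gzip-9": 9}


def compress(data: bytes, compress_type: str = "gzip") -> bytes:
    """Compress `data` according to a Sput.pl-style compression name."""
    if compress_type == "none":
        return data
    if compress_type not in COMPRESS_LEVELS:
        raise ValueError(f"unknown compression type: {compress_type}")
    return gzip.compress(data, compresslevel=COMPRESS_LEVELS[compress_type])


def decompress(data: bytes, compress_type: str) -> bytes:
    """Reverse compress(); the gzip level is not needed to decompress."""
    if compress_type == "none":
        return data
    if compress_type not in COMPRESS_LEVELS:
        raise ValueError(f"unknown compression type: {compress_type}")
    return gzip.decompress(data)


# Compare the stored size for each setting on some made-up sample data.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 500
for name in ("none", "gzip-1", "gzip", "gzip-9"):
    print(name, len(compress(sample, name)))
```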

The Sget.pl command line is:
   Sget.pl [-h] [-v] [-q] file(s)
   -h help
   -v verbose
   -q quiet

For more information on the encryption types see the OpenSSL commands and openssl.org.

This data security system is built by including "--enable-secure-comm" on the configure line. It requires the OpenSSL libraries (static and dynamic) and command-line utilities. OpenSSL, via both its libraries and utilities, does all of the encryption for both the bulk data encryption and the secure communication. OpenSSL is part of the GSI (Grid Security Infrastructure) system, and is distributed with Globus and NMI. As with GSI, the parent location of these OpenSSL components can be specified via "--enable-globus-location=path" on the configure line. If you have not installed Globus, you can instead specify the location of OpenSSL directly via "--enable-openssl-location=path". OpenSSL is an excellent package and is fairly easy to install; see openssl.org. You will probably need to update Sput.pl and Sget.pl for the location of perl, the SRB commands, and/or OpenSSL on your system; each of these is set near the top of each script.

If you use only compression, these two configure options are not needed. The Sput.pl and Sget.pl scripts should be able to perform the compression and decompression functions without them and without access to the OpenSSL libraries or commands.

The Sput.pl script implements options to encrypt and/or compress data, and stores the data in that form on the server side via the Sput command. Sput.pl also calculates an MD5 hash (like a checksum) for the original file and stores it in the MCAT (via secure communication). For encrypted files, Sput.pl generates a random key, encrypts the data via an OpenSSL command using that key, and stores the key and related information in the MCAT. Sget.pl determines from information in the MCAT whether a file is encrypted and/or compressed, and performs the reverse steps to reconstruct the original file. Sget.pl then verifies the MD5 hash of the reconstructed file against the stored value.
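
The store-and-verify flow described above can be sketched as follows. This is a simplified illustration, not the scripts' actual code: a plain dictionary stands in for the MCAT, the encryption step is left out, and the function and field names (`put`, `get`, `"md5"`, `"compression"`) are hypothetical.

```python
import gzip
import hashlib


def put(data: bytes, catalog: dict, name: str) -> bytes:
    """Return the stored form of `data` and record its attributes.

    `catalog` is a plain dict standing in for the MCAT (an assumption).
    """
    catalog[name] = {
        "md5": hashlib.md5(data).hexdigest(),  # hash of the ORIGINAL file
        "compression": "gzip",
    }
    return gzip.compress(data, compresslevel=6)


def get(stored: bytes, catalog: dict, name: str) -> bytes:
    """Reverse put() and verify the MD5 hash of the reconstructed file."""
    attrs = catalog[name]
    data = gzip.decompress(stored) if attrs["compression"] == "gzip" else stored
    if hashlib.md5(data).hexdigest() != attrs["md5"]:
        raise ValueError(f"MD5 mismatch for {name}")
    return data


catalog = {}
original = b"example file contents\n" * 100
stored = put(original, catalog, "example.dat")
restored = get(stored, catalog, "example.dat")
print(restored == original)
```

Because the hash is taken over the original file rather than the stored form, a successful check covers the whole round trip, including decompression (and, in the real scripts, decryption).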

Containers, for now, will not be encrypted or compressed.

The transfer of the file key between the client and the MCAT-enabled server is done securely via new functions and libraries implemented in the SRB code, known as the Srb Secure Communication system. These keys are transferred in encrypted form and stored in plain text in the MCAT database.

The Srb Secure Communication (SSC) system is enabled with the above-mentioned "--enable-secure-comm" on the configure line. SSC uses RSA and Blowfish (both patent-free) to set up a session key between the client and the MCAT-enabled server. There are two new client/server calls and a set of library routines. See the comments at the beginning of the library routines in srbSecureComm.c for a description. SgetD, SmodD, and the server-side code have been modified to use SSC to exchange the field that contains the file encryption key, if --enable-secure-comm has been selected.

Via the Sput.pl command line, users can select from various encryption algorithms (all of which are performed by OpenSSL commands) and compression algorithms (currently gzip, gzip-1 (light), and gzip-9 (strong)). We had expected to support various levels of encryption (trading off less security for higher performance, or vice versa), but the OpenSSL commands do not seem to allow that. If needed, we could provide such a service via a simple utility using the OpenSSL RC5 libraries (which do include an API to specify the number of rounds for RC5 encryption).
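
To give a feel for how an encryption type might translate into an OpenSSL command, the sketch below builds (but does not run) an `openssl enc` argument list for each type. It is only an assumption about how such a wrapper could look: the function names, the 16-byte key length, and the `SRB_FILE_KEY` environment variable are hypothetical, and the real scripts may pass the key differently. Note also that some OpenSSL builds omit or disable these older ciphers.

```python
import secrets

# Sput.pl encryption names mapped to `openssl enc` cipher options.
CIPHER_FLAGS = {"bf": "-bf", "rc4": "-rc4", "rc5": "-rc5", "rc5-cbc": "-rc5-cbc"}

# Hypothetical environment variable used to hand the key to openssl, so that
# the key does not appear in the process list.
KEY_ENV_VAR = "SRB_FILE_KEY"


def new_file_key(nbytes: int = 16) -> str:
    """Generate a random per-file key as hex text (key length is an assumption)."""
    return secrets.token_hex(nbytes)


def openssl_enc_argv(encrypt_type: str, src: str, dst: str, decrypt: bool = False):
    """Build the argument list for `openssl enc`; return None for type "none"."""
    if encrypt_type == "none":
        return None
    if encrypt_type not in CIPHER_FLAGS:
        raise ValueError(f"unknown encryption type: {encrypt_type}")
    mode = "-d" if decrypt else "-e"
    return ["openssl", "enc", CIPHER_FLAGS[encrypt_type], mode,
            "-in", src, "-out", dst, "-pass", f"env:{KEY_ENV_VAR}"]


key = new_file_key()
print(" ".join(openssl_enc_argv("bf", "data.gz", "data.gz.enc")))
print(" ".join(openssl_enc_argv("bf", "data.gz.enc", "data.gz", decrypt=True)))
```

To actually run such a command, the argument list could be passed to `subprocess.run` with `KEY_ENV_VAR` set to the generated key in the child environment.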

We had investigated using GSI to encrypt just the data transmissions, but there would be a number of difficulties. Since each side of each GSI link would be doing the CPU-intensive task of encrypting or decrypting, computational bottlenecks would frequently occur on the servers. Also, we would have to change the source code for the client-to-server, server-to-server, and parallel server-to-client transfers to interact with GSI.

As implemented, the client performs a similar amount of work as it would if GSI had been used. But on the server side, our implementation requires very little additional work. See below.

Advantages over a GSI implementation:

  • For encryption, the SRB servers will not have to do any decryption for incoming files or re-encryption for outgoing files, which avoids a serious bottleneck under even moderate load. This approach will scale, as the computationally intensive work is pushed out to all the clients.
  • Encryption will apply to all storage resources, including HPSS. Since we don't control the protocol to/from HPSS (for example) we can't implement GSI encryption on the link. But since we'll only be storing encrypted data, the transfer to/from HPSS will be secure.
  • Encrypted files will be more secure on the server and any storage resource, since they will be stored encrypted. If the data storage system is compromised, only encrypted data will be available. Intruders would also need to break into the MCAT database to get the keys or use brute force computational attacks to break the ciphers.
  • The data encryption is fail-safe, in the sense that if a failure occurs the data will most likely still be kept secure. Once the data leaves the client it is encrypted, and never decrypted until retrieved. We don't have to consider and handle every case in which the SRB moves data, as it is already encrypted.
  • Combining encryption and compression will often be higher-performance. Since encryption is CPU-intensive, and compression somewhat less so, there will be many situations where it will be much quicker to compress and then encrypt. And compressed data will be quicker to transmit across the network. Compression is fast and effective on certain types of data (image and text in particular).
  • Compressed data will take less space on the storage system.
  • Users can select compression without encryption. In many cases, this will improve effective network bandwidth.
  • Data that is compressed and encrypted may be somewhat harder to break via brute-force computational attacks than data that is just encrypted.
  • Although this is not GSI, we're still leveraging GSI-related software, as we make use of the encryption routines that are part of the GSI libraries (OpenSSL).
  • The computing expense is pushed up to the client host. Not only is this more efficient (as noted in another item), but it distributes the cost to the client user's machine where it "belongs".
  • Users have quite a bit of flexibility and can choose between many of the capabilities of the OpenSSL system.
  • It is easy to add additional compression and/or encryption algorithms, as needed.
  • These capabilities were straightforward to implement because the SRB is an integrated solution: the SRB team controls the client, server, MCAT, communication protocol, and transport mechanisms. By making small changes in the communication (Srb Secure Communication (SSC)), some of the client utilities (SgetD and SmodD), and the server code (to use SSC), utilizing existing MCAT fields, and building a layer (the scripts) on top of these and the OpenSSL commands, we've quickly created a simple, reliable, and powerful data security system. (This was a moderate amount of work, but probably easier than implementing GSI encryption on each of the types of transfers we handle.)
  • In the future, we may also be able to include lighter levels of encryption so that some data, which doesn't need to be so secure, can be more quickly encrypted/decrypted. The number of rounds performed for RC5 determines (in large part) the quality of the encryption and is almost linearly proportional to the compute time (the more rounds, the stronger the encryption and the more CPU time needed). We could have an RC5-light, RC5-medium, and RC5-strong. The OpenSSL library allows three RC5 rounds settings: 8, 12, and 16, with 12 being the default. If the OpenSSL commands were to include a setting for this (our currently installed version apparently does not), then we could easily include it. Alternatively, we could write a program that uses the OpenSSL library to do so. Then users or SRB admins would be able to make the computational/security trade-off choices themselves, without the SRB team's direct involvement.
  • OpenSSL may be able to utilize certain encryption accelerator boards. If so, it would only have to be configured on the client host to be effective.

Disadvantages:

  • For encrypted data, if the MCAT is lost, so is the data. This seems to be an acceptable risk given the advantages. It would be possible to retrieve encrypted files without an MCAT if the file key is available. SRB admins can utilize DBMS features to safeguard the MCAT. Concerned SRB admins or users can also record the key after displaying it with an SgetD command.