Have Big Data…Will Travel

database-schema-1895779_640

I have been working on a project lately that requires me to have access to a large database (about 1 Million rows).  On my recent trip to Scotland I knew I would have time on the plane to work on the project but didn’t want to pay for airline wifi that would be slow anyway.

(Note: This was anonymized data so it was OK to make it portable.  Always make sure you are not moving PII/PHI from a secure container to a non-secure one.  Even email addresses and social media handles are going to be considered PII when GDPR regulations activate next year.)

My solution: spin up a virtual Ubuntu server running MySQL.  I put the server on a USB 3.0 drive to save room on my MacBook Air and it worked perfectly.  The server version (no GUI) of Ubuntu runs on very little resources so it does not affect the performance of my computer.  I then used MySQL Workbench to access the data, run queries, etc.  The beauty of MySQL workbench is that I can create and save new schemas, import and export data, and save the workspace at anytime so that I can come back to it.

Here are the basics of my setup:

  1. Host Machine: MacBook Air with 4GB (any Windows, Mac or Linux machine will work)
  2. Virtualization Software: VirtualBox by Oracle (Free – https://www.virtualbox.org/)
  3. Ubuntu 16.04 Server ISO (Free – https://www.ubuntu.com/)
  4. MySQL Workbench Software (Free – https://www.mysql.com/products/workbench/)
  5. USB 3.0 128GB Memory Stick

There are a few things that will get you through your install a little faster:

  1. Add a Host Only adapter to the server in VirtualBox.  This allows communication to your database from your host machine.  Here is a blog post that explains Host Only adapters in VirtualBox: https://blogs.oracle.com/scoter/virtualbox-host-only
  2. For security purposes, disable the other network adapter unless you need to give the server internet access temporarily for updates, etc.
  3. If you are going to have sensitive data on your virtual server configure drive encryption during the install of your Ubuntu server.
  4. Configure your Ubuntu server to install LAMP during the install.  This saves you from installing MySQL manually.
  5. Modify your /mysqld.cnf file according to this article: http://www.configserverfirewall.com/ubuntu-linux/enable-mysql-remote-access-ubuntu/
    Personally I bind it to the IP address of my machine on the Host Only network instead of allowing anyone access.  This allows me to skip step #2 but still keep my MySQL data inaccessible from anywhere but my machine.
  6. Configure the firewall (UFW) to allow traffic on 3306
    sudo ufw allow 3306

That should be all you need to setup a virtual MySQL server for traveling with big data.