Now is a great time to start thinking about deploying a new Chef Server. Exciting new cloud deployment options such as AWS OpsWorks, Marketplace images for Azure & AWS, as well as high-scale options including Chef Backend for On-Prem HA and the AWS Native Chef Server, make now the perfect time to modernize the backbone of your automation and take advantage of the great new visualizations and reporting capabilities of Chef Automate.
In this article we’ll guide you through the easiest way to migrate your data over to new Chef infrastructure with the least impact to your users.
Knife-ec-backup creates a full object-based export of your Chef Server (organizations, users, cookbooks, nodes, clients and keys, roles, ACLs, and so on) and can restore it to another Chef Server. It was modelled after knife-backup in the sense that it downloads all objects as JSON (except cookbooks, which turn back into cookbook files), but it has been extended to understand many Enterprise Chef (and Chef Server 12) concepts including organizations, users and ACLs.
Unlike file-based backups (chef-server-ctl backup), which are optimized for speed, knife-ec-backup was designed to maximize compatibility and to let admins clean up errors and bad data after the export. The downside is that knife-ec-backup can be quite a bit slower than other backup strategies. Knife-ec-backup is a Ruby gem that has undergone significant improvements over the past 12 months, so make sure you install the latest gem before proceeding.
Knife-tidy is an essential sidekick tool for Chef Server migrations. Think of it as the Robin to knife-ec-backup’s Batman.
Knife-tidy has several modes of operation:
knife tidy backup clean can validate and eliminate errors within a knife-ec-backup data set. Over time Chef Server has become better about validating data, but doesn’t know how to clean up existing objects that were stored within it. This function fixes those objects for you so that they will cleanly import into the latest Chef server release.
knife tidy server report will check your server for unused nodes and cookbook versions, which are normally the largest objects in your Chef Server data. It identifies unused cookbook versions by evaluating the run_lists of all the nodes and environment constraints, therefore providing a high degree of safety in its recommendations.
knife tidy server clean is like server report but will take action for you, removing the unused nodes and cookbooks.
Editor’s note: Many improvements have gone into our products to address issues encountered when migrating data from older Chef Servers with less stringent data validation. We encourage you to use the latest versions of our packages wherever possible.
$ sudo apt-get -y install gcc postgresql libpq-dev
$ sudo yum -y install gcc postgresql-devel
$ sudo yum -y install https://download.postgresql.org/pub/repos/yum/9.5/redhat/rhel-6-x86_64/pgdg-redhat95-9.5-2.noarch.rpm
$ sudo yum -y install gcc postgresql-devel postgresql95-devel
$ curl -L https://chef.io/chef/install.sh | sudo bash -s -- -P chefdk
$ sudo /opt/chefdk/embedded/bin/gem install knife-ec-backup -- --with-pg-config=/opt/opscode/embedded/postgresql/<PG_VERSION>/bin/pg_config
$ sudo /opt/chefdk/embedded/bin/gem install knife-tidy
You’ve launched your new Chef Server (or cluster) and it seems to be working. But there are steps you should take before putting that new Chef Server into production:
chef-server-ctl test command.
And remember that they’re not backups until you’ve validated you can successfully restore from them!
For monitoring your Chef Server, there is no better resource than this ChefConf 2016 talk titled Monitoring and Tuning your Chef Server.
The more unused objects (e.g., nodes, clients, cookbooks and organizations) you can identify and remove beforehand, the faster your migration will be and the shorter the maintenance window you will require.
As mentioned above, knife tidy server clean can be used to clean up your existing Chef server before migration. It is strongly recommended that you:
Use --dry-run mode to see what actions it will take.
knife tidy server clean may need to make two passes in order to effectively clean out unused cookbooks. That’s because as more stale nodes are removed, the calculated list of needed cookbook versions is likely to shrink.
It is also a good practice to maintain a list of important and needed Chef organizations, and to regularly audit and prune your organization list. Listing and removing organizations is easily accomplished with the chef-server-ctl Org Management commands.
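The dry-run and two-pass advice above could be scripted roughly as follows. This is only a sketch under stated assumptions: the KNIFE path and the CONFIRM switch are names invented for this example, and it assumes that knife tidy server clean acts on the findings of the most recent knife tidy server report, so the report is refreshed before each pass. Check the knife-tidy documentation for your installed version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: preview, then clean, in two passes. KNIFE is an assumed install path.
KNIFE=${KNIFE:-/opt/chefdk/embedded/bin/knife}

tidy_two_passes() {
  local pass
  for pass in 1 2; do
    echo "== pass $pass =="
    # Assumption: clean works from the latest report, so regenerate it each pass.
    "$KNIFE" tidy server report || return
    # Preview only: show what would be removed.
    "$KNIFE" tidy server clean --dry-run || return
    # CONFIRM=yes is a hypothetical safety switch defined by this sketch,
    # not a knife-tidy feature. Without it, nothing is removed.
    if [ "${CONFIRM:-no}" = yes ]; then
      "$KNIFE" tidy server clean || return
    fi
  done
}
```

Running tidy_two_passes on its own only previews; running it as CONFIRM=yes tidy_two_passes performs the removals, with the second pass picking up cookbook versions that became unused once stale nodes were gone.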
Your migration will happen in two phases: the Initial Transfer phase, and the Synchronization phase (which consists of many small “catch up” syncs). Those familiar with the Unix rsync tool will find this process to be identical in concept.
The Initial Transfer phase can be pretty slow during both the backup and restore phases. It’s strongly recommended that you use a shell session manager like tmux or screen to maintain your session in the event your computer is disconnected. Taking this one step further, you might configure those tools to capture all of the session history, or use a tool like script or tee to do that for you.
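As one way to capture session output with tee, here is a minimal sketch of a wrapper you might define inside your tmux or screen session. The function name run_logged and the LOG_DIR location are made up for this example and are not part of any Chef tooling.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a long migration command and keep a timestamped
# copy of everything it prints. LOG_DIR is an assumed location.
LOG_DIR=${LOG_DIR:-$HOME/chef-migration-logs}

run_logged() {
  # Usage: run_logged <label> <command> [args...]
  mkdir -p "$LOG_DIR"
  local log="$LOG_DIR/$(date +%Y%m%d-%H%M%S)-$1.log"
  shift
  # 2>&1 sends errors to the same log; pipefail (set only inside this
  # subshell) preserves the command's exit status through the pipe to tee.
  ( set -o pipefail; "$@" 2>&1 | tee "$log" )
}
```

For example, after starting a session with tmux new -s migration, you could run run_logged backup followed by the knife ec backup command line, and the full output would be kept under LOG_DIR even if your terminal disconnects.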
Initial backup and restore process:
$ /opt/chefdk/embedded/bin/knife ec backup my_backup_destination --with-user-sql --with-key-sql --concurrency 20 -c /etc/opscode/pivotal.rb
$ /opt/chefdk/embedded/bin/knife tidy backup clean --backup-path my_backup_destination
$ /opt/chefdk/embedded/bin/knife ec restore my_backup_destination --with-user-sql --with-key-sql -c /etc/opscode/pivotal.rb
The Synchronization phase uses exactly the same steps as the Initial Transfer phase. Optionally, you can add the --purge flag to knife ec backup (but not on restore*) to delete objects in the backup folder that have been deleted on the source Chef Server.
The Synchronization phase is much shorter than a full transfer, because only the changed objects need to be transferred. Schedule periodic syncs using cron during this phase while you plan your final cut-over. The time it takes to complete one full synchronization cycle will determine the length of the maintenance window (Chef server downtime) needed for the cut-over.
*Using --purge while restoring can have unintended effects because of the way Chef Server de-duplicates cookbook files between versions of a cookbook.
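A scheduled sync might look something like the sketch below, which repeats the three commands from the Initial Transfer phase, adds --purge to the backup step only, and prints how long the cycle took so you can estimate the cut-over window. The KNIFE, BACKUP_DIR and LOCK_DIR variables, the script path in the crontab comment, and the 4-hour schedule are all assumptions for illustration. The -c paths simply mirror the commands shown earlier; the restore step must use a configuration that points at your destination server.

```shell
#!/usr/bin/env bash
# Sketch of one "catch up" sync, suitable for running from cron.
# KNIFE, BACKUP_DIR and LOCK_DIR are assumptions; adjust for your hosts.
KNIFE=${KNIFE:-/opt/chefdk/embedded/bin/knife}
BACKUP_DIR=${BACKUP_DIR:-my_backup_destination}
LOCK_DIR=${LOCK_DIR:-/tmp/chef-sync.lock}

sync_once() {
  # mkdir is atomic, so it doubles as a lock against overlapping cron runs.
  mkdir "$LOCK_DIR" 2>/dev/null || { echo "sync already running" >&2; return 75; }
  local start=$SECONDS status=0
  # --purge is passed to the backup step only, never to restore.
  "$KNIFE" ec backup "$BACKUP_DIR" --with-user-sql --with-key-sql \
      --concurrency 20 --purge -c /etc/opscode/pivotal.rb &&
    "$KNIFE" tidy backup clean --backup-path "$BACKUP_DIR" &&
    "$KNIFE" ec restore "$BACKUP_DIR" --with-user-sql --with-key-sql \
      -c /etc/opscode/pivotal.rb || status=$?
  rmdir "$LOCK_DIR"
  echo "sync finished in $((SECONDS - start))s (exit $status)"
  return "$status"
}

# Example crontab entry (hypothetical script path), every 4 hours:
# 0 */4 * * * /usr/local/bin/chef-sync.sh >> /var/log/chef-sync.log 2>&1
```

Saved as a script that ends with a call to sync_once, this gives you a log line per cycle with the elapsed time. The lock is there because a slow sync that overlaps the next scheduled run would otherwise have two processes writing to the same backup folder.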
The cut-over phase is essentially a Synchronization phase but with two additional steps added:
/etc/chef/trusted_certs to all the clients so that they will be trusted.
To migrate your Chef Server you will be installing the latest gems for knife-ec-backup and knife-tidy. If you’re using Enterprise Chef 11 or older versions of Chef Server 12, you may have gem dependency conflicts when installing into your Chef Server’s Ruby environment. To avoid this, we recommend installing ChefDK on your Chef server and installing the gems into its Ruby environment (chef gem install knife-ec-backup knife-tidy).
In order to accelerate the migration of large clusters, it may be possible to parallelize the backup and restore across multiple frontends. Leveraging a shared filesystem such as NFS or EFS also helps by reducing duplication.
This script provides an example for a parallelized restore operation, spreading the load in similarly sized grouped batches across the number of frontends you wish to utilize.
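Separately from that script, the idea of similarly sized batches can be illustrated with a small sketch. It assumes the backup keeps one directory per organization under organizations/ inside the backup folder, and it uses size on disk as a rough stand-in for restore effort. The function name plan_batches is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: assign each organization in a knife-ec-backup data set
# to one of N frontends so the batches are roughly equal in size on disk.
# Assumes one directory per org under <backup>/organizations/ and no
# whitespace in the backup path.
plan_batches() {
  local backup_dir=$1 frontends=$2
  # du -sk prints "<kilobytes> <path>" per org directory; sort largest first.
  du -sk "$backup_dir"/organizations/*/ | sort -rn |
    awk -v n="$frontends" '
      {
        # Greedy: give the next-largest org to the least-loaded frontend.
        best = 1
        for (i = 2; i <= n; i++) if (load[i] + 0 < load[best] + 0) best = i
        load[best] += $1
        path = $2
        sub(/\/$/, "", path)            # drop any trailing slash
        m = split(path, parts, "/")     # last path component is the org name
        print best, parts[m]
      }'
}

# Usage on frontend 1 of 3 (--only-org is assumed to limit a restore to one
# organization; confirm with knife ec restore --help for your gem version):
# plan_batches my_backup_destination 3 | awk '$1 == 1 { print $2 }' |
#   while read -r org; do
#     knife ec restore my_backup_destination --only-org "$org" \
#       --with-user-sql --with-key-sql -c /etc/opscode/pivotal.rb
#   done
```

Each output line is a frontend number followed by an organization name. Largest-first greedy assignment is a simple heuristic rather than an optimal split, but it keeps one very large organization from landing in the same batch as several other large ones.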
It’s important to monitor server loads and API response times during the backup and restore phases. Consider adding a dedicated frontend if a backup puts unacceptable strain on your production cluster. To speed restores, consider using more powerful servers or instance sizes temporarily during the migration process. In both backup and restore cases, tiered and clustered systems have a significant performance advantage over standalones.
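For a rough view of API response times while a backup or restore is running, a small probe like the following may help. It is a sketch, not a monitoring solution: the CHEF_URL hostname, the 2-second default threshold and the 30-second timeout are assumptions, and it only times a single request to the server's _status endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical probe: time a request to the Chef Server status endpoint and
# flag slow responses. CHEF_URL and the thresholds are assumptions.
CHEF_URL=${CHEF_URL:-https://chef.example.com/_status}

is_slow() {
  # is_slow <seconds> <threshold>: succeeds when seconds > threshold.
  # awk does the comparison because the shell cannot compare decimals.
  awk -v t="$1" -v max="$2" 'BEGIN { exit !(t > max) }'
}

probe() {
  local t
  # On timeout or connection failure, treat the request as maximally slow.
  t=$(curl -s -o /dev/null --max-time 30 -w '%{time_total}' "$CHEF_URL") || t=30
  if is_slow "$t" "${1:-2}"; then
    echo "$(date '+%F %T') SLOW ${t}s"
  else
    echo "$(date '+%F %T') ok ${t}s"
  fi
}
```

A loop such as while sleep 60; do probe 2; done | tee -a api-times.log, run alongside uptime or your usual load monitoring, gives a simple timeline to compare against the backup and restore runs.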
Prior to Chef Server 12, there were two separate packages for Open Source and Enterprise Chef Servers. As of Chef Server 12, these were unified into a single open source package. If you are migrating data from an Open Source Chef 11 Server, check out our notes for upgrading from Open Source Chef 11.
Update: As of knife-ec-backup 2.4.0, there are no longer restrictions on restoring knife-ec-backup data sets from multiple Chef Servers with user SQL records into a single Chef Server.
If you’re using the SQL options (--with-user-sql --with-key-sql) on restore, then a couple of scenarios are not possible:
Large numbers of database key errors are a signal that this has happened, and you may need to start the process over.