This is a guest post by our friend Tom McLaughlin, Engineering Advocate at Threat Stack. It was first published on the Threat Stack blog on February 22, 2017.
One of the challenges of building open source tools is figuring out how to package and distribute them. This is particularly true with web services. To make building, deploying, and running web services easier, Chef created Habitat.
When building open source web services for Threat Stack, one of our concerns is how to package these Python Flask applications so they run in the widest array of environments with low adoption friction. Using Habitat, the process is quick and easy.
For this post, we’re going to focus on the specifics of packaging a Python Flask application and the particular needs of that stack.
Note: If you are not familiar with Habitat, take a look at Chef’s own series on why they created it:
- Why Habitat? – Plans and Packages, Part 1
- Why Habitat? – Plans and Packages, Part 2
- Why Habitat? – The Supervisor and Run Lifecycle
The Habitat tutorial also explains how to install Habitat and then package and deploy a generic application: https://www.habitat.sh/tutorials/
Our Application
The application we are going to package is a simple RESTful web service. It is written in Python using the Flask microframework, and it is served by the Gunicorn WSGI HTTP server. The deploy process would be something like this:
- Get the application code to the host.
- Install Python dependencies.
- Launch Gunicorn, pointing it to the config file and application.
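For comparison, a hypothetical manual version of those steps might look like this (the repository URL is from this post; the module and config file names are assumptions about this service):

```shell
# Hypothetical manual deploy of the service; assumes git, Python,
# virtualenv, and pip are already on the host.
git clone https://github.com/threatstack/threatstack-to-s3.git
cd threatstack-to-s3
virtualenv venv               # keep dependencies out of the system Python
. venv/bin/activate
pip install -r requirements.txt
# Launch Gunicorn, pointing it at the config file and the application:
gunicorn -c gunicorn.conf.py --bind 0.0.0.0:8080 threatstack-to-s3
```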
Depending on your familiarity with the Python stack and your existing tool pipeline, this could be quick and easy, or it could take several hours to figure out. Since getting this service deployed and usable as quickly as possible is key to gaining user adoption, streamlining the process is very important.
Habitat Configuration
Using Habitat starts with writing the plan.sh, init hook, and run hook files. In most Habitat examples these files live in the repository root, but in ours they are kept under the build/ directory so they stay separate from the application code:
$ tree threatstack-to-s3/build/
threatstack-to-s3/build/
├── hooks
│   ├── init
│   └── run
└── plan.sh
plan.sh
The plan.sh file is a bash shell script used by Habitat to drive package building. The first part of the file is the basic configuration for packaging the service:
pkg_name=threatstack-to-s3
pkg_description='Archive alerts from Threat Stack to S3'
pkg_version=0.1.0
pkg_origin=tmclaugh
pkg_maintainer='Tom McLaughlin'
pkg_license=('MIT')
# we copy in the source code in the `unpack` phase and need to put
# something here due to https://github.com/habitat-sh/habitat/issues/870
pkg_source="fake"
pkg_build_deps=(core/virtualenv)
pkg_deps=(core/coreutils core/python2)
pkg_exports=([http]=8080)
pkg_exposes=(http)
The first six lines are basic packaging information that describes the service. The pkg_source variable typically points to a remote source archive; here it is set to fake because you are building from the existing source tree and Habitat still requires the variable to be set.
The pkg_build_deps and pkg_deps arrays define the Habitat packages needed to build and to run the application, respectively. Packages listed in pkg_build_deps are available only at build time and are not included in the final Habitat package. If you are doing anything more complicated than Hello World in Python, you will need third-party Python modules, so at build time you create a Python virtual environment and install modules into it so they are included in your Habitat package. You don't have to include the core/virtualenv package in your runtime, however. At runtime you only need core/coreutils for file copy and linking operations and core/python2 to provide the Python interpreter and runtime libraries.
The pkg_exports and pkg_exposes variables define the ports that should be exposed by Habitat. In this case, only TCP port 8080 is exposed. This is the port that is defined in hooks/run for the gunicorn process to bind to. These need to match.
There is also an optional pkg_svc_user variable, which this package does not set. If you have a user account that all your services run as, set it to that user; otherwise the value defaults to hab. Whichever value is used, the user must exist on the host in order to start a service from a Habitat package.
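As a preflight sketch, you can check for the account before starting the service (the useradd hint is one common way to create it, not a Habitat requirement):

```shell
# hab is the default pkg_svc_user; the supervisor will not start the
# service if this account is missing on the host.
if id hab >/dev/null 2>&1; then
  echo "service user hab exists"
else
  echo "service user hab missing; create it, e.g. sudo useradd -r hab"
fi
```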
With the package variables defined, next come the plan’s callback functions. These are executed by Habitat during the packaging process:
# Need to opt-out of all of these steps, as we're copying in source code
do_download() {
  return 0
}

do_verify() {
  return 0
}

do_clean() {
  return 0
}

do_unpack() {
  # Because our Habitat files live under build/.
  PROJECT_ROOT="${PLAN_CONTEXT}/.."
  mkdir -p $pkg_prefix
  build_line "Copying project data to $pkg_prefix/"
  cp -r $PROJECT_ROOT/app $pkg_prefix/
  cp -r $PROJECT_ROOT/*.py $pkg_prefix/
  cp -r $PROJECT_ROOT/requirements.txt $pkg_prefix/
}

do_build() {
  return 0
}

do_install() {
  cd $pkg_prefix
  virtualenv venv
  source venv/bin/activate
  pip install -r requirements.txt
}
Since you are not downloading any source, do_download(), do_verify(), and do_clean() do not need to do anything. do_unpack() is where you copy your service's files into $pkg_prefix; if you were downloading a remote source archive, this is where you would decompress it. Python is not a compiled language, so do_build() is empty as well. Finally, in do_install(), you change to the packaging directory you populated in do_unpack() and install the Python modules from requirements.txt via pip. (You may be savvy and catch a bug here. We'll get to that.)
hooks/init & hooks/run
The hooks/init file initializes the application. For this application, you symlink the package contents into the service's var/ directory under the Habitat service root. With more complicated applications, you might do something like initialize a database schema. Notice the first two lines: they are very important for making package errors easier to debug. The shebang line passes -xe to sh, which enables script tracing and causes the script to exit immediately on error. The second line redirects stderr to stdout for all commands in the script. If you don't do these two things, you may be left scratching your head trying to understand why your service is failing to start:
#!/bin/sh -xe
exec 2>&1
ln -fs {{pkg.path}}/app {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/venv {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/gunicorn.conf.py {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/config.py {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/logging.conf {{pkg.svc_var_path}}
ln -fs {{pkg.path}}/threatstack-to-s3.py {{pkg.svc_var_path}}
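To see what those two lines buy you, here is a throwaway script (not part of the package) that fails partway through; with -xe, the last traced line in the log points directly at the broken command:

```shell
# Write a hook-style script that fails midway.
cat > /tmp/demo-hook <<'EOF'
#!/bin/sh -xe
exec 2>&1
echo "step 1 ok"
false
echo "never reached"
EOF
chmod +x /tmp/demo-hook
# The trace (-x) lands in the log because of `exec 2>&1`; -e stops the
# script at `false`, so "never reached" is absent from the log.
/tmp/demo-hook > /tmp/demo-hook.log || echo "hook exited non-zero"
cat /tmp/demo-hook.log
```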
The hooks/run file is what is executed to start the service. The script changes directory to where you symlinked files and starts the gunicorn process. Importantly, you tell gunicorn to bind to the port you exposed in the plan.sh:
#!/bin/sh -xe
exec 2>&1
cd {{pkg.svc_var_path}}
gunicorn -c gunicorn.conf.py --bind 0.0.0.0:8080 threatstack-to-s3
Building & Deploying
With your Habitat configuration in place, you can build a package and start the service. From the root of the threatstack-to-s3 repository, run:
$ hab pkg build build/
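If the build succeeds, the studio leaves the .hart artifact and a small metadata file about it under a results/ directory (the default location; exact contents may vary by Habitat version):

```shell
# Inspect the build output; last_build.env records the artifact name.
ls results/*.hart
cat results/last_build.env
```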
All your Habitat files are stored under the build/ directory, which is why that is supplied at the end of the command. With the package built, you can start the service as shown below. Remember that the pkg_svc_user, which defaults to hab, must be available:
$ sudo ./hab start tmclaugh-threatstack-to-s3-0.1.0-20170220224242-x86_64-linux.hart
hab-sup(MN): Starting tmclaugh/threatstack-to-s3/0.1.0/20170220224242
hab-sup(CS): tmclaugh/threatstack-to-s3/0.1.0/20170220224242 is not installed
→ Using core/acl/2.2.52/20161208223311
→ Using core/attr/2.4.47/20161208223238
→ Using core/bzip2/1.0.6/20161208225359
→ Using core/cacerts/2016.09.14/20161031044952
→ Using core/coreutils/8.25/20161208223423
→ Using core/gcc-libs/5.2.0/20161208223920
→ Using core/glibc/2.22/20160612063629
→ Using core/gmp/6.1.0/20161208212521
→ Using core/libcap/2.24/20161208223353
→ Using core/linux-headers/4.3/20160612063537
→ Using core/make/4.2.1/20161214000256
→ Using core/ncurses/6.0/20161213233720
→ Using core/openssl/1.0.2j/20161214012334
→ Using core/python2/2.7.12/20161214012727
→ Using core/readline/6.3.8/20161213234107
→ Using core/sqlite/3130000/20161214012650
→ Using core/zlib/1.2.8/20161118033245
✓ Installed tmclaugh/threatstack-to-s3/0.1.0/20170220224242
★ Install of tmclaugh/threatstack-to-s3/0.1.0/20170220224242 complete with 1 new packages installed.
hab-sup(MR): Butterfly Member ID 2608f6668ce8417e96b1b068db8cb146
hab-sup(MR): Starting butterfly on 0.0.0.0:9638
hab-sup(MR): Starting http-gateway on 0.0.0.0:9631
threatstack-to-s3.default(SR): Initializing
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/app /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/gunicorn.conf.py /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/config.py /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/logging.conf /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + ln -fs /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/threatstack-to-s3.py /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default hook[init]:(HK): + exec
threatstack-to-s3.default(SV): Starting process as user=hab, group=hab
threatstack-to-s3.default(O): + cd /hab/svc/threatstack-to-s3/var
threatstack-to-s3.default(O): + source venv/bin/activate
threatstack-to-s3.default(O): ++ deactivate nondestructive
threatstack-to-s3.default(O): ++ unset -f pydoc
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ '[' -n /bin/sh ']'
threatstack-to-s3.default(O): ++ hash -r
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ unset VIRTUAL_ENV
threatstack-to-s3.default(O): ++ '[' '!' nondestructive = nondestructive ']'
threatstack-to-s3.default(O): ++ VIRTUAL_ENV=/hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv
threatstack-to-s3.default(O): ++ export VIRTUAL_ENV
threatstack-to-s3.default(O): ++ _OLD_VIRTUAL_PATH=/hab/pkgs/core/coreutils/8.25/20161208223423/bin:/hab/pkgs/core/python2/2.7.12/20161214012727/bin:/hab/pkgs/core/acl/2.2.52/20161208223311/bin:/hab/pkgs/core/attr/2.4.47/20161208223238/bin:/hab/pkgs/core/bzip2/1.0.6/20161208225359/bin:/hab/pkgs/core/glibc/2.22/20160612063629/bin:/hab/pkgs/core/libcap/2.24/20161208223353/bin:/hab/pkgs/core/make/4.2.1/20161214000256/bin:/hab/pkgs/core/ncurses/6.0/20161213233720/bin:/hab/pkgs/core/openssl/1.0.2j/20161214012334/bin:/hab/pkgs/core/sqlite/3130000/20161214012650/bin:/hab/pkgs/core/busybox-static/1.24.2/20161214032531/bin:/sbin:/bin:/usr/sbin:/usr/bin
threatstack-to-s3.default(O): ++ PATH=/hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv/bin:/hab/pkgs/core/coreutils/8.25/20161208223423/bin:/hab/pkgs/core/python2/2.7.12/20161214012727/bin:/hab/pkgs/core/acl/2.2.52/20161208223311/bin:/hab/pkgs/core/attr/2.4.47/20161208223238/bin:/hab/pkgs/core/bzip2/1.0.6/20161208225359/bin:/hab/pkgs/core/glibc/2.22/20160612063629/bin:/hab/pkgs/core/libcap/2.24/20161208223353/bin:/hab/pkgs/core/make/4.2.1/20161214000256/bin:/hab/pkgs/core/ncurses/6.0/20161213233720/bin:/hab/pkgs/core/openssl/1.0.2j/20161214012334/bin:/hab/pkgs/core/sqlite/3130000/20161214012650/bin:/hab/pkgs/core/busybox-static/1.24.2/20161214032531/bin:/sbin:/bin:/usr/sbin:/usr/bin
threatstack-to-s3.default(O): ++ export PATH
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ '[' -z '' ']'
threatstack-to-s3.default(O): ++ _OLD_VIRTUAL_PS1=
threatstack-to-s3.default(O): ++ '[' x '!=' x ']'
threatstack-to-s3.default(O): +++ basename /hab/pkgs/tmclaugh/threatstack-to-s3/0.1.0/20170220224242/venv
threatstack-to-s3.default(O): ++ PS1='(venv) '
threatstack-to-s3.default(O): ++ export PS1
threatstack-to-s3.default(O): ++ alias pydoc
threatstack-to-s3.default(O): ++ '[' -n /bin/sh ']'
threatstack-to-s3.default(O): ++ hash -r
threatstack-to-s3.default(O): + gunicorn -c gunicorn.conf.py --bind 0.0.0.0:8080 threatstack-to-s3
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23685] [INFO] Starting gunicorn 19.6.0
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23685] [INFO] Listening at: http://0.0.0.0:8080 (23685)
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23685] [INFO] Using worker: gevent
threatstack-to-s3.default(O): [2017-02-20 22:44:58 +0000] [23690] [INFO] Booting worker with pid: 23690
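Once the supervisor reports Gunicorn listening on 8080, a quick smoke test from the host confirms the service is answering (the root path and the expectation of an HTTP response are assumptions about this service):

```shell
# Print just the HTTP status code returned by the service.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/
```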
Exporting Other Package Formats
Habitat’s native package format is great, but may not suit your needs. For instance, it requires that you have Habitat pre-installed on the host that the package is going to run on. It also requires that its dependencies be downloaded when the service first starts. You may prefer to export a tarball that contains the application, all its dependencies, and the Habitat binary for running the application. Or, you might want to use your existing Docker infrastructure.
To export a tarball, run the following command:
$ hab pkg export tar tmclaugh/threatstack-to-s3
To run the application, transfer the tarball to the host that the application will run on, and do the following:
$ sudo tar zxvf tmclaugh-threatstack-to-s3-0.1.0-20170221015341.tar.gz -C /
$ sudo /hab/bin/hab start tmclaugh/threatstack-to-s3
If you are using Docker instead, export a Docker container using the following command and deploy the container as you would any other container in your infrastructure:
$ hab pkg export docker tmclaugh/threatstack-to-s3
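The exported image is tagged with the package's origin/name, so it can be run like any other container; a sketch (the port mapping matches the pkg_exposes port):

```shell
# Map the container's exposed port 8080 to the host and start the
# Habitat supervisor inside the container.
docker run -p 8080:8080 tmclaugh/threatstack-to-s3
```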
Final Words...
So there you have it — a streamlined, effective, and fast way of packaging a Python Flask web service using Chef Habitat. Using Habitat is an excellent way of getting your service deployed and usable as quickly as possible, thereby boosting the likelihood of user adoption.
To see the final code in our repository, click here: https://github.com/threatstack/threatstack-to-s3/tree/phase_3_habitat
Also, a big thanks goes out to Mike Fielder whose blog post, GitHub repo, and time on Slack got me started.
Additionally, the Habitat community Slack helped me with some remaining questions.