This is not urgent, but you may encounter SSL verification errors when using vagrant directly, or vagrant through test kitchen.
Special Thanks to Joe Damato of Package Cloud for spending his time debugging this issue with me the other day.
TL;DR, We found a bug in our bento boxes where the SSL certificates for AWS S3 couldn’t be verified by openssl and yum on our CentOS 5.11, CentOS 6.6, and Fedora 21 “bento” boxes because the VeriSign certificates were removed by the upstream curl project. Update your local boxes. First remove the boxes with vagrant box remove
, then rerun test kitchen or vagrant in your project.
We publish Chef Server 12 packages to a great hosted package repository provider, Package Cloud. They provide secure, properly configured yum and apt repositories with SSL, GPG, and all the encrypted bits you can eat. In testing the chef-server cookbook for consuming packages from Package Cloud, I discovered a problem with our bento-built base boxes for CentOS 5.11, and 6.6.
[2015-02-25T19:54:49+00:00] ERROR: chef_server_ingredient[chef-server-core] (chef-server::default line 18) had an error: Mixlib::ShellOut::ShellCommandFailed: packagecloud_repo[chef/stable/] (/tmp/kitchen/cache/cookbooks/chef-server-ingredient/libraries/chef_server_ingredients_provider.rb line 44) had an error: Mixlib::ShellOut::ShellCommandFailed: execute[yum-makecache-chef_stable_] (/tmp/kitchen/cache/cookbooks/packagecloud/providers/repo.rb line 109) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of yum -q makecache -y --disablerepo=* --enablerepo=chef_stable_ ----
...SNIP
File "/usr/lib64/python2.4/urllib2.py", line 565, in http_error_302
...SNIP
File "/usr/lib64/python2.4/site-packages/M2Crypto/SSL/Connection.py", line 167, in connect_ssl
return m2.ssl_connect(self.ssl, self._timeout)
M2Crypto.SSL.SSLError: certificate verify failed
What’s going on here?
We’re attempting to add the Package Cloud repository configuration and rebuild the yum cache for it. Here is the yum configuration:
[chef_stable_]
name=chef_stable_
baseurl=https://packagecloud.io/chef/stable/el/5/$basearch
repo_gpgcheck=0
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-packagecloud_io
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
Note that the baseurl is https – most package repositories probably aren’t going to run into this because most use http. The thing is, despite Package Cloud having a valid SSL certificate, we’re getting a verification failure in the certificate chain. Let’s look at this with OpenSSL:
$ openssl s_client -CAfile /etc/pki/tls/certs/ca-bundle.crt -connect packagecloud.io:443
CONNECTED(00000003)
depth=3 /C=SE/O=AddTrust AB/OU=AddTrust External TTP Network/CN=AddTrust External CA Root
verify return:1
depth=2 /C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Certification Authority
verify return:1
depth=1 /C=GB/ST=Greater Manchester/L=Salford/O=COMODO CA Limited/CN=COMODO RSA Domain Validation Secure Server CA
verify return:1
depth=0 /OU=Domain Control Validated/OU=EssentialSSL/CN=packagecloud.io
verify return:1
... SNIP
SSL-Session:
Verify return code: 0 (ok)
Okay, that looks fine, why is it failing when yum runs? The key is in the python stack trace from yum:
File "/usr/lib64/python2.4/urllib2.py", line 565, in http_error_302
Package Cloud actually stores the packages in S3, so it redirects to the bucket, packagecloud-repositories.s3.amazonaws.com. Let’s check that certificate with openssl:
$ openssl s_client -CAfile /etc/pki/tls/certs/ca-bundle.crt -connect packagecloud-repositories.s3.amazonaws.com:443
depth=2 /C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
verify error:num=20:unable to get local issuer certificate
verify return:0
---
Certificate chain
0 s:/C=US/ST=Washington/L=Seattle/O=Amazon.com Inc./CN=*.s3.amazonaws.com
i:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at https://www.verisign.com/rpa (c)10/CN=VeriSign Class 3 Secure Server CA - G3
1 s:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at https://www.verisign.com/rpa (c)10/CN=VeriSign Class 3 Secure Server CA - G3
i:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
2 s:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5
i:/C=US/O=VeriSign, Inc./OU=Class 3 Public Primary Certification Authority
...SNIP
SSL-Session:
Verify return code: 20 (unable to get local issuer certificate)
This is getting to the root of the matter as to why yum was failing. Why is it failing though?
As it turns out, the latest CA certificate bundle from the curl project appears to have removed two of the Versign certificates, which are used by AWS for https://s3.amazonaws.com.
But wait, why does this matter? Shouldn’t CentOS have the ca-bundle.crt file that comes from the openssl package?
$ rpm -qf ca-bundle.crt
openssl-0.9.8e-27.el5_10.4
Sure enough. What happened?
$ sudo rpm -V openssl
S.5....T c /etc/pki/tls/certs/ca-bundle.crt
Wait a second, why is the file different? Well this is where we get back to the TL;DR. In our bento boxes for CentOS, we had a line in the ks.cfg that looked like this:
wget -O/etc/pki/tls/certs/ca-bundle.crt http://curl.haxx.se/ca/cacert.pem
I say past tense because we’ve since removed this from the ks.cfg on the affected platforms and rebuilt the boxes. This issue was particularly perplexing at first because the problem didn’t happen on our CentOS 5.10 box. The point in time when that box was built, the cacert.pem bundle had the VeriSign certificates, but they were removed when we retrieved the cacert.pem for 5.11 and 6.6 base boxes.
Why were we retrieving the bundle in the first place? It’s hard to say – that wget line has always been in the ks.cfg for the bento repository. At some point in time it might have been to work around invalid certificates being present in the default package from the distribution, or some other problem. The important thing is that the distribution’s package has working certificates, and we want to use that.
So what do you need to do? You should remove your opscode-centos
vagrant boxes, and re-add them. You can do this:
for i in opscode-centos-5.10 opscode-centos-5.11 opscode-centos-6.6 opscode-centos-7.0 opscode-fedora-20
do
vagrant box remove $i
done
Then wherever you’re using those boxes in your own projects – cookbooks with test-kitchen for example, you can simple rerun test kitchen and it will download the updated boxes.
If you’d like to first check if your base boxes are affected, you can use the test-cacert cookbook. With ChefDK 0.4.0:
% git clone https://github.com/jtimberman/test-cacert-cookbook test-cacert
% cd test-cacert
% kitchen test default