Qik-n-EZ: Using a Shell’s STDOUT in a Chef ruby_block

Chef gets a bad rap for being “hard” — especially when compared to Ansible .. This is especially true when developers of Chef cookbooks don’t understand the two-pass model .. A common question amongst new Chef cookbook developers goes something like this: “I want to run a shell command and capture its output (i.e. STDOUT) .. I then want to loop through that shell command’s output to run other Chef resources, and I want those other Chef resources to be aware of the shell command’s output .. How can I do that ??” .. Sometimes it’s easier to use bullets:

  • run a shell command
    • the output of the shell command == A, B, C
  • loop through the shell command’s output one at a time
    • A
    • B
    • C
  • run other Chef resources per loop item
    • other Chef resources are aware of the loop item value

Often, the new Chef cookbook developer will try to run a Ruby loop in the raw, which executes during the compile phase, before any resource has actually run .. This is a big no-no .. Instead, you should run the shell command, using shell_out, in a Chef ruby_block resource .. This gives you access to the shell command’s output at execution time, where you can manipulate it using good ol’ fashioned Ruby, and then “notify” other Chef resources as desired ..

One thing to be aware of: any resource property whose value can’t be known until the execution phase of the chef-client run must be lazily evaluated .. Alan Thatcher has a nice writeup here about the hows and the whys ..

Enough talk — show me the code !!
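A minimal sketch of why the phase matters, written as plain Ruby so it runs without chef-client .. It is a toy model, not Chef itself: `run_state` stands in for `node.run_state`, the lambdas stand in for resources such as `ruby_block`, and `Open3.capture2` stands in for `shell_out` .. Every name in it is an assumption made for illustration ..

```ruby
require 'open3'

# Toy model of Chef's two-pass run. This is NOT Chef: every name below is
# hypothetical and only mimics the compile/converge split.
run_state = {} # stands in for node.run_state

# --- compile phase: resources are only declared, nothing executes yet ---

# Plays the role of the ruby_block that shells out and captures STDOUT.
capture_output = lambda do
  stdout, status = Open3.capture2('printf', 'A\nB\nC\n')
  raise 'shell command failed' unless status.success?

  run_state[:items] = stdout.split("\n")
end

# A raw Ruby read at compile time sees nothing, because the command
# above has not run yet.
compile_time_items = run_state[:items]

# Plays the role of a later resource with a lazily evaluated property:
# the lookup happens inside the lambda, so it runs at converge time.
use_output = lambda do
  run_state[:messages] = run_state[:items].map { |item| "handling #{item}" }
end

# --- converge phase: resources execute in the order they were declared ---
[capture_output, use_output].each(&:call)

puts "compile-time view: #{compile_time_items.inspect}"
puts "converge-time view: #{run_state[:messages].inspect}"
```

In a real recipe the same shape would be a ruby_block that stores the parsed output, followed by resources that read it through lazy properties ..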

So, is Chef “hard” ?? I would argue no — but it sure is nuanced, and for good reason ..

Qik-n-EZ: Secret Ohai Directory in Chef Cookbooks

Do you code Chef Cookbooks ?? Do you think Ohai plugins are so awesome that you’ve created some of your own ?? Do you really, really dislike having to depend on a community cookbook to deploy your custom Ohai plugins ?? Well, then I have some good news !!

There’s a secret top-level cookbook directory where you can put your custom Ohai plugins that will be synced and reloaded during a chef-client run .. Want to guess the name of this directory ?? My guess is you will only need one ..

./<cookbook name>/ohai/<ohai plugin>

Why is it a secret ?? Well, my guess is they simply forgot to document it, since it’s been an available feature since the 13.x days: https://github.com/chef-boneyard/chef-rfc/blob/master/rfc059-ohai-cookbook-segment.md


The ohai directory is not so secret anymore — see here ..

Free Idea: Ansible + Jenkins x AWS CodeBuild = Infinite Scale (sorta)

As you know, I like me some Ansible, AWS, and Jenkins .. Did you know it’s not uncommon to use Ansible + Jenkins as an “automation platform” to manage your cloud infrastructure ?? I do this a lot — it’s easy, reusable, and works !! Think about the workflow:

  • Ansible code is committed to the Git repo
  • Jenkins job is triggered
  • Jenkins pulls code to a local workspace
    • don’t forget to make it shallow
  • Jenkins executes “shell” build step based on defined parameters
  • Ansible playbook executes against defined inventory
  • (allthethings) are automated

Given the above workflow, your Jenkins job might look something like this:

[Image: Jenkins job]

That said, our Jenkins node has been running hot for the past 3 months (CPU >=90%, increased RAM usage, etc.), and our knee-jerk reaction has been to scale vertically .. Yes, we know how to set up a Jenkins slave and have done this in the past, but there has to be an easier way to consume “transient infrastructure” (provision, configure, execute, destroy) .. We looked into integrating Jenkins with Lambda, but it no longer looked like normal Ansible “code”, so we punted on that .. There has to be a way to run a Jenkins job on some transient infrastructure without having to: 1) redo your workflow, and 2) care too much ..

Let’s Google — “jenkins aws plugin” .. Hello AWS CodeBuild !!

Here’s how AWS CodeBuild describes itself: “AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers.” My big takeaway: “you don’t need to provision, manage, and scale your own build servers” .. Do a simple substitution on that last statement (s/build/automation/) and you can see where I am going with this ..

Check out the plugin .. It states: “Instead of sending your build jobs to Jenkins build nodes, you use the plugin to send your build jobs to AWS CodeBuild.” What’s a build job ?? Commands .. What’s a command ?? CLI .. Just look at the examples they give us .. So what’s stopping me from baking a Docker image that has Ansible installed and using that as my custom build environment ?? Nothing .. All that’s left (<<famous last words) is to create a legitimate buildspec.yml and away you go:

Here’s a nice article that will show you how to set up and use the Jenkins AWS CodeBuild plugin ..

Purging Dead AWS Route53 Records via Ansible

We aren’t a big shop, but we use AWS Auto Scaling groups .. That means nodes spin up and down all day long .. Got some traffic ?? Add nodes .. Traffic is low ?? Drop nodes .. Someone accidentally terminates a node ?? Add nodes .. Someone sneezes funny ?? Drop nodes ..

This goes on all day long to get as close to “right-sizing” our infrastructure as we can ..

We also name our nodes using a simple formula for various monitoring and orchestration purposes .. For example:



Yes, I understand an argument could be made this is an anti-pattern .. “Greg, you don’t need to name the stinking node .. It is cattle, nobody cares !!” .. I get it, I really do, but sometimes it’s nice just to have a name ..

So then, with all this spinning up and down of nodes and the related creation of Route53 records, you can end up with a lot of dead entries .. For whatever reason, you may wish to purge these dead entries, and OBVIOUSLY you do NOT want to do this manually .. So what to do ??

Well, I might have a solution for you .. Go ahead and check out this GitHub repo: https://github.com/gkspranger/aws-route53-purge-dead-records .. It’s a simple playbook and role that will purge AWS Route53 records for a given hosted zone and a known naming pattern, while making sure to NOT delete records of nodes that are currently running ..
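The selection logic behind a purge like this boils down to a pattern filter plus a set difference .. Here is a minimal sketch in plain Ruby, which is not the playbook from the repo: the hostnames, the `NAME_PATTERN` regex, and the `dead_records` helper are all hypothetical, and no AWS API is called ..

```ruby
require 'set'

# Hypothetical naming pattern for autoscaled nodes; the real formula is
# whatever your shop uses.
NAME_PATTERN = /\Aweb\d+\.example\.net\z/

# Returns the record names that match the pattern and do NOT belong to a
# node that is currently running. Pure logic only: fetching the records
# and the running instances from AWS is left out on purpose.
def dead_records(record_names, running_names, pattern)
  running = running_names.to_set
  record_names.select { |name| name.match?(pattern) && !running.include?(name) }
end

# Hypothetical inputs: all records in the zone, and the nodes still alive.
records = %w[web1.example.net web2.example.net web3.example.net db1.example.net]
running = %w[web2.example.net]

to_purge = dead_records(records, running, NAME_PATTERN)
puts to_purge.inspect
```

Records that don’t match the naming pattern (db1 here) are never candidates, and anything still running is always kept, so both safety checks live in one place ..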

WARNING: It can cause damage, so please be sure to review and understand what is going on with the playbook and role ..

Dear Nagios

Dear Nagios,

I’m sorry, but it’s over ..

No, no, no .. It’s not you, it’s me .. I’ve changed — let me explain ..

You’ve been great these past 7 years .. Better than great: you basically defined what monitoring means to me .. Yes, we’ve both said things we regret, but we always ended up back together ..

This time, it’s different ..

I found someone ..

Their name is TIG (Telegraf, InfluxDB, Grafana) ..

STOP THAT !! THAT’S NOT NICE .. Yes, they’re younger than you, but it’s not about that .. NO !! NO !! NO !! I’m not going through a midlife crisis .. Like I said earlier, I’ve changed .. Simply put, I now care more about metrics than I do monitoring ..

Yeesss, I know you offer performance data .. You know I know that .. We’ve dated for almost a decade — let’s at least be honest with one another ..

Uh huh .. Uh huh .. Uh huh .. No, not this time, not ever .. Uh huh .. Uh huh .. Uh huh .. Believe me, I wish there was a way, I just don’t see it happening ..

OK !! THAT’S ENOUGH !! This is getting abusive now — I’m going to leave ..

Thank you for being with me ..

Thank you for teaching me ..

But it’s time to go ..

Splunk Host Tags

Did you know you can tag a host in Splunk ?? I didn’t !! Do you know how much time tags would have saved me from having to craft a most excellent Splunk search to capture just the right hosts ?? Me neither — but I’m guessing it’s a lot ..

So instead of my searches looking like this:

# get all staging RMI nodes -- hard
index=* ( host=rmi1.s.* OR host=rmi2.s.* OR host=rmi3.s.* ) source=*tomcat* earliest=-1h

They can now look like this:

# get all staging RMI nodes -- easy
index=* tag=rmi tag=stage source=*tomcat* earliest=-1h

I know, I know — I could achieve the same level of excellence using targeted indexes (index=rmi_stage) and/or various regex filters .. Some of that, unfortunately, is out of my control ..

OK .. So how can you manage this without having to use the GUI ?? Easy !! You just need to drop a config file in the proper location (for me it’s: /opt/splunk/etc/system/local/tags.conf) on the search head, and away you go .. The syntax is pretty basic:

# tagging host login.example.net with PROD, TOMCAT, and LOGIN
[host=login.example.net]
prod = enabled
tomcat = enabled
login = enabled
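If you manage many hosts, the file is easy to generate .. Here is a minimal sketch in plain Ruby, assuming a hypothetical `host_tags` hash and `render_tags_conf` helper, and assuming the tags.conf stanza form of `[host=<hostname>]` followed by `<tag> = enabled` lines .. It only builds the text; getting it onto the search head is left to your tooling ..

```ruby
# Hypothetical input: hostname => list of tags to enable.
host_tags = {
  'login.example.net' => %w[prod tomcat login],
  'rmi1.s.example.net' => %w[stage rmi]
}

# Builds tags.conf text: one [host=<name>] stanza per host, one
# "<tag> = enabled" line per tag, stanzas separated by a blank line.
def render_tags_conf(host_tags)
  stanzas = host_tags.map do |host, tags|
    (["[host=#{host}]"] + tags.map { |tag| "#{tag} = enabled" }).join("\n")
  end
  stanzas.join("\n\n") + "\n"
end

puts render_tags_conf(host_tags)
```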

Below’s a nice little example of how I automated this using Ansible (big surprise there 🙂) and the EC2 dynamic inventory script ..

He Shoots .. He Fails !!

For the past 6 months I have been working on a lean startup .. It was (<<infer all you want here) a chatbot interface for a financial services CRM, and it did not end well 😦 That said, I did some of my best Ansible/AWS work, and my server-side JavaScript (Node.js) skills and understanding of the Hubot internals improved exponentially ..

So I want to share !!

I can’t get into too many details, but the overall concept was that every customer would be running a micro instance with our custom Hubot code installed .. This instance would pull code updates, if any, every 5 minutes and infrastructure updates, if any, every 15 minutes ..  In addition, a customer could participate in pilot programs — AKA branch work ..

I really liked how I was able to avoid the need for a “command node” and just run Ansible locally and on a schedule .. Also, I was able to automate pretty much everything, from VPC creation all the way to autoscaling groups ..

Anyway, here’s the link: https://github.com/gkspranger/failed-chatbot .. Maybe it will help one of you out there in Internets land ..