Qik-n-EZ: Using a Shell’s STDOUT in a Chef ruby_block

Chef gets a bad rap for being “hard” — especially when compared to Ansible .. This is particularly true when cookbook developers don’t understand Chef’s two-pass model (compile first, then converge) .. A common question amongst new Chef cookbook developers goes something like this: “I want to run a shell command and capture its output (i.e. STDOUT) .. I then want to loop through that shell command’s output to run other Chef resources, and I want those other Chef resources to be aware of the shell command’s output .. How can I do that ??” .. Sometimes it’s easier to use bullets:

  • run a shell command
    • the output of the shell command == A, B, C
  • loop through the shell command’s output one at a time
    • A
    • B
    • C
  • run other Chef resources per loop item
    • other Chef resources are aware of the loop item value

Often, the new Chef cookbook developer will try to run a raw Ruby loop at the top level of the recipe, which will execute during the compile phase — this is a big no-no .. Instead, you should run the shell command, using shell_out, inside a Chef ruby_block resource .. This gives you access to the shell command’s output at converge time, where you can manipulate it using good ol’ fashioned Ruby and then “notify” other Chef resources as desired ..
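
To make the pitfall concrete, here is a minimal sketch of that anti-pattern (the command and resource names are just for illustration):

# ANTI-PATTERN -- do NOT do this ..
# raw Ruby at the top level of a recipe executes during the compile phase,
# so the command runs (and its output is captured) before any other
# resource in the run list has actually converged
`ls /var/`.split("\n").each do |line|
  execute "say hello to #{line}" do
    command "echo 'hello #{line}'"
  end
end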

One thing to be aware of: make sure you lazily evaluate any resource property whose value can’t be known until the execution phase of the chef-client run .. Alan Thatcher has a nice writeup here about the hows and the whys ..
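
To see why that matters here, below is a minimal sketch (using the same node.run_state key as the full example that follows) of what happens with and without lazy:

# WITHOUT lazy -- the string is built at compile time, when
# node.run_state[:output] is still nil, so every run just echoes 'hello '
command "echo 'hello #{node.run_state[:output]}'"

# WITH lazy -- evaluation is deferred until the resource actually runs,
# so it picks up whatever the ruby_block stored in run_state
command lazy { "echo 'hello #{node.run_state[:output]}'" }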

Enough talk — show me the code !!


ruby_block 'shell out fun' do
  block do
    # run the shell command and assign the result object to a var
    ls = shell_out('ls /var/')
    # assign the command's STDOUT to a var
    raw_output = ls.stdout
    # I used the Ruby p method to print out the raw chars first,
    # so I knew how to manipulate the string --
    # in this case, split the output by newlines and store it in an array
    clean_output = raw_output.split(/\n/)
    # loop thru the array
    clean_output.each do |line|
      # set the temp var you want to store this in and use later
      node.run_state[:output] = line
      # invoke the resource I want to run, now that the
      # temp variable it depends on has been set
      resources(:execute => 'say_hello').run_action(:run)
    end
  end
end

execute 'say_hello' do
  # lazy-fetch the command value so it grabs the latest
  # value of the temp var we are using
  command lazy { "echo 'hello #{node.run_state[:output]}'" }
  # don't forget to make this do NOTHING until told to run
  action :nothing
end


So, is Chef “hard” ?? I would argue no — but it sure is nuanced, and for good reason ..

Qik-n-EZ: Secret Ohai Directory in Chef Cookbooks

Do you code Chef Cookbooks ?? Do you think Ohai plugins are so awesome that you’ve created some of your own ?? Do you really, really dislike having to depend on a community cookbook to deploy your custom Ohai plugins ?? Well, then I have some good news !!

There’s a secret top-level cookbook directory where you can put your custom Ohai plugins that will be synced and reloaded during a chef-client run .. Want to guess the name of this directory ?? My guess is you will only need one ..

./<cookbook name>/ohai/<ohai plugin>

Why is it a secret ?? Well, my guess is they simply forgot to document it, since it’s been an available feature since the 13.x days: https://github.com/chef-boneyard/chef-rfc/blob/master/rfc059-ohai-cookbook-segment.md
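
If you’ve never written one, a plugin that could live in that directory is just a small Ruby file .. Here’s a minimal sketch (the plugin and attribute names are made up for illustration):

# ./my_cookbook/ohai/example.rb
# minimal custom Ohai plugin -- exposes node['example']['greeting']
Ohai.plugin(:Example) do
  provides 'example'

  collect_data(:default) do
    example Mash.new
    example['greeting'] = 'hello from a cookbook-shipped Ohai plugin'
  end
end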

UPDATE

The ohai directory is not so secret anymore — see here ..

Free Idea: Ansible + Jenkins x AWS CodeBuild = Infinite Scale (sorta)

As you know, I like me some Ansible, AWS, and Jenkins .. Did you know it’s not uncommon to use Ansible + Jenkins as an “automation platform” to manage your cloud infrastructure ?? I do this a lot — it’s easy, reusable, and works !! Think about the workflow:

  • Ansible code is committed to the Git repo
  • Jenkins job is triggered
  • Jenkins pulls code to a local workspace
    • don’t forget to make it shallow
  • Jenkins executes “shell” build step based on defined parameters
  • Ansible playbook executes against defined inventory
  • (allthethings) are automated

Given the above workflow, your Jenkins job might look something like this:

[screenshot: Jenkins job configuration]

That said, our Jenkins node has been running hot for the past 3 months (CPU >=90%, increased RAM usage, etc.) — and our knee-jerk reaction has been to scale vertically .. Yes, we know how to set up a Jenkins slave and have done this in the past — but there has to be an easier way to consume “transient infrastructure” (provision, configure, execute, destroy) .. We looked into integrating Jenkins with Lambda — but it no longer looked like normal Ansible “code” — so we punted on that .. There has to be a way to run a Jenkins job on some transient infrastructure without having to 1) redo your workflow, or 2) care too much ..

Let’s Google — “jenkins aws plugin” .. Hello AWS CodeBuild !!

Here’s how AWS CodeBuild describes itself — “AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers.” My big takeaway — “you don’t need to provision, manage, and scale your own build servers” .. Do a simple regex on that last statement (s/build/automation/) and you can see where I am going with this ..

Check out the plugin; it states — “Instead of sending your build jobs to Jenkins build nodes, you use the plugin to send your build jobs to AWS CodeBuild.” What’s a build job ?? Commands .. What’s a command ?? CLI .. Just look at the examples they give us .. So what’s stopping me from baking a Docker image that has Ansible installed and using that as my custom build environment ?? Nothing .. All that’s left (<<famous last words) is to create a legitimate buildspec.yml and away you go:


version: 0.2
phases:
  build:
    commands:
      - >-
        ansible-playbook -i $INVENTORY $PLAYBOOK
        --limit $LIMIT --extra-vars $EXTRA --tags $TAGS


Here’s a nice article that will show you how to set up and use the Jenkins AWS CodeBuild plugin ..

Purging Dead AWS Route53 Records via Ansible

We aren’t a big shop, but we use AWS Autoscale Groups .. That means nodes spin up and down all day long .. Got some traffic ?? Add nodes .. Traffic is low ?? Drop nodes .. Someone accidentally terminates a node ?? Add nodes .. Someone sneezes funny ?? Drop nodes ..

This goes on all day long to get as close to “right-sizing” our infrastructure as we can ..

We also name our nodes using a simple formula for various monitoring and orchestration purposes .. For example:

asg-<application>-<environment>-<availability-zone>-<instance-id>.example.com

asg-login-prod-us-east-1a-i-abc123xyz.example.com

Yes, I understand an argument could be made that this is an anti-pattern .. “Greg, you don’t need to name the stinking node .. It is cattle — nobody cares !!” .. I get it, I really do — but sometimes it’s nice just to have a name ..

So then, with all this spinning up and down of nodes and the related creation of Route53 records — you can end up with a lot of dead entries .. For whatever reason, you may wish to purge these dead entries — and OBVIOUSLY you do NOT want to do this manually .. So what to do ??

Well, I might have a solution for you .. Go ahead and check out this GitHub repo: https://github.com/gkspranger/aws-route53-purge-dead-records .. It’s a simple playbook and role that will purge AWS Route53 records for a given hosted zone and a known naming pattern, while making sure to NOT delete records of nodes that are currently running ..

WARNING: It can cause damage, so please be sure to review and understand what the playbook and role are doing ..

Dear Nagios

Dear Nagios,

I’m sorry, but it’s over ..

No, no, no .. It’s not you, it’s me .. I’ve changed — let me explain ..

You’ve been great these past 7 years .. Better than great — you basically defined what monitoring means to me .. Yes, we’ve both said things we regret — but we always ended back together ..

This time, it’s different ..

I found someone ..

Their name is TIG (Telegraf, InfluxDB, Grafana) ..

STOP THAT !! THAT’S NOT NICE .. Yes, they’re younger than you — but it’s not about that .. NO !! NO !! NO !! I’m not going through a midlife crisis .. Like I said earlier — I’ve changed .. Simply put, I now care more about metrics than I do about monitoring ..

Yeesss, I know you offer performance data .. You know I know that .. We’ve dated for almost a decade — let’s at least be honest with one another ..

Uh huh .. Uh huh .. Uh huh .. No, not this time, not ever .. Uh huh .. Uh huh .. Uh huh .. Believe me, I wish there was a way, I just don’t see it happening ..

OK !! THAT’S ENOUGH !! This is getting abusive now — I’m going to leave ..

Thank you for being with me ..

Thank you for teaching me ..

But it’s time to go ..

Splunk Host Tags

Did you know you can tag a host in Splunk ?? I didn’t !! Do you know how much time tags would have saved me from having to craft a most excellent Splunk search to capture just the right hosts ?? Me neither — but I’m guessing it’s a lot ..

So instead of my searches looking like this:

# get all staging RMI nodes -- hard
index=* ( host=rmi1.s.* OR host=rmi2.s.* OR host=rmi3.s.* ) source=*tomcat* earliest=-1h

They can now look like this:

# get all staging RMI nodes -- easy
index=* tag=rmi tag=stage source=*tomcat* earliest=-1h

I know, I know — I could achieve the same level of excellence using targeted indexes (index=rmi_stage) and/or various regex filters .. Some of that, unfortunately, is out of my control ..

OK .. So how can you manage this without having to use the GUI ?? Easy !! You just need to drop a config file in the proper location (for me it’s: /opt/splunk/etc/system/local/tags.conf) on the search head, and away you go .. The syntax is pretty basic:

# tagging host login.example.net with PROD, TOMCAT, and LOGIN
[host=login.example.net]
prod = enabled
tomcat = enabled
login = enabled

Below’s a nice little example of how I automated this using Ansible (big surprise there 🙂) and the EC2 dynamic inventory script ..


# AWS EC2 hosts
# using ANSIBLE to assign tags
{% macro eze(tag) -%}
{# this is an easy, consistent way to enable a tag #}
{{ tag }} = enabled
{%- endmacro %}
{# loop thru all your EC2 hosts in alpha order #}
{% for i in ansible_play_batch | sort %}
{# set the node name AND node vars #}
{% set node_name=i %}
{% set node_vars=hostvars[ i ] %}
# tags for {{ node_name }}
[host={{ node_name }}]
{# set custom enviro tag -- dev, stage, prod, etc #}
{{ eze(node_vars.enviro) }}
{# set std AWS metadata tag -- region, AZ, etc #}
{{ eze(node_vars.ansible_ec2_placement_region) }}
{{ eze(node_vars.ansible_ec2_instance_id) }}
{{ eze(node_vars.ansible_ec2_placement_availability_zone) }}
{# set custom tag if condition is met -- app type, etc #}
{% if node_vars.app_type is defined %}
{{ eze(node_vars.app_type) }}
{% endif %}
{% endfor %}

He Shoots .. He Fails !!

For the past 6 months I have been working on a lean startup .. It was (<<infer all you want here) a chatbot interface for a financial services CRM — and did not end well 😦 .. That said, I did some of my best Ansible/AWS work, and my server-side JavaScript (Node.js) skills and understanding of the Hubot internals improved exponentially ..

So I want to share !!

I can’t get into too many details, but the overall concept was that every customer would be running a micro instance with our custom Hubot code installed .. This instance would pull code updates, if any, every 5 minutes and infrastructure updates, if any, every 15 minutes ..  In addition, a customer could participate in pilot programs — AKA branch work ..

I really liked how I was able to avoid needing a “command node”, and instead just run Ansible locally and on a schedule .. Also, I was able to automate pretty much everything — from VPC creation all the way to autoscaling groups ..

Anyway, here’s the link: https://github.com/gkspranger/failed-chatbot .. Maybe it will help one of you out there in Internets land ..

Qik-n-EZ: Nagios AWS EC2 Hosts via Ansible

Sooo .. You are monitoring a fleet of AWS EC2 hosts via Nagios, and have yet to find an easy way to manage their host definitions .. Good news (if you happen to be using Ansible dynamic inventories) !! I created an Ansible template that loops thru all your EC2s and creates their host definitions for you ..

In addition, you can easily define Nagios service dependencies, helping you zero in on the root problem more quickly ..


{# loop thru all relevant nodes #}
{% for i in ansible_play_batch | sort %}
{# set the node name AND node vars #}
{% set node_name=i %}
{% set node_vars=hostvars[ i ] %}
{# define the nagios hostgroup vars associated with this node .. default to linux #}
{% if node_vars.nagios_hostgroups is defined %}
{% set node_hostgroups="linux," ~ node_vars.nagios_hostgroups %}
{% else %}
{% set node_hostgroups="linux" %}
{% endif %}
{# define the nagios hostgroup arr so we can do some easy checking #}
{% set node_hostgroups_arr=node_hostgroups.split(",") %}
#############################
## START {{ node_name }}
#############################
define host {
  use                               linux-server
  host_name                         {{ node_name }}
  address                           {{ node_vars.ansible_ec2_local_ipv4 | default('127.0.0.1') }}
  hostgroups                        {{ node_hostgroups }}
  # it's nice to have some ec2 data as nagios host vars
  _ec2_instance_id                  {{ node_vars.ansible_ec2_instance_id }}
  _ec2_instance_type                {{ node_vars.ansible_ec2_instance_type }}
  _ec2_placement_availability_zone  {{ node_vars.ansible_ec2_placement_availability_zone }}
  _ec2_placement_region             {{ node_vars.ansible_ec2_placement_region }}
  _ec2_security_groups              {{ node_vars.ansible_ec2_security_groups }}
}
#############################
## NRPE dependencies
## if NRPE aint available, these will fail
#############################
# SWAP
define servicedependency {
  host_name                      {{ node_name }}
  service_description            NRPE Port
  dependent_host_name            {{ node_name }}
  dependent_service_description  Swap
  execution_failure_criteria     n
  notification_failure_criteria  w,u,c
}
# CPU
define servicedependency {
  host_name                      {{ node_name }}
  service_description            NRPE Port
  dependent_host_name            {{ node_name }}
  dependent_service_description  CPU
  execution_failure_criteria     n
  notification_failure_criteria  w,u,c
}
# you can go on and on here ..
# for a very long time ..
#############################
## END {{ node_name }}
#############################
{% endfor %}


Qik-n-EZ: Nagios Plugins to Check AWS EC2 Images and Snapshots

Afraid of having too many AWS EC2 images and/or snapshots, thus running up your bill ?? Fear not !! I have you covered:

Nagios Plugin to Check AWS EC2 Images


#!/bin/bash
#
# checks for the number of AWS AMIs and allows alerts when threshold is met
# example usage:
#   ./check_aws_amis.sh -w <integer> -c <integer>
###
### USES ANSIBLE to put AWS KEYs as VARS
### USES ANSIBLE to define NAGIOS REGION
###
### REQUIRES AWS CLI : https://aws.amazon.com/cli/
### REQUIRES JQ : https://stedolan.github.io/jq/

export AWS_ACCESS_KEY_ID={{ aws_access_key_id }}
export AWS_SECRET_ACCESS_KEY={{ aws_secret_access_key }}

warn=NULL
critical=NULL

help () {
  cat << EOF
Check number of AMIs in AWS.
Usage:
  check_aws_amis.sh -w <warning number> -c <critical number>
Options:
  -h
    Print detailed help screen
  -w INTEGER
    Exit with WARNING status if greater than INTEGER
  -c INTEGER
    Exit with CRITICAL status if greater than INTEGER
EOF
  exit 3
}

while getopts "w:c:h" opt; do
  case $opt in
    w)
      warn="$OPTARG"
      ;;
    c)
      critical="$OPTARG"
      ;;
    h)
      help
      ;;
  esac
done

if [[ "$warn" == "NULL" ]] || [[ "$critical" == "NULL" ]]; then
  help
fi

# count the AMIs owned by this account in the current region
amis=$(aws ec2 describe-images --owners self --region {{ ansible_ec2_placement_region }} | jq -r '.Images | length')

if [ "$amis" -ge "$critical" ]; then
  dostatus="CRITICAL"
  doexit=2
elif [ "$amis" -ge "$warn" ]; then
  dostatus="WARNING"
  doexit=1
elif [ "$amis" -lt "$warn" ]; then
  dostatus="OK"
  doexit=0
else
  dostatus="UNKNOWN"
  doexit=3
fi

echo "AWS AMIs ${dostatus} - ${amis} AWS AMIs | amis=${amis};0;0;0;0"
exit $doexit

Nagios Plugin to Check AWS EC2 Snapshots


#!/bin/bash
#
# checks for the number of AWS snapshots and allows alerts when threshold is met
# example usage:
#   ./check_aws_snapshots.sh -w <integer> -c <integer>
###
### USES ANSIBLE to put AWS KEYs as VARS
### USES ANSIBLE to define NAGIOS REGION
###
### REQUIRES AWS CLI : https://aws.amazon.com/cli/
### REQUIRES JQ : https://stedolan.github.io/jq/

export AWS_ACCESS_KEY_ID={{ aws_access_key_id }}
export AWS_SECRET_ACCESS_KEY={{ aws_secret_access_key }}

warn=NULL
critical=NULL

help () {
  cat << EOF
Check number of snapshots in AWS.
Usage:
  check_aws_snapshots.sh -w <warning number> -c <critical number>
Options:
  -h
    Print detailed help screen
  -w INTEGER
    Exit with WARNING status if greater than INTEGER
  -c INTEGER
    Exit with CRITICAL status if greater than INTEGER
EOF
  exit 3
}

while getopts "w:c:h" opt; do
  case $opt in
    w)
      warn="$OPTARG"
      ;;
    c)
      critical="$OPTARG"
      ;;
    h)
      help
      ;;
  esac
done

if [[ "$warn" == "NULL" ]] || [[ "$critical" == "NULL" ]]; then
  help
fi

# count the snapshots owned by this account in the current region
snapshots=$(aws ec2 describe-snapshots --owner-ids self --region {{ ansible_ec2_placement_region }} | jq -r '.Snapshots | length')

if [ "$snapshots" -ge "$critical" ]; then
  dostatus="CRITICAL"
  doexit=2
elif [ "$snapshots" -ge "$warn" ]; then
  dostatus="WARNING"
  doexit=1
elif [ "$snapshots" -lt "$warn" ]; then
  dostatus="OK"
  doexit=0
else
  dostatus="UNKNOWN"
  doexit=3
fi

echo "AWS Snapshots ${dostatus} - ${snapshots} AWS Snapshots | snapshots=${snapshots};0;0;0;0"
exit $doexit

Hubot with Handlebars :-3

So you’ve met Hal — he’s my bud .. That said, I’ve never been a fan of how I “reply” to Hubot commands .. For example:


// Commands:
//   hubot hello - replies with "world"!

const logger = require("winston");

module.exports = function(robot) {
  robot.respond(/hello$/i, { id: "hello" }, function(message) {
    message.send("world!");
    logger.log("info", "i like to log stuff!", { yourObject: "some random data" });
  });
};


“Greg, what’s your problem ?? Hal brings you beer !!” .. True, true .. But I want more !! I want to be more like my developer friends and apply an MVC design pattern to my Hubot development .. Specifically, I want my “views” to be beautiful and maintainable, while having the ability to use complex data “models” ..

Hello Handlebars !! Long story short, I can easily build semantic templates, compile output, and send as a Hubot reply .. For example:


// Commands:
//   hubot handlebar me - example of how to use handlebars

const logger = require("winston");
const handlebars = require("handlebars");
const fs = require("fs");

module.exports = function(robot) {
  // this is the controller
  robot.respond(/handlebar me/i, function(message) {
    // this is me getting the view
    fs.readFile('./views/mytemplate.txt', 'utf-8', function(error, source) {
      var template = handlebars.compile(source);
      // this is me passing the model into the view
      var output = template({
        line1: "this is line 1",
        arr1: [
          "this",
          "is",
          "an",
          "array"
        ],
        obj1: {
          obj: "object within an object"
        }
      });
      // this is me replying to the hubot command
      message.send(output);
    });
  });
};

Here’s the template:


{{!--
  # these are comments .. they will not be displayed
  # i often put what the object looks like in here so i don't forget
  {
    line1: ...,
    arr1: [...],
    obj1: {
      obj: ...
    }
  }
--}}
Line 1: {{line1}}
{{#each arr1}}
Item {{@index}}: {{this}}
{{/each}}
My object inside an object: {{obj1.obj}}

Here’s the Slack output: