Category: Ansible

Splunk Host Tags

Did you know you can tag a host in Splunk ?? I didn’t !! Do you know how much time tags would have saved me from having to craft a most excellent Splunk search to capture just the right hosts ?? Me neither — but I’m guessing it’s a lot ..

So instead of my searches looking like this:

# get all staging RMI nodes -- hard
index=* ( host=rmi1.s.* OR host=rmi2.s.* OR host=rmi3.s.* ) source=*tomcat* earliest=-1h

They can now look like this:

# get all staging RMI nodes -- easy
index=* tag=rmi tag=stage source=*tomcat* earliest=-1h

I know, I know — I could achieve the same level of excellence using targeted indexes (index=rmi_stage) and/or various regex filters .. Some of that, unfortunately, is out of my control ..

OK .. So how can you manage this without having to use the GUI ?? Easy !! You just need to drop a config file in the proper location (for me it’s: /opt/splunk/etc/system/local/tags.conf) on the search head, and away you go .. The syntax is pretty basic:

# tagging host with PROD, TOMCAT, and LOGIN
# (<hostname> is a placeholder for the host value you want to tag)
[host=<hostname>]
prod = enabled
tomcat = enabled
login = enabled

Below’s a nice little example of how I automated this using Ansible (big surprise there 🙂) and the EC2 dynamic inventory script ..
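To make the idea concrete, here is a rough Python sketch (not the Ansible version itself) that turns inventory-style groups into tags.conf stanzas .. It assumes the dynamic inventory boils down to a mapping of group name to hosts, and that each group name doubles as a tag .. The function names and example hostnames are made up for illustration ..

```python
def tags_from_groups(groups):
    """Invert {group: [hosts]} into {host: [tags]}.

    Assumption: every inventory group a host belongs to becomes a tag.
    """
    host_tags = {}
    for group, hosts in groups.items():
        for host in hosts:
            host_tags.setdefault(host, []).append(group)
    return host_tags


def render_tags_conf(host_tags):
    """Render one [host=<name>] stanza per host, one '<tag> = enabled' line per tag."""
    stanzas = []
    for host in sorted(host_tags):
        lines = ["[host={}]".format(host)]
        for tag in sorted(set(host_tags[host])):
            lines.append("{} = enabled".format(tag))
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    # Hypothetical groups, shaped like what a dynamic inventory might return.
    groups = {
        "rmi": ["rmi1.s.example.com", "rmi2.s.example.com"],
        "stage": ["rmi1.s.example.com", "rmi2.s.example.com"],
    }
    print(render_tags_conf(tags_from_groups(groups)))
```

The rendered text is what would land in tags.conf on the search head .. In an Ansible setup the same loop would more naturally live in a template ..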


He Shoots .. He Fails !!

For the past 6 months I have been working on a lean startup .. It was (<<infer all you want here) a chatbot interface for a financial services CRM -- and did not end well 😦 That said, I did some of my best Ansible/AWS work, and my server-side JavaScript (Node.js) and understanding of the Hubot internals improved exponentially ..

So I want to share !!

I can’t get into too many details, but the overall concept was that every customer would be running a micro instance with our custom Hubot code installed .. This instance would pull code updates, if any, every 5 minutes and infrastructure updates, if any, every 15 minutes .. In addition, a customer could participate in pilot programs -- AKA branch work ..

I really liked how I was able to avoid using a “command node” and just run Ansible locally, on a schedule .. Also, I was able to automate pretty much everything -- from VPC creation all the way to autoscaling groups ..

Anyway, here’s the link: .. Maybe it will help one of you out there in Internets land ..

Qik-n-EZ: Nagios AWS EC2 Hosts via Ansible

Sooo .. You are monitoring a fleet of AWS EC2 hosts via Nagios, and have yet to find an easy way to manage their host definitions .. Good news (if you happen to be using Ansible dynamic inventories) !! I created an Ansible template that loops thru all your EC2s and creates them for you ..

In addition, you can easily define Nagios service dependencies, helping you zero in on the root problem more quickly ..
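As a rough illustration of what such a template produces, here is a small Python sketch that renders Nagios host definitions and one service dependency .. It is not the Ansible template itself .. The "generic-host" template name, the host dictionaries, and the function names are assumptions for the example ..

```python
def render_nagios_hosts(hosts, template="generic-host"):
    """Render one 'define host' block per host dict (name, address, groups)."""
    blocks = []
    for host in sorted(hosts, key=lambda h: h["name"]):
        lines = [
            "define host {",
            "    use         {}".format(template),
            "    host_name   {}".format(host["name"]),
            "    address     {}".format(host["address"]),
        ]
        if host.get("groups"):
            lines.append("    hostgroups  {}".format(",".join(sorted(host["groups"]))))
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def render_service_dependency(host, service, dependent_host, dependent_service):
    """Render a servicedependency: the dependent service stays quiet
    while the service it depends on is in a problem state."""
    return "\n".join([
        "define servicedependency {",
        "    host_name                      {}".format(host),
        "    service_description            {}".format(service),
        "    dependent_host_name            {}".format(dependent_host),
        "    dependent_service_description  {}".format(dependent_service),
        "    notification_failure_criteria  w,u,c",
        "}",
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical hosts, shaped like facts pulled from an EC2 inventory.
    hosts = [{"name": "web1", "address": "10.0.0.5", "groups": ["web", "prod"]}]
    print(render_nagios_hosts(hosts))
    print(render_service_dependency("web1", "NRPE", "web1", "Disk Usage"))
```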

An Ansible Project Setup that Scales

So I have been using Ansible for over two years now .. I use it for damn near everything -- provisioning infrastructure, configuring nodes, deploying Web applications, testing whatever I can, and other ad hoc tasks (sadly, I’m still working on a “get me beer” playbook) .. Long story short, it’s been a game changer .. Problem is, as my (and my team’s -- hi guys !!) Ansible usage grows (50+ playbooks and 130+ roles) -- so does my desire to organize it in a way that is scalable ..

There are many ways you can set up your Ansible project (one, two, three, four, etc ..) -- which is great !! That said, I love me some simplicity .. After trying out a few setups, I finally settled on one that works for me .. You can see it here ..

The view from 30,000 feet:

  • inventories
    • All of my inventory files, static and dynamic, will be children of the inventories directory
      • FULL DISCLOSURE: by design, I only have two inventory files -- localhost and .. every once in a blue moon we will add additional inventory files (PoCs, targeted testing, etc ..), but we always fall back to only the two
    • Group and other variables
      • the group_vars directory can be relative to an inventory file, no matter where the playbook is .. since I like to add some structure to my playbook organization, it makes sense to put it here
      • variable files (i.e. Ansible Vault) that need to be explicitly loaded go in the vars directory
      • IN THEORY: you could also put a host_vars directory here .. but as y’all know, host_vars are the devil
  • playbooks
    • Let me explain my logic here .. In my work, I essentially perform five functions: 1) provision infrastructure, 2) configure infrastructure, 3) deploy to infrastructure, 4) test infrastructure (and other things), and 5) save the world (i.e. ad hoc tasks) .. Knowing this:
      • construct/*
        • since the line between provisioning and configuring infrastructure can get murky, I combined the two “functions” into one directory
      • deploy/*
        • guess what goes here 🙂
      • test/*
      • adhoc/*
        • the one-offs that don’t fit nicely anywhere but here
  • roles
    • resource/*
      • these are the bits and pieces that make up a useful piece of infrastructure .. for example, we have resource roles for AEM, Bitbucket, CPANm, .forward, Java (1.6, 1.7, and 1.8), Apache, Netcat, Nagios, NRPE, etc .. all solid implementations .. all reusable ..
      • STYLE ALERT: we DON’T use Ansible Galaxy directly .. we often refer to it for inspiration, but always end up using our own implementation
    • <primary>
      • this is a “completed” piece of infrastructure .. the sum of the (resource) parts .. the cherry on top .. for example, we have primary roles for user_web_api_server, internal_dns_server, aem_author_server, aem_publish_server, etc ..
    • action/*
      • this is what I “do” to my infrastructure that doesn’t maintain any real “resource” .. for example, we have action roles for silencing/unsilencing Nagios, deploying code, restarting application servers, taking data centers offline, etc ..

The view from 30 feet:

  • PROBLEM: “I want my playbooks to be able to refer to my roles in a way that is consistent and easy” .. Great idea !! The problem is, your playbooks and roles are gonna be all over the place -- so you can’t take advantage of the “relative referencing” you can do in a traditional project structure .. Let’s also assume you don’t want to define an absolute path in the config file ..
  • SOLUTION: Hello bootstrap !! Take a path that will always be consistent (inventory_dir) and use it to define other paths with some regex magic .. When finished, shove it all into an external variables file ..
  • FINALLY: Refer to the bootstrap, early and often, in every playbook you create ..
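The “regex magic” can be sketched in a few lines of Python .. This is only an illustration of the idea, assuming the layout above (inventories, playbooks, and roles as siblings under one project root) .. In Ansible itself the same substitution would typically be done on inventory_dir with the regex_replace filter .. The function name and dictionary keys are invented for the example ..

```python
import re


def bootstrap_paths(inventory_dir):
    """Derive project paths from the one path that is always known.

    Assumption: inventory_dir is <project_root>/inventories or a
    subdirectory of it, so stripping that suffix yields the root.
    """
    root = re.sub(r"/inventories(/.*)?$", "", inventory_dir.rstrip("/"))
    return {
        "project_root": root,
        "roles_dir": root + "/roles",
        "playbooks_dir": root + "/playbooks",
        "vars_dir": root + "/inventories/vars",
    }


if __name__ == "__main__":
    # Hypothetical checkout location.
    for key, value in sorted(bootstrap_paths("/srv/ansible/inventories").items()):
        print("{} = {}".format(key, value))
```

Because everything hangs off one derived root, a playbook three directories deep and one at the top level end up with identical role paths ..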

Again, this project setup works for me .. “Me” being myself and 4 other sys admins ..

Anywho -- I need to get back to work on that “get me beer” playbook ..

Make Everyone Talk to Hal !!

Hal is my Hubot chatbot .. He’s awesome !! He gets me beer !!

hal beer me


He also does things like restart app servers, deploy code, and show me pictures of grumpy cats .. He’s so cool, I’ve started making non-humans talk to him .. “Greg, what do you mean ??” .. Well, let me show you ..


  1. I have a Nagios server
  2. It monitors (allthethings)
  3. When the “logged in users” alert is triggered, Nagios sends a message to my chat service using hipsaint
    1. “logged in users” is a monitor I have that alerts me when more than 3 users are logged into a server
  4. I see the alert and the server in question
  5. I SSH into the server
  6. I type who
  7. I then determine if I need to care
    1. If not, move on with my life
    2. If so, dig deeper
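For reference, a “logged in users” style check could look something like the Python sketch below .. It is a guess at the shape of such a monitor, not the actual plugin .. It assumes the check counts distinct usernames in the output of who and uses the usual Nagios plugin exit codes (0 for OK, 2 for CRITICAL) ..

```python
import subprocess


def check_logged_in_users(who_output, max_users=3):
    """Return (exit_code, message) for the given `who` output.

    Assumption: the first column of each line is the username and
    distinct users (not sessions) are what gets counted.
    """
    users = sorted({line.split()[0] for line in who_output.splitlines() if line.strip()})
    if len(users) > max_users:
        return 2, "CRITICAL - {} users logged in: {}".format(len(users), ", ".join(users))
    return 0, "OK - {} users logged in".format(len(users))


def main():
    """Run `who` on this machine and report in plugin style."""
    code, message = check_logged_in_users(subprocess.check_output(["who"], text=True))
    print(message)
    return code
```

A plugin wrapper would call main() and pass its return value to sys.exit ..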

The thing is, I have more than 2,200 active monitors .. That means Nagios can and will send many, many messages to my chat service -- depending on the day .. So how can I make my life easier ??

Here’s an easy one: ask Hal who’s on a server ..

hal whos on


My stack is HipChat -> Hubot -> Jenkins -> Ansible .. That means I can damn near do anything I want, all from my chat client ..

Remember what I said earlier -- about making non-humans talk to Hal ?? What I did was create a Nagios event handler that sends a message to my chat service using HipChat CLI .. Therefore, I AM NOT asking Hal who’s on a server, it’s NAGIOS WHO IS doing it ..

nagios hal whos on
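The decision logic inside such an event handler might look like this Python sketch .. It assumes Nagios passes the $HOSTNAME$, $SERVICESTATE$, and $SERVICESTATETYPE$ macros as arguments, and that Hal’s command takes the hostname after “whos on” .. Posting the message to the chat service is left out on purpose, since that part depends on your chat client ..

```python
def handler_message(hostname, state, state_type, bot="hal"):
    """Return the chat command Nagios should post, or None to stay quiet.

    Only confirmed (HARD) problem states trigger a message, so flapping
    SOFT states and recoveries do not spam the chat room.
    """
    if state_type != "HARD" or state not in ("WARNING", "CRITICAL"):
        return None
    return "{} whos on {}".format(bot, hostname)


if __name__ == "__main__":
    # Hypothetical values, as Nagios would substitute them for the macros.
    print(handler_message("web1.example.com", "CRITICAL", "HARD"))
    print(handler_message("web1.example.com", "CRITICAL", "SOFT"))
```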


It doesn’t stop there !! You can create scripted Splunk alerts as well .. Before you know it, you will be making (allthethings) talk to Hal ..

Qik-n-EZ: Collect AEM Thread Dumps and Email via Ansible

So one of our AEM nodes was freaking out the other day .. No, not the election results .. Some code was deployed to it that had runaway processes, thus gorging itself on CPU and memory .. EEEKK !! What to do ?? If you’ve been around AEM for a while, you know how we love our raw thread dumps .. That being said, I really dislike the process for obtaining them:

  1. Log into the node
  2. Get the Java PID
  3. Execute jstack and output to a log file
    1. Repeat every 10 seconds for at least one minute
  4. Compress the log file
  5. Share the compressed log file
    1. Typically via email

So how can I automate this ?? Easy !!

Create a Bash script that will generate the thread dumps for you ..

Create an Ansible playbook that will execute the script, compress the log file, and email it
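The collection steps can be sketched in Python as well (the real thing here is a Bash script plus a playbook, so treat this as an illustration only) .. It assumes jstack is on the PATH and that you already know the Java PID .. The function name, the log header format, and the default of 7 dumps (0 to 60 seconds at 10 second intervals) are choices made for the example .. Emailing the result is not covered ..

```python
import gzip
import shutil
import subprocess
import time


def collect_thread_dumps(pid, log_path, count=7, interval=10, dump=None, sleep=time.sleep):
    """Append `count` thread dumps to log_path, `interval` seconds apart,
    then gzip the log and return the path of the compressed file."""
    if dump is None:
        def dump(p):
            # `jstack <pid>` prints a thread dump of that JVM to stdout.
            return subprocess.check_output(["jstack", str(p)], text=True)

    with open(log_path, "a") as log:
        for i in range(count):
            log.write("=== thread dump {} of {} ===\n".format(i + 1, count))
            log.write(dump(pid))
            if i < count - 1:
                sleep(interval)

    gz_path = log_path + ".gz"
    with open(log_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

The dump and sleep parameters exist so the loop can be exercised without a running JVM or a one minute wait ..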