Free Idea: Ansible + Jenkins x AWS CodeBuild = Infinite Scale (sorta)

As you know, I like me some Ansible, AWS, and Jenkins .. Did you know it’s not uncommon to use Ansible + Jenkins as an “automation platform” to manage your cloud infrastructure ?? I do this a lot — it’s easy, reusable, and works !! Think about the workflow:

  • Ansible code is committed to the Git repo
  • Jenkins job is triggered
  • Jenkins pulls code to a local workspace
    • don’t forget to make it shallow
  • Jenkins executes “shell” build step based on defined parameters
  • Ansible playbook executes against defined inventory
  • (allthethings) are automated

Given the above workflow, your Jenkins job might look something like this:

[Image: Jenkins job]
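
In essence, the “Execute shell” build step boils down to a parameterized ansible-playbook call, roughly like this sketch (the playbook, inventory, and parameter names are placeholders):

# Jenkins "Execute shell" build step
# TARGET_ENV and LIMIT are example job parameters; BUILD_NUMBER is provided by Jenkins
ansible-playbook -i inventory/$TARGET_ENV site.yml \
  --limit "$LIMIT" \
  --extra-vars "build_number=$BUILD_NUMBER"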

That said, our Jenkins node has been running hot for the past 3 months (CPU >=90%, increased RAM usage, etc.) — and our knee-jerk reaction has been to scale vertically .. Yes, we know how to set up a Jenkins slave and have done this in the past — but there has to be an easier way to consume “transient infrastructure” (provision, configure, execute, destroy) .. We looked into integrating Jenkins with Lambda — but it no longer looked like normal Ansible “code” — so we punted on that .. There has to be a way to run a Jenkins job on some transient infrastructure without having to: 1) redo your workflow, or 2) care too much ..

Let’s Google — “jenkins aws plugin” .. Hello AWS CodeBuild !!

Here’s how AWS CodeBuild describes itself — “AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers.” My big takeaway — “you don’t need to provision, manage, and scale your own build servers” .. Do a simple regex on that last statement (s/build/automation/) and you can see where I am going with this ..

Check out the plugin; it states — “Instead of sending your build jobs to Jenkins build nodes, you use the plugin to send your build jobs to AWS CodeBuild.” What’s a build job ?? Commands .. What’s a command ?? CLI .. Just look at the examples they give us .. So what’s stopping me from baking a Docker image that has Ansible installed and using that as my custom build environment ?? Nothing .. All that’s left (<<famous last words) is to create a legitimate buildspec.yml and away you go:
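
For what it’s worth, a minimal buildspec.yml for this kind of setup might look roughly like this (the playbook, inventory, and variable names are placeholders):

version: 0.2

env:
  variables:
    TARGET_ENV: "stage"    # placeholder

phases:
  install:
    commands:
      # sanity check -- the custom Docker image should already have Ansible baked in
      - ansible --version
  build:
    commands:
      # playbook and inventory paths are placeholders -- adjust to your repo layout
      - ansible-playbook -i inventory/$TARGET_ENV site.yml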

Here’s a nice article that will show you how to set up and use the Jenkins AWS CodeBuild plugin ..


Purging Dead AWS Route53 Records via Ansible

We aren’t a big shop, but we use AWS Autoscale Groups .. That means nodes spin up and down all day long .. Got some traffic ?? Add nodes .. Traffic is low ?? Drop nodes .. Someone accidentally terminates a node ?? Add nodes .. Someone sneezes funny ?? Drop nodes ..

This goes on all day long to get as close to “right-sizing” our infrastructure as we can ..

We also name our nodes using a simple formula for various monitoring and orchestration purposes .. For example:

asg-<application>-<environment>-<availability-zone>-<instance-id>.example.com

asg-login-prod-us-east-1a-i-abc123xyz.example.com

Yes, I understand an argument could be made that this is an anti-pattern .. “Greg, you don’t need to name the stinking node .. It is cattle — nobody cares !!” .. I get it, I really do — but sometimes it’s nice just to have a name ..

So then, with all this spinning up and down of nodes and the related creation of Route53 records — you can end up with a lot of dead entries .. For whatever reason, you may wish to purge these dead entries — and OBVIOUSLY you do NOT want to do this manually .. So what to do ??

Well, I might have a solution for you .. Go ahead and check out this GitHub repo: https://github.com/gkspranger/aws-route53-purge-dead-records .. It’s a simple playbook and role that will purge AWS Route53 records for a given hosted zone and a known naming pattern, while making sure to NOT delete records of nodes that are currently running ..
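
If you just want the gist of the approach, it boils down to something like the tasks below .. This is an illustrative sketch only (module names vary by Ansible version, and the zone variable, domain, and naming test are placeholders), NOT the actual playbook and role from the repo:

# illustrative sketch only -- NOT the actual playbook/role from the repo
# (module names changed over the years: ec2_instance_facts/route53_facts on older Ansible)
- name: collect the Name tags of currently running instances
  ec2_instance_info:
    filters:
      instance-state-name: running
  register: running

- name: build the list of "live" node names
  set_fact:
    # assumes every running instance carries a Name tag matching its Route53 record
    live_names: "{{ running.instances | map(attribute='tags.Name') | list }}"

- name: list every record set in the hosted zone
  route53_info:
    query: record_sets
    hosted_zone_id: "{{ zone_id }}"    # placeholder variable
  register: zone_records

- name: purge asg-* A records that no longer map to a running node
  route53:
    state: absent
    zone: example.com
    record: "{{ item.Name }}"
    type: "{{ item.Type }}"
    ttl: "{{ item.TTL }}"
    value: "{{ item.ResourceRecords | map(attribute='Value') | list }}"
  # return key may be resource_record_sets on newer collection versions
  loop: "{{ zone_records.ResourceRecordSets }}"
  when:
    - item.Type == 'A'
    - item.Name is match('^asg-')
    - item.Name | regex_replace('\.$', '') not in live_names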

WARNING: It can cause damage, so please be sure to review and understand what is going on with the playbook and role ..

Dear Nagios

Dear Nagios,

I’m sorry, but it’s over ..

No, no, no .. It’s not you, it’s me .. I’ve changed — let me explain ..

You’ve been great these past 7 years .. Better than great — you basically defined what monitoring means to me .. Yes, we’ve both said things we regret — but we always ended back together ..

This time, it’s different ..

I found someone ..

Their name is TIG (Telegraf, InfluxDB, Grafana) ..

STOP THAT !! THAT’S NOT NICE .. Yes, they’re younger than you — but it’s not about that .. NO !! NO !! NO !! I’m not going through a midlife crisis .. Like I said earlier — I’ve changed .. Simply put, I now care more about metrics than I do about monitoring ..

Yeesss, I know you offer performance data .. You know I know that .. We’ve dated for almost a decade — let’s at least be honest with one another ..

Uh huh .. Uh huh .. Uh huh .. No, not this time, not ever .. Uh huh .. Uh huh .. Uh huh .. Believe me, I wish there was a way, I just don’t see it happening ..

OK !! THAT’S ENOUGH !! This is getting abusive now — I’m going to leave ..

Thank you for being with me ..

Thank you for teaching me ..

But it’s time to go ..

Splunk Host Tags

Did you know you can tag a host in Splunk ?? I didn’t !! Do you know how much time tags would have saved me from having to craft a most excellent Splunk search to capture just the right hosts ?? Me neither — but I’m guessing it’s a lot ..

So instead of my searches looking like this:

# get all staging RMI nodes -- hard
index=* ( host=rmi1.s.* OR host=rmi2.s.* OR host=rmi3.s.* ) source=*tomcat* earliest=-1h

They can now look like this:

# get all staging RMI nodes -- easy
index=* tag=rmi tag=stage source=*tomcat* earliest=-1h

I know, I know — I could achieve the same level of excellence using targeted indexes (index=rmi_stage) and/or various regex filters .. Some of that, unfortunately, is out of my control ..

OK .. So how can you manage this without having to use the GUI ?? Easy !! You just need to drop a config file in the proper location (for me it’s: /opt/splunk/etc/system/local/tags.conf) on the search head, and away you go .. The syntax is pretty basic:

# tagging host login.example.net with PROD, TOMCAT, and LOGIN
[host=login.example.net]
prod = enabled
tomcat = enabled
login = enabled

Below’s a nice little example of how I automated this using Ansible (big surprise there 🙂) and the EC2 dynamic inventory script ..
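
Something along these lines, assuming the ec2.py dynamic inventory exposes ec2_tag_* hostvars and your instances carry Name, env, and role tags (all of those names are placeholders):

# tags.conf.j2 -- sketch only; tag names depend on how your instances are tagged
{% for host in groups['all'] %}
[host={{ hostvars[host].ec2_tag_Name | lower }}]
{{ hostvars[host].ec2_tag_env | lower }} = enabled
{{ hostvars[host].ec2_tag_role | lower }} = enabled
{% endfor %}

# and the task that drops it on the search head
- name: render Splunk host tags from the EC2 dynamic inventory
  template:
    src: tags.conf.j2
    dest: /opt/splunk/etc/system/local/tags.conf
  notify: restart splunk    # placeholder handler -- a reload works too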

He Shoots .. He Fails !!

For the past 6 months I have been working on a lean startup .. It was (<<infer all you want here) a chatbot interface for a financial services CRM — and it did not end well 😦 .. That said, I did some of my best Ansible/AWS work, and my server-side JavaScript (Node.js) skills and understanding of the Hubot internals improved exponentially ..

So I want to share !!

I can’t get into too many details, but the overall concept was that every customer would be running a micro instance with our custom Hubot code installed .. This instance would pull code updates, if any, every 5 minutes and infrastructure updates, if any, every 15 minutes ..  In addition, a customer could participate in pilot programs — AKA branch work ..

I really liked how I was able to avoid the need for a “command node” and just run Ansible locally and on a schedule .. Also, I was able to automate pretty much everything — from VPC creation all the way to autoscaling groups ..
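
The scheduled part boils down to ansible-pull running out of cron, something like this sketch (repo URLs, paths, and playbook names are placeholders, not what’s in the actual repo):

# crontab on the customer's micro instance -- sketch only
# code updates every 5 minutes, infrastructure updates every 15
*/5  * * * * ansible-pull -U https://github.com/example/chatbot-code.git -d /opt/chatbot/code local.yml >> /var/log/ansible-pull-code.log 2>&1
*/15 * * * * ansible-pull -U https://github.com/example/chatbot-infra.git -d /opt/chatbot/infra local.yml >> /var/log/ansible-pull-infra.log 2>&1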

Anyway, here’s the link: https://github.com/gkspranger/failed-chatbot .. Maybe it will help one of you out there in Internets land ..

Qik-n-EZ: Nagios AWS EC2 Hosts via Ansible

Sooo .. You are monitoring a fleet of AWS EC2 hosts via Nagios, and have yet to find an easy way to manage their host definitions .. Good news (if you happen to be using Ansible dynamic inventories) !! I created an Ansible template that loops through all your EC2s and creates their host definitions for you ..

In addition, you can easily define Nagios service dependencies, helping you zero in on the root problem more quickly ..
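
To give you a feel for the approach, here’s a rough sketch of such a template (the “ec2” group and the ec2_tag_* / ec2_private_ip_address hostvars come from the classic ec2.py inventory; the dependency values are placeholders):

# hosts.cfg.j2 -- sketch only
{% for host in groups['ec2'] %}
define host {
    use        linux-server
    host_name  {{ hostvars[host].ec2_tag_Name }}
    alias      {{ hostvars[host].ec2_tag_Name }}
    address    {{ hostvars[host].ec2_private_ip_address }}
}
{% endfor %}

# a service dependency then looks something like this (host/service names are placeholders)
define servicedependency {
    host_name                     db1.example.com
    service_description           MySQL
    dependent_host_name           login.example.com
    dependent_service_description Tomcat
    notification_failure_criteria w,u,c
}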