Sooo .. You are monitoring a fleet of AWS EC2 hosts via Nagios, and have yet to find an easy way to manage their host definitions .. Good news (if you happen to be using Ansible dynamic inventories) !! I created an Ansible template that loops through all your EC2 instances and creates the Nagios host definitions for you ..
In addition, you can easily define Nagios service dependencies, helping you zero in on the root problem more quickly ..
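To give a feel for what the generated config looks like, here is a minimal sketch in plain Python (a stand-in, not the Ansible template itself) .. The inventory list, the host names, the addresses, the `generic-host` parent template, and the service names are all assumptions made up for illustration:

```python
# Hypothetical inventory. In the real setup this would come from an Ansible
# dynamic inventory; these names and addresses are invented.
EC2_HOSTS = [
    {"name": "web01", "address": "10.0.1.10"},
    {"name": "db01", "address": "10.0.2.20"},
]

# Assumes a parent host template called "generic-host" already exists.
HOST_TEMPLATE = """define host {{
    use        generic-host
    host_name  {name}
    alias      {name}
    address    {address}
}}"""


def render_hosts(hosts):
    """Render one Nagios host definition per inventory entry."""
    return "\n\n".join(HOST_TEMPLATE.format(**host) for host in hosts)


def render_service_dependency(host, parent_service, dependent_service):
    """Render a dependency so the dependent service stays quiet
    while the parent service on the same host is failing."""
    return (
        "define servicedependency {\n"
        f"    host_name                      {host}\n"
        f"    service_description            {parent_service}\n"
        f"    dependent_host_name            {host}\n"
        f"    dependent_service_description  {dependent_service}\n"
        "    notification_failure_criteria  w,u,c\n"
        "}"
    )


print(render_hosts(EC2_HOSTS))
print()
print(render_service_dependency("web01", "SSH", "Disk Usage"))
```

The dependency says: if SSH on web01 is in a warning, unknown, or critical state, don't also notify about Disk Usage on web01 .. That is the "zero in on the root problem" part.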
Afraid of having too many AWS EC2 images and/or snapshots, thus running up your bill ?? Fear not !! I have you covered:
Nagios Plugins to Check AWS EC2 Images
Nagios Plugin to Check AWS EC2 Snapshots
This post is kinda dopey, but it might help one person out there in intertubes land .. That said, I have been using HipSaint for ……………….. 3 years ?? It’s great !! It posts Nagios alert information into HipChat .. The catch is what I have to do once an alert shows up:
- Log into Nagios
- Locate the host or service that is alerting
- Click the link
- View the details
- Acknowledge the alert, if needed
Y’all know I’m lazy though, right ?? I wish the HipChat message would just give me the links I want .. Well, now it can:
As you will see, I am simply appending the status and acknowledgement links to the Nagios “service output” .. I also use Ansible to populate variables such as:
- HipChat token
- HipChat room
- Nagios hostname
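For the curious, the link-building part can be sketched in a few lines of Python .. This is only an illustration of the idea, not the actual HipSaint change .. The base URL is a made-up placeholder (your CGI path may differ), and the function names are hypothetical .. The `extinfo.cgi` and `cmd.cgi` query parameters are the standard Nagios Core ones for a service detail page and a service acknowledgement:

```python
from urllib.parse import urlencode

# Placeholder base URL. In the post this hostname is an Ansible variable.
NAGIOS_BASE = "https://nagios.example.com/nagios/cgi-bin"


def service_links(host, service, base=NAGIOS_BASE):
    """Build the status and acknowledgement URLs for one service."""
    target = {"host": host, "service": service}
    # type=2 asks extinfo.cgi for the service detail page.
    status = f"{base}/extinfo.cgi?" + urlencode({"type": 2, **target})
    # cmd_typ=34 opens the "acknowledge service problem" form.
    ack = f"{base}/cmd.cgi?" + urlencode({"cmd_typ": 34, **target})
    return status, ack


def decorate_output(service_output, host, service):
    """Append both links to the plugin output before it goes to chat."""
    status, ack = service_links(host, service)
    return f"{service_output} [status: {status}] [ack: {ack}]"


print(decorate_output("USERS CRITICAL - 4 users logged in",
                      "web01", "Logged In Users"))
```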
Now I can be as lazy as I want to be:
Hal is my Hubot chatbot .. He’s awesome !! He gets me beer !!
He also does things like restart app servers, deploy code, and show me pictures of grumpy cats .. He’s so cool, I’ve started making non-humans to talk to him .. “Greg, what do you mean ??” .. Well, let me show you ..
- I have a Nagios server
- It monitors (allthethings)
- When the “logged in users” alert is triggered, Nagios sends a message to my chat service using hipsaint
- “logged in users” is a monitor I have that alerts me when more than 3 users are logged into a server
- I see the alert and the server in question
- I SSH into the server
- I type who
- I then determine if I need to care
- If not, move on with my life
- If so, dig deeper
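If you are wondering what a "logged in users" monitor boils down to, here is a rough Python sketch .. It is not the plugin I actually run; the function names and the message wording are assumptions, and it counts login sessions (one per line of `who` output) rather than unique users:

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_logged_in_users(who_output, max_users=3):
    """Return (exit_code, message) based on lines of `who` output."""
    sessions = [line for line in who_output.splitlines() if line.strip()]
    count = len(sessions)
    if count > max_users:
        return CRITICAL, f"USERS CRITICAL - {count} users logged in"
    return OK, f"USERS OK - {count} users logged in"


def main():
    """Run `who` locally, print the result, return the exit code."""
    output = subprocess.run(
        ["who"], capture_output=True, text=True, check=True
    ).stdout
    code, message = check_logged_in_users(output)
    print(message)
    return code
```

To use it as a plugin you would end the script with `raise SystemExit(main())`, so Nagios sees the exit code.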
The thing is, I have more than 1,200 active monitors .. That means Nagios can and will send many, many messages to my chat service — depending on the day .. So how can I make my life easier ??
Here’s an easy one: ask Hal who’s on a server ..
My stack is HipChat -> Hubot -> Jenkins -> Ansible .. That means I can do damn near anything I want, all from my chat client ..
Remember what I said earlier — about making non-humans talk to Hal ?? What I did was create a Nagios event handler that sends a message to my chat service using HipChat CLI .. Therefore, I AM NOT asking Hal who’s on a server, it’s NAGIOS WHO IS doing it ..
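The decision logic inside such an event handler might look something like this Python sketch .. Big caveats: the phrase sent to Hal is hypothetical (use whatever your Hubot script listens for), and the actual send via HipChat CLI is left out on purpose, since its options are not shown here:

```python
def hal_message(host, state, state_type):
    """Return the chat line to send to Hal, or None to stay quiet.

    Only a confirmed problem (CRITICAL and HARD) triggers a question,
    so Hal is not pestered on every soft retry.
    """
    if state == "CRITICAL" and state_type == "HARD":
        # Hypothetical command phrase; match it to your Hubot script.
        return f"hal who is on {host}"
    return None


def main(argv):
    """Expects: <host> <service state> <service state type>."""
    host, state, state_type = argv[1:4]
    message = hal_message(host, state, state_type)
    if message is not None:
        # Hand this line to your chat CLI of choice.
        print(message)
    return 0
```

Nagios would call the script with the `$HOSTNAME$`, `$SERVICESTATE$`, and `$SERVICESTATETYPE$` macros as arguments, and the script would finish with `main(sys.argv)`.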
It doesn’t stop there !! You can create scripted Splunk alerts as well .. Before you know it, you will be making (allthethings) talk to Hal ..
OMG !! It’s still hip to say that — right ??
Anyway .. While I consider myself to be reasonably intelligent, I still find myself doing dopey things from time to time .. For example: saying OMG .. Another dopey thing I have been doing for a very, very long time is writing run-on sentence notes for my Nagios service definitions .. I am ashamed to admit how many times I have tried to “fix” this, only to have it error out during the pre-flight check .. Yes, I am aware of Google, but I’ve never found anything definitive ..
And then it hit me like a bolt of lightning: Bash-style line breaks ..
So obvious, yet so elusive ..
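Here is a small Python sketch of the idea: take one long notes string and break it into lines that each end with a backslash, except the last .. The helper name, the wrap width, and the indentation are my own assumptions, and it is worth running the result through the pre-flight check (`nagios -v`) to confirm your Nagios version joins the lines the way you expect:

```python
import textwrap


def continued_notes(text, width=60, indent="    "):
    """Format a long notes value as backslash-continued lines."""
    chunks = textwrap.wrap(text, width=width)
    if not chunks:
        raise ValueError("notes text is empty")
    # First line carries the directive name, the rest are aligned under it.
    lines = [f"{indent}notes  {chunks[0]}"]
    lines += [f"{indent}       {chunk}" for chunk in chunks[1:]]
    # Every line but the last ends in a backslash, Bash style.
    return " \\\n".join(lines)


print(continued_notes(
    "Alerts when more than 3 users are logged in. "
    "SSH in, run who, then decide whether anyone needs to care."
))
```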