Monitoring RT

Nicholas_Clark · July 23, 2007, 2:13pm

We’re going to make the RT self-service interface visible to our external
clients. We’d like to monitor it, so that we know if it’s down.

What’s the best way to monitor RT? Are there any built-in pages that would
let us quickly tell that (say)

1: users can log in
2: the RT web application has a live connection to a working database

without burning lots of CPU?

Have I missed anything key to check?

Nicholas Clark

Drew_Barnes · July 23, 2007, 2:15pm

Nagios (or something similar) monitoring each of the services seems the
easiest way to me. And if RT goes down, just have Nagios send an alert
into RT and…oh, wait…

The rt-users Archives

Community help: http://wiki.bestpractical.com
Commercial support: sales@bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com

Drew Barnes
Applications Analyst
Network Resources Department
Raymond Walters College
University of Cincinnati

Nicholas_Clark · July 23, 2007, 2:17pm

Yes, but this doesn’t catch the case where the web server is working, the
database is working, but the mod_perl has got itself into a state where the
database handle is invalid and spewing errors, but DBI still thinks that it’s
connected. I was already assuming that the low level services could be
monitored easily.

Nicholas Clark

Drew_Barnes · July 23, 2007, 2:24pm

I suppose I’ve gotten lucky and never had this happen.

One option to check for that is hire a journalism major to hit F5 a lot
and make sure the site is working.

David_Svejda1 · July 23, 2007, 2:26pm

Hi Nicholas,

Nagios seems good enough for that to me. For example, the check_http
plugin with a sensible combination of options will make Nagios log in.
When it fails, you’ll know that something went terribly wrong and users
can’t log in. Is that what you want?

David Svejda

Jay_Lee · July 23, 2007, 2:08pm

Nicholas Clark wrote:

Yes, but this doesn’t catch the case where the web server is working, the
database is working, but the mod_perl has got itself into a state where the
database handle is invalid and spewing errors, but DBI still thinks that it’s
connected. I was already assuming that the low level services could be
monitored easily.

You could probably write a simple bash script using lynx, wget or curl
that would log in as a user (you might want to create one just for this
purpose with very limited rights) and pull up a small ticket. If
you link directly to the ticket, the query only pulls out that ticket’s
data, whereas pulling “RT at a Glance” or running a search would
use more juice than necessary.
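
A minimal sketch of such a script, using curl. The host name, the ticket number, the monitor account and the marker text are all placeholders, and passing user and pass as request parameters is an assumption borrowed from the URL style used elsewhere in this thread. The marker should be some text that only shows up when the ticket itself is displayed, so that a login page returned in its place does not count as success.

```sh
#!/bin/sh
# Sketch: fetch one known ticket as a dedicated low-privilege user.
# Every value below is a placeholder, not a real default.
RT_BASE="${RT_BASE:-http://your.rt.com}"
RT_USER="${RT_USER:-monitor}"
RT_PASS="${RT_PASS:-monitor}"
RT_TICKET="${RT_TICKET:-123456}"
# Text expected only when the ticket page really rendered (hypothetical).
RT_MARKER="${RT_MARKER:-Monitoring test ticket}"

# Fetch the ticket page and print its HTML on stdout.
# --get turns the --data-urlencode pairs into a URL query string.
fetch_ticket() {
    curl --silent --show-error --max-time 20 --get \
        --data-urlencode "id=${RT_TICKET}" \
        --data-urlencode "user=${RT_USER}" \
        --data-urlencode "pass=${RT_PASS}" \
        "${RT_BASE}/Ticket/Display.html"
}

# Succeed only if the fetch worked and the marker is in the page.
ticket_visible() {
    page=$(fetch_ticket) || return 1
    printf '%s\n' "$page" | grep -F -q -- "$RT_MARKER"
}
```

Something like `if ticket_visible; then echo "RT OK"; else echo "RT check failed"; fi` could then be run from cron or wrapped for a monitoring system.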

Jay


James_Moseley · July 23, 2007, 2:22pm

That’s why folks hire system admins - so when things stop working, they can
restart them. Other than monitoring HTTP and MySQL via Nagios, you
could always write a Nagios plugin that would bring up the RT login page
and log in with a real username and password. If that is successful, then
you consider RT to be up. If the login attempt generates errors or times
out, then you can assume that RT is ‘down’ and Nagios generates an alert.
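
Nagios decides the state of a check from the plugin's exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. A rough sketch of how the outcomes described above might map onto that convention follows. The function name and its three inputs are made up for illustration, and it assumes the login attempt is made with curl, whose exit status 28 means the operation timed out.

```sh
#!/bin/sh
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

# Map the outcome of a login attempt to a status line and an exit code.
#   $1  curl's exit status (0 = transfer completed, 28 = timed out)
#   $2  HTTP status code of the final response
#   $3  "yes" if text from a logged-in page was found, anything else if not
rt_login_status() {
    if [ $# -ne 3 ]; then
        echo "RT UNKNOWN - expected 3 arguments, got $#"
        return 3
    fi
    if [ "$1" -eq 28 ]; then
        echo "RT CRITICAL - login attempt timed out"
        return 2
    elif [ "$1" -ne 0 ]; then
        echo "RT CRITICAL - request failed (curl exit $1)"
        return 2
    elif [ "$2" != "200" ]; then
        echo "RT CRITICAL - HTTP status $2"
        return 2
    elif [ "$3" != "yes" ]; then
        echo "RT CRITICAL - login did not reach a logged-in page"
        return 2
    fi
    echo "RT OK - login succeeded"
    return 0
}
```

A plugin would gather the three inputs from its own login request and finish with something like `rt_login_status "$rc" "$code" "$found"; exit $?`.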

James Moseley

Nicholas_Clark · July 23, 2007, 2:35pm

The (sort of) problem I have is that there are sysadmins, and they use RT
as end users (internally) but when I asked them how they wanted to set up
monitoring the system (if necessary to wake them up when it needs restarting,
or just TLC) I got a sort of “meh” answer, rather than what I was hoping for.
(I wanted “We do monitoring like this round here” so that there would then
be an obvious way to extend that to RT)

Hence the slightly daftly phrased question - I was hoping that a good
solution already exists that they’d then agree to quickly.

Nicholas Clark

Torsten_Brumm · July 23, 2007, 4:17pm

The best solution is to do it the way Jay told you. Create a queue for testing, a user for the monitoring system to log in as, and a test ticket.

Then have the monitoring system request the queue or ticket URL.

Example:

http://your.rt.com/Ticket/Display.html?id=123456&user=monitor&pass=monitor

to get your ticket. Together with a small shell or Perl script, you can collect this information from Nagios and also from Cacti (to get some performance values as well).

Here is a small script from our Unix gurus that checks the time RT needs to serve the page:

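#!/bin/sh
# Print the elapsed seconds RT takes to serve one ticket page.
# $1 is the physical web server to query; the Host header selects the RT site.
# The -o and -f options need GNU time (/usr/bin/time), not the shell builtin.

/usr/bin/time -o /tmp/rtcheck -f %e wget -q -O /dev/null --header="Host: your.rt.com" "http://${1}/Ticket/Display.html?id=123456&user=monitor&pass=monitor"
cat /tmp/rtcheck
rm /tmp/rtcheck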

Call it with: ./rtcheck.sh your.physical.host. If you have more than one web server, it makes sense to check each of them.
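
To turn that number into an alert, the elapsed time can be compared against thresholds. The sketch below is one way to do that. The function name and the 2 and 5 second thresholds in the usage note are arbitrary assumptions, and the text after the `|` follows the Nagios performance data format so that the value can also be graphed.

```sh
#!/bin/sh
# Classify an elapsed time (seconds, may be fractional) against thresholds.
#   $1 elapsed seconds, $2 warning threshold, $3 critical threshold
# Prints a Nagios-style status line with performance data and
# returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
rt_time_status() {
    # The shell cannot compare fractions itself, so awk does the comparison.
    level=$(awk -v t="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (t + 0 >= c + 0) print 2
        else if (t + 0 >= w + 0) print 1
        else print 0
    }')
    level=${level:-2}   # treat a failed comparison as CRITICAL
    case "$level" in
        0) label=OK ;;
        1) label=WARNING ;;
        *) label=CRITICAL ;;
    esac
    echo "RT $label - page served in ${1}s | time=${1}s;${2};${3}"
    return "$level"
}
```

For example, `rt_time_status "$(./rtcheck.sh your.physical.host)" 2 5` would warn at 2 seconds and go critical at 5.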

Torsten

Jesse_Vincent · July 23, 2007, 5:05pm

Nicholas Clark wrote:

We’re going to make the RT self-service interface visible to our external
clients. We’d like to monitor it, so that we know if it’s down?

What’s the best way to monitor RT? Are there any built in pages that would
let us quickly tell that (say)

1: users can log in
2: the RT web application has a live connection to a working database

If #1 is true, then #2 is true as well. (You need to get to the db
to check the user.) So just a simple scripted login test will do you.
Or a GET of a page with user and pass passed in as parameters.


Khai_Lam · July 23, 2007, 5:14pm

If that’s all that’s required, then the check_http plug-in provided
by Nagios will do the trick. There’s an option to check the HTML for
the presence of a specific text string (maybe “Search/Results.rdf”;
you’ll only get this string after successfully logging in). This saves
you the trouble of writing your own monitoring script, but only if
you are already using Nagios.
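
As a rough illustration, the call might look like the sketch below. The wrapper function is hypothetical, and it assumes check_http is on the PATH, RT lives at the web root, and the user name and password contain only URL-safe characters. The flags themselves are standard check_http options: -H for the host, -u for the URL path, -s for the string to expect in the content and -t for the timeout in seconds.

```sh
#!/bin/sh
# Sketch: have check_http request a page as the monitoring user and
# require a string that should only appear after a successful login.
#   $1 host name, $2 user, $3 password (placeholders, not real defaults)
rt_check_http() {
    check_http -H "$1" \
        -u "/?user=$2&pass=$3" \
        -s "Search/Results.rdf" \
        -t 20
}
```

Called as `rt_check_http your.rt.com monitor monitor`, the plugin's own exit code and status line are passed straight through to Nagios.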

-Khai

Tom_Lanyon · July 25, 2007, 1:13am

We also have a basic Perl script using WWW::Mechanize which opens the
login page, logs in with a ‘nagios’ user and ensures it gets a valid
response (we check for a 200 OK and that the content matches “RT at a
glance”). If this fails, it generates a critical Nagios alert and wakes
the on-call staff up.

Regards,
Tom

Tom Lanyon
Systems Administrator
NetSpot Pty Ltd

Mikko_Lipasti · July 25, 2007, 10:12am

We’re going to make the RT self-service interface visible to our external
clients. We’d like to monitor it, so that we know if it’s down?

What’s the best way to monitor RT? Are there any built in pages that would
let us quickly tell that (say)

Take a look at WWW::Mechanize on CPAN. Think of it as a scriptable web browser robot that plugs right into the standard Perl testing tools (via Test::WWW::Mechanize) if you want it to. You can do a lot with it, including interacting with forms and submitting them.

You could configure a test user account and do a login and logout with (Test::)WWW::Mechanize, checking (with regular expressions if you wish) that the “RT at a glance” page looks about right.
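
For comparison with the shell script earlier in the thread, the same login, check and logout round trip can be sketched with curl and a cookie jar standing in for WWW::Mechanize. This is not the Perl approach itself. The function name is invented, and the form field names (user, pass), the front page at the web root and the /NoAuth/Logout.html path are assumptions about the RT installation being tested.

```sh
#!/bin/sh
# Sketch of a login / check / logout round trip using curl.
#   $1 base URL, $2 user, $3 password (all placeholders)
# Returns 0 if the logged-in page looks right, 2 if not, 3 on setup failure.
rt_session_check() {
    jar=$(mktemp) || return 3
    page=$(mktemp) || { rm -f "$jar"; return 3; }

    # Submit the login form; the session cookie is kept in the jar.
    code=$(curl --silent --location --max-time 20 \
        --cookie-jar "$jar" --cookie "$jar" \
        --data-urlencode "user=$2" --data-urlencode "pass=$3" \
        --output "$page" --write-out '%{http_code}' "$1/")

    # Require both a 200 response and the expected page text.
    result=2
    if [ "$code" = "200" ] && grep -i -q 'RT at a glance' "$page"; then
        result=0
    fi

    # Log out again with the same session cookie.
    curl --silent --max-time 20 --cookie "$jar" --output /dev/null \
        "$1/NoAuth/Logout.html"

    rm -f "$jar" "$page"
    return "$result"
}
```

A wrapper could call `rt_session_check http://your.rt.com nagios "$password"` and use the return value directly as a Nagios exit code.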

Mikko