Here's how I run my healthcheck against CommuniGate Pro to see whether everything is alive and well. The healthcheck uses a pair of CGP servers that email each other at fixed intervals. If one of the servers doesn't receive such a "ping" email, it alerts me via SMS.
This method uses real emails instead of just connecting to the email server's ports (25 for SMTP, for example). The benefit is that it can reliably alert you to a "stuck" Queue, where a simple connection to port 25 would still succeed. A stuck Queue is most often caused by a crashed or hung external helper.
The method involves cron, a shell script that checks for the received "ping" emails, the CGPro PIPE module, and a Perl script that "pipes" the received emails directly to a file on the server's hard disk instead of a real mailbox.
For this tutorial, let us assume you have two CGPro servers, server1.example.com and server2.example.com, which handle email for these two domains respectively. Let us also assume that a healthcheck interval of 30 minutes is good enough for us.
Configuration on the 2 CGPro Servers
You need to configure 2 special Router entries on both servers. These will be similar on both servers. Add the following line to your Router records on both servers:
<pings> = "queue[PROC1] piper.pl"@pipe
The above Router entry makes your server accept emails to an account named "pings" in the primary domain (the licensed domain) of the server. When an email arrives for this account, the server's PIPE module passes it to a script called piper.pl.
If you haven't done so yet, configure the CommuniGate Pro PIPE module using Web-Admin -> Settings -> -> PIPE. For this tutorial, we assume you have configured the PIPE modules of both servers with the application directory set to /var/CommuniGate/apps. After configuring the PIPE module, make sure that the "apps" directory exists and has the appropriate permissions.
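The directory step above can be sketched as a small shell function. This is only an illustration: the function name ensure_apps_dir and the default mode 755 are assumptions, not part of the original setup, and the right owner and mode depend on the user your CGPro server runs as.

```shell
#!/bin/sh
# Create the PIPE application directory if it is missing and set its mode.
# $1 = directory, $2 = mode (default 755 is an assumption; adjust as needed)
ensure_apps_dir() {
    dir=$1
    mode=${2:-755}
    mkdir -p "$dir" || return 1
    chmod "$mode" "$dir" || return 1
    # Succeeds only if the directory exists and the current user can write to it.
    [ -d "$dir" ] && [ -w "$dir" ]
}

# Example call, using the path assumed in this tutorial:
# ensure_apps_dir /var/CommuniGate/apps
```

Run it as the same user CGPro runs as, so the writability check means something.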
Here's the source of the piper.pl script:
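#!/usr/bin/perl
# Write the piped message to apps/pings.out, overwriting any previous one.
# The path is relative to the directory the script is started from.
open(OUTLOG, ">apps/pings.out") or die "cannot open apps/pings.out: $!";
while (<>) {
    print OUTLOG $_;
}
close(OUTLOG);
Create this piper.pl script in /var/CommuniGate/apps and make it executable: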
chmod +x /var/CommuniGate/apps/piper.pl
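Before relying on the PIPE module, it may help to run piper.pl by hand. The sketch below assumes the script is started with the CGPro base directory as its working directory, which is inferred from the relative path apps/pings.out in the script and is not confirmed by this tutorial. The function name try_piper is made up for illustration.

```shell
#!/bin/sh
# Feed a sample message to piper.pl by hand and check that pings.out appears.
# $1 = CGPro base directory (this tutorial assumes /var/CommuniGate)
try_piper() {
    base=$1
    (
        cd "$base" || exit 1
        # Remove any old file so a stale copy cannot fake a success.
        rm -f apps/pings.out
        printf 'Subject: ping\n\nping\n' | ./apps/piper.pl || exit 1
        # piper.pl should have written the message verbatim.
        grep -q '^Subject: ping$' apps/pings.out
    )
}

# Example call:
# try_piper /var/CommuniGate && echo "piper.pl works"
```

Do this before the cron jobs are enabled, because the test deletes and rewrites pings.out.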
For now we're done configuring CGPro. Next comes the shell script that checks whether the "ping" emails arrive properly. We'll call this script pings.sh; here's the source:
#!/bin/sh -
#
# Alert if no ping mail has arrived since the last check.
PATH=/bin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin
cd /var/CommuniGate/apps
if [ -f pings.out ]; then
    # A ping arrived: move it away so the next check needs a fresh one.
    mv -f pings.out pings.out.old
else
    # No ping arrived: name the peer server here (on server2, say Server1).
    echo "`date +%Y-%m-%d_%H:%M:%S` PIPE Hangs on Server2" | /usr/bin/mail -s "CGP ALERT" someone@pager
fi
chmod +x /var/CommuniGate/apps/pings.sh
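The core of pings.sh is a "did a ping arrive since the last check?" test. A minimal sketch of that logic as a reusable function follows; the name ping_arrived and the directory argument are illustrative and not part of the original script.

```shell
#!/bin/sh
# Return 0 and rotate the file if a ping arrived since the last check;
# return 1 if no ping file is present.
# $1 = directory holding pings.out
ping_arrived() {
    dir=$1
    if [ -f "$dir/pings.out" ]; then
        mv -f "$dir/pings.out" "$dir/pings.out.old"
        return 0
    fi
    return 1
}

# Hypothetical usage; the alert address is the same placeholder as above:
# ping_arrived /var/CommuniGate/apps ||
#     echo "no ping received" | /usr/bin/mail -s "CGP ALERT" someone@pager
```

Because the file is moved away on every successful check, each check only passes if a fresh ping was delivered in the meantime.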
Now we have all the scripts we need in place and can continue setting up cron to fire all this off at specified intervals.
On Server1.example.com
Edit /etc/crontab and add the following lines:
# send pings to server 2
5,35 * * * * root echo "ping" | /usr/bin/mail -s ping pings@server2.example.com
# check pings received from server2
7,37 * * * * root /var/CommuniGate/apps/pings.sh
On Server2.example.com
Edit /etc/crontab and add the following lines:
# send pings to server 1
5,35 * * * * root echo "ping" | /usr/bin/mail -s ping pings@server1.example.com
# check pings received from server1
7,37 * * * * root /var/CommuniGate/apps/pings.sh
Almost Done
This is all there is to this healthcheck setup. But read on!
PLEASE don't email me as soon as you hit a wall while implementing this. A day only has 24 hours, for me too. So if you run into problems, please read this tutorial again and make sure you really followed it to the letter. It took me quite some time to write this down, and I'd really appreciate it if you took at least the same amount of time reading it.
Thanks!
NOTES: Please note that in the event of a hung Queue, all "ping" mails will pile up and be delivered in one go once the Queue is working again. This will probably also result in a bunch of alert mails being sent to your pager. If, for instance, the Queue on Server1 hangs, it can neither receive nor deliver ping mails. Thus it thinks Server2 hangs and queues up warning emails to your pager (or alert email address). Once the Queue is working again, you'll get these emails telling you Server2 is hung when in fact Server1 was hung. But these false alarms are easily identifiable.
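One possible way to shrink such a pile of alerts, which is not part of the original setup, is to alert only on the first missed ping and stay quiet until a ping arrives again. The sketch below assumes a marker file named pings.alerted, and the function name check_once is also made up. It prints what the caller should do rather than sending mail itself.

```shell
#!/bin/sh
# Decide whether an alert is due, alerting only on the first missed ping.
# $1 = directory holding pings.out; prints "ok", "alert" or "suppressed".
check_once() {
    dir=$1
    flag="$dir/pings.alerted"   # hypothetical marker file
    if [ -f "$dir/pings.out" ]; then
        # A ping arrived: rotate it and clear any earlier alert marker.
        mv -f "$dir/pings.out" "$dir/pings.out.old"
        rm -f "$flag"
        echo ok
    elif [ -f "$flag" ]; then
        # Already alerted for this outage.
        echo suppressed
    else
        # First missed ping: remember that an alert was raised.
        : > "$flag"
        echo alert
    fi
}

# Hypothetical usage:
# [ "$(check_once /var/CommuniGate/apps)" = alert ] &&
#     echo "no ping received" | /usr/bin/mail -s "CGP ALERT" someone@pager
```

With this variant a hung local Queue still delays the alert, but only one alert per outage is queued instead of one every 30 minutes.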