
ZFS monitoring script

Thank you for visiting. This page has been updated at another link: ZFS monitoring script


Here is the script I use to monitor ZFS status; it sends an e-mail if it detects a problem. I drop it into /etc/cron.d/ to run once a day.
It's a simple Perl script; the only thing you need to do is set up mail aliases on your Linux-like system. On Red Hat/SL, edit /etc/aliases
and add zfsinfo and raid for the different mail groups, then run newaliases. For example:
zfsinfo:   storage@abc.com
raid:      admin@abc.com

#!/usr/bin/perl -w

#set -v        # debugging tools
#set -x

use strict;
use POSIX;
use Sys::Hostname;
use File::Basename;

use vars qw( $msg $debug $host );

###############################################################################
# SUB:          sendmail
# PURPOSE:      mail message
#              
# ARGS:         $msg      - the message to be mailed
#               $msg_flag - 2 mails the zfsinfo alias, any other value mails raid
#
# RETURNS:      n/a
###############################################################################
sub sendmail {
  my $msg = $_[0];
  my $msg_flag = $_[1];

  ### Of course, the sysadmin should have edited /etc/aliases and added a
  ### real person to the alias for root! 
  my $mailfrom = 'root@localhost';
  my $mailtarget = 'raid@localhost';
  $mailtarget= 'zfsinfo@localhost' if($msg_flag == 2) ;
  my $sm = "/usr/sbin/sendmail -t";
  open( SENDMAIL, "|$sm" ) or die( "Cannot open $sm: $!" );
  print SENDMAIL "From: $mailfrom\n";
  print SENDMAIL "Reply-to: $mailtarget\n";
  print SENDMAIL "To: $mailtarget\n";
  print SENDMAIL "Subject: ZFS Status Warning\n";
  print SENDMAIL "Content-type: text/plain\n\n";
  print SENDMAIL "Dear System Administrator:\n\n";
  print SENDMAIL "$msg\n";
  close( SENDMAIL );
}

my $usage= "
  USAGE:\t$0 [ -v ]
  WHERE:
   -v\t - turns on verbose debugging\n";

( $#ARGV >= -1 and $#ARGV <= 0 ) or die( $usage );
$debug = 0;
while( (my $arg = shift @ARGV) ) {
  if( $arg eq "-v" ) {
    $debug = 1;
  }
  else {
    die( $usage );
  }
}

my $host = hostname() ;
## Run ZFS tool to get ZFS status
my (%cmdopt,$opt,$optval,$cmd) ;
my $msg_flag = 0 ;
my $statf ;
my $zfscmd="/usr/sbin/zpool" ;
$cmdopt{"status"}= "  status" ;
my $status_entry="";
my $poolname="";
while( ($opt,$optval) = each(%cmdopt)) {
   # POSIX::tmpnam() was removed in Perl 5.26; File::Temp is the core replacement
   require File::Temp ;
   (undef, $statf) = File::Temp::tempfile() ;
   $cmd= $zfscmd.$optval." >".$statf ;
   my $res = system( $cmd ) ;
   if( $res == 0 ) {
     if( !open( FILE, $statf ) ) {
       $msg.= "cmd:".$cmd." on ".$host." went wrong\n" ;
       $msg.=sprintf( "cannot open $statf: $!\n" ) ;
       $msg_flag =1 ;
     }
     my @status = <FILE> ;
     if( !close( FILE ) ) {
       $msg.= "cmd:".$cmd." on ".$host." went wrong\n" ;
       $msg.=sprintf( "cannot close $statf: $!\n" ) ;
       $msg_flag =1 ;
     }
     unlink $statf ;
     my( $entry ) ;
     $msg.="\n=========================================================\n";
     $msg.=sprintf(" %s Status Report on host %s:\n Tool %s%s\n ",$opt,$host,$zfscmd,$optval);
     foreach $entry (@status) {
       $msg.=$entry;
       chomp( $entry ) ;
       $entry =~ s/^(\s)+// ;
       my @fields = (split /\s+/, $entry) ;
       if(defined $fields[0] ) {
         $poolname=$fields[1] if($fields[0] eq "pool:");
         if( $fields[0] =~ /state:/ and $fields[1] ne "ONLINE") {
           $msg_flag=2;
         }
         if( $fields[0] =~ /raidz2/ and $fields[1] ne "ONLINE") {
           $msg_flag=2;
         }
       }
     }
   }
   else {
     $msg .= sprintf( "ZFS tool %s returned non-zero status %d on host %s.\n",$cmd,$res,$host ) ;
     $msg .= sprintf( "Please check %s on host %s.\n",$zfscmd, $host ) ;
     $msg_flag =1 ;
   }
}
if($msg_flag) {
   if($debug) {
     print $msg ;
     #sendmail( $msg,$msg_flag ) ;
   }
   else {
     #print $msg ;
     sendmail( $msg,$msg_flag ) ;
   }
   open LOG,">>/var/log/zfscheck.log" or die "Cannot open logfile : $!\n";
   my $ltime = strftime( "%b %d %H:%M:%S", localtime() );
   print LOG "$ltime ZFS check found a problem\n";
   close LOG;
}
else {
  open LOG,">>/var/log/zfscheck.log" or die "Cannot open logfile : $!\n";
  my $ltime = strftime( "%b %d %H:%M:%S", localtime() );
  print LOG "$ltime ZFS checks are happy\n";
  close LOG;
}
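
The script above only looks at the "state:" line and at raidz2 rows. As a rough sketch of how the same field-splitting idea could cover every device row (mirrors, single disks, other raidz levels), here is a standalone helper. The sub name find_unhealthy and the sample output are hypothetical and not part of the original script, and the list of bad states is my assumption about what zpool status reports for unhealthy devices.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): scan the lines of
# `zpool status` output and return one string per pool or device row
# whose state is not healthy.
sub find_unhealthy {
    my @lines = @_;
    my ($pool, $in_config, @problems) = ('', 0);
    for my $line (@lines) {
        chomp(my $l = $line);
        $l =~ s/^\s+//;                      # same trimming as the script above
        my @f = split /\s+/, $l;
        next unless @f;
        if ($f[0] eq 'pool:') {
            $pool = $f[1] // '';
            $in_config = 0;
        }
        elsif ($f[0] eq 'state:') {
            my $state = $f[1] // 'UNKNOWN';
            push @problems, sprintf("%s: pool state %s", $pool, $state)
                if $state ne 'ONLINE';
        }
        elsif ($f[0] eq 'config:') { $in_config = 1; }
        elsif ($f[0] eq 'errors:') { $in_config = 0; }
        elsif ($in_config and @f >= 2 and $f[0] ne 'NAME') {
            # Device rows look like: NAME STATE READ WRITE CKSUM
            # Assumed set of unhealthy states; adjust for your ZFS version.
            push @problems, sprintf("%s: %s %s", $pool, $f[0], $f[1])
                if $f[1] =~ /^(?:DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/;
        }
    }
    return @problems;
}

# Made-up sample output, for illustration only.
my $sample = <<'EOT';
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     UNAVAIL      0     0     0  cannot open

errors: No known data errors
EOT

print "$_\n" for find_unhealthy(split /^/, $sample);
```

In a real run the lines could be read straight from the tool with a list-form pipe, for example open(my $fh, '-|', '/usr/sbin/zpool', 'status'), which would also avoid the temporary file. Any non-empty return value would then play the role of $msg_flag=2 in the script above.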



In the meantime, I run a ZFS scrub every four months; see the link below for details:

https://sites.google.com/site/itmyshare/system-admin-tips-and-tools/zfs-periodic-scrub-script




