Distributive - Health Check

Introduction

Distributive is a tool for running distributed health checks in datacenters. It was designed with Consul in mind, but is platform agnostic. It is simple to configure (with JSON checklists) and easy to deploy and run.

Exit codes follow the convention recognized by Consul, Kubernetes, Sensu, and Nagios:

Exit code 0 - Checklist is passing

Exit code 1 - Checklist is warning

Any other code - Checklist is failing

Currently, Distributive only uses exit codes 0 and 1, so a failing checklist also exits with code 1.
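As a minimal sketch of the convention above, a monitoring wrapper might interpret an exit code as follows. The function name `classify_exit_code` is illustrative only and is not part of Distributive.

```python
def classify_exit_code(code: int) -> str:
    """Map an exit code to the status names used in the list above."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    # Any other code is treated as a failure by the monitoring systems.
    return "failing"


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, classify_exit_code(code))
```

Because Distributive currently exits with 0 or 1 only, a wrapper like this would report "warning" for a failed checklist.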

Installation

Install Golang

$ sudo yum install golang -y

$ go version

go version go1.4.2 linux/amd64

Install Distributive

$ sudo rpm -i https://github.com/CiscoCloud/distributive/releases/download/v0.2/distributive-0.2-1.x86_64.rpm

Verify Distributive

$ distributive --help

NAME:
   Distributive - Perform distributed health tests

USAGE:
   Distributive [global options] command [command options] [arguments...]

VERSION:
   0.2

COMMANDS:
   help, h   Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --verbosity 'warn'                       info | debug | fatal | error | panic | warn
   --file, -f                               Read a checklist from a file
   --url, -u                                Read a checklist from a URL
   --directory, -d '/etc/distributive.d/'   Read all of the checklists in this directory
   --stdin, -s                              Read data piped from stdin as a checklist
   --help, -h                               show help
   --version, -v                            print the version

FATA[0000] Neither file, URL, directory, nor stdin specified. Try --help.

Getting Started

Install Kafka

Follow the Kafka How-To to install Kafka.

Create Distributive Config directory

All Distributive application checklists live in this directory.

Distributive picks up every checklist in this directory, runs each one, and reports the passed and failed checks.

$ sudo mkdir /etc/distributive.d

Define Distributive checklist for Kafka topic list and sshd service

A checklist can specify a variety of checks, from commands and services to network ports.

$ sudo cat /etc/distributive.d/kafka.json

{
    "Name": "Kafka Checklist",
    "Notes": "A checklist that has checks, for kafka!",
    "Checklist": [
        {
            "Name": "Kafka topic list check",
            "Notes": "Check that the Kafka topic list is present.",
            "Check": "command",
            "Parameters": ["bin/kafka-topics.sh --list --zookeeper localhost:2181"]
        },
        {
            "Name": "SSHD service",
            "Notes": "Check sshd service status.",
            "Check": "systemctlActive",
            "Parameters": ["sshd"]
        },
        {
            "Name": "Miscserver port",
            "Notes": "Check misc server port.",
            "Check": "port",
            "Parameters": ["9099"]
        }
    ]
}
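Checklists like the one above can also be generated rather than written by hand, which avoids quoting and comma mistakes. The following is a rough sketch only: the helper names `make_check` and `make_checklist` are hypothetical, and the key names are simply copied from the example above.

```python
import json


def make_check(name, notes, check, parameters):
    """Return one checklist entry using the keys shown in kafka.json."""
    return {
        "Name": name,
        "Notes": notes,
        "Check": check,
        "Parameters": list(parameters),
    }


def make_checklist(name, notes, checks):
    """Wrap a list of check entries in the top-level checklist object."""
    return {"Name": name, "Notes": notes, "Checklist": list(checks)}


checklist = make_checklist(
    "Kafka Checklist",
    "A checklist that has checks, for kafka!",
    [
        make_check("Kafka topic list check", "Check that the Kafka topic list is present.",
                   "command", ["bin/kafka-topics.sh --list --zookeeper localhost:2181"]),
        make_check("SSHD service", "Check sshd service status.",
                   "systemctlActive", ["sshd"]),
        make_check("Miscserver port", "Check misc server port.",
                   "port", ["9099"]),
    ],
)

# Serialize with indentation so the file stays readable for operators.
text = json.dumps(checklist, indent=4)
print(text)
```

The printed text could then be saved as /etc/distributive.d/kafka.json.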

Execute Distributive

$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=

Total: 3

Passed: 2

Failed: 1

Port not open:

Specified: 9099

Actual:

Distributive detecting service state

The sshd check in the JSON above demonstrates how Distributive detects whether a service is up or down.

Similarly, checks for networks, services, and packages can be enforced.

# SSHD is enabled

$ systemctl status sshd

sshd.service - OpenSSH server daemon

Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled)

$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=

Total: 3

Passed: 2

Failed: 1

Port not open:

Specified: 9099

Actual:

# Stop sshd

$ systemctl stop sshd

$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=

Total: 3

Passed: 1

Failed: 2

Service not active:

Specified: sshd

Actual ActiveState=Inactive

Port not open:

Specified: 9099

Actual:

# Start sshd again

$ systemctl start sshd

$ distributive -d /etc/distributive.d/

WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=

Total: 3

Passed: 2

Failed: 1

Port not open:

Specified: 9099

Actual:
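To compare runs like the ones above automatically, the counters can be pulled out of the report text. This is a sketch that assumes the report keeps the "Total / Passed / Failed" lines exactly as printed above; `parse_summary` is a hypothetical helper, not a Distributive feature.

```python
import re

# Matches lines such as "Total: 3", "Passed: 1", or "Failed: 2".
SUMMARY_RE = re.compile(r"^\s*(Total|Passed|Failed):\s*(\d+)\s*$")


def parse_summary(report: str) -> dict:
    """Extract the Total/Passed/Failed counters from a report's text."""
    counts = {}
    for line in report.splitlines():
        match = SUMMARY_RE.match(line)
        if match:
            counts[match.group(1).lower()] = int(match.group(2))
    return counts


# Sample text copied from the run with sshd stopped.
sample = """\
Total: 3
Passed: 1
Failed: 2
Service not active:
Specified: sshd
Port not open:
Specified: 9099
"""

summary = parse_summary(sample)
print(summary)
```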

Distributive health-check JSON files in the Worker VM

Packaging Distributive health-check JSON files in the worker VM involves the following steps:

0. Directory structure for JSON files

Distributive Parent Directory : /etc/distributive.d

Images Directory : /etc/distributive.d/images

Resource Image Directory : /etc/distributive.d/images/resource

Deployment Image Directory : /etc/distributive.d/images/deployment

Services Directory : /etc/distributive.d/services

Logging Service Directory : /etc/distributive.d/services/logging

Controller Directory : /etc/distributive.d/controller

Service Controller Directory : /etc/distributive.d/controller/service

App Controller Directory : /etc/distributive.d/controller/app

1. Platform Base Team

The Distributive config directory is pre-created in the worker VM image at /etc/distributive.d/.

2. Service Team, e.g. Kafka or any other application

The application team writes the Distributive health-check JSON for its application.

The application's installation copies that JSON file into the Distributive directory.

3. Distributive invocation with config directory

Distributive is invoked with the config directory, so every JSON checklist is executed and a Distributive report is generated.

4. Distributive registered as a service with Consul

Distributive is registered as a service in Consul, pointing at the Distributive config directory.

If any check in a Distributive checklist fails, Consul raises a service failure warning. An admin can then run Distributive manually to get the detailed checklist report and take corrective action.

Distributive is registered in Consul through a JSON service definition such as the following:

$ sudo cat /etc/consul.d/distributive.json

{
    "service": {
        "name": "Distributive",
        "check": {
            "script": "/usr/bin/distributive -d /etc/distributive.d/",
            "interval": "10s"
        }
    }
}
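The directory layout from step 0 could be created by an image build script. The sketch below assumes the subdirectory names listed in step 0 and, for safety, demonstrates against a temporary directory instead of the real /etc/distributive.d. The name `create_layout` is hypothetical.

```python
from pathlib import Path
import tempfile

# Leaf directories from step 0; parent directories are created implicitly.
SUBDIRS = [
    "images/resource",
    "images/deployment",
    "services/logging",
    "controller/service",
    "controller/app",
]


def create_layout(root: Path) -> list:
    """Create the checklist directory tree under root and return the paths."""
    created = []
    for sub in SUBDIRS:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstrate against a throwaway directory rather than the real /etc.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "distributive.d"
    paths = create_layout(root)
    layout_ok = all(p.is_dir() for p in paths)
    print(layout_ok)
```

On a real worker VM image the root would be /etc/distributive.d, and the script would need sufficient privileges to write there.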

Getting Started with Consul

Install Consul

$ mkdir consul

$ cd consul

$ wget https://dl.bintray.com/mitchellh/consul/0.5.1_linux_amd64.zip

$ unzip 0.5.1_linux_amd64.zip

$ sudo mkdir /etc/consul.d

$ sudo cat /etc/consul.d/distributive.json

{
    "service": {
        "name": "Distributive",
        "check": {
            "script": "/usr/bin/distributive -f /etc/distributive.d/checklist.json -d ''",
            "interval": "10s"
        }
    }
}
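The two service definitions shown so far differ only in the script line: one passes a directory, the other a single checklist file. A rough sketch of generating either variant follows. The function `consul_service_definition` and its parameters are hypothetical; the JSON shape, including the `script` key, is copied from the examples above for Consul 0.5.1.

```python
import json


def consul_service_definition(directory="/etc/distributive.d/",
                              checklist_file=None, interval="10s"):
    """Build a service definition shaped like the distributive.json examples."""
    if checklist_file is not None:
        # Single-file mode, with -d '' as in the example above.
        script = f"/usr/bin/distributive -f {checklist_file} -d ''"
    else:
        # Directory mode: run every checklist in the directory.
        script = f"/usr/bin/distributive -d {directory}"
    return {
        "service": {
            "name": "Distributive",
            "check": {"script": script, "interval": interval},
        }
    }


definition = consul_service_definition(
    checklist_file="/etc/distributive.d/checklist.json"
)
print(json.dumps(definition, indent=4))
```

Serializing with `json.dumps` guarantees balanced quotes, which is easy to get wrong when editing the file by hand.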

Distributive Config

$ sudo ls /etc/distributive.d

samples

$ sudo cat /etc/distributive.d/checklist.json

{
    "Name": "My first checklist",
    "Notes": "A checklist that has checks, really!",
    "Checklist": [
        {
            "Name": "Git installation check",
            "Notes": "If I don't have git, I don't know what I'll do.",
            "Check": "Installed",
            "Parameters": ["git"]
        }
    ]
}

$ sudo cat /etc/distributive.d/checklist.json

{
    "Name": "My first checklist",
    "Notes": "A checklist that has checks, really!",
    "Checklist": [
        {
            "Name": "Git installation check",
            "Notes": "If I don't have git, I don't know what I'll do.",
            "Check": "Installed",
            "Parameters": ["git"]
        },
        {
            "Name": "Git installation check",
            "Notes": "If I don't have git, I don't know what I'll do.",
            "Check": "Installed",
            "Parameters": ["docker"]
        },
        {
            "Check": "port",
            "Parameters": ["41483"]
        },
        {
            "Check": "interface",
            "Parameters": ["docker0"]
        }
    ]
}
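Before handing a checklist to Distributive, it can help to check that the file is well-formed. The sketch below is not Distributive's own validation. It assumes, based only on the examples above, that every check needs "Check" and "Parameters" while "Name" and "Notes" are optional. `validate_checklist` is a hypothetical helper.

```python
import json

# Assumed minimum keys per check, inferred from the examples in this guide.
REQUIRED_KEYS = ("Check", "Parameters")


def validate_checklist(text: str) -> list:
    """Return a list of problems found in a checklist's JSON text."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(data, dict) or not isinstance(data.get("Checklist"), list):
        return ['top level must be an object with a "Checklist" list']
    problems = []
    for index, check in enumerate(data["Checklist"]):
        if not isinstance(check, dict):
            problems.append(f"check {index}: not an object")
            continue
        for key in REQUIRED_KEYS:
            if key not in check:
                problems.append(f"check {index}: missing {key}")
    return problems


good = '{"Name": "n", "Checklist": [{"Check": "port", "Parameters": ["41483"]}]}'
bad = '{"Name": "n", "Checklist": [{"Check": "port"}]}'
print(validate_checklist(good))
print(validate_checklist(bad))
```

An empty list means no problems were found under these assumptions.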

Start Consul

$ ./consul agent -server -bootstrap-expect=1 -data-dir /tmp/consul -config-dir /etc/consul.d -dc dc-distributive -node=consul-distributive-1 --bind=192.168.0.167

Run Distributive Standalone

$ /usr/bin/distributive -f /etc/distributive.d/checklist.json --verbosity="info"

INFO[0000] Creating checklist(s)... path=/etc/distributive.d/checklist.json type=file

INFO[0000] Running checklist: My first checklist

INFO[0000] Check passed name=Git installation check type=Installed

INFO[0000] Check passed name=Git installation check type=Installed

INFO[0000] Check failed name= type=port

INFO[0000] Check failed name= type=interface

WARN[0000] Check(s) failed, printing checklist report checklist=My first checklist report=

Total: 4

Passed: 2

Failed: 2

Port not open:

Specified: 41483

Actual:

Interface does not exist:

Specified: docker0

Actual: lo, eno16777728, vboxnet0