Distributive - Health Check
Introduction
Distributive is a tool for running distributed health checks in datacenters. It was designed with Consul in mind, but is platform agnostic. It is simple to configure (with JSON checklists) and easy to deploy and run.
Exit codes follow the convention recognized by Consul, Kubernetes, Sensu, and Nagios:
Exit code 0 - Checklist is passing
Exit code 1 - Checklist is warning
Any other code - Checklist is failing
As of right now, only exit codes 0 and 1 are used, even if a checklist fails.
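The exit code convention above can be sketched as a small lookup. This is an illustrative helper, not part of Distributive: the function name `checklist_status` and the status strings are assumptions chosen for this example.

```python
def checklist_status(exit_code: int) -> str:
    """Map an exit code to the status a monitoring system would report.

    Hypothetical helper: follows the convention listed above
    (0 = passing, 1 = warning, anything else = failing).
    """
    if exit_code == 0:
        return "passing"
    if exit_code == 1:
        return "warning"
    return "failing"


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, checklist_status(code))
```

Since only exit codes 0 and 1 are currently used, a caller built on this mapping would see a failed checklist reported as "warning" rather than "failing".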
Installation
Install Golang
sudo yum install golang -y
$ go version
go version go1.4.2 linux/amd64
Install Distributive
sudo rpm -i https://github.com/CiscoCloud/distributive/releases/download/v0.2/distributive-0.2-1.x86_64.rpm
Verify Distributive
$ distributive --help
NAME:
Distributive - Perform distributed health tests
USAGE:
Distributive [global options] command [command options] [arguments...]
VERSION:
0.2
COMMANDS:
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--verbosity 'warn' info | debug | fatal | error | panic | warn
--file, -f Read a checklist from a file
--url, -u Read a checklist from a URL
--directory, -d '/etc/distributive.d/' Read all of the checklists in this directory
--stdin, -s Read data piped from stdin as a checklist
--help, -h show help
--version, -v print the version
FATA[0000] Neither file, URL, directory, nor stdin specified. Try --help.
Getting Started
Install Kafka
Follow the Kafka How-To to install Kafka.
Create the Distributive config directory
All Distributive application checklists live in this directory.
Distributive picks up every checklist in this directory, runs each one, and reports the passed and failed checks.
sudo mkdir /etc/distributive.d
Define a Distributive checklist for the Kafka topic list and the sshd service
A checklist can specify a variety of checks, from commands and services to network ports.
$ sudo cat /etc/distributive.d/kafka.json
{
"Name": "Kafka Checklist",
"Notes": "A checklist that has checks, for kafka!",
"Checklist": [
{
"Name": "Kafka check list of topic",
"Notes": "Check kafka topic list is present.",
"Check": "command",
"Parameters": ["bin/kafka-topics.sh --list --zookeeper localhost:2181"]
},
{
"Name": "SSHD service",
"Notes": "Check sshd service status.",
"Check": "systemctlActive",
"Parameters": ["sshd"]
},
{
"Name": "Miscserver port",
"Notes": "Check misc server port.",
"Check": "port",
"Parameters": ["9099"]
}
]
}
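Before copying a checklist into /etc/distributive.d, it can help to sanity-check its shape. The sketch below is a minimal validator whose rules are inferred only from the examples in this document (a top-level "Name", a "Checklist" list, and per-check "Check" and "Parameters" fields). It is not Distributive's own schema, and the function name `validate_checklist` is hypothetical.

```python
import json


def validate_checklist(doc):
    """Return a list of problems; an empty list means the shape looks right.

    Assumption: the required fields are inferred from the sample checklists,
    not taken from Distributive itself.
    """
    if not isinstance(doc, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not isinstance(doc.get("Name"), str):
        problems.append("missing top-level 'Name' string")
    checks = doc.get("Checklist")
    if not isinstance(checks, list) or not checks:
        problems.append("'Checklist' must be a non-empty list")
        return problems
    for index, check in enumerate(checks):
        if not isinstance(check, dict):
            problems.append(f"check {index}: must be an object")
            continue
        if not isinstance(check.get("Check"), str):
            problems.append(f"check {index}: missing 'Check' string")
        params = check.get("Parameters")
        if not isinstance(params, list) or not all(isinstance(p, str) for p in params):
            problems.append(f"check {index}: 'Parameters' must be a list of strings")
    return problems


if __name__ == "__main__":
    sample = json.loads(
        '{"Name": "Kafka Checklist", "Checklist": '
        '[{"Check": "port", "Parameters": ["9099"]}]}'
    )
    print(validate_checklist(sample))
```

Parameters are checked as strings because every example here quotes them, including the port number.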
Execute Distributive
$ distributive -d /etc/distributive.d/
WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=
Total: 3
Passed: 2
Failed: 1
Port not open:
Specified: 9099
Actual:
Distributive detecting service state
The SSHD service check in the JSON above shows Distributive detecting whether a service is up or down.
Similar checks can be enforced for networks, services, and packages.
# SSHD is running
$ systemctl status sshd
sshd.service - OpenSSH server daemon
Loaded: loaded (/usr/lib/systemd/system/sshd.service; enabled)
$ distributive -d /etc/distributive.d/
WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=
Total: 3
Passed: 2
Failed: 1
Port not open:
Specified: 9099
Actual:
# SSHD is stopped
$ systemctl stop sshd
$ distributive -d /etc/distributive.d/
WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=
Total: 3
Passed: 1
Failed: 2
Service not active:
Specified: sshd
Actual ActiveState=Inactive
Port not open:
Specified: 9099
Actual:
# SSHD is started again
$ systemctl start sshd
$ distributive -d /etc/distributive.d/
WARN[0001] Check(s) failed, printing checklist report checklist=Kafka Checklist report=
Total: 3
Passed: 2
Failed: 1
Port not open:
Specified: 9099
Actual:
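The Total/Passed/Failed summary in these reports is regular enough to extract programmatically, for example to compare runs before and after a service restart. The sketch below assumes the report text looks exactly like the transcripts above. `parse_report` is a hypothetical helper, not a Distributive API.

```python
import re


def parse_report(output: str) -> dict:
    """Extract the Total/Passed/Failed counters from a checklist report.

    Assumption: each counter appears on its own line as "Name: <integer>",
    as in the sample output shown in this document.
    """
    counts = {}
    for key in ("Total", "Passed", "Failed"):
        match = re.search(rf"^\s*{key}:\s*(\d+)\s*$", output, flags=re.MULTILINE)
        if match:
            counts[key.lower()] = int(match.group(1))
    return counts


if __name__ == "__main__":
    report = "Total: 3\nPassed: 1\nFailed: 2\nService not active:\nSpecified: sshd\n"
    print(parse_report(report))
```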
Distributive health-check JSON files in the worker VM
Packaging the Distributive health-check JSON files into the worker VM involves the following steps:
0. Directory structure for JSON files
Distributive Parent Directory : /etc/distributive.d
Images Directory : /etc/distributive.d/images
Resource Image Directory : /etc/distributive.d/images/resource
Deployment Image Directory : /etc/distributive.d/images/deployment
Services Directory : /etc/distributive.d/services
Logging Service Directory : /etc/distributive.d/services/logging
Controller Directory : /etc/distributive.d/controller
Service Controller Directory : /etc/distributive.d/controller/service
App Controller Directory : /etc/distributive.d/controller/app
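The layout above could be created in one pass when the image is built. The following is a minimal sketch, assuming the leaf directories listed here are the complete set. The names `SUBDIRS` and `create_layout` are illustrative, and the demo writes to a temporary directory rather than /etc.

```python
import os
import tempfile

# Leaf directories from the layout above, relative to the parent directory.
SUBDIRS = [
    "images/resource",
    "images/deployment",
    "services/logging",
    "controller/service",
    "controller/app",
]


def create_layout(root):
    """Create every leaf directory under root and return the created paths."""
    created = []
    for sub in SUBDIRS:
        path = os.path.join(root, *sub.split("/"))
        os.makedirs(path, exist_ok=True)  # also creates intermediate directories
        created.append(path)
    return created


if __name__ == "__main__":
    # Use a scratch directory so the sketch runs without root privileges.
    with tempfile.TemporaryDirectory() as scratch:
        for path in create_layout(os.path.join(scratch, "distributive.d")):
            print(path)
```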
1. Platform Base Team
The Distributive config directory is pre-created in the worker VM image at /etc/distributive.d/.
2. Service Team, e.g. Kafka or any other application
The application team writes the Distributive health-check JSON for its application.
The application's installation copies that JSON file into the Distributive directory.
3. Distributive invocation with the config directory
Distributive is invoked with the config directory so that every JSON checklist is executed and a Distributive report is generated.
4. Register Distributive as a service with Consul
Distributive is registered as a service in Consul, pointing at the Distributive config directory.
If any check in a Distributive checklist fails, Consul raises a service failure warning. An admin can then run Distributive directly to get the detailed checklist report and take corrective action.
Distributive is registered in Consul via a JSON service definition such as the following:
$ sudo cat /etc/consul.d/distributive.json
{
"service": {
"name": "Distributive",
"check": {
"script": "/usr/bin/distributive -d /etc/distributive.d/",
"interval": "10s"
}
}
}
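Generating the service definition instead of writing it by hand avoids quoting mistakes in the "script" value. The sketch below mirrors the field names used in this document's definition. `consul_service_definition` and its parameters are hypothetical, and the defaults are the paths shown above.

```python
import json


def consul_service_definition(config_dir="/etc/distributive.d/",
                              binary="/usr/bin/distributive",
                              interval="10s"):
    """Build the service definition as a dict, mirroring the JSON above."""
    return {
        "service": {
            "name": "Distributive",
            "check": {
                "script": f"{binary} -d {config_dir}",
                "interval": interval,
            },
        }
    }


if __name__ == "__main__":
    # json.dumps handles quoting and escaping, so the output is always valid JSON.
    print(json.dumps(consul_service_definition(), indent=2))
```

The printed text could then be written to /etc/consul.d/distributive.json as part of the application's installation.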
Getting Started with Consul
Install Consul
$ mkdir consul
$ cd consul
$ wget https://dl.bintray.com/mitchellh/consul/0.5.1_linux_amd64.zip
$ unzip 0.5.1_linux_amd64.zip
$ sudo mkdir /etc/consul.d
$ sudo cat /etc/consul.d/distributive.json
{
"service": {
"name": "Distributive",
"check": {
"script": "/usr/bin/distributive -f /etc/distributive.d/checklist.json -d ''",
"interval": "10s"
}
}
}
Distributive Config
$ sudo ls /etc/distributive.d
samples
$ sudo cat /etc/distributive.d/checklist.json
{
"Name": "My first checklist",
"Notes": "A checklist that has checks, really!",
"Checklist": [
{
"Name": "Git installation check",
"Notes": "If I don't have git, I don't know what I'll do.",
"Check": "Installed",
"Parameters": ["git"]
}
]
}
$ sudo cat /etc/distributive.d/checklist.json
{
"Name": "My first checklist",
"Notes": "A checklist that has checks, really!",
"Checklist": [
{
"Name": "Git installation check",
"Notes": "If I don't have git, I don't know what I'll do.",
"Check": "Installed",
"Parameters": ["git"]
},
{
"Name": "Git installation check",
"Notes": "If I don't have git, I don't know what I'll do.",
"Check": "Installed",
"Parameters": ["docker"]
},
{
"Check": "port",
"Parameters": ["41483"]
},
{
"Check" : "interface",
"Parameters" : ["docker0"]
}
]
}
Start Consul
$ ./consul agent -server -bootstrap-expect=1 -data-dir /tmp/consul -config-dir /etc/consul.d -dc dc-distributive -node=consul-distributive-1 --bind=192.168.0.167
Run Distributive Standalone
$ /usr/bin/distributive -f /etc/distributive.d/checklist.json --verbosity="info"
INFO[0000] Creating checklist(s)... path=/etc/distributive.d/checklist.json type=file
INFO[0000] Running checklist: My first checklist
INFO[0000] Check passed name=Git installation check type=Installed
INFO[0000] Check passed name=Git installation check type=Installed
INFO[0000] Check failed name= type=port
INFO[0000] Check failed name= type=interface
WARN[0000] Check(s) failed, printing checklist report checklist=My first checklist report=
Total: 4
Passed: 2
Failed: 2
Port not open:
Specified: 41483
Actual:
Interface does not exist:
Specified: docker0
Actual: lo, eno16777728, vboxnet0
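With --verbosity="info", each check produces its own "Check passed" or "Check failed" line, which makes it possible to list the failing checks by type. The sketch below assumes the log lines look exactly like the transcript above (unquoted name= and type= fields). `parse_check_lines` is a hypothetical helper, and output written to a file or pipe may be formatted differently.

```python
import re

# Matches lines such as:
#   INFO[0000] Check failed name= type=port
# Assumption: the name may contain spaces or be empty, and type is one word.
CHECK_LINE = re.compile(
    r"^INFO\[\d+\]\s+Check (passed|failed)\s+name=(.*?)\s+type=(\S+)$"
)


def parse_check_lines(output: str):
    """Return (result, name, type) tuples for every per-check log line."""
    results = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            results.append(match.groups())
    return results


if __name__ == "__main__":
    log = (
        "INFO[0000] Check passed name=Git installation check type=Installed\n"
        "INFO[0000] Check failed name= type=port\n"
    )
    for result, name, check_type in parse_check_lines(log):
        print(result, repr(name), check_type)
```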