Friday, April 8, 2016

Setting up Consul Service Discovery for Mesos in 10 Minutes

This will be a short series on using Consul in your microservices environment. Consul provides service discovery and many other nice features for microservices, which you can read more about at the links below. After you read them you will understand why it is such a popular choice for anyone running microservices, or anything else that requires service discovery for that matter. I have chosen Consul for my PaaS offering backed by Apache Mesos, with integration for a tool called consul-template and also DNS for containers. I'll kick off a small series about different ways to utilize Consul for your microservices architecture and how I have been using it for service discovery and several other things with Docker. I won't try to explain how it works internally, because it is best to read as much as possible on your own, so for more information please see the Consul documentation:

More info on Consul: https://www.consul.io/

Documentation: https://www.consul.io/docs/index.html
Free Online Demo!! : http://demo.consul.io/ui/
MUST UNDERSTAND: https://www.consul.io/docs/guides/outage.html

We will start off by installing a cluster of 3 server nodes and 1 client with the UI, and end by creating systemd units for the entire cluster.
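Why 3 servers? Consul's servers use Raft, which needs a majority (a quorum) of servers agreeing before the cluster can operate. A quick sketch of the math, just to show why 3 is the smallest useful size:

```shell
# Raft quorum math: a cluster of n servers needs a majority to operate,
# and can therefore tolerate (n - quorum) server failures.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerates() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 5; do
    echo "$n servers: quorum=$(quorum $n), tolerates $(tolerates $n) failure(s)"
done
```

With 3 servers you can lose 1 and keep quorum; with 1 or 2 servers you can lose none, which is why this guide builds a 3-server cluster.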


1) Pull down the Hashicorp Consul zip file to ALL nodes and unzip. The same package is used for server and client.

    cd /usr/local/bin/ && wget https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_linux_amd64.zip
    unzip consul*



2) Pull down the UI package on the node that will serve the Web UI for the cluster. It can be any node, but I chose the client. Unzip it in your desired directory.

    wget -O /opt/consul/web-ui.zip https://releases.hashicorp.com/consul/0.6.4/consul_0.6.4_web_ui.zip && cd /opt/consul/ && unzip web-ui.zip



3) Focusing on the server config first, create the initial files/directories on all servers. One of them will act as the bootstrap server initially, until we get the cluster into quorum.

    /etc/consul.d/bootstrap/config.json  ### This only gets created on 1 of the servers
    {
        "bootstrap": true,
        "server": true,
        "datacenter": "your-dc",
        "data_dir": "/var/lib/consul",
        "log_level": "INFO",
        "advertise_addr": "$BSTRAP_LOCAL_IP",
        "enable_syslog": true
    }

    
    /etc/consul.d/server/config.json
    {
        "bootstrap": false,
        "advertise_addr": "$LOCAL_IP",
        "server": true,
        "datacenter": "your-dc",
        "data_dir": "/var/lib/consul",
        "log_level": "INFO",
        "enable_syslog": true,
        "start_join": ["server1", "server2","server3"]
    }

    mkdir -pv /var/lib/consul   ### Used as our data directory
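A stray comma in config.json will keep the agent from starting, so it is worth syntax-checking the files before moving on. A minimal sketch, using python3 purely as a JSON checker; here it checks a throwaway copy of the server config, but on a real box you would point it at /etc/consul.d/*/config.json:

```shell
# Write a throwaway copy of the server config to a temp file,
# then syntax-check it. json.tool exits non-zero on malformed JSON.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
{
    "bootstrap": false,
    "server": true,
    "datacenter": "your-dc",
    "data_dir": "/var/lib/consul",
    "log_level": "INFO",
    "enable_syslog": true,
    "start_join": ["server1", "server2", "server3"]
}
EOF

if python3 -m json.tool "$CONF" > /dev/null; then
    echo "valid JSON: $CONF"
else
    echo "broken JSON: $CONF"
fi
```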


Also, we can go ahead and create our systemd unit files on each server and enable them on boot.

    /etc/systemd/system/consul-server.service
    [Unit]
    Description=Consul Server
    After=network.target
    
    [Service]
    User=root
    Group=root
    Environment="GOMAXPROCS=2"
    ExecStart=/usr/local/bin/consul agent -config-dir /etc/consul.d/server
    ExecReload=/bin/kill -HUP $MAINPID
    KillSignal=SIGINT
    Restart=on-failure
    
    
    [Install]
    WantedBy=multi-user.target

      

    # systemctl daemon-reload && systemctl enable consul-server



4) Run the following commands in order on each of the servers to get quorum. You will need a bootstrap server to start with (server1). You will need lots of terminals here.

On Server1:
    # consul agent -config-dir /etc/consul.d/bootstrap -advertise $BSTRAP_LOCAL_IP

On Server2 (-bootstrap-expect sets how many servers Consul expects before it bootstraps the cluster):
    # consul agent -config-dir /etc/consul.d/server -advertise $LOCAL_IP -bootstrap-expect 3

On Server3:
    # consul agent -config-dir /etc/consul.d/server -advertise $LOCAL_IP -bootstrap-expect 3

Back on Server1, do a CTRL+C to kill the consul process and then start it as a regular server.
    CTRL+C 
    # consul agent -config-dir /etc/consul.d/server -advertise $LOCAL_IP -bootstrap-expect 3

The servers should elect a leader and sync to quorum. Each time you lose quorum, this is how you will have to restart it. A few other methods may be needed along with it; see the Outage documentation above for more reference.
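Once the dance above is done, it is worth verifying quorum from the command line rather than eyeballing logs. On any server, `consul members` should list all 3 servers as alive, and `consul info` should report a leader and num_peers = 2. A small sketch of what to grep for; the captured output below is a stand-in, since those commands need a live agent:

```shell
# On a real server you would run:
#   consul members
#   consul info | grep -E 'leader =|num_peers'
# A healthy 3-server cluster reports a leader and two raft peers.
# This snippet parses a stand-in capture just to show the check itself.
INFO='leader = true
num_peers = 2'

if echo "$INFO" | grep -q 'num_peers = 2'; then
    echo "quorum looks healthy"
else
    echo "quorum problem: check the outage guide"
fi
```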




5) Let's go ahead and get our client with the Web UI up and running before we do step 6, so we can watch from the UI what Consul looks like during service failures.

    /etc/consul.d/client/config.json
    {
        "server": false,
        "datacenter": "your-dc",
        "advertise_addr": "$LOCAL_IP",
        "client_addr": "$LOCAL_IP",
        "data_dir": "/var/lib/consul",
        "ui_dir": "/opt/consul/",
        "log_level": "INFO",
        "enable_syslog": true,
        "start_join": ["server1", "server2", "server3"]
    }


Create the systemd unit file.
    /etc/systemd/system/consul-client.service
    [Unit]
    Description=Consul Client
    After=network.target

    [Service]
    User=root
    Group=root
    Environment="GOMAXPROCS=2"
    ExecStart=/usr/local/bin/consul agent -config-dir /etc/consul.d/client
    ExecReload=/bin/kill -HUP $MAINPID
    KillSignal=SIGINT
    Restart=on-failure


    [Install]
    WantedBy=multi-user.target

Start the service:
    # systemctl start consul-client && systemctl status consul-client -l

You should see "agent: synced nod info" in the output of status. Go to the UI:
    http://client:8500/ui/




You should see the above image if it was successful, with 3 passing. Watch the UI during the next step to see how it reacts to health checks.


6) To run Consul as a background process instead of tying up your current window, we need to kill the current foreground process and restart each server one at a time, letting each rejoin before moving on, so we don't lose quorum. Do NOT CTRL+C the current process, KILL it! CTRL+C sends SIGINT, which makes the agent gracefully leave the cluster, while a hard kill leaves it as a failed member that will rejoin on restart; see the Outage doc above about graceful leaves. Yes, you will need yet another terminal for this. Run the following one server at a time:

    # ps -ef | grep consul | grep -v grep   ## get the pid of the running consul process
    # kill -9 $consul_pid

Go to your Consul UI and take a look at the nodes and the consul service. You will see the consul service has 1 failure. Pretty cool?! No worries, it will come back after you restart it.





    # reboot 
    OR
    # systemctl start consul-server && systemctl status consul-server -l

You should see that your consul server has rejoined, and you didn't lose quorum because the other 2 stayed online.

Rinse and repeat step 6 for all servers and you have a working Consul cluster. Next we will discuss how to register services and show some of the things I have been doing with Apache Mesos integration.






