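Installing and Configuring a Mesos Cluster on Ubuntu 16.04

Preparation

1. System environment: three Ubuntu 16.04 virtual machines running in VMware Workstation 12.

(1) Hostnames and IP addresses

  • heron01: 192.168.201.136 (Mesos master, ZooKeeper)
  • heron02: 192.168.201.135 (Mesos slave, ZooKeeper)
  • heron03: 192.168.201.133 (Mesos slave, ZooKeeper)

(2) Edit /etc/hosts and /etc/hostname on all three hosts. Add the following to /etc/hosts: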

192.168.201.136 heron01
192.168.201.135 heron02
192.168.201.133 heron03

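(3) In /etc/hostname, change the local hostname (the default is ubuntu). For example, on heron01 the content of /etc/hostname becomes heron01; configure heron02 and heron03 the same way, each with its own name.

Note: after changing the hostname in /etc/hostname, reboot Ubuntu. Do not skip the hostname configuration; otherwise the agents end up deactivated when the cluster starts (see problem 4 near the end of this article).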
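
The /etc/hosts entries above can be added idempotently with a small helper. This is a minimal sketch, not the procedure used in this article: the function name add_host_entry and the HOSTS_FILE variable are hypothetical, and the script writes to a scratch file by default so it can be tried without root. Pointing HOSTS_FILE at /etc/hosts and running as root would apply it for real.

```shell
# Target file; defaults to a scratch file so the sketch can be tried safely.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

# Append "IP NAME" to the file unless that exact line is already present.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  grep -qxF "${ip} ${name}" "$file" || printf '%s %s\n' "$ip" "$name" >> "$file"
}

add_host_entry "$HOSTS_FILE" 192.168.201.136 heron01
add_host_entry "$HOSTS_FILE" 192.168.201.135 heron02
add_host_entry "$HOSTS_FILE" 192.168.201.133 heron03

cat "$HOSTS_FILE"
```

Running the script a second time leaves the file unchanged, which makes it safe to reuse on all three hosts.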

2. Configure passwordless SSH login between the three hosts

  • Default user: yitian
  • Passwordless SSH login must be configured for both the yitian and root users.
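
One common way to set up passwordless login is ssh-keygen followed by ssh-copy-id for each host. The sketch below only prints the commands by default; the DRY_RUN switch and the run helper are hypothetical names introduced for illustration, and the key path ~/.ssh/id_rsa is an assumption.

```shell
HOSTS="heron01 heron02 heron03"
SSH_USER="yitian"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands; 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create a key pair without a passphrase if the user has none yet.
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
  run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
fi

# Install the public key on every host, including the local one.
for host in $HOSTS; do
  run ssh-copy-id "${SSH_USER}@${host}"
done
```

For the root user the same steps would be repeated as root with SSH_USER set to root; that only works once root login over SSH is permitted on the target hosts.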

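3. Install JDK 1.8

ZooKeeper requires a JDK.

4. Install and configure a ZooKeeper cluster; see the article "Ubuntu16.04 安装配置ZooKeeper集群" (Installing and Configuring a ZooKeeper Cluster on Ubuntu 16.04).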

Building and Installing Mesos

1. Download Mesos

$ wget http://www.apache.org/dist/mesos/1.4.1/mesos-1.4.1.tar.gz
$ tar -zxf mesos-1.4.1.tar.gz

The Mesos source is extracted to /home/yitian/mesos-1.4.1.

2. Install dependencies

# Update the packages.
$ sudo apt-get update

# Install a few utility tools.
$ sudo apt-get install -y tar wget git

# Install the latest OpenJDK.
$ sudo apt-get install -y openjdk-8-jdk

# Install autotools (Only necessary if building from git repository).
$ sudo apt-get install -y autoconf libtool

# Install other Mesos dependencies.
$ sudo apt-get -y install build-essential python-dev python-six python-virtualenv libcurl4-nss-dev libsasl2-dev libsasl2-modules maven libapr1-dev libsvn-dev zlib1g-dev

3. Build and install

Change into /home/yitian/mesos-1.4.1/:

# Configure and build.
$ mkdir build
$ cd build
$ ../configure --prefix=/home/yitian/mesosinstall/ # --prefix sets the Mesos installation path
$ make -j 2 # -j sets the number of parallel build jobs (typically the number of CPU cores)
# Run test suite.
$ make check
# Install (Optional).
$ make install -j 2 # -j has the same meaning as above

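This step takes quite a long time.

Note: increase the virtual machine's memory before running make (5 GB was used here); otherwise the build may fail for lack of memory.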
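
Since both CPU cores and memory limit the build, the -j value can be derived from the machine instead of being hard-coded. The sketch below assumes roughly 2 GiB of RAM per compile job, a rule of thumb chosen to match the 5 GB and -j 2 combination used here, not a figure from the Mesos documentation. The function name pick_jobs is hypothetical.

```shell
# Pick a -j value: one job per CPU core, capped at one job per 2 GiB of RAM.
pick_jobs() {
  local cores="$1" mem_mb="$2"
  local by_mem=$(( mem_mb / 2048 ))
  if [ "$by_mem" -lt 1 ]; then
    by_mem=1
  fi
  if [ "$cores" -lt "$by_mem" ]; then
    echo "$cores"
  else
    echo "$by_mem"
  fi
}

# Read the core count and total memory (MiB), with conservative fallbacks.
cores="$(nproc 2>/dev/null || echo 1)"
mem_mb="$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo 2>/dev/null)"
mem_mb="${mem_mb:-2048}"

echo "suggested: make -j $(pick_jobs "$cores" "$mem_mb")"
```

With 4 cores and 5120 MiB of memory the helper suggests -j 2, the value used in this article.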

Testing the Build

After the build finishes, run the local example from the Mesos documentation in /home/yitian/mesos-1.4.1/build/.

Example commands from the official site:

# Change into build directory.
$ cd build
# Start Mesos master (ensure work directory exists and has proper permissions).
$ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos
# Start Mesos agent (ensure work directory exists and has proper permissions).
$ ./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=/var/lib/mesos
# Visit the Mesos web page.
$ http://127.0.0.1:5050
# Run Java framework (exits after successfully running some tasks).
$ ./src/examples/java/test-framework 127.0.0.1:5050

My run:

1. Start mesos-master:

yitian@ubuntu:~/mesos-1.4.1/build$ sudo ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/var/lib/mesos  

2. Start mesos-agent (in a second terminal):

yitian@ubuntu:~/mesos-1.4.1/build$ sudo ./bin/mesos-agent.sh --master=127.0.0.1:5050 --work_dir=/var/lib/mesos

3. Run the example framework:

yitian@ubuntu:~/mesos-1.4.1/build$ ./src/examples/java/test-framework 127.0.0.1:5050
I0217 00:39:48.763849 15537 sched.cpp:232] Version: 1.4.1
I0217 00:39:48.775424 15555 sched.cpp:336] New master detected at master@127.0.0.1:5050
I0217 00:39:48.777709 15555 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0217 00:39:48.787648 15552 sched.cpp:759] Framework registered with 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-0000
Registered! ID = 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-0000
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0 with cpus: 4.0 and mem: 2898.0
Launching task 0 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Launching task 1 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Launching task 2 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Launching task 3 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O0
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O1 with cpus: 0.0 and mem: 2386.0
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O2 with cpus: 0.0 and mem: 2386.0
Status update: task 3 is in state TASK_RUNNING
Status update: task 0 is in state TASK_RUNNING
Status update: task 2 is in state TASK_RUNNING
Status update: task 1 is in state TASK_RUNNING
Status update: task 3 is in state TASK_FINISHED
Finished tasks: 1
Status update: task 0 is in state TASK_FINISHED
Finished tasks: 2
Status update: task 2 is in state TASK_FINISHED
Finished tasks: 3
Status update: task 1 is in state TASK_FINISHED
Finished tasks: 4
Received offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O3 with cpus: 4.0 and mem: 2898.0
Launching task 4 using offer 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-O3
Status update: task 4 is in state TASK_RUNNING
Status update: task 4 is in state TASK_FINISHED
Finished tasks: 5
I0217 00:39:53.944774 15552 sched.cpp:2021] Asked to stop the driver
I0217 00:39:53.945152 15552 sched.cpp:1203] Stopping framework 692f4e1d-4cdb-4f92-8a6a-661556ba4df9-0000
I0217 00:39:53.948561 15537 sched.cpp:2021] Asked to stop the driver

4. View the Mesos status in a browser

(1) Master page at 127.0.0.1:5050:

[screenshot]

(2) Framework:

[screenshots]

(3) Agent:

[screenshots]

5. Check that the Mesos processes are running

yitian@ubuntu:~$ ps -e |grep mesos
  35043 pts/11   00:00:02 lt-mesos-master
  35106 pts/4    00:00:00 lt-mesos-agent
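
Besides ps, the master can be probed over HTTP. The sketch below relies on the /master/health endpoint, which to my understanding is part of the Mesos master's HTTP API; the function name master_healthy and the MASTER_URL variable are hypothetical.

```shell
# Return success if the master's health endpoint answers without an HTTP error.
master_healthy() {
  curl -fsS --max-time 2 -o /dev/null "$1/master/health" 2>/dev/null
}

MASTER_URL="${MASTER_URL:-http://127.0.0.1:5050}"
if master_healthy "$MASTER_URL"; then
  echo "master at $MASTER_URL is healthy"
else
  echo "master at $MASTER_URL is not reachable"
fi
```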

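Configuring the Mesos Cluster

Repeat the build and installation described above on heron02 and heron03. The cluster configuration steps below use heron02 as the example.

On every node in the cluster, edit the Mesos configuration files in /home/yitian/mesosinstall/etc/mesos/ as follows:

1. Create the configuration files from the templates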

yitian@ubuntu:~/mesosinstall/etc/mesos$ ll
total 20
drwxrwxr-x 2 yitian yitian 4096 Feb 17 05:46 ./
drwxrwxr-x 3 yitian yitian 4096 Feb 17 05:46 ../
-rw-r--r-- 1 yitian yitian  595 Feb 17 05:46 mesos-agent-env.sh.template
-rw-r--r-- 1 yitian yitian  339 Feb 17 05:46 mesos-deploy-env.sh.template
-rw-r--r-- 1 yitian yitian  319 Feb 17 05:46 mesos-master-env.sh.template
lrwxrwxrwx 1 yitian yitian   27 Feb 17 05:46 mesos-slave-env.sh.template -> mesos-agent-env.sh.template
yitian@ubuntu:~/mesosinstall/etc/mesos$ cp mesos-master-env.sh.template mesos-master-env.sh
yitian@ubuntu:~/mesosinstall/etc/mesos$ cp mesos-slave-env.sh.template mesos-slave-env.sh
yitian@ubuntu:~/mesosinstall/etc/mesos$ cp mesos-deploy-env.sh.template mesos-deploy-env.sh
yitian@ubuntu:~/mesosinstall/etc/mesos$ cp mesos-agent-env.sh.template mesos-agent-env.sh

2. Create the configuration file masters

heron01

3. Create the configuration file slaves

heron02
heron03
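
The two files can also be written from the shell. In this sketch CONF_DIR is a hypothetical variable that defaults to a scratch directory; on the nodes in this article it would be /home/yitian/mesosinstall/etc/mesos.

```shell
# Directory that holds the deploy configuration; scratch directory by default.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# One hostname per line: the master in "masters", the agents in "slaves".
printf '%s\n' heron01 > "$CONF_DIR/masters"
printf '%s\n' heron02 heron03 > "$CONF_DIR/slaves"

echo "masters:"; cat "$CONF_DIR/masters"
echo "slaves:";  cat "$CONF_DIR/slaves"
```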

4. Edit mesos-master-env.sh

# This file contains environment variables that are passed to mesos-master.
# To get a description of all options run mesos-master --help; any option
# supported as a command-line option is also supported as an environment
# variable.
# Some options you're likely to want to set:
# export MESOS_log_dir=/var/log/mesos

export MESOS_log_dir=/home/yitian/mesosdata/log
export MESOS_work_dir=/home/yitian/mesosdata/data
export MESOS_ZK=zk://heron01:2181,heron02:2181,heron03:2181/mesos
export MESOS_quorum=1 # required when using ZooKeeper; 1 is a majority of the single master here
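
The quorum should be a majority of the masters, so the value changes if more masters are added later. A minimal sketch of the arithmetic follows; the function name quorum_for is hypothetical.

```shell
# Majority quorum for n masters: floor(n / 2) + 1.
quorum_for() {
  echo $(( $1 / 2 + 1 ))
}

for n in 1 3 5; do
  echo "masters=$n quorum=$(quorum_for "$n")"
done
```

For 1, 3, and 5 masters this gives a quorum of 1, 2, and 3 respectively.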

5. Edit mesos-slave-env.sh and mesos-agent-env.sh

# This file contains environment variables that are passed to mesos-agent.
# To get a description of all options run mesos-agent --help; any option
# supported as a command-line option is also supported as an environment
# variable.
# You must at least set MESOS_master.
# The mesos master URL to contact. Should be host:port for
# non-ZooKeeper based masters, otherwise a zk:// or file:// URL.

export MESOS_master=heron01:5050
export MESOS_log_dir=/home/yitian/mesosdata/log
export MESOS_work_dir=/home/yitian/mesosdata/run
#export MESOS_isolation=cgroups
# Other options you're likely to want to set:
# export MESOS_log_dir=/var/log/mesos
# export MESOS_work_dir=/var/run/mesos
# export MESOS_isolation=cgroups

6. Edit /home/yitian/mesosinstall/sbin/mesos-daemon.sh

# Increase the default number of open file descriptors.
# ulimit -n 8192
ulimit -n 1024

7. Add the Mesos environment variables

# Mesos configuration
export MESOS_HOME=/home/yitian/mesosinstall
export PATH=${MESOS_HOME}/sbin:${MESOS_HOME}/bin:$PATH

Starting the Mesos Cluster

1. First start the ZooKeeper cluster as described in "Ubuntu16.04 安装配置ZooKeeper集群". After a successful start, the ZooKeeper status on each host is as follows:

heron01:

yitian@heron01:~$ ./zookeeper/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

heron02:

yitian@heron02:~$ ./zookeeper/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: follower

heron03:

yitian@heron03:~$ ./zookeeper/zookeeper-3.4.10/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf/zoo.cfg
Mode: leader

2. Start the cluster on heron01 as the root user

root@heron01:/home/yitian# ./mesosinstall/sbin/mesos-start-cluster.sh
Starting mesos-master on heron01
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron01 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-master </dev/null >/dev/null
Starting mesos-agent on heron02
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron02 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-agent </dev/null >/dev/null
Starting mesos-agent on heron03
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron03 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-agent </dev/null >/dev/null
Everything's started!

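Note: the cluster is started as root because some files under /run/ can only be accessed with root privileges; see problem 5 in the list of common problems below. For the same reason, passwordless SSH login for root must be configured between the hosts during preparation.

3. After the cluster starts successfully, open heron01:5050 on heron01. The page shows:

(1) Activated agents:

[screenshot]

(2) Aurora was also running at the time:

[screenshot]

(3) Configuration of an agent host:

[screenshot]

(4) Roles:

[screenshot]

(5) Offers:

[screenshot]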

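Common Problems

1. Building Mesos fails with: g++: internal compiler error: Killed (program cc1plus)

Solution: the cause is insufficient memory. A temporary swap file works around it (the commands below create a 1 GiB file):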

sudo dd if=/dev/zero of=/swapfile bs=64M count=16
sudo mkswap /swapfile
sudo swapon /swapfile
# After the build finishes, the swap file can be removed:
sudo swapoff /swapfile
sudo rm /swapfile

2. Running ./mesos-start-cluster.sh to start the cluster prints the following:

yitian@ubuntu:~/mesosinstall/sbin$ ./mesos-start-cluster.sh
Starting mesos-master on heron01
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron01 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-master </dev/null >/dev/null
ssh: connect to host heron01 port 22: Connection timed out
Starting mesos-agent on heron02
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=2 heron02 /home/yitian/mesosinstall/sbin/mesos-daemon.sh mesos-agent </dev/null >/dev/null
ssh: connect to host heron02 port 22: Connection timed out
Everything's started!

Solution: on each host in the cluster, allow port 22 through the firewall, or disable the firewall.

yitian@ubuntu:~/mesosinstall/sbin$ sudo ufw status
Status: inactive
yitian@ubuntu:~/mesosinstall/sbin$ sudo ufw allow 22
Skipping adding existing rule
Skipping adding existing rule (v6)

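Note: also check for related causes, such as the IP addresses of NAT-mode virtual machines changing, or root login over SSH not being enabled.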
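
Before digging into firewall rules, it helps to confirm which hosts actually accept connections on port 22. The sketch below uses bash's /dev/tcp redirection together with timeout from coreutils; the function name port_open is hypothetical.

```shell
# Return success if a TCP connection to HOST:PORT opens within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

for host in heron01 heron02 heron03; do
  if port_open "$host" 22; then
    echo "$host: port 22 reachable"
  else
    echo "$host: port 22 NOT reachable"
  fi
done
```

A host reported as not reachable points to a firewall rule, a stopped SSH service, or a stale IP address in /etc/hosts.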

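3. Building Mesos fails with: virtual memory exhausted: cannot allocate memory

The machine is out of memory. Increase the virtual machine's memory and run make again. Reference: virtual memory exhausted:cannot allocate memory

4. Slave nodes are unavailable (deactivated)

[screenshot]

Solution: see the related question "Mesos agent always in Deactivated state". The main cause in my case was that the three hosts had their names and IPs configured only in /etc/hosts. /etc/hostname had not been changed, so every host kept the default hostname ubuntu, and the agents were unavailable after startup. Setting the hostname in /etc/hostname on each host, or replacing the hostnames in the Mesos configuration files with the corresponding IP addresses, resolves the problem.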

5. Slave nodes cannot be discovered when the cluster starts

The ERROR log on the slave node contains the following:
Log file created at: 2018/02/17 06:57:48
Running on machine: ubuntu
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0217 06:57:48.811517 46316 main.cpp:468] EXIT with status 1: Failed to initialize systemd: Failed to create systemd slice 'mesos_executors.slice': Failed to write systemd slice `/run/systemd/system/mesos_executors.slice`: Failed to open file '/run/systemd/system/mesos_executors.slice': Permission denied

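Solution: checking the permissions of the /run/systemd/ directory shows that it is owned by root. Start the Mesos cluster with sudo /home/yitian/mesosinstall/sbin/mesos-start-cluster.sh instead. Starting this way requires that root be allowed to log in over SSH between the hosts, and that this login be passwordless.

Passwordless login was already configured in the earlier steps, but root login was not. To allow root to log in over SSH, edit /etc/ssh/sshd_config as follows, then restart the SSH service so the change takes effect: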

# Authentication:
LoginGraceTime 120
# PermitRootLogin prohibit-password
PermitRootLogin yes
StrictModes yes

Reference: http://blog.sina.com.cn/s/blog_7e64a87b0100rn8w.html
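
The sshd_config change can be scripted so that it is applied identically on all three hosts. The following is a minimal sketch using GNU sed; the function name enable_root_login is hypothetical, and the demonstration runs on a scratch file, not on the real configuration.

```shell
# Set "PermitRootLogin yes" in an sshd_config-style file, replacing any
# existing (possibly commented-out) PermitRootLogin line.
enable_root_login() {
  file="$1"
  if grep -qE '^[[:space:]]*#?[[:space:]]*PermitRootLogin[[:space:]]' "$file"; then
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$file"
  else
    printf 'PermitRootLogin yes\n' >> "$file"
  fi
}

# Demonstration on a scratch file with Ubuntu 16.04-style defaults.
scratch="$(mktemp)"
printf '%s\n' 'LoginGraceTime 120' 'PermitRootLogin prohibit-password' 'StrictModes yes' > "$scratch"
enable_root_login "$scratch"
cat "$scratch"

# To apply for real (as root), then restart the SSH service:
#   enable_root_login /etc/ssh/sshd_config
#   systemctl restart ssh
```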

References