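[Heron] Deploying a Heron Cluster with Apache Aurora

Overview

Purpose:

Deploy a three-node Heron cluster using Apache Aurora and Apache Mesos. The master node runs the Mesos Master, Aurora Master, Aurora Client, ZooKeeper, the HDFS NameNode and Heron, and is where Heron topologies are submitted. The Aurora Scheduler assigns the tasks of a submitted topology to the cluster nodes. Finally, Heron Tracker and Heron UI run on the master host to manage and inspect the submitted topologies.

What this post covers:

  • The cluster configuration from the second experimental environment. The first attempt did not succeed, but its steps are kept for reference.
  • The problems encountered and how they were solved.
  • The main line of the cluster configuration process.

Background: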

Heron can be deployed on different environments such as Mesos, YARN and Slurm. In this post we will explore how to deploy Heron on Mesos with Apache Aurora. Apache Aurora is a service manager that runs on top of Apache Mesos. It can spawn long-running applications such as servers on a set of machines managed by Mesos and keep them running (for example, restarting them if they fail) until they are stopped.

Heron uses Apache ZooKeeper as a state manager. In this post we'll explore how to set up Aurora on three Ubuntu 16.04 machines and run a Mesos cluster on these machines to deploy Heron. To distribute files across the cluster we'll use HDFS, so HDFS needs to be installed on the machines as well.

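Main contents:

  • Deploy Apache Mesos and Apache Aurora on three Ubuntu 16.04 virtual machines
  • Install the HDFS file system (three nodes)
  • Deploy a ZooKeeper cluster (three nodes)
  • Deploy a three-node Heron cluster (configured on top of the above)

Cluster environment:

OS: Ubuntu 16.04 (VMware Workstation Pro 12)

Heron Version: 0.17.1 (2017-12-16)

  • Machine 01 (Heron01): IP (192.168.201.136 / 218.195.228.52: heron04), memory (5 GB), disk (20 GB), cores (4)
  • Machine 02 (Heron02): IP (192.168.201.135 / 218.195.228.31: heron05), memory (5 GB), disk (20 GB), cores (4)
  • Machine 03 (Heron03): IP (192.168.201.133 / 218.195.228.12: heron06), memory (5 GB), disk (20 GB), cores (4)

Note: the IPs highlighted in orange in the original post (the 218.195.228.x addresses, hosts heron04 to heron06) are those of the successfully built cluster, and the steps below use that configuration. Throughout this post the configuration of heron04 is identical to that of heron01, and the slaves heron05 and heron06 are configured like heron02 and heron03. Only master and slave hosts are distinguished; heron01 and heron04 are used interchangeably, as are heron05/heron06 and heron02/heron03.

Deployment table

Host      Components
heron01   Master node (Mesos Master, Aurora Master, HDFS Master, ZooKeeper, Heron Binaries, Aurora Client, Heron)
heron02   Slave node (Mesos Slave/Agent, Aurora Slave/Thermos, HDFS DataNode, ZooKeeper)
heron03   Slave node (Mesos Slave/Agent, Aurora Slave/Thermos, HDFS DataNode, ZooKeeper)

The table shows how the services and components are distributed across the cluster. On heron01 we install ZooKeeper, the Heron binaries, the Aurora Master, the Mesos Master and the HDFS Master. On heron02 and heron03 we install the Aurora Slave (Thermos), the Mesos Slave and the HDFS DataNode and, as listed in the table, ZooKeeper. In a larger production cluster, the services that run together on heron01 here would be installed on several separate hosts.

This post explores the minimum configuration each of these services needs. For production setups it is recommended to configure the services for high availability. Many further configuration options exist for performance tuning, log management and custom deployment locations.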


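The detailed steps are as follows


1. Configure IP addresses and hostnames

1. Configure the correct hostname and IP address on every host in the cluster. My hosts file (/etc/hosts) looks like this: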

yitian@heron01:~$ cat /etc/hosts
127.0.0.1    localhost
127.0.1.1    ubuntu

218.195.228.52 heron04
218.195.228.31 heron05
218.195.228.12 heron06

2. Set each host's own hostname by editing /etc/hostname, shown here for heron04:

yitian@heron01:~$ cat /etc/hostname
heron04

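Note: the change takes effect after a reboot. Make sure these files are correct on every host in the cluster, otherwise errors will occur when the cluster starts.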
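Since a wrong hosts file only shows up later as startup errors, a quick check can help. The following is a minimal sketch, not part of the original setup: it parses hosts-file text and reports which expected cluster hostnames are missing. The function names `parse_hosts` and `missing_hosts` are made up for this illustration, and the sample text mirrors the /etc/hosts listing above.

```python
def parse_hosts(text):
    """Map every hostname found in /etc/hosts-style text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def missing_hosts(text, expected):
    """Return the expected hostnames that have no entry in the hosts text."""
    found = parse_hosts(text)
    return [name for name in expected if name not in found]


# Sample content copied from the listing in this post.
HOSTS_TEXT = """\
127.0.0.1    localhost
127.0.1.1    ubuntu

218.195.228.52 heron04
218.195.228.31 heron05
218.195.228.12 heron06
"""

print(missing_hosts(HOSTS_TEXT, ["heron04", "heron05", "heron06"]))
```

On a real host the text would be read from /etc/hosts instead of the sample string.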

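2. Set up passwordless SSH login

The later installation and configuration steps require passwordless SSH login between all hosts of the cluster.

3. Install the JDK

Set up a JDK on every host in the cluster. For details, see the JDK section at the beginning of: [Heron] Heron单节点(Locally)简易安装

Note: when copying the JDK file from Windows into the Linux virtual machine, copy with the keyboard and paste with the mouse (dragging the file across directly causes problems).

Common problem: if extracting jdk.tar.gz fails with the error below, pass the target directory with the -C option: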

[root@heron03 Desktop]# tar -zxvf jdk-8u151-linux-x64.tar.gz /usr/java/
tar: /usr/java: Not found in archive
tar: Exiting with failure status due to previous errors
[root@heron03 Desktop]# tar -zxvf jdk-8u151-linux-x64.tar.gz -C /usr/java/
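The first command fails because tar treats a trailing path as the name of a member to extract, not as a destination; `-C` is what sets the destination directory. For readers who script the installation, here is a small sketch of the same "extract into a directory" step using Python's standard `tarfile` module. It builds a throwaway archive so it runs anywhere; the archive name, member name and the `extract_to` helper are illustrative only and stand in for the real JDK tarball.

```python
import io
import os
import tarfile
import tempfile


def extract_to(archive_path, dest_dir):
    """Extract all members of a .tar.gz into dest_dir (like `tar -zxf A -C D`)."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(path=dest_dir)
    return sorted(os.listdir(dest_dir))


# Build a tiny demo archive instead of using the real JDK download.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.tar.gz")
payload = b"hello"
with tarfile.open(archive, "w:gz") as tar:
    info = tarfile.TarInfo(name="jdk-demo/README")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

extracted = extract_to(archive, os.path.join(workdir, "java"))
print(extracted)
```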

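4. Install Mesos

Install Mesos (a distributed cluster manager, similar to YARN) on every host in the cluster; the steps below must be carried out on each host. The Mesos master and agent (slave) ship in the same package, so either role can run on any suitable host. Here heron04 runs the master, and heron05 and heron06 run the slaves.

There are two ways to install Mesos:

  • Build from source, as described in the official Apache Mesos documentation. This takes a long time, but it is officially supported and was tested to complete normally. (Used here)
  • Install with apt-get on Ubuntu, as given in the official Aurora documentation. This is simple and quick, but it was not used for this cluster configuration; it will be described separately after testing. (Not used)

Here Mesos is first built from source and installed on all hosts of the cluster, and then configured.

For installing Mesos with apt-get, see:

Starting the Mesos cluster:

Note: for the Mesos step of the Heron cluster setup, one host (A) was used to complete the common configuration steps above, and its virtual machine files were then copied to quickly create a second virtual machine (B). The first virtual machine (A) serves as the master host and continues with the master configuration below. Host B serves as a slave and completes the corresponding installation and configuration. Finally, virtual machine B is copied again, and each copy is joined to the cluster by configuring its IP, SSH, Mesos, HDFS and so on, which completes the cluster.

5. Install Aurora Scheduler (Master) & Aurora Executor (Slaves)

At this point the Mesos and ZooKeeper clusters are installed and configured. Next comes the installation and configuration of Apache Aurora.

Unlike Mesos, Aurora ships its Scheduler and Executor in different packages, so the Scheduler is installed on the Mesos master (heron04) and the Executor on the slaves (heron05, heron06).

First, once the Mesos cluster has been configured as described in "采用Ubuntu16.04 安装配置Mesos集群", start the cluster from heron04.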

root@heron04:/home/yitian# ./mesosinstall/sbin/mesos-start-cluster.sh

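After the Mesos cluster starts successfully, it looks like this:

[Screenshot: Mesos cluster after a successful start]

For details on installing the Aurora Scheduler, see: Installing Aurora

5.1 Master (heron01)

1. Check that Mesos is running on the master host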

yitian@ubuntu:~/mesosinstall/sbin$ ps -e |grep mesos
    4848 ?        00:00:02 mesos-master

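Mesos still has to be installed from the package here as well, see: Ubuntu16.04 使用apt-get命令安装Mesos

2. Install ZooKeeper

ZooKeeper and Mesos must still be installed from packages at this point, because aurora-scheduler depends on both being installed. Even though the ZooKeeper and Mesos clusters were already configured successfully earlier, run the following command:

sudo apt-get install -y zookeeperd

Note: the Mesos and ZooKeeper packages installed here do not need to be used or configured. Mesos and ZooKeeper continue to run from the setup completed earlier. For the ZooKeeper installation and configuration, see: Ubuntu16.04安装配置ZooKeeper集群

3. Install the Aurora Scheduler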

sudo add-apt-repository -y ppa:openjdk-r/ppa
sudo apt-get update
sudo apt-get install -y openjdk-8-jre-headless wget
sudo update-alternatives --set java /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-scheduler_0.17.0_amd64.deb
sudo dpkg -i aurora-scheduler_0.17.0_amd64.deb

For the errors encountered while installing aurora-scheduler and their fixes, see: Aurora scheduler安装遇到的问题

4. Stop the scheduler and configure it:

(1) Stop aurora-scheduler:

yitian@ubuntu:~$ sudo service aurora-scheduler stop

(2) Configure aurora-scheduler (the "Finalizing" step):

yitian@ubuntu:~$ sudo -u aurora mkdir -p /var/lib/aurora/scheduler/db
yitian@ubuntu:~$ sudo -u aurora mesos-log initialize --path=/var/lib/aurora/scheduler/db
I0212 07:19:34.222137  7540 replica.cpp:795] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
I0212 07:19:34.224320  7544 replica.cpp:322] Persisted replica status to VOTING

By default, the scheduler will start in an uninitialized mode. This is because external coordination is necessary to be certain operator error does not result in a quorum of schedulers starting up and believing their databases are empty when in fact they should be re-joining a cluster.

Because of this, a fresh install of the scheduler will need intervention to start up. First, stop the scheduler service.

Ubuntu: sudo stop aurora-scheduler
CentOS: sudo systemctl stop aurora

Now initialize the database:

sudo -u aurora mkdir -p /var/lib/aurora/scheduler/db
sudo -u aurora mesos-log initialize --path=/var/lib/aurora/scheduler/db

Now you can start the scheduler back up.

Ubuntu: sudo start aurora-scheduler
CentOS: sudo systemctl start aurora

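5. Change the ZooKeeper URL in the scheduler configuration:

  • Edit the file with: yitian@ubuntu:/etc/default$ sudo vim /etc/default/aurora-scheduler. For its contents, see the part about the aurora-scheduler file in: Aurora Schduler and Thermos 配置

6. Start aurora-scheduler

yitian@ubuntu:/etc/default$ sudo service aurora-scheduler start

[Screenshot: aurora-scheduler after startup]

Note: the first three steps of installing the Aurora Scheduler come from the official Aurora documentation: Installing Aurora

5.2 Install the Executor and Observer on the slaves (heron02)

1. After starting the Mesos cluster, check that Mesos is running on the slave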

yitian@ubuntu:~/mesosinstall/sbin$ ps -e |grep mesos
    3471 ?        00:00:05 mesos-agent

2. Install the Aurora executor and observer:

sudo apt-get install -y python2.7 wget
# NOTE: This appears to be a missing dependency of the mesos deb package and is needed
# for the python mesos native bindings.
sudo apt-get -y install libcurl4-nss-dev
wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-executor_0.17.0_amd64.deb
sudo dpkg -i aurora-executor_0.17.0_amd64.deb

5.3 Worker Configuration

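Thermos on the slave hosts must point at the correct Mesos working directory. By default the Mesos slave working directory is /tmp/mesos (see the Worker Configuration section of Installing Aurora). In this setup, however, the agents were configured with the working directory /home/yitian/mesosdata/run, so Thermos has to be configured to match. See the Thermos part of: Aurora Schduler and Thermos 配置

Then start thermos-observer, see: Aurora thermos_observer的配置与启动

6. Install the Aurora Client (heron01)

The host with the Aurora Client is where users submit jobs. Here it is installed on heron01: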

sudo apt-get install -y python2.7 wget

wget -c https://apache.bintray.com/aurora/ubuntu-trusty/aurora-tools_0.17.0_amd64.deb
sudo dpkg -i aurora-tools_0.17.0_amd64.deb

Client Configuration (the Aurora Client installation steps here come from the official Aurora documentation)

Client configuration lives in a json file that describes the clusters available and how to reach them. By default this file is at /etc/aurora/clusters.json.

Jobs may be submitted to the scheduler using the client, and are described with job configurations expressed in .aurora files. Typically you will maintain a single job configuration file to describe one or more deployment environments (e.g. dev, test, prod) for a production job.

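Note: the installation steps for Apache Mesos and Apache Aurora above are mainly based on: Installing Aurora

7. Finish the Mesos and Aurora installation

Once the Mesos and Aurora cluster is installed and configured correctly, check the following pages in a browser:

(1) Frameworks in Mesos:

[Screenshot: frameworks in the Mesos UI]

(2) The Aurora scheduler:

[Screenshot: Aurora scheduler]

(3) The Aurora agents:

[Screenshot: Aurora agents]

Aurora is now installed and configured in the cluster. Next, install Hadoop and then install Heron on the master host (heron01).

8. Install and configure Hadoop HDFS (all nodes)

The configuration is shown using heron01 as the example. For the HDFS configuration of the cluster, see: The configuration of Hadoop HDFS in Heron Cluster

Based on: [Hadoop] Linux(CentOS6.4)下Hadoop单机/集群的安装和配置; [Hadoop] Linux(CentOS6.4)单机Hadoop伪分布式配置

9. Install Heron (Master)

Heron only needs to be installed on the host used to submit topologies, which here is the master (heron01).

1. Download the Heron Client and Tools installers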

$ wget https://github.com/twitter/heron/releases/download/0.17.1/heron-client-install-0.17.1-ubuntu.sh
$ wget https://github.com/twitter/heron/releases/download/0.17.1/heron-tools-install-0.17.1-ubuntu.sh

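Note: in this setup the files were downloaded directly from Heron's official download page rather than with the commands above.

2. Install the Heron Client and Tools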

yitian@ubuntu:~/Desktop/heron intall files$ chmod +x heron-*.sh
yitian@ubuntu:~/Desktop/heron intall files$ ll
total 335560
drwxrwxr-x 2 yitian yitian      4096 Feb 11 22:21 ./
drwxr-xr-x 4 yitian yitian      4096 Feb 12 02:20 ../
-rwxrwxrwx 1 yitian yitian 304280790 Feb 10 23:52 heron-client-install-0.17.1-ubuntu.sh*
-rwxrwxrwx 1 yitian yitian  39317773 Feb 10 23:36 heron-tools-install-0.17.1-ubuntu.sh*

yitian@ubuntu:~/Desktop/heron intall files$ ./heron-client-install-0.17.1-ubuntu.sh --user
--warning=no-timestamp
Heron client installer
----------------------
Uncompressing..tar xfz /home/yitian/.heron/heron-client.tar.gz -C /home/yitian/.heron --warning=no-timestamp
....
Heron is now installed!
Make sure you have "/home/yitian/bin" in your path.
See http://heronstreaming.io/docs/getting-started for how to use Heron.
heron.build.version : '0.17.1'
heron.build.time : Sat Nov 18 01:07:07 UTC 2017
heron.build.timestamp : 1510967227000
heron.build.host : ci-server-01
heron.build.user : release-agent1
heron.build.git.revision : 874feb31b1ad9df6ea4a51d58b573750468ad28d
heron.build.git.status : Clean

yitian@ubuntu:~/Desktop/heron intall files$ ./heron-tools-install-0.17.1-ubuntu.sh --user
--warning=no-timestamp
Heron tools installer
---------------------
Uncompressing..tar xfz /home/yitian/.herontools/heron-tools.tar.gz -C /home/yitian/.herontools --warning=no-timestamp
....
Heron Tools is now installed!
Make sure you have "/home/yitian/bin" in your path.
See http://heronstreaming.io/docs/getting-started for how to use Heron.
heron.build.version : '0.17.1'
heron.build.time : Sat Nov 18 01:07:07 UTC 2017
heron.build.timestamp : 1510967227000
heron.build.host : ci-server-01
heron.build.user : release-agent1
heron.build.git.revision : 874feb31b1ad9df6ea4a51d58b573750468ad28d
heron.build.git.status : Clean

yitian@ubuntu:~/Desktop/heron intall files$ heron version
heron.build.git.revision : 874feb31b1ad9df6ea4a51d58b573750468ad28d
heron.build.git.status : Clean
heron.build.host : ci-server-01
heron.build.time : Sat Nov 18 01:07:07 UTC 2017
heron.build.timestamp : 1510967227000
heron.build.user : release-agent1
heron.build.version : 0.17.1

3. Set the environment variable in /etc/profile

export PATH=~/bin:$PATH

Note: for a local single-node Heron installation, see: [Heron] Heron单节点(Locally)简易安装

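10. Configure Heron for deployment

After Heron is installed on the master host (heron01), its default installation directory is ~/.heron. In this cluster Heron is deployed on Aurora: the Aurora scheduler acts as Heron's scheduler, ZooKeeper as the state manager, and HDFS as the uploader. The configuration files to modify are therefore:

  • /conf/aurora/scheduler.yaml
  • /conf/aurora/uploader.yaml
  • /conf/aurora/statemgr.yaml
  • /conf/aurora/heron.aurora
  • /conf/aurora/client.yaml

10.1 Share the Heron core binary

    Note: the Heron core binary must be placed in a shared location so that the scheduler can download it and run Heron topologies. Here HDFS is used as the shared file system.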

    yitian@ubuntu:~/hadoop/hadoop-2.7.4$ bin/hdfs dfs -mkdir /heron
    yitian@ubuntu:~/hadoop/hadoop-2.7.4$ bin/hdfs dfs -mkdir /heron/dist
    yitian@ubuntu:~/hadoop/hadoop-2.7.4$ bin/hdfs dfs -put ~/.heron/dist/heron-core.tar.gz /heron/dist
    yitian@ubuntu:~/hadoop/hadoop-2.7.4$ bin/hdfs dfs -ls /heron/dist
    Found 1 items
    -rw-r--r--   1 yitian supergroup  160410746 2018-02-13 01:10 /heron/dist/heron-core.tar.gz
    

    10.2 Modify the configuration files

    1. scheduler.yaml

    # scheduler class for distributing the topology for execution
    heron.class.scheduler:                       com.twitter.heron.scheduler.aurora.AuroraScheduler
    # launcher class for submitting and launching the topology
    heron.class.launcher:                        com.twitter.heron.scheduler.aurora.AuroraLauncher
    # location of java - pick it up from shell environment
    heron.directory.sandbox.java.home:           /usr/lib/jvm/java-1.8.0-openjdk-amd64/
    # Invoke the IScheduler as a library directly
    heron.scheduler.is.service:                  False
    

    2. statemgr.yaml

    Use ZooKeeper as the state manager:

    # local state manager class for managing state in a persistent fashion
    # heron.class.state.manager:                      com.twitter.heron.statemgr.localfs.LocalFileSystemStateManager
    heron.class.state.manager:                      com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager
    # local state manager connection string
    # heron.statemgr.connection.string:               LOCALMODE
    heron.statemgr.connection.string:               "heron01:2181" # connects to one host of the ZooKeeper cluster; listing several hosts was not tested
    # path of the root address to store the state in a local file system
    # heron.statemgr.root.path:                       /vagrant/.herondata/repository/state/${CLUSTER}
    heron.statemgr.root.path:                       "/heron" # root path of the state manager; used again in heron_tracker.yaml
    # create the sub directories, if needed
    heron.statemgr.localfs.is.initialize.file.tree: True

    3. uploader.yaml

    Use HDFS as the uploader:

    # uploader class for transferring the topology jar/tar files to storage
    # heron.class.uploader:                            com.twitter.heron.uploader.localfs.LocalFileSystemUploader
    heron.class.uploader:                            "com.twitter.heron.uploader.hdfs.HdfsUploader"
    # name of the directory to upload topologies for local file system uploader
    # heron.uploader.localfs.file.system.directory:    /vagrant/.herondata/repository/topologies/${CLUSTER}/${ROLE}/${TOPOLOGY}
    # add
    heron.uploader.hdfs.config.directory: "/home/yitian/hadoop/hadoop-2.7.4/etc/hadoop"
    #heron.uploader.hdfs.topologies.directory.uri: hdfs://heron/topologies/${CLUSTER}
    heron.uploader.hdfs.topologies.directory.uri: "/heron/topologies/${CLUSTER}" # a path like this is a path inside the HDFS file system
    #heron.uploader.hdfs.topologies.directory.uri: hdfs://home/yitian/heron/topologies/${CLUSTER} # this setting does not work

    4. heron.aurora

    When HDFS is used as the uploader:

    #fetch_heron_system = Process(
    #  name = 'fetch_heron_system',
    #  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_core_release_uri, core_release_file, core_release_file)
    #)
    fetch_heron_system = Process(
           name = 'fetch_heron_system',
           cmdline = '/home/yitian/hadoop/hadoop-2.7.4/bin/hdfs dfs -get %s %s && tar zxf %s' % (heron_core_release_uri, 
                 core_release_file, core_release_file) # Hadoop directory configured on the hosts
    )
    #fetch_user_package = Process(
    #  name = 'fetch_user_package',
    #  cmdline = 'curl %s -o %s && tar zxf %s' % (heron_topology_jar_uri, topology_package_file, topology_package_file)
    #)
    fetch_user_package = Process(
           name = 'fetch_user_package',
           cmdline = '/home/yitian/hadoop/hadoop-2.7.4/bin/hdfs dfs -get %s %s && tar zxf %s' % (heron_topology_jar_uri, 
                   topology_package_file, topology_package_file)
    )
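To see what the modified `cmdline` actually runs on an agent, it helps to expand the `%` formatting by hand. The sketch below uses plain Python string formatting, which is what the expression in heron.aurora relies on. The HDFS path comes from client.yaml in this post, but the local file name `heron-core.tar.gz` is an assumption: in a real submission Heron fills in these values.

```python
# Path to the hdfs binary as configured in this post's heron.aurora.
HDFS_BIN = "/home/yitian/hadoop/hadoop-2.7.4/bin/hdfs"


def fetch_cmdline(uri, local_file):
    """Build the shell command that downloads a package from HDFS and unpacks it."""
    return "%s dfs -get %s %s && tar zxf %s" % (HDFS_BIN, uri, local_file, local_file)


# Assumed values, for illustration only.
core_cmd = fetch_cmdline("/heron/dist/heron-core.tar.gz", "heron-core.tar.gz")
print(core_cmd)
```

Because the command calls the hdfs binary by absolute path, Hadoop has to be installed at the same location on every agent that may run a task.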

    5. client.yaml

    # location of the core package
    # heron.package.core.uri:                      "file:///vagrant/.herondata/dist/heron-core-release.tar.gz"
    heron.package.core.uri:                      "/heron/dist/heron-core.tar.gz" # do not prefix this path with hdfs://
    # Whether role/env is required to submit a topology. Default value is False.
    heron.config.is.role.required:               True
    heron.config.is.env.required:                True
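With both `heron.config.is.role.required` and `heron.config.is.env.required` set to True, the target passed to the heron CLI has to name all three parts, as in `aurora/yitian/devel` used later in this post. A small sketch of that rule follows; `parse_target` and `Target` are names invented for this illustration and are not part of Heron.

```python
from collections import namedtuple

Target = namedtuple("Target", "cluster role env")


def parse_target(target):
    """Split 'cluster/role/env' and insist that all three parts are present."""
    parts = target.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected cluster/role/env, got %r" % target)
    return Target(*parts)


print(parse_target("aurora/yitian/devel"))
```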
    

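    11. Submit an example topology

    11.1 Set the cluster name

    First, make sure the cluster name is the same in the configuration of the Aurora client and of the Aurora scheduler. Modify the following configuration file accordingly: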

    yitian@ubuntu:~/.heron/conf/aurora$ cd /etc/aurora/
    yitian@ubuntu:/etc/aurora$ cat clusters.json
    [
           {
             "auth_mechanism": "UNAUTHENTICATED",
             "name": "aurora", # the cluster name
             "scheduler_zk_path": "/aurora/scheduler",
             "slave_root": "/home/yitian/mesosdata/run", # must match the Mesos work_dir configured earlier
             "slave_run_directory": "latest",
             "zk": "218.195.228.52" # IP of the host where ZooKeeper is installed
           }
    
    ]
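The `#` annotations in the listing are explanations only. JSON has no comment syntax, so the real /etc/aurora/clusters.json must not contain them. As a minimal sketch, the check below parses a comment-free copy of the file and confirms that the cluster part of a submit target matches a configured cluster name. The helper names are hypothetical.

```python
import json

# Comment-free version of the clusters.json shown in this post.
CLUSTERS_JSON = """
[
  {
    "auth_mechanism": "UNAUTHENTICATED",
    "name": "aurora",
    "scheduler_zk_path": "/aurora/scheduler",
    "slave_root": "/home/yitian/mesosdata/run",
    "slave_run_directory": "latest",
    "zk": "218.195.228.52"
  }
]
"""


def cluster_names(text):
    """Return the 'name' field of every cluster defined in clusters.json text."""
    return [cluster["name"] for cluster in json.loads(text)]


def target_cluster_known(target, text):
    """True if the cluster part of 'cluster/role/env' is defined in the file."""
    return target.split("/", 1)[0] in cluster_names(text)


print(cluster_names(CLUSTERS_JSON))
print(target_cluster_known("aurora/yitian/devel", CLUSTERS_JSON))
```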

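    11.2 Submit the example topology:

    After the two problems described under "Common problems" at the end of this post were solved, the Heron example topology WordCountTopology was submitted successfully.

    The output of the successful submit command is: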

    yitian@heron04:~$ heron submit aurora/yitian/devel --config-path ~/.heron/conf ~/.heron/examples/heron-api-examples.jar com.twitter.heron.examples.api.WordCountTopology WordCountTopology --deploy-deactivated
    [2018-03-15 05:53:45 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
    [2018-03-15 05:53:45 +0000] [INFO]: Launching topology: 'WordCountTopology'
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/uploader/heron-dlog-uploader.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/statemgr/heron-zookeeper-statemgr.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
    [2018-03-15 05:53:46 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron04:2181 
    [2018-03-15 05:53:46 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting 
    [2018-03-15 05:53:46 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED 
    [2018-03-15 05:53:46 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized. 
    [2018-03-15 05:53:46 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/WordCountTopology 
    [2018-03-15 05:53:50 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Target topology file already exists at '/heron/topologies/aurora/WordCountTopology-yitian-tag-0-8136175565428738886.tar.gz'. Overwriting it now 
    [2018-03-15 05:53:50 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Uploading topology package at '/tmp/tmp2JPHpD/topology.tar.gz' to target HDFS at '/heron/topologies/aurora/WordCountTopology-yitian-tag-0-8136175565428738886.tar.gz' 
    [2018-03-15 05:53:54 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/WordCountTopology 
    [2018-03-15 05:53:54 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/WordCountTopology 
    [2018-03-15 05:53:54 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/executionstate/WordCountTopology 
    [2018-03-15 05:53:54 -0700] [INFO] com.twitter.heron.scheduler.aurora.AuroraLauncher: Launching topology in aurora 
    [2018-03-15 05:53:54 -0700] [INFO] com.twitter.heron.scheduler.utils.SchedulerUtils: Updating scheduled-resource in packing plan: WordCountTopology 
    [2018-03-15 05:53:54 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/packingplans/WordCountTopology 
    [2018-03-15 05:53:54 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/packingplans/WordCountTopology 
      INFO] Creating job WordCountTopology
      INFO] Checking status of aurora/yitian/devel/WordCountTopology
    Job create succeeded: job url=http://218.195.228.52:8081/scheduler/yitian/devel/WordCountTopology
    [2018-03-15 05:54:06 -0700] [INFO] com.twitter.heron.scheduler.utils.SchedulerUtils: Setting Scheduler locations: topology_name: "WordCountTopology"
    http_endpoint: "scheduler_as_lib_no_endpoint"
    [2018-03-15 05:54:06 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/schedulers/WordCountTopology 
    [2018-03-15 05:54:06 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron04:2181 
    [2018-03-15 05:54:06 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes 
    [2018-03-15 05:54:06 +0000] [INFO]: Successfully launched topology 'WordCountTopology'
    

    12. Start Heron Tracker & UI

    12.1 Start heron-tracker

    (1) First edit the Heron Tracker configuration file. Since ZooKeeper maintains the cluster state here, modify heron_tracker.yaml.

    yitian@ubuntu:~/.herontools/conf$ vim heron_tracker.yaml

    The file is as follows:
    statemgrs:
    #  -
    #    type: "file"
    #    name: "local"
    #    rootpath: "~/.herondata/repository/state/local"
    #    tunnelhost: "127.0.0.1"
    #
    # To use 'localzk', launch a zookeeper server locally
    # and create the following path:
    #   *. /heron/topologies
    #
      -
        type: "zookeeper"
        name: "aurorazk"
        hostport: "heron04:2181" # connects to one host of the ZooKeeper cluster
        rootpath: "/heron" # same value as heron.statemgr.root.path in statemgr.yaml
        tunnelhost: "127.0.0.1"

    (2) Run Heron Tracker:

    [Screenshot: Heron Tracker running]

    12.2 Run Heron UI

    [Screenshots: Heron UI]

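    13. Check the running state of the cluster components

    1. Mesos

    [Screenshot: Mesos]

    2. Aurora

    [Screenshots: Aurora]

    3. HDFS

    [Screenshot: HDFS]

    Note: for the detailed state of each component once the cluster runs successfully, see: 解决“Regular plan unhealthy!” – 成功启动集群

    14. Inspect the files in the ZooKeeper and HDFS clusters

    1. ZooKeeper cluster

    When Heron was configured earlier, statemgr.yaml set ZooKeeper as Heron's state manager, and a three-node ZooKeeper cluster was configured and started. The statemgr.yaml file contains the following setting:

    # path of the root address to store the state in a local file system
    heron.statemgr.root.path:                       "/heron"

    Now connect to one host of the ZooKeeper cluster (heron01 here) with zkCli.sh and take a look: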

    yitian@heron01:~$ ./zookeeper/zookeeper-3.4.10/bin/zkCli.sh -server heron01:2181
    Connecting to heron01:2181
    2018-02-25 02:33:53,727 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
    2018-02-25 02:33:53,735 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=heron01
    2018-02-25 02:33:53,736 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.8.0_151
    2018-02-25 02:33:53,739 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
    2018-02-25 02:33:53,739 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.8.0_151/jre
    2018-02-25 02:33:53,739 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/home/yitian/zookeeper/zookeeper-3.4.10/bin/../build/classes:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../build/lib/*.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/slf4j-api-1.6.1.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/netty-3.10.5.Final.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/log4j-1.2.16.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../lib/jline-0.9.94.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../zookeeper-3.4.10.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../src/java/lib/*.jar:/home/yitian/zookeeper/zookeeper-3.4.10/bin/../conf:.:/usr/java/jdk1.8.0_151/lib:/usr/java/jdk1.8.0_151/jre/lib
    2018-02-25 02:33:53,739 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
    2018-02-25 02:33:53,739 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
    2018-02-25 02:33:53,739 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
    2018-02-25 02:33:53,740 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
    2018-02-25 02:33:53,740 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
    2018-02-25 02:33:53,740 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=4.13.0-32-generic
    2018-02-25 02:33:53,740 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=yitian
    2018-02-25 02:33:53,740 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/yitian
    2018-02-25 02:33:53,740 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/yitian
    2018-02-25 02:33:53,743 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=heron01:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@1a86f2f1
    Welcome to ZooKeeper!
    2018-02-25 02:33:53,835 [myid:] - INFO  [main-SendThread(heron01:2181):ClientCnxn$SendThread@1032] - Opening socket connection to server heron01/192.168.201.136:2181. Will not attempt to authenticate using SASL (unknown error)
    JLine support is enabled
    2018-02-25 02:33:53,956 [myid:] - INFO  [main-SendThread(heron01:2181):ClientCnxn$SendThread@876] - Socket connection established to heron01/192.168.201.136:2181, initiating session
    2018-02-25 02:33:53,992 [myid:] - INFO  [main-SendThread(heron01:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server heron01/192.168.201.136:2181, sessionid = 0x161cc29543c006b, negotiated timeout = 30000
    WATCHER::
    WatchedEvent state:SyncConnected type:None path:null
    [zk: heron01:2181(CONNECTED) 0]

    List the znodes stored in ZooKeeper:

    [zk: heron01:2181(CONNECTED) 3] ls /
    [mesos, zookeeper, aurora, heron, home]
    [zk: heron01:2181(CONNECTED) 4] ls /heron
    [statefulcheckpoints, pplans, schedulers, metricscaches, packingplans, tmasters, executionstate, locks, topologies]
    

    Common ZooKeeper commands:

    2. HDFS cluster

    When Heron was configured, uploader.yaml set HDFS as the uploader, with the following setting:

    #heron.uploader.hdfs.topologies.directory.uri: hdfs://heron/topologies/${CLUSTER}
    heron.uploader.hdfs.topologies.directory.uri: "/heron/topologies/${CLUSTER}"

    Now look inside the HDFS file system:

    yitian@heron01:~/.herontools/conf$ hdfs dfs -ls /
    Found 2 items
    drwxr-xr-x   - yitian supergroup          0 2018-02-25 01:34 /heron
    drwxr-xr-x   - yitian supergroup          0 2018-02-18 07:16 /home
    yitian@heron01:~/.herontools/conf$ hdfs dfs -ls /heron/
    Found 1 items
    drwxr-xr-x   - yitian supergroup          0 2018-02-25 01:34 /heron/topologies
    yitian@heron01:~/.herontools/conf$ hdfs dfs -ls /heron/topologies
    Found 1 items
    drwxr-xr-x   - yitian supergroup          0 2018-02-25 01:57 /heron/topologies/aurora
    yitian@heron01:~/.herontools/conf$ hdfs dfs -ls /heron/topologies/aurora
    Found 1 items
    -rw-r--r--   1 yitian supergroup    2981646 2018-02-25 01:57 /heron/topologies/aurora/WordCountTopology-yitian-tag-0--8168281366065059093.tar.gz

    Note: for common HDFS commands, see: HDFS基本命令

    15. Manage Topologies

    15.1 Activate Topologies

    yitian@heron04:~$ heron activate aurora/yitian/devel WordCountTopology
    [2018-03-19 05:00:17 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
    [2018-03-19 05:00:19 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron04:2181  
    [2018-03-19 05:00:19 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting  
    [2018-03-19 05:00:19 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED  
    [2018-03-19 05:00:19 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.  
    [2018-03-19 05:00:19 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/WordCountTopology  
    [2018-03-19 05:00:20 -0700] [INFO] com.twitter.heron.spi.utils.TMasterUtils: Topology command ACTIVATE completed successfully.  
    [2018-03-19 05:00:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron04:2181  
    [2018-03-19 05:00:20 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes  
    [2018-03-19 05:00:20 +0000] [INFO]: Successfully activate topology: WordCountTopology

    15.2 Deactivate Topologies

    yitian@heron04:~$ heron deactivate aurora/yitian/devel WordCountTopology
    [2018-03-19 05:15:14 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
    [2018-03-19 05:15:14 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron04:2181  
    [2018-03-19 05:15:14 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting  
    [2018-03-19 05:15:15 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED  
    [2018-03-19 05:15:15 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.  
    [2018-03-19 05:15:15 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/WordCountTopology  
    [2018-03-19 05:15:15 -0700] [INFO] com.twitter.heron.spi.utils.TMasterUtils: Topology command DEACTIVATE completed successfully.  
    [2018-03-19 05:15:15 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron04:2181  
    [2018-03-19 05:15:15 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes  
    [2018-03-19 05:15:15 +0000] [INFO]: Successfully deactivate topology: WordCountTopology

    15.3 Kill Topologies

    yitian@ubuntu:~/.heron/conf/aurora$ heron kill aurora/yitian/devel WordCountTopology
    [2018-02-14 19:06:58 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
    [2018-02-14 19:06:59 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron01:2181 
    [2018-02-14 19:06:59 -0800] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting 
    [2018-02-14 19:06:59 -0800] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED 
    [2018-02-14 19:06:59 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized. 
    [2018-02-14 19:06:59 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /home/yitian/heron/state/topologies/WordCountTopology 
    [2018-02-14 19:07:00 -0800] [INFO] com.twitter.heron.scheduler.aurora.AuroraLauncher: Will try 5 attempts at interval: 2000 ms  
       INFO] Killing tasks for job: aurora/yitian/devel/WordCountTopology
       INFO] Instances to be killed: [0, 1]
    Successfully killed instances [0, 1]
    Job killall succeeded
    [2018-02-14 19:07:01 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /home/yitian/heron/state/packingplans/WordCountTopology 
    [2018-02-14 19:07:01 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /home/yitian/heron/state/executionstate/WordCountTopology 
    [2018-02-14 19:07:01 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /home/yitian/heron/state/topologies/WordCountTopology 
    [2018-02-14 19:07:01 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181 
    [2018-02-14 19:07:01 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes  
    [2018-02-14 19:07:01 +0000] [INFO]: Successfully kill topology: WordCountTopology

    Common problems

    1. With HDFS configured as Heron's uploader, submitting a topology fails with the following error:

    yitian@ubuntu:~/heron$ heron submit aurora/yitian/devel --config-path ~/.heron/conf ~/.heron/examples/heron-api-examples.jar com.twitter.heron.examples.api.WordCountTopology WordCountTopology
    [2018-02-18 04:00:19 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
    [2018-02-18 04:00:19 +0000] [INFO]: Launching topology: 'WordCountTopology'
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/uploader/heron-dlog-uploader.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/statemgr/heron-zookeeper-statemgr.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
    [2018-02-18 04:00:20 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron01:2181 
    [2018-02-18 04:00:20 -0800] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting 
    [2018-02-18 04:00:20 -0800] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED 
    [2018-02-18 04:00:20 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized. 
    [2018-02-18 04:00:20 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /home/yitian/heron/state/topologies/WordCountTopology 
    -test: java.net.UnknownHostException: heron
    Usage: hadoop fs [generic options] -test -[defsz] <path>
    [2018-02-18 04:00:23 -0800] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: The destination directory does not exist. Creating it now at URI 'hdfs://heron/topologies' 
    -mkdir: java.net.UnknownHostException: heron
    Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
    [2018-02-18 04:00:26 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181 
    [2018-02-18 04:00:26 -0800] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes  
    [2018-02-18 04:00:26 +0000] [ERROR]: Failed to create directory for topology package at URI 'hdfs://heron/topologies'
    [2018-02-18 04:00:26 +0000] [ERROR]: Failed to launch topology 'WordCountTopology'
    yitian@ubuntu:~/heron$

    Solution: when HDFS is set as Heron's uploader, the path must not carry the hdfs:// prefix.
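The UnknownHostException in the log follows from general URI rules: in `hdfs://heron/topologies` the text right after `//` is the authority (host), so `heron` is taken as a NameNode hostname rather than as a directory. A path without a scheme has no host part and is resolved against the default file system from the Hadoop configuration. The sketch below uses Python's standard `urllib.parse` only to illustrate how such URIs split; Hadoop does its own parsing in Java. The last example, with host `heron01` and port 9000, is a hypothetical fully qualified form and was not tested in this post.

```python
from urllib.parse import urlparse


def describe(uri):
    """Return (scheme, host, path) as a generic URI parser sees them."""
    parsed = urlparse(uri)
    return (parsed.scheme, parsed.hostname, parsed.path)


for uri in [
    "hdfs://heron/topologies",              # 'heron' becomes the host
    "hdfs://home/yitian/heron/topologies",  # 'home' becomes the host
    "/heron/topologies/aurora",             # no host: plain path
    "hdfs://heron01:9000/heron/topologies", # hypothetical explicit NameNode
]:
    print(uri, "->", describe(uri))
```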

    2. Submitting the same topology more than once fails with: INFO] Creating job WordCountTopology Job creation failed due to error: Job yitian/devel/WordCountTopology already exists

    For details and the fix, see: Heron Job creation failed due to error: Job yitian/devel/WordCountTopology already exists

    3. For common Heron commands, common problems and their fixes, see:
