Spark Cluster Setup


1    Setting up the Hadoop cluster

    1.1 A Hadoop cluster needs at least three machines:

[root@spark1 kafka]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.18.0.30 spark1
172.18.0.35 spark2
172.18.0.37 spark3

    1.2 The three machines must be able to SSH to one another without a password.

    Run ssh-keygen and press Enter through the prompts to generate the public/private key pair.

    Then copy the public key to the other machines with ssh-copy-id -i ~/.ssh/id_rsa.pub spark2 (and likewise for spark3) to enable passwordless login. This is not demonstrated in detail here.
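    A minimal sketch of the key setup, assuming the root account and the default key path on all three hosts:

# Generate an RSA key pair; press Enter through the prompts (empty passphrase).
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to every node so each host can log in to the others
# without a password (run the equivalent on spark2 and spark3 as well).
for host in spark1 spark2 spark3; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done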

    1.3 Download hadoop-2.4.1.tar.gz, upload it to spark1, extract it, delete the original archive, and rename the extracted directory to hadoop.
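    A sketch of those steps, assuming the archive was uploaded to /opt (the directory shown in the listing below):

cd /opt
tar -zxvf hadoop-2.4.1.tar.gz   # extract the archive
rm -f hadoop-2.4.1.tar.gz       # delete the original tarball
mv hadoop-2.4.1 hadoop          # rename the extracted directory to hadoop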

[root@spark1 kafka]# cd /opt/
[root@spark1 opt]# ll
total 44
drwxr-xr-x  10  502 dialout  4096 May 23 14:41 hadoop  # the extracted Hadoop directory
drwxr-xr-x   8 root root     4096 May 23 15:12 hive
drwxr-xr-x   6 root root     4096 May 23 17:27 kafka
drwx------.  2 root root    16384 May 22 11:53 lost+found
drwxrwxr-x   6 2000    2000  4096 Oct 24  2014 scala
drwxrwxrwx  19 root root     4096 Feb  5  2014 slf4j-1.7.6
drwxr-xr-x  11  501 games    4096 May 23 16:02 zk
-rw-r--r--   1 root root     1033 May 23 16:11 zookeeper.out

    1.4 Install the Java JDK

[root@spark1 opt]# yum list|grep jdk|grep 1.8
java-1.8.0-openjdk.x86_64              1:1.8.0.91-1.b14.el6               @cdrom
java-1.8.0-openjdk-devel.x86_64        1:1.8.0.91-1.b14.el6               @cdrom
java-1.8.0-openjdk-headless.x86_64     1:1.8.0.91-1.b14.el6               @cdrom
jdk1.8.0_131.i586                      2000:1.8.0_131-fcs                 installed
# install java-1.8.0
[root@spark1 opt]# which java
/usr/bin/java
[root@spark1 opt]# java -version
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
# the JDK is installed successfully

    1.5 Configure environment variables

[root@spark1 opt]# cat ~/.bashrc
# .bashrc

# User specific aliases and functions
export JAVA_HOME=/usr
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export ZOOKEEPER_HOME=/opt/zk
export SCALA_HOME=/opt/scala
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
	. /etc/bashrc
fi
# the Java and Hadoop environment variables are configured here

    Run [root@spark1 ~]# source ~/.bashrc to make the environment variables take effect.

    1.6 Configure the Hadoop cluster

          Go into the hadoop/etc/hadoop directory.

        1.6.1 Configure core-site.xml

[root@spark1 hadoop]# cat core-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0; see the accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://spark1:9000</value>
</property>
</configuration>

    1.6.2 Configure hdfs-site.xml

[root@spark1 hadoop]# cat hdfs-site.xml 
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0; see the accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/data/namenode</value>   <!-- where the NameNode stores the HDFS namespace metadata -->
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/data/datanode</value>   <!-- where the DataNode stores HDFS data blocks -->
</property>
<property>
  <name>dfs.tmp.dir</name>                  <!-- temporary directory -->
  <value>/usr/local/data/tmp</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>                          <!-- number of replicas -->
</property>
</configuration>

   Note: the custom storage paths above must be created manually on spark2 and spark3, but must NOT be created manually on spark1; otherwise the NameNode process will fail to start later.
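   A sketch of creating those directories on the worker nodes only, assuming passwordless SSH from spark1 is already in place:

# Create the HDFS storage paths on spark2 and spark3 only; do NOT pre-create
# them on spark1, or the NameNode may fail to start later.
for host in spark2 spark3; do
    ssh "$host" "mkdir -p /usr/local/data/namenode /usr/local/data/datanode /usr/local/data/tmp"
done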

     1.6.3 Configure mapred-site.xml

[root@spark1 hadoop]# cat mapred-site.xml.template 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Licensed under the Apache License, Version 2.0; see the accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
</configuration>

    1.6.4 Configure slaves

[root@spark1 hadoop]# cat slaves 
spark1
spark2
spark3
# list all of the nodes

    1.6.5 Configure yarn-site.xml

[root@spark1 hadoop]# cat yarn-site.xml 
<?xml version="1.0"?>
<!-- Licensed under the Apache License, Version 2.0; see the accompanying LICENSE file. -->
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>spark1</value>                     <!-- designate spark1 as the ResourceManager (master) node -->
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
</configuration>

    1.7 Copy the configured hadoop directory and ~/.bashrc to spark2 and spark3, and remember to run source ~/.bashrc there so the environment variables take effect.
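    A sketch of distributing the files, assuming the same /opt layout on every node:

for host in spark2 spark3; do
    scp -r /opt/hadoop "$host":/opt/    # copy the configured Hadoop directory
    scp ~/.bashrc "$host":~/            # copy the environment variables
done
# then, on spark2 and spark3:
#   source ~/.bashrc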

    1.8 Start Hadoop

        Run start-dfs.sh to start HDFS.

        Run start-yarn.sh to start YARN.

    1.9 Verify

[root@spark1 hadoop]# jps
3921 Jps
2499 SecondaryNameNode
2089 NameNode
2154 DataNode
2986 ResourceManager
3086 NodeManager
# spark1, the master node, must show NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager
[root@spark2 kafka]# jps
2165 NodeManager
2406 Jps
1932 DataNode
# spark3 shows the same processes as spark2

    At this point the Hadoop cluster is up.

    Visit port 50070 to view the HDFS web UI.

Visit port 8088 to view the YARN (ResourceManager) web UI.
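A quick way to confirm both pages respond, assuming the hostnames resolve as configured in /etc/hosts:

curl -s -o /dev/null -w "HDFS UI (50070): HTTP %{http_code}\n" http://spark1:50070/
curl -s -o /dev/null -w "YARN UI (8088):  HTTP %{http_code}\n" http://spark1:8088/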

2    Setting up Hive

    Hive keeps its metadata in MySQL, so install MySQL first.

    2.1 Install MySQL

    [root@spark1 kafka]# yum install mysql-server.x86_64 -y

    Start MySQL and configure it to start on boot:

    /etc/init.d/mysqld start

    chkconfig mysqld on

[root@spark1 kafka]# chkconfig mysqld --list
mysqld         	0:off	1:off	2:on	3:on	4:on	5:on	6:off
[root@spark1 kafka]# /etc/init.d/mysqld status
mysqld (pid  1906) is running...

    Install the MySQL JDBC connector (mysql-connector-java) with yum:

[root@spark1 kafka]# yum list|grep mysql-connector-java
mysql-connector-java.noarch            1:5.1.17-6.el6                     @cdrom
# install this package
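    A sketch of the corresponding install command for the package shown above:

yum install -y mysql-connector-java   # provides /usr/share/java/mysql-connector-java-5.1.17.jar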

    2.2 Install Hive

    Upload apache-hive-0.13.1-bin.tar.gz to the appropriate directory on spark1, extract it, delete the archive, and rename the extracted directory to hive.

    Configure the Hive environment variables:

[root@spark1 kafka]# cat ~/.bashrc|grep HIVE
export HIVE_HOME=/opt/hive        # this is the extracted and renamed Hive directory
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin

    Note: after changing environment variables, always run source ~/.bashrc.

    2.3  Configure MySQL as Hive's metadata store

        Copy the MySQL connector into Hive's lib directory:

[root@spark1 kafka]# ll /usr/share/java/mysql-connector-java-5.1.17.jar 
-rw-r--r-- 1 root root 819803 Jun 22  2012 /usr/share/java/mysql-connector-java-5.1.17.jar
[root@spark1 kafka]# cp /usr/share/java/mysql-connector-java-5.1.17.jar /opt/hive/lib/

        In MySQL, create a database for the Hive metadata and grant privileges to the hive user:

create database if not exists hive_metadata;
grant all privileges on hive_metadata.* to 'hive'@'%' identified by 'hive';
grant all privileges on hive_metadata.* to 'hive'@'localhost' identified by 'hive';
grant all privileges on hive_metadata.* to 'hive'@'spark1' identified by 'hive';
flush privileges;
use hive_metadata;

 

[root@spark1 kafka]# mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive_metadata      |
| mysql              |
| test               |
+--------------------+
4 rows in set (0.01 sec)

mysql> use hive_metadata;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+---------------------------+
| Tables_in_hive_metadata   |
+---------------------------+
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| DATABASE_PARAMS           |
| DBS                       |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| IDXS                      |
| INDEX_PARAMS              |
| PARTITIONS                |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| VERSION                   |
+---------------------------+
35 rows in set (0.00 sec)
# these tables are not created manually; Hive creates them automatically once it is configured

    2.3.1 Configure hive-site.xml

[root@spark1 conf]# pwd
/opt/hive/conf
# go into the hive/conf directory, then edit hive-site.xml
[root@spark1 conf]# vim hive-site.xml

<!-- the JDBC URL Hive uses to connect to MySQL -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://spark1:3306/hive_metadata?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<!-- the JDBC driver class -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<!-- the MySQL user name and password (both "hive") -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

 

    2.3.2    Configure hive-env.sh and hive-config.sh

                Add the environment variables to hive-config.sh (in the bin directory):

[root@spark1 bin]# cat hive-config.sh |grep export
export JAVA_HOME=/usr
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive

    2.4    Verify Hive

        Simply run the hive command to enter the Hive CLI:

[root@spark1 bin]# hive
17/05/24 10:20:07 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.648 seconds, Fetched: 1 row(s)
hive> 

3    Installing ZooKeeper

    3.1 Upload the ZooKeeper package

        Upload zookeeper-3.4.5.tar.gz, extract it, delete the archive, and rename the extracted directory; these steps are not demonstrated again here.

    3.2 Configure environment variables

[root@spark1 bin]# cat ~/.bashrc|grep ZOO
export ZOOKEEPER_HOME=/opt/zk
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin

    3.3 Edit the configuration file

      The extracted directory has been renamed to zk.

[root@spark1 opt]# cd zk/conf
mv zoo_sample.cfg zoo.cfg
vi zoo.cfg

Change:
dataDir=/opt/zk/data

Add:
server.0=spark1:2888:3888
server.1=spark2:2888:3888
server.2=spark3:2888:3888

 

[root@spark1 conf]# cat zoo.cfg 
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/opt/zk/data
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.0=spark1:2888:3888
server.1=spark2:2888:3888
server.2=spark3:2888:3888

  Create a file named myid in the zk/data directory containing the node's sequence number (starting from 0):

[root@spark1 conf]# cd ..
[root@spark1 zk]# cd data/
[root@spark1 data]# cat myid 
0
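  A sketch of creating the data directory and the myid file on spark1 (the ids for spark2 and spark3 are set in the next step):

mkdir -p /opt/zk/data        # the dataDir configured in zoo.cfg
echo 0 > /opt/zk/data/myid   # node sequence number for spark1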

    3.4 Copy the zk package to the other nodes

        Copy the zk directory and the environment variables to the other nodes (remember to source ~/.bashrc there), then just change the number in zk/data/myid on each node; a sketch follows the output below.

[root@spark2 data]# cat myid 
1
[root@spark3 data]# cat myid 
2
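    A sketch of distributing ZooKeeper and assigning each node its id, assuming passwordless SSH and the same /opt layout:

id=1
for host in spark2 spark3; do
    scp -r /opt/zk "$host":/opt/                  # copy the ZooKeeper directory
    scp ~/.bashrc "$host":~/                      # copy the environment variables (source it there)
    ssh "$host" "echo $id > /opt/zk/data/myid"    # give each node its own sequence number
    id=$((id + 1))
done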

    3.5 Start ZooKeeper

    Run zkServer.sh start on each node, then check:

[root@spark1 data]# zkServer.sh status
JMX enabled by default
Using config: /opt/zk/bin/../conf/zoo.cfg
Mode: follower
[root@spark1 data]# jps|grep QuorumPeerMain
3403 QuorumPeerMain
# ZooKeeper started successfully on spark1
[root@spark2 data]# zkServer.sh status
JMX enabled by default
Using config: /opt/zk/bin/../conf/zoo.cfg
Mode: follower
[root@spark2 data]# jps|grep QuorumPeerMain
2299 QuorumPeerMain
# ZooKeeper started successfully on spark2
[root@spark3 data]# zkServer.sh status
JMX enabled by default
Using config: /opt/zk/bin/../conf/zoo.cfg
Mode: leader
[root@spark3 data]# jps|grep Q
2218 QuorumPeerMain
# ZooKeeper is running on all three nodes (spark3 was elected leader)

 4    Setting up the Kafka cluster

    4.1 Preparation

        4.1.1 Install Scala

            Upload scala-2.11.4.tgz to the appropriate directory, extract it, delete the archive, and rename the extracted directory to scala.

            Configure the environment variables:

[root@spark1 data]# cat ~/.bashrc|grep SCA
export SCALA_HOME=/opt/scala
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin

             Check the Scala version, then copy the package and the environment variables to the other nodes:

[root@spark1 data]# scala -version
Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL

    4.2 Install the Kafka package

        4.2.1 Preparation

                Upload kafka_2.9.2-0.8.1.tgz to the appropriate directory, extract it, delete the archive, and rename the extracted directory to kafka.

        4.2.2 Configure Kafka

[root@spark1 opt]# cd kafka/config/
[root@spark1 config]# cat server.properties |grep connect
# Zookeeper connection string (see zookeeper docs for details).
zookeeper.connect=spark1:2181,spark2:2181,spark3:2181
# point zookeeper.connect at spark1, spark2 and spark3
[root@spark1 config]# cat server.properties |grep broker.id
broker.id=0
# broker.id is set to 0 on this node

        4.2.3 Additional setup

            Upload slf4j-1.7.6.zip to the appropriate directory and unzip it.

            Copy slf4j-nop-1.7.6.jar from the extracted directory into kafka/libs; a sketch follows.
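            A sketch of these two steps, assuming the zip was uploaded to /opt:

cd /opt
unzip slf4j-1.7.6.zip                                # extract the slf4j archive
cp slf4j-1.7.6/slf4j-nop-1.7.6.jar /opt/kafka/libs/  # give Kafka an slf4j binding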

        4.2.4 Install on the other nodes

            Copy the Kafka directory and the environment variables to the other nodes and source ~/.bashrc there.

            Change broker.id in each node's server.properties to 1, 2, and so on; a sketch follows.
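            A sketch of setting a unique broker.id on each node over SSH, assuming the same /opt/kafka path everywhere:

id=1
for host in spark2 spark3; do
    ssh "$host" "sed -i 's/^broker.id=.*/broker.id=$id/' /opt/kafka/config/server.properties"
    id=$((id + 1))
done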

        4.2.5 Start Kafka

[root@spark1 kafka]# nohup bin/kafka-server-start.sh config/server.properties &
# run this command
[root@spark1 kafka]# jps|grep K
3477 Kafka
# Kafka started successfully; repeat on the other nodes

        4.2.6 Verify Kafka

            From the kafka/ directory on the master node spark1, run:

bin/kafka-topics.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create

[root@spark1 kafka]# bin/kafka-topics.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic TestTopic --replication-factor 1 --partitions 1 --create
Error while executing topic command Topic "TestTopic" already exists.
kafka.common.TopicExistsException: Topic "TestTopic" already exists.
	at kafka.admin.AdminUtils$.createOrUpdateTopicPartitionAssignmentPathInZK(AdminUtils.scala:171)
	at kafka.admin.AdminUtils$.createTopic(AdminUtils.scala:156)
	at kafka.admin.TopicCommand$.createTopic(TopicCommand.scala:88)
	at kafka.admin.TopicCommand$.main(TopicCommand.scala:50)
	at kafka.admin.TopicCommand.main(TopicCommand.scala)
# the TestTopic topic was already created with this command earlier, so running it again
# reports that the topic exists; the first run does not produce this error

     Open two terminals on spark1 (call them 1 and 2).

    In terminal 1 run bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic TestTopic

    to send data through port 9092.

    In terminal 2 run bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic TestTopic --from-beginning

    to receive data through port 2181 (via ZooKeeper).

[root@spark1 kafka]# bin/kafka-console-producer.sh --broker-list spark1:9092,spark2:9092,spark3:9092 --topic TestTopic
20170524
# type a string on the producer side
[root@spark1 kafka]# bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic TestTopic --from-beginning
hello workd
wangzilongbaobao
20170524
# the consumer receives the new string immediately, and also shows the strings written to the topic earlier

    The data is visible not only on the same node but also on the other nodes:

[root@spark2 kafka]# bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic TestTopic --from-beginning
hello workd
wangzilongbaobao
20170524
# visible on spark2 as well
[root@spark3 kafka]# bin/kafka-console-consumer.sh --zookeeper spark1:2181,spark2:2181,spark3:2181 --topic TestTopic --from-beginning
hello workd
wangzilongbaobao
20170524
# received on spark3 too

 The Kafka cluster is up.

5    Setting up the Spark cluster

    5.1 Preparation

    Upload spark-1.3.0-bin-hadoop2.4.tgz to the appropriate directory, extract it, delete the archive, and rename the extracted directory to spark.

    5.2 Set environment variables

[root@spark1 sbin]# cat ~/.bashrc |grep SPARK
export SPARK_HOME=/opt/spark
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$ZOOKEEPER_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
# also add the classpath
[root@spark1 sbin]# cat ~/.bashrc |grep CLASSPATH
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

    5.3 Edit the configuration files

[root@spark1 opt]# cd spark/conf/
# edit spark-env.sh
[root@spark1 conf]# cat spark-env.sh |grep export
export JAVA_HOME=/usr
export SCALA_HOME=/opt/scala
export SPARK_MASTER_IP=spark1
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
# the file contains only comments by default; just add these variables

    5.4 Edit the slaves file

        Add the worker nodes to slaves:

[root@spark1 conf]# cat slaves 
# A Spark Worker will be started on each of the machines listed below.
spark2
spark3
# Spark is memory-hungry, so the master node spark1 is not listed as a worker here

    5.5 Set up the other nodes

        Copy the spark directory and ~/.bashrc to the other nodes, then source the environment variables there.

    5.6 Start the cluster

[root@spark1 opt]# cd spark/sbin/
[root@spark1 sbin]# start
start                start-all.sh         start-dfs.cmd        start-secure-dns.sh  startx               start-yarn.sh        
start-all.cmd        start-balancer.sh    start-dfs.sh         start_udev           start-yarn.cmd       
[root@spark1 sbin]# ./start-all.sh 

    5.7 Verify

[root@spark1 sbin]# jps|grep Mas
5001 Master
# the Master process on the master node
[root@spark2 opt]# jps|grep Wor
2644 Worker
# the Worker process on a worker node
[root@spark3 opt]# jps|grep W
2618 Worker

Visit port 8080 to view the Spark web UI.

Spark can also be accessed directly from the command line through spark-shell:

[root@spark1 spark]# spark-shell 
Spark assembly has been built with Hive, including Datanucleus jars on classpath
17/05/24 13:49:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/24 13:49:50 INFO spark.SecurityManager: Changing view acls to: root
17/05/24 13:49:51 INFO spark.SecurityManager: Changing modify acls to: root
17/05/24 13:49:51 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/05/24 13:49:51 INFO spark.HttpServer: Starting HTTP Server
17/05/24 13:49:51 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/05/24 13:49:51 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46895
17/05/24 13:49:51 INFO util.Utils: Successfully started service 'HTTP class server' on port 46895.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.3.0
      /_/

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
17/05/24 13:49:59 INFO spark.SparkContext: Running Spark version 1.3.0
17/05/24 13:49:59 INFO spark.SecurityManager: Changing view acls to: root
17/05/24 13:49:59 INFO spark.SecurityManager: Changing modify acls to: root
17/05/24 13:49:59 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
17/05/24 13:50:00 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/05/24 13:50:00 INFO Remoting: Starting remoting
17/05/24 13:50:01 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@spark1:52597]
17/05/24 13:50:01 INFO util.Utils: Successfully started service 'sparkDriver' on port 52597.
17/05/24 13:50:01 INFO spark.SparkEnv: Registering MapOutputTracker
17/05/24 13:50:01 INFO spark.SparkEnv: Registering BlockManagerMaster
17/05/24 13:50:01 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-2d22f8c2-01d4-465a-80ea-8ca43ec5adac/blockmgr-30236a6d-6d00-4daf-aa1b-1e488a568f84
17/05/24 13:50:01 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
17/05/24 13:50:01 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-f78ab7dc-0bed-425e-bd6b-7410a68a394f/httpd-0ef36ee2-efb1-44df-a7a8-74e6a91d5847
17/05/24 13:50:01 INFO spark.HttpServer: Starting HTTP Server
17/05/24 13:50:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/05/24 13:50:01 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:44840
17/05/24 13:50:01 INFO util.Utils: Successfully started service 'HTTP file server' on port 44840.
17/05/24 13:50:01 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/05/24 13:50:01 INFO server.Server: jetty-8.y.z-SNAPSHOT
17/05/24 13:50:02 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
17/05/24 13:50:02 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
17/05/24 13:50:02 INFO ui.SparkUI: Started SparkUI at http://spark1:4040
17/05/24 13:50:02 INFO executor.Executor: Starting executor ID <driver> on host localhost
17/05/24 13:50:02 INFO executor.Executor: Using REPL class URI: http://172.18.0.30:46895
17/05/24 13:50:02 INFO util.AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@spark1:52597/user/HeartbeatReceiver
17/05/24 13:50:02 INFO netty.NettyBlockTransferService: Server created on 49952
17/05/24 13:50:02 INFO storage.BlockManagerMaster: Trying to register BlockManager
17/05/24 13:50:02 INFO storage.BlockManagerMasterActor: Registering block manager localhost:49952 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 49952)
17/05/24 13:50:02 INFO storage.BlockManagerMaster: Registered BlockManager
17/05/24 13:50:03 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
17/05/24 13:50:04 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> 
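As a quick smoke test, a short job can be piped into spark-shell; a sketch, assuming the standalone master listens at spark://spark1:7077 (the default master port for the SPARK_MASTER_IP configured above):

spark-shell --master spark://spark1:7077 <<'EOF'
// trivial sanity check: sum the integers 1..100 across the cluster
val nums = sc.parallelize(1 to 100)
println("sum = " + nums.sum())
EOF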

 
