Spark SQL only needs the metadata of the Hive tables. If a metastore service is configured, Spark SQL fetches the metadata directly from that service; if not, it has to read the metadata from the MySQL database backing the metastore.
1. Copy or symlink Hive's hive-site.xml into Spark's conf directory
cd /opt/modules/spark-2.1.0-bin-2.7.3/conf/
ln -s /opt/modules/hive-1.2.1/conf/hive-site.xml
2. Hive's configuration file hive-site.xml
<configuration>
  <property>
    <name>hive.exec.mode.local.auto</name>
    <value>true</value>
    <description>Let Hive determine whether to run in local mode automatically</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://bigdata.ibeifeng.com:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://bigdata.ibeifeng.com:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  <property>
    <name>hive.exec.max.dynamic.partitions.pernode</name>
    <value>100</value>
    <description>Maximum number of dynamic partitions allowed to be created in each mapper/reducer node.</description>
  </property>
  <property>
    <name>hive.exec.max.dynamic.partitions</name>
    <value>1000</value>
    <description>Maximum number of dynamic partitions allowed to be created in total.</description>
  </property>
  <property>
    <name>hive.exec.dynamic.partition</name>
    <value>true</value>
    <description>Whether or not to allow dynamic partitions in DML/DDL.</description>
  </property>
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
    <description>In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions.</description>
  </property>
  <property>
    <name>hive.support.sql11.reserved.keywords</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>bigdata.ibeifeng.com</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>bigdata.ibeifeng.com</value>
  </property>
</configuration>
3. Choose the integration mode according to the value of the hive.metastore.uris parameter in hive-site.xml
(1) If the parameter is not set (the default)
Add the JDBC driver jar of the database that stores the Hive metadata (i.e. MySQL) to Spark's classpath; that alone completes the Spark and Hive integration.
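One way to put the driver on the classpath, sketched below, is to reference the connector jar from conf/spark-defaults.conf; the jar path and version here are assumptions for illustration, not taken from the original setup:

```properties
# conf/spark-defaults.conf -- jar location/version are assumed for illustration
spark.driver.extraClassPath    /opt/modules/hive-1.2.1/lib/mysql-connector-java-5.1.27-bin.jar
spark.executor.extraClassPath  /opt/modules/hive-1.2.1/lib/mysql-connector-java-5.1.27-bin.jar
```

Alternatively, the same jar can be supplied at launch time via the --jars or --driver-class-path option of spark-shell / spark-sql.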
(2) If the parameter is set to the node running the metastore service (a non-empty value; this is the more common setup)
(a) Start the Hive metastore service (from the Hive home directory /opt/modules/hive-1.2.1):
bin/hive --service metastore &
(b) Complete the Spark and Hive integration
1. Create the tables in Hive and insert test data
[Employee table]
create table emp(
  empno int, ename string, job string, mgr int,
  hiredate string, sal double, comm double, deptno int
) row format delimited fields terminated by '\t';
load data local inpath '/opt/datas/emp.txt' into table emp;
To reload and replace the existing data, use overwrite:
load data local inpath '/opt/datas/emp.txt' overwrite into table emp;
[Department table]
create table dept(deptno int, dname string, loc string)
row format delimited fields terminated by '\t';
load data local inpath '/opt/datas/dept.txt' into table dept;
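The load statements above expect /opt/datas/emp.txt to be tab-separated, with columns in the order declared in the DDL. A sketch of one line (values taken from the sample data; a null field is written with Hive's default \N null marker):

```
7369	SMITH	CLERK	7902	1980-12-17	800.0	\N	20
```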
2. Test the two approaches
(1) spark-sql test
-》Start
./bin/spark-sql
-》Run statements
spark-sql (default)> select * from default.emp;
spark-sql (default)> select * from default.emp a join default.dept b on a.deptno = b.deptno;
spark-sql (default)> explain select * from default.emp a join default.dept b on a.deptno = b.deptno;
(2) spark-shell test
-》Start
./bin/spark-shell
-》Run statements
scala> spark.sqlContext
scala> spark.sqlContext.sql("select * from default.emp a join default.dept b on a.deptno = b.deptno").show()
Result:
+-----+------+---------+----+----------+------+------+------+------+----------+--------+
|empno| ename|      job| mgr|  hiredate|   sal|  comm|deptno|deptno|     dname|     loc|
+-----+------+---------+----+----------+------+------+------+------+----------+--------+
| 7369| SMITH|    CLERK|7902|1980-12-17| 800.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7499| ALLEN| SALESMAN|7698| 1981-2-20|1600.0| 300.0|    30|    30|     SALES| CHICAGO|
| 7521|  WARD| SALESMAN|7698| 1981-2-22|1250.0| 500.0|    30|    30|     SALES| CHICAGO|
| 7566| JONES|  MANAGER|7839|  1981-4-2|2975.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7654|MARTIN| SALESMAN|7698| 1981-9-28|1250.0|1400.0|    30|    30|     SALES| CHICAGO|
| 7698| BLAKE|  MANAGER|7839|  1981-5-1|2850.0|  null|    30|    30|     SALES| CHICAGO|
| 7782| CLARK|  MANAGER|7839|  1981-6-9|2450.0|  null|    10|    10|ACCOUNTING|NEW YORK|
| 7788| SCOTT|  ANALYST|7566| 1987-4-19|3000.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7839|  KING|PRESIDENT|null|1981-11-17|5000.0|  null|    10|    10|ACCOUNTING|NEW YORK|
| 7844|TURNER| SALESMAN|7698|  1981-9-8|1500.0|   0.0|    30|    30|     SALES| CHICAGO|
| 7876| ADAMS|    CLERK|7788| 1987-5-23|1100.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7900| JAMES|    CLERK|7698| 1981-12-3| 950.0|  null|    30|    30|     SALES| CHICAGO|
| 7902|  FORD|  ANALYST|7566| 1981-12-3|3000.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7934|MILLER|    CLERK|7782| 1982-1-23|1300.0|  null|    10|    10|ACCOUNTING|NEW YORK|
+-----+------+---------+----+----------+------+------+------+------+----------+--------+
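The inner-join semantics of the query above can be sketched in plain Scala, with no Spark required: each emp row is paired with the dept row whose deptno matches. The sample rows below are a small subset taken from the output shown; Emp is trimmed to the columns the join needs.

```scala
// Plain-Scala sketch of the SQL inner join: pair each emp row with the
// dept row that has the same deptno.
case class Emp(empno: Int, ename: String, deptno: Int)
case class Dept(deptno: Int, dname: String, loc: String)

val emp  = Seq(Emp(7369, "SMITH", 20), Emp(7839, "KING", 10), Emp(7844, "TURNER", 30))
val dept = Seq(Dept(10, "ACCOUNTING", "NEW YORK"),
               Dept(20, "RESEARCH",   "DALLAS"),
               Dept(30, "SALES",      "CHICAGO"))

// Equivalent of: select a.ename, b.dname from emp a join dept b on a.deptno = b.deptno
val joined: Seq[(String, String)] =
  for (e <- emp; d <- dept; if e.deptno == d.deptno) yield (e.ename, d.dname)

println(joined)
```

This is only an illustration of what the join computes; in the real run Spark performs the same matching distributively (the explain statement above shows the physical plan it picks).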
1. Reference blog: (link missing in the original). Note that when accessing the cluster from an external network, the DataNodes must be reachable by hostname (the client-side dfs.client.use.datanode.hostname setting), otherwise external access fails.
Reprinted from: http://gxygi.baihongyu.com/