Commit 87c6bee

#120 add test sample: scala, hive and spark-sql

1 parent d5ef98d commit 87c6bee

14 files changed: +275 −0 lines changed
# DSS User Test Sample 1: Scala

The DSS user test samples give new users of the platform a set of test cases for getting familiar with common DSS operations and for verifying that the DSS platform behaves correctly.

![image-20200408211243941](../../../images/zh_CN/chapter3/tests/home.png)

## 1.1 Spark Core (entry point: sc)

In a script, a SparkContext has already been registered for you by default, so you can use sc directly:

### 1.1.1 Single-value operators (mapValues as an example)

```scala
val rddMap = sc.makeRDD(Array((1,"a"),(1,"d"),(2,"b"),(3,"c")),4)
val res = rddMap.mapValues(data=>{data+"||||"})
res.collect().foreach(data=>println(data._1+","+data._2))
```

### 1.1.2 Two-RDD operators (union as an example)

```scala
val rdd1 = sc.makeRDD(1 to 5)
val rdd2 = sc.makeRDD(6 to 10)
val rddCustom = rdd1.union(rdd2)
rddCustom.collect().foreach(println)
```

### 1.1.3 Key-value operators (reduceByKey as an example)

```scala
val rdd1 = sc.makeRDD(List(("female",1),("male",2),("female",3),("male",4)))
val rdd2 = rdd1.reduceByKey((x,y)=>x+y)
rdd2.collect().foreach(println)
```

### 1.1.4 Action operators (the collect calls above are one example)
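As a short supplement, a minimal sketch of a few other common actions besides collect (the sample data here is illustrative):

```scala
val rdd = sc.makeRDD(1 to 10)
// Actions trigger job execution and return a result to the driver:
println(rdd.count())               // number of elements: 10
println(rdd.reduce(_ + _))         // sum of all elements: 55
println(rdd.take(3).mkString(",")) // first three elements: 1,2,3
```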
### 1.1.5 Reading a file from HDFS and doing simple processing

```scala
// toDF and the $"..." column syntax need the session implicits,
// unless your environment already imports them for you:
import spark.implicits._

case class Person(name:String,age:String)
val file = sc.textFile("/test.txt")
val person = file.map(line=>{
  val values = line.split(",")
  Person(values(0),values(1))
})
val df = person.toDF()
df.select($"name").show()
```

## 1.2 UDF test

### 1.2.1 Function definition

```scala
def ScalaUDF3(str: String): String = "hello, " + str + ", this is a third attempt"
```

### 1.2.2 Registering the function

Functions -> Personal functions -> right-click -> "Add Spark function". Registration then works the same way as in regular Spark development.
![img](../../../images/zh_CN/chapter3/tests/udf1.png)
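In code, that registration step would look like the following sketch (assuming a SparkSession named spark is available, as in regular Spark development; the sample input is illustrative):

```scala
// Register the function from 1.2.1 under a SQL-callable name...
spark.udf.register("ScalaUDF3", ScalaUDF3 _)
// ...and call it from SQL:
spark.sql("select ScalaUDF3('DSS') as greeting").show()
```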

## 1.3 UDAF test

### 1.3.1 Uploading the jar package

Develop a UDAF that computes an average in IDEA, package it into a jar (wordcount), and upload it to the DSS jar folder.
![img](../../../images/zh_CN/chapter3/tests/udf2.png)
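The uploaded jar is assumed to contain an averaging UDAF along the lines of this sketch, written against the Spark 2.x UserDefinedAggregateFunction API (class and field names are illustrative):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

class AverageUDAF extends UserDefinedAggregateFunction {
  // One double input column.
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  // Aggregation buffer holds a running sum and count.
  def bufferSchema: StructType =
    StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true

  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0L
  }

  // Fold one input row into the buffer, skipping nulls.
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }

  // Combine two partial buffers from different partitions.
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
  }

  def evaluate(buffer: Row): Double =
    if (buffer.getLong(1) == 0L) 0.0 else buffer.getDouble(0) / buffer.getLong(1)
}
```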

### 1.3.2 Registering the function

Functions -> Personal functions -> right-click -> "Add general function". Registration then works the same way as in regular Spark development.
![img](../../../images/zh_CN/chapter3/tests/udf-3.png)
# DSS User Test Sample 2: Hive

The DSS user test samples give new users of the platform a set of test cases for getting familiar with common DSS operations and for verifying that the DSS platform behaves correctly.

![image-20200408211243941](../../../images/zh_CN/chapter3/tests/home.png)

## 2.1 Creating data warehouse tables

Go to the "Databases" page, click "+", and enter the table information, table structure, and partition information in turn to create a table:

<img src="../../../images/zh_CN/chapter3/tests/hive1.png" alt="image-20200408212604929" style="zoom:50%;" />

![img](../../../images/zh_CN/chapter3/tests/hive2.png)

Following the steps above, create the department table dept, the employee table emp, and the partitioned employee table emp_partition. The DDL statements are:

```sql
create external table if not exists default.dept(
    deptno int,
    dname string,
    loc int
)
row format delimited fields terminated by '\t';

create external table if not exists default.emp(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
)
row format delimited fields terminated by '\t';

create table if not exists emp_partition(
    empno int,
    ename string,
    job string,
    mgr int,
    hiredate string,
    sal double,
    comm double,
    deptno int
)
partitioned by (month string)
row format delimited fields terminated by '\t';
```
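The queries in the next section assume these tables contain data. As a minimal sketch, tab-separated files already uploaded to HDFS (the paths here are illustrative) could be loaded like this:

```sql
load data inpath '/tmp/dss_test/dept.txt' into table dept;
load data inpath '/tmp/dss_test/emp.txt' into table emp;

-- Populate one of the partitions queried in 2.2.5 and 2.2.6:
load data inpath '/tmp/dss_test/emp_202001.txt'
into table emp_partition partition(month='202001');
```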

## 2.2 Basic SQL syntax tests

### 2.2.1 Simple query

```sql
select * from dept;
```

### 2.2.2 Join query

```sql
select * from emp
left join dept
on emp.deptno = dept.deptno;
```

### 2.2.3 Aggregate functions

```sql
select dept.dname, avg(sal) as avg_salary
from emp left join dept
on emp.deptno = dept.deptno
group by dept.dname;
```

### 2.2.4 Built-in functions

```sql
select ename, job, sal,
rank() over(partition by job order by sal desc) sal_rank
from emp;
```

### 2.2.5 Simple queries on the partitioned table

```sql
show partitions emp_partition;
select * from emp_partition where month='202001';
```

### 2.2.6 Union query across partitions

```sql
select * from emp_partition where month='202001'
union
select * from emp_partition where month='202002'
union
select * from emp_partition where month='202003';
```

## 2.3 UDF test

### 2.3.1 Uploading the jar package

After opening the Scripts page, right-click the directory path to upload the jar package:
![img](../../../images/zh_CN/chapter3/tests/hive3.png)
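For reference, the jar is assumed to contain the rename function used in 2.3.3. A minimal sketch against the classic org.apache.hadoop.hive.ql.exec.UDF API, written in Scala to match the rest of this guide (the class name and the prefixing behavior are illustrative):

```scala
import org.apache.hadoop.hive.ql.exec.UDF

// Hypothetical body for the "rename" function used in 2.3.3.
// Hive resolves the call through the evaluate method's signature.
class Rename extends UDF {
  def evaluate(name: String): String =
    if (name == null) null else "renamed_" + name
}
```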

### 2.3.2 Defining the function

Open the "UDF functions" panel (marked 1 in the screenshot), right-click the "Personal functions" directory, and choose "Add function":

<img src="../../../images/zh_CN/chapter3/tests/hive4.png" alt="image-20200408214033801" style="zoom: 50%;" />

Enter the function name, select the jar package, and fill in the registration format and the input/output formats to create the function:

![img](../../../images/zh_CN/chapter3/tests/hive5.png)

<img src="../../../images/zh_CN/chapter3/tests/hive-6.png" alt="image-20200409155418424" style="zoom: 67%;" />

The resulting function is shown below:

![img](../../../images/zh_CN/chapter3/tests/hive7.png)

### 2.3.3 Querying with the custom function

Once the function is registered, go to the workspace page and create a .hql file that uses it:

```sql
select deptno, ename, rename(ename) as new_name
from emp;
```
# DSS User Test Sample 3: Spark SQL

The DSS user test samples give new users of the platform a set of test cases for getting familiar with common DSS operations and for verifying that the DSS platform behaves correctly.

![image-20200408211243941](../../../images/zh_CN/chapter3/tests/home.png)

## 3.1 Converting between RDD and DataFrame

### 3.1.1 RDD to DataFrame

```scala
case class MyList(id:Int)

val lis = List(1,2,3,4)

val listRdd = sc.makeRDD(lis)
import spark.implicits._
val df = listRdd.map(value => MyList(value)).toDF()

df.show()
```

### 3.1.2 DataFrame to RDD

```scala
case class MyList(id:Int)

val lis = List(1,2,3,4)
val listRdd = sc.makeRDD(lis)
import spark.implicits._
val df = listRdd.map(value => MyList(value)).toDF()
println("------------------")

val dfToRdd = df.rdd

dfToRdd.collect().foreach(print(_))
```

## 3.2 DSL-style implementation
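The snippets in 3.2 and 3.3 refer to two DataFrames, df1 and df2, that the original sample does not define. A minimal hypothetical setup (the case class, names, and values are illustrative) would be:

```scala
import spark.implicits._

// Hypothetical inputs for the union examples below:
// two DataFrames that share a "department" column.
case class Employee(name: String, department: String)

val df1 = Seq(Employee("Alice", "sales"), Employee("Bob", "data")).toDF()
val df2 = Seq(Employee("Carol", "data")).toDF()
```

With those in place, the DSL-style version is: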
```scala
val df = df1.union(df2)
val dfSelect = df.select($"department")
dfSelect.show()
```

## 3.3 SQL-style implementation (entry point: sqlContext)

```scala
val df = df1.union(df2)

df.createOrReplaceTempView("dfTable")
val innerSql = """
SELECT department
FROM dfTable
"""
val sqlDF = sqlContext.sql(innerSql)
sqlDF.show()
```