Presto
Presto is a distributed SQL query engine for querying large data sets spread across one or more heterogeneous data sources. A complete installation consists of one coordinator and multiple workers. Queries are submitted by a client, such as the Presto command-line CLI, to the coordinator, which parses and analyzes the query, builds the execution plan, and then distributes the processing tasks to the workers.
Presto is a fully memory-based distributed big-data query engine: all queries and computations are executed in memory.
Presto's input is a SQL statement; its output is the result of executing that SQL.
Presto can connect to many different data sources, such as MySQL and Hive.
Presto optimizes the query process, both by optimizing the execution plan of the SQL itself and by raising concurrency through distributed execution.
Presto is not a database, and it cannot handle online transactions (OLTP).
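To illustrate the cross-source capability, a single Presto query can even join tables that live in different catalogs. A purely illustrative sketch, assuming the hive and mysql catalogs configured later in this article and the pokes / user test tables created in the test sections below:

-- Illustrative federated query: join a MySQL table with a Hive table
SELECT u.username, p.bar
FROM mysql.test.user AS u
JOIN hive.default.pokes AS p ON p.foo = u.id;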
Ceph
Ceph is a very popular open-source distributed storage system, with high scalability, high performance, and high reliability. It provides block storage (RBD), object storage (RGW), and file-system storage (CephFS). Ceph makes full use of the compute power of its storage nodes: the placement of every piece of data is computed rather than looked up, which keeps the data distribution as balanced as possible.
Rook
Rook is an open-source cloud-native storage orchestrator, providing the platform, framework, and support for a variety of storage solutions to integrate natively with cloud-native environments.
Rook turns storage software into self-managing, self-scaling, and self-healing storage services. It does this by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. Rook uses the facilities provided by the underlying cloud-native container management, scheduling, and orchestration platform to perform its duties.
Rook integrates deeply into cloud-native environments through extension points, delivering a seamless experience for scheduling, lifecycle management, resource management, security, monitoring, and user experience.
Deploying Presto on Kubernetes
Presto's data is persisted to the Ceph cluster so that it stays highly available. The environment used here:
- Kubernetes v1.15 (a single-node environment also works)
- Rook v1.0.2
- CentOS 7
- Presto 332 (the prestosql distribution)
- JDK 8
Rook can be deployed with Helm, the Kubernetes package manager; however, because there are many combinations of Rook and Kubernetes versions, plus issues with chart repositories, this article does not use Helm to deploy Rook.
$ git clone https://github.com/rook/rook.git
$ cd rook
$ git checkout v1.0.2
$ cd cluster/examples/kubernetes/ceph/
The cluster/examples/kubernetes/ceph/ directory contains the files related to deploying Rook.
Install the Rook common objects & operator:
kubectl create -f common.yaml
kubectl create -f operator.yaml
Check that rook-ceph-operator, rook-ceph-agent, and rook-discover started successfully:
$ kubectl get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
rook-ceph-agent-chj4l 1/1 Running 0 84s
rook-ceph-operator-548b56f995-v4mvp 1/1 Running 0 4m5s
rook-discover-vkkvl 1/1 Running 0
Create the Ceph cluster:
kubectl apply -f cluster.yaml
Check that the deployment succeeded:
kubectl get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
rook-ceph-agent-chl7b 1/1 Running 0 4h14m
rook-ceph-agent-tbx4l 1/1 Running 2 4h14m
rook-ceph-mgr-a-bf88cfdc4-wvhjb 1/1 Running 0 3h52m
rook-ceph-mon-a-df88f8cbf-dklvj 1/1 Running 0 3h53m
rook-ceph-operator-548b56f995-q47jb 1/1 Running 5 4h14m
rook-ceph-osd-0-655957b867-p8tf7 1/1 Running 0 3h51m
rook-ceph-osd-1-7578dc7874-f9q64 1/1 Running 0 3h50m
rook-ceph-osd-prepare-node-2-nvr7d 0/2 Completed 0 3h45m
rook-ceph-osd-prepare-node-3-2mm2p 0/2 Completed 0 3h45m
rook-discover-44789 1/1 Running 3 4h14m
rook-discover-c4r7d 1/1 Running 0 4h14m
Create a StorageClass so that the Ceph cluster can dynamically provision PVs:
kubectl create -f storageclass-test.yaml
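For reference, in Rook v1.0.2 storageclass-test.yaml defines roughly a single-replica test pool plus the StorageClass, along these lines (a sketch from memory; check the file in the repo for the authoritative contents):

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 1          # test setting: a single data replica
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: xfs        # file system the volume is formatted with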
# Find the Rook operator pod
$ kubectl get pod -n rook-ceph --selector=app=rook-ceph-operator
NAME READY STATUS RESTARTS AGE
rook-ceph-operator-548b56f995-q47jb 1/1 Running 5 4h23m
# Check the Ceph cluster's health
kubectl exec -n rook-ceph -it rook-ceph-operator-548b56f995-q47jb -- ceph status
cluster:
id: bed40efd-b3e8-4b18-b9a0-e8723f51e747
health: HEALTH_OK
services:
mon: 1 daemons, quorum a (age 4h)
mgr: a(active, since 3h)
osd: 2 osds: 2 up (since 3h), 2 in (since 3h)
data:
pools: 1 pools, 100 pgs
objects: 0 objects, 0 B
usage: 13 GiB used, 22 GiB / 35 GiB avail
pgs: 100 active+clean
# Check Ceph disk usage
kubectl exec -n rook-ceph -it rook-ceph-operator-548b56f995-q47jb -- ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 35 GiB 22 GiB 13 GiB 13 GiB 37.40
TOTAL 35 GiB 22 GiB 13 GiB 13 GiB 37.40
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
replicapool 1 0 B 0 0 B 0 19 GiB
The Ceph dashboard is already enabled by default in cluster.yaml. Check the dashboard service:
[root@node-1 presto-kubernetes]# kubectl get svc -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr ClusterIP 10.1.73.175 <none> 9283/TCP 6h29m
rook-ceph-mgr-dashboard ClusterIP 10.1.228.105 <none> 8443/TCP 6h29m
rook-ceph-mon-a ClusterIP 10.1.130.153 <none> 6789/TCP,3300/TCP 6h31m
rook-ceph-mgr-dashboard listens on port 8443. Create a NodePort service so that the dashboard can be accessed from outside the cluster:
kubectl apply -f rook/cluster/examples/kubernetes/ceph/dashboard-external-https.yaml
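That manifest is roughly the following (paraphrased from the Rook repo; consult the file itself for the authoritative version):

apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
  - name: dashboard
    port: 8443
    protocol: TCP
    targetPort: 8443
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort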
Check the port exposed by the NodePort service; here it is 30102:
[root@node-1 presto-kubernetes]# kubectl get svc -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr ClusterIP 10.1.73.175 <none> 9283/TCP 6h29m
rook-ceph-mgr-dashboard ClusterIP 10.1.228.105 <none> 8443/TCP 6h29m
rook-ceph-mgr-dashboard-external-https NodePort 10.1.38.28 <none> 8443:30102/TCP 21m
rook-ceph-mon-a ClusterIP 10.1.130.153 <none> 6789/TCP,3300/TCP 6h31m
Get the dashboard's login username and password:
[root@node-1 ~]# MGR_POD=`kubectl get pod -n rook-ceph | grep mgr | awk '{print $1}'`
[root@node-1 ~]# kubectl -n rook-ceph logs $MGR_POD | grep password
debug 2020-02-25 23:54:04.280 7ffb08d89700 0 log_channel(audit) log [DBG] : from='client.4198 -' entity='client.admin' cmd=[{"username": "admin", "prefix": "dashboard set-login-credentials", "password": "Wo9nf64nDs", "target": ["mgr", ""], "format": "json"}]: dispatch
debug 2020-02-25 23:58:11.177 7f8606e0e700 0 log_channel(audit) log [DBG] : from='client.4373 -' entity='client.admin' cmd=[{"username": "admin", "prefix": "dashboard set-login-credentials", "password": "Wo9nf64nDs", "target": ["mgr", ""], "format": "json"}]: dispatch
Find the username and password fields; here they are admin and Wo9nf64nDs. Open a browser at any node's IP plus the NodePort; here the master node's IP is used: https://localhost:30102/#/dashboard
- Build the images for the presto-server side and the presto-cli side. The hdfs-site.xml and core-site.xml configuration files can be baked into the server image, which makes configuring the Hive connector easier later. Both images are built by the script build_image.sh.
The presto-server Dockerfile:
FROM centos:centos7.5.1804
RUN mkdir -p /etc/hadoop/conf
ADD jdk-11.0.6_linux-x64_bin.tar.gz /opt
ADD presto-server-332.tar.gz /opt
ADD core-site.xml /etc/hadoop/conf
ADD hdfs-site.xml /etc/hadoop/conf
ADD hudi-presto-bundle-0.5.2-incubating-sources.jar /opt/presto-server-332/plugin/hive-hadoop2
ADD hudi-presto-bundle-0.5.2-incubating.jar /opt/presto-server-332/plugin/hive-hadoop2
ADD original-hudi-presto-bundle-0.5.2-incubating-sources.jar /opt/presto-server-332/plugin/hive-hadoop2
ADD original-hudi-presto-bundle-0.5.2-incubating.jar /opt/presto-server-332/plugin/hive-hadoop2
ENV PRESTO_HOME /opt/presto-server-332
ENV JAVA_HOME /opt/jdk-11.0.6
ENV PATH $JAVA_HOME/bin:$PATH
The presto-cli Dockerfile:
FROM openjdk:8-slim
ADD presto-cli-332-executable.jar /opt
RUN mv /opt/presto-cli-332-executable.jar /opt/presto-cli && chmod +x /opt/presto-cli
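build_image.sh itself is not shown in the article; a minimal sketch of what such a script might do, assuming the two Dockerfiles above are named Dockerfile.server and Dockerfile.cli (those file names are assumptions; the image tags come from the deployment manifests below):

#!/bin/bash
# Hypothetical build script: build and push the two Presto images.
set -e
REGISTRY=reg.chebai.org/presto   # private registry used in deployment.yaml

# Build the server and CLI images from their Dockerfiles
docker build -t $REGISTRY/presto-server:332 -f Dockerfile.server .
docker build -t $REGISTRY/presto-cli:332 -f Dockerfile.cli .

# Push both to the registry so the cluster can pull them
docker push $REGISTRY/presto-server:332
docker push $REGISTRY/presto-cli:332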
- Configure the server side: presto-config-cm.yaml.
Two things to note in the server configuration:
- node.id must be different for each Presto node (the bootstrap script below sets it to the pod's hostname)
- the JVM options must include -DHADOOP_USER_NAME=hdfs and -Dpresto-temporarily-allow-java8=true, so that Presto accesses HDFS files as the hdfs user and so that Presto can start on JDK 8 (otherwise prestosql 332 reports an error when run on JDK 8)
apiVersion: v1
kind: ConfigMap
metadata:
  name: presto-config-cm
  namespace: presto
  labels:
    app: presto-coordinator
data:
  bootstrap.sh: |-
    #!/bin/bash
    cd /root/bootstrap
    mkdir -p $PRESTO_HOME/etc/catalog
    cat ./node.properties > $PRESTO_HOME/etc/node.properties
    cat ./jvm.config > $PRESTO_HOME/etc/jvm.config
    cat ./config.properties > $PRESTO_HOME/etc/config.properties
    cat ./log.properties > $PRESTO_HOME/etc/log.properties
    echo "" >> $PRESTO_HOME/etc/node.properties
    echo "node.id=$HOSTNAME" >> $PRESTO_HOME/etc/node.properties
    sed -i 's/${COORDINATOR_NODE}/'$COORDINATOR_NODE'/g' $PRESTO_HOME/etc/config.properties
    if ${COORDINATOR_NODE}; then
      echo coordinator
    else
      sed -i '7d' $PRESTO_HOME/etc/config.properties
      echo worker
    fi
    for cfg in ../catalog/*; do
      cat $cfg > $PRESTO_HOME/etc/catalog/${cfg##*/}
    done
    $PRESTO_HOME/bin/launcher run --verbose
  node.properties: |-
    node.environment=production
    node.data-dir=/var/presto/data
  jvm.config: |-
    -server
    -Xmx16G
    -XX:-UseBiasedLocking
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+ExitOnOutOfMemoryError
    -XX:+UseGCOverheadLimit
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:ReservedCodeCacheSize=512M
    -Djdk.attach.allowAttachSelf=true
    -Djdk.nio.maxCachedBufferSize=2000000
    -Dpresto-temporarily-allow-java8=true
    -DHADOOP_USER_NAME=hdfs
  config.properties: |-
    coordinator=${COORDINATOR_NODE}
    node-scheduler.include-coordinator=true
    http-server.http.port=8080
    query.max-memory=10GB
    query.max-memory-per-node=1GB
    query.max-total-memory-per-node=2GB
    discovery-server.enabled=true
    discovery.uri=http://presto:8080
  log.properties: |-
    io.prestosql=DEBUG
- Catalog configuration: presto-catalog-config-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: presto-catalog-config-cm
  namespace: presto
  labels:
    app: presto-coordinator
data:
  hive.properties: |-
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://10.10.1.128:9083
    hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
  mysql.properties: |-
    connector.name=mysql
    connection-url=jdbc:mysql://10.10.1.203:30306
    connection-user=root
    connection-password=Qloud@dev?123
- Deploy presto-svc, the workers, the coordinator, and presto-cli: deployment.yaml
apiVersion: v1
kind: Service
metadata:
  name: presto
  namespace: presto
spec:
  ports:
  - port: 8080
  selector:
    app: presto-coordinator
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: presto-coordinator
  namespace: presto
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: presto-coordinator
  template:
    metadata:
      labels:
        app: presto-coordinator
    spec:
      containers:
      - name: presto-coordinator
        image: reg.chebai.org/presto/presto-server:332
        command: ["bash", "-c", "sh /root/bootstrap/bootstrap.sh"]
        ports:
        - containerPort: 8080
        env:
        - name: COORDINATOR_NODE
          value: "true"
        volumeMounts:
        - name: presto-config-volume
          mountPath: /root/bootstrap
        - name: presto-catalog-config-volume
          mountPath: /root/catalog
        - name: presto-data-volume
          mountPath: /var/presto/data
      volumes:
      - name: presto-config-volume
        configMap:
          name: presto-config-cm
      - name: presto-catalog-config-volume
        configMap:
          name: presto-catalog-config-cm
      - name: presto-data-volume
        emptyDir: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: presto-worker
  namespace: presto
spec:
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: presto-worker
  template:
    metadata:
      labels:
        app: presto-worker
    spec:
      containers:
      - name: presto-worker
        image: reg.chebai.org/presto/presto-server:332
        command: ["bash", "-c", "sh /root/bootstrap/bootstrap.sh"]
        ports:
        - containerPort: 8080
        env:
        - name: COORDINATOR_NODE
          value: "false"
        volumeMounts:
        - name: presto-config-volume
          mountPath: /root/bootstrap
        - name: presto-catalog-config-volume
          mountPath: /root/catalog
        - name: presto-data-volume
          mountPath: /var/presto/data
      volumes:
      - name: presto-config-volume
        configMap:
          name: presto-config-cm
      - name: presto-catalog-config-volume
        configMap:
          name: presto-catalog-config-cm
      - name: presto-data-volume
        emptyDir: {}
---
apiVersion: v1
kind: Pod
metadata:
  name: presto-cli
  namespace: presto
spec:
  containers:
  - name: presto-cli
    image: reg.chebai.org/presto/presto-cli:332
    command: ["tail", "-f", "/dev/null"]
    imagePullPolicy: Always
  restartPolicy: Always
- Start Presto:
kubectl create -f presto-config-cm.yaml
kubectl create -f presto-catalog-config-cm.yaml
kubectl create -f deployment.yaml
- Use Presto. Find the externally exposed address:
[root@node-1 ~]# kubectl get svc -n presto
NAME     TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)          AGE
presto   NodePort   10.1.27.143   <none>        8080:32151/TCP   27h
- Use the presto-cli client to connect to the different connectors:
kubectl exec -it presto-cli -n presto -- /opt/presto-cli --server presto:8080 --catalog hive --schema default
Since the open-source manifests above do not mount any data volumes, the relevant YAML files must be modified to use the RBD service provided by Rook.
Rook can provide three types of storage:
- Block: create block storage to be consumed by a pod
- Object: create an object store that is accessible inside or outside the Kubernetes cluster
- Shared File System: create a file system to be shared across multiple pods
Before provisioning block storage, a StorageClass and a storage pool must be created. Kubernetes needs these two resources to interact with Rook and allocate persistent volumes (PVs). To provide RBD block devices in a Kubernetes cluster, the steps are:
- create the rbd-provisioner pod
- create the StorageClass for RBD
- create a PVC that uses that StorageClass (a sketch follows the StorageClass check below)
- create pods that use the RBD PVC
After the Ceph cluster is created through Rook, Rook itself provides the rbd-provisioner service, so there is no need to deploy a separate provisioner.
# Configure the StorageClass
kubectl apply -f rook/cluster/examples/kubernetes/ceph/storageclass.yaml
# Check the StorageClass
kubectl get storageclass
NAME PROVISIONER AGE
rook-ceph-block ceph.rook.io/block 171m
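The worker deployment below references a PVC named presto-data-claim, whose manifest the article does not show; here is a minimal sketch matching the name, size, and access mode visible in the kubectl output further down (the file name is an assumption):

# presto-data-claim.yaml (hypothetical file name)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: presto-data-claim        # referenced by the worker deployment below
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce                # RWO, as shown by kubectl get pvc
  resources:
    requests:
      storage: 10Gi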
Modify the worker configuration in the presto-kubernetes folder:
vim deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: presto-worker
spec:
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: presto-worker
  template:
    metadata:
      labels:
        app: presto-worker
    spec:
      initContainers:
      - name: wait-coordinator
        image: chenlianguu/presto-server:dm-0.208
        command: ["bash", "-c", "until curl -sf http://presto-coordinator-service:8080/ui/; do echo 'waiting for coordinator started...'; sleep 2; done;"]
      containers:
      - name: presto-worker
        image: chenlianguu/presto-server:dm-0.208
        command: ["bash", "-c", "sh /root/bootstrap/bootstrap.sh"]
        ports:
        - name: http-coord
          containerPort: 8080
          protocol: TCP
        env:
        - name: COORDINATOR_NODE
          value: "false"
        volumeMounts:
        - name: presto-config-volume
          mountPath: /root/bootstrap
        - name: presto-catalog-config-volume
          mountPath: /root/catalog
        - name: presto-data-volume
          mountPath: /var/presto/data
        readinessProbe:
          initialDelaySeconds: 10
          periodSeconds: 5
          exec:
            command: ["bash", "-c", "curl -s http://presto-coordinator-service:8080/v1/node | tr ',' '\n' | grep -s $(hostname -i)"]
      volumes:
      - name: presto-config-volume
        configMap:
          name: presto-config-cm
      - name: presto-catalog-config-volume
        configMap:
          name: presto-catalog-config-cm
      - name: presto-data-volume
        persistentVolumeClaim:
          claimName: presto-data-claim
Re-apply the YAML file:
kubectl apply -f worker-deployment.yaml
Check that the configuration succeeded:
[root@node-1 ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-1ff7bb97-238e-4ebd-bfc3-fb9db1ae5656 10Gi RWO Delete Bound default/presto-data-claim rook-ceph-block 100m
[root@node-1 ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
presto-data-claim Bound pvc-1ff7bb97-238e-4ebd-bfc3-fb9db1ae5656 10Gi RWO rook-ceph-block 100m
Note: the PV here is created automatically. Once a PVC containing a StorageClass field is submitted, Kubernetes creates a matching PV for it according to that StorageClass; this uses the Dynamic Provisioning mechanism. PVs support both this dynamic creation and static, pre-created requests.
# Check on the Ceph cluster side
[root@node-1 ~]# kubectl exec -n rook-ceph -it rook-ceph-operator-548b56f995-q47jb -- rbd info -p replicapool pvc-1ff7bb97-238e-4ebd-bfc3-fb9db1ae5656
rbd image 'pvc-1ff7bb97-238e-4ebd-bfc3-fb9db1ae5656':
size 10 GiB in 2560 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 1dd4352e100d
block_name_prefix: rbd_data.1dd4352e100d
format: 2
features: layering
op_features:
flags:
create_timestamp: Wed Feb 26 05:23:09 2020
access_timestamp: Wed Feb 26 05:23:09 2020
modify_timestamp: Wed Feb 26 05:23:09 2020
Log into a pod and check the RBD device:
[root@node-1 presto-kubernetes]# kubectl exec -it worker-7b66b5cb5-d2vzt bash
root@worker-7b66b5cb5-d2vzt:/usr/lib/presto-server-0.167-t.0.3/etc# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-253:0-203738-8a1bc94ee31c6ca3e406a6c740b84cd8bd2c681c568dc8a4007f273208bbd9fd 10G 1.4G 8.7G 14% /
tmpfs 64M 0 64M 0% /dev
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/mapper/centos-root 18G 6.3G 12G 36% /etc/hosts
shm 64M 0 64M 0% /dev/shm
/dev/rbd0 10G 33M 10G 1% /var/presto/data
tmpfs 1.9G 12K 1.9G 1% /run/secrets/kubernetes.io/serviceaccount
tmpfs 1.9G 0 1.9G 0% /proc/acpi
tmpfs 1.9G 0 1.9G 0% /proc/scsi
tmpfs 1.9G 0 1.9G 0% /sys/firmware
The connectors are configured flexibly via a ConfigMap; refer to the official documentation for each connector's settings.
# Create presto-catalog-config-cm.yaml
cat << EOF > ~/presto-kubernetes/presto-catalog-config-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: presto-catalog-config-cm
  labels:
    app: presto-coordinator
data:
  # Hive settings
  hive.properties: |-
    connector.name=hive-hadoop2
    hive.metastore.uri=thrift://hive-metastore-ip:9083
  # MySQL settings
  mysql.properties: |-
    connector.name=mysql
    connection-url=jdbc:mysql://example.net:3306
    connection-user=root
    connection-password=secret
  cassandra.properties: |-
    connector.name=cassandra
    cassandra.contact-points=host1,host2
EOF
After changing the configuration it must be made effective: re-apply presto-catalog-config-cm.yaml and delete the coordinator and worker pods. New pods will be created to replace them, and they will load the new connector configuration.
kubectl apply -f presto-catalog-config-cm.yaml
kubectl apply -f coordinator-deployment.yaml
kubectl apply -f worker-deployment.yaml
kubectl get pods
NAME READY STATUS RESTARTS AGE
coordinator-8549f46c58-q8ngm 1/1 Running 0 9m29s
worker-d64744f4d-8j2vg 1/1 Running 0 9m10s
kubectl delete pod coordinator-8549f46c58-q8ngm worker-d64744f4d-8j2vg
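Deleting the pods by hand works; assuming kubectl v1.15 or newer (as in the environment above), kubectl rollout restart achieves the same in one step (the deployment names here are inferred from the pod names above):

kubectl rollout restart deployment coordinator worker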
First prepare a Hive environment. Here Hive is started directly with Docker, which is simple, convenient, and fast; see the GitHub project docker-hive.
git clone https://github.com/big-data-europe/docker-hive.git
cd docker-hive
# Start docker-hive
docker-compose up -d
# Check the startup status
docker-compose ps
Name Command State Ports
--------------------------------------------------------------------------------------------------------------------------------------
docker-hive_datanode_1 /entrypoint.sh /run.sh Up (healthy) 0.0.0.0:50075->50075/tcp
docker-hive_hive-metastore-postgresql_1 /docker-entrypoint.sh postgres Up 5432/tcp
docker-hive_hive-metastore_1 entrypoint.sh /opt/hive/bi ... Up 10000/tcp, 10002/tcp, 0.0.0.0:9083->9083/tcp
docker-hive_hive-server_1 entrypoint.sh /bin/sh -c s ... Up 0.0.0.0:10000->10000/tcp, 10002/tcp
docker-hive_namenode_1 /entrypoint.sh /run.sh Up (healthy) 0.0.0.0:50070->50070/tcp
docker-hive_presto-coordinator_1 ./bin/launcher run Up 0.0.0.0:8080->8080/tcp
# Load some data into Hive
docker-compose exec hive-server bash
# /opt/hive/bin/beeline -u jdbc:hive2://localhost:10000
> CREATE TABLE pokes (foo INT, bar STRING);
> LOAD DATA LOCAL INPATH '/opt/hive/examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
The hive-metastore connection port is 9083; make sure that port is listening. Connect Presto to Hive for a test:
kubectl exec -it presto-cli -n presto -- /opt/presto-cli --server presto:8080 --catalog hive --schema default
presto:default> show schemas;
Schema
--------------------
default
information_schema
(2 rows)
Query 20200409_121447_00002_3jete, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [2 rows, 35B] [16 rows/s, 290B/s]
presto:default> show tables;
Table
-------
pokes
(1 row)
Query 20200409_121459_00003_3jete, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 22B] [9 rows/s, 201B/s]
presto:default> select * from pokes;
Query 20200409_121548_00004_3jete, FAILED, 1 node
Splits: 16 total, 0 done (0.00%)
0:01 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20200409_121548_00004_3jete failed: java.net.UnknownHostException: namenode
The tables in Hive can be listed, but the query itself fails. It fails because the Hive environment was built with docker-compose: the namenode container and the Presto pods in Kubernetes are not on the same network, so the namenode host name cannot be resolved. Test against a real Hive cluster if one is available; all tests here run on a single machine.
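If the single-machine demo has to work anyway, one possible workaround (untested here) is to let the Presto pods resolve the namenode name via hostAliases in the pod template, pointing at the host that runs docker-compose, and to make sure the namenode/datanode ports are published on that host:

# Snippet to merge into the coordinator/worker pod template (illustrative)
spec:
  template:
    spec:
      hostAliases:
      - ip: "10.10.1.128"    # placeholder: IP of the host running docker-hive
        hostnames:
        - "namenode"
        - "datanode"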
Next prepare a MySQL environment, again started directly with Docker:
docker run --name mysql -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 mysql:5.6.36
docker exec -it mysql /bin/bash
# Create some data
mysql -uroot -p123456
mysql> create database test;
mysql> use test;
Database changed
mysql> create table user(id int not null, username varchar(32) not null, password varchar(32) not null);
mysql> insert into user values(1,'user1','password1');
mysql> insert into user values(2,'user2','password2');
mysql> insert into user values(3,'user3','password3');
mysql> select * from user;
+----+----------+-----------+
| id | username | password |
+----+----------+-----------+
| 1 | user1 | password1 |
| 2 | user2 | password2 |
| 3 | user3 | password3 |
+----+----------+-----------+
3 rows in set (0.00 sec)
MySQL's connection port is 3306; make sure that port is listening. Connect Presto to MySQL for a test:
kubectl exec -it presto-cli -n presto -- /opt/presto-cli --server presto:8080 --catalog mysql --schema test
presto:test> show tables;
Table
-------
user
(1 row)
Query 20200409_122956_00004_6mm78, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 18B] [10 rows/s, 187B/s]
presto:test> select * from user;
id | username | password
----+----------+-----------
1 | user1 | password1
2 | user2 | password2
3 | user3 | password3
(3 rows)
Query 20200409_123010_00005_6mm78, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:00 [3 rows, 0B] [10 rows/s, 0B/s]
presto:test>
The tables in MySQL and their data can be queried.
Next prepare a Cassandra environment, again started directly with Docker:
docker run --name cassandra -p 9042:9042 -d cassandra:3.0
Connect to the database (for example with docker exec -it cassandra cqlsh) and prepare some data:
CREATE KEYSPACE IF NOT EXISTS pimin_net
WITH REPLICATION = {'class': 'SimpleStrategy','replication_factor':1};
USE pimin_net;
CREATE TABLE users (
id int,
user_name varchar,
PRIMARY KEY (id) );
INSERT INTO users (id,user_name) VALUES (1,'china');
INSERT INTO users (id,user_name) VALUES (2,'taiwan');
select * from users;
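Note that for this test to succeed, cassandra.properties in the catalog ConfigMap must point at the real node instead of the host1,host2 placeholders used earlier, for example:

# placeholder IP: the host running the cassandra container
connector.name=cassandra
cassandra.contact-points=10.10.1.203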
Query the data:
kubectl exec -it presto-cli -n presto -- /opt/presto-cli --server presto:8080 --catalog cassandra --schema pimin_net
presto:pimin_net> show tables;
Table
-------
users
(1 row)
Query 20200410_024828_00002_4ekb9, FINISHED, 2 nodes
Splits: 18 total, 18 done (100.00%)
0:00 [1 rows, 24B] [5 rows/s, 133B/s]
presto:pimin_net> select * from users;
id | user_name
----+-----------
1 | china
2 | taiwan
(2 rows)
Query 20200410_024834_00003_4ekb9, FINISHED, 1 node
Splits: 273 total, 273 done (100.00%)
0:02 [2 rows, 2B] [1 rows/s, 1B/s]
presto:pimin_net>
The data in Cassandra can be queried.
The number of Presto workers can be scaled horizontally with kubectl:
kubectl get deployment -n presto
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
presto-coordinator 1 1 1 1 21m
presto-worker 2 2 2 2 21m
kubectl scale deployment presto-worker --replicas=3 -n presto
deployment "presto-worker" scaled
kubectl get deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
presto-coordinator 1 1 1 1 23m
presto-worker 3 3 3 3 23m
- Docker image pulls time out or fail: add Docker registry mirrors; ideally add the 163, Aliyun, and Azure accelerators all at once:
vim /etc/docker/daemon.json
{
  "registry-mirrors": [
    "https://dockerhub.azk8s.cn",
    "https://b3sst9pc.mirror.aliyuncs.com",
    "https://hub-mirror.c.163.com"
  ]
}
systemctl daemon-reload
systemctl restart docker
- Presto queries against Hive data fail with io.prestosql.spi.PrestoException: Could not obtain block: BP-1548201263.
show tables and desc table display correctly, and inside the Presto pod the hdfs command can ls the directory, but cat-ing files on HDFS reports the same error. This means the DataNode's default port 50010 cannot be reached. Use telnet to check whether the remote server has that port open, then fix it by opening the port or adjusting the firewall; once the port can be reached via telnet, the problem is solved. See also: presto hive connector error reading from HDFS.
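A quick way to run that connectivity check from inside a Presto pod (the datanode address is a placeholder):

# replace 'datanode' with your DataNode's host name or IP
telnet datanode 50010
# or, if telnet is not installed in the image:
curl -v telnet://datanode:50010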
- ConfigMap updates are not hot-reloaded, so the Presto worker and coordinator pods must be destroyed after an update; the newly started pods will load the latest configuration.
- prestosql 332 requires JDK 11, but Hadoop is not compatible with JDK 11, so JDK 8 is used instead, with -Dpresto-temporarily-allow-java8=true added to Presto's JVM options.