Commit b818464

Merge pull request #105216 from msft-tacox/patch-5

Update apache-hive-migrate-workloads.md

2 parents 4f2d1c0 + 05918d1 · commit b818464

File tree

1 file changed: +55 -7 lines changed

articles/hdinsight/interactive-query/apache-hive-migrate-workloads.md

Lines changed: 55 additions & 7 deletions
@@ -34,34 +34,82 @@ Create a new copy of your external metastore. If you're using an external metast
If you're using the internal metastore, you can use queries to export object definitions from the Hive metastore and import them into a new database.

After this script completes, it's assumed that the old cluster will no longer be used to access any of the tables or databases the script refers to.

> [!NOTE]
> For ACID tables, a new copy of the data underneath the table is created.

1. Connect to the HDInsight cluster by using a [Secure Shell (SSH) client](../hdinsight-hadoop-linux-use-ssh-unix.md).
1. Connect to HiveServer2 with your [Beeline client](../hadoop/apache-hadoop-use-hive-beeline.md) from your open SSH session by entering the following command:
```bash
for d in `beeline -u "jdbc:hive2://localhost:10001/;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show databases;"`;
do
    echo "Scanning Database: $d"
    echo "create database if not exists $d; use $d;" >> alltables.hql;
    for t in `beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show tables;"`;
    do
        echo "Copying Table: $t"
        ddl=`beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show create table $t;"`;

        echo "$ddl;" >> alltables.hql;
        lowerddl=$(echo $ddl | awk '{print tolower($0)}')
        if [[ $lowerddl == *"'transactional'='true'"* ]]; then
            if [[ $lowerddl == *"partitioned by"* ]]; then
                # Partitioned ACID table: stage the data in a non-transactional
                # external table, then reload it with its partition columns.
                raw_cols=$(beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show create table $t;" | tr '\n' ' ' | grep -io "CREATE TABLE .*" | cut -d"(" -f2- | cut -f1 -d")" | sed 's/`//g');
                ptn_cols=$(beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show create table $t;" | tr '\n' ' ' | grep -io "PARTITIONED BY .*" | cut -f1 -d")" | cut -d"(" -f2- | sed 's/`//g');
                final_cols=$(echo "(" $raw_cols "," $ptn_cols ")")

                beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "create external table ext_$t $final_cols TBLPROPERTIES ('transactional'='false');";
                beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "insert into ext_$t select * from $t;";
                staging_ddl=`beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show create table ext_$t;"`;
                dir=$(echo $staging_ddl | grep -io " LOCATION .*" | grep -m1 -o "'.*" | sed "s/'[^-]*//2g" | cut -c2-);

                parsed_ptn_cols=$(echo $ptn_cols | sed 's/ [a-z]*,/,/g' | sed '$s/\w*$//g');
                echo "create table flattened_$t $final_cols;" >> alltables.hql;
                echo "load data inpath '$dir' into table flattened_$t;" >> alltables.hql;
                echo "insert into $t partition($parsed_ptn_cols) select * from flattened_$t;" >> alltables.hql;
                echo "drop table flattened_$t;" >> alltables.hql;
                beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "drop table ext_$t";
            else
                # Non-partitioned ACID table: stage the data in a non-transactional
                # external table and load it straight back into the table.
                beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "create external table ext_$t like $t TBLPROPERTIES ('transactional'='false');";
                staging_ddl=`beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "show create table ext_$t;"`;
                dir=$(echo $staging_ddl | grep -io " LOCATION .*" | grep -m1 -o "'.*" | sed "s/'[^-]*//2g" | cut -c2-);

                beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "insert into ext_$t select * from $t;";
                echo "load data inpath '$dir' into table $t;" >> alltables.hql;
                beeline -u "jdbc:hive2://localhost:10001/$d;transportMode=http" --showHeader=false --silent=true --outputformat=tsv2 -e "drop table ext_$t";
            fi
        fi
        echo "$ddl" | grep -q "PARTITIONED\s*BY" && echo "MSCK REPAIR TABLE $t;" >> alltables.hql;
    done;
done
```
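Two of the denser text-processing steps in the script above are the `$dir` extraction (pulling the staging table's storage path out of its DDL) and the `$parsed_ptn_cols` rewrite (stripping column types from the `PARTITIONED BY` list). A minimal sketch of both, run against made-up sample strings (the table, account, and container names are invented for illustration):

```shell
# Hypothetical one-line DDL, similar to what `show create table ext_<t>`
# returns once newlines are collapsed; all names here are invented.
staging_ddl="CREATE EXTERNAL TABLE ext_sales (id int, amount double) LOCATION 'abfs://my-container@myaccount.dfs.core.windows.net/hive/warehouse/ext_sales' TBLPROPERTIES ('transactional'='false')"

# Same pipeline the script uses to derive $dir: isolate everything from
# LOCATION's opening quote, delete the second and later quoted runs
# (the closing quote plus the trailing TBLPROPERTIES), then drop the quote.
dir=$(echo $staging_ddl | grep -io " LOCATION .*" | grep -m1 -o "'.*" | sed "s/'[^-]*//2g" | cut -c2-)
echo "$dir"

# Same pipeline the script uses for $parsed_ptn_cols: drop the type after
# each partition column so the list fits "insert into ... partition(...)".
ptn_cols="year int, month int, day string"
parsed_ptn_cols=$(echo $ptn_cols | sed 's/ [a-z]*,/,/g' | sed '$s/\w*$//g')
echo "$parsed_ptn_cols"
```

Note that the `sed "s/'[^-]*//2g"` step relies on a hyphen appearing in the storage path (common for HDInsight container names) to end the first match; a path without one can leave trailing DDL in `$dir`.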

This command generates a file named **alltables.hql**.

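For reference, a hypothetical excerpt of what the generated **alltables.hql** might contain for a non-ACID partitioned external table (the database, table, and storage names are invented for illustration; the DDL block is whatever `show create table` emitted):

```hiveql
create database if not exists salesdb; use salesdb;
CREATE EXTERNAL TABLE `weblogs`(
  `msg` string)
PARTITIONED BY (
  `dt` string)
LOCATION
  'wasbs://container@account.blob.core.windows.net/hive/warehouse/salesdb.db/weblogs';
MSCK REPAIR TABLE weblogs;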
1. Exit your SSH session. Then enter an scp command to download **alltables.hql** locally.

```bash
scp [email protected]:alltables.hql c:/hdi
```

1. Upload **alltables.hql** to the *new* HDInsight cluster.

```bash
scp c:/hdi/alltables.hql [email protected]:/home/sshuser/
```

1. Then use SSH to connect to the *new* HDInsight cluster. Run the following code from the SSH session:

```bash
beeline -u "jdbc:hive2://localhost:10001/;transportMode=http" -i alltables.hql
```
## Upgrade metastore

Once the metastore **copy** is complete, run a schema upgrade script with a [Script Action](../hdinsight-hadoop-customize-cluster-linux.md) on the existing HDInsight 3.6 cluster to upgrade the new metastore to the Hive 3 schema. This allows the database to be attached as an HDInsight 4.0 metastore.
