Commit 6f716bd

Docs: Clean up README and add polaris examples (#34)
1 parent 7b7f280 commit 6f716bd

1 file changed: +75 -48 lines changed

iceberg-catalog-migrator/README.md

Lines changed: 75 additions & 48 deletions
Original file line number | Diff line number | Diff line change
@@ -22,7 +22,7 @@ Introduce a command-line tool that enables bulk migration of Iceberg tables from
2222

2323
There are various reasons why users may want to move their Iceberg tables to a different catalog. For instance,
2424
* They were using the Hadoop catalog and later realized that it is not recommended for production, so they want to move their tables to a production-ready catalog.
25-
* They just heard about the awesome Arctic catalog (or Nessie) and want to move their existing iceberg tables to Dremio Arctic.
25+
* They just heard about the awesome Apache Polaris catalog and want to move their existing Iceberg tables to it.
2626
* They had an on-premise Hive catalog, but want to move tables to a cloud-based catalog as part of their cloud migration strategy.
2727

2828
The CLI tool should support two commands
@@ -45,7 +45,7 @@ Need to have Java installed in your machine (Java 21 is recommended and the mini
4545

4646
Below is the CLI syntax:
4747
```
48-
$ java -jar iceberg-catalog-migrator-cli-0.3.0.jar -h
48+
$ java -jar iceberg-catalog-migrator-cli-0.0.1.jar -h
4949
Usage: iceberg-catalog-migrator [-hV] [COMMAND]
5050
-h, --help Show this help message and exit.
5151
-V, --version Print version information and exit.
@@ -56,7 +56,7 @@ Commands:
5656
```
5757

5858
```
59-
$ java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate -h
59+
$ java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate -h
6060
Usage: iceberg-catalog-migrator migrate [-hV] [--disable-safety-prompts] [--dry-run] [--stacktrace] [--output-dir=<outputDirPath>]
6161
(--source-catalog-type=<type> --source-catalog-properties=<String=String>[,<String=String>...]
6262
[--source-catalog-properties=<String=String>[,<String=String>...]]...
@@ -130,83 +130,110 @@ Identifier options:
130130
Note: The options for the register command are exactly the same as for the migrate command.
131131

132132
# Sample Inputs
133-
## Bulk registering all the tables from Hadoop catalog to Nessie catalog (main branch)
133+
134+
Note:
135+
a) Before migrating tables to Apache Polaris, make sure the catalog was created with its `default-base-location`
136+
set to the source catalog's `warehouse` location.
137+
138+
```
139+
{
140+
"catalog": {
141+
"name": "test",
142+
"type": "INTERNAL",
143+
"readOnly": false,
144+
"properties": {
145+
"default-base-location": "file:/path/to/source_catalog"
146+
},
147+
"storageConfigInfo": {
148+
"storageType": "FILE",
149+
"allowedLocations": [
150+
"file:/path/to/source_catalog"
151+
]
152+
}
153+
}
154+
}
155+
```
156+
157+
b) Get the OAuth token and export it as an environment variable
158+
134159
```shell
135-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar register \
136-
--source-catalog-type HADOOP \
137-
--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
138-
--target-catalog-type NESSIE \
139-
--target-catalog-properties uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse
160+
curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
161+
-d "grant_type=client_credentials" \
162+
-d "client_id=my-client-id" \
163+
-d "client_secret=my-client-secret" \
164+
-d "scope=PRINCIPAL_ROLE:ALL"
165+
166+
export TOKEN=xxxxxxx
140167
```
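
The `export TOKEN=xxxxxxx` step above leaves copying the token to the reader. As a minimal sketch, the token could instead be captured straight from the response, assuming the endpoint returns a standard OAuth2 JSON body with an `access_token` field. `extract_token` and `fetch_polaris_token` are hypothetical helper names, not part of the CLI or of Polaris, and the client ID and secret are the placeholder values from the example above.

```shell
# Hypothetical helper: read an OAuth2 JSON response on stdin and print
# the value of its "access_token" field. This sed-based sketch assumes the
# token contains no escaped quotes; a JSON parser such as jq is more robust.
extract_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical wrapper around the curl call shown above.
# $1 = client id, $2 = client secret; the URL is the one used in this README.
fetch_polaris_token() {
  curl -s -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
    -d "grant_type=client_credentials" \
    -d "client_id=$1" \
    -d "client_secret=$2" \
    -d "scope=PRINCIPAL_ROLE:ALL" | extract_token
}

# Usage (not run here because it needs a live Polaris server):
#   TOKEN=$(fetch_polaris_token my-client-id my-client-secret)
#   export TOKEN
```

If the response carries no `access_token` (for example an error body), `extract_token` prints nothing, so an empty `TOKEN` is a signal to inspect the raw response.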
141168

142-
## Register all the tables from Hadoop catalog to Arctic catalog (main branch)
169+
c) Also export the required storage-related settings and use them in the catalog configuration.
170+
For S3,
143171

144172
```shell
145-
export PAT=xxxxxxx
146173
export AWS_ACCESS_KEY_ID=xxxxxxx
147174
export AWS_SECRET_ACCESS_KEY=xxxxxxx
148175
export AWS_S3_ENDPOINT=xxxxxxx
149176
```
150177

178+
For ADLS,
151179
```shell
152-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar register \
153-
--source-catalog-type HADOOP \
154-
--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
155-
--target-catalog-type NESSIE \
156-
--target-catalog-properties uri=https://nessie.dremio.cloud/v1/repositories/8158e68a-5046-42c6-a7e4-c920d9ae2475,ref=main,warehouse=/tmp/warehouse,authentication.type=BEARER,authentication.token=$PAT
180+
export AZURE_SAS_TOKEN=<token>
157181
```
158182

159-
## Migrate selected tables (t1,t2 in namespace foo) from Arctic catalog (main branch) to Hadoop catalog.
160-
183+
## Bulk registering all the tables from Hadoop catalog to Polaris catalog
161184
```shell
162-
export PAT=xxxxxxx
163-
export AWS_ACCESS_KEY_ID=xxxxxxx
164-
export AWS_SECRET_ACCESS_KEY=xxxxxxx
165-
export AWS_S3_ENDPOINT=xxxxxxx
185+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar register \
186+
--source-catalog-type HADOOP \
187+
--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
188+
--target-catalog-type REST \
189+
--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
166190
```
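
The Polaris examples in this section all pass the same three REST properties (`uri`, `warehouse`, `token`) to the target catalog. A small helper, sketched here, could assemble that string and fail early when `TOKEN` has not been exported. `polaris_target_props` is a hypothetical name introduced for illustration; it is not part of the CLI.

```shell
# Hypothetical helper: print the comma-separated REST catalog properties
# used as --target-catalog-properties in the examples of this README.
# $1 = catalog URI, $2 = Polaris catalog name (passed as "warehouse").
polaris_target_props() {
  # Refuse to build the string if the OAuth token was never exported.
  if [ -z "${TOKEN:-}" ]; then
    echo "TOKEN is not set; export it first" >&2
    return 1
  fi
  printf 'uri=%s,warehouse=%s,token=%s\n' "$1" "$2" "$TOKEN"
}

# Usage (not run here because it needs the jar and a live catalog):
#   java -jar iceberg-catalog-migrator-cli-0.0.1.jar register \
#     --source-catalog-type HADOOP \
#     --source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
#     --target-catalog-type REST \
#     --target-catalog-properties "$(polaris_target_props http://localhost:60904/api/catalog test)"
```

Quoting the command substitution keeps the whole property list as a single argument even if a value contains spaces.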
167191

192+
## Migrate selected tables (t1,t2 in namespace foo) from Hadoop catalog to Polaris catalog
193+
168194
```shell
169-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
170-
--source-catalog-type NESSIE \
171-
--source-catalog-properties uri=https://nessie.dremio.cloud/v1/repositories/8158e68a-5046-42c6-a7e4-c920d9ae2475,ref=main,warehouse=/tmp/warehouse,authentication.type=BEARER,authentication.token=$PAT \
172-
--target-catalog-type HADOOP \
195+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
196+
--source-catalog-type HADOOP \
197+
--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
198+
--target-catalog-type REST \
199+
--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN \
173200
--identifiers foo.t1,foo.t2
174201
```
175202

176-
## Migrate all tables from GLUE catalog to Arctic catalog (main branch)
203+
## Migrate all tables from GLUE catalog to Polaris catalog
177204
```shell
178-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
205+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
179206
--source-catalog-type GLUE \
180207
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
181-
--target-catalog-type NESSIE \
182-
--target-catalog-properties uri=https://nessie.dremio.cloud/v1/repositories/612a4560-1178-493f-9c14-ab6b33dc31c5,ref=main,warehouse=s3a://some-other-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,authentication.type=BEARER,authentication.token=$PAT
208+
--target-catalog-type REST \
209+
--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
183210
```
184211

185-
## Migrate all tables from HIVE catalog to Arctic catalog (main branch)
212+
## Migrate all tables from HIVE catalog to Polaris catalog
186213
```shell
187-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
214+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
188215
--source-catalog-type HIVE \
189216
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
190-
--target-catalog-type NESSIE \
191-
--target-catalog-properties uri=https://nessie.dremio.cloud/v1/repositories/612a4560-1178-493f-9c14-ab6b33dc31c5,ref=main,warehouse=s3a://some-other-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,authentication.type=BEARER,authentication.token=$PAT
217+
--target-catalog-type REST \
218+
--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
192219
```
193220

194-
## Migrate all tables from DYNAMODB catalog to Arctic catalog (main branch)
221+
## Migrate all tables from DYNAMODB catalog to Polaris catalog
195222
```shell
196-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
223+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
197224
--source-catalog-type DYNAMODB \
198225
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
199-
--target-catalog-type NESSIE \
200-
--target-catalog-properties uri=https://nessie.dremio.cloud/v1/repositories/612a4560-1178-493f-9c14-ab6b33dc31c5,ref=main,warehouse=s3a://some-other-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,authentication.type=BEARER,authentication.token=$PAT
226+
--target-catalog-type REST \
227+
--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
201228
```
202229

203-
## Migrate all tables from JDBC catalog to Arctic catalog (main branch)
230+
## Migrate all tables from JDBC catalog to Polaris catalog
204231
```shell
205-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
232+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
206233
--source-catalog-type JDBC \
207234
--source-catalog-properties warehouse=/tmp/warehouseJdbc,jdbc.user=root,jdbc.password=pass,uri=jdbc:mysql://localhost:3306/db1,name=catalogName \
208-
--target-catalog-type NESSIE \
209-
--target-catalog-properties uri=https://nessie.dremio.cloud/v1/repositories/612a4560-1178-493f-9c14-ab6b33dc31c5,ref=main,warehouse=/tmp/nessiewarehouse,authentication.type=BEARER,authentication.token=$PAT
235+
--target-catalog-type REST \
236+
--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
210237
```
211238

212239
# Scenarios
@@ -219,7 +246,7 @@ Users can use a new catalog by creating a fresh table to test the new catalog's
219246

220247
Sample input:
221248
```shell
222-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
249+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
223250
--source-catalog-type HIVE \
224251
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
225252
--target-catalog-type NESSIE \
@@ -235,7 +262,7 @@ The list of table identifiers in `dry_run.txt` can be altered (if needed) and re
235262

236263
Sample input:
237264
```shell
238-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
265+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
239266
--source-catalog-type HIVE \
240267
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
241268
--target-catalog-type NESSIE \
@@ -287,7 +314,7 @@ and also log any table level failures, if present.
287314

288315
Sample input:
289316
```shell
290-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
317+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
291318
--source-catalog-type HIVE \
292319
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
293320
--target-catalog-type NESSIE \
@@ -331,7 +358,7 @@ Users can provide the selective list of identifiers to migrate using any of thes
331358

332359
Sample input: (only migrate tables that start with "foo.")
333360
```shell
334-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
361+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
335362
--source-catalog-type HIVE \
336363
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
337364
--target-catalog-type NESSIE \
@@ -342,7 +369,7 @@ java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
342369

343370
Sample input: (migrate all tables in the file ids.txt where each entry is delimited by newline)
344371
```shell
345-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
372+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
346373
--source-catalog-type HIVE \
347374
--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
348375
--target-catalog-type NESSIE \
@@ -352,7 +379,7 @@ java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
352379

353380
Sample input: (migrate only two tables foo.tbl1, foo.tbl2)
354381
```shell
355-
java -jar iceberg-catalog-migrator-cli-0.3.0.jar migrate \
382+
java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
356383
--source-catalog-type HIVE \
357384
--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
358385
--target-catalog-type NESSIE \
