Commit 4c8b0be

Merging branch 'scanner'. Following changes:
* `records` methods to iterate over large number of records
* Faster test cases
* prefix scan
1 parent d83ec02 commit 4c8b0be

19 files changed: 518 additions and 155 deletions

README.md

Lines changed: 119 additions & 41 deletions
Original file line number | Diff line number | Diff line change
@@ -8,44 +8,52 @@ An ultra-light-weight HBase ORM library that enables:
88

99

1010
## Usage
11-
Let's say you've an HBase table `citizens` with row-key format of `country_code#UID`. Now, let's say your table is created with three column families `main`, `optional` and `tracked`, which may have columns `uid`, `name`, `salary` etc.
11+
Let's say you've an HBase table `citizens` with row-key format of `country_code#UID`. Now, let's say this table is created with three column families `main`, `optional` and `tracked`, which may have columns (qualifiers) `uid`, `name`, `salary` etc.
1212

13-
This library enables to you represent your HBase table as a bean-like class, as below:
13+
This library enables you to represent your HBase table as a *bean-like class*, as below:
1414

1515
```java
16-
@HBTable(name = "citizens", families = {@Family(name = "main"), @Family(name = "optional", versions = 3), @Family(name = "tracked", versions = 10)})
16+
@HBTable(name = "citizens",
17+
families = {
18+
@Family(name = "main"),
19+
@Family(name = "optional", versions = 3),
20+
@Family(name = "tracked", versions = 10)
21+
}
22+
)
1723
public class Citizen implements HBRecord<String> {
1824

1925
@HBRowKey
2026
private String countryCode;
21-
27+
2228
@HBRowKey
2329
private Integer uid;
24-
30+
2531
@HBColumn(family = "main", column = "name")
2632
private String name;
27-
33+
2834
@HBColumn(family = "optional", column = "age")
2935
private Short age;
30-
36+
3137
@HBColumn(family = "optional", column = "salary")
3238
private Integer sal;
3339

3440
@HBColumn(family = "optional", column = "counter")
3541
private Long counter;
36-
42+
3743
@HBColumn(family = "optional", column = "custom_details")
3844
private Map<String, Integer> customDetails;
39-
45+
4046
@HBColumn(family = "optional", column = "dependents")
4147
private Dependents dependents;
42-
48+
4349
@HBColumnMultiVersion(family = "tracked", column = "phone_number")
4450
private NavigableMap<Long, Integer> phoneNumber;
45-
46-
@HBColumn(family = "optional", column = "pincode", codecFlags = {@Flag(name = BestSuitCodec.SERIALIZE_AS_STRING, value = "true")})
51+
52+
@HBColumn(family = "optional", column = "pincode", codecFlags = {
53+
@Flag(name = BestSuitCodec.SERIALIZE_AS_STRING, value = "true")
54+
})
4755
private Integer pincode;
48-
56+
4957
@Override
5058
public String composeRowKey() {
5159
return String.format("%s#%d", countryCode, uid);
@@ -57,22 +65,23 @@ public class Citizen implements HBRecord<String> {
5765
this.countryCode = pieces[0];
5866
this.uid = Integer.parseInt(pieces[1]);
5967
}
60-
68+
6169
// Constructors, getters and setters
6270
}
6371
```
6472
That is,
6573

6674
* The above class `Citizen` represents the HBase table `citizens`, using the `@HBTable` annotation.
6775
* Logics for conversion of HBase row key to member variables of `Citizen` objects and vice-versa are implemented using `parseRowKey` and `composeRowKey` methods respectively.
68-
* The data type representing row key is the type parameter to `HBRecord` generic interface (in above case, `String`). Fields that form row key are annotated with `@HBRowKey`.
76+
* The data type representing row key is the type parameter to `HBRecord` generic interface (in above case, `String`).
77+
* Fields that form row key are annotated with `@HBRowKey` (just a marker annotation).
6978
* Names of columns and their column families are specified using `@HBColumn` or `@HBColumnMultiVersion` annotations.
7079
* The class may contain fields of simple data types (e.g. `String`, `Integer`), generic data types (e.g. `Map`, `List`), custom class (e.g. `Dependents`) or even generics of custom class (e.g. `List<Dependent>`)
7180
* The `@HBColumnMultiVersion` annotation allows you to map multiple versions of column in a `NavigableMap<Long, ?>`. In above example, field `phoneNumber` is mapped to column `phone_number` within the column family `tracked` (which is configured for multiple versions)
7281

7382
See source files [Citizen.java](./src/test/java/com/flipkart/hbaseobjectmapper/testcases/entities/Citizen.java) and [Employee.java](./src/test/java/com/flipkart/hbaseobjectmapper/testcases/entities/Employee.java) for detailed examples. Specifically, [Employee.java](./src/test/java/com/flipkart/hbaseobjectmapper/testcases/entities/Employee.java) demonstrates using "column inheritance" of this library, a useful feature if you have many HBase tables with common set of columns.
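
The row-key round trip described above can be sketched in isolation. The class below is a hypothetical stand-alone illustration, not library code: the name `CitizenKeySketch` and its `parse`/`compose` methods are invented here, and splitting on `#` is an assumption based on the `country_code#UID` format (the diff elides the actual split line of `parseRowKey`).

```java
// Hypothetical sketch: composing and parsing a "country_code#UID" row key.
// It has no dependency on HBase or on this library.
final class CitizenKeySketch {
    final String countryCode;
    final int uid;

    CitizenKeySketch(String countryCode, int uid) {
        this.countryCode = countryCode;
        this.uid = uid;
    }

    // Same format string as composeRowKey() in the README example.
    String compose() {
        return String.format("%s#%d", countryCode, uid);
    }

    // Assumed inverse of compose(): split once on '#', then parse the UID.
    static CitizenKeySketch parse(String rowKey) {
        String[] pieces = rowKey.split("#", 2);
        if (pieces.length != 2) {
            throw new IllegalArgumentException("Malformed row key: " + rowKey);
        }
        return new CitizenKeySketch(pieces[0], Integer.parseInt(pieces[1]));
    }

    public static void main(String[] args) {
        CitizenKeySketch key = CitizenKeySketch.parse("IND#1");
        System.out.println(key.countryCode + " / " + key.uid);
        System.out.println(key.compose());
    }
}
```

Keeping `compose` and `parse` exact inverses of each other is what lets a record read from HBase be written back under the same row key.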
7483

75-
Alternatively, you can model the class as below:
84+
Alternatively, you can model your class as below:
7685

7786
```java
7887
...
@@ -82,9 +91,23 @@ public class Citizen implements HBRecord<CitizenKey> {
8291
String countryCode;
8392
Integer uid;
8493
}
94+
95+
@HBRowKey
96+
private CitizenKey rowKey;
97+
98+
...
99+
100+
@Override
101+
public CitizenKey composeRowKey() {
102+
return rowKey;
103+
}
104+
105+
@Override
106+
public void parseRowKey(CitizenKey rowKey) {
107+
this.rowKey = rowKey;
108+
}
85109
...
86110
}
87-
...
88111
```
89112

90113

@@ -95,38 +118,34 @@ public class Citizen implements HBRecord<CitizenKey> {
95118
* uses [Jackson's JSON serializer](https://en.wikipedia.org/wiki/Jackson_(API)) for all other data types
96119
* serializes `null` as `null`
97120
* To customize serialization/deserialization behavior, you may define your own codec (by implementing the [Codec](./src/main/java/com/flipkart/hbaseobjectmapper/codec/Codec.java) interface) or you may extend the default codec.
98-
* The optional parameter `codecFlags` (supported by both `@HBColumn` and `@HBColumnMultiVersion` annotations) can be used to pass custom flags to the underlying codec. (e.g. You may write your codec to serialize field `Integer id` in `Citizen` class differently from field `Integer id` in `Employee` class)
121+
* The optional parameter `codecFlags` (supported by both `@HBColumn` and `@HBColumnMultiVersion` annotations) can be used to pass custom flags to the underlying codec. (e.g. You may want your codec to serialize field `Integer id` in `Citizen` class differently from field `Integer id` in `Employee` class)
99122
* The default codec class `BestSuitCodec` takes a flag `BestSuitCodec.SERIALIZE_AS_STRING`, whose value is "serializeAsString" (as in the above `Citizen` class example). When this flag is set to `true` on a field, the default codec serializes that field (even numerical fields) as strings.
100123
* Your custom codec may take other such flags to customize serialization/deserialization behavior at a **class field level**.
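
To make the effect of a serialize-as-string flag concrete, the sketch below compares two byte encodings of the same `Integer`. It is only an illustration under stated assumptions: the non-string form is assumed to be a 4-byte big-endian value (the layout that HBase's `Bytes.toBytes(int)` produces), the codec's actual output is not shown in this diff, and `EncodingSketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: two ways the same integer could end up as bytes in a cell.
final class EncodingSketch {

    // Assumed binary form: 4 bytes, big-endian (ByteBuffer's default byte order).
    static byte[] asBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // String form: the decimal digits encoded as UTF-8 text.
    static byte[] asString(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int pincode = 560001;
        System.out.println("binary bytes: " + asBinary(pincode).length);
        System.out.println("string bytes: " + asString(pincode).length);
    }
}
```

The string form is readable from tools that treat cell values as text, while the binary form has a fixed width; which one suits a field depends on who else reads the table.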
101124

102125
## Using this library for database access (DAO)
103126
This library provides an abstract class to define your own [data access object](https://en.wikipedia.org/wiki/Data_access_object). For example, you can create one for `Citizen` class in the above example as follows:
104127

105128
```java
106-
import org.apache.hadoop.conf.Configuration;
107-
129+
import org.apache.hadoop.hbase.client.Connection;
108130
import java.io.IOException;
109131

110132
public class CitizenDAO extends AbstractHBDAO<String, Citizen> {
111133
// in above, String is the row type of Citizen
112134

113-
public CitizenDAO(Configuration conf) throws IOException {
114-
super(conf); // if you need to customize your codec, you may use super(conf, codec)
115-
// alternatively, you can construct CitizenDAO by passing instance of 'Connection'
135+
public CitizenDAO(Connection connection) throws IOException {
136+
super(connection); // if you need to customize your codec, you may use super(connection, codec)
137+
// alternatively, you can construct CitizenDAO by passing instance of 'org.apache.hadoop.conf.Configuration'
116138
}
117139
}
118140
```
119141
(see [CitizenDAO.java](./src/test/java/com/flipkart/hbaseobjectmapper/testcases/daos/CitizenDAO.java))
120142

121-
Once defined, you can access, manipulate and persist a row of `citizens` HBase table as below:
143+
Once defined, you can instantiate your *data access object* as below:
122144

123145
```java
124-
org.apache.hadoop.conf.Configuration configuration = getConf();
125-
126-
// Create a data access object:
127-
CitizenDAO citizenDao = new CitizenDAO(configuration);
128-
// alternatively, in above, you can pass HBase client's Connection to your constructor
146+
CitizenDAO citizenDao = new CitizenDAO(connection);
129147
```
148+
You can access, manipulate and persist records of `citizens` table as shown in below examples:
130149

131150
Create new record:
132151

@@ -135,22 +154,80 @@ String rowKey = citizenDao.persist(new Citizen("IND", 1, /* more params */));
135154
// In above, output of 'persist' is a String, because Citizen class implements HBRecord<String>
136155
```
137156

138-
Read data from HBase in various ways:
157+
Fetch a single record by its row key:
139158

140159
```java
141-
// Fetch a row from "citizens" HBase table with row key "IND#1":
160+
// Fetch row from "citizens" HBase table whose row key is "IND#1":
142161
Citizen pe = citizenDao.get("IND#1");
162+
```
163+
164+
Fetch multiple records by their row keys:
143165

166+
```java
144167
Citizen[] ape = citizenDao.get(new String[] {"IND#1", "IND#2"}); //bulk get
168+
```
169+
170+
Fetch records by range of row keys (start row key, end row key):
171+
172+
```java
173+
List<Citizen> lpe1 = citizenDao.get("IND#1", "IND#5");
174+
// above uses default behavior: start key inclusive, end key exclusive, 1 version
175+
176+
List<Citizen> lpe2 = citizenDao.get("IND#1", true, "IND#9", true, 5, 10000);
177+
// above fetches with: start key inclusive, end key inclusive, 5 versions, caching set to 10,000 rows
178+
179+
```
180+
181+
Iterate over *large number of records* by range of row keys:
182+
183+
```java
184+
try (Records<Citizen> citizens = citizenDao.records("IND#000000001", true, "IND#100000000", true, 1, 10000)) {
185+
// using try-with-resources above to close the resources after iteration
186+
for (Citizen citizen : citizens) {
187+
// your code
188+
}
189+
}
190+
```
191+
**Note:** All the `.records(...)` methods efficiently use iterators internally and do not load records upfront into memory. Hence, it's safe to fetch millions of records using them.
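
The pattern behind that note (an iterable that produces items on demand and is closed by try-with-resources) can be sketched without HBase. The class below is a hypothetical stand-in, not the library's `Records` class: `LazyRange`, `produced` and `closed` are names invented for this illustration.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: a closeable iterable that never materialises its contents.
final class LazyRange implements Iterable<Integer>, AutoCloseable {
    private final int start;
    private final int end;
    int produced = 0;       // how many items were actually generated
    boolean closed = false; // set by close(), e.g. via try-with-resources

    LazyRange(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int cursor = start;

            @Override
            public boolean hasNext() {
                return !closed && cursor < end;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                produced++;
                return cursor++; // each item is created only when asked for
            }
        };
    }

    @Override
    public void close() {
        closed = true; // a real scanner would release server-side resources here
    }

    public static void main(String[] args) {
        LazyRange range = new LazyRange(0, 100_000_000);
        try (LazyRange r = range) {
            for (int value : r) {
                if (value >= 3) {
                    break; // stop early; the remaining items are never generated
                }
            }
        }
        System.out.println("produced=" + range.produced + ", closed=" + range.closed);
    }
}
```

Because items are generated one at a time, memory use stays flat regardless of the range size, and the try-with-resources block guarantees cleanup even when the loop exits early.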
145192

146-
// In below, note that "IND#1" is inclusive and "IND#5" is exclusive
147-
List<Citizen> lpe = citizenDao.get("IND#1", "IND#5"); //range get
148-
// ('versioned' variant above method is available)
193+
Fetch records by row key prefix:
149194

150-
// for row keys in range ["IND#1", "IND#5"), fetch 3 versions of field 'phoneNumber' as a NavigableMap<row key, NavigableMap<timestamp, column value>>:
195+
```java
196+
// For small number of records:
197+
List<Citizen> lpe3 = citizenDao.getByPrefix(citizenDao.toBytes("IND#"));
198+
199+
// For large number of records:
200+
try (Records<Citizen> citizens = citizenDao.recordsByPrefix(citizenDao.toBytes("IND#"))) {
201+
for (Citizen citizen : citizens) {
202+
// do something
203+
}
204+
}
205+
```
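
A prefix scan can be thought of as a range scan whose stop key is the prefix "plus one". The sketch below shows one common way to derive that stop key from the prefix bytes. Whether `getByPrefix` and `recordsByPrefix` work this way internally is not stated in this diff, and `PrefixRange` and `stopKeyFor` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: turning a row-key prefix into an exclusive stop key.
final class PrefixRange {

    // Returns the smallest key that sorts after every key starting with 'prefix'.
    // An empty array means "no upper bound" (the prefix is empty or all 0xFF bytes).
    static byte[] stopKeyFor(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && prefix[i] == (byte) 0xFF) {
            i--; // trailing 0xFF bytes cannot be incremented, so drop them
        }
        if (i < 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, i + 1);
        stop[i]++; // bump the last byte that can be incremented
        return stop;
    }

    public static void main(String[] args) {
        byte[] stop = stopKeyFor("IND#".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(stop, StandardCharsets.US_ASCII));
    }
}
```

With this stop key, scanning from the prefix (inclusive) up to the stop key (exclusive) covers exactly the rows whose keys start with the prefix, assuming keys are compared byte by byte as HBase does.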
206+
207+
Fetch records by HBase's native `Scan` object: (for very advanced access patterns)
208+
209+
```java
210+
Scan scan = new Scan().setAttribute(...)
211+
.setReadType(...)
212+
.setACL(...)
213+
.withStartRow(...)
214+
.withStopRow(...)
215+
.readAllVersions(...);
216+
try (Records<Citizen> citizens = citizenDao.records(scan)) {
217+
for (Citizen citizen : citizens) {
218+
// do something
219+
}
220+
}
221+
```
222+
223+
224+
Fetch specific field(s) for given row key(s):
225+
226+
```java
227+
// for row keys in range ["IND#1", "IND#5"), fetch 3 versions of field 'phoneNumber':
151228
NavigableMap<String, NavigableMap<Long, Object>> phoneNumberHistory
152229
= citizenDao.fetchFieldValues("IND#1", "IND#5", "phoneNumber", 3);
153-
// (bulk variants of above range method are also available)
230+
// bulk variants of above range method are also available
154231
```
155232

156233
Read data from HBase using HBase's native `Get`:
@@ -163,7 +240,7 @@ Get get2 = citizenDao.getGet("IND#2").setTimeRange(1, 5).setMaxVersions(2); // A
163240
counterDAO.getOnGets(get2);
164241
```
165242

166-
Manipulate and persist an object to HBase:
243+
Manipulate and persist an object back to HBase:
167244

168245
```java
169246
// change a field:
@@ -296,7 +373,7 @@ CitizenSummary citizenSummary = hbObjectMapper.readValue(
296373
## Advantages
297374
* Your application code will be clean and minimal.
298375
* Your code need not worry about HBase methods or serialization/deserialization at all, thereby helping you maintain clear [separation of concerns](https://en.wikipedia.org/wiki/Separation_of_concerns).
299-
* Classes are **thread-safe**. You can just have to instantiate your DAO classes once at the start of your application and use them anywhere.
376+
* Classes are **thread-safe**. You just have to instantiate your DAO classes once at the start of your application and use them anywhere!
300377
* Light weight: This library depends on just HBase Client and few other small libraries. It has very low overhead and hence is very fast.
301378
* Customizability/Extensibility: Want to use HBase native methods directly in some cases? No problem. Want to customize ser/deser in general or for a given class field? No problem. This library is highly flexible.
302379

@@ -313,21 +390,22 @@ Add below entry within the `dependencies` section of your `pom.xml`:
313390
<dependency>
314391
<groupId>com.flipkart</groupId>
315392
<artifactId>hbase-object-mapper</artifactId>
316-
<version>1.12.1</version>
393+
<version>1.13</version>
317394
</dependency>
318395
```
396+
319397
See artifact details: [com.flipkart:hbase-object-mapper on **Maven Central**](https://search.maven.org/search?q=g:com.flipkart%20AND%20a:hbase-object-mapper&core=gav) or
320398
[com.flipkart:hbase-object-mapper on **MVN Repository**](https://mvnrepository.com/artifact/com.flipkart/hbase-object-mapper).
321399
## How to build?
322400
To build this project, follow below simple steps:
323401

324402
1. Do a `git clone` of this repository
325-
2. Checkout latest stable version `git checkout v1.12.1`
403+
2. Checkout latest stable version `git checkout v1.13`
326404
3. Execute `mvn clean install` from shell
327405

328406
### Please note:
329407

330-
* Currently, projects that use this library are running on [Hortonworks Data Platform v3.1](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/index.html) (corresponds to Hadoop 3.1 and HBase 2.0). However, if you are using a different version of Hadoop, you may change the versions in [pom.xml](./pom.xml) to desired ones and build the project.
408+
* Currently, projects that use this library are running on [Hortonworks Data Platform v3.1](https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/index.html) (corresponds to Hadoop 3.1 and HBase 2.0). However, if you are using a different version of Hadoop/HBase, you may change the versions in [pom.xml](./pom.xml) to desired ones and build the project.
331409
* Test cases are **very comprehensive**. So, `mvn` build times can sometimes be longer, depending on your machine configuration.
332410
* By default, test cases spin an [in-memory HBase test cluster](https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java) to run data access related test cases (near-realworld scenario).
333411
* If test cases are failing with time out errors, you may increase the timeout by setting environment variable `INMEMORY_CLUSTER_START_TIMEOUT` (seconds). For example, on Linux you may run the command `export INMEMORY_CLUSTER_START_TIMEOUT=8` on terminal, before running the aforementioned `mvn` command.

pom.xml

Lines changed: 3 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@
1111
<modelVersion>4.0.0</modelVersion>
1212
<groupId>com.flipkart</groupId>
1313
<artifactId>hbase-object-mapper</artifactId>
14-
<version>1.12.1</version>
14+
<version>1.13</version>
1515
<url>https://flipkart-incubator.github.io/hbase-orm/</url>
1616
<scm>
1717
<url>https://github.com/flipkart-incubator/hbase-orm/</url>
@@ -124,7 +124,6 @@
124124
<configuration>
125125
<source>1.8</source>
126126
<target>1.8</target>
127-
<compilerArgument>-Xlint:all</compilerArgument>
128127
</configuration>
129128
</plugin>
130129
<plugin>
@@ -148,7 +147,7 @@
148147
<plugin>
149148
<groupId>org.apache.maven.plugins</groupId>
150149
<artifactId>maven-source-plugin</artifactId>
151-
<version>3.1.0</version>
150+
<version>3.2.1</version>
152151
<executions>
153152
<execution>
154153
<id>attach-sources</id>
@@ -175,7 +174,7 @@
175174
<plugin>
176175
<groupId>org.jacoco</groupId>
177176
<artifactId>jacoco-maven-plugin</artifactId>
178-
<version>0.8.4</version>
177+
<version>0.8.5</version>
179178
<executions>
180179
<execution>
181180
<goals>
