Tutorial 5. Identity Management

Osman Shoukry edited this page Dec 16, 2015 · 12 revisions

What is Identity Management?

In Java, a class's identity is defined by three methods:

  1. equals
  2. hashCode
  3. toString

These three methods are critical to implement in every class, but they are very hard to get right and even harder to maintain after the first implementation.

Did you know that if you override equals you MUST override hashCode as well, and that toString should be kept in step with both?

Every class in Java inherits the behavior of those three methods from the Object class, but the inherited implementation is very basic. Here is what you inherit and why you must override it.

Default behavior

equals

The Object class defines equals as reference (pointer) equality, so x.equals(y) is true only if x == y.
For example:

  Object object1 = new Object();
  Object object2 = object1;
  System.out.println(object1.equals(object2)); // prints true
  object2 = new Object();
  System.out.println(object1.equals(object2)); // prints false

Note: If you use any kind of caching technology, or ever plan to place instances of your class in a Set, your equals must do more than compare references.
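
To make the note above concrete, here is a minimal sketch of what happens when a class that keeps the default equals is placed in a Set. The Contact class and its email field are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: none of these names come from OpenPojo.
class DefaultIdentityDemo {

  // No equals/hashCode override, so Object's reference equality applies.
  static final class Contact {
    final String email;

    Contact(String email) {
      this.email = email;
    }
  }

  // Adds two contacts carrying the same email and reports the Set size.
  static int countDistinct() {
    Set<Contact> contacts = new HashSet<>();
    contacts.add(new Contact("jane@example.com"));
    contacts.add(new Contact("jane@example.com"));
    return contacts.size();
  }

  public static void main(String[] args) {
    // Both instances are kept, because neither is == to the other.
    System.out.println("distinct contacts: " + countDistinct()); // prints "distinct contacts: 2"
  }
}
```

The Set ends up holding two entries for what a person would consider the same contact, which is exactly the duplication a value-based equals is meant to prevent.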

hashCode

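The Object class defines hashCode as an identity-based value, commonly described as being derived from the object's memory address. It is the same value that System.identityHashCode returns.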
For example

  Object object = new Object();
  // The output will be the same from both print outs.
  System.out.println("hashCode: " + object.hashCode());
  System.out.println("hashCode: " + System.identityHashCode(object));

Output:

hashCode: 1837543557
hashCode: 1837543557

toString

The Object class defines toString as the class name, followed by "@", followed by the hex-encoded result of the hashCode call.
For example:

  Object object = new Object();
  System.out.println("toString: " + object);
  System.out.println("toString: " + object.getClass().getName()
             + "@" + Integer.toHexString(object.hashCode()));

Output:

toString: java.lang.Object@6d86b085
toString: java.lang.Object@6d86b085

Why am I talking about all of this?

Because this stuff matters, a great deal in fact, and if you plan to use your data structures in caching or persistence, it will matter even more. I recall working with a team once, many years ago, that was loading rows of customer contact information from the DB to send emails. The customer contact card relied on an older set of fields for equality, which was not unique in the DB and was no longer accurate. The application kept loading the same contact information for each customer, since Hibernate kept "finding" that the loaded item was "equal" to a cached item, so it would abandon processing the loaded item and use what was in the cache.

The flip side is that if you don't manage identity properly, your cache usage will be ineffective at best (every object is unique) or outright corrupted at worst (looking for A, returns B).
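
The "looking for A, returns B" failure can be reproduced in a few lines. This is only a sketch of the failure mode, not the original system: the ContactKey class, its fields, and the map-based cache are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; a sketch of the failure mode only.
class StaleKeyDemo {

  // Equality deliberately relies on the name alone, which is not unique.
  static final class ContactKey {
    final String name;
    final String email;

    ContactKey(String name, String email) {
      this.name = name;
      this.email = email;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof ContactKey && name.equals(((ContactKey) other).name);
    }

    @Override
    public int hashCode() {
      return name.hashCode();
    }
  }

  // Simulates a cache that stores a row only if no "equal" key is cached already.
  static String lookupSecondCustomer() {
    Map<ContactKey, String> cache = new HashMap<>();
    ContactKey first = new ContactKey("John Smith", "john.a@example.com");
    ContactKey second = new ContactKey("John Smith", "john.b@example.com");

    cache.putIfAbsent(first, first.email);
    cache.putIfAbsent(second, second.email); // ignored: the cache believes it has seen this key

    return cache.get(second);
  }

  public static void main(String[] args) {
    System.out.println(lookupSecondCustomer()); // prints "john.a@example.com"
  }
}
```

The second customer's lookup returns the first customer's email, because the fields chosen for equality do not uniquely identify a contact.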

Issues with equals, hashCode and toString

1. The rules you must follow
  1. equals and hashCode must be managed together, because two equal objects must return the same hashCode.
  2. equals must be:
    1. Reflexive: x.equals(x) must return true.
    2. Symmetric: if x.equals(y) is true, then y.equals(x) must be true, and vice versa.
    3. Transitive: if x.equals(y) is true, and y.equals(z) is true, then x.equals(z) must be true.
    4. Consistent: without modifying x or y in between, multiple calls to x.equals(y) must return the same value.
    5. Null-safe: for any non-null reference x, x.equals(null) must return false.
  3. hashCode must be:
    1. Consistent: without modifying an object, multiple calls to hashCode must return the same value.
    2. Aligned with equality: equal objects must return the same hashCode; unequal objects may or may not return the same hashCode.
  4. toString is mostly used for debugging and is sometimes displayed to the user, so it should return a string that is human readable and understandable.
  5. Every time you modify the data in a class (adding or removing fields), equals, hashCode and toString must be updated to stay in sync with it.
2. The issues you must pay attention to

Every model object in your system must define what makes it unique. This is important so you don't end up with multiple copies of the same object that you can't reconcile because their identity is unmanaged.
For example, in your database, the Employee table will have a social security number, or some other unique business key, that identifies each person, so that downstream business processes like payroll can tell whether a payment has been made or not, even if the process is interrupted halfway through and restarted.

Additionally, your O/R (object-relational) mapping tool needs to cache what it reads from the database to speed up retrieval. This caching is impossible if the O/R mapper can't answer the simple question "Have I seen you before?", which requires a quick hash lookup using the value from hashCode, followed by a call to equals on a hit to ensure it is not just a hash collision.
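
The two-step lookup described above (hashCode first, then equals to rule out a collision) can be observed with a plain HashMap. This sketch uses the strings "Aa" and "BB", which are distinct but share the same hashCode; the cache contents are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashCollisionDemo {

  // "Aa" and "BB" are a well-known pair of distinct strings with the same hashCode.
  static boolean sameHash() {
    return "Aa".hashCode() == "BB".hashCode();
  }

  // The map narrows the search via hashCode, then uses equals to pick the right entry.
  static String lookup(String key) {
    Map<String, String> cache = new HashMap<>();
    cache.put("Aa", "value for Aa");
    cache.put("BB", "value for BB");
    return cache.get(key);
  }

  public static void main(String[] args) {
    System.out.println(sameHash());   // prints "true"
    System.out.println(lookup("BB")); // prints "value for BB"
  }
}
```

Even though both keys hash alike, each lookup returns its own value, because equals has the final say. Without a correct equals, a collision would be indistinguishable from a real match.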

Also, O/R mapping vendors generally discourage using the database primary key for equality, because equals may be called before the object is persisted (when no key has been assigned yet) and should still return true or false based on the actual data.
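
Here is a small sketch of why primary-key-based equality is risky. The Order class is hypothetical, and the id assignment simply simulates what a persistence layer would do on insert.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PrimaryKeyEqualityDemo {

  // Hypothetical entity whose identity is based only on the database-generated id.
  static final class Order {
    Long id; // null until the row is inserted
    final String reference;

    Order(String reference) {
      this.reference = reference;
    }

    @Override
    public boolean equals(Object other) {
      return other instanceof Order && Objects.equals(id, ((Order) other).id);
    }

    @Override
    public int hashCode() {
      return Objects.hashCode(id);
    }
  }

  // Two different unsaved orders compare equal because both ids are still null.
  static boolean unsavedOrdersLookEqual() {
    return new Order("A-100").equals(new Order("B-200"));
  }

  // An order added to a Set before persistence is looked up again after its id is assigned.
  static boolean foundAfterIdAssigned() {
    Set<Order> pending = new HashSet<>();
    Order order = new Order("A-100");
    pending.add(order);
    order.id = 42L; // simulates the id being assigned on insert
    return pending.contains(order);
  }

  public static void main(String[] args) {
    System.out.println(unsavedOrdersLookEqual()); // prints "true"
    System.out.println(foundAfterIdAssigned());   // false with OpenJDK's HashSet
  }
}
```

Both results are wrong from a business point of view: unrelated unsaved orders look identical, and an order already in the Set is lost once its hashCode changes underneath it.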

Lastly, maintaining equals, hashCode and toString while changing the data fields in a class can be quite a chore and is often missed. I have experienced this firsthand!
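
To show what that maintenance burden looks like, here is a minimal hand-written sketch that follows the rules listed earlier. The Employee class, its two fields, and the case-insensitive treatment of lastName are assumptions made for illustration; every new field would require touching all three methods.

```java
import java.util.Locale;
import java.util.Objects;

class ManualIdentityDemo {

  // Hypothetical model class with a hand-maintained identity.
  static final class Employee {
    private final String socialSecurityNumber;
    private final String lastName;

    Employee(String socialSecurityNumber, String lastName) {
      this.socialSecurityNumber = socialSecurityNumber;
      this.lastName = lastName;
    }

    @Override
    public boolean equals(Object other) {
      if (this == other) return true;                 // reflexive
      if (!(other instanceof Employee)) return false; // also covers null
      Employee that = (Employee) other;
      return Objects.equals(socialSecurityNumber, that.socialSecurityNumber)
          && normalize(lastName).equals(normalize(that.lastName));
    }

    @Override
    public int hashCode() {
      // Must use the same fields, normalized the same way, as equals.
      return Objects.hash(socialSecurityNumber, normalize(lastName));
    }

    @Override
    public String toString() {
      return "Employee[socialSecurityNumber=" + socialSecurityNumber
          + ", lastName=" + lastName + "]";
    }

    // Case-insensitive comparison; a null name is treated like an empty one.
    private static String normalize(String value) {
      return value == null ? "" : value.toLowerCase(Locale.ROOT);
    }
  }

  // Spot-checks reflexivity, symmetry, hashCode agreement and the null rule.
  static boolean contractHolds() {
    Employee a = new Employee("123-45-6789", "Smith");
    Employee b = new Employee("123-45-6789", "SMITH");
    return a.equals(a) && a.equals(b) && b.equals(a)
        && a.hashCode() == b.hashCode() && !a.equals(null);
  }

  public static void main(String[] args) {
    System.out.println(contractHolds()); // prints "true"
  }
}
```

Note how the field list is repeated in equals, hashCode and toString. Forgetting one of the three when a field is added is the kind of slip that goes unnoticed until a cache or a Set misbehaves.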

The OpenPojo solution to the problem

Whenever you create a class, annotate its fields to declare how you want equality and hashCode to treat them, and let toString always print the data of all the fields.

A Full working example

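import java.io.Serializable;
import java.sql.Timestamp; // assumed: the JDBC timestamp type

import com.openpojo.business.BusinessIdentity;
import com.openpojo.business.annotation.BusinessKey;

public final class Person implements Serializable {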
  private static final long serialVersionUID = 1L;

  // Default constructor for persistence service loading.
  public Person() {}

  // Minimal business constructor.
  public Person(String socialSecurityNumber, String firstName, String lastName) {
    this.socialSecurityNumber = socialSecurityNumber;
    this.firstName = firstName;
    this.lastName = lastName;
  }

  // Full Constructor.
  public Person(String id, String socialSecurityNumber, String firstName,
           String lastName, Timestamp created, Timestamp lastUpdated) {
    this(socialSecurityNumber, firstName, lastName);
    this.id = id;
    this.created = created;
    this.lastUpdated = lastUpdated;
  }

  private String id;
  public String getId() {
    return id;
  }

  public void setId(final String id) {
    this.id = id;
  }

  @BusinessKey(caseSensitive = false, required = true)
  private String firstName;

  @BusinessKey
  private String socialSecurityNumber;

  @BusinessKey(caseSensitive = false, required = false)
  private String lastName;

  private Timestamp created;
  private Timestamp lastUpdated;

  // [... getters / setters ...]

  @Override
  public int hashCode() {
    return BusinessIdentity.getHashCode(this);
  }

  @Override
  public boolean equals(final Object obj) {
    return BusinessIdentity.areEqual(this, obj);
  }

  @Override
  public String toString() {
    return BusinessIdentity.toString(this);
  }
}

An Example is not complete without the tests to validate it

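import org.junit.Test;

import com.openpojo.reflection.PojoClass;
import com.openpojo.reflection.impl.PojoClassFactory;
import com.openpojo.validation.PojoValidator;
import com.openpojo.validation.rule.impl.BusinessKeyMustExistRule;
import com.openpojo.validation.rule.impl.GetterMustExistRule;
import com.openpojo.validation.rule.impl.SetterMustExistRule;
import com.openpojo.validation.test.impl.BusinessIdentityTester;
import com.openpojo.validation.test.impl.GetterTester;
import com.openpojo.validation.test.impl.SetterTester;

public class PersonTest {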
  @Test
  public void validatePerson() {
    PojoClass personPojo = PojoClassFactory.getPojoClass(Person.class);
    PojoValidator pojoValidator = new PojoValidator();

    pojoValidator.addRule(new GetterMustExistRule());
    pojoValidator.addRule(new SetterMustExistRule());
    pojoValidator.addRule(new BusinessKeyMustExistRule());

    pojoValidator.addTester(new SetterTester());
    pojoValidator.addTester(new GetterTester());
    pojoValidator.addTester(new BusinessIdentityTester());

    pojoValidator.runValidation(personPojo);
  }
}

The above code helps ensure that you no longer have to worry about equals, hashCode, or toString for that matter. All you have to do is manage your fields and ensure that they are annotated properly.

Additionally, if you try to compare incompletely constructed Person objects for equality, for example by doing this:

  Person person1 = new Person();
  Person person2 = new Person();
  person1.equals(person2);  // An exception is thrown.

This exception will tell you that you are trying to compare two incomplete business objects to each other.

This will help you catch scenarios where you end up persisting blank records in your DB, or putting those blank records in a Set, a HashMap or a cache. And that is a good thing!
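
The idea of rejecting incomplete objects can be sketched with plain reflection. This is not OpenPojo's implementation: the RequiredKey annotation, the Account class and the requireComplete method are hypothetical stand-ins that only illustrate the general technique of failing fast when a required key is missing.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

class RequiredKeyDemo {

  // Hypothetical marker annotation, standing in for a "required business key".
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.FIELD)
  @interface RequiredKey {}

  // Hypothetical model class with one required key.
  static final class Account {
    @RequiredKey
    String accountNumber;
  }

  // Throws if any field marked @RequiredKey is still null.
  static void requireComplete(Object candidate) {
    for (Field field : candidate.getClass().getDeclaredFields()) {
      if (!field.isAnnotationPresent(RequiredKey.class)) {
        continue;
      }
      field.setAccessible(true);
      try {
        if (field.get(candidate) == null) {
          throw new IllegalStateException(
              "Incomplete object: " + field.getName() + " is not set");
        }
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
    }
  }

  // Returns true when a blank Account is rejected.
  static boolean rejectsBlankAccount() {
    try {
      requireComplete(new Account());
      return false;
    } catch (IllegalStateException expected) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(rejectsBlankAccount()); // prints "true"
  }
}
```

A check like this, run before equals or hashCode is computed, turns a silent data problem into an immediate and visible failure.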

Further Readings