What is object identity, and why should I care?
Academically, an object is the combination of three fundamental concepts into one atomic unit:
- Behavior is the way in which an object interacts with its environment.
- State is the memory an object retains of prior interactions.
- Identity is what distinguishes one object from another, even when they’re identical under any other measure.
If you’re familiar with an object-oriented language, chances are you could easily identify which features of the language correspond to “behavior” and “state”. In Java, for instance, you model “behavior” using methods, and you model “state” using fields1. But “identity” is the odd one out in this lineup: it’s much less clear where it comes into play. At worst, its existence can seem like a technicality or a useless piece of trivia.
My goal in this article is not only to explain how object identity manifests in Java (and just how deeply it’s embedded in object-oriented thinking), but to show how you can willfully leverage identity to achieve better software designs. With luck, a better understanding of identity will serve you well in functional languages, where – despite not being provided as a language primitive – it plays just as essential a role.
Looking for identity
Rather than conjuring up a prescriptive explanation for object identity and asking you to take it on faith, let’s consider an example program, and ask ourselves where identity plays a role.
class Counter {
// state
private int count = 0;
// behavior
public void increment() { this.count += 1; }
public int getCount() { return this.count; }
}
class Main {
public static void main(String[] args) {
var c1 = new Counter();
var c2 = new Counter();
System.out.println(c1.getCount()); //=> 0
System.out.println(c2.getCount()); //=> 0
c1.increment();
System.out.println(c1.getCount()); //=> 1
System.out.println(c2.getCount()); //=> 0
}
}
In this program, what draws your eye as being related to identity? Certainly, immediately after construction, `c1` and `c2` have the same state. They also have the same behavior: we can post a query to both counters and get the same answer. And, undoubtedly, if we continued performing the same interactions in parallel with these objects, they would continue to agree in every way2. But we don’t – we next increment one counter, but not the other, and their subsequent answers diverge.
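Java also lets us ask the identity question directly, using the `==` operator on references. The following is only a minimal sketch: the wrapper class `IdentityCheck` and the helper `sameObject` are hypothetical names, and the nested `Counter` is a copy of the one above.

```java
// Hypothetical demo: two counters with identical state are still distinct objects.
class IdentityCheck {
    static class Counter {
        private int count = 0;
        void increment() { this.count += 1; }
        int getCount() { return this.count; }
    }

    // True only when both references point at the very same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    public static void main(String[] args) {
        var c1 = new Counter();
        var c2 = new Counter();
        var alias = c1; // a second reference to the same object as c1

        System.out.println(c1.getCount() == c2.getCount()); //=> true  (same state)
        System.out.println(sameObject(c1, c2));             //=> false (different identity)
        System.out.println(sameObject(c1, alias));          //=> true  (same identity)

        alias.increment(); // visible through c1, because alias and c1 are one object
        System.out.println(c1.getCount()); //=> 1
        System.out.println(c2.getCount()); //=> 0
    }
}
```

Equal state is a coincidence that can end at any moment; equal identity is not.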
There are two keywords you might be eyeing at this point: `this` and `new`. It’s via `new` that objects get created in the first place, after all, and it’s via `this` that an object’s methods access and update its state. And you’d be right to eye these two keywords… but we can dig much deeper.
Removing this
Let’s start by looking at the `this` keyword. Yes, `this` allows our object’s behaviors to depend on its state, but we can actually do without it. We can transform our program into one that doesn’t use `this` – let’s see where we end up. (Remember that we’re not necessarily making this code any better – we’re only doing this to learn what makes objects tick.)
If we can’t get access to `count` via the `this` keyword, we have to get at it somehow. We’ll pass an instance of `Counter` in, instead.
class Counter {
// state
private int count = 0;
// behavior
public void increment(Counter self) { self.count += 1; }
public int getCount(Counter self) { return self.count; }
}
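Here is a rough sketch of what calling this version looks like. The wrapper class `ExplicitSelfDemo` is a hypothetical name, and the methods are package-private only to keep the example short.

```java
// Hypothetical demo of the explicit-self version: receiver and argument can disagree.
class ExplicitSelfDemo {
    static class Counter {
        private int count = 0;
        void increment(Counter self) { self.count += 1; }
        int getCount(Counter self) { return self.count; }
    }

    public static void main(String[] args) {
        var c1 = new Counter();
        var c2 = new Counter();

        c1.increment(c1); // intended usage: the same object named twice
        c1.increment(c2); // also compiles, but quietly updates c2 instead of c1

        System.out.println(c1.getCount(c1)); //=> 1
        System.out.println(c2.getCount(c2)); //=> 1
    }
}
```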
Now there’s no `this` at all, but we’ve lost some fidelity in modeling: we have to reference the same object twice in `c1.increment(c1)`, and nothing prevents us from mixing and matching objects. Let’s make these methods static, for the time being.
class Counter {
// state
private int count = 0;
// behavior
public static void increment(Counter state) { state.count += 1; }
public static int getCount(Counter state) { return state.count; }
}
That’s a bit better, but also a bit worse: we now invoke `Counter.increment(c1)`, but we’ve lost dynamic dispatch. It didn’t matter much in this case, but if somebody had subclassed3 `Counter` and overridden its methods, their methods would have been called before, but now our static methods always get called.
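To see the lost dispatch in action, consider this small sketch. `DoubleCounter`, `incrementStatic`, and the wrapper class `StaticDispatchDemo` are hypothetical names introduced only for illustration, and the field is `protected` here so the subclass can reach it.

```java
// Hypothetical demo: instance methods are dispatched on the runtime class, static ones are not.
class StaticDispatchDemo {
    static class Counter {
        protected int count = 0;
        void increment() { this.count += 1; }
        static void incrementStatic(Counter state) { state.count += 1; }
        int getCount() { return this.count; }
    }

    // A subclass that counts in steps of two.
    static class DoubleCounter extends Counter {
        @Override
        void increment() { this.count += 2; }
    }

    public static void main(String[] args) {
        Counter viaInstance = new DoubleCounter();
        Counter viaStatic = new DoubleCounter();

        viaInstance.increment();            // runs DoubleCounter's override
        Counter.incrementStatic(viaStatic); // always runs Counter's code

        System.out.println(viaInstance.getCount()); //=> 2
        System.out.println(viaStatic.getCount());   //=> 1
    }
}
```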
I like to think of `static` items as being part of an implicit singleton object. Let’s extract that singleton into a real class:
class Counter {
// state
private int count = 0;
// behavior
public static class Trait {
public void increment(Counter state) { state.count += 1; }
public int getCount(Counter state) { return state.count; }
}
}
class Main {
public static void main(String[] args) {
var trait = new Counter.Trait();
var c1 = new Counter();
var c2 = new Counter();
System.out.println(trait.getCount(c1)); //=> 0
System.out.println(trait.getCount(c2)); //=> 0
trait.increment(c1);
System.out.println(trait.getCount(c1)); //=> 1
System.out.println(trait.getCount(c2)); //=> 0
}
}
Now it’s possible to subclass both `Counter` and `Trait`; our methods are no longer `static`, so method calls will be dynamically dispatched to whatever concrete `Trait` has been implemented for a `Counter` subclass.
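As a rough illustration of that dispatch, here is a sketch with a second trait. `TenTrait`, `incrementAndRead`, and the wrapper class `TraitDispatchDemo` are hypothetical, and `count` is package-private in this sketch so that the trait subclass can reach it.

```java
// Hypothetical demo: the trait object, not the counter, decides which behavior runs.
class TraitDispatchDemo {
    static class Counter {
        int count = 0; // state only
    }

    static class Trait {
        void increment(Counter state) { state.count += 1; }
        int getCount(Counter state) { return state.count; }
    }

    // A trait that counts in steps of ten.
    static class TenTrait extends Trait {
        @Override
        void increment(Counter state) { state.count += 10; }
    }

    static int incrementAndRead(Trait trait, Counter counter) {
        trait.increment(counter); // dispatched on the runtime class of `trait`
        return trait.getCount(counter);
    }

    public static void main(String[] args) {
        System.out.println(incrementAndRead(new Trait(), new Counter()));    //=> 1
        System.out.println(incrementAndRead(new TenTrait(), new Counter())); //=> 10
    }
}
```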
Since we can always transform our Java classes to avoid `this`, it can’t be essential to object identity. At best, we’ve encoded it using other forms of identity, and at worst it had nothing to do with identity at all. (I could go either way, but I think it’s telling that all we really did was provide the object as an explicit argument.)
Mutation without new
Something rather interesting happened with our cleanups after removing `this`: our object’s state and behavior have been separated cleanly into two classes. Since identity is a part of every object, both the `Trait` instance and our original `Counter` instances must have identity. But we’re reusing our trait instance across both counters; in principle we could reuse a singleton `Trait` across all counters. It’s hard to claim, perhaps, that a singleton with no state of its own really needs identity.
This points to one of the roadblocks to understanding identity in Java: everything has it, in exactly the same way. It’s an invisible part of the fabric of every program, whether or not you actually make use of it. Worse, we sometimes have to work against identity – if you’ve ever made defensive copies, this is why.
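For readers who haven’t met defensive copies, the sketch below shows the problem they solve. All names (`SharedRoster`, `CopiedRoster`, `DefensiveCopyDemo`) are hypothetical; the only point is that two references to one mutable list share its identity, while a copy gets an identity of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: a stored reference shares identity with the caller's list; a copy does not.
class DefensiveCopyDemo {
    // Keeps the caller's list as-is, so later changes by the caller leak in.
    static class SharedRoster {
        private final List<String> names;
        SharedRoster(List<String> names) { this.names = names; }
        int size() { return names.size(); }
    }

    // Takes a defensive copy at construction time.
    static class CopiedRoster {
        private final List<String> names;
        CopiedRoster(List<String> names) { this.names = new ArrayList<>(names); }
        int size() { return names.size(); }
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>(List.of("Ada"));
        var shared = new SharedRoster(input);
        var copied = new CopiedRoster(input);

        input.add("Grace"); // the caller keeps mutating its own list

        System.out.println(shared.size()); //=> 2
        System.out.println(copied.size()); //=> 1
    }
}
```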
But more to our point, identity supports mutation. Objects with no state (or only immutable state) may not need identity (we’ll get to that), but identity is a requirement for objects with mutable state. To make the point, let’s rewrite our original4 program to use as little mutation as possible.
// Records are new in Java 16!
record Counter(int count) {
public Counter() { this(0); }
public Counter increment() {
return new Counter(this.count + 1);
}
public int getCount() {
return this.count;
}
}
class Main {
public static void main(String[] args) {
var c1 = new Counter();
var c2 = new Counter();
System.out.println(c1.getCount()); //=> 0
System.out.println(c2.getCount()); //=> 0
c1 = c1.increment();
System.out.println(c1.getCount()); //=> 1
System.out.println(c2.getCount()); //=> 0
}
}
We were a little too successful, actually. You’ll notice that nowhere is an object itself actually being mutated, and yet we can still make `c1.getCount()` return a different value. This tells us something very interesting about identity: it’s not only objects that have it! We can sneak mutation into the system through mutable local variables. Even though our counter objects do have identity, just as with our trait from before, it’s hard to argue that they actually need identity.
Moreover, while traits are a less common style in Java (though they exist; see `Comparator`), records and other value-based types are common, and in fact are preferred wherever possible. Far from being one of the essential attributes of objects, identity sometimes seems like a feature nobody wanted!
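Records make this disinterest in identity visible in their generated `equals`: two records with equal components are equal, even though `==` can still tell them apart. A minimal sketch, assuming Java 16 or later (the wrapper class `RecordEqualityDemo` is a hypothetical name):

```java
// Hypothetical demo: records compare by state, yet each instance still has its own identity.
class RecordEqualityDemo {
    record Counter(int count) {}

    public static void main(String[] args) {
        var a = new Counter(0);
        var b = new Counter(0);

        System.out.println(a.equals(b));                  //=> true  (same state)
        System.out.println(a.hashCode() == b.hashCode()); //=> true  (consistent with equals)
        System.out.println(a == b);                       //=> false (two separate objects)
    }
}
```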
Nonetheless, local variables are not sufficient for mutation in general. Since they’re local, only the function body can even reference them, much less change them. And sometimes, very occasionally, you do want to be able to check if two variables refer to the same object. Since every object has identity, these capabilities are available in every class, and it becomes very tempting (and all too easy) to leverage identity and mutation when other solutions might be more suitable – which is a shame, because you can do some very nice things with identity when you apply it consciously!
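The limitation of local variables can be sketched as follows. The names are hypothetical, and the one-element array is just a stand-in for “some object with identity” that caller and callee can both reach.

```java
// Hypothetical demo: a callee cannot change the caller's local, but it can change a shared object.
class LocalScopeDemo {
    // Reassigning a parameter only changes the callee's own local variable.
    static void tryToIncrement(int count) {
        count += 1;
    }

    // The array is an object, so caller and callee see the same mutable cell.
    static void incrementShared(int[] cell) {
        cell[0] += 1;
    }

    public static void main(String[] args) {
        int count = 0;
        tryToIncrement(count);
        System.out.println(count); //=> 0

        int[] cell = {0};
        incrementShared(cell);
        System.out.println(cell[0]); //=> 1
    }
}
```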
Shared mutability
We can solve the first problem with a new class, `Register`. In some sense, we’re just taking the things we can do with local variables – read them and write them – and making a class with those behaviors. We don’t need to be able to check if two `Register`s are the same object here, but we get it thanks to object identity anyway. (Again, note that we’re not doing this because it’s necessarily a good idea, but because we’re trying to take different angles on this whole identity thing.)
final class Register<T> {
private T value;
public Register(T initialValue) {
this.value = initialValue;
}
public void set(T newValue) {
this.value = newValue;
}
public T get() {
return this.value;
}
}
// ...Counter...
class Main {
public static void main(String[] args) {
final var c1 = new Register<>(new Counter());
final var c2 = new Register<>(new Counter());
System.out.println(c1.get().getCount()); //=> 0
System.out.println(c2.get().getCount()); //=> 0
c1.set(c1.get().increment());
System.out.println(c1.get().getCount()); //=> 1
System.out.println(c2.get().getCount()); //=> 0
}
}
The first nice thing about `Register` is how clearly it signals that mutation not only can happen, but that it’s expected to happen. In an environment where every object could potentially be mutable, and where complex object graphs mean interacting with one object could invisibly influence your interactions with another object, these kinds of signposts are invaluable to whichever developer is trying to understand what’s going on.
The second nice thing about `Register` is that it works for any type. We started with mutability that was specific to `Counter`, and in many ways we refactored the whole concept of shared mutability into a separate class altogether. I could wave my hands and mutter something about the SOLID principles here, but the point is that `Counter` is now simpler in a very real sense5, and the concept of shared mutability is both centralized and reusable.
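Both properties can be seen in a short sketch. The helper `shout` and the wrapper class `SharedRegisterDemo` are hypothetical; the nested `Register` mirrors the class above, with package-private members to keep things brief.

```java
// Hypothetical demo: one Register reached through two references, and reused for two types.
class SharedRegisterDemo {
    static final class Register<T> {
        private T value;
        Register(T initialValue) { this.value = initialValue; }
        void set(T newValue) { this.value = newValue; }
        T get() { return this.value; }
    }

    // A called method can update the register it was handed.
    static void shout(Register<String> message) {
        message.set(message.get() + "!");
    }

    public static void main(String[] args) {
        var greeting = new Register<>("Hello");
        var alias = greeting; // same register, second reference
        shout(greeting);
        System.out.println(alias.get()); //=> Hello!

        var total = new Register<>(41); // the same class works for Integer
        total.set(total.get() + 1);
        System.out.println(total.get()); //=> 42
    }
}
```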
At this point, I hope it’s clear that without identity, we couldn’t have shared mutability. We could still have local mutability, but we can always replace a mutable local with a series of immutable locals (even if it can get quite complex), so local mutability is actually a pretty weak capability.
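The claim about replacing a mutable local with immutable ones can be sketched like this. The method names are hypothetical, and the recursive `sumTo` hints at where the rewrite “can get quite complex”: a loop variable turns into a parameter of a recursive call.

```java
// Hypothetical demo: every reassignment of a local can become a fresh final local.
class ImmutableLocalsDemo {
    // One mutable local, reassigned twice.
    static int withMutableLocal(int start) {
        int total = start;
        total = total + 1;
        total = total * 2;
        return total;
    }

    // The same computation, with a new final local for every "version" of total.
    static int withImmutableLocals(int start) {
        final int total0 = start;
        final int total1 = total0 + 1;
        final int total2 = total1 * 2;
        return total2;
    }

    // Loops take more work: each iteration becomes a call with new final parameters.
    static int sumTo(final int n, final int acc) {
        return n == 0 ? acc : sumTo(n - 1, acc + n);
    }

    public static void main(String[] args) {
        System.out.println(withMutableLocal(3));    //=> 8
        System.out.println(withImmutableLocals(3)); //=> 8
        System.out.println(sumTo(4, 0));            //=> 10
    }
}
```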
While shared mutability requires identity, we haven’t established that shared mutability is the only thing we can do with object identity. In fact, identity can be useful even in the absence of state and behavior.
Unforgeable identity
In Java, object identity is stronger than local variable identity in two respects. First, a local variable can only be accessed from its immediate scope; there is no way to give a called method access to that variable. Second, we cannot check whether two variables are the same variable; it almost makes no sense to ask for such a thing! With objects, however, you can do both of these things.
Since multiple methods can access the same object, we have shared mutability. But what can we do with the ability to check if two object references are the same? Is that even useful?
Yes. Yes it is.
First, identity-based equality is essential for writing a type-keyed heterogeneous map.
import java.util.HashMap;
import java.util.Map;
final class TypeMap {
private final Map<Class<?>, Object> map = new HashMap<>();
public void put(Object value) {
this.map.put(value.getClass(), value);
}
@SuppressWarnings("unchecked")
public <T> T get(Class<T> klass) {
return (T) this.map.get(klass);
}
}
final class Main {
public static void main(String[] args) {
final var map = new TypeMap();
map.put("Hello!");
System.out.println(map.get(String.class));
}
}
It’s not obvious here, but `HashMap#get` will eventually invoke `equals()` to compare the given key with its stored key, and `Class` inherits `Object#equals`, which uses reference equality. We can make things a little clearer by avoiding reflection altogether.
import java.util.HashMap;
import java.util.Map;
final class TokenMap {
private final Map<Token<?>, Object> map = new HashMap<>();
public <T> void put(Token<T> key, T value) {
this.map.put(key, value);
}
@SuppressWarnings("unchecked")
public <T> T get(Token<T> key) {
return (T) this.map.get(key);
}
public static final class Token<T> {}
}
final class Main {
public static void main(String[] args) {
final var map = new TokenMap();
final var key = new TokenMap.Token<String>();
map.put(key, "Hello!");
System.out.println(map.get(key));
}
}
In this case, we’re no longer restricted to one value of any type. We can quite easily construct two distinct tokens with the same type parameter. Either way, the key type carries type information about any associated value, and we recover that type information by comparing the token to one that we stored before. When we compare the given `Token<T>` with the stored `Token<?>` and discover that they are equal, we know6 that the wildcard is in fact `T`, and therefore the value we stored with that token also has type `T`.
Moreover, we can safely pass this map around to other parts of the code with full confidence that if they don’t have the exact key that we used here, there’s no way to access the value associated with that key. The `Token` type models an unforgeable token, also – especially in the object-oriented domain – known as an object capability7.
The `Token` type here has neither state nor behavior; we’re using it purely for its identity, and with nothing at all to do with mutation of its (nonexistent) contents.
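Both claims (two tokens of the same type stay distinct, and a freshly made token unlocks nothing) can be checked with a short sketch. The wrapper class `TokenDemo` and the variable names are hypothetical; the nested classes mirror `TokenMap` and `Token` from above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: tokens are told apart by identity alone.
class TokenDemo {
    static final class Token<T> {}

    static final class TokenMap {
        private final Map<Token<?>, Object> map = new HashMap<>();

        <T> void put(Token<T> key, T value) { this.map.put(key, value); }

        @SuppressWarnings("unchecked")
        <T> T get(Token<T> key) { return (T) this.map.get(key); }
    }

    public static void main(String[] args) {
        var map = new TokenMap();
        var greeting = new Token<String>();
        var farewell = new Token<String>(); // same type parameter, different identity

        map.put(greeting, "Hello!");
        map.put(farewell, "Goodbye!");

        System.out.println(map.get(greeting)); //=> Hello!
        System.out.println(map.get(farewell)); //=> Goodbye!

        // A token created elsewhere has the right type but the wrong identity.
        var forged = new Token<String>();
        System.out.println(map.get(forged)); //=> null
    }
}
```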
Putting it back together
I’d like to close with an example where we consciously use object identity, supporting both mutation in the presence of shared references and token-based access to data.
In this case, we have a process (the “producer”) that may produce events of different kinds (“topics”), and some processes (the “consumers”) that are interested in specific kinds of events. We won’t futz with actual parallelism, but the producer and consumers are logically concurrent tasks: as we add more events, the consumers should be able to make further progress.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Objects;
final class Topic<T> {}
final class EventStream {
private final List<Event<?>> events = new ArrayList<>();
public <T> void add(Topic<T> topic, T event) {
Objects.requireNonNull(topic);
this.events.add(new Event<>(topic, event));
}
// Returns an Iterable so that callers can use for-each; every loop gets a fresh iterator.
public <T> Iterable<T> events(Topic<T> topic) {
return () -> new TopicIterator<>(topic);
}
private final class TopicIterator<T> implements Iterator<T> {
private final Topic<T> topic;
private int index = 0;
private TopicIterator(Topic<T> topic) {
this.topic = topic;
}
public boolean hasNext() {
// Skip ahead to the next event on our topic, if there is one.
while (this.index < EventStream.this.events.size()) {
final var value = EventStream.this.events.get(this.index);
if (value.topic() == this.topic) return true;
this.index += 1;
}
return false;
}
@SuppressWarnings("unchecked")
public T next() {
if (!hasNext()) throw new NoSuchElementException();
// The stored topic is identical to ours, so the stored event has type T.
final var event = (T) EventStream.this.events.get(this.index).event();
this.index += 1; // move past the event we are about to return
return event;
}
}
private record Event<T>(Topic<T> topic, T event) {}
}
final class Main {
public static void main(String[] args) {
final var stream = new EventStream();
final var topic1 = new Topic<String>();
final var topic2 = new Topic<Integer>();
stream.add(topic1, "1");
stream.add(topic2, 2);
stream.add(topic1, "3");
for (final var event : stream.events(topic1)) System.out.println(event);
//=> 1, 3
for (final var event : stream.events(topic2)) System.out.println(event);
//=> 2
}
}
Let’s dissect this a bit.
- An `EventStream` object isn’t directly mutable in its own right, but the `ArrayList` it holds a reference to certainly is. We can think of `ArrayList` much like the `Register` class from before: an encapsulation of some pattern of mutability. The `TopicIterator` is also mutable, but this time it’s because it has to be to obey the standard `Iterator` interface.
- A `Topic` is an unforgeable token granting access to events of a particular type. It carries no runtime state or behavior; it exists only to be distinguished from other `Topic`s. The `EventStream` packages each event with its associated `Topic`, and we use that same `Topic` to filter the top-level stream for events of interest8.
  We could have achieved a similar system by replacing `Topic` with, say, `int`, but `int`s can be forged, and they also carry no type information about events. Consumer code would have to explicitly cast for the event type they’re expecting, and there would be nothing preventing the producer from adding events of the wrong type.
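For contrast, here is a rough sketch of the `int`-keyed alternative. Everything in it (`IntEventStream`, `readAsString`, the wrapper class `IntTopicDemo`) is hypothetical and exists only to show the two weaknesses just described: anyone can guess a key, and nothing ties a key to an event type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: int topics can be guessed, and they carry no type information.
class IntTopicDemo {
    static final class IntEventStream {
        private record Event(int topic, Object value) {}

        private final List<Event> events = new ArrayList<>();

        void add(int topic, Object value) {
            events.add(new Event(topic, value));
        }

        List<Object> events(int topic) {
            List<Object> matches = new ArrayList<>();
            for (Event e : events) {
                if (e.topic() == topic) matches.add(e.value());
            }
            return matches;
        }
    }

    // Consumers have to cast, and the cast can fail at run time.
    static String readAsString(IntEventStream stream, int topic, int position) {
        return (String) stream.events(topic).get(position);
    }

    public static void main(String[] args) {
        var stream = new IntEventStream();
        stream.add(1, "secret");
        stream.add(1, 42); // nothing stops the producer from mixing types under one key

        // Any code that guesses the number 1 can read the topic.
        System.out.println(stream.events(1).size()); //=> 2

        try {
            readAsString(stream, 1, 1);
        } catch (ClassCastException e) {
            System.out.println("wrong type"); //=> wrong type
        }
    }
}
```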
Now, what could this be useful for? By its nature, an `EventStream` is something you’ll probably be sharing between multiple components of your system. At the same time, you don’t necessarily want every component to be able to couple to any other component9 – it becomes pretty hard to understand the flow of data through your system otherwise.
With this design, you can configure each component with the topics it should use, and it can only access those topics – the `EventStream` API makes it impossible to interact with any other topics. There’s no discoverability, and each topic is effectively its own little world; it just happens that, behind the scenes, all topics are conveniently collected and ordered in one place.
It’s not always what you want – but sometimes it’s exactly what you want. It’s really useful to have this kind of modeling tool in your toolbox!
1. There’s some subtlety here, as interacting with a public field can be considered a kind of “behavior” supported by the object. I would argue that it’s the “public” half of “public field” that’s doing the heavy lifting.
   When you access a public field, you’re still interacting with the object in a way that depends on its identity and state; the fact that the interaction “looks” different doesn’t change the fact that it’s an interaction.
   If this bothers you, you’re not alone: the Uniform Access Principle suggests that all interactions should have a consistent syntax. Different software environments make different attempts at this ideal, either making methods more like fields or fields more like methods. Java developers often disallow field-like interactions altogether, instead providing the same behavior with explicit getters and setters.
2. This idea is called observational equivalence, and is a great source of test cases. If you can define an object in both an “obvious but inefficient” way and a “complex but efficient” way, you can transfer confidence from the obvious way to the complex way by writing property tests showing that they behave equivalently in context.
   I especially like to do this for algorithms with both iterative and recursive forms, where the recursive form is often “obviously correct” but could potentially blow the stack.
3. I’m leaving `final` off everywhere to make a point, but in a real codebase, I’d be marking all classes as `final` by default. Most classes are not designed for subclassing, and in general I want to clearly document the contract between a base class and its subclasses.
4. We can do this with our trait-style approach as well, and there are even some benefits that way, but traits may already be an unfamiliar-enough design mode that I don’t want it to drown out the rest of the discussion. Maybe for another blog post.
5. This isn’t the time or the place, but one of my biggest pet peeves is when something appears simple but isn’t. Deceptive simplicity lulls you into a false sense of confidence, until something unexpected happens and you’re frantically looking for the faulty assumption. I want my mental model to be as accurate as possible to the reality of the code; any deviations are, in my opinion, simply bugs waiting to happen.
   The original `Counter` is quite compact; before records were added in Java 16, the immutable version would have required adding `final` in order to remove mutation. Similarly for `static` inner classes, `final` classes, …
6. Assuming nobody casts it to a different type parameter. But that’s like casting a `List<Integer>` to a `List<String>` – why would you do that? You’re just going to shoot yourself in the foot! (Ultimately, via a `ClassCastException`.)
   There are some useful reasons to perform this kind of casting. `Optional.empty()` is a good example; internally it casts a statically allocated empty instance to whichever type was requested, since its behavior is the same no matter what type it’s instantiated with. But this trick works precisely because `Optional` is a value-based type: it pretends it has no true identity of its own, so it’s not suitable for use as a token.
7. The question of whether capabilities must support equality comparison is rather interesting. The website for the E language has a good example of where equality is important.
8. In some functional languages, it’s not even necessary to perform a cast to assert that the unknown type variable is equal to some known type. GADTs are one way to do this, though I’m not sure they’d be sufficient for what we’re doing with topics and events. I think Agda, as a dependently-typed language, may be expressive enough to capture such principles.
   In any case, Java isn’t expressive enough to do this for us. As long as we’re confident in the principle of inference at hand, and tightly scope it so it’s reasonably verifiable by manual inspection, we can leverage the same idea. It’s not unlike using an `unsafe` block in Rust.
9. Unless you have a microservices architecture.
   You probably don’t need a microservices architecture. If you need microservices, you’ll know.