Event Sourcing
The persistence model that gives us full traceability, domain meaning, and event-driven communication.
Introduction
Event Sourcing is a different persistence approach: instead of saving the last state of an object, we persist all the domain events that have affected that object throughout its life. This is actually neither an innovative nor a revolutionary way to do it; banks, for instance, have been doing it from the beginning, consciously or not.
The banking example
When you open your bank's webpage and look at one of your accounts, you usually find a table with more or less the following columns:
- Date
- Concept
- Amount (which can be positive or negative)
- Total
The interesting thing here is the last column, the total. Isn't this a calculated column? Isn't this the sum of the different amounts from bottom to top?
So, if you were going to model this problem, you would probably end up doing something like this:
ACCOUNT
---
id
transactions
TRANSACTION
---
id
concept
amount
date
If you analyze it a little more, you will conclude that transactions are things that happen to the account, which is almost the definition of a domain event. The other thing you may conclude is that, by having all the domain events related to an account, you can get the "total" value at any point in time.
This is to say that, by having the domain event stream of an object, you have all the different states of that object. You can instantiate the object in any given state just by "sourcing" all the "events" involved in its history. This is event sourcing.
The event store
So event sourcing consists of storing all the domain events related to the different objects of our domain and then using them to get to the different states of these objects as needed in our applications.
So, the first question would be: where should we store the domain events of an object? And, by the way, we should start calling these generic "objects" entities or aggregates.
The event store is the storage system we use to persist these events. It can be a table on a database like MySQL, or a specific product like EventStore. Anyway, it will have most of the following fields:
- An identifier of the domain event, usually a flavor of UUID.
- An identifier of the stream, which is usually an entity/aggregate id, again usually a UUID.
- A version of the domain event: as code changes from time to time and so do domain events, we store a version so we can handle each event accordingly. You will find more on upcasting below.
- Data: obviously, the domain event will include some kind of data; in the banking example, the concept and the amount. This is usually a serialized string, most of the time JSON.
- Date: the meaning of this field should be obvious, but I would add that, bearing in mind we could have millions of domain events, this date should have microsecond precision.
One of the key things to keep in mind when storing domain events is that the past can't change, and neither do the domain events, so you will not need updates or deletes in your storage system; no database locks either. The event store only needs to support append operations, that is, inserts, and it should also be fast at reading entries grouped by... yes, aggregates (streams, in domain event language).
With this in mind, a simple MySQL table with an index on the aggregate_id and created_at fields should be enough for now; wait a bit to read about vertical and horizontal optimization.
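To make this concrete, here is a minimal sketch of an append-only event store in Python, assuming the fields listed above. The names (StoredEvent, InMemoryEventStore, load_stream) are illustrative, not a prescribed API, and a real implementation would write to MySQL or a dedicated product instead of memory.

import json
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class StoredEvent:
    event_id: str          # identifier of the domain event (a UUID)
    aggregate_id: str      # identifier of the stream (the entity/aggregate id)
    event_type: str        # e.g. "account_created", "transaction_added"
    version: int           # schema version of the event, used for upcasting
    data: str              # serialized payload, usually JSON
    created_at: datetime   # microsecond precision matters here

class InMemoryEventStore:
    """Append-only store: no updates, no deletes, only inserts and reads."""

    def __init__(self) -> None:
        self._events: list[StoredEvent] = []

    def append(self, aggregate_id: str, event_type: str, payload: dict, version: int = 1) -> None:
        self._events.append(StoredEvent(
            event_id=str(uuid.uuid4()),
            aggregate_id=aggregate_id,
            event_type=event_type,
            version=version,
            data=json.dumps(payload),
            created_at=datetime.now(timezone.utc),
        ))

    def load_stream(self, aggregate_id: str) -> list[StoredEvent]:
        # Equivalent to: SELECT ... WHERE aggregate_id = ? ORDER BY created_at
        return sorted(
            (e for e in self._events if e.aggregate_id == aggregate_id),
            key=lambda e: e.created_at,
        )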
Reconstituting: how to get an entity from the domain event stream
So we have all the domain events stored in our new event store. How should we reconstitute -or rehydrate- an entity from this event stream?
Let's work again with the banking example. For the sake of simplicity, we will use the following three domain events:
ACCOUNT_CREATED
---
id: b0ad9dfc-8822-4afc-8313-28d52d7dd1ee
aggregate: f32ca87a-96ce-4819-81a0-ea11a4a180a0
data: { transactions: [] }
...
TRANSACTION_ADDED
---
id: 3b658f8e-fb71-4665-953e-93fa5dea9b78
aggregate: f32ca87a-96ce-4819-81a0-ea11a4a180a0
data: { id: 73805aa9-51cf-481d-b2b3-1fd1084531de }
...
TRANSACTION_ADDED
---
id: 57c8c3d0-51a9-4ad8-93ea-93146ac98cf9
aggregate: f32ca87a-96ce-4819-81a0-ea11a4a180a0
data: { id: e8aef5a5-b981-44f3-8c28-ae3988282a91 }
...
To get to the current balance of the account, we only need to "apply" these three events in the order they were created (chronological order), using their data to create and modify the entity until we reach the end: the last event, which gives us the current state.
So, let's begin with the first one: AccountCreated. AccountCreated could be more complex in a real situation, but for this example, applying an AccountCreated event consists of creating an empty Account object and then setting its id and transactions from the event, which in this case are f32ca87a-96ce-4819-81a0-ea11a4a180a0 and an empty array, respectively. And we have the following account object:
ACCOUNT
---
id: f32ca87a-96ce-4819-81a0-ea11a4a180a0
transactions: []
Then, we apply the first transaction event, TransactionAdded. To apply this kind of event, we just add the id of the transaction to the transactions array of the account object. So now we have the following account state:
ACCOUNT
---
id: f32ca87a-96ce-4819-81a0-ea11a4a180a0
transactions: [
73805aa9-51cf-481d-b2b3-1fd1084531de
]
We do the same with the other TransactionAdded event, ending up with the following account state:
ACCOUNT
---
id: f32ca87a-96ce-4819-81a0-ea11a4a180a0
transactions: [
73805aa9-51cf-481d-b2b3-1fd1084531de,
e8aef5a5-b981-44f3-8c28-ae3988282a91
]
And this is the final state of the account.
We have run this process from start to finish, but we could have stopped at any point, which means we can get any state in the history of the aggregate.
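As a minimal sketch of this reconstitution process in Python, assuming the event shapes from the listings above (the Account class and the apply/reconstitute helpers are illustrative, not a prescribed API):

from dataclasses import dataclass, field

@dataclass
class Account:
    id: str = ""
    transactions: list = field(default_factory=list)

def apply(account: Account, event: dict) -> Account:
    # One applier per event type, modifying the in-memory state.
    if event["type"] == "account_created":
        account.id = event["aggregate"]
        account.transactions = list(event["data"]["transactions"])
    elif event["type"] == "transaction_added":
        account.transactions.append(event["data"]["id"])
    return account

def reconstitute(stream: list) -> Account:
    # Apply every event of the stream in chronological order.
    account = Account()
    for event in stream:
        account = apply(account, event)
    return account

stream = [
    {"type": "account_created",
     "aggregate": "f32ca87a-96ce-4819-81a0-ea11a4a180a0",
     "data": {"transactions": []}},
    {"type": "transaction_added",
     "aggregate": "f32ca87a-96ce-4819-81a0-ea11a4a180a0",
     "data": {"id": "73805aa9-51cf-481d-b2b3-1fd1084531de"}},
    {"type": "transaction_added",
     "aggregate": "f32ca87a-96ce-4819-81a0-ea11a4a180a0",
     "data": {"id": "e8aef5a5-b981-44f3-8c28-ae3988282a91"}},
]
print(reconstitute(stream))  # the Account with both transaction ids, in order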
But… hey, how could I get the list of accounts that have a total greater than $100, paginated and ordered by amount? Should I get all the accounts from the system, reconstitute them, then filter in the application code, and then paginate, and then…? Eh… no, of course not.
The event store is our write model. Maybe we should talk about CQRS, right?
CQRS
CQRS stands for Command Query Responsibility Segregation. In other words, we want to separate reads from writes: every software unit, class, function, module, and even system should either return a value or change the environment, but not both.
Taking this to the extreme, we should have a read model and a write model, that is, a system to store data and a system to read data. In event sourcing, the write model is the event store. But, precisely because the event store is the write model, we should not read from it except when we want to update information. For everything else, we should use the read model. So, what is the read model?
The read model
Since we have a system used only for reading (obviously we will need to write to it to keep it up to date, but those writes don't need to be optimized because we will probably do them asynchronously), we can optimize it for queries.
There are many different options to build a read model, but a document database is a usual choice, as it is more flexible and we don't need structured data, because we already have the write model for that. MongoDB, for instance, is the option I use.
And what should be stored in the read model?
Exactly the information we will need for our queries, exactly with the shape we will use!
So, for the situation we talked about before (the list of accounts that have a total greater than $100, paginated and ordered by amount), we could save a document for each account with the following structure:
ACCOUNT
---
id
total
created_at
last_movement_at
So now, we can just query this document database and get what we want without needing to reconstitute any aggregate from the event store.
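For instance, the query from the question above could look like this with pymongo, assuming a collection named accounts holding documents with the structure just described (the database and collection names are illustrative):

from pymongo import MongoClient, DESCENDING

client = MongoClient("mongodb://localhost:27017")
accounts = client["bank_read_model"]["accounts"]

page, page_size = 0, 20
cursor = (
    accounts.find({"total": {"$gt": 100}})  # accounts with a total greater than $100
    .sort("total", DESCENDING)              # ordered by amount
    .skip(page * page_size)                 # ...and paginated
    .limit(page_size)
)
rich_accounts = list(cursor)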
But you are probably wondering how we maintain this read model database. How do we add or update items?
It's easy: by listening to domain events. Every time a domain event happens, we update the read model (or create new items in it). So, if we listen to an AccountCreated event, we add a new document to the read model; if we listen to a TransactionAdded event, we update the total and the last_movement_at fields. And so on.
These operations can (and probably should) be asynchronous, as long as you push domain events to a queue system like Google Pub/Sub, AWS SQS or RabbitMQ and then consume them from a daemon. Bear in mind that you will have to manage ordering and duplication.
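Here is a minimal sketch of such a listener (often called a projector), assuming a pymongo collection like the one queried above and assuming the TransactionAdded payload also carries the transaction amount (the field names are illustrative):

def project(accounts, event: dict) -> None:
    # accounts is the read model collection; in production this function would
    # be called by a consumer pulling events from the queue.
    if event["type"] == "account_created":
        accounts.insert_one({
            "_id": event["aggregate"],
            "total": 0,
            "created_at": event["created_at"],
            "last_movement_at": None,
        })
    elif event["type"] == "transaction_added":
        accounts.update_one(
            {"_id": event["aggregate"]},
            {"$inc": {"total": event["data"]["amount"]},
             "$set": {"last_movement_at": event["created_at"]}},
        )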
But hey, aren’t you being a little bit tricky here? What happens if we update an entity before the read model gets updated? How can we get the current values to update?
When to use the read model and when to use the write model
TL;DR: use the write model for update or delete operations, and the read model for everything else.
So, if your read model is updated asynchronously, you cannot trust it for any write operation. The source of truth is the event store. So, when you need to update aggregates in your domain, you perform checks and data recovery against the write model. But, since you usually update only one aggregate at a time, this operation is cheap.
Which model should we use to show information to the user? The read one. Could we present outdated information in some situations if we trust the read model? Yes, we could. But bear in mind that this delay, this "eventual consistency", should be a matter of milliseconds. It can be a problem for batch update operations, but it is rarely a problem when showing information to the user because, to begin with, user interfaces are usually slower than that.
If you have a situation where you need some part of the read model to be fully trustworthy, you can simply update that part of the read model synchronously.
Upcasting
Didn't you mention something about upcasting? Yep. Upcasting is what we need to do when we change the structure of an aggregate or a domain event. Since reconstitution applies events based on that structure, if we change it, we need some kind of... transformation.
This is why we store versions. When we read an event with a version lower than the current one, we transform (upcast) it to the next version, and we repeat this until we reach the latest version. This way, the aggregate always uses the latest version and the current reconstitution code always works.
But how do we do this upcasting? It depends.
The most common situation is that we added a new attribute to the event. In that case, upcasting consists of setting a default value for this attribute. When a field is removed, you can just ignore it, because the applier will not read it and nothing will happen. When you change the type of a field, you need to transform it to the new type. At the end of the day, it is a case-by-case transformation.
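As a sketch, imagine a hypothetical version 2 of TransactionAdded that added a currency field: each upcaster moves a payload one version forward, and we chain them until we reach the latest version (all names here are illustrative):

# (event type, from version) -> function returning the next-version payload
UPCASTERS = {
    ("transaction_added", 1): lambda data: {**data, "currency": "EUR"},  # default for the new field
}

LATEST_VERSION = {"transaction_added": 2}

def upcast(event_type: str, version: int, data: dict) -> dict:
    # Chain the upcasters until the payload matches the latest version.
    while version < LATEST_VERSION.get(event_type, version):
        data = UPCASTERS[(event_type, version)](data)
        version += 1
    return data

print(upcast("transaction_added", 1, {"id": "73805aa9-51cf-481d-b2b3-1fd1084531de"}))
# the same payload, now with "currency": "EUR" added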
Event Sourcing for microservices architectures
In microservices architectures, communication between services is key. And this kind of communication should be, most of the time, through messaging: one microservice sends a message to a queue, and another one listens to it. This is asynchronous communication, and it is more reliable than synchronous communication. So, if we need to send a message, what better message than a domain event? And if we send domain events, what could be better than a pattern that has domain events at its core?
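A sketch of publishing a serialized domain event to a queue, here using RabbitMQ through the pika client (the queue name and event payload are illustrative, and any other broker would work the same way):

import json
import pika

event = {
    "type": "transaction_added",
    "aggregate": "f32ca87a-96ce-4819-81a0-ea11a4a180a0",
    "version": 1,
    "data": {"id": "73805aa9-51cf-481d-b2b3-1fd1084531de"},
}

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="domain_events", durable=True)  # survive broker restarts
channel.basic_publish(exchange="", routing_key="domain_events", body=json.dumps(event))
connection.close()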
Performance issues on reconstitution
As I have told you before, reconstituting an entity means applying all the domain events related to this entity in order.
What happens when you have a huge number of events related to the same entity? You have a problem with vertical scalability.
To fix that, we can use snapshots. A snapshot is a copy of the state of an entity at a given moment. So, if you have one million events for the same entity, but you take a snapshot every ten events, to reconstitute the entity you will only need the latest snapshot and the events with a date greater than the snapshot date, which is nine events in the worst case. Problem fixed.
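A sketch of reconstitution with snapshots, reusing the Account, apply and event store helpers sketched earlier (the Snapshot shape is illustrative):

import json
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Snapshot:
    aggregate_id: str
    state: "Account"        # the reconstituted state at snapshot time
    created_at: datetime    # date of the last event included in the snapshot

def load_account(aggregate_id: str, snapshot, event_store) -> "Account":
    # Start from the snapshot state (if any) and apply only newer events.
    account = snapshot.state if snapshot else Account()
    since = snapshot.created_at if snapshot else None
    for event in event_store.load_stream(aggregate_id):
        if since is not None and event.created_at <= since:
            continue  # already folded into the snapshot
        account = apply(account, {"type": event.event_type,
                                  "aggregate": event.aggregate_id,
                                  "data": json.loads(event.data)})
    return account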
But hey, what happens if you have billions of events, no matter the entity, and the indexes start to degrade? Then you have a problem with horizontal scalability.
In this situation, you should split the event store table into a number of tables, using a heuristic on the ids, so that when you write or read you know exactly which table to use. Then, instead of one huge table, you will have a number of small tables, and the indexes will work as always. Fixed again.
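One possible heuristic, sketched below, is a stable hash of the aggregate id that routes each stream to one of N tables (the table naming scheme and shard count are illustrative):

import hashlib

SHARDS = 16

def events_table_for(aggregate_id: str) -> str:
    # The same aggregate id always maps to the same table, for writes and reads alike.
    digest = hashlib.sha1(aggregate_id.encode("utf-8")).hexdigest()
    return f"events_{int(digest, 16) % SHARDS:02d}"

print(events_table_for("f32ca87a-96ce-4819-81a0-ea11a4a180a0"))  # e.g. "events_07"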
Backups?
In each and every system we need backups; that is not the question. The question is what should be backed up. For event sourcing you have two options: back up everything, or back up only the event store. As the read model is a consequence of the write model, you can recover the read model by just reapplying all the events in order.
The problem with this approach is that this recovery can be slower than restoring a regular backup.
Anyway, this property of event sourcing can also be used to fix problems. If you made a mistake and the read model now holds wrong information, you can just create compensating "fix" events, so the read model is updated accordingly. This fixing strategy also has the advantage of being explicit, as you can see the fix in the event store.
Full traceability of everything
I imagine it is obvious by now that one of the advantages of using event sourcing is that you have the traceability of everything happening in the system. You have the full history.
This is especially valuable in a world where data is so important. Understanding how we got to a point is easy when you have the whole event stream of an aggregate -of every aggregate.
Conclusion
Event sourcing may be hard to understand when you start. But I really think it’s a very natural way to think when you are used to it.
It also has its challenges, especially when managing the read model (there are situations where you need very resource-consuming tasks) or when managing a huge event store (a situation where you will probably appreciate a dedicated solution like EventStore).
But it also has very important advantages: the full history of the system, the ability to rebuild the read model from the event store at any time, the performance gained by having dedicated models for reading and writing, and the natural way it integrates with the usual communication patterns of microservices.
The key here, as always, is to know in which kind of project the advantages make the disadvantages worthwhile!