CQRS: separating commands from queries

Introduction

CQRS stands for Command Query Responsibility Segregation and is an evolution of CQS (Command Query Separation), a principle formulated by Bertrand Meyer. The term CQRS is generally attributed to Greg Young.

CQS

CQS basically means that a method should be either a command or a query; in other words, it should either change something or return something, but not both. Asking a question should not change the answer.

Let's see an example, in PHP:

public function getCounter(): int
{
    $this->counter++;
    return $this->counter;
}

This method increases the counter and then returns the new value. But we only wanted to know the value, not to change it. The method both changes state and returns a value, so it violates CQS.

How should we proceed?

public function getCounter(): int
{
    return $this->counter;
}

public function increaseCounter(): void
{
    $this->counter++;
}

Now, we can call each method when needed, without side effects on the getter.

Let's look at another example:

public function createUser(array $userData): int {
    $id = $this->userRepository->create($userData);

    return $id;
}

This pattern is very common when identifiers are generated by the database engine, for example with an auto-increment column. To comply with CQS here, we should generate the ID beforehand and pass it in; UUIDs are the most common choice. We could then rewrite the method like this:

public function createUser(UUID $id, array $userData): void {
    $this->userRepository->create($id, $userData);
}

Now our method only changes something (it adds a new user) and does not return anything.
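As a rough illustration of the caller side, here is a minimal sketch of generating the identifier before issuing the command and reading the data back with a separate query. The `generateUuidV4()` helper and the `InMemoryUserRepository` class are hypothetical names introduced only for this example, and the id is a plain string rather than the `UUID` type used above.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: builds a random (version 4) UUID string.
function generateUuidV4(): string
{
    $bytes = random_bytes(16);
    // Set the version (4) and variant (RFC 4122) bits.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Hypothetical in-memory stand-in for the user repository (assumption).
final class InMemoryUserRepository
{
    /** @var array<string, array> */
    private array $users = [];

    // Command: changes state, returns nothing.
    public function create(string $id, array $userData): void
    {
        $this->users[$id] = $userData;
    }

    // Query: returns data, changes nothing.
    public function find(string $id): ?array
    {
        return $this->users[$id] ?? null;
    }
}

$repository = new InMemoryUserRepository();

// The caller picks the identifier first, so the command has nothing to return.
$id = generateUuidV4();
$repository->create($id, ['email' => 'jane@example.com']);

// A separate query reads the data back with the id we already hold.
$user = $repository->find($id);
```

Because the caller already knows the id, nothing has to come back from the write operation.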

CQRS

So, where is the evolutionary part of CQRS? CQRS goes further and aims to separate the read model from the write model. That means we use a different model for creating and updating information than for reading it.

The reason to do this is that reading often has different needs than writing, so having a different model for each makes it easier to reason about the system. There is also a scalability reason: having different models, or even different storage systems, makes it easier to optimize each one properly; you can even assign different hardware to each model.

Now I have to stop here and remind you that CQRS does not necessarily imply Event Sourcing. You can build a write model without basing its persistence on storing the domain events of aggregates to reconstitute them later. You can save the current state in relational tables in boring database storage systems like MySQL, MariaDB, or PostgreSQL and still do CQRS.

So, how should we optimize the write model?

The write model

When we think about the write model, the first thing to be aware of is that it needs to be reliable. This is where we apply domain validations, transactionality, and any other measures needed to ensure the domain remains valid and consistent.

This is where the concept of a command comes in. A command is an action or a group of actions applied to the write model. It needs to be transactional, so if more than one action is applied, they all succeed or they all fail. And it needs to be valid, so it does not break any domain rule.

Usually, these commands are executed through a command bus, so we have a command/command handler pair. Let's see an example in PHP:

class CreateUserCommand {
    private UUID $id;
    private string $email;
    private string $password;

    public function __construct(UUID $id, string $email, string $password) {
        // Validation
        if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
            throw new \InvalidArgumentException("Email $email is not valid");
        }
        if (strlen($password) < 8) {
            throw new \InvalidArgumentException("Password must be at least 8 characters long.");
        }

        $this->id = $id;
        $this->email = $email;
        $this->password = $password;
    }

    // Read-only accessors used by the handler
    public function id(): UUID { return $this->id; }
    public function email(): string { return $this->email; }
    public function password(): string { return $this->password; }
}

class CreateUserCommandHandler {
    private UserRepository $userRepository;

    public function __construct(UserRepository $userRepository) {
        $this->userRepository = $userRepository;
    }

    public function handle(CreateUserCommand $command): void
    {
        // More validation: the id must not be taken already
        if ($this->userRepository->has($command->id())) {
            throw new \InvalidArgumentException("A user with the provided id already exists.");
        }

        $this->userRepository->save($command->id(), $command->email(), $command->password());
    }
}

In this situation, the command would be run like this:

$this->commandBus->handle($command);

The command bus can work out which handler to execute through a naming convention or any other strategy. It opens a transaction before handling the command and commits it if the handler succeeds; in case of an error, it rolls back. So we get transactionality from the command bus and validation from the command and its handler.
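To make the bus behaviour more concrete, here is a minimal sketch of a command bus that resolves handlers by naming convention (`FooCommand` goes to `FooCommandHandler`) and wraps the handling in a transaction. Every name here (`Transaction`, `RecordingTransaction`, `SimpleCommandBus`, `RenameUserCommand`) is an assumption made for the example; a real bus would typically delegate the transaction to the database connection or use an existing library.

```php
<?php
declare(strict_types=1);

// Hypothetical transaction abstraction; a real one would wrap a DB connection.
interface Transaction
{
    public function begin(): void;
    public function commit(): void;
    public function rollBack(): void;
}

// Records calls so the flow can be observed without a database.
final class RecordingTransaction implements Transaction
{
    /** @var string[] */
    public array $log = [];

    public function begin(): void { $this->log[] = 'begin'; }
    public function commit(): void { $this->log[] = 'commit'; }
    public function rollBack(): void { $this->log[] = 'rollback'; }
}

final class SimpleCommandBus
{
    /** @var array<string, object> handlers indexed by their class name */
    private array $handlers = [];
    private Transaction $transaction;

    public function __construct(Transaction $transaction)
    {
        $this->transaction = $transaction;
    }

    public function register(object $handler): void
    {
        $this->handlers[get_class($handler)] = $handler;
    }

    public function handle(object $command): void
    {
        // Naming strategy: FooCommand is handled by FooCommandHandler.
        $handlerClass = get_class($command) . 'Handler';
        if (!isset($this->handlers[$handlerClass])) {
            throw new \RuntimeException('No handler registered for ' . get_class($command));
        }

        $this->transaction->begin();
        try {
            $this->handlers[$handlerClass]->handle($command);
            $this->transaction->commit();
        } catch (\Throwable $e) {
            // Any failure undoes everything the handler did, then propagates.
            $this->transaction->rollBack();
            throw $e;
        }
    }
}

// Toy command/handler pair used only to exercise the bus.
final class RenameUserCommand
{
    public string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

final class RenameUserCommandHandler
{
    public function handle(RenameUserCommand $command): void
    {
        if ($command->name === '') {
            throw new \InvalidArgumentException('Name cannot be empty.');
        }
    }
}

$transaction = new RecordingTransaction();
$bus = new SimpleCommandBus($transaction);
$bus->register(new RenameUserCommandHandler());

$bus->handle(new RenameUserCommand('Jane')); // succeeds: begin, commit

try {
    $bus->handle(new RenameUserCommand('')); // fails: begin, rollback
} catch (\InvalidArgumentException $e) {
    // The bus rolled back and re-threw the handler's exception.
}
```

Keeping the transaction in the bus means individual handlers do not have to repeat the begin/commit/rollback logic.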

You may notice I left out the domain layer. That is only because this is an example; in a real system we should have a User aggregate with all the needed validations inside, in addition to those in the application command layer, and both layers should be consistent.

Now, where is this data stored? Well, it depends on your persistence decisions. If you are practicing event sourcing, you should probably use a solution like EventStore; if you just store the latest state of the aggregates relationally, a storage system like MySQL should work for you.

The read model

Now, let's think about the read model. It doesn't need the validations and transactionality of the write model, because all of that was already done when writing; we only need to copy the information to the read model.

There are several ways to copy the information. If you are using event sourcing, you would probably listen to the events, synchronously or asynchronously, and update the read model from them. If not, you could do it in the command handler, with synchronous listeners, or even by sending messages to a queue to do it asynchronously; the latter is less common because, at that point, you might as well be doing event sourcing.
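As an illustration of the synchronous-listener option, here is a minimal sketch in which the write side dispatches an event and a listener copies the data into the read model. `UserWasCreated`, `SyncEventDispatcher`, and `UserListReadModel` are hypothetical names for this example, and a plain array stands in for the read-side storage.

```php
<?php
declare(strict_types=1);

// Hypothetical event raised by the write side after a user is stored.
final class UserWasCreated
{
    public string $id;
    public string $email;

    public function __construct(string $id, string $email)
    {
        $this->id = $id;
        $this->email = $email;
    }
}

// Minimal synchronous dispatcher: listeners run in the caller's process.
final class SyncEventDispatcher
{
    /** @var array<string, callable[]> */
    private array $listeners = [];

    public function listen(string $eventClass, callable $listener): void
    {
        $this->listeners[$eventClass][] = $listener;
    }

    public function dispatch(object $event): void
    {
        foreach ($this->listeners[get_class($event)] ?? [] as $listener) {
            $listener($event);
        }
    }
}

// Read model: an array standing in for the read-side storage (assumption).
final class UserListReadModel
{
    /** @var array<string, array{id: string, email: string}> */
    private array $documents = [];

    public function onUserWasCreated(UserWasCreated $event): void
    {
        $this->documents[$event->id] = ['id' => $event->id, 'email' => $event->email];
    }

    public function all(): array
    {
        return array_values($this->documents);
    }
}

$readModel = new UserListReadModel();
$dispatcher = new SyncEventDispatcher();
$dispatcher->listen(UserWasCreated::class, [$readModel, 'onUserWasCreated']);

// A command handler would dispatch this right after saving to the write model.
$dispatcher->dispatch(new UserWasCreated('user-1', 'jane@example.com'));
```

Replacing the dispatcher with one that publishes to a queue would give the asynchronous variant, at the cost of the read model lagging behind the write model.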

And how should the information be structured? If you want to optimize for reads, store it in exactly the shape you need to read. Then you only have to query and display, with no joins or transformations in between. This is usually done with a document-based database system.

Where should I store the information? If you use a document-based storage system, MongoDB or Elasticsearch could be a very good option. You can still use a relational database such as MySQL or PostgreSQL and rely on its JSON field types.
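To show what "query and display" can look like, here is a minimal sketch of a read-side store that keeps one ready-to-render JSON document per user. `UserProfileViewStore` and the fields in the document are hypothetical; the internal array stands in for a document collection or a JSON column.

```php
<?php
declare(strict_types=1);

// Hypothetical read-side store: one JSON document per user, keyed by id.
final class UserProfileViewStore
{
    /** @var array<string, string> JSON strings indexed by user id */
    private array $rows = [];

    public function put(string $id, array $view): void
    {
        $this->rows[$id] = json_encode($view, JSON_THROW_ON_ERROR);
    }

    public function get(string $id): ?array
    {
        if (!isset($this->rows[$id])) {
            return null;
        }

        return json_decode($this->rows[$id], true, 512, JSON_THROW_ON_ERROR);
    }
}

$store = new UserProfileViewStore();

// The projection writes exactly what the profile page will display.
$store->put('user-1', [
    'email' => 'jane@example.com',
    'displayName' => 'Jane',
    'orderCount' => 3,
]);

// Reading is a single lookup by key: no joins, no transformations.
$profile = $store->get('user-1');
```

The trade-off is that any value shown on the page, such as the order count here, must be kept up to date by whatever process feeds the read model.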

Advantages of using CQRS

You can configure different hardware, and scale each model independently, so you can put more power into writing or into reading depending on the nature of the project.
Keeping the concerns separate makes each model simpler than a single merged one: easier to reason about and easier to maintain.
Security: only domain entities change the source of truth, that is, the write model.

Disadvantages of using CQRS

If you use two different database storage systems, you need the knowledge and the resources to run both.
It adds complexity in terms of modeling, as you need to think about how to optimize two different layers.
Syncing both models can be tough, especially if you go async, because eventual consistency then comes into play.

Conclusion

Applying CQRS is not easy. No good practice is. But the advantages tend to outweigh the disadvantages.
