Replication and Automatic Lag Tracking

Replication (vertical scaling) and microsharding (horizontal scaling) are the two key features that define Ent Framework—without them, the library would lose its core purpose.

In this article, we’ll focus on replication.

“Replication” means you can write data to a single database machine and, after a short (but noticeable) delay, read the same data from one or more replica machines. PostgreSQL’s built-in replication ensures that all data written to the master database eventually appears on every replica.

There are 2 main reasons why replicatiomn has to be used in pretty much every serious service:

Fault tolerance. The service should survive (ideally with no downtime) in case one single database node goes down. So for every database, you must have at least 2 copies on 2 independent hardware nodes.
Scaling reads. In most of the projects, there are way more reads than writes happen. Since we need replicas for fault tolerance anyways, it makes sense to also use them for read queries.

By data propagation method, there are asynchronous and synchronous replication approaches. And by node roles, there are singe-master and multi-master configurations. They all have different trade-offs. In this article, by "replication" we mean the most popular setup: "asynchronous single-master replication".

Terminology

Before we continue, let's agree on a common terminology.

Master and Replica: you commit data to the master node, and it eventually appears on all replica nodes. Transactions are remained atomic: multiple rows committed in one transaction to the master also appear "simultaneously" at replicas.
Replication lag: time passed between the moment the data is committed to the master and the moment when this data can be read from a replica. Each replica has its own replication lag, since they all replay transactions from the master independently.
Read-After-Write Consistency: if you write data in some "context" and then can read it back immediately in the same context, the read-write API is called "read-after-write consistent". Of course, "write to master, read from replica" workflow is not read-after-write consistent (but "write to master, read from master" is). Despite that, Ent Framework's API is read-after-write consistent (we'll discuss it in details below).
Eventual consistency: you write data, and then eventually, after some delay (possibly large), you can read it back. "Write to master, read from replica" is an example of an eventually consistent workflow (which is not read-after-write consistent).
Write-Ahead Log (WAL): when you commit data to the master node, transactional databases (like PostgreSQL) first write it to a special "append-only" file called WAL. Once it's done, they save the rows to the database files. (In practice it's way more complicated, but for simplicity, we can stop on the simple definition.) WAL is also replayed on all replicas, so it's guaranteed that the replicas eventually follow the master bit by bit.
Log Sequence Number (LSN): on master, a position in WAL after some transaction commit; on replica, a position in WAL up to which the replica has already replayed the commits from master. To check that a replica is "good enough" for reading the data previously written to the master, you can compare the replica's current LSN with the master's LSN after the write: if it's greater or equal, then you'll read the data back.

Setting up Replication in PostgreSQL

Ent Framework is just a client library, which means that you need to configure PostgreSQL replication before we continue.

You have 2 options:

Use low-level tools like repmgr or Patroni to connect your master DB with your replica DBs.
Pay more money and use a PaaS solution like AWS RDS for PostgreSQL or AWS RDS Aurora. They have replication set up out of the box.

Replication Lag

OK, you have a master database where you write the data to, and you have replica databases where you read from. It's just that simple, right?

Not so fast.

Consider the following code:

await query(MASTER, "INSERT INTO comments(text) VALUES('Hello')");
return res.redirect("/comments");

And on your /comments page:

const comments = await query(REPLICA, "SELECT * FROM comments");
return res.render("comments.tpl", { comments });

Unfortunately, you won't see the just-added comment on that rendered page, because there is a replication lag issue: the data written to a MASTER DB doesn't appear on the REPLICA DB immediately, there is 10-500 ms latency (and sometimes more, it depends on the database load, network stability etc.).

This issue appears independently on the database engine you use, be it Aurora, RDS or vanilla PostgreSQL replication. The only difference between the engines is the average lag duration, but the lag always exists.

To solve the replication lag issue, there are 2 options:

Read from the master DB. The question is, how do we know, should we read from master or from replica at a particular moment.
Read from replica, but if the data is not yet there, wait a bit and retry. If there is no luck, fallback to master. The main question here is how do we understand that "the data is not there yet".

Addressing replication lag problem improperly can quickly turn your codebase into a boilerplate mess.

Luckily, Ent Framework takes care of this all automatically. In most of the cases, you don't need to think about the replication lag at all: the engine will choose, should it read from master or from replicas, transparently for your code.

Cluster Configuration

First, you need to tell Ent Framework, where can it find the master database and all replicas:

export const cluster = new Cluster({
  islands: async () => [ // sync or async
    {
      no: 0,
      nodes: [
        { name: "pg-001a", host: "pg-001a.your-domain.com", ... },
        { name: "pg-001b", host: "pg-001b.your-domain.com", ... },
        { name: "pg-001c", host: "pg-001c.your-domain.com", ... },
      ],
    },
  ],
  createClient: ({ name, ...config }) => new PgClient({ name, config }),
  ...,
});

Notice that we don't tell it, what endpoint is master and what endpoints are replicas: Ent Framework will detect it automatically.

In fact, master and one of replicas may switch roles in real time (when you do some PostgreSQL maintenance, or when a master node fails, and you promote a replica to be the new master). Ent Framework handles such switches automatically and with no downtime.

AWS RDS Writer and Reader Endpoints

If you use Amazon's RDS or Aurora, it provides you with 2 hostnames:

Writer (master) endpoint. When there is an outage on the master node, RDS automatically promotes one of the replicas to be a new master, and changes the writer endpoint routing to point to the new master.
Reader (random replica) endpoint. If there are multiple replicas in the cluster, RDS routes the connections to a "random" replica (i.e. it's unpredictable, to which one).

From the first glance, it looks like having just 2 endpoints is a pretty useful feature. There are several downsides though:

Writer endpoint switch latency: if there is a master outage, then, even after the new master is promoted in the cluster, the writer endpoint switches to it not immediately: there is some artificial latency,
Reader endpoint routing is unpredictable: often times, one replica can already be "in sync" with the master (relative to the current user; we'll talk about it a bit later), whilst another replica is not yet. The engine like Ent Framework needs to know exactly, which replica does it connect to, to properly track its replication lag and metrics.

So, although you can use writer and reader endpoints in your Cluster instance (especially when you don't need Ent Framework's built-in mechanism for replication lag tracking), it's discouraged. Instead, you'd better tell the engine the exact list of nodes in the cluster, and let it decide the rest.

In Ent Framework, you can even modify the list of nodes in real time, without restarting the Node app. I.e. if you have a periodic timer loop that reads the up-to-date list of cluster nodes and returns it to Ent Framework, it will work straight away and with no downtime. Nodes may appear and disappear from the cluster, and the master may switch roles with replicas: Ent Framework will take care of it all and do the needed transparent retries.

This is why in Cluster configuration, the list of islands (nodes) is returned by a callback. You can tell this callback to return a different list once the cluster layout changes:

export const cluster = new Cluster({
  islands: async () => [ // <-- sync or async callback
    {
      no: 0,
      nodes: [
        {
          name: "abc-instance-1",
          host: "abc-instance-1.abcd.us-west-2.rds.amazonaws.com",
          ...,
        },
        {
          name: "abc-instance-3",
          host: "abc-instance-2.efgh.us-west-2.rds.amazonaws.com",
          ...,
        },
      ],
    },
  ],
  ...,
});

Automatic Replication Lag Tracking

Once you set up the Cluster instance, Ent Framework is able to automatically discover, which exact node is master and what nodes are replicas.

Imagine you run the following series of calls:

await EntComment.insert(vc, { ... });
... // short delay (like 10 ms)
const comments = await EntComment.select(vc, {...}, 100); // <-- master or replica?

The 1st call will be executed against the master node, but will a replica be used for the 2nd call? No, it won't: the 2nd call will also run against the master node. 10 ms is a too short time interval for the replica to receive the update from master. If it was queried from a replica, we would not receive the just-inserted comment in the list of all comments returned by select() call.

Ent Framework knows that in should use the master for reading, because for the VC used, it remembers the LSN (write-ahead log position) after each write. For replicas, it also knows their LSNs, so before sending a query to some replica, Ent Framework compares the master LSN at the time of the last write in this VC with the LSN at the replica.

Timelines, Einstein and Special Relativity

So, Ent Framework provides a "read-after-write consistency" guarantee within the context of the same VC's principal.

The context within which a read-after-write consistency is guaranteed is called a Timeline. Timeline is a special property of VC which remembers, what were LSNs on the master node after each write to each microshard+table. It's like a temporal state of the database related to the operations in a particular VC (basically, by a particular user).

Here is a physics analogy to help you better understand, what a timeline is: frame of reference in special relativity. It is well known that the order of 2 events happened in one frame of reference is not necessarily the same as the order of the same exact events in another frame of reference. E.g. events "light bulb A blinked" and "bulb B blinked" separated by 1 mln miles may happen at the same time in one frame of reference, or "first A then B" in another frame or reference, or "first B then A" in a 3rd frame of reference. The order is strictly defined only in case when the light (the fastest speed of signal propagation possible) is able to travel between A and B (then, it will be "first A then B"). I.e. some information needs to be passed from A to B, and only then we can tell for sure that "B happened after A" and not vice versa.

The same thing applies to timelines in Ent Framework: read-after-write consistency is only guaranteed within the same timeline. Also, one timeline can send a "signal" to another timeline propagating the knowledge about the change (which is called "causality"). After that signal is received, the read-after-write consistency will apply across those timelines.

Propagating Timelines via Session

Consider the following pseudo-code:

app.post("/comments", async (req, res) => {
  await EntComment.insert(req.vc, { ... });
  req.session.timelines = req.vc.serializeTimelines();
  return res.redirect("/comments");
});

app.get("/comments", async (req, res) => {
  req.vc.deserializeTimelines(req.session.timelines);
  const comments = await EntComment.select(req.vc, {...}, 100);
  return res.render("comments.tpl", { comments });
});

The browser sends a POST /comments request, so a new comment is inserted in the database, and the browser is immediately redirected to a GET /comments endpoint. Since we serialize all VC's timelines in the POST endpoint ("1st frame of reference") and then deserialize them in the GET endpoint ("2nd frame of reference"), the second VC receives a "↯-signal" from the first VC, and it establishes a strong read-after-write consistency between them. Thus, the 2nd request will be served by the master node and read the recent data.

Independent Timelines Use Case

Notice that the above way of timelines propagation (via session) only works in the context of a single user (single session), when we're able to send a "↯-signal" from the write event to the read event moments.

Now let's see what happens when we have two independent users, Alice and Bob.

Alice calls POST /comments and adds a comment to the master database.
Immediately "after" that, Bob calls GET /comments to see the list of comments.

Since the timelines of Bob are in another "frame of reference" than Alice's, and we did not send any signal from Alice to Bob, the request will likely be served from a replica node (not from the master), which means that Bob will likely see the old data.

The same way as there is no absolute sequentiality in special relativity, there is also no guarantee regarding read-after-write consistency between different timelines. And it is generally fine: we don't care whether Bob loaded the old or the new data. Even if he is lucky and got the new data, he could instead have had just a little higher network latency, or pressed Reload button a little earlier, so he could have seen the old data even in the case there was no replicas in the cluster at all, and all requests would have been served by the master node only.

Propagating Timelines via a Pub-Sub Engine

There are still cases where we want one user to immediately see the data modified by another user, i.e. establish some cross-user read-after-write consistency.

If we think about it, we realize that it happens only in one use case: when a data modification made by Alice causes other users (Bob, Charlie etc.) to "unfreeze and re-render". I.e. we must already have a transport to propagate that "fanout-unfreeze" signal. So all we need is to just add a payload (with serialized timelines) as a piggy-back to this signal, and then, Bob, Charlie etc. will establish a read-after-write consistency with Alice's prior write.

// Ran by Alice who adds a comment.
app.post("/:topic_id/comments", async (req, res) => {
  const topicID = req.params.topic_id;
  const commentID = await EntComment.insert(req.vc, { topic_id: topicID, ... });
  await pubSub.publish(topicID, {
    commentID,
    timelines: req.vc.serializeTimelines(),
  });
});

// Ran by each user (Bob, Charlie etc.) to receive updates related
// to a particular topic (rough pseudo-code).
wsServer.on("subscribe", (ws, message) => {
  return pubSub.subscribe(message.topicID, async (payload) => {
    ws.vc.deserializeTimelines(payload.timelines);
    const comments = await EntComment.select(
      ws.vc,
      { topic_id: message.topicID, ... }, 
      100,
    );
    ws.send(comments);
  });
});

VC's method deserializeTimelines() merges the received timelines signal into the current VC's timelines. You can call call it as many times as needed, when you receive a pub-sub signal.

What Data is Stored In a Timeline

VC timelines are basically an array of the following structures:

shard: microshard number where a write happened;
table: the name of the table experienced a write in that microshard;
LSN: a write-ahead log position after the above write;
expiration time: a timestamp when the above information stops making sense, so Ent Framework can forget about it (typically, 60 seconds; defined in maxReplicationLagMs option of PgClient).

There is no magic here: to propagate the minimal read-after-write consistency signal, we must know, which table at which microshard experienced a write.

Notice one important thing: since there are no JOINs in Ent Framework, we read data from different microshards and track their timelines independently. That allows to assemble a read-after-write consistent snapshot from multiple microshards when reading.

Forced Master

Besides the automatic replication lag tracking, you can also tell Ent Framework explicitly that you want to use only the master nodes for some particular calls.

const topic = await EntTopic.loadX(
  vc.withTransitiveMasterFreshness(),
  topicID,
);
const comments = await EntComment.select(
  topic.vc,
  { topic_id: topic.id },
  100,
);

Calling to vc.withTransitiveMasterFreshness() derives you a new VC that, when used, will force the utilization of a "special" timeline. All the Ents loaded in this VC will be read from the corresponding island's master node. This mechanism is called "VC freshness"; in our case, vc.freshness equals to MASTER (the default freshness is null meaning "use timelines tracking engine").

Also, in the example above, topic.vc keeps master freshness, so any other Ents loaded with it will be read from master nodes as well. This is why it's called "transitive": once enabled on a VC, the freshness propagates to the further loaded Ents if you use your Ent's vc property. Be careful to not abuse the VC master freshness, otherwise you may introduce bottlenecks in your app's performance.

Forced Replica

You can ask Ent Framework to always use a replica for a particular call:

const topic = await EntTopic.loadX(
  vc.withOneTimeStaleReplica(),
  topicID,
);
const comments = await EntComment.select(
  topic.vc, // <-- it does NOT remember withOneTimeStaleReplica()!
  { topic_id: topic.id },
  100,
);

Notice that the replica freshness is not transitive ("one time"): in the example above, you tell Ent Framework to load EntTopic with it, but topic.vc will have a regular (default) freshness. I.e. comments will be loaded using the regular timelines engine (from a replica if there were no recent writes to EntComment, or from master if there were).

To highlight that you may likely read out-of-date rows from the database (replica is always lagging behind master), the method is named withOneTimeStaleReplica(): notice the word "stale".

If you try to write some Ent using a VC with STALE_REPLICA freshness, then an interesting thing will happen: the write will still go to the master node, but the Ent's timeline won't remember that. This is convenient when you want to have some "insignificant write in background": i.e. you need your write to not affect the further reads going to replicas.

Conclusion

In the examples above, we called serializeTimelines() and deserializeTimelines() methods manually on every endpoint. In real projects though, you most likely don't want to call them explicitly, since it produces too much boilerplate in the code.

Instead, a recommended approach is to embed the above calls into your higher-level framework. E.g. a middleware can call deserializeTimelines() very early in the request processing lifecycle (and for all requests), and another middleware may call serializeTimelines() right before the response is flushed back to the browser. You may also want to store the timelines not in the session, but in some other ephemeral storage, like Redis. Each framework has its own way of processing the requests, so it's up to you, how you want to use those low-level Ent Framework methods.

PreviousEnt API: Configuration and Types NextSharding and Microsharding

Last updated 5 months ago

Was this helpful?