N+1 Selects Solution
To reveal some magic, could you please make a small favor?
Stop thinking in terms of lists when loading. Always think in terms of an individual row/object and an individual ID. Not in terms of an array of IDs:
It sounds contradictory. In the example above, if we always use loadComment(id)
, how do we avoid sending too many queries to the database, especially when it comes to loading children records for each loaded parent? (This problem is well known as "N+1 Selects".)
The answer is: let the DB access engine take care of batching.
Traditional List Based Approach
Imagine we have some list of comment IDs shown on the screen. For each comment, we want to load its creator, the owning topic, and for each topic, load its creator too. Then, return it all as a JSON to the client.
Of course we want to send as few SQL queries to the database as possible to minimize connections utilization and round-trip latency. We also do not want to use JOINs (imagine loadUsers()
, loadTopics()
and loadComments()
live in independent modules and don't want to know about each other, plus the data lives in different microshards).
First, let's see, what will happen if we think in terms of "load a list of things" abstraction. This is how people used to fight the "N+1 Selects" problem in the past.
Look at this spaghetti mess. The code appears very coupled.
The root of the problem here is clear: we think in terms of the lists, and the code encourages us to "accumulate" lists manually.
Ent Framework Approach: Automatic Batching
Now let's see what happens if we stop thinking in terms of lists and, instead, switch to "per individual object" paradigm.
All calls to uniq()
, keyBy()
and map()
are gone. We now use only loadX(vc, id)
which accepts an individual ID and returns an individual Ent.
And still, it runs only 3 SQL queries under the hood:
Batching: Ent Framework recognizes that the
loadX()
calls happen in concurrent Promises and batches them together intelligently.Coalescing: in case multiple
loadX(vc, id)
try to load the same Ent by the same ID, Ent Framework coalesces those calls into one.Caching: if enabled, an Ent loaded in some VC remains in the VC's cache, so next time it's attempted to load again, the Ent is returned from the cache directly. Ents are immutable JS objects, so it simplifies things even further.
To learn more about batching, "parallel Promises", and how event loop works in Node, check out Loaders and Custom Batching article.
Helper Loading Methods
Each Ent is an immutable object, which means that you can't change its fields after loading from the DB. But you can add helper methods to simplify things like loading.
Let's optimize the above example even further by adding topic()
and creator()
helper methods into Ent classes directly.
Now it's responsibility of each Ent to load the related data.
This will, as previously, produce the same exact 3 DB queries:
In traditional ORMs, such helper loading methods are added to the classes automatically. Ent Framework doesn't do it and requires you to write a bit of boilerplate. Why? For general purpose use cases, we may need not one, but 2 method for each field, like creator()
and creatorNullable()
, which is not elegant. This is because foreign keys do not work reliably enough across microshards, so in some cases, we should always be ready that some Ent is not in the database, even when its field is technically non-nullable. Luckily, in practice, it is not hard at all to add such methods manually, so we don't lose too much here.
Batching vs. JOINs
In traditional SQL and in many ORMs, people use JOINs to minimize the number of queries they send to the database engine. Despite the JOINs having advantages, they are also problematic:
One cannot do JOINs across microshards or machines.
JOINs encourage people to write highly coupled code, similar to the 1st example on this page.
JOINs generally can't run their subqueries in parallel.
Ent Framework's automatic batching can be treated as an alternative to JOINs. It doesn't have any of the above problems, plus (and more importantly), the calls are batched across the entire async functions call stack, which means that you can split the code into independent abstraction layers easily.
Stop thinking in terms of lists. Start thinking in terms of an individual Ent and its behavior.
Last updated
Was this helpful?