Ent API: select() by Expression
Last updated
Was this helpful?
Last updated
Was this helpful?
The previous chapters explained how to load an Ent by its ID: load*()
API. Loading by ID is the most basic operation, and it is usually the most common one in the code as well. Now, let's talk about some more complicated ways of loading.
TL;DR:
Below, there will be a bit of theory, fasten your seatbelt.
In graph terms, where each Ent is a node, an Ent's field that points to the ID of another Ent represents an edge. (Or, in relational databases, people typically use "foreign key" term.) We often refer to it as field edge; traversing such edges is typically straightforward: you simply load another Ent by the ID obtained from a field of the current Ent. For example, EntComment#topic_id
or EntTopic#creator_id
are field edges.
From a different perspective, traversing a field edge can be seen as "going from a child Ent to a parent Ent" (for example, from EntComment
to its owning EntTopic
). In other words, it’s a child-to-parent traversal, or a many-to-one relationship:
Nothing too new yet, right? Just a regular relational theory so far.
How do we go in the opposite direction, performing a parent-to-children traversal in a one-to-many relationship?
To accomplish this, Ent Framework provides (surprise!) a select()
primitive. It allows you to fetch Ents from the database using any arbitrary expression, including those that specify constraints on which parent Ent's ID the selected Ents should have:
In production databases with millions of Ents, it's assumed that the relevant table has a necessary index to run such queries efficiently; in the above example,
Nothing new again. Or there is something?..
Let's think about those ???
on the diagram. To traverse edges in a graph in both directions, the edges must be bi-directional (or, there should be pairs of edges, which is the same). In the graph with bi-directional edges we discussed earlier, the child-to-parent direction of an edge is represented by an "Ent field edge". But what corresponds to ???
, the opposite parent-to-children direction of that edge?
This ???
, dear friends, is the automatic database index (or an index prefix, which is topic_id
in the example). In fact, as we hinted above, without such an index, the queries will just blow up.
This distinction between graph edge directions is crucial to understand: for free traversal of the graph, both field edges and indexes are absolutely essential.
By defining a DB foreign key on an Ent, you define a field edge, which represents child-to-parent direction in the graph.
By defining a DB index, you define the opposite direction of that edge, which is parent-to-children direction.
Modern database engines are pretty good at managing indexes. You can add them without acquiring write locks on the tables (CREATE INDEX CONCURRENTLY
), and you can also add more field edges (aka fields with foreign keys) on a table with no downtime, to refer some other Ent from an existing one.
The point of view described above works straightforwardly when your database is monolithic. Scaling your app introduces more complexity though due to the involvement of microshards in the traversal process.
Luckily, we can still rely on the parent-to-children indices mainly. And this is where the theory pays off.
When loading children of a parent Ent, the children might be distributed across multiple microshards. A naive way would thus be to just query all microshards using the exact same query (Ent IDs are globally unique) and then merge the results, but of course it would blow up the DB nodes.
For now, all you need to know is that there is a magical subsystem in Ent Framework called Inverses which, given a parent ID (e.g. EntTopic ID), returns the list of microshards where the children Ents (e.g. EntComment) may or may not reside. This "may not" is important: cross-shard writes are not transactional, so sometimes (rarely), slightly more candidate microshards may be returned, but never less. In reality it produces no problems for business logic: the "excess" microshards, when queried, will just return 0 children Ents.
The select()
API uses a simple query language.
If a plain object is passed, it combines all the specified field constraints using an AND operation:
The full list of operations include:
equality and "one of array element" implicit operators (see examples above)
logical: $or
, $and
, $not
binary: $lte
, $lt
, $gte
, $gt
$overlap
(useful for array fields, typically backed by a PostgreSQL GIN index)
$isDistinctFrom
(for NULL-safe comparisons)
$literal
(to run a custom SQL sub-expression)
These operations can be nested in any way, but it's important to ensure that the actual SQL engine uses an appropriate database index for efficiency.
If your project uses microsharding, one of the top-level fields in the select()
expression must match a parent ID or an array of parent IDs to help Ent Framework identify the relevant microshards. Notice that we used topic_id
for this purpose in the example above. There’s no magic here: sometimes, it has to determine, which microshards are involved. Alternatively, you can use the special $shardOfID
operator to explicitly provide this hint in the query.
For illustrative purposes, below is a giant select()
expression from one of Ent Framework's unit tests. It is generally obvious, how the operations work (as opposed to e.g. Elasticsearch query language BTW):
As everything in Ent Framework, when multiple select()
calls run in parallel, they are batched into one giant SQL UNION ALL
query.
The following code will produce only one SQL query:
SQL query produced under the hood (with some simplifications):
Sometimes, select()
calls are meant to be relatively slow, and we don't want to batch them; instead, we prefer to run them in parallel, in different DB connections. To do so, you can just inject an "event loop spin" barrier:
In addition to database-independent features, select()
call also supports engine-specific customizations using its last optional argument:
Read more in:
Therefore, before Ent Framework executes the actual SELECT queries in parallel on multiple nodes to merge their results later, it first determines the minimal set of microshards that needs to be queried; in the vast majority of cases, this is just one microshard. Those mechanisms are known as and correspondingly, and we’ll explore them in detail later, in advanced sections.
For more details, see TypeScript Where<...>
definition in .