Inverses and Cross Shard Foreign Keys
Last updated
Was this helpful?
Last updated
Was this helpful?
We have already touched the topic of inverses and loading Ents across multiple microshards in Ent API: select() by Expression article. We also noted that in many cases, it's better to colocate "related" Ents in one microshard: Shard Affinity and Ent Colocation.
Now, it's time to discuss how inverses work in details.
Let's first build a pretty artificial "family" of the Ents (EntUser—EntTopic—EntComment), where each Ent is created in a random shard at insert time. (In real life, you'll likely want most of your Ents to be to their parents, but for the best illustration, we'll make the opposite assumption).
Notice the following:
shardAffinity=[]
for all of the above Ents. It means that, at insert time, the target microshard will be chosen randomly.
There is inverses
configuration property, which tells Ent Framework, how it can find children Ents located in other microshards than the parent Ents (e.g. how to find all topics related to a particular creator).
Before we continue, let's look at the tables structure we need to have in all microshards of the database.
Tables users
, topics
and comments
are not much special, except that their *_id
fields are not declared as FOREIGN KEY
. No surprise: there can be no SQL-enforced foreign keys across microshards, so we just keep the fields being of the regular bigint
type. We still define indexes for those fields though, for faster selection.
Now, notice the inverses
table. It is treated by Ent Framework in a special way.
When you e.g. insert an EntTopic row, Ent Framework first chooses the target microshard randomly and then creates a row in inverses
table in the parent's shard. It then inserts the row to the destination microshard. Thus, Ent Framework remembers, what are the children microshards (encoded in inverses.id2
) for each parent ID (inverses.id1
).
Probably the simplest way to understand inverses is to look at a particular example, what's inserted and where when creating the Ents.
Consider the following "family" Ents creation.
Internally, Ent Framework will run the following SQL queries (pseudo-code):
Notice that, because of { name: "inverses", type: "topic2creators" }
inverse specifier, Ent Framework knows that the inverses table name is inverses
, and the value of the type
field there is "topic2creators"
. You can choose your own values for both of those options: the above example is just a convention.
Inverses are always inserted to the parent's shard (not to the child's shard) and before the actual row (id_gen()
defined in autoInsert
option is called in a separate database query). This guarantees that there will be no situation when there is a row in some shard, but its corresponding inverse does not exist in another shard. I.e. there are always not less inverses in the cluster than the Ent rows.
And to maintain the "not less" invariant, when an Ent is deleted, the corresponding inverses are deleted after it (not before). So even if the inverse deletion query fails, it will still be okay for us: we'll just have a "hanging inverse".
As a result, the following rows will appear in the database tables:
So, inverses allow Ent Framework to find all children Ents related to a particular parent ID, like all topics by a last commenter ID:
This would send the following SQL queries to the database (pseudo-code):
(Of curse, as everything in Ent Framework, the above will be batched into compound SQL queries when multiple select()
calls are happening in parallel.)
Notice that, despite inverses hold the full ID of the child Ent in id2
field, only the microshard number from id2
is used in Ent Framework logic (see shard_from_id()
in the above pseudo-code). I.e. id2
is used to get "covering hints" about what microshards should be queried when loading the data.
This is a really powerful approach. It solves the problem of cross-shard consistency. Since we have not less inverses than the children rows, Ent Framework just collects all unique microshards from those inverses id2
fields and then queries them using a regular SELECT. If there were some hanging (excess) inverses in the database, it is not a big deal: the engine will just query a little bit more microshards than it needs to, but the SELECTs from those extra microshards will just come empty.
It makes sense to delete the hanging inverses in background from time to time ("inverses fixer" infra), since they may accumulate slowly. Currently, Ent Framework doesn't have any logic to help with this, but in the future, such functionality may appear. Since inverses are treated as just covering hints, not cleaning them up is generally fine.
When shardAffinity=[]
, the microshard is chosen randomly at insert time.
But if the Ent has a unique key defined in its schema, this randomness is not absolute. The seed of the random number generator is assigned based on the value of the Ent's unique key, so all insert()
calls with the same unique key will essentially choose the same microshard for insertion.
Let's consider an example.
Imagine you have a web server route which creates a new topic with a particular slug:
The user clicks the button in the browser, and the topic creation POST request is sent to the server.
Now imagine that there is some network issue happened between the backend and the database after the INSERT query is sent, but before the response is received. Or the database is overloaded, so it accepted the INSERT query, but failed to respond within the allocated time. In this case, there will be a "non-idempotent query timeout": the backend will see the error, but it will not know, whether the INSERT query succeeded (and thus, the topic is created), or it failed internally.
In case of an error, the user will likely press the button again, and it will cause another POST request to insert the topic with the same slug.
If we chose the microshard trully randomly, then the 2nd INSERT would create the topic row in another shard, and there would be a possiblilty to create 2 topics with the same slug in 2 different microshard. This would be really bad: we would violate the unique key constraint across shards (there is nothing in inverses engine preventing from creating duplicates).
This is why the microshard is chosen pseudo-randomly, based on the Ent's unique key (it it is defined). Thus, an attempt to create a duplicated Ent would fail with a unique key constraint error, and the app would be able to handle it. In the above example, insert()
would've thrown an error, but you could use insertIfNotExists()
call which would just return null; see Ent API: insert*().