Locating a Shard and ID Format
To enable microsharding support, we first need to configure the instance of the Cluster class:
Shards Discovery
Notice the shards configuration property above.
nameFormat: this sprintf-style template defines how Ent Framework should build the microshard schema name when it knows the microshard number. In our case, the schema names will look like sh0123 or sh0000, and there will be up to 10000 microshards allowed.
discoverQuery: Ent Framework will run this query on all islands from time to time to figure out which shards are located where. It will also run this query immediately under several conditions, such as a "table not found" error (which may mean that a microshard has just been moved from one island to another, so Ent Framework needs to rediscover).
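To make the role of nameFormat more concrete, here is a minimal sketch of how a sprintf-style template could be expanded into a schema name. The helper formatShardName is hypothetical and not part of Ent Framework's API, and the template value "sh%04d" is an assumption inferred from the sh0123 and sh0000 examples.

```typescript
// Hypothetical helper (not Ent Framework's API): expands a sprintf-style
// template that contains a single zero-padded placeholder such as %04d.
function formatShardName(nameFormat: string, shardNo: number): string {
  if (!Number.isInteger(shardNo) || shardNo < 0) {
    throw new Error(`Invalid shard number: ${shardNo}`);
  }
  return nameFormat.replace(/%0(\d+)d/, (_match: string, width: string) => {
    const padded = String(shardNo).padStart(Number(width), "0");
    if (padded.length > Number(width)) {
      throw new Error(`Shard number ${shardNo} does not fit into ${width} digits`);
    }
    return padded;
  });
}

// Assumed template: 4 digits leave room for shard numbers 0..9999.
const NAME_FORMAT = "sh%04d";

console.log(formatShardName(NAME_FORMAT, 123)); // sh0123
console.log(formatShardName(NAME_FORMAT, 0)); // sh0000
```

With a 4-digit placeholder, shard numbers 0 through 9999 fit, which matches the limit of 10000 microshards.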
There is also the pg-microsharding library, which lets you manipulate microshard schemas: create and activate them, and move and rebalance microshards across islands. When this library is used, you can use SELECT * FROM unnest(microsharding.list_active_shards()) as the value for discoverQuery.
As for the islands in the cluster, just enumerate them and their nodes. Ent Framework will figure out which nodes are masters and which nodes are replicas. You can also change the list of islands and nodes in real time, without restarting the app: Ent Framework picks up the changes if the islands callback returns a different value (it is called from time to time).
Format of IDs
Assume we have the following call:
When users are distributed across multiple microshards, Ent Framework decides which microshard it should query the data from. The decision is made based on the ID prefix:
The default microshard location strategy relies on a convention for the ID format: an ID must consist of 3 parts:
"1" (Environment Number): you may want to make your IDs globally unique across the entire world, so all IDs in the dev environment will start with e.g. 1, IDs in staging with 2, and IDs in production with 3.
"0246" (Shard Number): this is where the microshard number resides in the ID structure. In the code, it is also referred to as "Shard No".
"57131744498804" (Entropy): a "never-repeating random-looking" part of the ID. It is not necessarily random (other strategies are "auto-incremental" and "timestamp-based"), i.e. the concrete generation algorithm is up to the library that generates the new IDs.
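The three parts above can be illustrated with a small parser. This is only a sketch under stated assumptions: the names ParsedId and parseId are hypothetical, the field widths (1 digit for the environment, 4 digits for the shard number) are taken from the example parts, and the real location logic inside Ent Framework may differ.

```typescript
// Hypothetical types and helper for illustration; not Ent Framework's API.
interface ParsedId {
  env: number; // Environment Number (1 digit)
  shardNo: number; // Shard Number (4 digits)
  entropy: string; // remaining digits, kept as a string
}

function parseId(id: string): ParsedId {
  // IDs are handled as strings: a 19-digit bigint value cannot be stored
  // in a JavaScript number without losing precision.
  if (!/^\d{6,}$/.test(id)) {
    throw new Error(`Malformed ID: ${id}`);
  }
  return {
    env: Number(id.slice(0, 1)),
    shardNo: Number(id.slice(1, 5)),
    entropy: id.slice(5),
  };
}

// The ID assembled from the three example parts: "1" + "0246" + "57131744498804".
const parsed = parseId("1024657131744498804");
console.log(parsed.env, parsed.shardNo, parsed.entropy); // 1 246 57131744498804
```

Knowing only the prefix is enough to pick the microshard (here, number 246), so no lookup table from ID to shard is needed.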
An Ent's ID (and thus its microshard number) is determined once, at the time the Ent is inserted. Typically, each microshard schema has its own function that builds the IDs: it fills in the environment and shard number and generates the "never-repeating random-looking" part:
Stored Functions in pg-id Library
id_gen_timestampic(): similar to id_gen(), but instead of generating random-looking IDs, it prepends the current timestamp to the "sequence" part of the ID.
id_gen_monotonic(): the simplest and fastest function among the above: it generates the next globally unique monotonic ID, without using any timestamp as a prefix. Monotonic IDs are friendlier to heavy INSERTs, since they maximize the chance that the btree index reuses the newly created leaf pages.
id_gen_uuid(): returns an ID in UUID format (PostgreSQL uuid type) with the first several digits set to the Essss prefix, as in all other functions above.
Using UUID v4 for ID Fields
The UUID generated by that function looks like this:
Here, as in the previous examples, 1 is the environment number (e.g. production), 0246 is the microshard number, 4 is the UUID version field, and N is a so-called "variant". All other digits are randomly generated.
Notice that id_gen_uuid() replaces the first several digits in the string representation of the UUID with the environment and microshard numbers. This trick doesn't cut too much of the UUID's entropy (a UUID is 16 bytes; compare it to the 8 bytes of bigint), but it allows using UUIDs in a microsharded environment.
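As a rough illustration of the prefix trick, the sketch below reads the environment and microshard numbers back from the string form of such a UUID. Everything named here is an assumption: parseUuidPrefix is a hypothetical helper, the sample UUID is made up for this example, and the code assumes that the Essss prefix occupies exactly the first five characters and consists of decimal digits.

```typescript
// Hypothetical helper for illustration; not part of pg-id or Ent Framework.
interface UuidPrefix {
  env: number; // "E" in the Essss prefix
  shardNo: number; // "ssss" in the Essss prefix
  version: number; // UUID version digit (4 for UUID v4)
}

// Assumption: the first 5 characters are decimal digits (Essss); the rest
// follows the usual 8-4-4-4-12 hexadecimal layout.
const PREFIXED_UUID_RE =
  /^\d{5}[0-9a-f]{3}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function parseUuidPrefix(uuid: string): UuidPrefix {
  if (!PREFIXED_UUID_RE.test(uuid)) {
    throw new Error(`Not a prefixed UUID: ${uuid}`);
  }
  return {
    env: Number(uuid.slice(0, 1)),
    shardNo: Number(uuid.slice(1, 5)),
    // The version digit is the first character of the third group.
    version: parseInt(uuid.charAt(14), 16),
  };
}

// Made-up sample value: environment 1, microshard 0246, version 4.
const prefix = parseUuidPrefix("10246a3f-7c1e-4b2d-9f60-3e5a8c7d1b42");
console.log(prefix.env, prefix.shardNo, prefix.version); // 1 246 4
```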
Also, you need to use type String and not ID for the fields that hold UUID data. This applies to the id property and to all "foreign key like" fields.
Why Use Database-Generated IDs?
Let's get back to the previous example of an ID field definition:
Also, the corresponding SQL table schema in every microshard is:
(If you want UUIDs as IDs, use the built-in PostgreSQL type uuid instead of bigint.)
id_gen() is Mentioned in Two Places
Technically, you don't have to include the DEFAULT id_gen() clause in your SQL table definition. For Ent Framework to operate, it is enough to define just autoInsert="id_gen()".
But we strongly advise having both. Otherwise, you won't be able to e.g. connect to a node with psql and run INSERT INTO users ... safely, without thinking about ID generation. It will also be hard to build database triggers if they insert users rows.
autoInsert is a String Property, not a Callback
You may have wondered why Ent Framework doesn't support autoInsert as a TypeScript callback. Why do we always ask the database to generate IDs, and why don't we support ID generation in application code (especially for UUIDs)?
There are several reasons for this.
As mentioned above, the best practice is to have the autoInsert expression defined in both the Ent Framework schema and the SQL table definition. Thus, we need an approach available in both the TypeScript and SQL worlds: an SQL expression as a string. (BTW, for non-ID fields, other available values for autoInsert are "now()", "NULL", or even "'{}'" for e.g. an empty array.)
When building batched INSERTs, Ent Framework uses the expression from autoInsert directly in the batched SQL queries.
If an Ent class has beforeInsert triggers, Ent Framework runs the expressions from autoInsert in a separate query, so the generated IDs are available in beforeInsert triggers early, even though the row is not yet inserted into the table. This allows building "eventually consistent" logic without transactions. See more details about this in the Triggers article.