ent-framework

Locating a Shard and ID Format

To enable microsharding support, we first need to configure an instance of the Cluster class:

export const cluster = new Cluster({
  islands: async () => [ // sync or async
    {
      no: 0,
      nodes: [
        { name: "abc-instance-1", config: { host: "...", ... } },
        { name: "abc-instance-2", config: { host: "...", ... } },
      ],
    },
    {
      no: 1,
      nodes: [
        { name: "abc-instance-3", config: { host: "...", ... } },
        { name: "abc-instance-4", config: { host: "...", ... } },
      ],
    },
  ],
  createClient: (node) => new PgClientPool(node),
  shardNamer: new ShardNamer({
    nameFormat: "sh%04d",
    discoverQuery:
      "SELECT unnest FROM unnest(microsharding.microsharding_list_active_shards())",
  }),
  ...,
});

Shards Discovery

Notice the shardNamer configuration property above:

  • nameFormat: this sprintf-style template defines how Ent Framework builds a microshard schema name from a microshard number. In our case, the schema names will look like sh0123 or sh0000, so up to 10000 microshards (sh0000 to sh9999) are allowed.

  • discoverQuery: Ent Framework runs this query on all islands periodically to figure out which microshards are located where. It also runs the query immediately in certain situations, such as a "table not found" error (which may mean that a microshard has just been moved from one island to another, so Ent Framework needs to rediscover the shards).
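The nameFormat convention and the discovery step can be sketched with a few hypothetical helpers. formatShardName, parseShardName and buildShardMap are made up for this illustration and are not Ent Framework's ShardNamer API: the first expands "sh%04d" for a given number, the second reverses it, and the third turns per-island lists of schema names (the kind of result a discoverQuery might return) into a shard-to-island map.

```typescript
// Hypothetical helpers mirroring nameFormat: "sh%04d".
// They illustrate the convention only; Ent Framework's ShardNamer may differ.
const SHARD_PREFIX = "sh";
const SHARD_DIGITS = 4; // "%04d" => 4 zero-padded digits => at most 10000 names

function formatShardName(shardNo: number): string {
  if (!Number.isInteger(shardNo) || shardNo < 0 || shardNo >= 10 ** SHARD_DIGITS) {
    throw new RangeError(`Shard number out of range: ${shardNo}`);
  }
  return SHARD_PREFIX + String(shardNo).padStart(SHARD_DIGITS, "0");
}

function parseShardName(name: string): number | null {
  // Accept only names shaped like formatShardName() output ("sh" + 4 digits);
  // other schemas (e.g. "public") are ignored.
  const match = /^sh(\d{4})$/.exec(name);
  return match ? Number(match[1]) : null;
}

// Input: island number => schema names discovered on that island.
// Output: shard number => island number.
function buildShardMap(discovered: Record<number, string[]>): Map<number, number> {
  const shardToIsland = new Map<number, number>();
  for (const [islandNo, names] of Object.entries(discovered)) {
    for (const name of names) {
      const shardNo = parseShardName(name);
      if (shardNo !== null) {
        shardToIsland.set(shardNo, Number(islandNo));
      }
    }
  }
  return shardToIsland;
}
```

For example, buildShardMap({ 0: ["sh0000", "sh0001"], 1: ["sh0246", "public"] }) maps microshard 246 to island 1 and skips the unrelated "public" schema.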

There is also the pg-microsharding library, which allows you to manipulate microshard schemas: create, activate, move, and rebalance microshards across islands. When this library is used, you can use the query from the example above as the value for discoverQuery.

As for the islands in the cluster, just enumerate them and their nodes. Ent Framework will figure out which nodes are masters and which are replicas. You can also change the list of islands and nodes in real time, without restarting the app: Ent Framework calls the islands callback periodically and picks up the changes whenever it returns a different value.

Format of IDs

Assume we have the following call:

const user = await EntUser.loadX(vc, id);

When users are distributed across multiple microshards, Ent Framework has to decide which microshard to query the data from. The decision is based on the ID prefix. Consider, for example, the ID 1024657131744498804.

The default microshard location strategy relies on a convention for the ID format: an ID must consist of 3 parts:

  • "1" (Environment Number): you may want to make your IDs globally unique across all of your environments, so all IDs in the dev environment will start with e.g. 1, IDs in staging with 2, and IDs in production with 3.

  • "0246" (Shard Number): this is where the microshard number resides in the ID structure. In the code, it is also referred to as "Shard No".

  • "57131744498804" (Entropy): a "never-repeating, random-looking" part of the ID. It doesn't have to be truly random (other strategies are "auto-incremental" and "timestamp-based"); the concrete generation algorithm is up to the library that generates new IDs.
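To make the convention concrete, here is a hypothetical sketch that splits the ID built from the three parts above (1024657131744498804) and puts it back together. parseId and composeId are not Ent Framework APIs, and the fixed widths (1 environment digit, 4 shard digits, 14 entropy digits) are assumptions taken from this example; the real location strategy may be more flexible.

```typescript
// Hypothetical helpers illustrating the "1" + "0246" + "57131744498804" layout.
// Digit widths are assumptions based on the example ID on this page.
const ENV_DIGITS = 1;
const SHARD_DIGITS = 4;
const ENTROPY_DIGITS = 14;

interface IdParts {
  envNo: number; // environment number, e.g. 1 = dev
  shardNo: number; // microshard number ("Shard No")
  entropy: string; // never-repeating, random-looking tail (kept as a string)
}

function parseId(id: string): IdParts {
  const expectedLength = ENV_DIGITS + SHARD_DIGITS + ENTROPY_DIGITS;
  if (!/^\d+$/.test(id) || id.length !== expectedLength) {
    throw new Error(`Unexpected ID format: ${id}`);
  }
  return {
    envNo: Number(id.slice(0, ENV_DIGITS)),
    shardNo: Number(id.slice(ENV_DIGITS, ENV_DIGITS + SHARD_DIGITS)),
    entropy: id.slice(ENV_DIGITS + SHARD_DIGITS),
  };
}

function composeId(parts: IdParts): string {
  // Zero-pad the shard number and entropy so the layout stays fixed-width.
  return (
    String(parts.envNo) +
    String(parts.shardNo).padStart(SHARD_DIGITS, "0") +
    parts.entropy.padStart(ENTROPY_DIGITS, "0")
  );
}
```

Here, parseId("1024657131744498804") yields envNo 1, shardNo 246 and entropy "57131744498804"; with nameFormat "sh%04d", shard number 246 corresponds to the schema sh0246.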

An Ent's ID (and thus its microshard number) is determined once, at the time the Ent is inserted. Typically, each microshard schema has its own function that builds the IDs: it fills in the environment and shard numbers and generates the "never-repeating, random-looking" part:

const schema = new PgSchema(
  "users",
  {
    id: { type: ID, autoInsert: "id_gen()" },
    email: { type: String },
  },
  ["email"]
);
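Here, id_gen() generates IDs with the following layout, where E is the environment number (1 digit), ssss is the microshard number (4 digits), and R is the entropy part (14 digits):

EssssRRRRRRRRRRRRRR
 \__/\____________/
  4        14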
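The "never-repeating, random-looking" entropy part needs a bijection: pg-id's id_gen() takes the next sequence value and passes it through a Feistel cipher. The toy sketch below shows why that works, under loudly stated assumptions: it uses a 32-bit domain, four made-up round keys and a made-up round function, none of which are pg-id's actual parameters. Each Feistel round is invertible no matter what the round function computes, so distinct sequence values can never collide.

```typescript
// Toy Feistel permutation over 32-bit unsigned integers.
// KEYS and roundFn are hypothetical; they are NOT pg-id's real parameters.
const KEYS = [0x1234, 0xbeef, 0x0f0f, 0x7a7a];

function roundFn(half: number, key: number): number {
  // Any deterministic function works; the Feistel structure guarantees
  // invertibility regardless of what it computes. Result is 16 bits.
  return (Math.imul(half, 0x9e37) ^ key ^ (half >>> 3)) & 0xffff;
}

function feistelEncrypt(x: number): number {
  let left = x >>> 16;
  let right = x & 0xffff;
  for (const key of KEYS) {
    // (L, R) => (R, L xor F(R, key))
    const next = left ^ roundFn(right, key);
    left = right;
    right = next;
  }
  return ((left << 16) | right) >>> 0;
}

function feistelDecrypt(y: number): number {
  let left = y >>> 16;
  let right = y & 0xffff;
  for (const key of [...KEYS].reverse()) {
    // Undo one round: (L', R') => (R' xor F(L', key), L')
    const prev = right ^ roundFn(left, key);
    right = left;
    left = prev;
  }
  return ((left << 16) | right) >>> 0;
}

// Entropy digits for a sequence value, zero-padded to the 14-digit R part.
function entropyFor(seq: number): string {
  return String(feistelEncrypt(seq)).padStart(14, "0");
}
```

Because 2^32 has only 10 decimal digits, entropyFor() zero-pads the result to 14 digits; this is a simplification of the sketch, not a property of pg-id.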

Stored Functions in pg-id Library

The id_gen() function used above comes from the pg-id library, which by default generates IDs in the format described above. The complete list of id_gen*() functions in pg-id is:

  • id_gen(): generates the next globally-unique, random-looking ID. The main idea is to prevent outsiders from inferring the rate at which IDs are generated, even when they look at a sample of IDs. The function implicitly uses a sequence to get the next available number, and then uses a Feistel cipher to generate a random-looking, non-repeating ID based on it.

  • id_gen_timestampic(): similar to id_gen(), but instead of generating random-looking IDs, it prepends the "sequence" part of the ID with the current timestamp.

  • id_gen_monotonic(): the simplest and fastest function of all: generates the next globally-unique monotonic ID, without using a timestamp as a prefix. Monotonic IDs are friendlier to heavy INSERT workloads, since they maximize the chance for a btree index to reuse newly created leaf pages.

  • id_gen_uuid(): returns an ID in UUID format (PostgreSQL uuid type), with the first several digits assigned to the Essss prefix, as in all the other functions above.

Using UUID v4 for ID Fields

You can also use the id_gen_uuid() function if you want your primary keys to be in UUID v4 format (or use the built-in PostgreSQL function gen_random_uuid() if you don't need microsharding support).

The UUID generated by id_gen_uuid() looks like this:

10246xxx-xxxx-4xxx-Nxxx-xxxxxxxxxxxx

Here, as in the previous examples, 1 is the environment number, 0246 is the microshard number, 4 is the UUID version field, and N is the so-called "variant". All other digits are randomly generated.

Notice that id_gen_uuid() replaces the first several digits of the UUID's string representation with the environment and microshard numbers. This trick doesn't cut much of the UUID's entropy (a UUID is 16 bytes, compared to 8 bytes of bigint), but allows using UUIDs in a microsharded environment.

Also, you need to use type String rather than ID for the fields that hold UUID data. This applies to the id property and to all "foreign key like" fields.

Why Use Database-Generated IDs?

Let's get back to the previous example of an ID field definition:

const schema = new PgSchema(
  "users",
  {
    id: { type: ID, autoInsert: "id_gen()" },
    ...
  },
  ...
);

The corresponding SQL table definition in every microshard is:

CREATE TABLE users(
  id bigint PRIMARY KEY DEFAULT id_gen(),
  ...
);

(If you want UUID IDs, use the built-in PostgreSQL type uuid instead of bigint, and id_gen_uuid() instead of id_gen().)

id_gen() is Mentioned in Two Places

Technically, you don't have to include the DEFAULT id_gen() clause in your SQL table definition. For Ent Framework to operate, it's enough to define just autoInsert: "id_gen()".

But we strongly advise having both. Otherwise, you won't be able to, e.g., connect to a node with psql and safely run INSERT INTO users ... without thinking about ID generation. It will also be hard to build database triggers that insert rows into users.

autoInsert is a String Property, not a Callback

You have probably wondered why Ent Framework doesn't support autoInsert being a TypeScript callback. Why do we always ask the database to generate IDs, and why don't we support ID generation in application code (especially for UUIDs)?

There are several reasons for this.

  1. As mentioned above, the best practice is to define the autoInsert expression both in the Ent Framework schema and in the SQL table definition. Thus, we need an approach that works in both the TypeScript and SQL worlds, which is an SQL expression passed as a string. (BTW, for non-ID fields, other possible values for autoInsert are "now()", "NULL", or even "'{}'" for, e.g., an empty array.)

  2. When building batched INSERTs, Ent Framework uses the expression from autoInsert directly in the batched SQL queries.

  3. If an Ent class has beforeInsert triggers, Ent Framework runs the autoInsert expressions in a separate query, so the generated IDs are available to beforeInsert triggers early, even though the row is not yet inserted into the table. This allows building "eventually consistent" logic without transactions. See more details in the Triggers article.
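Reason 2 in the autoInsert list above can be illustrated with a simplified, hypothetical batched-INSERT builder. buildBatchedInsert and its types are made up for this sketch, and Ent Framework's real query builder generates different SQL. User-supplied values become $n parameters, while a missing field falls back to its autoInsert expression, which is pasted into every VALUES tuple as raw SQL text. This only works because autoInsert is an SQL string: a TypeScript callback's code cannot be embedded into the query for the database to evaluate per row.

```typescript
// Hypothetical, simplified batched-INSERT builder (not Ent Framework's code).
type FieldSpec = { autoInsert?: string };
type Row = Record<string, unknown>;

function buildBatchedInsert(
  table: string,
  fields: Record<string, FieldSpec>,
  rows: Row[],
): { sql: string; params: unknown[] } {
  const columns = Object.keys(fields);
  const params: unknown[] = [];
  const tuples = rows.map((row) => {
    const values = columns.map((col) => {
      const value = row[col];
      if (value === undefined) {
        const expr = fields[col]?.autoInsert;
        if (expr === undefined) {
          throw new Error(`No value and no autoInsert for ${table}.${col}`);
        }
        // Inlined SQL expression: the database evaluates it for each row.
        return expr;
      }
      // Regular values are passed as positional parameters.
      params.push(value);
      return `$${params.length}`;
    });
    return `(${values.join(", ")})`;
  });
  return {
    sql: `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${tuples.join(", ")}`,
    params,
  };
}
```

For two users with only email set, this produces INSERT INTO users (id, email) VALUES (id_gen(), $1), (id_gen(), $2), so PostgreSQL calls id_gen() once per inserted row.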