Support logical replication of global object commands

  • Jump to comment-1
    Hello, Logical replication of DDL commands support is being worked on in [1]. However, global object commands are quite different from other non-global object DDL commands and need to be handled differently. For example, global object commands include ROLE statements, DATABASE statements, TABLESPACE statements and a subset of GRANT/REVOKE statements if the object being modified is a global object. These commands are different from other DDL commands in that: 1. Global object commands can be executed in any database. 2. Global objects are not schema qualified. 3. Global object commands are not captured by event triggers. I’ve put together a prototype to support logical replication of global object commands in the attached patch. This patch builds on the DDL replication patch from ZJ in [2] and must be applied on top of it. Here is a list of global object commands that the patch replicate, you can find more details in function LogGlobalObjectCommand: /* ROLE statements */ CreateRoleStmt AlterRoleStmt AlterRoleSetStmt DropRoleStmt ReassignOwnedStmt GrantRoleStmt /* Database statements */ CreatedbStmt AlterDatabaseStmt AlterDatabaseRefreshCollStmt AlterDatabaseSetStmt DropdbStmt /* TableSpace statements */ CreateTableSpaceStmt DropTableSpaceStmt AlterTableSpaceOptionsStmt /* GrantStmt and RevokeStmt if objtype is a global object determined by EventTriggerSupportsObjectType() */ GrantStmt RevokeStmt The idea with this patch is to support global objects commands replication by WAL logging the command using the same function for DDL logging - LogLogicalDDLMessage towards the end of standard_ProcessUtility. Because global objects are not schema qualified, we can skip the deparser invocation and directly log the original command string for replay on the subscriber. A key problem to address is that global objects can become inconsistent between the publisher and the subscriber if a command modifying the global object gets executed in a database (on the source side) that doesn't replicate the global object commands. I think we can work on the following two aspects in order to avoid such inconsistency: 1. Introduce a publication option for global object commands replication and document that logical replication of global object commands is preferred to be enabled on all databases. Otherwise inconsistency can happen if a command modifies the global object in a database that doesn't replicate global object commands. For example, we could introduce the following publication option publish_global_object_command : CREATE PUBLICATION mypub FOR ALL TABLES WITH (publish = 'insert, delete, update', publish_global_object_command = true); We may consider other fine tuned global command options such as “publish_role_statements”, “publish_database_statements”, “publish_tablespace_statements” and "publish_grant_statements", i.e. you pick which global commands you want replicated. For example, you can do this if you need a permission or tablespace to be set up differently on the target cluster. In addition, we may need to adjust the syntax once the DDL replication syntax finalizes. 2. Introduce the following database cluster level logical replication commands to avoid such inconsistency, this is especially handy when there is a large number of databases to configure for logical replication. CREATE PUBLICATION GROUP mypub_ FOR ALL DATABASES WITH (publish = 'insert, delete, update', publish_global_object_command = true); CREATE SUBSCRIPTION GROUP mysub_ CONNECTION 'dbnames = \“path to file\” host=hostname user=username port=5432' PUBLICATION GROUP mypub_; Under the hood, the CREATE PUBLICATION GROUP command generates one CREATE PUBLICATION mypub_n sub-command for each database in the cluster where n is a monotonically increasing integer from 1. The command outputs the (dbname, publication name) pairs which can be saved in a file and then used on the subscription side. Similarly, the CREATE SUBSCRIPTION GROUP command will generate one CREATE SUBSCRIPTION mysub_n sub-command for each database in the dbnames file. The dbnames file contains the (dbname, publication name) pairs which come from the output of the CREATE PUBLICATION GROUP command. Notice the connection string doesn’t have the dbname field, During execution the connection string will be appended the dbname retrieved from the dbnames file. By default the target DB name is the same as the source DB name, optionally user can specify the source_db to target_db mapping in the dbnames file. In addition, we might want to create dependencies for the publications/subscriptions created by the above commands in order to guarantee the group consistency. Also we need to enforce that there is only one group of publications/subscriptions for database cluster level replication. Logical replication of all commands across an entire cluster (instead of on a per-database basis) is a separate topic. We can start another thread after implementing a prototype. Please let me know your thoughts. [1] [2] With Regards, Zheng Li Amazon RDS/Aurora for PostgreSQL