pgsql-hackers

[PoC] Umbra: a remap-aware smgr prototype on PostgreSQL master

Mingwei Jia <i@nayishan.top>
Apr 24, 2026, 2:14 PM UTC
Hi hackers,
Apologies if my earlier attempt did not reach the list correctly. I am sending this as a single PoC introduction with repository links only, rather than as an attached patch series.
I would like to share a working Proof-of-Concept for Umbra, an alternative smgr implementation on PostgreSQL master.
To be clear about scope: this is not a merge-ready proposal, and it is not a new table AM or a separate storage engine. The goal is narrower: to make the current design, code structure, recovery model, and patch decomposition concrete enough for technical discussion, and to preserve a usable baseline for anyone interested in continuing the work.
Umbra operates at the smgr layer. The central idea is to decouple logical page identity from physical page placement, so that the ordinary first-dirty-after-checkpoint path does not have to rely on PostgreSQL's default full-page-image path in the same way. In the current prototype:
- PostgreSQL callers still work in logical block numbers.
- Umbra maintains the logical-to-physical block (lblk -> pblk) translation in its own metadata fork.
- WAL can publish remap state explicitly.
- redo reconstructs the correct mapping view before replaying page contents.
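To make the four points above concrete, here is a toy in-memory model in Python. It is not taken from the Umbra repository. Every name in it (`ToyRemapStore`, `redo`, the `"remap"` and `"delta"` record shapes) is hypothetical, and it assumes one particular mechanism that this message does not spell out: the first write to a logical block after a checkpoint is redirected to a fresh physical block, so the pre-checkpoint image stays intact and redo can rebuild the page from it.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 8  # toy page size in bytes, chosen only to keep the example small


@dataclass
class ToyRemapStore:
    """Hypothetical stand-in for a remap-aware storage layer (not Umbra code)."""
    mapping: dict = field(default_factory=dict)   # lblk -> pblk
    physical: dict = field(default_factory=dict)  # pblk -> bytearray
    next_pblk: int = 0                            # naive bump allocator
    dirty: set = field(default_factory=set)       # lblks dirtied since checkpoint
    wal: list = field(default_factory=list)       # toy WAL records

    def _alloc(self):
        pblk = self.next_pblk
        self.next_pblk += 1
        return pblk

    def extend(self, lblk):
        """Create a new zero-filled logical block."""
        pblk = self._alloc()
        self.physical[pblk] = bytearray(PAGE_SIZE)
        self.mapping[lblk] = pblk
        self.dirty.add(lblk)
        self.wal.append(("remap", lblk, None, pblk))

    def write(self, lblk, offset, data):
        """Callers address pages by logical block number only."""
        if lblk not in self.dirty:
            # Assumed behaviour: first dirty after checkpoint moves the page
            # to a fresh physical block and publishes the remap in WAL.
            old, new = self.mapping[lblk], self._alloc()
            self.physical[new] = bytearray(self.physical[old])
            self.mapping[lblk] = new
            self.dirty.add(lblk)
            self.wal.append(("remap", lblk, old, new))
        page = self.physical[self.mapping[lblk]]
        page[offset:offset + len(data)] = data
        self.wal.append(("delta", lblk, offset, bytes(data)))

    def checkpoint(self):
        """Snapshot the mapping and start a new WAL segment."""
        self.dirty.clear()
        self.wal.clear()
        return dict(self.mapping)


def redo(ckpt_mapping, disk, wal):
    """Replay WAL: apply each remap before the deltas that follow it."""
    mapping = dict(ckpt_mapping)
    physical = {p: bytearray(b) for p, b in disk.items()}
    for rec in wal:
        if rec[0] == "remap":
            _, lblk, old, new = rec
            src = physical[old] if old is not None else bytearray(PAGE_SIZE)
            physical[new] = bytearray(src)  # rebuild from the intact old image
            mapping[lblk] = new
        else:
            _, lblk, offset, data = rec
            page = physical[mapping[lblk]]
            page[offset:offset + len(data)] = data
    return mapping, physical


store = ToyRemapStore()
store.extend(0)
store.write(0, 0, b"AAAA")
ckpt_mapping = store.checkpoint()
store.write(0, 4, b"BBBB")  # first dirty after checkpoint: remapped to pblk 1

# Simulate a crash that tore the newly written physical block.
disk = {p: bytearray(b) for p, b in store.physical.items()}
disk[store.mapping[0]] = bytearray(b"\xff" * PAGE_SIZE)

rmap, rphys = redo(ckpt_mapping, disk, list(store.wal))
print(rmap[0], bytes(rphys[rmap[0]]))
```

In this sketch the torn block does not matter, because redo never trusts it: the remap record points back at the untouched pre-checkpoint block, and the small delta record is enough to bring the page forward. Whether the prototype follows this exact scheme should be checked against the design notes in the repository.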
Umbra's metadata fork contains only two formats: a 512-byte superblock for fork-level control state, and single-purpose MAP pages for mapping entries. These are not ordinary heap/index pages. In that respect they are closer to system control/state metadata such as pg_control and pg_xact/SLRU pages, and they do not rely on PostgreSQL's ordinary FPW path for data pages. Instead, they are protected by Umbra-specific metadata WAL/redo rules for those two formats.
The implementation is currently organized in the repository as:
- P0: design notes and repository navigation
- P1-P9: code patches covering smgr boundary, metadata fork, MAP subsystem, WAL/redo, checkpoint integration, preallocation, and compaction
Current verification state:
- final tip passes `make check`
- final tip passes `make -C src/test/recovery check`
- strict per-patch state, over a four-item matrix (MD and UMBRA, each under `make check` and the recovery tests), is:
  - P1-P5: all four matrix items pass
  - P6: MD make check / MD recovery / UMBRA make check pass, but UMBRA recovery does not pass
  - P7-P9: all four matrix items pass
That boundary is intentional in the current decomposition: P6 establishes the WAL record / birth / basic redo state-machine layer, while P7 closes the ordinary remap / block-reference remap / checkpoint-boundary replacement loop.
I do not want to overclaim on performance. The numbers below should be read as directional PoC signals, not as a final benchmark claim.
On a TPC-C-style workload (BenchmarkSQL), the current results are:
Throughput (`checksum=off`)
terminals | md + fpw=on | md + fpw=off | Umbra + fpw=on
----------+-------------+--------------+---------------
       10 |      158709 |       154283 |         155781
       50 |      577005 |       626954 |         656353
      200 |      641899 |       981436 |         995635
      500 |      322660 |       943295 |         859058
     1000 |      275609 |       899631 |         729989
Throughput (`checksum=on`)
terminals | md + fpw=on | md + fpw=off | Umbra + fpw=on
----------+-------------+--------------+---------------
       10 |      155754 |       152025 |         150606
       50 |      601974 |       635597 |         650844
      200 |      621176 |      1015923 |         938311
      500 |      316950 |       972795 |         729801
     1000 |      282713 |       891770 |         674865
WAL size ratio (`md + fpw=on` / `Umbra + fpw=on`)
terminals | checksum=on | checksum=off
----------+-------------+-------------
       10 |        1.82 |         2.03
       50 |        2.11 |         2.51
      200 |        3.81 |         5.22
      500 |        4.58 |         6.90
     1000 |        4.87 |         6.55
At 1000 terminals, Umbra recovers roughly 73% (`checksum=off`) or 64% (`checksum=on`) of the throughput gap between `md + fpw=on` and `md + fpw=off`, while reducing WAL volume by roughly 6.6x (`checksum=off`) or 4.9x (`checksum=on`).
The `md + fpw=off` numbers should be read only as a sensitivity / upper-bound reference, not as a correctness-equivalent baseline.
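For anyone who wants to recompute the gap figure from the tables, the short Python sketch below uses the 1000-terminal rows. The formula (Umbra minus `md + fpw=on`, divided by `md + fpw=off` minus `md + fpw=on`) is my reading of "throughput gap recovered"; the function and key names are illustrative only.

```python
# Throughput at 1000 terminals, copied from the tables in this message.
ROWS = {
    "checksum=off": {"md_fpw_on": 275609, "md_fpw_off": 899631, "umbra": 729989},
    "checksum=on": {"md_fpw_on": 282713, "md_fpw_off": 891770, "umbra": 674865},
}

# WAL size ratio (md + fpw=on / Umbra + fpw=on) at 1000 terminals.
WAL_RATIO = {"checksum=off": 6.55, "checksum=on": 4.87}


def gap_recovered(md_fpw_on, md_fpw_off, umbra):
    """Fraction of the fpw=on -> fpw=off throughput gap that Umbra closes."""
    return (umbra - md_fpw_on) / (md_fpw_off - md_fpw_on)


def wal_fraction(ratio):
    """Umbra WAL volume as a fraction of the md + fpw=on WAL volume."""
    return 1.0 / ratio


for label, row in ROWS.items():
    print(
        f"{label}: gap recovered {gap_recovered(**row):.1%}, "
        f"WAL volume {wal_fraction(WAL_RATIO[label]):.0%} of md + fpw=on"
    )
```

With these inputs the gap recovered comes out at about 0.73 for `checksum=off` and 0.64 for `checksum=on`. The same formula applied to the 500-terminal `checksum=off` row gives about 0.86.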
Known follow-up work still includes:
- deeper host-tree engineering around AIO
- `CREATE DATABASE` `WAL_LOG` copy path
- stronger primary/standby physical-page alignment validation
- more complete production-grade space management
- an explicit upper-layer owner model for `range-born / batch mapping publish`
The last point is worth calling out explicitly: the current prototype has internal range-shaped lifecycle operations, but it does not yet claim a generic upper-layer `RangeMap` contract. I do not believe that should be introduced without a clear upper-layer use site and owner model.
For personal reasons, my availability for sustained follow-up may be limited for some time. Rather than leave this work in a private or half-documented state, I would prefer to put the current PoC and design notes in front of the community while they are still coherent and runnable.
If the direction looks interesting, I would welcome discussion, criticism, or a future maintainer/collaborator willing to continue the engineering work from this baseline.
Repository and design notes:
https://github.com/nayishan/postgre_umbra/tree/umbra-poc-pgmaster
Regards,
Mingwei Jia
i@nayishan.top