Attempting to delete excess rows from table with BATCH DELETE

    Gus Spier<gus.spier@gmail.com>
    Jan 28, 2026, 3:22 AM UTC
    Environment: AWS RDS Aurora for PostgreSQL 15 hosting tables that
    support scientific research. The development environment predominantly
    uses JPA with Hibernate.
    Years of neglect have allowed mission tables to accumulate hundreds of
    millions of rows of excess data. The developers and the customer
    decided we must delete all rows older than 75 days. Table partitioning
    was briefly considered but discarded because of the effort needed to
    refactor the codebase.
    I proposed the straightforward course of action: delete in batches
    from the victim tables.
    The solution seemed obvious:
    For candidate tables:
    - Determine the timestamp column with the most relevant value and
    call it the Discriminator.
    - Delete any row whose Discriminator value is more than 60 days old,
    with a LIMIT of 50000.
    - Get the row count of the batch via GET DIAGNOSTICS and hold that
    value aside for later use.
    - COMMIT the delete transaction.
    - Loop to the top and continue deleting batches until there are no
    more rows older than 60 days.
    - Before running a test, ensure that the Discriminator column is indexed.
    I wrote a procedure to accomplish all this work, but it persists in
    returning an error to the effect that a COMMIT is not valid in a block
    that tries to DELETE data.
    Has anybody seen this before? Is there a solution?
    Thanks in advance for any help you may be able to offer.
    Regards,
    Gus Spier
      Ron Johnson<ronljohnsonjr@gmail.com>
      Jan 28, 2026, 3:58 AM UTC
      On Tue, Jan 27, 2026 at 10:22 PM Gus Spier <gus.spier@gmail.com> wrote:
      I wrote a procedure to accomplish all this work, but it persists in
      returning an error to the effect that a COMMIT is not valid in a block
      that tries to DELETE data.

      Has anybody seen this before?
      Yup, when putting the COMMIT in a DO block, which isn't allowed.
      Is there a solution?
      Loop using bash. In a similar case, I first looked for the oldest
      day's data in the tables, then did something like this bash
      pseudo-code:
      StopDate=$(date -d '60 days ago' +%Y-%m-%d)
      DeleteDay=$1    # oldest day present in the table
      export PGHOST=foo.example.com
      export PGDATABASE=bar
      while [[ ! "$DeleteDay" > "$StopDate" ]]; do
          # Delete one day's worth of rows per transaction.
          psql -Xc "DELETE FROM blarge WHERE txn_date >= DATE '$DeleteDay'
                    AND txn_date < DATE '$DeleteDay' + 1;"
          DeleteDay=$(date -d "$DeleteDay + 1 day" +%Y-%m-%d)
      done
      Using that method, I developed a fast and automated monthly archive process
      which exported and then deleted from 120 tables.
      --
      Death to <Redacted>, and butter sauce.
      Don't boil me, I'm still alive.
      <Redacted> lobster!
      David G. Johnston<david.g.johnston@gmail.com>
      Jan 28, 2026, 3:39 AM UTC
      On Tuesday, January 27, 2026, Gus Spier <gus.spier@gmail.com> wrote:

      I wrote a procedure to accomplish all this work, but it persists in
      returning an error to the effect that a COMMIT is not valid in a block
      that tries to DELETE data.
      Haven’t tested to be sure but this doesn’t seem like a community edition
      limitation. I don’t see any mention of this here at least:
      https://www.postgresql.org/docs/current/xproc.html
      You’d have to move the logic to a proper client application that executes
      top-level commands.
      David J.
        Tom Lane<tgl@sss.pgh.pa.us>
        Jan 28, 2026, 3:53 AM UTC
        "David G. Johnston" <david.g.johnston@gmail.com> writes:
        On Tuesday, January 27, 2026, Gus Spier <gus.spier@gmail.com> wrote:
        I wrote a procedure to accomplish all this work, but it persists in
        returning an error to the effect that a COMMIT is not valid in a block
        that tries to DELETE data.
        Haven’t tested to be sure but this doesn’t seem like a community edition
        limitation.
        Yeah, you can definitely do that in standard Postgres (at least since
        we invented procedures). Sounds like Aurora is behind the times.
        I know that their storage engine is fundamentally different from ours;
        perhaps it has problems with this idea.
        You’d have to move the logic to a proper client application that executes
        top-level commands.
        Yup.
        		regards, tom lane
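        A minimal sketch of the batched delete as a community-PostgreSQL
        procedure of the kind Gus describes (the table name blarge and
        column txn_date are placeholders borrowed from later in the thread;
        since DELETE has no LIMIT clause, a ctid subquery stands in for it):

        ```sql
        CREATE PROCEDURE purge_old_rows()
        LANGUAGE plpgsql
        AS $$
        DECLARE
            batch bigint;
        BEGIN
            LOOP
                -- Delete one batch of rows older than the cutoff.
                DELETE FROM blarge
                WHERE ctid IN (SELECT ctid FROM blarge
                               WHERE txn_date < now() - INTERVAL '60 days'
                               LIMIT 50000);
                -- Capture the batch's row count via GET DIAGNOSTICS.
                GET DIAGNOSTICS batch = ROW_COUNT;
                COMMIT;  -- legal in a procedure invoked with CALL, not in a DO block
                EXIT WHEN batch = 0;
            END LOOP;
        END;
        $$;

        CALL purge_old_rows();  -- must not run inside an explicit transaction block
        ```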
          Ron Johnson<ronljohnsonjr@gmail.com>
          Jan 28, 2026, 4:04 AM UTC
          On Tue, Jan 27, 2026 at 10:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
          Yeah, you can definitely do that in standard Postgres (at least since
          we invented procedures).
          Hmm. Must have been START TRANSACTION which I remember causing issues in DO
          blocks.
            Tom Lane<tgl@sss.pgh.pa.us>
            Jan 28, 2026, 4:06 AM UTC
            Ron Johnson <ronljohnsonjr@gmail.com> writes:
            Hmm. Must have been START TRANSACTION which I remember causing issues in DO
            blocks.
            Too lazy to test, but I think we might reject that. The normal rule
            in a procedure is that the next command after a COMMIT automatically
            starts a new transaction, so you don't need an explicit START.
            		regards, tom lane
              Olivier Gautherot<ogautherot@gautherot.net>
              Jan 28, 2026, 7:32 AM UTC
              Hi Gus!
              This reminds me of a costly mistake I made and that you want to
              avoid: it was a mission-critical database (think physical safety,
              real people) and the vacuum froze the DB for 24 hours, until I
              finally took it offline.
              If you can take it offline (and you have a couple of hours):
              - disconnect the DB
              - drop indexes (that's the killer)
              - remove unnecessary data
              - vacuum manually (or better, copy the relevant data to a new table
              and rename it - this saves the DELETE above and defragments the table)
              - rebuild indexes
              - reconnect the DB
              The better solution would be partitioning:
              - choose a metric (for instance a timestamp)
              - create partition tables for the period you want to keep
              - copy the relevant data to the partitions and create partial indexes
              - take the DB offline
              - update the last partition with the latest data (should be a fast update)
              - truncate the original table
              - connect the partitions
              - reconnect the DB
              In the future, deleting historic data will be a simple DROP TABLE.
              Hope it helps
              --
              Olivier Gautherot
              Tel: +33 6 02 71 92 23
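              With PostgreSQL 15's declarative partitioning, the recipe above
              could look something like this (table, column, and date ranges
              are illustrative only):

              ```sql
              -- Partitioned parent with the same columns as the original table.
              CREATE TABLE blarge_part (LIKE blarge INCLUDING DEFAULTS)
                  PARTITION BY RANGE (txn_date);

              -- One partition per period you intend to keep.
              CREATE TABLE blarge_2026_01 PARTITION OF blarge_part
                  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

              -- Copy the data worth keeping, then swap names in the outage window.
              INSERT INTO blarge_part
                  SELECT * FROM blarge WHERE txn_date >= '2026-01-01';

              -- Retiring old data later is metadata-only:
              ALTER TABLE blarge_part DETACH PARTITION blarge_2026_01;
              DROP TABLE blarge_2026_01;
              ```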
                Gus Spier<gus.spier@gmail.com>
                Jan 28, 2026, 10:57 AM UTC
                Thanks to all.
                I'll give the bash loop method a try and let you know how it works out.
                Regards to all,
                Gus
                  Ron Johnson<ronljohnsonjr@gmail.com>
                  Jan 28, 2026, 3:02 PM UTC
                  More thoughts:
                  1. In cases where records are huge (bytea columns storing images),
                  I added an inner hourly loop.
                  2. Disable autovacuum on the table you're purging, then run
                  pg_repack on it and re-enable autovacuum.
                  3. pg_repack --no-order is a lot faster than having it order by the
                  PK. (You might want it ordered by an indexed date field, though.)
                    Gus Spier<gus.spier@gmail.com>
                    Jan 28, 2026, 3:39 PM UTC
                    Good thoughts. Thanks, Ron!
      David G. Johnston<david.g.johnston@gmail.com>
      Jan 28, 2026, 3:31 AM UTC
      On Tuesday, January 27, 2026, Gus Spier <gus.spier@gmail.com> wrote:
      I proposed the straightforward course of action: delete in batches
      from the victim tables.

      I strongly encourage you to accomplish your goal without DELETE
      commands at that scale, which cause heavy vacuuming. Can you just
      create an empty copy, load the data you want to keep into it, and
      then point at the newly filled copy? Truncate is OK.
      Daily trimming going forward would be less problematic, at least.
      David J.
        Greg Sabino Mullane<htamfids@gmail.com>
        Jan 28, 2026, 3:39 PM UTC
        On Tue, Jan 27, 2026 at 10:31 PM David G. Johnston <
        david.g.johnston@gmail.com> wrote:
        I strongly encourage you to accomplish your goal without DELETE
        commands at that scale, which cause heavy vacuuming. Can you just
        create an empty copy, load the data you want to keep into it, and
        then point at the newly filled copy? Truncate is OK.
        This is really the best solution, especially if most of the rows are > 75
        days old. This removes 100% of your bloat, allows you to keep the old data
        around in case something goes wrong, reduces WAL compared to massive
        deletes, and removes the need to mess with autovacuum.
        Cheers,
        Greg
        --
        Crunchy Data - https://www.crunchydata.com
        Enterprise Postgres Software Products & Tech Support
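        A sketch of the copy-and-swap approach described above (names are
        placeholders; foreign keys, triggers, and sequence ownership need
        separate handling):

        ```sql
        BEGIN;
        CREATE TABLE blarge_new (LIKE blarge INCLUDING ALL);
        INSERT INTO blarge_new
            SELECT * FROM blarge
            WHERE txn_date >= now() - INTERVAL '75 days';
        ALTER TABLE blarge RENAME TO blarge_old;  -- keep as a safety net
        ALTER TABLE blarge_new RENAME TO blarge;
        COMMIT;
        -- Once verified: TRUNCATE or DROP TABLE blarge_old.
        ```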
          Ron Johnson<ronljohnsonjr@gmail.com>
          Jan 28, 2026, 3:57 PM UTC
          On Wed, Jan 28, 2026 at 10:39 AM Greg Sabino Mullane <htamfids@gmail.com>
          wrote:
          On Tue, Jan 27, 2026 at 10:31 PM David G. Johnston <
          david.g.johnston@gmail.com> wrote:
          I strongly encourage you to accomplish your goal without DELETE
          commands at that scale, which cause heavy vacuuming. Can you just
          create an empty copy, load the data you want to keep into it, and
          then point at the newly filled copy? Truncate is OK.
          This is really the best solution, especially if most of the rows are > 75
          days old. This removes 100% of your bloat, allows you to keep the old data
          around in case something goes wrong, reduces WAL compared to massive
          deletes, and removes the need to mess with autovacuum.
          Looping DELETE is the dirt-simple option when the application is
          writing 24x7, when there are a lot of FK dependencies, etc. It also
          allows you to throttle the process (a bash sleep between DELETE
          statements, or purging only a few old days per script execution and
          running the script only at night).