Monday, 23 June 2025

Partition in AWS Aurora PostgreSQL


When your PostgreSQL tables start growing into the millions (or even billions) of rows, performance can take a hit — especially for read-heavy or write-heavy workloads. That’s where partitioningcomes in. It’s a technique that breaks a big table into smaller chunks (called partitions) based on time or ID, so queries run faster and maintenance becomes much easier.

If you're using Amazon Aurora PostgreSQL, you can supercharge this with the pg_partman extension. It automates pretty much everything: creating new partitions as data comes in, managing indexes, and even cleaning up old data with retention rules. It’s a great fit for use cases like logging, audit trails, analytics, and time-series data — basically anything that grows over time and can be logically split.

But automation doesn’t stop there.

Pair pg_partman with pg_cron, another powerful extension available in Aurora, and you can schedule routine maintenance jobs directly inside the database. For example, you can schedule pg_partman.run_maintenance() to run every hour or day, so new partitions are created and old ones are cleaned up — all without any manual effort or external scripts.

Why this combo is so useful:

  • pg_partman keeps your tables partitioned, indexed, and tidy.

  • pg_cron makes sure those maintenance tasks happen regularly and automatically.

  • Together, they reduce manual DBA work and help your database scale smoothly.

So if you're working with large or growing datasets on Aurora, setting up pg_partman with pg_cron is a no-brainer. It’s like setting your database on autopilot.

PostgreSQL pg_partman


Used to automate the creation and maintenance of table partitions.

We can configure pg_partman with the following settings:


Table to be partitioned

  • Partition type
  • Partition key
  • Partition granularity
  • Partition pre-creation and management options


A partitioned table can be registered to pg_partman by calling a function `create_parent`. This creates necessary partitions based on the input given by us


Enabling the pg_partman extension


The below command has to be executed on each database where we need to partition tables.


CREATE SCHEMA partman;

CREATE EXTENSION pg_partman WITH SCHEMA partman;



 


Note: We need superuser privilege to run the above command. No DB restart is needed.


Configuring partitions using the create_parent function


By executing the function create_parent we can add partitions to the table.


SELECT partman.create_parent( p_parent_table => 'public.job_logs',

 p_control => 'created_at',

 p_type => 'native',

 p_interval=> 'daily',

 p_premake => 10);


p_parent_table – The parent partitioned table. This table must already exist and be fully qualified, including the schema.

p_control – The column on which the partitioning is to be based. The data type must be an integer or time-based.

p_type – The type is either 'native' or 'partman'. We typically use the native type for its performance improvements and flexibility. The partman type relies on inheritance.

p_interval – The time interval or integer range for each partition. Example values include daily, hourly, and so on.

p_premake – The number of partitions to create in advance to support new inserts.


Example


The below example illustrates how to add a partition for a table


CREATE TABLE job_logs

  (

     event_id        BIGSERIAL,

     operation       CHAR(1),

     value           FLOAT(24),

     parent_event_id BIGINT,

     event_type      VARCHAR(25),

     created_at      TIMESTAMP,

     CONSTRAINT pk_job_logs_event PRIMARY KEY (event_id, created_at)

  ) partition BY range (created_at); 

CREATE INDEX idx_event_type ON  job_logs(event_type);







To create partitions, this will create partitions based on create date and the interval will be for 1 day, This means for each day we will have a partitioned table.


SELECT partman.create_parent(

p_parent_table => 'public.job_logs', 

p_control => 'created_at', 

p_type => 'native', 

p_interval=> 'quarter-hour', 

p_premake => 2);






In the above, you can see 6 partitions have been created, as we chose to premake as 2 it has created 2 future partitions from the current time. Also, we could see a new partition table job_logs_default, this table will store all the data which are out of the partitioned data.



Info-message


Note: When first running create_parent() to create a partition set, intervals less than a day round down when determining what the first partition to create will be. Intervals less than 24 hours but greater than 1 minute use the nearest hour rounded down. Intervals less than 1 minute use the nearest minute rounded down. However, enough partitions will be made to support up to what the real current time is. This means that when create_parent() is run, more previous partitions may be made than expected and all future partitions may not be made. The first run of run_maintenance() will fix the missing future partitions. This happens due to the nature of being able to support custom time intervals. Any intervals greater than or equal to 24 hours should set things up as would be expected.



Testing by adding data to the partition,






Configuring partition maintenance using the run_maintenance_proc function


We can run partition maintenance operations to automatically create new partitions, detach partitions, or remove old partitions. Partition maintenance relies on the run_maintenance_proc function of the pg_partman extension .


We need to make some changes in the existing partition config based on our needs.


UPDATE partman.part_config 

SET infinite_time_partitions = true,

    retention = '15 minutes', 

    retention_keep_table=false,

    retention_keep_index=false

WHERE parent_table = 'public.job_logs';


With the above query, we will update the existing info which was created using function partman.create_parent


Explanation for above query,


infinite_time_partitions = true, – Configures the table to be able to automatically create new partitions without any limit.

retention = '15 minutes', – Configures the table to have a maximum retention of 15 minutes.

retention_keep_table = false – Configures the table so that when the retention period is due, the tables are dropped. While setting it to true, the table isn't deleted automatically. Instead, partitions that are older than the retention period are only detached from the parent table.

retention_keep_index = false - Same like retention_keep_table where index of the partitioned tables are dropped.When set to true, pg_partman retains the index names of dropped partitions in its internal tracking tables.






Execute the below command from the database.


select partman.run_maintenance(p_parent_table => 'public.job_logs');





From the above o/p we can see it has created new partitions (job_logs_p2025_06_23_1230,job_logs_p2025_06_23_1245,job_logs_p2025_06_23_1300) with our limit of 2 new partitions from the current time and removed the older partitions.



How to remove a table from a partition

In order to remove a table from being partitioned, we should remove the table details from the partman schema tables.


delete from partman.part_config;

drop table partman.template_public_job_logs;



Pg_partman can help in managing the partitions, but if we need to automate it we can achieve this by using pg_cron extension on our database.


PostgreSQL pg_cron


PostgreSQL pg_cron extension to schedule maintenance commands within a PostgreSQL database.


Setting up the pg_cron extension


Add pg_cron to the shared_preload_libraries parameter value.

DB restart is needed for the change to take place

Execute the below command on the database, default connect to Postgres database.


CREATE EXTENSION pg_cron;


Use the following command to check the value of the cron.log_run parameter.


postgres=> SHOW cron.log_run;


Scheduling pg_cron jobs


Partition maintenance relies on the run_maintenance_proc function of the pg_partman extension and the pg_cron extension, which initiates an internal scheduler. The pg_cron scheduler automatically executes SQL statements, functions, and procedures defined in your databases.


To add a partition configuration under pg_cron


INSERT INTO cron.job (schedule, command, nodename, nodeport, database, username, active,jobname)

VALUES (

    '*/15 * * * *',

    'SELECT partman.run_maintenance(p_parent_table => ''public.job_logs'');',

    'localhost',

    5432,

    'btest',

    'root',

    true,

    'job_logs_partition'

);


To disable a cron


Update the job ID with a basic SQL command;


update from cron.job set active = false where jobid=107; 


To remove a cron


Delete the job ID with a basic SQL command;


delete from cron.job where jobid=107;

 


How to monitor the pg_cron and partitions


The table cron.job_run_details contains historical information about past scheduled jobs that ran. This is useful to investigate the status, return messages, and start and end times of the job that ran. 

No comments: