
Migrating from Zalando’s postgres-operator to CloudNativePG

I recently migrated from Zalando’s postgres-operator to CloudNativePG. The process was complicated and poorly documented enough that I felt it was worth a post (my first post in nearly six years!).

Motivation

When I adopted postgres-operator in late 2023/early 2024, it was one of the most popular operators, and it met my requirements (whatever they were; I was not good at taking notes for personal projects). Things got a bit rocky in 2025, however. First, Immich dropped its Postgres subchart, and I had to figure out how to migrate the data over to postgres-operator. Then Immich migrated to VectorChord, which was another complicated change I had to figure out how to make with postgres-operator. Additionally, release 1.15.0 shipped with missing images that took months to fix, and my backup process was not running for that entire stretch. It was not clear whether I could simply downgrade the operator.

Given all the difficulties I was experiencing, I decided to reconsider my use of the operator. CloudNativePG has had a pretty impressive growth trajectory, and lots of folks seem to be using it to self-host Immich. In the discussions around both issues I had to navigate with Immich, many of those folks were sharing fixes that would have made my life a lot easier at the time. I pretty quickly determined that it met my needs and decided to try it out. Now I had to figure out how to migrate some of my databases over.

Migration

I did this in a few more steps than I probably had to, but the piecemeal approach made each migration easier to reason about (I moved one database at a time, whenever I had time).

There were two resources I found to be incredibly useful while navigating this (in addition to CloudNativePG’s own documentation):

Custom Image

The postgres image that postgres-operator uses includes two extensions that are enabled by default: pgaudit/set_user and powa-team/pg_stat_kcache. For CloudNativePG to be able to clone from my existing cluster, I needed these two extensions installed as well. I built an image based on CloudNativePG’s postgres-17 image and installed those two extensions; it is referenced via spec.imageName in the Cluster resource.
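
A minimal sketch of such an image is below; it assumes the postgresql-17-set-user and postgresql-17-pg-stat-kcache packages are available from the PGDG apt repository that the CloudNativePG base images are built against.

# Sketch: CloudNativePG postgres-17 image plus the two Zalando-default extensions
FROM ghcr.io/cloudnative-pg/postgresql:17
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        postgresql-17-set-user \
        postgresql-17-pg-stat-kcache && \
    rm -rf /var/lib/apt/lists/*
# CloudNativePG images run PostgreSQL as the postgres user (uid 26)
USER 26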

Cloned Cluster

For each migration, I started with a throw-away CloudNativePG cluster that would import from the existing database. Before I did anything, I scaled the workload down so that no data would change in the database.
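
For example, assuming the application runs as a Deployment (the name immich-server below is purely illustrative), scaling it down looks like this:

kubectl scale deployment immich-server --replicas=0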

---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: clone-cluster
spec:
  bootstrap:
    initdb:
      database: imported-database
      import:
        type: microservice
        databases:
          - imported-database
        source:
          externalCluster: to-be-cloned
      owner: imported-database-owner
  enablePDB: false
  externalClusters:
    - connectionParameters:
        host: postgres-operator-postgresql
        user: imported-database_owner_user
        dbname: imported-database
      name: to-be-cloned
      password:
        key: password
        name: imported-database-owner-user.postgres-operator-postgresql.credentials.postgresql.acid.zalan.do
  imageName: ghcr.io/sdwilsh/zlando-cnpg-migration:17-latest@sha256:fa4c3afd2bb178791ea1c5b680be30a12a77370e36a9ccf94593ccb835061862
  instances: 1
  postgresql:
    shared_preload_libraries:
      - set_user
      - pg_stat_statements
      - pg_stat_kcache
  primaryUpdateStrategy: unsupervised
  storage:
    size: 10Gi
    storageClass: cnpg-data-encrypted-storage

When using this, I would replace the following:

  • imported-database with the name of the database I was migrating.
  • imported-database-owner with the username of the owner of the new database.
  • postgres-operator-postgresql with the name of the postgresql resource defining the cluster managed by postgres-operator.
  • imported-database_owner_user with the name of the owner user in the cluster managed by postgres-operator.
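
The Zalando-generated secret name (the long credentials.postgresql.acid.zalan.do one above) is easy to get wrong, so it can be looked up directly:

kubectl get secrets -o name | grep credentials.postgresql.acid.zalan.do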

I would run watch kubectl cnpg status clone-cluster (using the cnpg kubectl plugin) until the status was reported as “Cluster in healthy state”. Then I would run kubectl exec -it pod/clone-cluster-1 -- psql -U postgres -d imported-database and execute the following SQL to remove the use of the two extensions:

DROP VIEW IF EXISTS metric_helpers.table_bloat;
DROP VIEW IF EXISTS metric_helpers.pg_stat_statements;
DROP VIEW IF EXISTS metric_helpers.index_bloat;
DROP VIEW IF EXISTS metric_helpers.nearly_exhausted_sequences;
DROP FUNCTION IF EXISTS user_management.terminate_backend(pid integer);
DROP FUNCTION IF EXISTS user_management.revoke_admin(username text);
DROP FUNCTION IF EXISTS user_management.random_password(length integer);
DROP FUNCTION IF EXISTS user_management.drop_user(username text);
DROP FUNCTION IF EXISTS user_management.drop_role(username text);
DROP FUNCTION IF EXISTS user_management.create_user(username text);
DROP FUNCTION IF EXISTS user_management.create_role(rolename text);
DROP FUNCTION IF EXISTS user_management.create_application_user_or_change_password(username text, password text);
DROP FUNCTION IF EXISTS user_management.create_application_user(username text);
DROP FUNCTION IF EXISTS metric_helpers.pg_stat_statements(showtext boolean);
DROP FUNCTION IF EXISTS metric_helpers.get_nearly_exhausted_sequences(double precision);
DROP FUNCTION IF EXISTS metric_helpers.get_table_bloat_approx(OUT t_database name, OUT t_schema_name name, OUT t_table_name name, OUT t_real_size numeric, OUT t_extra_size double precision, OUT t_extra_ratio double precision, OUT t_fill_factor integer, OUT t_bloat_size double precision, OUT t_bloat_ratio double precision, OUT t_is_na boolean);
DROP FUNCTION IF EXISTS metric_helpers.get_btree_bloat_approx(OUT i_database name, OUT i_schema_name name, OUT i_table_name name, OUT i_index_name name, OUT i_real_size numeric, OUT i_extra_size numeric, OUT i_extra_ratio double precision, OUT i_fill_factor integer, OUT i_bloat_size double precision, OUT i_bloat_ratio double precision, OUT i_is_na boolean);
DROP EXTENSION IF EXISTS set_user;
DROP EXTENSION IF EXISTS pg_stat_kcache;
DROP EXTENSION IF EXISTS pg_stat_statements;
DROP SCHEMA IF EXISTS user_management;
DROP SCHEMA IF EXISTS metric_helpers;
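
To double-check that everything was removed, the psql meta-commands below can be run in the same session; the exact output will vary (any extensions your application installed itself, such as VectorChord for Immich, will still be listed).

\dx  -- set_user, pg_stat_kcache, and pg_stat_statements should no longer appear
\dn  -- user_management and metric_helpers should no longer appear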

Destination Cluster

This cluster is meant as the final resting place of the data. It is very likely I could have just changed the imageName in the cloned cluster, but I found the extra step marginally useful enough to keep doing (especially when I forgot to drop the extensions because an import took a while).

The important part here is adding the spec.bootstrap.initdb.postInitApplicationSQL. Most of my applications did not specify which schema to use, and they ended up writing all their data to a schema named data. That schema is not part of the search path in the CloudNativePG cluster, so at first glance it looked like all the data had been dropped (scary!). Since the use of the extensions was already removed, this is otherwise just a clone of the previously created clone-cluster using the postgres-17 image from CloudNativePG.
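
If the tables appear to be missing, a quick check in psql (a sketch; run against the imported database) shows which schema they actually live in:

SHOW search_path;
SELECT schemaname, tablename
  FROM pg_tables
 WHERE schemaname NOT IN ('pg_catalog', 'information_schema');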

---
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: destination-pg-cluster
spec:
  backup:
    target: primary
  bootstrap:
    initdb:
      database: imported-database
      import:
        type: microservice
        databases:
          - imported-database
        source:
          externalCluster: clone-cluster
      owner: imported-database-owner
      postInitApplicationSQL:
        - ALTER USER imported-database-owner SET search_path TO "$user", data, public
  externalClusters:
    - connectionParameters:
        host: clone-cluster-rw
        user: imported-database-owner
        dbname: imported-database
      name: clone-cluster
      password:
        key: password
        name: clone-cluster-app
  imageName: ghcr.io/cloudnative-pg/postgresql:17@sha256:b473a9c10debd74827c2966632012c84e335ccacff1d87da4ad4facf96a62e21
  instances: 1
  storage:
    size: 10Gi
    storageClass: cnpg-data-encrypted-storage

When using this, I would replace the following (not repeating the things stated in the “Cloned Cluster” section):

  • destination-pg-cluster with the name of the cluster we want to use.

Once this cluster is healthy, you can safely drop the spec.bootstrap.initdb.import and spec.externalClusters sections and adjust spec.instances to whatever size you want. It is also now safe to delete the clone-cluster and point your application at the new database. The Secret provided by CloudNativePG contains more than just the password, so values like the URI, database name, and service name can be injected from the secret instead of hard-coded.
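
For example, the destination-pg-cluster-app Secret includes keys such as uri, dbname, host, and password (see the CloudNativePG documentation for the full list), so an application Deployment can reference them directly; the environment variable names below are just illustrative:

env:
  - name: DB_URL
    valueFrom:
      secretKeyRef:
        name: destination-pg-cluster-app
        key: uri
  - name: DB_DATABASE_NAME
    valueFrom:
      secretKeyRef:
        name: destination-pg-cluster-app
        key: dbname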