Pleroma restore very slow, optimization possible #285

@utzer

Description

Hello all,
first of all I want to say how happy I am that you people keep the Pleroma package for Yunohost up-to-date. Thanks so much for your work!

Second, please do not take this issue as a complaint, but rather as constructive feedback; maybe my input can help improve the project.

I am currently restoring Pleroma on my Yunohost. I am moving from one server to another because of read-only remounts on the old system; I was not able to figure out their root cause, so I finally decided to move to a new server.

Back to the topic: I have already started the restore twice and canceled it, partly in disbelief that it could take more than 12 hours for my single-user instance, and partly because I had to run another urgent Yunohost command.

In short: restoring Pleroma with a large activities table seems to take more than 48 hours, even with the parallel worker settings tuned and on fairly capable hardware.

Perhaps the restore hook could allow adjusting Postgres configuration (parallel_workers etc.) dynamically or warn users about expected restore times for large instances.
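For example, the restore script could temporarily raise the maintenance settings around the database import and reset them afterwards. This is purely a sketch from my side (the values and the plain sudo/psql calls are illustrative, not the package's actual code):

NCPU=$(nproc)
sudo -u postgres psql -c "ALTER SYSTEM SET maintenance_work_mem = '2GB';"
sudo -u postgres psql -c "ALTER SYSTEM SET max_parallel_maintenance_workers = $((NCPU / 2));"
sudo -u postgres psql -c "SELECT pg_reload_conf();"

# ... existing database import runs here ...

sudo -u postgres psql -c "ALTER SYSTEM RESET maintenance_work_mem;"
sudo -u postgres psql -c "ALTER SYSTEM RESET max_parallel_maintenance_workers;"
sudo -u postgres psql -c "SELECT pg_reload_conf();"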

Is there a recommended strategy for speeding up this part of the restore? Or anything I might be missing in my configuration?
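One thing I am wondering about (an assumption on my side, I have not checked how the package actually dumps the database): if the backup contained a pg_dump custom-format archive instead of a plain SQL dump, the import could run several jobs in parallel, e.g.:

pg_restore --jobs=8 --dbname=pleroma /path/to/pleroma.dump

(the database name and path are just placeholders). With a plain SQL dump fed to psql, everything except the parallel index builds runs single-threaded.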

For reference: the server is an AMD Ryzen 5 3600 (12 threads are shown in htop) with 64 GB RAM and NVMe SSD storage.

I started the restore like this:
nice -n -5 ionice -c2 -n0 yunohost backup restore pleroma_neu_13072025 --debug

It keeps outputting lines like this:

877839 DEBUG   ALTER TABLE
878055 DEBUG   ALTER TABLE
878060 DEBUG   ALTER TABLE
878063 DEBUG   ALTER TABLE
878066 DEBUG   ALTER TABLE
878068 DEBUG   ALTER TABLE
878074 DEBUG   ALTER TABLE
878075 DEBUG   ALTER TABLE
878085 DEBUG   ALTER TABLE
878091 DEBUG   ALTER TABLE
878098 DEBUG   ALTER TABLE
878102 DEBUG   ALTER TABLE
878105 DEBUG   ALTER TABLE
914270 DEBUG   ALTER TABLE
914277 DEBUG   ALTER TABLE
914280 DEBUG   ALTER TABLE
914282 DEBUG   ALTER TABLE
914285 DEBUG   ALTER TABLE
915670 DEBUG   ALTER TABLE
915676 DEBUG   ALTER TABLE
915678 DEBUG   ALTER TABLE
915682 DEBUG   ALTER TABLE
915685 DEBUG   ALTER TABLE
915691 DEBUG   ALTER TABLE
915693 DEBUG   ALTER TABLE
915697 DEBUG   ALTER TABLE
917036 DEBUG   ALTER TABLE
946008 DEBUG   CREATE INDEX
1012261 DEBUG   CREATE INDEX
1280649 DEBUG   CREATE INDEX
1287589 DEBUG   CREATE INDEX
1291096 DEBUG   CREATE INDEX
1528878 DEBUG   CREATE INDEX
1572996 DEBUG   CREATE INDEX

In psql I keep running:

postgres=# SELECT
  now() - a.xact_start AS duration,
  p.phase,
  round(p.blocks_done * 100.0 / NULLIF(p.blocks_total, 0), 2) AS percent_done
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a USING(pid);
\watch 5

Which shows this:

               Tue Jul 15 09:02:53 2025 (every 5s)

    duration     |             phase              | percent_done 
-----------------+--------------------------------+--------------
 10:24:31.205517 | building index: scanning table |        19.48
(1 row)

The process is running fine, but it only completes about 2% per hour on this step.

I reniced the postgres processes to get a bit closer to the hardware limits, using this command:
for p in $(pgrep postgres); do sudo renice -n -5 -p $p; sudo ionice -c2 -n0 -p $p; done
and got this output:

121803 (process ID) old priority 0, new priority -5
121804 (process ID) old priority 0, new priority -5
121805 (process ID) old priority 0, new priority -5
121807 (process ID) old priority 0, new priority -5
121808 (process ID) old priority 0, new priority -5
121809 (process ID) old priority 0, new priority -5
178449 (process ID) old priority 0, new priority -5
178473 (process ID) old priority 0, new priority -5
217520 (process ID) old priority 0, new priority -5
217522 (process ID) old priority 0, new priority -5
217525 (process ID) old priority 0, new priority -5
217529 (process ID) old priority 0, new priority -5
217530 (process ID) old priority 0, new priority -5
217531 (process ID) old priority 0, new priority -5
217535 (process ID) old priority 0, new priority -5
217536 (process ID) old priority 0, new priority -5
217538 (process ID) old priority 0, new priority -5
217539 (process ID) old priority 0, new priority -5
691142 (process ID) old priority 0, new priority -5
699201 (process ID) old priority 0, new priority -5
700084 (process ID) old priority 0, new priority -5
708536 (process ID) old priority 0, new priority -5
708537 (process ID) old priority 0, new priority -5
708538 (process ID) old priority 0, new priority -5
708539 (process ID) old priority 0, new priority -5
708540 (process ID) old priority 0, new priority -5
708541 (process ID) old priority 0, new priority -5
708542 (process ID) old priority 0, new priority -5

Another query I keep running in psql:

SELECT pid, query FROM pg_stat_activity WHERE state='active';
\watch 5

output:

                                                                                      Tue Jul 15 09:06:33 2025 (every 5s)

  pid   |                                                                                                 query                                                                                                 
--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 699201 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
 700084 | SELECT pid, query FROM pg_stat_activity WHERE state='active';
 708536 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
 708537 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
 708538 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
 708539 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
 708540 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
 708541 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
 708542 | CREATE INDEX activities_visibility_index ON public.activities USING btree (public.activity_visibility(actor, recipients, data), id DESC NULLS LAST) WHERE ((data ->> 'type'::text) = 'Create'::text);
(9 rows)

I also have this running on CLI:
while true; do echo "----- $(date) -----"; ps -eo pid,user,%cpu,%mem,cmd --sort=-%cpu | head -n15; vmstat 1 2 | tail -n1; sleep 5; done

output:

----- Tue Jul 15 09:07:37 CEST 2025 -----
    PID USER     %CPU %MEM CMD
 699201 postgres 73.6  0.3 postgres: 15/main: postgres pleroma [local] CREATE INDEX
 708536 postgres 73.6  0.2 postgres: 15/main: parallel worker for PID 699201 
 708541 postgres 73.6  0.2 postgres: 15/main: parallel worker for PID 699201 
 708540 postgres 73.5  0.2 postgres: 15/main: parallel worker for PID 699201 
 708539 postgres 73.5  0.2 postgres: 15/main: parallel worker for PID 699201 
 708542 postgres 73.5  0.2 postgres: 15/main: parallel worker for PID 699201 
 708537 postgres 73.5  0.2 postgres: 15/main: parallel worker for PID 699201 
 708538 postgres 73.4  0.2 postgres: 15/main: parallel worker for PID 699201 

 4  0 738048 1908416 515272 58342072    0    0     0    19 42788 820999 31 21 48  0  0

Explanation for that last line:

r=4 → 4 runnable (waiting for CPU) → high CPU load
b=0 → no uninterruptible (I/O) wait
si/so=0 → no swap in/out → good
bi/bo=0/19 → very low block I/O
in/cs=42788/820999 → high interrupts & context switches (normal with many workers)
us=31 sy=21 id=48 wa=0 st=0
    us=31% → user CPU (your postgres processes)
    sy=21% → system CPU (kernel)
    id=48% → idle (half of CPU free)
    wa=0% → I/O wait (none → disk not bottleneck)

I could probably push the load even higher, but Postgres will not apply changed settings to a statement that is already running. I changed the settings before starting the latest restore attempt like this:

ALTER SYSTEM SET max_parallel_maintenance_workers = 8;
ALTER SYSTEM SET max_parallel_workers = 10;
ALTER SYSTEM SET max_worker_processes = 16;
ALTER SYSTEM SET maintenance_work_mem = '8GB';
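For a one-off restore there are a few more settings that are commonly suggested as temporary, restore-only tweaks, to be reverted afterwards since they trade crash safety for speed. I did not apply these; just noting them as a sketch for this 64 GB / NVMe machine:

ALTER SYSTEM SET synchronous_commit = off;      -- loses at most the last few commits on a crash
ALTER SYSTEM SET full_page_writes = off;        -- only acceptable if the restore can simply be redone
ALTER SYSTEM SET checkpoint_timeout = '30min';  -- fewer checkpoints during the bulk load
ALTER SYSTEM SET shared_buffers = '8GB';        -- only takes effect after a PostgreSQL restart
SELECT pg_reload_conf();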

Checks for performance-related settings:

  1. I executed:
SHOW max_parallel_maintenance_workers;
SHOW max_parallel_workers;
SHOW max_worker_processes;
SHOW maintenance_work_mem;

output was:

 max_parallel_maintenance_workers
----------------------------------
 8
(1 row)

 max_parallel_workers
----------------------
 10
(1 row)

 max_worker_processes
----------------------
 8
(1 row)

 maintenance_work_mem
----------------------
 8GB
(1 row)

  2. I executed:
SHOW work_mem;
SHOW shared_buffers;
SHOW effective_cache_size;
SHOW max_connections;
SHOW random_page_cost;
SHOW effective_io_concurrency;
SHOW synchronous_commit;
SHOW checkpoint_completion_target;
SHOW wal_buffers;
SHOW default_statistics_target;

output:

 work_mem
----------
 4MB
(1 row)

 shared_buffers
----------------
 128MB
(1 row)

 effective_cache_size
----------------------
 4GB
(1 row)

 max_connections
-----------------
 100
(1 row)

 random_page_cost
------------------
 4
(1 row)

 effective_io_concurrency
--------------------------
 1
(1 row)

 synchronous_commit
--------------------
 on
(1 row)

 checkpoint_completion_target
------------------------------
 0.9
(1 row)

 wal_buffers
-------------
 4MB
(1 row)

 default_statistics_target
---------------------------
 100
(1 row)

Note: despite setting max_worker_processes to 16 before the restore, SHOW max_worker_processes; later still returns 8. Perhaps this requires a restart, or there is another system limit, or I simply set 8 instead of 16.
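As far as I know, max_worker_processes can only change at server start (its context in pg_settings is postmaster), so an ALTER SYSTEM alone would not be enough without a restart. This query should show whether the new value is still pending:

SELECT name, setting, context, pending_restart
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_maintenance_workers',
               'maintenance_work_mem');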
