PostgreSQL performing huge updates [1106]

PostgreSQL is a pretty powerful database server and will work with almost any settings thrown at it. It is really good at making do with what it has and performing as it is asked.

We recently found this as we were trying to update every row in a table that had over eight million entries. We found in the first few tries that the update was taking over 24 hours to complete which was far too long for an update script.

Our investigation of this led us to the pgsql_tmp folder and the work_mem configuration parameter.

Continue reading

Tracking progress of an update statement [1101]

Sometimes there is a need to execute a long running update statement. This update statement might be modifying millions of rows as was the case when we went hunting for a way to track the progress of the update. Hunting around took us to http://archives.postgresql.org/pgsql-admin/2002-07/msg00286.php In our particular case, we are using postgresql but this should work with any database server that provides sequences. Our original sql was of the form:

update only table1 t1
set amount = t2.price
from table2 t2
where t1.id = t2.id;

There is of course now way of figuring out how many rows had been updated already. The first step was to create a sequence

CREATE TEMPORARY SEQUENCE seq_progress START 1;

We can then use this sequence in the update statement to ensure that each row updated also increments the sequence

update only table1 t1
set amount = t2.price
from table2 t2
where nextval('seq_progress') != 0
and t1.id = t2.id;

Once the query is running, you can open another connection to the database. To get an indication of how far it has got, you can just run the following

 select nextval('seq_progress');

Bear in mind that this will also increment it by 1 but if you have millions of rows which is really the only case in which this would be useful, a few additional increments is hardly going to make a difference.

Good luck and have fun!