PostgreSQL performing huge updates [1106]

PostgreSQL is a pretty powerful database server and will work with almost any settings thrown at it. It is really good at making do with what it has and performing as it is asked.

We recently found this as we were trying to update every row in a table that had over eight million entries. We found in the first few tries that the update was taking over 24 hours to complete which was far too long for an update script.

Our investigation of this led us to the pgsql_tmp folder and the work_mem configuration parameter.

Continue reading

Tracking progress of an update statement [1101]

Sometimes there is a need to execute a long running update statement. This update statement might be modifying millions of rows as was the case when we went hunting for a way to track the progress of the update. Hunting around took us to http://archives.postgresql.org/pgsql-admin/2002-07/msg00286.php In our particular case, we are using postgresql but this should work with any database server that provides sequences. Our original sql was of the form:

update only table1 t1
set amount = t2.price
from table2 t2
where t1.id = t2.id;

There is of course now way of figuring out how many rows had been updated already. The first step was to create a sequence

CREATE TEMPORARY SEQUENCE seq_progress START 1;

We can then use this sequence in the update statement to ensure that each row updated also increments the sequence

update only table1 t1
set amount = t2.price
from table2 t2
where nextval('seq_progress') != 0
and t1.id = t2.id;

Once the query is running, you can open another connection to the database. To get an indication of how far it has got, you can just run the following

 select nextval('seq_progress');

Bear in mind that this will also increment it by 1 but if you have millions of rows which is really the only case in which this would be useful, a few additional increments is hardly going to make a difference.

Good luck and have fun!

Database Systems Compared

My first experiences of a computer started with DBase III+ which is now dBASE, then went on to Foxpro, now Microsoft Visual Foxpro. I have since used Filemaker Pro, Microsoft Access, Microsoft SQL Server, MySQL, PostgreSQL, SQLite and HSQLDB. I have not yet used IBM DB2, Oracle. Wikipedia has a list of database systems.

Having worked with this range of database systems and having done copious amounts of research into DB2, Oracle and other DB systems I have not mentioned, I like answering the age old questions. Which is the best database system?

Ah! if only it was that simple. There is no database system that is appropriate for any given requirement. But then, if you have been in the technology sector long enough, you would already know that. It’s all about using the right tool for the job.

I separate these systems into two broad categories and Oracle. There are the Desktop based database systems:

  • DBase
  • Foxpro
  • SQLite
  • HSQLDB
  • Filemaker Pro
  • Microsoft Access
  • MySQL

DBase, FoxPro, Filemaker Pro and Microsoft Access are essentially a GUI frontend that has a database backing.

Access is the best choice for this purpose under the majority of circumstances. Filemaker Pro is relevant in some. The usual reason to use DBase or FoxPro is simply that the developer is used to it. This is not a good enough reason.

I have used DBase III+ for developing an office management suite back in 1994. I have since used Filemaker Pro to develop a simple contact management database in 1998, Microsoft Access to develop a patient management system for a clinic.

SQLite, HSQLDB and MySQL are database engines that are to be utilised by popping a frontend on top; sometimes the frontend is Microsoft Access. Microsoft Access can also be used for its database engine.

Access is usually the worst choice for this except as a stopgap. There are exceptions to this. One is for a web frontend if the site is not too busy and its running on a microsoft platform. You don’t have to go to the hassle of installing anything on the server. The drivers will take care of it all.

HSQLDB becomes an obvious choice for a light java based application and SQLite for any other lightweight applications.

MySQL is substantially more powerful and scales a lot better. I include it in this section because it is a server grade database system that can also work well in a desktop environment.

I have used Access for several web based systems and I have used HSQLDB for unit testing hibernate and for a quick and dirty MP3 library that linked into musicBrainz. I have used SQLite in passing to be utilised by open source products.

I have used MySQL with an Access frontend as a management suite for a website as well.

And we have the server based database systems:

  • MySQL
  • Microsoft SQL Server
  • IBM DB2
  • PostgreSQL

MySQL was used as the backed database system for the edFringe.com website. This was the perfect choice since the most important requirement was speed. Particuarly with the Query Cache and Master Slave replication, MySQL was the best choice.

SQL Server was used as the backend system for an online course for the Scottish Enterprise around 1999/2000. While MySQL would have been a good choice this, it was not of production quality at the time.

We have also used Ms SQL Server for an insurance company since all the infrastructure was based on Windows and PostgreSQL did not have a viable Windows version at the time.

We use PostgreSQL for megabus. While speed is absolutely critical, it is a ticketing system which means that transactionality is absolutely critical.

While MySQL now has transactionality with innodb, it is still nowhere near as good as the transactionality provided by PostgreSQL through MVCC (Multi-version Concurrency Control). We could have used Ms SQL Server but the cost savings are dramatic.

To summarise, each system has a specific use, specific strengths and weaknesses and which should be used is highly dependent on what it is to be used for. I am hopeful that the summary of what we have used each of these systems for us useful in determining which one is best placed to solve any specific problem 😀

We have not yet used Oracle and it was a strong contender for megabus but the serious heavyweight functionality provided by Oracle comes at a price and it is not yet a cost effective option.