
IDENT authentication failed for user

PostgreSQL under Debian by default uses IDENT authentication to authenticate a user connecting to the database. In English this means that you can only connect to the database with the same username as your Unix login. This can be edited in /etc/postgresql/pg_hba.conf (Host Based Authentication): replacing ident sameuser with trust lets anyone connect to the database with any username without a password; crypt and other methods are also available. The file is well commented; refer to it for more details.
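As a hedged illustration (the exact pg_hba.conf column layout varies between PostgreSQL versions), the change is from a line like

    local   all   ident sameuser

to

    local   all   trust

after which any local user can connect as any database user without a password. Only do this on a machine where every local account is trusted with every database.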

If you change Debian's default setup, then you may get error messages every day from cron, as the postgres user connects to every database and runs some optimisations.

The Unix superuser can change to the postgres user to run the shell commands createdb and createuser. If you create a database user with the same name as a Unix account, then that Unix user can connect (when using ident sameuser). If you don't want to use the shell commands, the postgres user can run psql template1 to connect via the interactive shell, and then do CREATE DATABASE name; or CREATE USER name;.
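For example (a sketch; the account name jdoe is hypothetical):

    # su - postgres
    $ createuser jdoe
    $ createdb jdoe

or, doing the same from the interactive shell:

    $ psql template1
    template1=# CREATE USER jdoe;
    template1=# CREATE DATABASE jdoe;

With ident sameuser in effect, the Unix user jdoe can then connect to the database jdoe simply by running psql.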

Upgrading from 7.2 to 7.3

Check your pg_hba.conf file. Previously, "password" authentication covered all forms of stored password; now it is strict. If you are using MD5-stored passwords, the method must be md5.
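For instance, a 7.3-style entry selecting MD5 authentication (a hedged illustration; the address and mask are hypothetical):

    host   all   all   192.168.1.0   255.255.255.0   md5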

Also, in SQL-land, LIMIT used to be fairly forgiving about its arguments; it would accept MySQL's comma-separated form without blinking. Now it is strict.
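For example (foo is a hypothetical table):

    -- MySQL-style two-argument form, no longer accepted by 7.3:
    SELECT * FROM foo LIMIT 10, 20;

    -- the strict form 7.3 requires (return 20 rows, skipping the first 10):
    SELECT * FROM foo LIMIT 20 OFFSET 10;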

Finally, if you are a Debian user, don't believe the instructions in the update README! psql -e < dumpfile will not restore your database... you need to do psql template1 -e < dumpfile.

"You should create PGDATA directory first" in Gentoo after an emerge install

If you get the following error:

# /etc/init.d/postgresql start
 * directory not found: /var/lib/postgresql/data
 * You should create PGDATA directory first.

then you need to issue:

    ebuild /var/db/pkg/dev-db/postgresql-$VERSION/postgresql-$VERSION.ebuild config
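For example, with a hypothetical installed version of 7.4.6:

    # ebuild /var/db/pkg/dev-db/postgresql-7.4.6/postgresql-7.4.6.ebuild config

The config phase of this package creates and initialises the database cluster under /var/lib/postgresql/data.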

Connecting over TCP/IP

By default the server listens for connections only on a local Unix-domain socket. If you want to connect to it via a host name, you need to start the server with the -i flag (or set tcpip_socket = true in postgresql.conf).

You will also need to edit your pg_hba.conf file to set access permissions.
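As a sketch (the subnet is hypothetical; the pg_hba.conf layout shown is the 7.3 style):

    # in postgresql.conf, equivalent to starting the postmaster with -i:
    tcpip_socket = true

    # in pg_hba.conf: allow MD5-authenticated TCP connections from one subnet
    host   all   all   192.168.1.0   255.255.255.0   md5

Restart the server for the changes to take effect.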

Determining Table Primary Keys

Within a procedure I was writing, I needed a way to determine which fields of a table were the primary key. The following views help to provide this information.

-- columns of each primary-key index, keyed by the oid of the indexed table
CREATE VIEW relid_pkeys AS SELECT indrelid AS relid, attname AS column_name
    FROM pg_index, pg_class, pg_attribute WHERE indisprimary='t' AND
    indexrelid=pg_class.oid AND pg_attribute.attrelid=pg_class.oid;

-- map table names to their primary-key columns, for user tables only
CREATE VIEW table_pkeys AS SELECT c.relname AS table_name, p.column_name FROM
    relid_pkeys p, pg_class c WHERE p.relid=c.oid AND c.relowner!=1 AND
    c.relkind='r';

Now you can run

    SELECT column_name FROM table_pkeys WHERE table_name='foo';

and get back a list of the columns of table foo that make up its primary key. No rows are returned if foo has no primary key defined.

I added the restrictions on the relowner and relkind fields so that queries such as SELECT * FROM table_pkeys are a bit cleaner. The restrictions aim to filter the list so that it contains only user-created tables. I chose the values by trial and error; they work for me, but your mileage may vary. If you are having problems, simply remove those two restrictions.

Tuning the WAL

Several configuration parameters control WAL behaviour. They are all documented in the online PostgreSQL documentation, which also gives a good overview of the specific parameters.

Many people extol the virtue of placing the WAL on its own device, separate from the main data files. Extensive PgBench testing leads to questioning that advice; the performance increase appears to be related more to the increase in the number of spindles than to the specific disk configuration. Compare the 4-disk RAID1+0 and dual RAID1 configurations. Those tests, however, were performed using a battery-backed write cache; it is entirely possible that without a write cache the fsync(2) seek latency would significantly impact the main data file I/O.

Some additional configuration notes:

commit_delay
commit_siblings

These settings trade COMMIT latency for throughput. The circumstances under which increasing commit_delay has the most notable effect are when:

  • There is no battery-backed write cache
  • There are a lot of small transactions
  • The WAL device has a high fsync(2) latency
  • There are a large number of concurrent connections

GuyThornley's favourite settings for these parameters are:

commit_delay = 20000
commit_siblings = 3

wal_buffers

This setting controls the upper bound on the size of write(2) requests to the WAL segment files. Modern operating systems and disks work very well with writes in the range of 128 kiB to 512 kiB, often reaching peak I/O bandwidth with write(2) requests in that range.

Before generating new WAL activity, PostgreSQL writes out the WAL buffers if more than half of them are full. Given the default block size of 8 kiB and the disk performance above, this suggests that wal_buffers = 128 is a good starting point: 128 buffers * 8 kiB = 1 MiB of buffer space, so a half-full flush issues a 512 kiB write. The default setting of 8 is surely too small for most systems.

Doing COMMIT both writes out all the used WAL buffers and issues an expensive fsync(2). Thus small transactions come with a heavy I/O penalty, and the large WAL buffer is mostly wasted. Using commit_delay may help significantly in this situation.

checkpoint_segments
checkpoint_timeout
checkpoint_warning

Good settings for these parameters are particularly hard to determine. On one hand, you want to allow PostgreSQL and the operating system to perform write-caching of the main data files; on the other hand, you want to constrain crash-recovery times to reasonable levels.

Current recommendations are to have checkpoints occur no more frequently than at 5-minute intervals. Thus the settings

checkpoint_timeout = 600
checkpoint_warning = 300

seem reasonable.

It is impossible to tune checkpoint_segments for worst-case WAL activity, such as a bulk load using the COPY FROM STDIN SQL command. This might generate 30 MiB/s or more of WAL activity; 30 MiB/s for 5 minutes is 8.7 GiB of WAL data! The best suggestion appears to be to let the server run for a while, then study the timestamps on the WAL segment files in pg_xlog to calculate the WAL write rate.

Once the WAL write rate is known, the number of log segments required for 10 minutes of WAL activity can be calculated as checkpoint_segments = (wal_write_rate * 600) / 16, with the rate in MiB/s (each WAL segment is 16 MiB).
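For example, at a hypothetical measured write rate of 1.5 MiB/s, checkpoint_segments = (1.5 * 600) / 16 = 56 (rounding down from 56.25).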

It is the author's experience that crash recovery from the WAL is a fairly slow operation; I/O rates in the region of 2 MiB/s to 4 MiB/s should be expected. The checkpoint_segments setting could be limited to the longest permissible recovery time. For example, to complete recovery within 5 minutes at 3.0 MiB/s, the maximum setting of checkpoint_segments is (300 * 3.0) / 16 = 56 (rounding down).