
This is a scratch pad for some PostgreSQL 8.0 benchmarks. The contributed utility pgbench is used for the testing.

For most of the testing, the important parts of the PostgreSQL configuration are:
shared_buffers    = 23987
max_fsm_relations = 5950
max_fsm_pages     = 3207435

wal_buffers         = 544
checkpoint_segments = 40
checkpoint_timeout  = 900
checkpoint_warning  = 300
commit_delay        = 20000
commit_siblings     = 3
wal_sync_method     = fdatasync

enable_seqscan        = off
default_with_oids     = off
stats_start_collector = false

Exceptions are noted as the tests are performed.
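
As a quick sanity check (not part of the original notes), the effective values can be confirmed from any client with psql's SHOW command; the host and database names below are assumptions:

  # Confirm the server is actually running with the intended settings
  # (host and database names are assumptions).
  psql -h dbserver -d pgbench -c "SHOW shared_buffers"
  psql -h dbserver -d pgbench -c "SHOW checkpoint_segments"
  psql -h dbserver -d pgbench -c "SHOW wal_sync_method"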

The pgbench test database was created with the -s600 scale factor option. This results in a fresh database of about 8.6GiB, along with 1.3GiB of WAL. The test database was then backed up to a .tar.gz file so it could easily be restored between test runs.
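
For reference, here is a rough sketch of how a scale-600 pgbench database can be built and archived; the database name and directory paths are assumptions, not taken from these notes:

  # Build the -s600 test database (~8.6GiB) and archive it for re-use.
  # Database name and paths are assumptions.
  createdb pgbench
  pgbench -i -s 600 pgbench
  # Stop the server and take a file-level copy so the database can be
  # restored to an identical state between test runs.
  pg_ctl stop -D /var/lib/pgsql/data
  tar -czf /backup/pgbench-s600.tar.gz -C /var/lib/pgsql data
  pg_ctl start -D /var/lib/pgsql/data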

Each test was executed 5 times in sequence, and the median result is reported. All tests were executed with the -c100 option for 100 connections. The transaction count per connection was adjusted as necessary so that each single test would span several minutes. Typical settings were -t500 to -t1000.

For all of the testing, the pgbench client was run from a separate client machine over a 100Mbit, full-duplex network connection. Running pgbench remotely did not measurably degrade performance. The client machine is a dual 3.06GHz Xeon running Linux 2.4.27. SSL encryption was disabled.
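
A single test cycle from the client machine then looks roughly like the following sketch; the server host name "dbserver" is an assumption, and -c/-t match the settings described above:

  # Five identical runs in sequence; the median result is reported.
  for run in 1 2 3 4 5; do
      pgbench -h dbserver -c 100 -t 1000 pgbench > run-$run.log
  done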

The base hardware:

  • HP DL380 G4
  • Dual 3.20GHz Xeon, 1MB L2 Cache, 800MHz FSB, HyperThreading disabled
  • 1GB DDR2-400 (PC2-3200) registered ECC memory
  • Broadcom PCI-X onboard NIC
  • SmartArray 6i onboard RAID controller
  • Battery-backed write cache enabled

The base software:

  • PostgreSQL 8.0

On with the testing!

Update: On Tue Jun 20 2006, all results were replaced with updated results. The previous test results were invalid and incomparable, due to inconsistencies and errors in the testing process.

Results: 4-disk configurations

  • Data array: RAID5, 4x 72GB 10k RPM
    WAL array: On data array

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 500
    number of transactions actually processed: 50000/50000
    tps = 124.728272 (including connections establishing)
    tps = 124.885813 (excluding connections establishing)
  • Data array: RAID5, 4x 72GB 10k RPM
    WAL array: On data array
    Other notes: commit_delay disabled

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 500
    number of transactions actually processed: 50000/50000
    tps = 129.347747 (including connections establishing)
    tps = 129.517978 (excluding connections establishing)
  • Data array: RAID5, 4x 72GB 10k RPM
    WAL array: On data array
    Other notes: battery-backed write cache disabled

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 500
    number of transactions actually processed: 50000/50000
    tps = 114.885220 (including connections establishing)
    tps = 115.020971 (excluding connections establishing)
  • Data array: RAID5, 4x 72GB 10k RPM
    WAL array: On data array
    Other notes: Battery-backed write cache and commit_delay disabled

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 500
    number of transactions actually processed: 50000/50000
    tps = 80.177806 (including connections establishing)
    tps = 80.244181 (excluding connections establishing)
  • Data array: RAID1, 2x 72GB 10k RPM
    WAL array: RAID1, 2x 72GB 10k RPM

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 1000
    number of transactions actually processed: 100000/100000
    tps = 131.213838 (including connections establishing)
    tps = 131.309052 (excluding connections establishing)
  • Data array: RAID1+0, 4x 72GB 15k RPM
    WAL array: On data array

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 1000
    number of transactions actually processed: 100000/100000
    tps = 284.662951 (including connections establishing)
    tps = 285.127666 (excluding connections establishing)
  • Data array: RAID5, 4x 72GB 15k RPM
    WAL array: On data array

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 1000
    number of transactions actually processed: 100000/100000
    tps = 189.203382 (including connections establishing)
    tps = 189.379783 (excluding connections establishing)
  • Data array: RAID1, 2x 72GB 15k RPM
    WAL array: RAID1, 2x 72GB 15k RPM

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 1000
    number of transactions actually processed: 100000/100000
    tps = 171.537230 (including connections establishing)
    tps = 171.680858 (excluding connections establishing)

Results: 6-disk configurations

  • Data array: RAID1+0, 4x 72GB 15k RPM
    WAL array: RAID1, 2x 72GB 10k RPM

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 1000
    number of transactions actually processed: 100000/100000
    tps = 340.756686 (including connections establishing)
    tps = 341.404543 (excluding connections establishing)
  • Data array: RAID5, 4x 72GB 15k RPM
    WAL array: RAID1, 2x 72GB 10k RPM

    scaling factor: 600
    number of clients: 100
    number of transactions per client: 1000
    number of transactions actually processed: 100000/100000
    tps = 212.377629 (including connections establishing)
    tps = 212.615105 (excluding connections establishing)

Other observations

  • The WAL consumes a large amount of kernel page cache. When the WAL was moved between devices and the old files were unlinked, half of the page cache was freed. Since the WAL is written only once and never read back, this is a waste! (A sketch of one way to do the move follows this list.)
  • The battery-backed write cache makes write performance very erratic.
  • The HP SmartArray hardware (or perhaps its driver) tends to block reads while cached writes are occurring, resulting in large read latencies (whole seconds). I have not yet found a way to tune this.
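
These notes do not record how the WAL was moved between arrays; on PostgreSQL 8.0 the usual technique is to relocate the pg_xlog directory and leave a symlink behind, roughly as in this sketch (the mount point and data directory paths are assumptions):

  # Relocate pg_xlog to a dedicated array (PostgreSQL 8.0-era technique).
  # Paths are assumptions, not taken from these notes.
  pg_ctl stop -D /var/lib/pgsql/data
  mv /var/lib/pgsql/data/pg_xlog /mnt/wal/pg_xlog
  ln -s /mnt/wal/pg_xlog /var/lib/pgsql/data/pg_xlog
  pg_ctl start -D /var/lib/pgsql/data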

Part of CategoryDiskNotes