
Differences between current version and revision by previous author of PgBench.


Newer page: version 42 Last edited on Thursday, June 22, 2006 9:50:19 pm by GuyThornley
Older page: version 17 Last edited on Thursday, June 15, 2006 8:22:56 pm by AristotlePagaltzis
@@ -1,5 +1,5 @@
-This is a scratch pad for some [PostgreSQL] benchmarks. The contributed utility <tt>pgbench</tt> is used for the testing. 
+This is a scratch pad for some [PostgreSQL] 8.0 benchmarks. The contributed utility <tt>pgbench</tt> is used for the testing. 
  
 For most of the testing, important parts of the [PostgreSQL] configuration used are: 
  <verbatim> 
  shared_buffers = 23987 
@@ -18,13 +18,15 @@
  default_with_oids = off 
  stats_start_collector = false 
  </verbatim> 
  
-Exceptions will be listed as the tests are performed. 
+Exceptions are noted as the tests are performed. 
  
-The <tt>pgbench</tt> test database was created with the <tt>-s100</tt> scale factor option. This results in a fresh database of about 1.4GB. Consecutive runs of <tt>pgbench</tt> grow the database, however. All test runs were executed with the <tt>-c100</tt> option for 100 connections. The transactions per connection was adjusted as needed to give a stable test result, without obvious effects of caching. Typical settings were <tt>-t100</tt> to <tt>-t1000</tt>. 
+The <tt>pgbench</tt> test database was created with the <tt>-s600</tt> scale factor option. This results in a fresh database of about 8.6GiB, along with 1.3GiB of WAL. The test database was then backed up to a <tt>.tar.gz</tt> file so it could easily be restored between test runs. 
  
-The <tt>pgbench</tt> client was actually run over a 100Mbit, full-duplex network connection from a client machine for most of the testing. Running <tt>pgbench</tt> remotely has not measurably degraded the performance. The client machine is a dual 3.06GHz Xeon running Linux 2.4.27. 
+Each test was executed 5 times in sequence, and the median result is reported. All tests were executed with the <tt>-c100</tt> option for 100 connections. The transaction count per connection was adjusted as necessary so that each single test would span several minutes. Typical settings were <tt>-t500</tt> to <tt>-t1000</tt>.  
+  
+ The <tt>pgbench</tt> client was run over a 100Mbit, full-duplex network connection from a client machine for all of the testing. Running <tt>pgbench</tt> remotely did not measurably degrade performance. The client machine is a dual 3.06GHz Xeon running Linux 2.4.27. [SSL] encryption was disabled. 
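+ 
+ A minimal sketch of one such test cycle follows. The hostnames, paths and exact backup method are assumptions for illustration, not a record of the actual procedure used: 
+ 
+ <verbatim> 
+ # One-off: create and populate the -s600 test database from the client
+ # machine, then archive the server's data directory while PostgreSQL is stopped.
+ createdb -h dbhost pgbench
+ pgbench -i -s 600 -h dbhost pgbench
+ ssh dbhost '/etc/init.d/postgresql stop &&
+     tar -czf /var/tmp/pgbench-s600.tar.gz -C /var/lib/postgres data &&
+     /etc/init.d/postgresql start'
+ 
+ # Between test runs: restore the pristine database.
+ ssh dbhost '/etc/init.d/postgresql stop &&
+     rm -rf /var/lib/postgres/data &&
+     tar -xzf /var/tmp/pgbench-s600.tar.gz -C /var/lib/postgres &&
+     /etc/init.d/postgresql start'
+ 
+ # Each test: run 5 times from the client machine and take the median tps.
+ for run in 1 2 3 4 5; do
+     pgbench -h dbhost -c 100 -t 500 pgbench
+ done
+ </verbatim> 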
  
 The base hardware: 
  
 * [HP] DL380 G4 
@@ -36,136 +38,183 @@
  
 The base software: 
  
 * LinuxKernel 2.4.27 from [Debian]’s <tt>kernel-image-2.4.27-2-686-smp</tt> [Package] 
-* Using [Ext3] with <tt>ordered</tt> data mode 
+* Using [Ext3] with 128MB journal and <tt>ordered</tt> data mode 
  
 On with the testing! 
  
-!! Results 
+__Update__: On Tue Jun 20 2006, all results were replaced with updated results. The previous test results were invalid and incomparable, due to inconsistencies and errors in the testing process.  
+  
+ !! Results: 4-disk configurations  
  
 * Data array: RAID5, 4x 72GB 10k RPM%%% 
  WAL array: On data array%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
- number of transactions per client: 100  
- number of transactions actually processed: 10000/10000 
- tps = 132.257337 (including connections establishing) 
- tps = 141.908320 (excluding connections establishing) 
+ number of transactions per client: 500 
+ number of transactions actually processed: 50000/50000 
+ tps = 124.728272 (including connections establishing) 
+ tps = 124.885813 (excluding connections establishing) 
  </pre> 
  
 * Data array: RAID5, 4x 72GB 10k RPM%%% 
  WAL array: On data array%%% 
  Other notes: <tt>commit_delay</tt> disabled%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
- number of transactions per client: 100  
- number of transactions actually processed: 10000/10000 
- tps = 135.567199 (including connections establishing) 
- tps = 146.354640 (excluding connections establishing) 
+ number of transactions per client: 500 
+ number of transactions actually processed: 50000/50000 
+ tps = 129.347747 (including connections establishing) 
+ tps = 129.517978 (excluding connections establishing) 
  </pre> 
  
 * Data array: RAID5, 4x 72GB 10k RPM%%% 
  WAL array: On data array%%% 
  Other notes: battery-backed write cache disabled%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
- number of transactions per client: 50  
- number of transactions actually processed: 5000/5000 
- tps = 76.678506 (including connections establishing) 
- tps = 83.263195 (excluding connections establishing) 
+ number of transactions per client: 500 
+ number of transactions actually processed: 50000/50000 
+ tps = 114.885220 (including connections establishing) 
+ tps = 115.020971 (excluding connections establishing) 
  </pre> 
  
 * Data array: RAID5, 4x 72GB 10k RPM%%% 
  WAL array: On data array%%% 
+ Other notes: Battery-backed write cache and <tt>commit_delay</tt> disabled%%%  
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
- number of transactions per client: 50  
- number of transactions actually processed: 5000/5000 
- tps = 50.434271 (including connections establishing) 
- tps = 53.195151 (excluding connections establishing) 
+ number of transactions per client: 500 
+ number of transactions actually processed: 50000/50000 
+ tps = 80.177806 (including connections establishing) 
+ tps = 80.244181 (excluding connections establishing) 
  </pre> 
  
-* Data array: RAID1, 2x 72GB 10k RPM%%%  
- WAL array: RAID1, 2x 72GB 10k RPM%%% 
+* Data array: RAID1+0, 4x 72GB 10k RPM%%% 
+ WAL array: On data array%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
  number of transactions per client: 1000 
  number of transactions actually processed: 100000/100000 
- tps = 217.737758 (including connections establishing) 
- tps = 220.277597 (excluding connections establishing) 
+ tps = 216.341138 (including connections establishing) 
+ tps = 216.606298 (excluding connections establishing) 
  </pre> 
  
-* Data array: RAID1+0, 4x 72GB 15k RPM%%% 
+* Data array: RAID1, 2x 72GB 10k RPM%%% 
  WAL array: RAID1, 2x 72GB 10k RPM%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
- number of transactions per client: 2000  
- number of transactions actually processed: 200000/200000 
- tps = 409.561669 (including connections establishing) 
- tps = 414.078634 (excluding connections establishing) 
+ number of transactions per client: 1000 
+ number of transactions actually processed: 100000/100000 
+ tps = 131.213838 (including connections establishing) 
+ tps = 131.309052 (excluding connections establishing) 
  </pre> 
  
 * Data array: RAID1+0, 4x 72GB 15k RPM%%% 
  WAL array: On data array%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
  number of transactions per client: 1000 
  number of transactions actually processed: 100000/100000 
- tps = 325.140579 (including connections establishing) 
- tps = 330.843403 (excluding connections establishing) 
+ tps = 284.662951 (including connections establishing) 
+ tps = 285.127666 (excluding connections establishing) 
  </pre> 
  
 * Data array: RAID5, 4x 72GB 15k RPM%%% 
+ WAL array: On data array%%%  
+  
+ <pre>  
+ scaling factor: 600  
+ number of clients: 100  
+ number of transactions per client: 1000  
+ number of transactions actually processed: 100000/100000  
+ tps = 189.203382 (including connections establishing)  
+ tps = 189.379783 (excluding connections establishing)  
+ </pre>  
+  
+* Data array: RAID1, 2x 72GB 15k RPM%%%  
+ WAL array: RAID1, 2x 72GB 15k RPM%%%  
+  
+ <pre>  
+ scaling factor: 600  
+ number of clients: 100  
+ number of transactions per client: 1000  
+ number of transactions actually processed: 100000/100000  
+ tps = 171.537230 (including connections establishing)  
+ tps = 171.680858 (excluding connections establishing)  
+ </pre>  
+  
+!! Results: 6-disk configurations  
+  
+* Data array: RAID1+0, 4x 72GB 15k RPM%%%  
  WAL array: RAID1, 2x 72GB 10k RPM%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
  number of transactions per client: 1000 
  number of transactions actually processed: 100000/100000 
- tps = 236.721312 (including connections establishing) 
- tps = 239.738377 (excluding connections establishing) 
+ tps = 340.756686 (including connections establishing) 
+ tps = 341.404543 (excluding connections establishing) 
  </pre> 
  
 * Data array: RAID5, 4x 72GB 15k RPM%%% 
- WAL array: On data array%%% 
+ WAL array: RAID1, 2x 72GB 10k RPM%%% 
  
  <pre> 
- scaling factor: 100  
+ scaling factor: 600  
  number of clients: 100 
  number of transactions per client: 1000 
  number of transactions actually processed: 100000/100000 
- tps = 192.430583 (including connections establishing) 
- tps = 194.404205 (excluding connections establishing) 
+ tps = 212.377629 (including connections establishing) 
+ tps = 212.615105 (excluding connections establishing) 
  </pre> 
  
 * Data array: %%% 
  WAL array: %%% 
  Other notes: %%% 
  
  <pre> 
  </pre> 
+  
+!!! Insights and observations  
+  
+* Using RAID1+0 for the heap files provides a dramatic performance gain, because RAID1+0 handles random writes much better than RAID5. RAID1+0 is an expensive option, however, because of the additional disks (perhaps expensive SCSI disks), space, power and cooling capacity required. Whether the gain is worth the cost is a deployment-specific question. 
+  
+* Current [PostgreSQL] myths claim that moving the [WAL] to its own separate spindles, often on RAID1 or RAID1+0, increases performance. Most of the performance gain actually arises from increasing the number of spindles the total IO load is distributed over, rather than from the specific disk configuration. In particular it should be noted that: 
+  
 # RAID5, with the help of a battery-backed write cache, handles sequential writes very well. 
+ # The [WAL] is written sequentially.  
+  
+* The <tt>commit_delay</tt> parameter is a help for non-battery-backed systems, and a loss for battery-backed systems. The reason is clear: with a battery, the fsync() is almost free, so the delay only reduces total throughput. Without a battery, the fsync() is very expensive, and if the delay allows many commits to share a single fsync(), throughput can go up. (The relevant settings are sketched below.) 
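+ 
+ These settings live in <tt>postgresql.conf</tt>. The values shown here are purely illustrative, since the exact values used during these tests are not recorded on this page: 
+ 
+ <verbatim> 
+ # Wait up to commit_delay microseconds before flushing the WAL at commit,
+ # so that concurrently committing transactions can share a single fsync().
+ # The delay is only taken when at least commit_siblings other transactions
+ # are active at the time.
+ commit_delay    = 10000     # illustrative value, in microseconds; 0 disables
+ commit_siblings = 5         # the PostgreSQL default
+ </verbatim> 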
+  
+!!! <tt>pgbench</tt> test limitations  
+  
+* The test measures saturated throughput rather than latency. In typical database usage, however, latency is the dominant performance metric; many deployments have a certain bound on user-acceptable latency. It may be interesting to perform throughput-vs-connections-vs-latency comparisons (a rough sweep is sketched after this list). Under such a test, the true gain of a battery-backed cache, and the true cost of <tt>commit_delay</tt>, would be evident. 
+  
+* The probability distribution used to access the data is completely flat: all data items have the same probability of being accessed. This is completely unrepresentative of typical situations, where some data items are accessed very frequently and some hardly at all. Unfortunately, the actual probability distribution is likely to be very application-specific. 
+  
+* The TPC-B benchmark, which the <tt>pgbench</tt> program is somewhat based upon, was actually retired by the [Transaction Processing Performance Council (TPC) | http://www.tpc.org] in 1995. The test database schema is very basic, with very simple queries. Better benchmark tools are available, such as the [DBT suite | http://osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/] developed by [Open Source Development Labs (OSDL) | http://www.osdl.org], which aims to be a fair-use implementation of the current [TPC benchmarks | http://www.tpc.org/information/benchmarks.asp]. 
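+ 
+ As a rough sketch of such a comparison, one could sweep the client count and estimate the mean transaction latency from the reported throughput (latency ≈ clients / tps). The hostname and the client counts below are assumptions for illustration only: 
+ 
+ <verbatim> 
+ # Hypothetical sweep: vary the number of clients, keep the total transaction
+ # count roughly constant, and estimate mean latency as clients / tps.
+ for c in 10 25 50 100 200; do
+     t=$((50000 / c))
+     tps=$(pgbench -h dbhost -c $c -t $t pgbench |
+           awk '/excluding connections/ {print $3}')
+     echo "clients=$c  tps=$tps  mean latency=$(echo "$c / $tps" | bc -l) s"
+ done
+ </verbatim> 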
  
 !!! Other observations 
  
-* The test database started at 1.4GB, and got to at least 14GB during testing. Has this growth affected results?  
-* The WAL consumes large amounts of [Kernel] page cache. When moving the WAL between devices, when the old files are unlinked, 1/2 of the page cache is freed. Since the WAL is never read and written only once, this is as waste! 
+* The [WAL] consumes large amounts of [Kernel] page cache. When the WAL is moved between devices and the old files are unlinked, half of the page cache is freed. Since the WAL is written only once and never read back, this is a waste! 
 * The battery-backed write cache makes write performance very erratic. 
-* The [HP] ~SmartArray hardware (or perhaps driver) tends to block reads while there are cached writes occuring. Large read latencies (seconds) results . I have not yet found a way to tune this. 
+* The [HP] ~SmartArray hardware (or perhaps driver) tends to block reads while there are cached writes occurring. Large read latencies (whole seconds) result. I have not yet found a way to tune this. 
  
 ---- 
 Part of CategoryDiskNotes