This is a scratch pad for some [PostgreSQL] 8.0 benchmarks. The contributed utility <tt>pgbench</tt> is used for the testing.

For most of the testing, important parts of the [PostgreSQL] configuration used are:
<verbatim>
shared_buffers = 23987
max_fsm_relations = 5950
max_fsm_pages = 3207435

wal_buffers = 544
checkpoint_segments = 40
checkpoint_timeout = 900
checkpoint_warning = 300
commit_delay = 20000
commit_siblings = 3
wal_sync_method = fdatasync

enable_seqscan = off
default_with_oids = off
stats_start_collector = false
</verbatim>

Exceptions are noted as the tests are performed.

The <tt>pgbench</tt> test database was created with the <tt>-s600</tt> scale factor option. This results in a fresh database of about 8.6GiB, along with 1.3GiB of WAL. The test database was then backed up to a <tt>.tar.gz</tt> file so it could easily be restored between test runs.
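
A minimal sketch of how such a database could have been created and snapshotted (the paths and database name here are assumptions, not from the original):
<verbatim>
# initialise the pgbench schema at scale factor 600
pgbench -i -s 600 pgbench

# with the server stopped, snapshot the cluster for later restores
# (hypothetical data directory and backup path)
pg_ctl -D /var/lib/postgresql/data stop
tar -czf /backup/pgbench-s600.tar.gz -C /var/lib/postgresql data
pg_ctl -D /var/lib/postgresql/data start
</verbatim>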

Each test was executed 5 times in sequence, and the median result is reported. All tests were executed with the <tt>-c100</tt> option for 100 connections. The transaction count per connection was adjusted as necessary so that each single test would span several minutes. Typical settings were <tt>-t500</tt> to <tt>-t1000</tt>.
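
A sketch of one test configuration's runs under these settings, driven from the remote client described below (the host and database names are assumptions):
<verbatim>
# five runs in sequence; the median tps result is reported
for run in 1 2 3 4 5; do
    pgbench -h dbserver -c 100 -t 1000 pgbench
done
</verbatim>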

The <tt>pgbench</tt> client was actually run from a separate client machine, over a 100Mbit full-duplex network connection, for all of the testing. Running <tt>pgbench</tt> remotely did not measurably degrade performance. The client machine is a dual 3.06GHz Xeon running Linux 2.4.27. [SSL] encryption was disabled.

The base hardware:

* [HP] DL380 G4
* Dual 3.20GHz Xeon, 1MB L2 [Cache], 800MHz [FSB], HyperThreading disabled
* 1GB DDR2-400 (PC2-3200) registered [ECC] memory
* Broadcom PCI-X onboard [NIC]
* ~SmartArray 6i onboard [RAID] controller
* Battery-backed write cache enabled

The base software:

* LinuxKernel 2.4.27 from [Debian]’s <tt>kernel-image-2.4.27-2-686-smp</tt> [Package]
* Using [Ext3] with a 128MB journal and <tt>ordered</tt> data mode

On with the testing!

__Update__: On Tue Jun 20 2006, all results were replaced with updated results. The previous test results were invalid and not comparable, due to inconsistencies and errors in the testing process.

!! Results: 4-disk configurations

* Data array: RAID5, 4x 72GB 10k RPM%%%
  WAL array: On data array%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 500
number of transactions actually processed: 50000/50000
tps = 124.728272 (including connections establishing)
tps = 124.885813 (excluding connections establishing)
</pre>

* Data array: RAID5, 4x 72GB 10k RPM%%%
  WAL array: On data array%%%
  Other notes: <tt>commit_delay</tt> disabled%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 500
number of transactions actually processed: 50000/50000
tps = 129.347747 (including connections establishing)
tps = 129.517978 (excluding connections establishing)
</pre>

* Data array: RAID5, 4x 72GB 10k RPM%%%
  WAL array: On data array%%%
  Other notes: Battery-backed write cache disabled%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 500
number of transactions actually processed: 50000/50000
tps = 114.885220 (including connections establishing)
tps = 115.020971 (excluding connections establishing)
</pre>

* Data array: RAID5, 4x 72GB 10k RPM%%%
  WAL array: On data array%%%
  Other notes: Battery-backed write cache and <tt>commit_delay</tt> disabled%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 500
number of transactions actually processed: 50000/50000
tps = 80.177806 (including connections establishing)
tps = 80.244181 (excluding connections establishing)
</pre>

* Data array: RAID1+0, 4x 72GB 10k RPM%%%
  WAL array: On data array%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 216.341138 (including connections establishing)
tps = 216.606298 (excluding connections establishing)
</pre>

* Data array: RAID1, 2x 72GB 10k RPM%%%
  WAL array: RAID1, 2x 72GB 10k RPM%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 131.213838 (including connections establishing)
tps = 131.309052 (excluding connections establishing)
</pre>

* Data array: RAID1+0, 4x 72GB 15k RPM%%%
  WAL array: On data array%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 284.662951 (including connections establishing)
tps = 285.127666 (excluding connections establishing)
</pre>

* Data array: RAID5, 4x 72GB 15k RPM%%%
  WAL array: On data array%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 189.203382 (including connections establishing)
tps = 189.379783 (excluding connections establishing)
</pre>

* Data array: RAID1, 2x 72GB 15k RPM%%%
  WAL array: RAID1, 2x 72GB 15k RPM%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 171.537230 (including connections establishing)
tps = 171.680858 (excluding connections establishing)
</pre>

!! Results: 6-disk configurations

* Data array: RAID1+0, 4x 72GB 15k RPM%%%
  WAL array: RAID1, 2x 72GB 10k RPM%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 340.756686 (including connections establishing)
tps = 341.404543 (excluding connections establishing)
</pre>

* Data array: RAID5, 4x 72GB 15k RPM%%%
  WAL array: RAID1, 2x 72GB 10k RPM%%%

<pre>
scaling factor: 600
number of clients: 100
number of transactions per client: 1000
number of transactions actually processed: 100000/100000
tps = 212.377629 (including connections establishing)
tps = 212.615105 (excluding connections establishing)
</pre>

!!! Insights and observations

* Using RAID1+0 for the heap files provides a dramatic performance gain, because RAID1+0 performs random writes much better than RAID5. Using RAID1+0 is an expensive option because of the additional disks (perhaps expensive SCSI disks), space, power and cooling capacity required. Whether the gain is worth the cost is a deployment-specific question.

* Current [PostgreSQL] myths claim that moving the [WAL] onto its own separate spindles, often RAID1 or RAID1+0, increases performance. Most of the performance gain actually arises from increasing the number of spindles the total IO load is distributed over, rather than from the specific disk configuration (the usual relocation procedure is sketched after this list). In particular it should be noted that:

# RAID5, with the help of a battery-backed write cache, does sequential writes very well.
# The [WAL] is written sequentially.
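
A minimal sketch of how a separate WAL array was typically attached in the [PostgreSQL] 8.0 era (the data directory and mount point are assumptions):
<verbatim>
# with the server stopped, move the WAL onto its own array and
# leave a symlink behind in the data directory
mv /var/lib/postgresql/data/pg_xlog /mnt/wal-array/pg_xlog
ln -s /mnt/wal-array/pg_xlog /var/lib/postgresql/data/pg_xlog
</verbatim>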

* The <tt>commit_delay</tt> parameter is a help for non-battery-backed systems, and a loss for battery-backed systems. The reason is clear: with a battery, the fsync() is almost free, so the delay only lowers total throughput. Without a battery the fsync() is very expensive, and if the delay allows most transactions to share a single fsync(), throughput can go up.
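
For reference, the two settings compared above correspond to (the 20000 value is from the test configuration; <tt>commit_delay</tt> is in microseconds):
<verbatim>
commit_delay = 20000   # wait up to 20ms to group concurrent commits into one fsync()
commit_delay = 0       # the default: no delay, best with a battery-backed write cache
</verbatim>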

!!! <tt>pgbench</tt> test limitations

* The test measures saturated throughput, rather than latency. In typical database usage, however, latency is the dominant performance metric, and many deployments have a certain bound on user-acceptable latency. It may be interesting to perform throughput-vs-connections-vs-latency comparisons; under such a test, the true gain of a battery-backed cache and the true cost of <tt>commit_delay</tt> would be evident.

* The probability distribution used to access the data is completely flat: all data items have the same probability of being accessed. This is quite unrepresentative of typical situations, where some data items are accessed very frequently and some hardly at all. Unfortunately, the specific probability distribution is likely to be very application-specific.

* The TPC-B benchmark, which the <tt>pgbench</tt> program is somewhat based upon, was actually retired by the [Transaction Processing Performance Council (TPC) | http://www.tpc.org] in 1995. The test database schema is very basic, with very simple queries. Better benchmark tools are available, such as the [DBT suite | http://osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/] developed by [Open Source Development Labs (OSDL) | http://www.osdl.org], which aim to be a fair-use implementation of the current [TPC benchmarks | http://www.tpc.org/information/benchmarks.asp].

!!! Other observations

* The [WAL] consumes large amounts of [Kernel] page cache. When the WAL was moved between devices and the old files were unlinked, half of the page cache was freed. Since the WAL is written only once and never read back, this is a waste!
* The battery-backed write cache makes write performance very erratic.
* The [HP] ~SmartArray hardware (or perhaps its driver) tends to block reads while cached writes are occurring. Large read latencies (whole seconds) result. I have not yet found a way to tune this.

----
Part of CategoryDiskNotes
