「PostgreSQLを使う理由」の版間の差分

2008年1月25日 (金) 00:44時点における版

作成中です - Mitsuhiro Yoshida 2008年1月22日 (火) 00:59 (CST)

Martin Langhoffは、PostgreSQLの使用推奨を主張します (出典: Moodle over webct and LNLS at Athabasca University? forum posting)

Postgresを選ぶ理由はいくつかありますが、簡単な要点をまとめてみます。私たちは、Catalystで様々なRDBMS (Oracle、Postgres、MySQL、ProgressおよびいくつかのRDBMS) を動作させ、社内で数多くの経験があります。また、私たちはデータベースのレプリケーション、クラスタリングおよび他の技術 (tricks) に関して経験があります -- .nzルートドメインサーバのバックエンド、そしていくつかのミッションクリティカル (極めて重要) なシステムにこれらの技術使っています。

パフォーマンスに関して、Postgresは最初の設定がMySQLよりも少しだけさらに必要です。適切に設定されたPostgresは、小さなMySQLデータベースのSELECTパフォーマンスにおいて、非常に近くなります。大きなテーブルではMySQLにパフォーマンスの問題がありますが、Postgresは快適に動作します。

また、書き込み (write) パフォーマンスでMySQLに問題があります -- トラフィックが多い場合、同時書き込みに深刻な問題があります。高負荷の場合でも、Postgresは快適に動作します。

しかし、本当のことを言えば、Postgresを選択する真の理由は「信頼性」です。We maintain a lot of databases, and Postgres is rock-solid reliable and has a focus on ACID-correctness: when it returns from a commit, the data is safely on disk and won't be lost -- barring actual disk problems, which we offset using RAID-1.

No matter how hard we try, MySQL databases with a lot of usage have recurring index corruption issues. If you look at the startup scripts for MySQL on most Linux distributions, they check for data corruption on every startup -- this is to mask the fact that it is a frequent occurrence.

And while this is passable with small installations where the data isn't mission critical, you have to consider how much you can trust such approch. And with large datasets, runing isamchk/myisamchk can take hours -- we cannot afford that.

The clustering solution for MySQL is being touted a lot, and I think it is a red herring. My main concern about is that it writes "asynchronously" -- that is, there is no guarantee that your data is on-disk safely. It'll get to the disk sometime. It'll get to the slaves... sometime. Hmmm.

Given that the MySQL cluster uses async writes, splitting read/writes between the master and the slaves breaks down in cases where we write some data, and read it back in immediately (or soon after). And this does happen in quite a few places.

And you also have to consider the performance boost of using async writes: if you tell a standalone Postgres or MySQL to use async writes, it'll run scale much better (should be able to handle up to 3-4 times more simultaneous writes). Once you do that, the performance advantage of the MySQL cluster mostly vanishes. It still has semi-hot takeover in case the master goes down, but Postgres can do that using Slony, and with better guarantess of consistency of the data in the slave.

In a nutshell, MySQL isn't normally very solid when it comes to ensure my data is safely stored on-the-disk, even if it theoretically guarantees that it's been saved. And MySQL Cluster says up-front that there isn't a guarantee any more. Riiiiiight wink

Michael is talking about having UPSs. We have a car-sized UPS and a container-sized on-site generator that auto-starts. And yet, I wouldn't depend on that for my DB consistency on a large Installation. So many things other than power can (and do) go amiss. If a process has a problem storing the data, the right thing is to tell that back to the user. With async writes, you end up with a queue of data that hasn't been stored yet, but you already told the user it was.

That's not what a database is supposed to do.

I am currently exploring some techniques similar to those being used in livejournal and slashdot. We should be able to increase Moodle scalability by cutting down on DB load by about 50%. This is happening slowly in the gaps between more urgent projects. Feel free to ping Richard or me if you're interested in that track.

Documentation

「PostgreSQLを使う理由」の版間の差分

2008年1月25日 (金) 00:44時点における版

関連情報

@@ 7行目: / 7行目: @@
 パフォーマンスに関して、Postgresは最初の設定がMySQLよりも少しだけさらに必要です。適切に設定されたPostgresは、小さなMySQLデータベースのSELECTパフォーマンスにおいて、非常に近くなります。大きなテーブルではMySQLにパフォーマンスの問題がありますが、Postgresは快適に動作します。
-Write performance is also an issue with MySQL -- with a lot of traffic, it has serious problems with concurrent writes. Under heavy load, Postgres performs much better.
+また、書き込み (write) パフォーマンスでMySQLに問題があります -- トラフィックが多い場合、同時書き込みに深刻な問題があります。高負荷の場合でも、Postgresは快適に動作します。
-But to tell you the truth, the real reason for choosing Postgres is reliability. We maintain a lot of databases, and Postgres is rock-solid reliable and has a focus on ACID-correctness: when it returns from a commit, the data is safely on disk and won't be lost -- barring actual disk problems, which we offset using RAID-1.
+しかし、本当のことを言えば、Postgresを選択する真の理由は「信頼性」です。We maintain a lot of databases, and Postgres is rock-solid reliable and has a focus on ACID-correctness: when it returns from a commit, the data is safely on disk and won't be lost -- barring actual disk problems, which we offset using RAID-1.
 No matter how hard we try, MySQL databases with a lot of usage have recurring index corruption issues. If you look at the startup scripts for MySQL on most Linux distributions, they check for data corruption on every startup -- this is to mask the fact that it is a frequent occurrence.