PostgreSQLを使う理由

作成中です - Mitsuhiro Yoshida 2008年1月22日 (火) 00:59 (CST)

Martin Langhoffは、PostgreSQLの使用推奨を主張します (出典: Moodle over webct and LNLS at Athabasca University? forum posting)

Postgresを選ぶ理由はいくつかありますが、簡単な要点をまとめてみます。私たちは、Catalystで様々なRDBMS (Oracle、Postgres、MySQL、ProgressおよびいくつかのRDBMS) を動作させ、社内で数多くの経験があります。また、私たちはデータベースのレプリケーション、クラスタリングおよび他の技術 (tricks) に関して経験があります -- .nzルートドメインサーバのバックエンド、そしていくつかのミッションクリティカル (極めて重要) なシステムにこれらの技術使っています。

パフォーマンスに関して、Postgresは最初の設定がMySQLよりも少しだけさらに必要です。適切に設定されたPostgresは、小さなMySQLデータベースのSELECTパフォーマンスにおいて、非常に近くなります。大きなテーブルではMySQLにパフォーマンスの問題がありますが、Postgresは快適に動作します。

また、書き込み (write) パフォーマンスでMySQLに問題があります -- トラフィックが多い場合、同時書き込みに深刻な問題があります。高負荷の場合でも、Postgresは快適に動作します。

しかし、本当のことを言えば、Postgresを選択する真の理由は「信頼性」です。私たちは数多くのデータベースを管理しています。Postgresには磐石の信頼性があり、ACID (Atomicity, Consistency, Isolation, Durability) に重点が置かれています。またPostgresでは、コミットから戻る時点でデータは安全にディスクに保存され、私たちが使用しているRAID1に実際のディスクトラブルがない限り失われることはありません。私たちが何度試してみても、頻繁に使用されるMySQLデータベースでは、インデックス破損の問題が生じました。あなたがMySQLのスタートアップスクリプトを確認してみると、ほとんどのLinuxディストリビューションでは、スタートアップ毎に破損データをチェックします -- このチェックにより、インデックスの破損が頻繁に起きている事実を覆い隠しています。

そして、これは比較的小規模かつデータがミッションクリティカルではない組織では通用しますが、あなたはこのようなアプローチをどれほど信頼できるか考慮すべきです。また、大規模なデータセットでは、isamchk/myisamchkの実行に数時間かかってしまいます -- 私たちは、そのような時間を割くことはできません。

MySQLのクラスタリングソリューションが喧伝されていますが、根本の問題から注意をそらしているのだと私は思います。それに対する私の主な関心事は、MySQLクラスタリングでは「非同期」にデータが書き込まれることです -- つまり、あなたのデータがディスク上に安全に書き込まれるという保証はありません。あなたのデータは、時々ディスクに到達します。そして、時々．．．スレーブに到達します。Hmmm.

Given that the MySQL cluster uses async writes, splitting read/writes between the master and the slaves breaks down in cases where we write some data, and read it back in immediately (or soon after). And this does happen in quite a few places.

And you also have to consider the performance boost of using async writes: if you tell a standalone Postgres or MySQL to use async writes, it'll run scale much better (should be able to handle up to 3-4 times more simultaneous writes). Once you do that, the performance advantage of the MySQL cluster mostly vanishes. It still has semi-hot takeover in case the master goes down, but Postgres can do that using Slony, and with better guarantess of consistency of the data in the slave.

In a nutshell, MySQL isn't normally very solid when it comes to ensure my data is safely stored on-the-disk, even if it theoretically guarantees that it's been saved. And MySQL Cluster says up-front that there isn't a guarantee any more. Riiiiiight wink

Michael is talking about having UPSs. We have a car-sized UPS and a container-sized on-site generator that auto-starts. And yet, I wouldn't depend on that for my DB consistency on a large Installation. So many things other than power can (and do) go amiss. If a process has a problem storing the data, the right thing is to tell that back to the user. With async writes, you end up with a queue of data that hasn't been stored yet, but you already told the user it was.

That's not what a database is supposed to do.

I am currently exploring some techniques similar to those being used in livejournal and slashdot. We should be able to increase Moodle scalability by cutting down on DB load by about 50%. This is happening slowly in the gaps between more urgent projects. Feel free to ping Richard or me if you're interested in that track.

Documentation

PostgreSQLを使う理由

関連情報