SAN FRANCISCO — The power of PHP and an RDBMS is the ability to nail the major features of an application with cheaply paid developers in a record amount of time. Unfortunately, the default runtime environment used by PHP is simply an unscalable mess.
A lot of the folks I’ve worked with do not care about maintainability; their PHP applications are throwaway, but heavily loaded and highly concurrent. For example, I worked with a company that developed a PHP marketing application with an Oracle back end, where you bought its products and could exchange your “points” for features of an online game. It worked great — until it reached a few million users.
The truth is that if you have enough servers and enough database servers, you don’t have contention. But with a PHP Web app on top, an RDBMS like Oracle just can’t be scaled cost-effectively to deliver both good read and write performance.
As it turns out, there’s a modern solution to the problem: the cloud plus NoSQL. Cloud infrastructure gives us the ability to spin up enough servers, and a NoSQL database enables us to shard our data effectively. But first, let’s examine why PHP’s runtime environment is such a dog to begin with.
Why PHP’s runtime environment sucks
The most common runtime for PHP is the Apache Web Server in prefork mode, which means that the Web server runs a series of separate subprocesses to support concurrent requests. When you combine this concurrency characteristic with the use of a traditional relational database like MySQL, PostgreSQL, or Oracle, this choice implies unpooled database connections because database connection pooling requires a shared memory space.
Native threads, on the other hand, have a shared memory space as part of their master process. Subprocesses do not have a shared memory space unless you use a specific operating system area called “shared memory.” This isn’t as fast as being able to pass memory by reference — besides, the Apache Web Server’s “prefork” module doesn’t support the use of shared memory for this purpose anyhow. It is sometimes possible to run PHP with native threads, aka worker mode, but this is heavily dependent on the modules you use and whether those modules are “thread safe.”
The PHP concurrency model has a major impact on vertical scalability when using a traditional RDBMS. While it’s possible to open thousands of unshared concurrent connections to MySQL or Oracle, doing so cuts into the number of concurrent requests the system can serve. A typical PHP application (indeed, any Web application) consists of logic along these lines: open a database connection, run a query, process the results, write any changes back, and render the response.
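That request shape can be sketched as follows. This is a minimal illustration in Python rather than PHP, and everything in it is an assumption made for the example: the `points` table, the `open_connection`, `render_page`, and `handle_request` names, and the in-memory SQLite database standing in for MySQL or Oracle.

```python
import sqlite3


def open_connection() -> sqlite3.Connection:
    """Hypothetical stand-in for connecting to MySQL, PostgreSQL, or Oracle."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE points (user_id INTEGER PRIMARY KEY, balance INTEGER)"
    )
    conn.execute("INSERT INTO points VALUES (1, 120)")
    return conn


def render_page(user_id: int, balance: int) -> str:
    # Pure application work: the connection sits idle while this runs.
    return f"<p>User {user_id} has {balance} points</p>"


def handle_request(user_id: int, cost: int) -> str:
    conn = open_connection()                      # 1. connect
    try:
        row = conn.execute(
            "SELECT balance FROM points WHERE user_id = ?", (user_id,)
        ).fetchone()                              # 2. read
        balance = row[0] - cost                   # 3. business logic, no DB use
        conn.execute(
            "UPDATE points SET balance = ? WHERE user_id = ?",
            (balance, user_id),
        )                                         # 4. write
        conn.commit()
        return render_page(user_id, balance)      # 5. render, no DB use
    finally:
        conn.close()                              # 6. disconnect
```

Steps 3 and 5 never touch the database, yet the connection stays tied to this one request the whole time.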
In this type of code, there are relatively long periods of time where the application is not actually interacting with the database and another request could “share” the same database connection — if only database connections could be pooled. Since the PHP process model precludes this, you are forced to make a decision: Hold the connection for the duration of the request/response cycle or let go each time the application is done.
The problem with letting go, however, is that it depends on the performance characteristics of opening socket connections. The TCP stack is set up to guard against stray packets from a previous connection interfering with a new one; this is part of the reliability guarantee that TCP layers on top of IP. TCP does this by making you wait before reusing the same socket pair (the same combination of addresses and ports). Thus, the number of TCP socket connections you can open in a second is limited. One way of escaping this limit is to reuse connections across multiple request cycles, a fundamentally sound idea that most PHP applications (due to the PHP concurrency model) simply cannot take advantage of.
If you examine the active connections on your Web server or database server when running a PHP application (on Unix/Linux servers, type netstat -na), you’ll see a large number of connections to or from the database in TIME_WAIT or CLOSE_WAIT state. Were you instead running your application on a runtime environment that allowed pooled connections, you would see a fixed number (the size of the database connection pool) in ESTABLISHED state. The bottom line: PHP applications are a load on the database due to the constraints of the concurrency model.
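For contrast, here is a rough sketch of what a pooled runtime does, written in Python with threads because that is where a shared memory space makes pooling straightforward. The `ConnectionPool` class, the `run_requests` helper, and the in-memory SQLite connections are all assumptions for illustration, not any particular server's implementation.

```python
import queue
import sqlite3
import threading


class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused across requests."""

    def __init__(self, size: int):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            # check_same_thread=False allows a connection created here to be
            # used by a worker thread; the pool hands it to one thread at a time.
            self._idle.put(sqlite3.connect(":memory:", check_same_thread=False))
            self.opened += 1

    def acquire(self) -> sqlite3.Connection:
        return self._idle.get()  # blocks while every connection is busy

    def release(self, conn: sqlite3.Connection) -> None:
        self._idle.put(conn)


def run_requests(request_count: int, pool_size: int):
    """Serve request_count simulated requests over pool_size connections."""
    pool = ConnectionPool(pool_size)
    results = [None] * request_count

    def serve(i: int) -> None:
        conn = pool.acquire()
        try:
            results[i] = conn.execute("SELECT ?", (i,)).fetchone()[0]
        finally:
            pool.release(conn)  # hand the connection to the next request

    threads = [threading.Thread(target=serve, args=(i,)) for i in range(request_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, pool.opened
```

However many requests pass through `run_requests`, the number of connections opened stays at the pool size, which is the behavior a process-per-request model cannot reproduce without shared memory.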
Why is PHP this way? Linux did not originally support threads. It only supported subprocesses. Windows NT-derived operating systems always supported threads (though heavier ones than modern Linux native threads) and thus would outscale Linux by a large margin. Unfortunately, no one believed those Microsoft-funded studies that proved it.
To scale PHP on a relational database, you need to shard your data. This means splitting the data by some reasonable key. This might mean East Coast customers go on one RDBMS, Midwest customers go on another, and the West Coast on a third. This is a lot of complexity to swallow when you chose PHP because it was “simple” and “free.”
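To make that complexity concrete, here is a minimal sketch of the routing logic the application ends up owning when it shards by region. The hostnames, the state-to-region table, and the `shard_for` function are hypothetical and exist only for this example.

```python
# Hypothetical shard map: one RDBMS host per sales region.
SHARDS = {
    "east": "db-east.example.internal",
    "midwest": "db-midwest.example.internal",
    "west": "db-west.example.internal",
}

# Hypothetical lookup from a customer's state to a region (abbreviated).
STATE_REGION = {
    "NY": "east",
    "MA": "east",
    "IL": "midwest",
    "OH": "midwest",
    "CA": "west",
    "WA": "west",
}


def shard_for(customer_state: str) -> str:
    """Return the database host that holds customers from the given state."""
    region = STATE_REGION.get(customer_state.upper())
    if region is None:
        # Every new state, region, or rebalancing decision becomes a code change.
        raise KeyError(f"no shard configured for state {customer_state!r}")
    return SHARDS[region]
```

Every query now has to pass through something like `shard_for` first, and any query that spans regions has to be stitched together in application code.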
The cloud and NoSQL are game changers
In the cloud, if we can trade a conventional RDBMS for a database that autoshards and can balance connections to each node, PHP can scale pretty well. Rather than have a series of unpooled connections to one or two machines, you can balance this among several database servers.
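As a rough illustration of how a key can be spread evenly over several nodes, the sketch below hashes each key and takes the result modulo the node count. This is an assumption chosen for simplicity, not the scheme of any specific NoSQL product (many use consistent hashing or range partitioning instead), and the node names and `node_for` function are hypothetical.

```python
import hashlib
from collections import Counter

# Hypothetical cluster of database nodes.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def node_for(key: str, nodes=NODES) -> str:
    """Pick a node by hashing the key; the same key always maps to the same node."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[bucket]


def distribution(key_count: int) -> Counter:
    """Count how many of key_count synthetic user keys land on each node."""
    return Counter(node_for(f"user:{i}") for i in range(key_count))
```

Calling `distribution(10000)` shows the keys, and therefore the unpooled connections that follow them, spread across all four nodes instead of piling onto one or two machines.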
More Web servers limit the impact of the lack of connection pooling on the database clients. More database nodes and sharding reduce the impact on the server nodes. I think it’s clear that the move to NoSQL and the cloud is a big scalability win even for existing runtimes. The economic choices that have made PHP so successful may even make it more successful in the cloud and prevent the rework to a thread-safe PHP from ever having to take place.
Together, migration to the cloud and NoSQL greatly mitigate these issues or make them simply a deployment detail. It means we may be able to hire an offshore team of PHP coders to knock one out on a NoSQL database so long as we have a good NoSQL schema and a reasonable cloud deployment scheme.