November 22, 2011 in Feedback
Here is an idea for 3.3...
Open the db with a read-only user id and only do read-only queries while generating the HTML for a page. If the current operation requires updates to the db, queue the updates during page processing for execution later as a db transaction. Like the current shutdown update queries, wait to do these updates until the HTML has been emitted and the read-only db connection can be closed. After closing this read-only connection, then check if any update transactions have been queued, and open a write-able connection (using a second read-write MYSQL user id), apply any queued transactions, record to a log failed transactions (possibly as all SQL statements in the failed transaction so they may be applied later by an admin who can replay the transaction after the problem causing the failure has been resolved - like write-able db was offline for master/slave setups), and then close the write-able db connection.
This would involve some internal restructuring to add a class to handle the queueing of transaction data and executing (through callbacks) those transactions on the back side of the page load. You would probably have to invent some sort of (asynchronous) notification message type to the user to give them feedback for success or failure of the transaction since the page HTML would have already been emitted when the transaction executes.
I think this offers larger forums some major benefits. Separating read-only and updates allows support for master/slave db configurations, or other setups. I have been looking into using Gearman to execute all user generated updates where the updates can be done in background with possible parallelism and applied to a master db in a controlled fashion.
Also, db transactions that update multiple tables can be done using TRANSACTIONS so that all updates are applied or none are. This would minimize the possibility of these multiple tables having missing parts of a transaction. And, provide a mechanism for applying failed TRANSACTIONS later if the source of the failure can be identified and resolved.
Finally, having a read-only user id for the majority of the queries that are performed might increase security and get applications to think about the separation of modifications of the database from gathering data for the page generation.
Oh. One more thing... For my situation, I only need non-superadmin execution to separate read-only and updates. For superadmin, it would be okay to use the updatable MySQL user id for all page loads. I would want the superadmin to always operate on the master db directly (and not use a slave) so much of the PHP code wouldn't have to be modified for deferred transactions.
There are easier ways of separating reads and writes. Using transactions isn't a bad idea but IP.Board isn't currently set up in that way. There are several points in the code that expect an update to be processed there and then so the rest of the execution can continue.
There is a read/write driver available here: http://community.invisionpower.com/files/file/3841-high-performance-mysql-driver/
I've written one in the past for one of our managed clients as well. It's certainly doable. Nevertheless, your suggestion would not work in the real world - as Matt said, much of the code requires certain data to be updated before the next part can proceed (i.e. updating records in the database, followed by rebuilding a cache, or updating one table, then running a rebuild routine in another area that queries that original table). Massively rewriting the entire software platform to accommodate this would not be worth it IMO, when there are other less destructive solutions available.
I think that handling cache rebuilds wouldn't be a big deal. The "execute transaction" callback function would simply take over that function. That is, after the transaction succeeds, all caches that might have been affected by the transaction would be invalidated (or rebuilt).
Anyway, I'll implement this idea myself and see if it is as difficult as you indicate. Since I am okay with the admin (and even moderators) doing it the old way (always operating directly on the master), I only need to change the way page loads that are done for a normal user work. There are far fewer complicated updates in the user generated page loads so I really don't think this is all that complicated a mod.
I really wanting to use Gearman to do these transactions. The benefits are that these updates can be deferred/suspended/resumed and heavy loads can be distributed, if necessary. The read-only db access for most queries also appeals to me. I don't care if one user's changes take 5 or 10 minutes to be shown to other users (some IPB forums would probably care). This approach to controlled update transactions means I have more opportunity to cache query results (using memcached) per user, which will help my forum scale.
This topic is now archived and is closed to further replies.
Started 2 hours ago
Started 9 hours ago
Started 10 hours ago