Greetings,
Over the last 48 hours, Aru, Bert, Matt Craig, Patrick Wils, George Silvis, and I have been busy tracking down problematic sections of code, profiling long-running queries, and implementing a variety of changes to resolve these problems. While there is still more work to do, I believe the most egregious issues have been resolved.
The specific changes we’ve made include adding local caching to some applications, introducing additional indexes to the AID and VSX databases to accelerate queries, modifying our approach to certain problems to reduce the load on the database server, rewriting some queries to utilize the new database indexes, and changing our approach to bot detection. Most of our applications now include software that automatically logs error conditions and provides user experience metrics. This additional tool will let us identify the worst-performing portions of our website and concentrate our efforts there.
We’ve been monitoring our servers closely for the last 24 hours and things are looking good. As a result of our efforts, the average number of active sessions on our database has decreased from ~150 to ~5. This difference is due almost exclusively to the team’s work on a single section of very heavily used code. We’ve even received reports that users are seeing significantly better response times on VSX queries!
Please let us know if you experience any additional issues. We’ll work as quickly as we can to resolve them.
Brian