Friday, January 9, 2009

The new calculation engine, a.k.a. That Shouldn't Be So Difficult

I received quite a few requests for Zero Sum support - usually something along the lines of "could you show the sum of all the gear values for the run divided by how many people attended, so we can fill in the value ourselves?"

While this would be one way to implement Zero Sum, it lacked that certain special something. Changing one thing in the run resulted in a huge headache for all involved - and I'm not a fan of that.

So I decided the best approach to take would be to let people treat the attendance values as weights and add a Zero Sum DKP priority calculation. By treating the attendance as a weight, users can enter in any value they like - each attendee is allocated their share of the gear value. This allows very straightforward ways of implementing partial attendance (enter in what % of the raid they attended) and other fun things.
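
To make that concrete, here's a minimal sketch of the weighting idea in Python - not Swagbook's actual code (the names are made up), just the arithmetic:

    # Minimal sketch of the weighting idea (not Swagbook's actual code; names are made up).
    # Each attendee is credited with a share of the run's gear value proportional to
    # their attendance weight, so partial attendance is just a smaller weight.
    def zero_sum_shares(gear_total, attendance):
        """attendance maps character -> weight (1.0 = full raid, 0.5 = half, etc.)."""
        total_weight = sum(attendance.values())
        return {name: gear_total * weight / total_weight
                for name, weight in attendance.items()}

    shares = zero_sum_shares(100, {"Adam": 1.0, "Beth": 1.0, "Cale": 0.5})
    # Adam and Beth are each credited 40 points, Cale 20 - the 100 points the winners
    # spent flow back to the attendees, which is the Zero Sum part.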

One nice result of this is that you can flip your pool over to Zero Sum without changing any of your data and it will Just Work.

I will warn you about one thing, though: if you're importing from another system, the numbers you see may be slightly different from what you're used to. The reason is that Swagbook preserves the Zero Sum property on bonuses and penalties - that is, bonuses and penalties affect the whole pool. If you give a 5-point bonus to Adam, 5/n points are deducted from everyone, where n is the number of people in the pool. For the most part this isn't noticeable unless you're handing out a ton of bonuses and penalties.
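
In code terms, a pool-neutral bonus works out to something like this (again just an illustrative sketch, not the real implementation):

    # Illustrative sketch of a pool-neutral bonus (not the real implementation).
    # Giving someone a bonus deducts amount/n from every member, so the pool
    # still sums to zero afterward.
    def apply_bonus(priorities, recipient, amount):
        per_head = amount / len(priorities)
        return {name: score + (amount if name == recipient else 0) - per_head
                for name, score in priorities.items()}

    pool = {"Adam": 0.0, "Beth": 0.0, "Cale": 0.0, "Dana": 0.0}
    pool = apply_bonus(pool, "Adam", 5)
    # Adam ends up at +3.75, everyone else at -1.25; the total is still zero.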

Well, that's not precisely true - there is one situation where it's noticeable. Let's say that you attend a run and afterward your priority is 5. You don't attend the next run, but a new person does. Your DKP will then NOT be 5 like you'd expect, because the "pool cost" of those bonuses and penalties is now being sliced up a little finer. This is one of the undesirable qualities of Zero Sum - your priority score can change without any action on your part.

(The rest of this is boring programmer talk. If you want the short version: Zero Sum is in. Please let me know about any bugs.)

Satisfied that I'd found a good way to approach the problem, I moved on to implementing it. For those of you who don't know (i.e. you aren't me), previously every time a DKP value was changed, the run was marked "dirty". Just before closing the database connection, a query would run that found all the dirty runs and (using one giant SQL query) calculated the sum of attendance, gear, bonuses, and penalties for each run in that pool. As a result, getting priorities was largely a matter of grabbing the precalculated values from the resulting table.
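
For the curious, the old scheme had roughly this shape - the table names are invented and SQLite is standing in for the real database:

    # Rough shape of the old batch recalculation (invented table names; SQLite only
    # for brevity). Every run in a pool containing a dirty run gets its sums redone
    # in one statement; page loads then just read run_totals.
    import sqlite3

    def recalculate_dirty_pools(conn: sqlite3.Connection) -> None:
        conn.execute("""
            INSERT OR REPLACE INTO run_totals (run_id, attendance, gear, bonuses, penalties)
            SELECT r.id,
                   (SELECT COALESCE(SUM(weight), 0) FROM attendance WHERE run_id = r.id),
                   (SELECT COALESCE(SUM(value),  0) FROM gear       WHERE run_id = r.id),
                   (SELECT COALESCE(SUM(amount), 0) FROM bonuses    WHERE run_id = r.id),
                   (SELECT COALESCE(SUM(amount), 0) FROM penalties  WHERE run_id = r.id)
            FROM runs r
            WHERE r.pool_id IN (SELECT pool_id FROM runs WHERE dirty = 1)
        """)
        conn.execute("UPDATE runs SET dirty = 0")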

This was the reason for the short delay after saving changes, incidentally.

Easy enough, right?

Well, with Zero Sum that query becomes untenable. It's no longer just a sum() over a column - the Zero Sum calculation is quite intricate, particularly once bonuses and penalties enter the picture.

So, I tried moving all of the calculations into Python, just to see if I could. It would have been easy to maintain, so it was worth a shot. Once everything was in place I fired up the benchmark, and... waited about 18 minutes for the front page to load.
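
I won't reproduce that attempt here, but the trap it fell into looks something like this simplified sketch (not the actual code - bonuses and penalties are omitted and the table names are invented): every run costs a handful of queries, and the front page repeats that for every character in the pool.

    # Simplified sketch of the pure-Python trap (not the actual code; bonuses and
    # penalties omitted, table names invented). Each run costs several round trips.
    import sqlite3

    def zero_sum_priority(conn: sqlite3.Connection, pool_id: int, member: str) -> float:
        priority = 0.0
        for (run_id,) in conn.execute(
                "SELECT id FROM runs WHERE pool_id = ?", (pool_id,)).fetchall():
            (my_weight,) = conn.execute(
                "SELECT COALESCE(SUM(weight), 0) FROM attendance"
                " WHERE run_id = ? AND member = ?", (run_id, member)).fetchone()
            if not my_weight:
                continue
            (total_weight,) = conn.execute(
                "SELECT SUM(weight) FROM attendance WHERE run_id = ?", (run_id,)).fetchone()
            (gear_total,) = conn.execute(
                "SELECT COALESCE(SUM(value), 0) FROM gear WHERE run_id = ?", (run_id,)).fetchone()
            (my_spend,) = conn.execute(
                "SELECT COALESCE(SUM(value), 0) FROM gear"
                " WHERE run_id = ? AND winner = ?", (run_id, member)).fetchone()
            priority += gear_total * my_weight / total_weight - my_spend
        return priority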

Yeah, not going to work.

Next up, I tried making a table of stored calculations, with triggers to remove dirty entries. My hope was that caching intermediate results would make recalculation faster (after all, it's rarely the first run in a pool that gets changed). I also trimmed the fat in the code so it stopped re-calculating the same values. Render time went down to about 50 seconds, though truth be told that was more the re-calculation culling than anything else.
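
In toy form, the cache-plus-trigger idea looks like this (SQLite syntax for brevity; the schema here is invented):

    # Toy version of the cache + trigger idea (SQLite syntax for brevity; the real
    # schema and database are different). Editing a run's gear throws away that run's
    # cached results, so recalculation can pick up from the last clean run.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE gear (run_id INTEGER, winner TEXT, value REAL);
        CREATE TABLE calc_cache (run_id INTEGER, name TEXT, value REAL);

        CREATE TRIGGER gear_dirties_cache AFTER UPDATE ON gear
        BEGIN
            DELETE FROM calc_cache WHERE run_id = NEW.run_id;
        END;
    """)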

So, my next question was: is it the calculation time or the marshalling time (transferring the values between the SQL server and Python)? A quick experiment showed that run time was almost directly proportional to the number of queries.

I decided to try a stored procedure for the calculation instead of Python, and the change was remarkable. Think 5 seconds down to 0.02 seconds. Stored procedures were implemented for the rest of the calculations, and we were off to the races.

This is not to say that the stored procedures didn't go through a few changes. I started out being really stupid and having calculate_sum_attendance(xxx) only insert results into the calculations table. That's the fun of looking back at all this: I notice how much my thinking was constrained at each step by the method I'd attempted previously. Python too slow? Cache it. Still too slow? Build the cache with a stored procedure. Waaaait a second.

The caching is still there, but now calculate_sum_attendance(xxx) actually returns the value - as a result, only one query per calculation, rather than "refresh then load".
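
Here's the logic of that final version written out in Python rather than as the actual stored procedure (the names are mine):

    # The "return the value, cache as a side effect" pattern, sketched in Python
    # (names invented). One call either hits the cache or computes, stores, and
    # returns - no separate "refresh then load" round trip.
    def calculate_sum_attendance(conn, run_id):
        row = conn.execute(
            "SELECT value FROM calc_cache WHERE run_id = ? AND name = 'sum_attendance'",
            (run_id,)).fetchone()
        if row:                                   # cache hit: one cheap lookup
            return row[0]
        (total,) = conn.execute(
            "SELECT COALESCE(SUM(weight), 0) FROM attendance WHERE run_id = ?",
            (run_id,)).fetchone()
        conn.execute(                             # cache miss: compute, store, return
            "INSERT INTO calc_cache (run_id, name, value) VALUES (?, 'sum_attendance', ?)",
            (run_id, total))
        return total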

How much does the cache help? Honestly not 100% sure. The framework was all there, it cost me very little to use for the other procedures, and I know it's about 5x faster when the calculation already exists...but it could be that the repeated INSERTs are the majority of the cost. At the moment it all works and works well, so I don't think I'm going to mess with it.

So the new calculation methods are in. Zero Sum works like a charm now, and the way is paved for more intricate priority systems. After the next round of bugfixes, of course.
