Automated Testing: How to Maintain Sanity as Complexity Grows

We’ve been heads down porting our original prototype from C++11 to Lua. Pushing more of the core engine to Lua will make the final game much more moddable than the track we were on before, so I’m super happy with the change, even if it was time-consuming.

How’s that working out?

This is the first major project I’ve done with Lua and I’m really enjoying it. I can turn the edit-test-debug loop around much faster when I don’t have to wait for a recompile. That’s great, but many of the bugs I end up fixing are things that could easily be caught by a static-type-checking compiler. Lua 5.1 has some tools to help here (like strict.lua), but we definitely need some extra tools of our own.
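To illustrate the class of bug in question: in stock Lua, a typo in a global name silently reads as nil instead of failing, which is exactly what a static type checker would flag. The strict.lua trick is to install a metatable on the global table that turns those silent misses into errors. Here’s a minimal sketch of the idea (our actual tooling may differ):

```lua
-- Minimal sketch of the strict.lua idea: trap any access to a global
-- that was never explicitly declared.
setmetatable(_G, {
   __index = function(_, name)
      error('attempt to read undeclared global "' .. name .. '"', 2)
   end,
   __newindex = function(_, name, _)
      error('attempt to write undeclared global "' .. name .. '"', 2)
   end,
})

-- Deliberate globals must now be declared with rawset, which
-- bypasses the metatable:
rawset(_G, 'player_count', 0)

-- A typo like `player_cuont = 1` now raises an error immediately,
-- instead of silently creating a brand-new global.
```

It’s no substitute for real static analysis, but it catches the most common misspelled-identifier bugs at the moment they happen rather than three frames later.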

Automated Testing to the Rescue!

It’s hard to overstate the value of good, automated tests. Without them the only way to test your code is by manually running the product, either with a staff of hired QA engineers mashing buttons or (gasp) your end-users. Both are much more expensive relative to the cost of developing good tests to begin with, especially if you amortize the cost of writing tests across the whole project.

Unfortunately, tests are one of the first things that get skimped on when projects come under schedule pressure. Furthermore, the cost of stopping development and writing tests for features already developed gets higher and higher the longer you go without writing them. Once you stop writing tests it’s very hard to start up again, and soon the cost to backfill missing tests is more than you’re willing to pay, leading to either really long QA cycles between builds, an unstable product, or both. We really don’t want to find ourselves in that situation down the road, so we’re making writing code to test code a big priority.

The core of the Stonehearth engine is written in C++11 and handles the most computationally expensive portions of the game. For now, that’s just path finding, collision detection, and rendering. The rest of the game is written in Lua. Writing a test is just a matter of creating a new “game” type for each test scenario. For example, here’s the test I’m working on for harvesting:

local MicroWorld = require 'radiant.tests.micro_world'

function HarvestTest:start()
   self:place_citizen(12, 12)
   local tree = self:place_tree(-12, -12)

   self:at(10,  function() self:place_stockpile_cmd(4, 12) end)
   self:at(100, function() self:harvest_cmd(tree) end)
end

MicroWorld is a helper class which creates a 32×32 flat world. At initialization time the test creates a new world and drops a tree and a worker on it. The “at” function schedules a callback at the specified time, so this test creates a new stockpile 10 ms into the game and attempts to harvest the tree at 100 ms. We haven’t built up enough infrastructure to determine the success conditions yet (it should probably be something like: the tree has been destroyed and all the wood it generated now resides in the stockpile), but this is already super useful.
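Once that infrastructure exists, a success check could slot into the same “at” scheduling as the rest of the test. This is purely a hypothetical sketch: helpers like is_destroyed, stockpile_contains, and pass don’t exist yet and the names are illustrative only:

```lua
-- Hypothetical success condition, scheduled well after the harvest
-- order. None of these helpers exist yet; names are illustrative.
self:at(1000, function()
   assert(tree:is_destroyed(), 'tree was never harvested')
   assert(self:stockpile_contains('wood'),
          'harvested wood never reached the stockpile')
   self:pass()  -- mark the test scenario as successful
end)
```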

For example, here’s a video of another test, which exercises our basic house-building code. The test framework can run tests in both slow-mode and real time. Slow-mode stops the simulation every time a new order is issued to a unit, giving the developer enough time to visually verify the results on screen. Real-time mode runs the game as fast as possible, with no idle loop at all.

Here’s the slow version. The grey and red wireframes you see are debugging information. Those aren’t there in the actual game, obviously:

And here’s the same test running in real-time mode. The client interpolation logic currently breaks down when the server tick rate is much faster than the client frame rate, which is why you see the worker sliding around the screen.

That’s all for now.

The game is small enough right now that simply running all the tests manually before committing new code is enough to make sure no old functionality regresses as we add more features. There are a lot of tools I’d like to build as the game gets bigger. For example, it would be great if we could periodically run all the tests and generate coverage data to measure how much of the source base they’re actually exercising.
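Lua’s standard debug library actually makes a crude version of that coverage idea feasible: a line hook can record every line executed while the tests run. A minimal sketch (a real tool would also count the total lines per file so it could report a percentage):

```lua
-- Minimal sketch of line-coverage collection via Lua's debug hooks.
-- hits[source_file][line_number] = number of times that line ran.
local hits = {}

debug.sethook(function(_, line)
   -- Level 2 is the function that triggered the hook.
   local src = debug.getinfo(2, 'S').short_src
   hits[src] = hits[src] or {}
   hits[src][line] = (hits[src][line] or 0) + 1
end, 'l')

-- ... run all the tests here; using a stand-in function for the demo ...
local function sample() return 1 + 1 end
sample()

debug.sethook()  -- stop collecting

for src, lines in pairs(hits) do
   local n = 0
   for _ in pairs(lines) do n = n + 1 end
   print(src .. ': ' .. n .. ' distinct lines executed')
end
```

The hook adds real overhead, so you’d only enable it for dedicated coverage runs, not the fast pre-commit pass.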