I rather enjoy debugging. It’s just another type of puzzle, one of the many challenges of gamedev approached with logic and the tools at hand. I should note that I actually don’t much like debugging if it involves a bunch of code I didn’t write, which is one reason I almost entirely use my own tech, stuff that I built, am familiar with and… capable of understanding :P
Most bugs die a swift death in Cogmind, since it’s built on a pretty simple architecture, and literally the first thing I put together for the framework was its error detection and reporting system. I also always take into account what could go wrong whenever I add anything new.
But one nightmare bug in particular has been in there for a very long time…
Background
“Seeding” a game’s RNG allows it to produce the same numbers in the same sequence, and is therefore a useful feature in roguelikes, especially where map generation is concerned. I’ve already written an article on seeds and how they work in Cogmind, along with their many applications, so I won’t go into all that again.
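As a quick illustration of that property, here's a minimal Python sketch. It is not Cogmind's actual RNG or generator, and `generate_layout` is a hypothetical stand-in. Two generators created from the same seed hand out identical sequences, so anything built purely from those numbers comes out identical too.

```python
import random
from typing import List, Tuple


def generate_layout(seed: int, rooms: int = 5) -> List[Tuple[int, int]]:
    """Toy 'layout': room sizes drawn from a seeded generator (hypothetical)."""
    rng = random.Random(seed)  # a private generator, seeded once up front
    return [(rng.randint(3, 12), rng.randint(3, 12)) for _ in range(rooms)]


first = generate_layout(1234)
second = generate_layout(1234)
print(first == second)  # True: same seed, same sequence, same rooms
```

The guarantee only holds while the sequence of draws is identical. Consume one extra number anywhere and everything after it shifts.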
This time I’m here to talk about a specific seed-related issue that popped up and how it was uncovered and resolved.
Nightmare
Around early 2017, occasional reports started popping up of seeded runs generating different maps. Now obviously this isn’t right, because the same seed should always produce the same map, so clearly some player action before that point had managed to affect the generation, causing the seeded content to “diverge.”
Map generation can be divided into three main phases: layout (A), content (B), and player-affected content (C). It’s important to separate out all the latter stuff (C) so that it doesn’t affect the base map that everyone using the same seed should share (A/B), so I’m generally careful to do that, but obviously something had slipped in somewhere…
I call this bug a “nightmare,” though honestly the effect on players was minimal, since it rarely came into play and wasn’t a show-stopper or anything like that. It was a nightmare for me because something like this is so hard to track down!
Nonetheless, this is a vital sort of bug to fix. Not only are fully consistent seeds important for built-in weekly seeds or other similar events (which are still something I’d like to do), but this bug had also already affected me several times in other bug-solving efforts. Often the quickest way to reproduce a bug in order to properly resolve it is to generate a map using the same seed it was created from, especially when I get a random remote crash report that is nothing more than a stack trace and a log containing the seed. More than once over the past couple years I couldn’t take that easiest route, or even recreate certain bugs at all, since the seed results might not match what the player encountered!
So you can see why it was pretty important to fix this, and when kiedra suddenly brought it up and later offered relevant save files, I was happy to jump on it immediately, brushing aside my previously scheduled work for the day. (It’s best to do this sort of thing when the events are freshest in the player’s mind, in case I had any other questions.)
Data
kiedra provided exactly what I needed: two save files, each from a separate run, both from the map before the one in which the divergence was observed. The fact that Beta 6 added multiple interval autosaves to Cogmind made collecting these saves (and others needed for debugging) much easier, but solving this issue in particular still required someone to be playing actual seeded runs, to notice the differences, and to be both able to save this data and willing to share it with me. Whew, finally got an, uh, convergence of all these variables ;)
Here are two screenshot excerpts demonstrating divergence on the same section of map:
You can see how the layout is identical, as are a couple machines and certain locations chosen for item placement, but other machine and item choices are actually different! Gotta find out where the changes started…
Sleuthing
My first guess was that it had something to do with global plot-related values. This is what I’d been thinking all along since I didn’t hear about this issue until much of the story and events were complete. In any case, this was really quick to check since we had two saves, so I loaded up each and just compared the list of globals…
That didn’t pan out, so I moved on to comparing the values coming out of the RNG at several major points in the mapgen process, since if any value at a given point was different from that same point in the other save, then the divergence must be occurring between that point and the previous non-diverging one. Basically, if there’s a divergence the RNG must be handing out at least one extra number in one save, and that would entirely throw off where all the subsequent numbers are applied, hence different results from that point onward.
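The checkpoint approach can be sketched in Python as a toy model. Everything here is hypothetical (`run_mapgen`, `peek`, and the fixed ten draws per stage are stand-ins, not Cogmind's code). `peek` copies the generator state so that recording a checkpoint doesn't itself consume a number and perturb the run being inspected. A single extra draw during stage 3 misaligns every number after it, so checkpoints 0 to 2 match and everything from 3 onward differs, placing the divergence between checkpoints 2 and 3.

```python
import random
from typing import List, Optional


def peek(rng: random.Random) -> float:
    """Return the RNG's next value without advancing the real generator."""
    probe = random.Random()
    probe.setstate(rng.getstate())
    return probe.random()


def run_mapgen(seed: int, extra_draw_at: Optional[int] = None,
               stages: int = 6) -> List[float]:
    """Toy mapgen: each stage consumes 10 numbers; record a checkpoint after each.

    extra_draw_at is a hypothetical stage that consumes one extra number,
    standing in for a player-dependent re-roll.
    """
    rng = random.Random(seed)
    checkpoints = []
    for stage in range(stages):
        for _ in range(10):
            rng.random()
        if stage == extra_draw_at:
            rng.random()  # the single extra draw that causes the divergence
        checkpoints.append(peek(rng))
    return checkpoints


def first_divergence(a: List[float], b: List[float]) -> Optional[int]:
    """Index of the first checkpoint at which two runs disagree, else None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


clean = run_mapgen(42)
diverged = run_mapgen(42, extra_draw_at=3)
print(first_divergence(clean, diverged))  # 3
```

Once the diverging interval is known, the same comparison can simply be repeated with finer checkpoints inside it.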
Even before that, based on just the screenshots I could pretty much narrow it down to placeRandomObjects(). “Narrow” is an overstatement though, because that’s also the bulk of the map content initialization process :P. Anyway, that’s where the number comparisons would start.
At the first three points the RNG gave the same number, so we can be pretty confident that the content generated prior to those points was identical between saves. Then comes the fourth check, and we have a winner! The RNG in each save gave a different number there, so they must have diverged somewhere between the last two checks.
Here I got a little ahead of myself and ended up wasting some time because I was excited about finally getting this close and immediately made an assumption based on the general code in that section. I thought it had something to do with how, in a few cases, later map generation stages were allowed to modify the spawning restrictions for object types set in the original layout. Problem was, this assumption was not at all based on actual evidence, so the lesson here is to follow the evidence, not your imagination, especially when there’s already a direct route to finding the solution. Oops.
Fortunately I realized my error when I was taking a quick break (it’s good to “get away” from problem solving for a bit, since it might allow for new perspectives, although clearly this was still rolling around in my head while on “break” xD).
I came up with a few ideas for narrowing down the problem space, but while most would solve the problem quickly once implemented, they’d also take a while to build and end up costing more time than they were worth, so I decided to just keep up the straightforward manual search. I did still chop out huge unrelated chunks of the content generation so that the resulting maps would have fewer distractions and be easier to visually analyze, possibly leading to more clues.
To go along with that view, I got a list of every room in the order they were filled, and what type of general content they included:
Getting closer! From the data above, it’s either an issue with Room 14 or 15. Room 15 has a different composition, but since composition is set first, it’s probably an issue with the room before it at (1,67) on the map…
To confirm real quick I also visually checked the final output of several rooms listed above 14, and those were identical in both saves.
Seeing as the Terminal looks identical but there are different numbers and types of items, I decided to take a look at the items first. Stepping through the code line by line for that room, I recorded a few values under the first save, then went to the second save, only to discover that the very first numbers it started with were already different, so the divergence must have happened before item placement even started in there!
Well, there wasn’t much before the items… just the Terminal, so I eyed it suspiciously and had an epiphany: it must be something inside the Terminals.
Gotcha!
As soon as I saw the different hacks I knew the answer (it becomes extra obvious when you look at the point from which the hacks change): schematic hacks at Terminals favor the player by usually re-rolling if the randomly chosen schematic happens to be one they already have.
This kind of gameplay-improving tweak is fine, but it needs to be done in the player-affected content segment of mapgen! Here I’d checked for and applied the changes immediately, forgetting that we’re in the middle of the base content assignment. So if the player happened to already have a schematic which the game attempted to put on any Terminal on the new floor, it would roll again for a new one, advancing the RNG state, and bam, everything after that point is different.
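Here's a simplified Python model of the bug, under a few assumptions: the schematic pool and function names are hypothetical, and the re-roll always happens once when the pick is already owned (the game only usually re-rolls). Because the player-aware branch runs in the middle of base content generation, a player who owns the first pick consumes one extra number, and every base roll after the Terminal shifts.

```python
import random
from typing import List, Set, Tuple

# Hypothetical schematic pool (placeholder names, not real game data).
SCHEMATICS = [f"schematic_{i}" for i in range(20)]


def fill_terminal_buggy(rng: random.Random, owned: Set[str]) -> str:
    """BUG: a player-aware re-roll inside base content generation."""
    choice = rng.choice(SCHEMATICS)
    if choice in owned:                  # branch depends on player state...
        choice = rng.choice(SCHEMATICS)  # ...and consumes an extra number
    return choice


def generate_floor_buggy(seed: int, owned: Set[str]) -> Tuple[str, List[int]]:
    rng = random.Random(seed)
    terminal = fill_terminal_buggy(rng, owned)
    # Base content rolled after the Terminal, e.g. item placement.
    items = [rng.randint(1, 100) for _ in range(5)]
    return terminal, items


seed = 2017
first_pick = random.Random(seed).choice(SCHEMATICS)  # what the Terminal rolls first
_, items_fresh = generate_floor_buggy(seed, owned=set())
_, items_veteran = generate_floor_buggy(seed, owned={first_pick})
print(items_fresh)
print(items_veteran)  # same seed, but the item rolls no longer line up
```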
This also explains why the issue tends to appear more often in the late-game (more time to accumulate schematics) and only for some players (those using schematics as part of their play style, and running seeds so they might actually notice it).
For the sake of double confirmation I did check that kiedra had different schematics in each save, four more in the second than the first, and one of them happened to be what was chosen for this Terminal.
Based on this finding I knew there were some other related instances, and fixed all of them at once. The same behavior exists (to varying degrees) with part schematics, robot schematics, lore records, and preloaded Fabricator schematics. Of course the fix is to move all these player-relative content modifications to the final mapgen phase.
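Continuing the same toy model (still hypothetical names, not the game's code), the fix amounts to generating all base content first and applying player-relative tweaks only afterward. The base output then depends on the seed alone. The phase C re-roll can still consume numbers, but in this sketch nothing that every player should share comes after it.

```python
import random
from typing import Dict, List, Set, Union

# Same hypothetical pool as in the buggy sketch.
SCHEMATICS = [f"schematic_{i}" for i in range(20)]

Floor = Dict[str, Union[str, List[int]]]


def generate_base(rng: random.Random) -> Floor:
    """Phases A/B: driven only by the seeded RNG, never by player state."""
    terminal = rng.choice(SCHEMATICS)
    items = [rng.randint(1, 100) for _ in range(5)]
    return {"terminal": terminal, "items": items}


def apply_player_adjustments(floor: Floor, rng: random.Random, owned: Set[str]) -> Floor:
    """Phase C: runs after all base content, so extra draws can't shift it."""
    if floor["terminal"] in owned:
        floor["terminal"] = rng.choice(SCHEMATICS)  # may still land on an owned one
    return floor


def generate_floor(seed: int, owned: Set[str]) -> Floor:
    rng = random.Random(seed)
    floor = generate_base(rng)
    return apply_player_adjustments(floor, rng, owned)


seed = 2017
first_pick = random.Random(seed).choice(SCHEMATICS)
fresh = generate_floor(seed, owned=set())
veteran = generate_floor(seed, owned={first_pick})
print(fresh["items"] == veteran["items"])  # True: base content no longer depends on the player
```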
The final check was to run the saves under the new code, and compare both those results to a completely fresh debug run using the same seed (which just teleports to that map so nothing at all can interfere with it). Same results across the board :D
And now seeds should be fully reliable once again!
Better Architecture
It’s worth mentioning (mainly to head off the inevitable comments to this effect :P) that there are ways to prevent this kind of thing from happening in the first place. For example, if there are clear rules that should be obeyed, as there are here, then be sure to encapsulate all player-relative data and keep it hidden/inaccessible from the mapgen process until it’s allowed.
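One way to enforce that rule at runtime, sketched in Python under the assumption that mapgen receives player data through a single object: wrap it so any access before the final phase raises immediately, turning a silent divergence into a loud error. `SealedPlayerContext`, `PlayerContext`, and `PlayerDataLocked` are hypothetical names, not part of Cogmind.

```python
from dataclasses import dataclass


class PlayerDataLocked(RuntimeError):
    """Raised when mapgen touches player-relative data before phase C."""


@dataclass(frozen=True)
class PlayerContext:
    owned_schematics: frozenset = frozenset()


class SealedPlayerContext:
    """Hypothetical wrapper that hides player data until explicitly unlocked."""

    def __init__(self, ctx: PlayerContext) -> None:
        self._ctx = ctx
        self._unlocked = False

    def unlock(self) -> None:
        """Call once, at the start of the player-affected content phase."""
        self._unlocked = True

    @property
    def data(self) -> PlayerContext:
        if not self._unlocked:
            raise PlayerDataLocked("player-relative data accessed during base mapgen")
        return self._ctx


sealed = SealedPlayerContext(PlayerContext(frozenset({"schematic_3"})))
try:
    sealed.data  # phases A/B: any peek at player state fails loudly
except PlayerDataLocked as err:
    print("caught:", err)
sealed.unlock()  # entering phase C
print(sealed.data.owned_schematics)
```

A runtime check like this only catches accesses that actually happen during a run, so it complements rather than replaces keeping the phases cleanly separated in the code.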
In any case, this made for an exciting debugging adventure ;)
2 Comments
Surprised I’m the only comment, but interesting article. Hunting the Butterfly Effect. How much would you say your note-keeping is for your personal sake and how much due to your journalistic tendencies? :)
Heh, well in recent years I get plenty of comments on blog posts, but they’re not often on… the actual blog here :P. Mainly because unlike the early years I now have a presence on so many social media sites and link (or even crosspost) these articles elsewhere, so lots of the comments end up elsewhere, too. Always happy to have people drop by here right next to the original source though! :D
To your question, it’s mostly just how I process things!
In this case, for example, as the post above shows, I wrote out quite a lot of notes while working through the solution. Much of what I write is what I’m considering and planning to do; then, as I do it, I move down the list noting the results, writing more plans as they come to mind and reorganizing or adding future items as necessary based on new findings.
This is just how I debug tougher issues both so that there’s a record to refer to in case I might have gone wrong somewhere, and also because the process of writing it down (and saying it in my head!) helps me analyze it more deeply. I’ll do this even for feature designs and all kinds of stuff.
So I didn’t actually plan to write a blog post on this topic, but since this was solved via some back and forth with kiedra, who’d be curious about the details, I decided to share quite a lot of my notes on the forums, and then not long after Finestep on the Discord said a blog post about it would be interesting, so I went back to fill in some details and make a few images to go along with it. And a relatively quick new blog post is suddenly born :D
Now I’ll admit I do have some pretty strong “journalistic tendencies” ;). I enjoy sharing stories and methods and… all kinds of stuff (hence this here blog, and my others), and documenting and writing are essential to that process, so I tend to take a lot of notes for that reason, too.
One of my next blog posts will likely be about level design, so I’ve been saving more of my notes from the process used to do that, in order to share them later. For Cogmind’s next update I’m currently building a brand new map, the first in a while, and I’ve never covered the methodology despite having used it for years (mainly to avoid spoilers, but this map happens to be near the beginning so there’s less of an issue there!). The quantity of these notes (including multiple variants created along the way) is greater than what I’d normally leave behind. A lot of my design and implementation notes are actually just deleted, but this time I know I’ll be sharing a portion of them, so I gotta keep more around for now :P