What? Why?
So, I was implementing a ‘fun’ Capture The Flag challenge for an upcoming event at 37C3, with the idea of having a simple crack-me that only uses the CPU cache as its memory. The idea behind this was that I wanted the code to be uncomfortable to debug, essentially preventing players from just looking for strings appearing in memory at runtime, etc.
Whilst talking the concept through with some beings, I was told that the way I’d implemented the code so far reminded them of how some debuggers might work. This got me thinking, and I ended up ditching the prior CTF idea in order to write a Silly debugger: one that should eventually be fully functional on a system with no RAM present.
I guess that sorta answers the what and the why. The actual use case for this was to be able to somewhat conveniently debug my DIY BIOS project without having the ability to run debuggers / gdbserver on the device yet.
How to have volatile memory without RAM?
The coreboot project documents this process really well, so I won’t go tooooo deep into detail here, but basically we’ll want to set up the CPU to use its cache as RAM. Almost no hardware on the mainboard needs to be initialised for this, and most modern (less than 30 years old) Intel and AMD processors support it.
In practice, the process is fairly simple:
- Set an MTRR that marks the desired address range as write-back. This essentially tells the CPU that our memory reads should be served from the cache if possible, and that our memory writes should update the cache.
- Enter normal cache mode, and do a read from every address in our address range to mark the cache lines as valid.
- Enter no-fill mode to prevent the cache from being filled from RAM. After we’ve set up our cache, we don’t want it to ever be updated/filled from RAM, since there’s none present.
After the above, all that’s left to do is to set the e{si,di} and esp registers to point to the cache region, and we can start using memory without having RAM present. For more thorough documentation of the process, refer to coreboot linked above or see my implementation.
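To make the numbers behind those steps a bit more concrete, here is a small Python sketch that only computes the register values involved; it does not touch any hardware, and real setup code would be assembly using `wrmsr` and `mov cr0`. The MSR numbers, the write-back type encoding and the CR0 bit positions are the architectural ones. The region address, its size and the 36-bit physical address width are made-up assumptions for illustration, and the helper names are hypothetical.

```python
# Values involved in cache-as-RAM setup. This only models the arithmetic.

IA32_MTRR_PHYSBASE0 = 0x200    # first variable-range MTRR, base MSR
IA32_MTRR_PHYSMASK0 = 0x201    # first variable-range MTRR, mask MSR
MTRR_TYPE_WRITE_BACK = 0x06    # memory type encoding for write-back
MTRR_MASK_VALID = 1 << 11      # "valid" bit in the mask MSR

CR0_NW = 1 << 29               # not write-through
CR0_CD = 1 << 30               # cache disable


def mtrr_pair(base, size, phys_addr_bits=36):
    """Return (PHYSBASE, PHYSMASK) values marking [base, base + size) write-back.

    phys_addr_bits=36 is an assumption; the real width is CPU-specific.
    """
    if size < 0x1000 or size & (size - 1):
        raise ValueError("size must be a power of two of at least 4 KiB")
    if base % size:
        raise ValueError("base must be aligned to size")
    addr_mask = (1 << phys_addr_bits) - 1
    physbase = base | MTRR_TYPE_WRITE_BACK
    physmask = (~(size - 1) & addr_mask) | MTRR_MASK_VALID
    return physbase, physmask


def cr0_normal_cache(cr0):
    """CD=0, NW=0: caching enabled, reads may allocate cache lines."""
    return cr0 & ~(CR0_CD | CR0_NW)


def cr0_no_fill(cr0):
    """CD=1, NW=0: existing lines stay usable, misses no longer fill the cache."""
    return (cr0 | CR0_CD) & ~CR0_NW


# Hypothetical 64 KiB cache-as-RAM window; the address is an assumption.
CAR_BASE, CAR_SIZE = 0xFEF00000, 0x10000
base_msr, mask_msr = mtrr_pair(CAR_BASE, CAR_SIZE)
print(f"wrmsr {IA32_MTRR_PHYSBASE0:#x} <- {base_msr:#x}")
print(f"wrmsr {IA32_MTRR_PHYSMASK0:#x} <- {mask_msr:#x}")
```

The two CR0 helpers correspond to the "normal cache mode" and "no-fill mode" steps: the reads that validate the lines happen between them.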
The debugger part
The basic idea here is that we’ll fetch one whole instruction at a time, execute it, and have our main loop continue as the instruction following the one we just executed. This’ll cause issues with any sort of branching though, and it’s overall slow and not very great, but this is the approach I chose, and at least so far I intend to stick to it.
The main code logic here is:
- Read a single instruction from the non-volatile storage media to some executable and writable memory location. I wrote an incomplete x86 instruction parser for this, so that I have a reasonable way to figure out how many bytes are needed for the next instruction. Needless to say, parsing x86 instructions in 16-bit assembly was an interesting experience ^^`
- Store the debugger/‘host’ CPU state to a known/predetermined location in cache.
- Retrieve the debuggee/‘guest’ CPU state from a known/predetermined location in cache (apart from PC).
- If the next instruction is not a branching one, jump to execute it.
- Store the debuggee/‘guest’ CPU state, retrieve the debugger/‘host’ state, and jump back to the 1st step.
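The steps above can be modelled in a few lines of Python. This is only a toy sketch under loud assumptions, not the actual implementation (which is 16-bit assembly): the length decoder knows just a handful of one-byte opcodes in 16-bit mode, `execute` stands in for "jump to the scratch buffer", and the host/guest states are plain dictionaries instead of real register dumps. All function names here are hypothetical.

```python
REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]  # x86 encoding order


def insn_length(code, pc):
    """Length in bytes of the instruction at code[pc] (16-bit mode, tiny subset)."""
    op = code[pc]
    if op in (0x90, 0xF4) or 0x40 <= op <= 0x47:   # NOP, HLT, INC r16
        return 1
    if 0xB8 <= op <= 0xBF:                         # MOV r16, imm16
        return 3
    if op == 0xEB or 0x70 <= op <= 0x7F:           # JMP rel8, Jcc rel8
        return 2
    raise ValueError(f"unsupported opcode {op:#04x} at offset {pc}")


def is_branch(insn):
    return insn[0] == 0xEB or 0x70 <= insn[0] <= 0x7F


def execute(insn, regs):
    """Stand-in for running the copied instruction on the real CPU."""
    op = insn[0]
    if 0xB8 <= op <= 0xBF:                         # MOV r16, imm16 (little-endian)
        regs[REG16[op - 0xB8]] = insn[1] | (insn[2] << 8)
    elif 0x40 <= op <= 0x47:                       # INC r16, wraps at 16 bits
        name = REG16[op - 0x40]
        regs[name] = (regs[name] + 1) & 0xFFFF
    # NOP: nothing to do


def single_step(code):
    """Run code one instruction at a time until HLT; return (guest regs, trace)."""
    cache = {"host": {}, "guest": dict.fromkeys(REG16, 0)}  # the "known locations"
    cpu = {"owner": "host"}                        # whatever the CPU holds right now
    pc, trace = 0, []                              # guest PC is tracked by the host
    while pc < len(code):
        length = insn_length(code, pc)
        scratch = bytes(code[pc:pc + length])      # step 1: copy one instruction
        if scratch[0] == 0xF4:                     # HLT ends this toy run
            break
        if is_branch(scratch):
            raise NotImplementedError("branching is not handled in this sketch")
        cache["host"] = cpu                        # step 2: save host state
        cpu = dict(cache["guest"])                 # step 3: load guest state
        execute(scratch, cpu)                      # step 4: run the instruction
        cache["guest"] = cpu                       # step 5: save guest state,
        cpu = cache["host"]                        #         restore host state
        trace.append(scratch.hex())
        pc += length
    return cache["guest"], trace


# mov ax, 0x1234; inc ax; nop; mov bx, 0xffff; inc bx; hlt
program = bytes.fromhex("b83412" "40" "90" "bbffff" "43" "f4")
regs, trace = single_step(program)
print(hex(regs["ax"]), hex(regs["bx"]), trace)
```

Note that the guest PC never lives in the saved guest state here; the loop variable `pc` plays that role, which matches the "apart from PC" remark in the list.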
I’m yet to figure out exactly how to handle branching. Currently I’m playing around with the idea of the debugger/‘host’ keeping track of the debuggee/‘guest’ program counter, changing it as needed, and using it to read the next instruction for the debuggee. Not quite sure yet though.
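As a rough illustration of that idea (one possible way to do it, not a settled design), the host could avoid executing short branches at all and instead compute the guest's next program counter itself. The Python sketch below assumes 16-bit mode and covers only `JMP rel8`, `JZ rel8` and `JNZ rel8`; the function names are hypothetical, and the zero flag is passed in as a plain boolean rather than read from a saved FLAGS register.

```python
def rel8(byte):
    """Sign-extend an 8-bit branch displacement."""
    return byte - 0x100 if byte & 0x80 else byte


def next_pc(code, pc, zero_flag):
    """Guest PC after the two-byte short branch at code[pc].

    The displacement is relative to the address of the following
    instruction, and IP wraps at 16 bits.
    """
    op = code[pc]
    fallthrough = (pc + 2) & 0xFFFF
    target = (fallthrough + rel8(code[pc + 1])) & 0xFFFF
    if op == 0xEB:                                 # JMP rel8: always taken
        return target
    if op == 0x74:                                 # JZ rel8: taken if ZF is set
        return target if zero_flag else fallthrough
    if op == 0x75:                                 # JNZ rel8: taken if ZF is clear
        return fallthrough if zero_flag else target
    raise ValueError(f"not a supported short branch: {op:#04x}")


# nop; jmp -3  (jumps back to the nop at offset 0)
loop = bytes([0x90, 0xEB, 0xFD])
print(next_pc(loop, 1, zero_flag=False))
```

With something like this, the main loop would call `next_pc` instead of jumping into the scratch buffer whenever the fetched instruction is a branch, and would keep reading guest instructions from whatever offset it returns.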
I’ll likely update this text once I’ve figured out how to move on from here :)