Reverse Engineering Unknown Binary Files - Dwarf Fortress Save Files

About The Project

The video game hacking community is often a source of inspiration for those in the information security field. From in-depth memory hooking techniques that circumvent anti-cheat to beating the final boss via Cheat Engine scripts, there’s always something to learn that parallels the challenges faced by those in offensive or defensive roles. The techniques discussed in this blog are analogous to the methodologies used in malware analysis when dealing with custom packers. This blog post focuses on understanding and ultimately modifying the custom file format of the infamous Dwarf Fortress video game. Let’s get started!

What is Dwarf Fortress?

Dwarf Fortress is an awesome resource management video game where you control a group of dwarves building a new home, where they must find food, build structures, obtain water, and defend their castle against the elements and monsters alike. If you’ve ever played a game like SimCity, Factorio, or RimWorld, you’ll be right at home with Dwarf Fortress. The game started out as a text-based adventure game, and in the past couple of years it was updated and released on Steam with 2D graphics. You can even install the original on Arch today!

$> pacman -Ss dwarffortress
extra/dwarffortress 50.13-1
    A single-player fantasy game in which you build a dwarven outpost or play an adventurer in a randomly generated world

It’s easy to lose hours to Dwarf Fortress, and I picked this game for the blog post because I’m a fan. With that being said, on to analyzing the save file!

Analyzing The Save File

The “Dwarf Fortress/save” directory contains all save files for the Dwarf Fortress worlds on a given machine. This folder can be found at the location below on Linux machines.

/home/dllcoolj/.steam/steam/steamapps/common/Dwarf Fortress/save

The folder structure is based on a player’s current “world” (assuming the game is running) or previous games they have saved. Within each of these folders is a file called “world.sav”. Per the official Dwarf Fortress Wiki:

Depending on the mode currently active (or the lack thereof), a large file named world.sav or world.dat. In fortress mode, this file is named world.sav and includes the current fortress data, as well as the world data. In a save without a currently active game, this is the main save folder. The custom raws generated for the forgotten beasts, titans, demons, night creatures, and evil effects are stored inside this file. Replacing this entire file will almost certainly crash the game; however, replacing certain portions of the raws included may still keep the save folder working.

The closing sentence of the quote above is what stood out to me. By modifying the content of the world.sav file, we can change the structure of the world in Dwarf Fortress. Sounds pretty neat! To locate the world.sav files, executing find -iname world.sav shows all the current worlds I have for Dwarf Fortress.

00_df_file_location.png

Having multiple files of an unknown file format can be helpful in identifying common structures. Opening both files in radare2 below, we see that the first 16 bytes are very similar, with only the two bytes at offsets 8 and 9 differing.

01_hexdump.png

02_hexdump.png
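The same header comparison can be scripted instead of eyeballed. The snippet below is a minimal sketch: the helper names are my own, and the two headers are synthetic stand-ins rather than bytes taken from a real world.sav.

```python
def read_header(path: str, length: int = 16) -> bytes:
    """Read the first `length` bytes of a file."""
    with open(path, "rb") as f:
        return f.read(length)


def diff_offsets(a: bytes, b: bytes, length: int = 16) -> list[int]:
    """Return the offsets within the first `length` bytes where a and b differ."""
    return [i for i, (x, y) in enumerate(zip(a[:length], b[:length])) if x != y]


# Synthetic headers (NOT real save data): identical except at offsets 8 and 9.
header_a = bytes(range(16))
header_b = bytearray(header_a)
header_b[8], header_b[9] = 0xAA, 0xBB
header_b = bytes(header_b)

print(diff_offsets(header_a, header_b))  # → [8, 9]

# With real files this would look something like the line below;
# "region2/world.sav" is a hypothetical second save.
# print(diff_offsets(read_header("region1/world.sav"), read_header("region2/world.sav")))
```

Bytes that stay constant across saves are good magic-value candidates, while bytes that change are more likely to be sizes, counters, or checksums.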

Opening a binary file in a hex editor can seem overwhelming at first. There are lots of bytes and multiple colors, and figuring out the structure can be difficult. To help in understanding the underlying format, binwalk can list known magic bytes within the file. However, this can also be a source of false positives if a common byte pattern just happens to appear in the binary file. Binwalk reveals hundreds of zlib hits. Quickly referencing offset 0xC in the hexdump above, we see the start of the magic bytes “78da”.

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------------------
12            0xC             Zlib compressed data, best compression
2207          0x89F           Zlib compressed data, best compression
4700          0x125C          Zlib compressed data, best compression
... truncated ....
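One cheap way to weed out some false positives is to apply the header rules from the zlib specification (RFC 1950) to each candidate: the low nibble of the first byte must be 8 (deflate), the window size field must be at most 7, and the two bytes read as a big-endian number must be a multiple of 31. The sketch below uses function names of my own choosing and a synthetic buffer. Passing this check is necessary but not sufficient, so some coincidental matches will still get through.

```python
def looks_like_zlib_header(cmf: int, flg: int) -> bool:
    """Apply the RFC 1950 header rules to a candidate two-byte zlib header."""
    uses_deflate = (cmf & 0x0F) == 8            # CM field: 8 means deflate
    window_ok = (cmf >> 4) <= 7                 # CINFO field: window up to 32 KiB
    check_ok = ((cmf << 8) | flg) % 31 == 0     # FCHECK makes the pair a multiple of 31
    return uses_deflate and window_ok and check_ok


def find_candidates(data: bytes) -> list[int]:
    """Return every offset where two consecutive bytes form a plausible zlib header."""
    return [
        i for i in range(len(data) - 1)
        if looks_like_zlib_header(data[i], data[i + 1])
    ]


# Synthetic buffer (NOT real save data): 12 filler bytes, a valid 78 da header,
# then 78 db, which fails the multiple-of-31 rule.
blob = b"\x00" * 12 + b"\x78\xda" + b"\x00\x78\xdb"

print(find_candidates(blob))  # → [12]
print(0xDA >> 6)              # → 3, the FLEVEL value for the strongest compression setting
```

The FLEVEL value of 3 in the 0xda flag byte is consistent with binwalk labelling each hit as “best compression”.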

A quick Wikipedia search for file signatures verifies that this byte pattern is indeed zlib. To recap, the first 10 bytes of the world.sav file are still unknown, followed by two zero bytes and then the beginning of the first zlib section at offset 0xC. Searching for the zlib header with radare2 turns up over 10k matches.

02_df_hexsearch.png

After executing radare2’s byte search, each match is saved as a flag that can be used like a variable to jump to that offset in the binary file. These flags are named “hit<search>_<num>”, where <search> is the number of the search that produced the match and <num> counts the matches starting from zero. For example, I execute s hit1_1 in my radare2 shell to jump to the second match (0x0000089d), the next offset in the binary with the zlib magic value.

03_df_hexhits.png

All of these hits result in a lot of data to parse. This also presents an excellent opportunity to use radare2’s Python API, r2pipe, to automate extracting each of these zlib sections and obtain the plain text values they hold.

Automating Zlib Section Extraction with R2Pipe

R2pipe is a Python wrapper around radare2. As far as I know, anything you can do in radare2 you can execute via r2pipe. The Python snippet below returns a string listing each offset in the binary where the search pattern matched, along with the flag name and the bytes that were matched.

import r2pipe

r = r2pipe.open("region1/world.sav")
hits = r.cmd("/x 000078da")  # search for two 00 bytes followed by the zlib magic 78da

# hits produces the following values
# 0x0000000a hit0_0 000078da
# 0x0000089d hit0_1 000078da
# 0x0000125a hit0_2 000078da
# 0x00001f7d hit0_3 000078da
# 0x00003011 hit0_4 000078da

Next, this output is parsed to keep just the first column. A Python list comprehension splits the string on newlines to get each row, then splits each row on a space and takes the first element, which is the address.

# keep the first column (the address) of every non-empty row
addrs = [hit.split(" ")[0] for hit in hits.splitlines() if hit.strip()]

Finally, with the list of addresses in hand, iterating over it with the zip function pairs the start of each zlib magic hit with the start of the next one. This pairing is required to determine how many bytes of zlib data to read out of world.sav. With the zlib section read, the data can be decompressed and printed to the screen.

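import zlib

# start/stop blocks of address: x is the start of a hit, y the start of the next one
# "world_block.bin" is a hypothetical output file name
with open("region1/world.sav", "rb") as fin, open("world_block.bin", "wb") as fout:
    for x, y in zip(addrs, addrs[1:]):
        fin.seek(int(x, 16))
        num_bytes = int(y, 16) - int(x, 16)
        compressed_data = fin.read(num_bytes)
        # skip the first 2 00 bytes so the data starts at the 78da header
        decompressed_data = zlib.decompress(compressed_data[2:])
        print(decompressed_data)
        fout.write(decompressed_data)
        break  # stop after the first compressed block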

The first zlib compressed block of data for world.sav is shown below.

b'..... truncated ......
2\x00[OBJECT:INORGANIC]\x14\x00[INORGANIC:DIVINE_1]\x0b\x00[GENERATED]\x08\x00[DIVINE]\x15\x00[DISPLAY_COLOR:0:0:
1]\x13\x00[BUILD_COLOR:0:0:1]\x1d\x00[STATE_COLOR:ALL_SOLID:BLACK]&\x00[USE_MATERIAL_TEMPLATE:METAL_TEMPLATE]*\x0
0[STATE_NAME_ADJ:ALL_SOLID:blistered metal]\x14\x00[MATERIAL_VALUE:200]\x10\x00[SPEC_HEAT:7500]\x14\x00[MELTING_P
OINT:NONE]\x14\x00[BOILING_POINT:NONE]\x0e\x00[ITEMS_WEAPON]\x15\x00[ITEMS_WEAPON_RANGED]\x0c\x00[ITEMS_AMMO]\x0e
\x00[ITEMS_DIGGER]\r\x00[ITEMS_ARMOR]\r\x00[ITEMS_ANVIL]\x0c\x00[ITEMS_HARD]\r\x00[ITEMS_METAL]\x0e\x00[ITEMS_BAR
RED]\x0e\x00[ITEMS_SCALED]\x14\x00[SOLID_DENSITY:1000]\x15\x00[LIQUID_DENSITY:1000]\x12\x00[MOLAR_MASS:20000]\x16
\x00[IMPACT_YIELD:1000000]\x19\x00[IMPACT_FRACTURE:2000000]\x1a\x00[IMPACT_STRAIN_AT_YIELD:0]\x1b\x00[COMPRESSIVE
_YIELD:1000000]\x1e\x00[COMPRESSIVE_FRACTURE:2000000]\x1f\x00[COMPRESSIVE_STRAIN_AT_YIELD:0]\x17\x00[TENSILE_YIEL
D:1000000]\x1a\x00[TENSILE_FRACTURE:2000000]\x1b\x00[TENSILE_STRAIN_AT_YIELD:0]\x17\x00[TORSION_YIELD:1000000]\x1
a\x00[TORSION_FRACTURE:2000000]\x1b\x00[TORSION_STRAIN_AT_YIELD:0]\x15\x00[SHEAR_YIELD:1000000]\x18\x00[SHEAR_FRA
CTURE:2000000]\x19\x00[SHEAR_STRAIN_AT_YIELD:0]\x17\x00[BENDING_YIELD:1000000]\x1a\x00[BENDING_FRACTURE:2000000]\
x1b\x00[BENDING_STRAIN_AT_YIELD:0]\x10\x00[MAX_EDGE:12000]\x10\x00[SPHERE:DISEASE]\x10\x00\x00\x00\x13\x00inorgan
ic_generated\x12\x00[OBJECT:INORGANIC]\x14\x00[INORGANIC:DIVINE_2]\x0b\x00[GENERATED]\x08\x00[DIVINE]\x15\x00[DIS
..... truncated ......

Success!
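Looking closely at the dump, each bracketed token appears to be preceded by a two-byte little-endian length: \x12\x00 (18) comes before the 18-character [OBJECT:INORGANIC], and \x14\x00 (20) comes before [INORGANIC:DIVINE_1]. The snippet below is a minimal sketch of reading records under that assumption. The function name is my own, decoding as CP437 is an assumption, and the dump also shows other fields between runs of tokens (for example what looks like a four-byte value), so this is not a complete parser for the block.

```python
import struct


def parse_prefixed_strings(buf: bytes, offset: int = 0) -> list[str]:
    """Read consecutive <uint16 little-endian length><bytes> records.

    Stops at a zero length or when a record would run past the buffer.
    """
    out = []
    while offset + 2 <= len(buf):
        (length,) = struct.unpack_from("<H", buf, offset)
        start, end = offset + 2, offset + 2 + length
        if length == 0 or end > len(buf):
            break
        out.append(buf[start:end].decode("cp437"))  # encoding is an assumption
        offset = end
    return out


# Bytes copied from the decompressed output shown above.
sample = (
    b"\x12\x00[OBJECT:INORGANIC]"
    b"\x14\x00[INORGANIC:DIVINE_1]"
    b"\x0b\x00[GENERATED]"
    b"\x08\x00[DIVINE]"
)

print(parse_prefixed_strings(sample))
# → ['[OBJECT:INORGANIC]', '[INORGANIC:DIVINE_1]', '[GENERATED]', '[DIVINE]']
```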

The plain text data above can now be modified, recompressed, and placed over the pre-existing world.sav file. The values here pertain to specific items or world configuration data that can be looked up on the Dwarf Fortress wiki. With the zlib data obtained, there’s still the mystery of the first 10 or so bytes. To further understand their role in the world.sav file, I opened Ghidra and loaded Dwarf Fortress, then searched for what I believe to be the world.sav magic bytes, 2208 0000 0100. Unfortunately, the results did not lead to any new information. After creating numerous save files, these values remained static, furthering my suspicion that these bytes are simply a magic header, followed by two additional bytes that I believe to be unique to the map generated.
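The modify-and-recompress step could be sketched roughly as follows. This is an untested idea rather than a verified way to patch a save: the function name is hypothetical, and the same-length restriction is my own precaution, because each token in the dump looks like it is preceded by a two-byte length that would otherwise need updating. Compressing at level 9 reproduces the 78 da header seen throughout the file, but the recompressed block may differ in size from the original, and whether the game tolerates that has not been checked.

```python
import zlib


def patch_block(compressed: bytes, old: bytes, new: bytes) -> bytes:
    """Decompress a zlib block, swap `old` for `new`, and recompress at level 9."""
    if len(old) != len(new):
        # Keep lengths equal so any length prefix in front of the token stays valid.
        raise ValueError("replacement must be the same length as the original")
    plain = zlib.decompress(compressed)
    if old not in plain:
        raise ValueError("pattern not found in block")
    return zlib.compress(plain.replace(old, new), 9)


# Stand-in block built from a token that appears in the output above.
original = zlib.compress(b"\x14\x00[MATERIAL_VALUE:200]", 9)
patched = patch_block(original, b"[MATERIAL_VALUE:200]", b"[MATERIAL_VALUE:999]")

print(patched[:2].hex())         # → 78da
print(zlib.decompress(patched))  # → b'\x14\x00[MATERIAL_VALUE:999]'
```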

Beyond The Blog

Video game save files present an excellent opportunity to start parsing unknown file formats. Browse that old Steam library of yours and I bet you’ll find a game that has some unique format to poke at and practice reversing custom formats. While I used r2pipe to parse out the embedded zlib files, there are numerous ways to do this in native Python. I think radare2 is a very powerful tool and I like to show practical use cases where I can, thus I chose it for this blog post. Thank you for taking the time to read this. If you found it useful, please share it on your social media platform of choice.
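As one example of the native Python route, the sketch below finds each 78 da candidate with bytes.find and lets zlib.decompressobj decide where the stream ends, so no second address is needed to compute a block size. The function name is my own and the demo buffer is synthetic. It favours clarity over speed, since every attempt slices the remaining data, which would be slow on a large save with thousands of hits.

```python
import zlib
from typing import Iterator

ZLIB_MAGIC = b"\x78\xda"


def iter_zlib_blocks(data: bytes) -> Iterator[tuple[int, bytes]]:
    """Yield (offset, decompressed bytes) for every complete zlib stream in data."""
    pos = data.find(ZLIB_MAGIC)
    while pos != -1:
        d = zlib.decompressobj()
        try:
            plain = d.decompress(data[pos:])
        except zlib.error:
            plain = None  # the magic bytes matched by coincidence
        if plain is not None and d.eof:
            yield pos, plain
            # Everything the decompressor did not consume sits in unused_data,
            # so the stream ends at len(data) - len(unused_data).
            end = len(data) - len(d.unused_data)
            pos = data.find(ZLIB_MAGIC, end)
        else:
            pos = data.find(ZLIB_MAGIC, pos + 1)


# Synthetic buffer (NOT real save data): 12 filler bytes, two streams separated
# by 4 filler bytes, then a bogus 78 da that is not a real stream.
blob = (
    b"\x00" * 12
    + zlib.compress(b"first block", 9)
    + b"\x00" * 4
    + zlib.compress(b"second block", 9)
    + b"\x78\xda\xff\xff"
)

for offset, plain in iter_zlib_blocks(blob):
    print(hex(offset), plain)
```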

If you’re really interested in this kind of work, check out Game Hacking: Developing Autonomous Bots for Online Games. This is not an affiliate link; this is just a great book I think you should read.