
Janissary: AOE2 Game Analyzer


Janissary is a Python parser for multiplayer game recordings from Age of Empires 2. It generates an HTML report with an embedded React application for interactively viewing stats about the game.

My brothers and I used to play AoE2 when I was a teenager. Recently, when my brother asked if I wanted to play, my reaction was something along the lines of, "Do you actually still have a copy of this game?!" Turns out, it has been re-released as the "HD edition", and is available on Steam, along with an active multi-player server. Once I got over the fact that Steam was still selling a game originally released in 1999, I paid my $20 and we started playing again semi-regularly.

At the end of a game, there's a summary of "Achievements" -- basically scores for players based on what they accomplished -- and a timeline: an area plot showing the peasant (economic) and military population of all players over time. I've always disliked this plot. Because it's an area plot, you can't easily tell how the overall population changes, and when one area shifts, you can't tell whether one player's population grew or another's shrank. If you want to see, for example, who had the most peasants at a given point in time, you have to compare the areas by eye, which is pretty tough. It could be a lot better.

The game also allows recording, and has a playback tool built in, so after a match you can re-watch it to see what you did wrong, or more importantly, what your opponent did that you couldn't see during the game. This is a cool feature, but what really interested me was that it meant there was a lot more information in the recording file than the game made available. Wouldn't it be cool to be able to collect all that info and plot whatever I like, however I like?

Knowing there was no way I was the first person to want to do this, I went looking for any documentation about the format of this file, and I found some. The exact format of the file has varied -- and continues to vary -- with new versions, but it appears several people have been hard at work reverse engineering these as they change.

Example output graph of unit production over time

The file format

The basic structure of the file is simple: a header and a body.

The header is written at the start of the game, and contains all of the initial conditions for the game, including:

  • Player information
  • The map
  • Game settings

After the header, the "body" of the file contains a list of short objects called operations. An operation can be one of these types: GameStart, Chat, Sync, or Command. Those last two are the important ones for our purposes.

A command is, for the most part, a user action. When you click to attack, this generates a command. When you select your peasants and click to go chop trees, this generates a command. Selecting a peasant, however, is not a command; it is handled entirely by the local game client. A command gets sent to other players (and ultimately logged in the recording) only when you do something that changes the global state of the game world.

Sync provides time information at synchronization points. I believe each Sync operation marks a "communication turn" (see the Bettner/Terrano paper below), and that all commands since the previous Sync are likely executed at that communication turn. But I don't know precisely how timing is handled in the game.
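As a rough illustration of that reading of Sync, here is a sketch of turning the operation stream into timestamped commands. The `Sync`, `Command` and `TimestampedCommand` classes below and the `time_delta_ms` field are hypothetical stand-ins, not the real record layout or Janissary's actual classes. The sketch also assumes that each Sync advances the clock and that the commands seen since the previous Sync belong to that turn, which is a guess rather than confirmed game behavior.

```python
from dataclasses import dataclass


@dataclass
class Sync:
    time_delta_ms: int  # hypothetical field: milliseconds since the previous Sync


@dataclass
class Command:
    player_id: int
    action: str  # e.g. "MOVE" or "ATTACK"


@dataclass
class TimestampedCommand:
    time_ms: int
    command: Command


def timestamp_commands(operations):
    """Attach a game time to every Command in a stream of operations."""
    clock_ms = 0
    pending = []  # commands seen since the last Sync
    stamped = []
    for op in operations:
        if isinstance(op, Command):
            pending.append(op)
        elif isinstance(op, Sync):
            # Assumption: the Sync closes the turn, so pending commands get its time.
            clock_ms += op.time_delta_ms
            stamped.extend(TimestampedCommand(clock_ms, c) for c in pending)
            pending.clear()
        # Other operation types (GameStart, Chat) are ignored in this sketch.
    # Commands after the final Sync keep the last known time.
    stamped.extend(TimestampedCommand(clock_ms, c) for c in pending)
    return stamped
```

Buffering the commands until the next Sync keeps the timing rule in one place, so if the real rule turns out to be different only the Sync branch needs to change.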

Looking at the file format, it quickly becomes clear that there is no game state information stored in the file. A MOVE command is stored, indicating that a player commanded certain units to move to a certain location. But there is no later update about what path those units took, or where they are at any point along the way (or even where they started!). When units fight, there's no update about units that take damage or die. In fact, gameplay relies on every client implementing identical logic, and interpreting the commands to re-create identical state at every point in the game. This approach is discussed in this 2001 paper by Paul Bettner and Mark Terrano:

At first take it might seem that getting two pieces of identical code to run the same should be fairly easy and straightforward - not so. The Microsoft product manager - Tim Znamenacek told Mark early on “In every project, there is one stubborn bug that goes all the way to the wire - I think out-of-sync is going to be it” - he was right. The difficulty with finding out-of-sync errors is that very subtle differences would multiply over time. A deer slightly out of alignment when the random map was created would forage slightly differently - and minutes later a villager would path a tiny bit off, or miss with his spear and take home no meat. So what showed up as a checksum difference as different food amounts had a cause that was sometimes puzzling to trace back to the original cause. As much as we check-summed the world, the objects, the Pathfinding, targeting and every other system - it seemed that there was always one more thing that slipped just under the radar. Giant (50Mb) message traces and world object dumps to sift through made the problem even more difficult. Part of the difficulty was conceptual - programmers were not used to having to write code that used the same number of calls to random within the simulation (yes, the random numbers were seeded and synchronized as well).
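To make the idea concrete, here is a toy sketch of that deterministic approach: two clients that start from the same random seed and apply the same commands end up with the same checksum, without ever exchanging state. This is not the game's engine. The `Simulation` class, its fields and the foraging rule are all invented for illustration.

```python
import hashlib
import random


class Simulation:
    """Toy world: one villager position and a food count."""

    def __init__(self, seed):
        # Seeded generator, so every client draws the same random numbers.
        self.rng = random.Random(seed)
        self.position = 0
        self.food = 0

    def apply(self, command):
        kind, amount = command
        if kind == "MOVE":
            self.position += amount
        elif kind == "FORAGE":
            # The outcome depends on the shared random stream.
            self.food += self.rng.randint(0, amount)

    def checksum(self):
        state = f"{self.position}:{self.food}".encode()
        return hashlib.sha256(state).hexdigest()


def replay(seed, commands):
    """Rebuild the world purely from a seed and a command list."""
    sim = Simulation(seed)
    for command in commands:
        sim.apply(command)
    return sim


commands = [("MOVE", 3), ("FORAGE", 10), ("MOVE", -1), ("FORAGE", 10)]
client_a = replay(42, commands)
client_b = replay(42, commands)
in_sync = client_a.checksum() == client_b.checksum()
```

If one client made a single extra call to its random generator, its later draws would differ and the checksums would usually stop matching, which is the kind of drift the quote describes.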


The lack of any game state stored in the recording dashed my hopes of re-creating the achievements plot from the file, and briefly doused my enthusiasm to continue the project. There's just no way I'm going to re-create the game mechanics sufficiently to "play back" the recording. Still, there are some interesting things that you can pull out. I can't tell how many units a player had alive at a given time, but I can figure out how many cumulative units a player has trained up to that time. And I can do something the achievements page doesn't do at all: I can break it down by unit type. I can also look at actions that a user takes. How many times does one user click MOVE or ATTACK in a battle vs another? Do winning players do more micromanaging -- and therefore generate more actions per minute -- than losing players?
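Both of those statistics reduce to counting events. The sketch below assumes the commands have already been flattened into plain tuples; the tuple layouts and function names are hypothetical, not Janissary's actual data model. It also counts training commands as issued, so a unit that was queued and later cancelled would still be counted.

```python
from collections import Counter


def cumulative_trained(train_events, at_ms):
    """Count training commands per (player, unit type) up to and including at_ms.

    train_events: iterable of (time_ms, player_id, unit_type) tuples.
    """
    totals = Counter()
    for time_ms, player_id, unit_type in train_events:
        if time_ms <= at_ms:
            totals[(player_id, unit_type)] += 1
    return totals


def actions_per_minute(action_events, game_length_ms):
    """Average number of commands per minute for each player.

    action_events: iterable of (time_ms, player_id) tuples.
    """
    counts = Counter(player_id for _, player_id in action_events)
    minutes = game_length_ms / 60_000
    return {player: n / minutes for player, n in counts.items()}
```

Calling `cumulative_trained` at a series of increasing times gives the points for a per-unit-type production curve.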

Software Structure

The parser is written in Python. I knew I wanted to generate a report for a log file, and I may eventually set this up as a web service so that anyone can upload a log file and get a report back easily, so HTML output makes sense. The Python package installs the janissary script, whose main command, janissary report, parses a log and writes out the HTML report. The report is generated from a simple jinja2 template:


    <script type="application/json" id="data">
    {{ jsdata }}
    </script>
    <div id="root"></div>
    <noscript>Sorry, this report requires javascript and it appears you do not have it enabled.</noscript>
    <script type="application/javascript">
    {{ include_file("main.js") }}
    </script>

The template is rendered by a small Python module:

    import json
    import os

    import jinja2
    from markupsafe import Markup  # jinja2.Markup was removed in Jinja2 3.1

    from .report import report


    @jinja2.pass_context  # spelled jinja2.contextfunction on Jinja2 < 3.0
    def include_file(ctx, name):
        """Create a jinja2 function for including a file without parsing it as a template

        The standard jinja2 `include` function will attempt to parse the included
        file as a template, and javascript (not enclosed in a <script> tag) is not a valid
        jinja2 template.
        """
        env = ctx.environment
        # get_source() returns (source, filename, uptodate); only the raw source is needed
        return Markup(env.loader.get_source(env, name)[0])


    def render_html(header_dict, timestamped_commands):
        """Returns HTML output for report file

        header_dict - A dict containing the information parsed from the header of the log file
        timestamped_commands - List of TimestampedCommand objects parsed from the body of the log file
        """
        report_data = report(header_dict, timestamped_commands)

        # Look for the template and the webpack bundle next to this module
        fileDir = os.path.dirname(os.path.realpath(__file__))
        searchpath = [os.path.join(fileDir, "templates/"), os.path.join(fileDir, "js/dist")]
        templateLoader = jinja2.FileSystemLoader(searchpath=searchpath)
        templateEnv = jinja2.Environment(loader=templateLoader)
        templateEnv.globals['include_file'] = include_file
        template = templateEnv.get_template("report.html")
        return template.render(jsdata=json.dumps(report_data))

All this does is create a simple HTML document and inject two things into it: a JSON blob containing the report data generated by the Python code, and main.js, which contains the webpack bundle created from a React application.
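One thing to watch for when embedding JSON inside a script element: if any string in the report data (a player name, say) ever contained the text `</script>`, the browser would end the element early. The helper below is a hypothetical addition, not part of the code above, showing one common way to guard against that using only the standard library.

```python
import json


def json_for_script_tag(data):
    """Serialize data as JSON that is safe to place inside a <script> element."""
    # "<\/" is still valid JSON (an escaped slash), but the HTML parser can no
    # longer read it as the start of a closing tag.
    return json.dumps(data).replace("</", "<\\/")


payload = {"player": "gg </script><b>wp"}
embedded = json_for_script_tag(payload)
round_tripped = json.loads(embedded)  # equals the original payload
```

Because the escaped form is ordinary JSON, the JavaScript side can keep calling `JSON.parse` on the element's text without any changes.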

Now with React, I can build an interactive UI to display the data however I want. There's not much to it at the moment: just a few tables and graphs created with chart.js, but it's nice to have the flexibility to adjust graph options interactively. We could have stayed entirely in Python and produced plots with matplotlib, but it's a lot harder to make them interactive. By embedding this all into an HTML file, we get a very portable output artifact. The HTML file is self-contained, and can be opened in pretty much any browser.

Here's an example output file.

Needs work

There are still some shortcomings and room for improvement if I end up spending more time on it.

Support more format versions

It's only tested with -- and probably only works with -- the current version (HD v5.8) of the game. If there's any interest out there in more general use, it could be extended for wider support. But for my purposes so far I haven't wanted to spend the time collecting example logs for different versions and making sure it works with them.

Full static data import

There are a lot of static definitions, e.g. the meaning of various object IDs, that I don't have defined yet. I've put in just the ones I figured out myself thus far, which is a minority, so many units will show up as, e.g. "Unknown Unit 432".
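The fallback behavior amounts to a dictionary lookup with a default. In this sketch the table entries are placeholders chosen for illustration and have not been verified against the game's data; `UNIT_NAMES` and `unit_name` are hypothetical names.

```python
# Placeholder entries for illustration only; not verified against the game's data.
UNIT_NAMES = {
    4: "Archer",
    83: "Villager",
}


def unit_name(unit_id, names=UNIT_NAMES):
    """Return a readable name, or a labeled placeholder for IDs not yet mapped."""
    return names.get(unit_id, f"Unknown Unit {unit_id}")
```

Keeping the raw ID in the placeholder means unmapped units still group correctly in the report and can be identified later.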

Resource Spending reports

We can't tell from the recording how many resources a player has collected, but we can tell what they spend. And we can break it down into what was spent on research, buildings, and units, and what type of each. I would love to see total spending broken down over time, and a time plot of cumulative spending. Luckily, someone has already done an amazing job of collecting all of the resource information for everything in the game into a JSON format:
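Independent of that dataset, the cumulative spending plot itself is just a running sum over purchase commands. A minimal sketch, assuming purchases arrive as time-ordered tuples and using a small made-up cost table (`COSTS` is illustrative and is not the JSON dataset mentioned above):

```python
from collections import defaultdict

# Made-up cost table for illustration; real values would come from a data file.
COSTS = {
    "Villager": {"food": 50},
    "Archer": {"wood": 25, "gold": 45},
}


def cumulative_spending(purchases, costs=COSTS):
    """Build a running spending total per player.

    purchases: iterable of (time_ms, player_id, item_name), sorted by time.
    Returns {player_id: [(time_ms, {resource: total spent so far}), ...]}.
    """
    running = defaultdict(lambda: defaultdict(int))
    timeline = defaultdict(list)
    for time_ms, player_id, item in purchases:
        # Items missing from the cost table contribute nothing.
        for resource, amount in costs.get(item, {}).items():
            running[player_id][resource] += amount
        # Snapshot the totals so later purchases do not alter earlier points.
        timeline[player_id].append((time_ms, dict(running[player_id])))
    return dict(timeline)
```

Each player's list can be plotted directly as a time series, one line per resource.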


Admittedly, it is pretty basic in presentation. It could do with a little prettying up, for sure.