StarCraft Brood War Data Mining

Author: Alberto Uriarte. Last update: 03-31-2017

If you want to do some data mining or machine learning on StarCraft Brood War, don't reinvent the wheel: this is a compendium of all the work done so far. If you do something new, please notify me (albertouri[at]drexel[dot]edu) so I can keep this page updated.

Replay websites

First of all, you need some replays (game log files) to do your data mining. Several websites host replays from professional gamers (a.k.a. gosu). Note that if you want to analyze the replays using BWAPI, you need replays played in the latest version of StarCraft Brood War (1.16.1).

Replay crawler

Downloading all the replays by hand is very time-consuming, so you should use a crawler to automate the process. Keep in mind that the same replay can be hosted on different websites, so it is good practice to compare replay hashes to detect duplicates. Also, some replays may be corrupted.
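A minimal sketch of the hash-based duplicate check, assuming the crawler has already saved the replays into a local directory with the usual `.rep` extension. The function names are illustrative. Note that this only catches byte-identical copies; it does not detect corrupted replays.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(replay_dir: str, pattern: str = "*.rep") -> dict[str, list[Path]]:
    """Group replay files by content hash and return only groups with 2+ files."""
    groups: dict[str, list[Path]] = {}
    for path in sorted(Path(replay_dir).rglob(pattern)):
        groups.setdefault(file_digest(path), []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def deduplicate(replay_dir: str, pattern: str = "*.rep") -> list[Path]:
    """Return one path per distinct replay (the first one in sorted path order)."""
    seen: set[str] = set()
    unique: list[Path] = []
    for path in sorted(Path(replay_dir).rglob(pattern)):
        digest = file_digest(path)
        if digest not in seen:
            seen.add(digest)
            unique.append(path)
    return unique
```

Hashing the file content rather than comparing file names matters here, because different websites usually rename the same replay.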

Replay packages

Once you have gathered a good number of replays, consider sharing your replay package with the research community so others can reproduce your results and compare them with new approaches. Each replay file already includes the map it was played on.

ID | Link | Size | # Replays | Sources | Author | Notes
[R1] | Download | 1.2 GB | 5493 | [W1][W2][W3] | Ben Weber | Contains replays from previous versions of StarCraft that are not compatible with BWAPI
[R2] | Download | 359 MB | 6000 | ? | Fobbah | Only Zerg replays (versus Zerg, Protoss and Terran)
[R3] | Download | 644 MB | 7649 | [W1][W2][W3] | Gabriel Synnaeve | No duplicates
[R4] | Download | 54 MB | 1029 | [W2] | Gabriel Synnaeve | User replays (not professional players)
[R5] | Download | 63 MB | 509 | [W2] | Tom Dietterich | Only Protoss vs Terran

Replay analyzers

Now it is time to parse the replays. There are two options: parse the replay file itself, or parse the events/states exposed by BWAPI. With the first option we do not need to play the replay in StarCraft, but we only get the players' click commands and we have to decode the binary replay format. With BWAPI we can record every game event and the full state of every unit, and even simulate the fog of war, but each replay has to be played back in StarCraft.

ID | Name | Language | Type | Based on | Notes
[A1] | LordMartin Replay Browser [DEAD] | - | File Parser | - | Source code not available
[A2] | BWChart [DEAD] | C++ | File Parser | - | -
[A3] | RepASsiMilator [DEAD] | C++ (PHP) | File Parser | [A2] | PHP extension
[A4] | bwhf | Java | File Parser | - | -
[A5] | bwrepanalysis | Java, C++ | File Parser + BWAPI events | [A4] | BWAPI events recorded: economy every 25 frames, tech every frame, vision every 12 frames, orders every frame, unit location every 100 frames or on a new order
[A6] | bwrepdump | C++ | BWAPI events | [A5] | BWAPI events recorded: economy every 25 frames, tech every frame, vision every 12 frames, orders every frame, unit location every 100 frames or on a new order, every frame during attacks
[A7] | ScExtractor | Java | File Parser + BWAPI events | [A4] | BWAPI events recorded: every 24 frames or on frames with a user action, every 7 frames during attacks
[A8] | bwrepdump2 | C++ | BWAPI events | [A6] | Updated version of [A6], extended to extract combat records
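The recording intervals above are given in game frames. A minimal sketch for converting between frames and in-game seconds, assuming the games were played at the "Fastest" speed (the usual setting for competitive play), where one frame lasts 42 ms, i.e. about 23.8 frames per second. The function names are illustrative and not part of BWAPI.

```python
# Assumption: games run at the "Fastest" speed, where one logical frame lasts 42 ms.
MS_PER_FRAME_FASTEST = 42


def frames_to_seconds(frames: int, ms_per_frame: int = MS_PER_FRAME_FASTEST) -> float:
    """Convert a number of game frames into in-game seconds."""
    return frames * ms_per_frame / 1000.0


def seconds_to_frames(seconds: float, ms_per_frame: int = MS_PER_FRAME_FASTEST) -> int:
    """Convert in-game seconds into a whole number of frames (rounded down)."""
    milliseconds = int(round(seconds * 1000))  # work in integer ms to avoid float drift
    return milliseconds // ms_per_frame


if __name__ == "__main__":
    # Sampling intervals of [A5] expressed in seconds.
    for name, interval in {"economy": 25, "vision": 12, "unit location": 100}.items():
        print(f"{name}: every {interval} frames = {frames_to_seconds(interval):.2f} s")
```

For example, the 25-frame economy interval is about 1.05 s, and a 7-minute window (420 s) corresponds to exactly 10000 frames.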

Datasets

After parsing the replays you will have a clean dataset ready for machine learning techniques. Each dataset was created with a specific data-mining goal in mind, so it may not capture the information you want, or at the granularity you need. In that case, feel free to create your own dataset using a replay analyzer.
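If a dataset records unit states more often than you need, you can coarsen it yourself. A minimal sketch, assuming a hypothetical record layout of `(frame, unit_id, state)` tuples (not the actual format of any dataset listed here), that keeps the last known state of each unit in every fixed window of frames, e.g. 24 frames:

```python
from collections.abc import Iterable

# Hypothetical record layout: (frame, unit_id, state), e.g. state = (x, y, hit_points).
Record = tuple[int, int, tuple]


def downsample(records: Iterable[Record], interval: int) -> list[Record]:
    """Keep, for each unit, the last record seen in every window of `interval` frames.

    Window k covers frames [k * interval, (k + 1) * interval). Each kept record is
    re-stamped with its window start, and the output is sorted by (window, unit_id).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    latest: dict[tuple[int, int], Record] = {}
    # A stable sort by frame means later records overwrite earlier ones in a window.
    for frame, unit_id, state in sorted(records, key=lambda r: r[0]):
        latest[(frame // interval, unit_id)] = (frame, unit_id, state)
    return [
        (window * interval, unit_id, state)
        for (window, unit_id), (_, _, state) in sorted(latest.items())
    ]
```

Keeping the last state per window (rather than the first) is a design choice; either way, resampling can only make a dataset coarser, so a finer granularity than the one recorded still requires re-running a replay analyzer.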

ID | Link | Size | Sources | Notes
[D1] | Download | 1.7 MB | [R1] | ARFF files to use with Weka. Script used to label each player's actions with a build order (early-game strategy), i.e. for supervised learning.
[D2] | Download | 870 MB | [R2][A5] | For each replay it provides 3 plain text files: RGD (Replay Game Data), RLD (Replay Location Data) and ROD (Replay Order Data).
[D3] | Download | 2.1 GB | [R3][A6] | For each replay it provides 3 plain text files: RGD (Replay Game Data), RLD (Replay Location Data) and ROD (Replay Order Data). Warning: not up to date with the latest version of [A6].
[D4] | Download | 19.6 GB | [R3][A7] | Provides SQL files to populate a relational database. Records state changes (all unit attributes) every 24 frames. Uses a subset of [R3], excluding some replays that were found to contain errors.
[D5] | Download | 63.3 MB | [R5] | Contains the opening build choices of the Protoss player in the first 7 minutes of the game, as well as information about what the Terran player could see during this time. The objective is to predict the Protoss player's choices using the information available to the Terran player.
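[D1] is distributed as ARFF files for Weka. Outside Weka, a minimal reader for dense ARFF files could look like the sketch below. It assumes no sparse rows, no quoted values containing commas or `%`, and attribute names without spaces; the toy input in the usage example is invented and not taken from [D1]. For full ARFF support, Weka itself or `scipy.io.arff.loadarff` are safer choices.

```python
def read_simple_arff(text: str) -> tuple[str, list[str], list[list[str]]]:
    """Parse a dense ARFF document into (relation, attribute names, raw string rows).

    Simplifying assumptions: no sparse rows, no quoted values containing commas
    or '%', and attribute names without spaces.
    """
    relation: str = ""
    attributes: list[str] = []
    rows: list[list[str]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lowered = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lowered.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lowered.startswith("@attribute"):
            attributes.append(line.split()[1])
        elif lowered.startswith("@data"):
            in_data = True
    return relation, attributes, rows


if __name__ == "__main__":
    toy = "@relation toy\n@attribute time numeric\n@attribute strategy {Expand,Rush}\n@data\n120,Rush\n"
    print(read_simple_arff(toy))
```

Values are returned as raw strings; converting numeric attributes is left to the caller, since the attribute types would have to be parsed as well.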

Papers

Some research publications that used a dataset (or a replay analyzer) from the previous sections.