Match Data Formats
Cricsheet provides match data in several formats. The current formats are JSON, YAML, XML, and two versions of CSV. The data for the JSON, YAML, and CSV formats can be found on the downloads page, bundled together in zip files, whereas the XML data files can be downloaded individually via the cricsheet-xml Sourcehut repository.
If you’re planning to work with the data then you should be aware of the benefits and drawbacks of each format. This page gives a summary of each one, and each summary links to the full details of that format.
The “main” data provided by Cricsheet is in JSON format. It is the most complete of all of the formats provided, and should be easier to work with than YAML, the previous default format. This format is the official Cricsheet format, and is the most likely to receive updates as extra data becomes available. The other formats are either deprecated (YAML) or experimental (XML and CSV).
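Since the JSON data is distributed in zip files, the matches can be read straight from the archive without unpacking it first. The following is only a minimal sketch: the function name iter_json_matches, the file name 1001.json, and the sample content are made up for illustration, and the tiny in-memory archive stands in for a real download.

```python
import io
import json
import zipfile


def iter_json_matches(zip_source):
    """Yield (file name, parsed data) for every .json member of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith(".json"):
                with archive.open(name) as handle:
                    yield name, json.load(handle)


# Build a tiny in-memory archive so the sketch runs on its own.
# The member names and contents are hypothetical, not real Cricsheet data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("1001.json", json.dumps({"info": {"venue": "Example Ground"}}))
    archive.writestr("README.txt", "not a match file")
buffer.seek(0)

matches = dict(iter_json_matches(buffer))
```

With a real download you would pass the path of the zip file instead of the in-memory buffer. Non-JSON members, such as a readme, are skipped.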
The original data provided by Cricsheet is in YAML format. A version of the YAML data has been provided ever since the project first started, but it has now been superseded by the JSON format. It may not be the easiest format to work with if you’re not familiar with YAML.
It’s worth noting that the YAML format will eventually disappear now that the JSON format is in place. I will give at least 6 months’ notice of that happening, and will continue to add new match data in the format until that time. No new fields are likely to be added to this format, though, so it will stay limited to the fields it has now.
The third format provided is XML. It was requested by one of the users of the data, who found it easier to use than the YAML version. It is still regarded as experimental, however, and may be discontinued in the future (although a long warning period will be provided should that ever happen).
If you’re the type of person who is really into XML you might find a use for the schema file, which can be used to validate the XML and will also give you a much more detailed breakdown of the format.
If you’re someone who would prefer a format that is easier to work with, and that will open directly in Excel (or your equivalent), then you probably want one of the CSV formats. The more recent one (“Ashwin”) was suggested by a user, and the other (“Original”) is loosely based on the format that Retrosheet uses for baseball, with some suitable hacks applied. I would definitely recommend the “Ashwin” format if you’re new to this. It doesn’t have all of the data provided in the other formats, but it is probably the most straightforward one to use if you’re looking to do something with Cricsheet data.
This CSV format takes its nickname from Ashwin Raman, who kindly suggested the initial format, which I then tweaked in a few minor ways. The format consists of two files. The first is a ball-by-ball file with a single header row naming each column, followed by one row per delivery, with information on the match itself duplicated on each delivery row. The second file contains information on the match itself, such as the players involved, the venue etc.
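Because the ball-by-ball file has a header row followed by one row per delivery, it can be read with any CSV reader that maps column names to values. The sketch below uses Python's csv.DictReader on a made-up sample. The column names (match_id, innings, ball, striker, runs_off_bat, extras) and the function name runs_per_innings are assumptions for illustration, so check them against the header of a real file before relying on them.

```python
import csv
import io

# Hypothetical sample data. The column names are assumptions for
# illustration and may not match the header of a real file.
SAMPLE = """match_id,innings,ball,striker,runs_off_bat,extras
1001,1,0.1,A Batter,4,0
1001,1,0.2,A Batter,0,1
1001,1,0.3,B Batter,1,0
"""


def runs_per_innings(handle):
    """Sum the runs on every delivery row, keyed by (match_id, innings)."""
    totals = {}
    for row in csv.DictReader(handle):
        key = (row["match_id"], row["innings"])
        runs = int(row["runs_off_bat"]) + int(row["extras"])
        totals[key] = totals.get(key, 0) + runs
    return totals


totals = runs_per_innings(io.StringIO(SAMPLE))
```

Since the match information is repeated on each delivery row, rows from several matches can be concatenated and still grouped correctly by the match column.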
The “original” CSV format was the first version of CSV data provided, hence the “original” nickname. Each file has a version row, multiple info rows, and multiple ball rows. The info rows roughly correspond to the same data in the YAML format, and the ball rows match the deliveries. This format can be opened directly in Excel (or the like), but still requires some work to be properly useful.
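Part of that work is separating the three kinds of row, since they share one file and have different numbers of fields. The sketch below assumes that the first field of each row names its type (version, info or ball) and that an info row holds a key followed by its values. Both are assumptions, and the function name split_rows, the sample rows and the field layout are invented for illustration.

```python
import csv
import io

# Hypothetical sample. It assumes the first field of each row names the
# row type; the remaining fields are invented for illustration.
SAMPLE = """version,1.0.0
info,team,Team A
info,team,Team B
info,venue,Example Ground
ball,1,0.1,Team A,A Batter,B Batter,C Bowler,4,0
ball,1,0.2,Team A,A Batter,B Batter,C Bowler,0,1
"""


def split_rows(handle):
    """Separate a file into its version, its info rows and its ball rows."""
    version = None
    info = {}   # info key -> list of value lists, as keys may repeat
    balls = []  # one list of fields per delivery
    for row in csv.reader(handle):
        if not row:
            continue
        kind, rest = row[0], row[1:]
        if kind == "version":
            version = rest[0] if rest else None
        elif kind == "info" and rest:
            info.setdefault(rest[0], []).append(rest[1:])
        elif kind == "ball":
            balls.append(rest)
    return version, info, balls


version, info, balls = split_rows(io.StringIO(SAMPLE))
```

Keeping the info values in lists means repeated keys, such as the two team rows in the sample, are not overwritten.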