Here’s a snapshot of some of our recent work on D&D data from the show ‘Critical Role’.

Interactive web application - the Critical Role Statistical Visualiser

Orbisant’s ongoing analysis of data from Critical Role has now been formalised into an interactive web application! Please check it out here. New analysis and visualisations are being added regularly.

Bayesian modelling accounts for both uncertainty and the addition of new data each episode

Critical Role releases new episodes each Thursday. On top of the usual assumptions and data challenges of statistical modelling, this additive process creates another major one: how can we incorporate new data into our model(s) as it becomes available?

As avid D&D and Critical Role fans, Orbisant took a Bayesian approach to building a model that tests whether characters “specialise” as a campaign goes on. Put another way, do characters focus more on either damage or healing as they level up and acquire increasingly niche abilities? Data from the fantastic CritRoleStats was used.

Traditional “frequentist” statistics (i.e. null hypothesis significance testing - regression, analysis of variance etc.) are typically too inflexible for this additive data process. This is because in frequentist statistics we do not impose prior knowledge on models - we model the data that is available and assume that, if we repeated the same experiment/process many times over with data from the same population, 95% of the intervals we computed would contain the true parameter value (such an interval is known as a confidence interval - the level doesn’t have to be 95%). The approach also assumes the parameter we are estimating is a fixed but unknown quantity. This means a confidence interval is not a probability distribution over the parameter - for any single interval, the parameter value is simply either inside it or outside it.
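The repeated-experiment reading of a confidence interval can be checked with a small simulation. The sketch below is illustrative only: the population mean, standard deviation, sample size and seed are made-up values, and it assumes a known standard deviation so that a simple z-interval applies. It has nothing to do with the Critical Role data itself.

```python
import random
from statistics import NormalDist, mean

def coverage(true_mean=10.0, sigma=3.0, n=30, experiments=2000, level=0.95, seed=7):
    """Share of simulated experiments whose interval contains the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. roughly 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        # The true mean is fixed; it is the interval that moves between experiments.
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / experiments

rate = coverage()
print(f"Share of intervals containing the true mean: {rate:.3f}")
```

The printed share should sit close to 0.95. Each individual interval either contains the true mean or it does not, which is why the 95% describes the procedure rather than any one interval.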

Conversely, Bayesian statistics provides solutions to both of these issues:

Adding new data - Bayesian statistics, by its mathematical definition, combines prior information (expressed as a distribution) with the data that is available to produce a posterior distribution. This means the parameter we are estimating is treated as a random variable with a distribution. In the case of Critical Role, since data is added weekly, we can update the model by using the posterior from all the previous weeks’ data as the prior for the new episode.
Probability distributions - Bayesian statistics relies on distributions, from the prior to the likelihood, to the marginal likelihood, to the posterior. In contrast to frequentist confidence intervals, Bayesian analysis produces credible intervals, which are ranges taken directly from the posterior distribution of the population parameter. This means we can make statements such as “given the data and the model, the true population parameter value lies between X and Y with 95% probability.”
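Both points can be seen in a much simpler setting than the regression model used for this analysis. The sketch below is a stand-in, not Orbisant's model: it assumes a Beta prior on the chance of rolling a Natural 20 and updates it with hypothetical per-episode counts. The counts, the flat Beta(1, 1) prior and the function names are all assumptions made for illustration.

```python
import random

def update_beta(prior, successes, failures):
    """Return the Beta posterior (alpha, beta) after observing new counts."""
    alpha, beta = prior
    return (alpha + successes, beta + failures)

def credible_interval(posterior, level=0.95, draws=20_000, seed=1):
    """Approximate an equal-tailed credible interval by sampling the posterior."""
    rng = random.Random(seed)
    alpha, beta = posterior
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper

# Hypothetical per-episode counts: (Natural 20s, all other d20 rolls).
episodes = [(2, 38), (1, 44), (4, 51)]

posterior = (1, 1)  # flat Beta(1, 1) prior before any episode is seen
for nat20s, others in episodes:
    # Last week's posterior becomes this week's prior.
    posterior = update_beta(posterior, nat20s, others)

# Fitting all episodes in one batch gives the same posterior.
batch = update_beta((1, 1),
                    sum(e[0] for e in episodes),
                    sum(e[1] for e in episodes))

low, high = credible_interval(posterior)
print(posterior, batch)
print(f"95% credible interval for the Natural 20 rate: {low:.3f} to {high:.3f}")
```

Updating week by week and fitting everything at once give the same posterior here because the Beta prior is conjugate to the binomial likelihood. A regression model generally needs sampling or approximation to carry a posterior forward as the next prior.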

The graph at the top shows an animation of a sample of draws from the Bayesian posterior distribution. Looking at the general trends of these draws for both low levels (proficiency bonus of up to 3) and higher levels (proficiency bonus of 4-6), the research question of character specialisation can be addressed. A difference in trend between the two level groups is referred to as an interaction effect (specifically, the interaction of damage and proficiency bonus on healing given).

Evidently, the slope of the relationship is more negative at higher levels, such that high-level characters who deal more damage in an episode tend to give less healing. In contrast, at lower levels the relationship is quite flat, suggesting that there isn’t as much specialisation into damage or party healing as at higher levels. This finding fits D&D and role-playing game intuition, because characters acquire new abilities when they level up, and this process by its very nature narrows a character’s focus in order to improve the party’s overall chance of surviving/succeeding. This model will be continuously updated as more episodes are released.
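To make the idea of an interaction concrete, the sketch below compares the damage-to-healing slope in two level groups using ordinary least squares. It is a deliberately simplified, non-Bayesian stand-in for the model described above, and the numbers are invented toy values rather than Critical Role data.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Toy per-episode values (made up): damage dealt and healing given.
low_damage, low_healing = [0, 10, 20, 30], [12, 12, 11, 11]
high_damage, high_healing = [0, 20, 40, 60], [40, 30, 20, 10]

low_slope = slope(low_damage, low_healing)     # nearly flat
high_slope = slope(high_damage, high_healing)  # clearly negative

# The interaction is how much the slope changes between the two groups.
interaction = high_slope - low_slope
print(f"low: {low_slope:.2f}, high: {high_slope:.2f}, interaction: {interaction:.2f}")
```

In a single regression, the same quantity would appear as the coefficient on a damage-by-level-group product term.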

Code for this is available here.

Statistical modelling and advanced visualisations reveal interesting character and episode-level findings for the Mighty Nein

Critical Role is a fantastic show where a group of professional voice actors (and friends) get together and play Dungeons & Dragons (D&D). Not only are the cast and overall show incredible, but so is the community. So much so that a group of fans started Critical Role Stats - a website dedicated to tracking almost anything that happens in the show that can be quantified. Things that are tracked include:

Every roll (value, "Natural" value)
Roll type (e.g. Constitution, Dexterity, Strength Check etc.)
Number of times the Dungeon Master (DM) facepalms in each episode
Number of character kisses
Character damage dealt
Character healing given

This is by no means an exhaustive list, but it highlights the richness of the data that a group of fans has made available through its own efforts. Orbisant set out to build sophisticated and interesting data visualisations from some of the spreadsheets made available by Critical Role Stats, as well as to determine whether any informative statistical models could be built on the data.

The collection of graphs at the top of this post shows the first cut of analysis that was made. The first of these shows a heatmap of roll values by character, and the proportion of each character's total rolls that each value accounted for. As expected, the colour gradient of the tiles across all characters goes (from left to right) from green to pink to green, which resembles a Gaussian curve, or normal distribution. The graph at the bottom left shows an aggregated form of this, with a probability density function showing a significant concentration of rolls in the normal values, and marginally more Natural 20s than Natural 1s. The graph at the bottom right shows a density time series of Natural 20 rolls by character. Here, the unfortunate exit of the character Molly from the show at Episode 26 is visually evident, as is Beauregard's seemingly strong Natural 20 rolling performance.

After producing these initial visualisations, Orbisant set about applying more complex data visualisations and statistical models. The collection of graphs below shows these. The first graph visualises the outputs of a multinomial logistic regression model - a statistical model which estimates the probability of an observation falling into each category of a categorical outcome (where the outcome has more than 2 categories). In this case, Orbisant wanted to determine whether a given character was significantly more likely to roll a Natural 1 or a Natural 20, relative to a baseline of all other "normal" rolls and one reference character (Beauregard; this was a random choice - the model needs one character as the reference level absorbed into the intercept). As shown by the asterisks in the plot, Caduceus is significantly less likely than Beau to roll a Natural 1 rather than a normal value, while Veth and Jester are significantly less likely to roll a Natural 20. Since dice rolls are governed by chance, this is an interesting finding.
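For readers unfamiliar with how a baseline-category (multinomial logit) model turns coefficients into probabilities, here is a minimal sketch. The intercepts and the character effect are hypothetical numbers chosen for illustration, not estimates from the fitted model, and the function name is an assumption.

```python
import math

def outcome_probabilities(log_odds):
    """Convert log-odds versus the baseline ("normal") into probabilities."""
    denom = 1.0 + sum(math.exp(v) for v in log_odds.values())
    probs = {"normal": 1.0 / denom}  # the baseline category has log-odds 0
    for outcome, value in log_odds.items():
        probs[outcome] = math.exp(value) / denom
    return probs

# Hypothetical intercepts for the reference character: 5% Natural 1,
# 5% Natural 20 and 90% normal rolls, written as log-odds versus normal.
intercepts = {
    "natural_1": math.log(0.05 / 0.90),
    "natural_20": math.log(0.05 / 0.90),
}

# Made-up coefficient for another character: lower odds of a Natural 1.
character_effect = {"natural_1": -0.4, "natural_20": 0.0}

reference = outcome_probabilities(intercepts)
other = outcome_probabilities(
    {k: intercepts[k] + character_effect[k] for k in intercepts}
)
print(reference)
print(other)
```

A negative coefficient lowers the odds of that outcome relative to a normal roll, compared with the reference character. That comparison is what a significance marker on such a coefficient refers to.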

The bottom left graph in this collection of plots shows the distribution of normal roll values (i.e. not a Natural roll) for each character. As expected with many dice rolls across one hundred episodes, these distributions approximate normal distributions. Interestingly, Veth appears to have a longer right tail than other characters, signifying the occurrence of a few very high-value rolls.

The bottom right graph plots damage dealt and healing given by character per episode. Importantly, for visual clarity, cases where a character dealt 0 damage and gave 0 healing were removed before summation occurred at the episode level. As expected, there is a large concentration of very low value incidents, which is indicative of characters starting at a low level in D&D and also of other factors, such as some episodes not featuring as much combat as others. However, there are a few outliers where the total damage dealt or healing given by a character in an episode was very large. Also notably, the cleric character Caduceus is responsible for most of the healing given. A generalised additive model (GAM; a statistical regression model that uses smoothing functions) was added to show the trend in the data.
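The filtering and episode-level summation described above can be sketched as follows. The row layout, the values and the function name are assumptions made for illustration; the actual spreadsheets from Critical Role Stats are structured differently.

```python
from collections import defaultdict

# Hypothetical rows: (episode, character, damage_dealt, healing_given).
rows = [
    (50, "Caduceus", 0, 12),
    (50, "Caduceus", 5, 0),
    (50, "Beau", 14, 0),
    (50, "Jester", 0, 0),   # dropped: no damage and no healing
    (51, "Beau", 0, 0),     # dropped
    (51, "Beau", 9, 0),
    (51, "Caduceus", 0, 20),
]

def episode_totals(records):
    """Drop all-zero rows, then sum damage and healing per episode and character."""
    totals = defaultdict(lambda: [0, 0])
    for episode, character, damage, healing in records:
        if damage == 0 and healing == 0:
            continue
        totals[(episode, character)][0] += damage
        totals[(episode, character)][1] += healing
    return {key: tuple(value) for key, value in totals.items()}

totals = episode_totals(rows)
for key, value in sorted(totals.items()):
    print(key, value)
```

Dropping the all-zero rows does not change any sum. It only removes episode and character pairs that would otherwise sit at the origin of the plot.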

This analysis has highlighted that quite sophisticated techniques in visualisation and statistics can be applied to data from games and shows. However, without the mountainous efforts of the community to collect and index the data in such a clean format, this analysis would have been exceedingly time consuming. Hopefully more shows with readily-quantifiable aspects will gain a similarly supportive community that produces such useful resources. This analysis will be updated as Critical Role continues, and there appears to be scope to transition it into an automated web-based analytics tool.

Code for this analysis is available on GitHub.