Here’s a snapshot of some of our recent work in health.

Analysis update (as of 07/01/2021): COVID-19 restrictions have resulted in a significant drop in Australia’s influenza notification rate from April to December 2020

Orbisant’s last analysis (Part II) of the impact of COVID-19 on the influenza rate in Australia was released in early November 2020. With 2020 having recently come to a close and the latest data for November 2020 and December 2020 being released by the Department of Health, Orbisant decided to conduct and release what might be the final instalment in this analysis series.

Rather than using the previous CausalImpact-based model, this time Orbisant set out to understand the difference in the distribution of influenza rates (despite sample size discrepancies) and to visualise the remainder of 2020 against what might otherwise have occurred.

The distributions of values and their densities, before and during COVID-19 impacts, are presented below.

Keeping in mind the sample size and temporal differences (e.g. factors unique to a given year) that likely exist, it remains highly likely that the influenza rate before COVID-19 impacts follows a statistically different distribution from the rate from April 2020 onwards.
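The post does not say which statistical test sits behind this comparison. As one possible way to check it, the sketch below computes a two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distributions) and a permutation p-value. The numbers are hypothetical log rates made up for illustration, not the Department of Health data, and the permutation step ignores the autocorrelation that real monthly data would have.

```python
import random


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of two samples."""
    gap = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def permutation_p_value(a, b, n_perm=2000, seed=1):
    """Share of random relabellings whose gap is at least as large as observed."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical log influenza rates (illustrative only)
pre_covid = [3.1, 3.4, 2.9, 3.8, 4.2, 3.6, 3.0, 3.3, 3.9, 4.0, 3.5, 3.2]
during_covid = [0.4, -0.1, -0.6, 0.1, -0.3, 0.2, -0.5, 0.0, 0.3]

stat, p_value = permutation_p_value(pre_covid, during_covid)
print(f"KS statistic: {stat:.2f}, permutation p-value: {p_value:.4f}")
```

A statistic close to 1 means the two samples barely overlap, which is the pattern the density plot suggests.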

Taking the analysis further, Orbisant wanted to graph the real historical and real 2020 data against what could have likely occurred in the absence of COVID-19. The CausalImpact package is not immediately flexible enough to accommodate the desired plot, so a more straightforward generalised additive model (GAM) was used. Two terms were used as predictors in the model:

Between-year variance (i.e. trend)
Within-year variance (i.e. monthly seasonality)

The trend term was fitted using a standard smooth spline with a knot per year. The seasonality term was fitted using a cyclic cubic spline with a knot per month. A cyclic cubic spline is especially useful for seasonal terms because it forces the value at the start of the cycle (i.e. January) and the value at the end (i.e. December) to join up across years, forming a continuous pattern. Other splines, including the default spline in the mgcv R package used to build the model, do not impose this constraint and would typically produce discontinuous patterns where months across years do not connect, which is inappropriate for seasonality. These two predictor terms are by no means exhaustive, and the model would likely benefit from the inclusion of others.
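To illustrate the wrap-around property described above, here is a minimal Python sketch. It is not mgcv's cyclic cubic spline: it swaps in a single sine/cosine pair as the periodic term and a plain quadratic as the unconstrained term, and all coefficients are hypothetical. The only point is that a periodic term returns the same value at the start of January and the end of December, while an unconstrained curve generally does not.

```python
import math

PERIOD = 12.0  # months in one seasonal cycle


def cyclic_season(month, amp_sin=-0.8, amp_cos=-0.5):
    """Periodic seasonal term (hypothetical coefficients); repeats every 12 months."""
    angle = 2.0 * math.pi * month / PERIOD
    return amp_sin * math.sin(angle) + amp_cos * math.cos(angle)


def unconstrained_season(month, b1=0.9, b2=-0.05):
    """Non-periodic quadratic in month; nothing ties its two ends together."""
    return b1 * month + b2 * month ** 2


# Compare each curve at the start of January (0.5) and the end of December (12.5)
cyclic_gap = abs(cyclic_season(12.5) - cyclic_season(0.5))
free_gap = abs(unconstrained_season(12.5) - unconstrained_season(0.5))

print(f"cyclic term gap across the year boundary: {cyclic_gap:.6f}")
print(f"unconstrained term gap across the year boundary: {free_gap:.6f}")
```

The cyclic term closes the loop (a gap of zero up to floating-point error), whereas the unconstrained term leaves a jump between December and the following January.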

Other common forecasting models such as ARIMA or AR-enabled GAMs may have been more appropriate for this task. However, given that the data was non-stationary, substantial differencing would have been needed before using ARIMA. Since the parsimonious GAM produced results that were aligned with expectations and the exact forecast values themselves were not of paramount importance, no model comparisons were performed. Future analysis may wish to adopt a comparative modelling approach.
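For readers unfamiliar with differencing, the sketch below shows the idea on a hypothetical monthly series (not the influenza data) built from a steady trend plus a pattern that repeats every 12 months. A lag-12 difference strips out the repeating seasonal pattern, and a further lag-1 difference removes what is left of the trend.

```python
def difference(series, lag=1):
    """Subtract from each value the value `lag` steps earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical data: upward trend of 0.25 per month plus a 12-month seasonal shape
seasonal_pattern = [1, 1, 2, 4, 7, 10, 12, 11, 7, 4, 2, 1]
series = [0.25 * t + seasonal_pattern[t % 12] for t in range(36)]

# Seasonal differencing: each month minus the same month one year earlier
seasonal_diff = difference(series, lag=12)

# First differencing of the result removes the remaining constant yearly increment
second_diff = difference(seasonal_diff, lag=1)

print(seasonal_diff[:3])  # each value is the yearly trend increment
print(second_diff[:3])    # nothing left once the trend is differenced away
```

Each round of differencing shortens the series, which matters when the post-intervention sample is already small.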

The output of the preliminary forecast comparison model is presented below.

Strikingly, the sharp decline in the influenza notification rate is visible from April 2020 onward. The rate in 2020 up to and including March would be regarded as within the bounds of expectation, but April onward is almost certainly anomalous relative to the historical data. Interestingly, a spike in the flu rate occurs in November-December 2020. While the drivers of this are not fully known and could be numerous, it is likely that some of the spike is explained by the loosening of lockdown rules and interstate travel restrictions after COVID-19 cases began to decrease. The spike itself is still far below any expected rate for November and December 2020.

This post summarises what has been an informative analytical journey in understanding a lesser-studied impact of COVID-19. Future analysis might seek to observe the longevity of the decrease, particularly as vaccine distribution may be more widespread by the time the next “flu season” comes around in Winter 2021. Future analysis may also seek to quantify the medical and economic impact of the decrease in the influenza rate.

Code for this analysis can be found on GitHub.

Analysis update (as of 03/11/2020): COVID-19 restrictions have most likely caused the influenza rate to drop significantly, by between 81% and 131%

Orbisant recently conducted the first of multiple addendum updates to the initial causal impact modelling of influenza rates. In this addendum, Orbisant dropped multiple variables that, upon further exploration, did not contribute meaningfully to the analysis and did not correlate with influenza seasonal patterns.

The updated model instead simplified the covariates to include only Google search interest trends for the flu vaccine and the mean Australian temperature. In the period between this analysis and the initial modelling, new data up to August had been released. This means the post-intervention period (i.e. post-April 2020, when the COVID-19 restrictions came into effect) has a larger sample against which to compare the synthetic counterfactual, resulting in a more accurate model and a better visual representation (see the graph above).

A significant drop in the influenza rate of between 81% and 131% (90% confidence interval) has occurred in the period since April 2020. The mean intervention drop is 106%. In the period post-COVID-19 restrictions, the mean log influenza rate was -0.21; in the absence of COVID-19 impacts, this mean would be expected to be 3.37 (as calculated by the synthetic counterfactual). These percentages are relative to the counterfactual on the log scale, which is why the drop can exceed 100%.
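The arithmetic behind these figures can be sketched as follows. The two means are taken from the paragraph above; the calculation is one reading that reproduces the reported 106%, not necessarily the exact internal computation. The back-transformation additionally assumes the rate was transformed with the natural log, in which case exponentiating the difference gives a ratio of geometric means on the original scale.

```python
import math

# Values reported in the analysis (log-transformed influenza rate)
observed_mean_log = -0.21
counterfactual_mean_log = 3.37

# Absolute effect on the log scale
absolute_effect = observed_mean_log - counterfactual_mean_log

# Relative effect on the log scale; can exceed 100% because log values cross zero
relative_effect_log = absolute_effect / counterfactual_mean_log

# Assumption: natural log. exp(difference of mean logs) = ratio of geometric means
ratio_original_scale = math.exp(absolute_effect)
drop_original_scale = 1.0 - ratio_original_scale

print(f"relative effect on the log scale: {relative_effect_log:.1%}")
print(f"approximate drop on the original scale: {drop_original_scale:.1%}")
```

Under that assumption, a drop of more than 100% on the log scale corresponds to a drop of less than 100% in the rate itself, since a rate cannot fall below zero.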

This analysis will continue to be updated as each month’s data is released. Code for the analysis can be found on GitHub.

COVID-19 restrictions have caused a drop in influenza rates of 59%-138% (with 90% probability)

It goes without saying that COVID-19 has significantly impacted the world. One interesting impact is its potential effect on rates of other communicable diseases. Indeed, given the timing of COVID-19 restrictions and lockdown procedures at the end of March (21 March specifically), one might expect there to be some impact on the current 'flu season' in the Australian Winter months of June-August. After the initial suggestion by a day-job colleague of Trent's (Adele Tyson), Orbisant set out to understand the impact of COVID-19 restrictions and societal/behavioural changes on Australia's influenza rates.

The graph at the top will be explored near the end of this post. To build up to that model, it's useful to understand the underlying data. As with almost all time-series analysis, the first step is to visually inspect the data. The graph below shows two ways of doing this:

Visualising the influenza rate as a univariate time series (top left) and visualising a log-transformed version of the rate as a univariate time series (top right) - The raw data appears somewhat erratic, with some particularly concerning outlier years. The log transformation makes the data much more interpretable: there are clear (expected) seasonal patterns and a slight upward trend until 2020.
Visualising the influenza rate by month and year (bottom) - This secondary visualisation confirms the findings from the other two graphs and reinforces the notion that 2020 to date is quite anomalous compared directly to the same months in previous years. Specifically, a rapid decline in influenza rates occurs immediately after the 21 March restrictions are imposed.

These initial visualisations served to both confirm initial expectations about the data as well as identify potential time-series characteristics that will have to be modelled statistically if forecasts are to be somewhat feasible. These characteristics (or ‘components’) are seasonality and trend.

Orbisant then considered a range of modelling options. More 'traditional' statistical models were built first, such as autoregressive integrated moving average (ARIMA) models. These yielded interesting forecasts for the remainder of 2020, but did not really answer the core question underpinning this piece of work - "Have changes to socialising, travel and hygiene changed the influenza rate?" This is clearly a question of causality, something that these traditional models are not designed to address.

Upon searching for interesting approaches to this type of question, Orbisant came across the 'CausalImpact' R package developed by Google. This package uses Bayesian state space models to simulate a synthetic counterfactual (i.e. "What would the influenza rate have looked like if some intervention didn't happen?"). The Bayesian approach has two key benefits for this analysis:

Bayesian statistics lets us specify and quantify our uncertainty
Bayesian statistics produces probabilistic estimates - meaning we can credibly say things like "the true population parameter lies in a range between X and Y with 90% probability"
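To make the second point concrete, the sketch below shows how interval and probability statements are typically read off a set of posterior draws. The draws here are simulated from a hypothetical normal distribution purely for illustration; they are not output from the CausalImpact model.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical posterior draws of a relative effect (illustrative only)
draws = [rng.gauss(-1.0, 0.2) for _ in range(10000)]

# n=20 yields cut points at 5%, 10%, ..., 95%; the first and last
# bound the central 90% of the draws (a 90% credible interval)
cut_points = statistics.quantiles(draws, n=20)
lower, upper = cut_points[0], cut_points[-1]

# Posterior probability that the effect is a decrease
prob_decrease = sum(d < 0 for d in draws) / len(draws)

print(f"90% interval: {lower:.2f} to {upper:.2f}")
print(f"probability of a decrease: {prob_decrease:.3f}")
```

Both quantities come straight from counting draws, which is what allows a statement such as "the effect lies between X and Y with 90% probability".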

Another major strength of this approach is that state space models can flexibly handle multiple covariate time series, meaning more accurate counterfactuals can be produced as more time series are added. This is especially the case when you add time series whose pattern you would expect to correlate with the data of interest but that might not be directly impacted by the COVID-19 restrictions - such as temperature (which drops in Winter and rises outside of it; this is the direct inverse of the flu trend). Further, in the Bayesian framework, you can specify your uncertainty about which of these time series are most useful for the counterfactual. This means the counterfactual is built probabilistically, rather than from a fixed set of covariates chosen without rigorous justification.

Orbisant compiled a collection of potentially useful time series for use in this model. The time series included:

Temperature for Brisbane, Sydney, Melbourne, and Perth
Sales turnover for six different industries (food, restaurants, department stores, clothing, household items, other retail)
Google trends data for searches related to 'flu vaccine'
Domestic flight hours
Number of employed persons
Hepatitis B rate
Hepatitis C rate

The graph at the top of this post shows the output of this model. The top plot shows that the synthetic counterfactual follows the actual data quite well in the pre-COVID-restriction time periods. Further, over this same timeframe, the second plot shows a mean causal impact estimate of approximately 0 (i.e. no effect). This adds validity to our counterfactual’s ability to model the underlying trends. In the post-COVID-restriction period, the actual data drops far below the synthetic counterfactual. This strongly suggests that there has been a direct impact of some kind on influenza rates in Australia from April onwards. Indeed, the Bayesian posterior probability of there being a causal impact is 99%. The magnitude of this impact is such that influenza rates have likely dropped between 59% and 138% with 90% probability. This model will be updated each month as new data becomes available.

Last, we can explore which of the time series had the highest probability of being included (see graph below). Evidently the number of employed persons is by far the most commonly used data, with food turnover and Melbourne temperatures being second and third, and marginally higher than the other time series. The difference in inclusion probabilities between the temperature data of the included capital cities intuitively makes sense, as Melbourne's temperature swings (i.e. colder in Winter and back to hot in Summer) would be more severe than the other cities, meaning it more directly follows the inverse of influenza rate, thus making it potentially more informative.
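As a rough illustration of what an inclusion probability means, the sketch below counts how often each covariate has a non-zero coefficient across a handful of model draws. The draws, covariate names and coefficient values are all hypothetical; the real model produces thousands of draws, and this is a simplified view of variable selection rather than the package's own code.

```python
def inclusion_probabilities(coefficient_draws):
    """Share of draws in which each covariate has a non-zero coefficient."""
    names = coefficient_draws[0].keys()
    n_draws = len(coefficient_draws)
    return {
        name: sum(draw[name] != 0.0 for draw in coefficient_draws) / n_draws
        for name in names
    }


# Hypothetical draws: a coefficient of 0.0 means the covariate was left out of that draw
draws = [
    {"employed_persons": 0.81, "food_turnover": 0.00, "melbourne_temp": -0.30},
    {"employed_persons": 0.77, "food_turnover": 0.12, "melbourne_temp": 0.00},
    {"employed_persons": 0.00, "food_turnover": 0.00, "melbourne_temp": -0.25},
    {"employed_persons": 0.84, "food_turnover": 0.09, "melbourne_temp": 0.00},
    {"employed_persons": 0.79, "food_turnover": 0.00, "melbourne_temp": 0.00},
]

probabilities = inclusion_probabilities(draws)
for name, prob in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(f"{name}: {prob:.0%}")
```

A covariate that appears in most draws is one the model repeatedly finds useful for reconstructing the pre-intervention series.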

Australian fertility rates show a distinct decline after the introduction of the pill

Births and deaths are important numbers for any society to keep track of. Operationalising the births side of this as fertility rates (i.e. rate of births per 1,000 women) can provide informative comparative analyses - especially when the data is available as a time series. Trent from Orbisant was speaking to a day-job work colleague at Nous Group with a background in bioinformatics, Adele Tyson, who came up with the idea to explore births data. With this direction and ongoing discussion with Adele, Trent set out to understand if any interesting patterns emerged from fertility rate data from 1921-2015 made available by the ABS.

As with any time-series analysis, the first step is to visualise the data. The graph above represents the fertility rate data over time by age group. On first glance, some interesting patterns stand out, and confirm some cohort and historical trends:

The Great Depression - substantial age-group independent downward trend in fertility rates from 1920s to mid-1930s
Baby Boomers - substantial upward trend in fertility rates for ages 20-30 from early 1940s to 1960
Older adults (age 45-49+) - almost no fluctuation in fertility rates across the time series compared to every other age group

Two further interesting patterns stand out in this graph:

Very rapid and substantial declines in fertility rates for ages 20-44 from 1961, especially for 20-35 year-olds
Substantial upward trend in fertility rates from 1980 for 30-40 year-olds (complete recovery for 30-34 year-olds from the 1960 decline)

Trent showed this preliminary graph to a non-work colleague, Talei Daly-Olm, who surmised that the steep decline in fertility rates from approximately 1960 might be due to the introduction of the contraceptive pill to Australia in February 1961. Looking at the graph, Adele proposed that the reversal of this trend for 30-34 year-olds (and even 35-39 year-olds) may be due to an increase in the availability and uptake of in vitro fertilisation (IVF). This should be explored further with the appropriate data. The pill introduction hypothesis was then explored in more detail.

The graph below shows the average fertility rate by year (aggregated over age group), with the pill introduction date added as a visual cue. Curves were fitted separately to the pre-pill (1921-1960) and post-pill (1961-2015) data using a local regression smoothing function, which gives non-linear estimates and their confidence intervals.
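The idea of fitting separate local regression curves either side of 1961 can be sketched as below. This assumes the smoother was a LOESS-style fit, which the post does not confirm; the sketch uses a simplified local linear version with tricube weights, omits confidence intervals, and runs on hypothetical straight-line data rather than the ABS series.

```python
def loess_point(xs, ys, x0, span=0.5):
    """Locally weighted linear fit evaluated at x0, using tricube weights."""
    k = max(3, int(round(span * len(xs))))
    # The distance to the k-th nearest point sets the width of the local window
    width = sorted(abs(x - x0) for x in xs)[k - 1]
    weights = []
    for x in xs:
        u = abs(x - x0) / width
        weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (x0 - mean_x)


# Hypothetical average fertility rates (illustrative straight lines only)
pre_years = list(range(1921, 1961))
pre_rates = [60 + 0.5 * (year - 1921) for year in pre_years]
post_years = list(range(1961, 2016))
post_rates = [80 - 0.4 * (year - 1961) for year in post_years]

# Fit each period separately so the curve is allowed to break at 1961
pre_fit = [loess_point(pre_years, pre_rates, year) for year in pre_years]
post_fit = [loess_point(post_years, post_rates, year) for year in post_years]

print(f"fitted 1950: {pre_fit[pre_years.index(1950)]:.1f}")
print(f"fitted 1980: {post_fit[post_years.index(1980)]:.1f}")
```

Fitting the two periods independently is the design choice that matters here: a single smoother across 1921-2015 would blur the change at the pill's introduction.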

This graph clearly shows the effects of The Great Depression and the introduction of the pill. Average fertility rates were rising in the two decades preceding the introduction of the pill, but reversed into a steep decline from 1961 until approximately 1980. After 1980, average fertility rates asymptote toward a rate of 50. While support for the hypothesis of the pill catalysing sharp reductions in fertility rates appears strong, it’s also important to consider the broader societal changes that were occurring in these years, and continue to this day. These changes include shifts in traditional gender roles and equality, with the number of women participating in the workforce increasing greatly over the latter half of the 1900s and into the 2000s. While not quantifiable in this analysis, it is likely the magnitude of the decline seen in the graph below is partially explained by these additional societal factors.

This analysis has highlighted many interesting trends in Australian fertility rates. It was acknowledged early in the analysis that some age groups began a sharp recovery in rates from 1980-2015. These patterns should be explored further, especially starting with the hypothesis that IVF treatment uptake increased. Follow-up analysis will aim to statistically test these differences in rates over time in the period after the pill was introduced.

Code for the analysis is available here.