Here’s a snapshot of some of our recent work on mental health.


Analysis of mental health-related Google search interest data reveals interesting temporal statistical properties

time-series.png

Orbisant has previously explored Australian mental health data, but did not make use of the large volume of information afforded by Google Trends. This analysis focuses on time-series analysis of searches related to mental health topics. Orbisant set out to understand the temporal patterns in mental health search data in Australia and investigate whether seasonal changes and other important characteristics could be statistically uncovered.

Specifically, Orbisant took a deep dive into the following shortlist of search terms:

  • anxiety

  • depression

  • mental health

  • psychologist

  • therapist

Google Trends provides time-series search data in the form of "interest scores" which Google defines as "search interest relative to the highest point on the chart for the given region and time. A value of 100 is the peak popularity for the term. A value of 50 means that the term is half as popular. A score of 0 means that there was not enough data for this term". As with all time-series analysis, the most useful first step is to visualise the data (see graph at the top of the post). In this graph, we can see the following interesting properties:
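To make the quoted definition concrete, here is a minimal sketch of rescaling a series so that its peak maps to 100. The function name `to_interest_scores` and the input numbers are hypothetical, and Google's own pipeline is not documented here; this only mirrors the wording of the definition above.

```python
def to_interest_scores(raw_counts):
    """Rescale values so the largest one maps to 100 (rounded to integers)."""
    peak = max(raw_counts)
    if peak == 0:
        # No search activity at all: every score is 0.
        return [0 for _ in raw_counts]
    return [round(100 * value / peak) for value in raw_counts]


# Hypothetical relative search volumes for one term (not real Google data).
raw = [120, 300, 600, 450, 0]
scores = to_interest_scores(raw)
print(scores)
```

Because every score is relative to the peak of its own chart, values are only directly comparable within the same region, time window and set of terms.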

  • Somewhat consistent oscillations for each search term (suggesting some form of seasonality or periodicity)

  • Decreasing trend for depression

  • Increasing trends for mental health, anxiety and psychologist

  • Slightly increasing trend for therapist (but much smaller than the other increasing search terms)

These trends can be visualised better as a dotplot with a moving average line (see graph below).

ma-plot.png
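The moving average line in a plot like this can be sketched as follows. The post does not state the window length or whether the window was centred, so the trailing window and the example values here are assumptions for illustration only.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averaged = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the observation that has just left the window.
            running_total -= values[i - window]
        averaged.append(running_total / window if i >= window - 1 else None)
    return averaged


# Hypothetical monthly interest scores (not real Google Trends data).
monthly_scores = [40, 42, 44, 46, 48, 50]
smoothed = moving_average(monthly_scores, window=3)
print(smoothed)
```

A longer window (for example 12 months) would smooth away most of the within-year oscillation and leave the long-run trend.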

While both of these graphs are a useful starting point, it is difficult to gain a solid understanding of the nature of the seasonality from them alone. The graph below presents another way to visualise the data, which shows that there are seasonal fluctuations unique to each search term. These patterns appear to be relatively consistent across years. There are two standout findings from this plot:

  1. Some search terms (mental health, anxiety, psychologist, therapist) saw high interest scores across most months of 2020. This is likely driven by the impacts of COVID-19 on population mental health.

  2. All search terms show sharp declines in interest scores over the end-of-year holiday period, starting in November. While it is tempting to surmise that mental health disorders are less prominent in the summer months, it is also plausible that people are too caught up in holiday activities to focus on seeking mental health support for themselves (through Google, at least).

seasonality.png
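A month-by-year view of this kind can be assembled by arranging the observations into one row per year and one column per calendar month. The sketch below assumes the data arrive as a mapping from `(year, month)` to interest score; the helper names and the numbers are hypothetical.

```python
from collections import defaultdict


def month_by_year_table(observations):
    """Map each year to a 12-slot list of scores (None where a month is missing)."""
    table = defaultdict(lambda: [None] * 12)
    for (year, month), score in observations.items():
        table[year][month - 1] = score
    return dict(table)


def monthly_means(table):
    """Average each calendar month across years, skipping missing values."""
    means = []
    for month_index in range(12):
        present = [row[month_index] for row in table.values()
                   if row[month_index] is not None]
        means.append(sum(present) / len(present) if present else None)
    return means


# Hypothetical scores for January and December of two years (not real data).
observations = {(2019, 1): 60, (2019, 12): 40, (2020, 1): 80, (2020, 12): 50}
table = month_by_year_table(observations)
means = monthly_means(table)
print(means[0], means[11])
```

Plotting one line per row of `table` against the month index gives the seasonal view, and the per-month means summarise the typical within-year shape.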

Another useful approach is to reduce each time series (i.e. each search term) down to a set of important statistics that can tell us informative things about its temporal structure. Put another way, we can reduce a nearly 16-year time series to a single number for each different temporal statistic. We can then visualise how similar each search term is on these various statistics and see if a useful structure emerges from this process.

This broad approach is termed "highly comparative time-series analysis" (hctsa) and current software approaches developed by Ben Fulcher and colleagues automate much of the process. The hctsa tool itself can produce over 7,000 time series statistics from across the academic literature (e.g. physics, econometrics, finance, statistics) from a single dataset. This enables researchers to leverage approaches developed in many different fields to see which are most appropriate and informative for their immediate context. However, 7,000+ different statistics is typically far too many features for most applied projects. To address this, the authors developed catch22, a 22-feature subset of the 7,000+. Through classification accuracy and mutual redundancy analysis, the authors found these 22 features to be a distinct and useful subset that can provide much of the informative power of the greater hctsa tool.
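To illustrate what reducing a whole series to a single number looks like, the sketch below computes one autocorrelation-based summary: the first lag at which the autocorrelation falls below 1/e. This is a simplified stand-in written from scratch in the spirit of correlation-based features, not the catch22 implementation, and the sine-wave inputs are synthetic.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum((values[i] - mean) * (values[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance


def first_crossing_below_1_over_e(values):
    """Smallest lag at which the autocorrelation drops below 1/e, or None."""
    threshold = 1 / math.e
    for lag in range(1, len(values)):
        if autocorrelation(values, lag) < threshold:
            return lag
    return None


# Synthetic monthly series: one repeats every 12 steps, the other every 24.
fast = [math.sin(2 * math.pi * t / 12) for t in range(120)]
slow = [math.sin(2 * math.pi * t / 24) for t in range(120)]
fast_feature = first_crossing_below_1_over_e(fast)
slow_feature = first_crossing_below_1_over_e(slow)
print(fast_feature, slow_feature)
```

The slower oscillation stays correlated with itself for longer, so it yields a larger value. Computing many such numbers per series gives the feature vectors that are compared across search terms.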

Orbisant applied catch22 to the Google Trends data and built a clustermap to visualise the results (see plot below). A clustermap aims to discover structure in data by plotting a heatmap and adding hierarchical dendrograms to each axis. We can see from the extracted statistical features that therapist and psychologist appear to cluster together (as expected), and mental health and depression cluster together, while anxiety is separate. This is an interesting finding. One hypothesis is that the symptoms of depression may be more salient to people, and it may also be the first thing that comes to mind when people think of mental health more generally. Anxiety, however, may be more difficult for a person to pick up on, which is reflected in its different temporal structure. Future analysis should seek to include more diagnosis-related terms to explore this.

On the feature side, we can see features from fundamentally different methods clustering together, such as the first sub-branch of the first major parent branch, which comprises 4 features: 2 from correlation methods (CO), 1 from power spectrum analysis (SP), and 1 from symbolic transformations (SB).

Features were scaled prior to plotting.
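The post does not say which scaling was used, so the sketch below assumes z-scoring each feature across the series. It then finds the closest pair of series in the scaled feature space, which is the first merge an agglomerative dendrogram would draw. The labels and feature values are placeholders, not results from this analysis.

```python
import math
from itertools import combinations


def z_score_columns(rows):
    """Scale each feature (column) to mean 0 and standard deviation 1."""
    scaled_columns = []
    for column in zip(*rows):
        mean = sum(column) / len(column)
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
        scaled_columns.append([(v - mean) / std if std > 0 else 0.0
                               for v in column])
    return [list(row) for row in zip(*scaled_columns)]


def closest_pair(labels, rows):
    """Return the two labels whose feature vectors are nearest (Euclidean)."""
    best = min(combinations(range(len(labels)), 2),
               key=lambda pair: math.dist(rows[pair[0]], rows[pair[1]]))
    return labels[best[0]], labels[best[1]]


# Placeholder feature matrix: one row per series, one column per feature.
labels = ["series_a", "series_b", "series_c"]
features = [[1.0, 200.0], [1.1, 210.0], [3.0, 900.0]]
scaled = z_score_columns(features)
pair = closest_pair(labels, scaled)
print(pair)
```

Without scaling, the second feature would dominate the distances simply because its values are numerically larger, which is why scaling before clustering matters.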


This first analysis of Australian Google Trends data related to mental health has revealed some interesting temporal properties for a range of search terms. Future analysis will seek to disaggregate the data by region (either State/Territory or city) and visualise it geospatially as a map to see if any other informative patterns emerge. This disaggregated approach may also benefit from multilevel modelling, a form of statistical analysis that can handle hierarchical (nested) relationships.

Code for this analysis can be found on GitHub.