Commit e01d794

update
1 parent da08473 commit e01d794

9 files changed: 30 additions and 12 deletions

content/en/blog/back-to-the-future-retrospectively-harmonizing-questionnaire-data.md

Lines changed: 9 additions & 1 deletion
@@ -5,13 +5,21 @@ date: 2022-11-10
 categories: ["psychology"]
 image: /images/blog/to-do-g9c7aee9ed_1920-1536x974.jpg
 aliases: "/blog/back-to-the-future-retrospectively-harmonising-questionnaire-data"
+
+
+
+aliases:
+- "/blog/back-to-the-future-retrospectively-harmonising-questionnaire-data"
+- "/back-to-the-future-retrospectively-harmonising-questionnaire-data/"
+url: "/data-harmonisation/back-to-the-future-retrospectively-harmonising-questionnaire-data/"
+
 ---

 Now more than ever, the international research community are keen to determine whether their findings replicate across different contexts. For instance, if a researcher discovers a potentially important association between two variables, they may wish to see whether this association is present in other populations (e.g. different countries, or different generations). In an ideal world, this would be achieved by conducting follow-up studies that are harmonised by design. In other words, the exact same methodologies and measures would be used in a new sample, in order to determine whether the findings can be replicated. Such direct replication is often challenging however, with research funders often preferring novel lines of inquiry.

 As an alternative to direct replication, researchers may choose to reach out to others in the field who either have access to, or are in the process of collecting, comparable data. Indeed many researchers, particularly those in the life and social sciences, routinely make use of large, ongoing studies that collect a variety of data for multiple purposes (e.g. [longitudinal](/item-harmonisation/harmony-a-free-ai-tool-to-merge-longitudinal-studies) population studies). In practice however, much of our research is designed and carried out in silos – with different research groups tackling similar research questions using widely different designs and measures. Even if a researcher is successful in identifying data that are similar to their original work, minor differences in the design or measures may limit the comparability. What are researchers to do in such situations?

-One increasingly popular option is retrospective harmonisation. This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of [education](/data-harmonisation-in-education) and [depression](/harmonisation-validation/promis-depression-subscale), and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonisation can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.
+One increasingly popular option is retrospective [harmonisation](data-harmonisation). This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of [education](/data-harmonisation-in-education) and [depression](/harmonisation-validation/promis-depression-subscale), and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonisation can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.

 However, not all constructs can be measured with such simple, categorical questions. Take the above outcome variable (depression) for instance. Depression is a complex, heterogeneous experience, characterized by a multitude of symptoms that can be experienced to various degrees and in different combinations. In large-scale surveys, depression is typically measured with standardized questionnaires – participants are asked to report on a range of symptoms, their responses are assigned numerical values, and these are summed to form a “total depression score” for each individual. Although this remains the most viable and plausible strategy for measuring something as complex as depression, there is no “gold standard” questionnaire that is universally adopted by researchers. Instead, there are well over 200 established depression scales. In a [recent review](https://www.closer.ac.uk/wp-content/uploads/210715-Harmonisation-measurement-properties-mental-health-measures-british-cohorts.pdf) (McElroy et al., 2020), we noted that the content of these questionnaires can differ markedly, e.g. different symptoms are assessed, or different response options are used.

content/en/blog/data-harmonisation-examples-business.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: "10 Data Harmonisation Examples That Move Businesses and Organisations Fo
 description: "Unlock Data Harmonisation with Harmony: Transform Your Research & Analysis. Explore Harmony for seamless data harmonisation. Dive into our guide on using this tool to enhance research, attract collaborations, and drive insights."
 date: 2024-02-27
 categories: ["data"]
-image: "/images/01- X Data harmonisation examples that move businessess and organizations forward.svg"
+image: "/images/01-data-harmonisation-examples-businesses.svg"
 aliases:
 - "/10-data-harmonisation-examples-that-move-businesses-and-organisations-forward/"
 url: "/data-harmonisation/10-data-harmonisation-examples-that-move-businesses-and-organisations-forward/"

content/en/blog/data-harmonisation.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -130,7 +130,7 @@ Data harmonisation is not a theoretical concept but a practical necessity across
130130
{{< youtube cEZppTBj1NI >}}
131131

132132

133-
Tools like Harmony, designed specifically for the retrospective harmonisation of questionnaire items, are invaluable in research and data analysis. They allow for the comparison and combination of data from different studies or time periods, which is crucial in fields like social sciences, healthcare, and market research.
133+
Tools like Harmony, designed specifically for the [retrospective harmonisation of questionnaire items](/data-harmonisation/back-to-the-future-retrospectively-harmonising-questionnaire-data/), are invaluable in research and data analysis. They allow for the comparison and combination of data from different studies or time periods, which is crucial in fields like social sciences, healthcare, and market research.
134134

135135
**Perspectives from EPAM and TIBCO**
136136
Companies like EPAM and TIBCO highlight the strategic importance of data harmonisation. They emphasize how it can provide a competitive edge by ensuring data consistency across an organization, improving decision-making, and streamlining operations.

content/en/blog/how-does-harmony-work.md

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,11 @@ description: When you input two questionnaires into Harmony, such as the GAD-7 a
44
date: 2022-11-03
55
categories: ["nlp"]
66
image: /images/blog/harmony-1.png
7-
aliases: "/blog/how-does-harmony-work"
7+
8+
aliases:
9+
- "/blog/how-does-harmony-work"
10+
- "/how-does-harmony-work"
11+
url: "/nlp-semantic-text-matching/how-does-harmony-work/"
812
---
913

1014
When you input two questionnaires into Harmony, such as the [GAD-7](https://en.wikipedia.org/wiki/Generalized_Anxiety_Disorder_7) and [Beck’s Anxiety Inventory](https://res.cloudinary.com/dpmykpsih/image/upload/great-plains-health-site-358/media/1087/anxiety.pdf), Harmony is able to match similar questions and assign a number to the match. (I have written another blog post on [how we measured Harmony’s performance in terms of AUC](https://harmonydata.ac.uk/measuring-the-performance-of-nlp-algorithms/)).
@@ -120,7 +124,7 @@ With an aim to make our research as accessible to the public as possible, we hav
120124

121125
## What’s next for Harmony?
122126

123-
### Likert scale [matching](https://harmonydata.ac.uk/semantic-text-matching-with-deep-learning-transformer-models)
127+
### Likert scale [matching](/nlp-semantic-text-matching/)
124128

125129
The questions often come with a set of options such as *definitely not, somewhat anxious*, and the like. These are often a form of [Likert scale](https://en.wikipedia.org/wiki/Likert_scale). We would like to apply the same logic to match the candidate answers in a question, and identify when questions have opposite polarity (*I often feel anxious* vs *I rarely feel anxious*).
126130

content/en/blog/how-far-can-we-go-with-harmony-testing-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ image: /images/blog/ccd.png
99
aliases:
1010
- "/blog/how-far-can-we-go-with-harmony-testing-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe"
1111
- "/how-far-can-we-go-with-harmony-testing-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe/"
12-
url: "/nlp-semantic-text-matching-with-deep-learning-transformer-models/harmony-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe/"
12+
url: "/nlp-semantic-text-matching/harmony-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe/"
1313
---
1414

1515
Many psychologists believe that mental illnesses can vary across cultures. In 1904, [Emil Kraepelin](https://en.wikipedia.org/wiki/Emil_Kraepelin) initiated the field of comparative psychiatry after studying mental health disorders in Java, writing that _“Die Eigenart eines Volkes wird auch in der Häufigkeit und klinischen Gestaltung seiner Geistesstörungen zum Ausdruck kommen,”_ meaning “The peculiarity of a people[ethnic group] will also be expressed in the frequency and clinical form of its mental disorders.”[1]
@@ -59,7 +59,7 @@ Although English is the best-resource language for [natural language processing]
5959

6060
Above: the text of the Shona symptom questionnaire for the detection of depression and anxiety.
6161

62-
A problem I encountered was that the [transformer model](/nlp-semantic-text-matching-with-deep-learning-transformer-models/) didn’t work for both Shona and English (it’s not multilingual, like Harmony’s default transformer model). I Google translated [GHQ-12](/ghq-12-vs-beck-anxiety-inventory) into Shona as a temporary measure.
62+
A problem I encountered was that the [transformer model](/nlp-semantic-text-matching/) didn’t work for both Shona and English (it’s not multilingual, like Harmony’s default transformer model). I Google translated [GHQ-12](/ghq-12-vs-beck-anxiety-inventory) into Shona as a temporary measure.
6363

6464
Also, the transformer model did not operate as a sentence transformer, but rather as a token-level transformer, so my sentence vectors were made by averaging token vectors across an input.
6565

@@ -74,7 +74,7 @@ Also, when we are using English and Portuguese texts, which has until now been o
7474

7575
## Further reading
7676

77-
You may want to read about my [experiments with semantic text matching with deep learning transformer models](/nlp-semantic-text-matching-with-deep-learning-transformer-models/).
77+
You may want to read about my [experiments with semantic text matching with deep learning transformer models](/nlp-semantic-text-matching/).
7878

7979
## References
8080

content/en/blog/how-to-harmonise-questionnaires.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,12 @@ description: "Discover expert strategies for questionnaire harmonisation with Ha
44
date: 2024-04-16
55
categories: ["data"]
66
image: "/images/10- How to harmonise questionnaires - X practical steps.svg"
7+
8+
9+
10+
aliases:
11+
- "/harmonising-questionnaire-data-10-1-practical-steps-for-enhanced-consistency/"
12+
url: "/data-harmonisation/harmonising-questionnaire-data-consistency/"
713
---
814

915
# **How to Harmonise Questionnaires - 10 Practical Steps (+ 1 Bonus Tip)**
@@ -14,7 +20,7 @@ This quote really highlights the importance of effective questionnaires. They're
1420

1521
This is where the challenge of non-harmonised data comes in, and it truly can be a problem when you have differently formatted surveys with different questions and scales. The questionnaires might not even be in the same language. Analysing the data straight-up is like trying to complete a jigsaw puzzle where the pieces are from different sets (it’s quite literally impossible). So, we need to get our questionnaires to work in harmony with each other.
1622

17-
In this guide, you'll discover 10 practical steps – and we've thrown in an extra one for good measure – to assist you in the harmonisation of questionnaire data. These data harmonisation steps are designed to make the process smoother, so that your collected data is not just abundant but also rich in insights and meaning.
23+
In this guide, you'll discover 10 practical steps – and we've thrown in an extra one for good measure – to assist you in the harmonisation of questionnaire data. These [data harmonisation](/data-harmonisation/) steps are designed to make the process smoother, so that your collected data is not just abundant but also rich in insights and meaning.
1824

1925
## **1\. Define Clear Objectives**
2026

content/en/blog/measuring-the-performance-of-nlp-algorithms.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@ image: /images/blog/roc.png
88
aliases:
99
- "/blog/measuring-the-performance-of-nlp-algorithms/"
1010
- "/measuring-the-performance-of-nlp-algorithms/"
11-
url: "/semantic-text-matching-with-deep-learning-transformer-models/measuring-the-performance-of-nlp-algorithms/"
11+
url: "/nlp-semantic-text-matching/measuring-the-performance-of-nlp-algorithms/"
1212
---
1313

1414
_Harmony was able to reconstruct the matches of the questionnaire harmonisation tool developed by McElroy et al in 2020 with the following AUC scores: childhood **84%**, adulthood **80%**. Harmony was able to match the questions of the English and Portuguese [GAD-7](https://adaa.org/sites/default/files/GAD-7_Anxiety-updated_0.pdf) instruments with AUC **100%** and the Portuguese [CBCL](https://www.apa.org/depression-guideline/child-behavior-checklist.pdf) and [SDQ](/ces-d-vs-sdq) with AUC **89%**. Harmony was also evaluated using a variety of transformer models including MentalBERT, a publicly available pretrained language model for the mental [healthcare](https://fastdatascience.com/the-use-of-ai-in-healthcare) domain._

content/en/blog/semantic-text-matching-with-deep-learning-transformer-models.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ categories: ["nlp"]
66
image: /images/blog/gad7-becks.jpg
77
aliases: ['/semantic-text-matching-with-deep-learning-transformer-models/']
88

9-
url: "/nlp-semantic-text-matching-with-deep-learning-transformer-models/"
9+
url: "/nlp-semantic-text-matching/"
1010
---
1111

1212
Semantic text matching is a task in [natural language processing](https://naturallanguageprocessing.com/) involving estimating the semantic [similarity](https://fastdatascience.com/finding-similar-documents-nlp/) between two texts. For example, if we had to quantify the similarity between “I feel nervous” and “I feel anxious”, most people would agree that these are closer together than either sentence is to “I feel happy”. A semantic text matching algorithm would be able to place a number on the similarity, such as 79%.
@@ -35,4 +35,4 @@ Transformer models have proven to be very effective in semantic text matching. T
3535

3636
## See also
3737

38-
* [Harmony on "kufungisisa": a cultural concept of distress from Zimbabwe](/nlp-semantic-text-matching-with-deep-learning-transformer-models/harmony-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe/)
38+
* [Harmony on "kufungisisa": a cultural concept of distress from Zimbabwe](/nlp-semantic-text-matching/harmony-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe/)
