
Commit 22f68f1

update
1 parent 0e61e82 commit 22f68f1

10 files changed, +16 -14 lines changed

content/en/ada.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ The [Australian Data Archive (ADA)](https://ada.edu.au/) is a national service f
 
 The ADA provides data access through the [ADA Dataverse](https://dataverse.ada.edu.au/). The collection includes polls on housing conditions in Australian states, political views over time across the country, questions about employment or health, and other datasets that the ADA has collected over the years (such as the Australian election study).
 
-In 2023, the ADA embarked on a project to harmonise a vast collection of survey questions, seeking a solution that could effectively identify and group similar items across different studies. Researchers at the ADA found Harmony, a [data harmonisation](/data-harmonisation-unifying-data-for-deeper-insights/) tool powered by [natural language processing](https://naturallanguageprocessing.com/) (NLP), and the ADA recognised its potential to streamline this process.
+In 2023, the ADA embarked on a project to harmonise a vast collection of survey questions, seeking a solution that could effectively identify and group similar items across different studies. Researchers at the ADA found Harmony, a [data harmonisation](/data-harmonisation/) tool powered by [natural language processing](https://naturallanguageprocessing.com/) (NLP), and the ADA recognised its potential to streamline this process.
 
 ## Challenges
 
@@ -25,7 +25,7 @@ The ADA faces several challenges in managing its extensive questionnaire data:
 ## Integrating Harmony into the ADA’s workflow
 
 The ADA may integrate Harmony into its processes, using its powerful NLP capabilities to address the challenges and expedite questionnaire harmonisation:
-1. Automated item comparison: Harmony's NLP algorithms [automatically compared and grouped questionnaire items based on their semantic similarity](/how-does-harmony-work/), eliminating the need for manual effort.
+1. Automated item comparison: Harmony's NLP algorithms [automatically compared and grouped questionnaire items based on their semantic similarity](/nlp-semantic-text-matching/how-does-harmony-work/), eliminating the need for manual effort.
 2. Enhanced consistency: Harmony's intelligent approach ensured consistent categorisation of questionnaire items, reducing inconsistencies and improving data integrity.
 
 ## Impact of Harmony on ADA's Operations

content/en/blog/aidl-meetup.md

Lines changed: 3 additions & 2 deletions
@@ -79,8 +79,9 @@ Our session will explore the transformative potential of Generative AI, focusing
 
 ## See also our past events
 
-* 11 and 12 September 2024: [Harmony at MethodsCon Futures](/harmony-at-methodscon-futures-in-manchester/) in Manchester
-* 2 July 2024: [Harmony: NLP and generative models for psychology research](/pydata) at Pydata London
+* 11 and 12 September 2024: [Harmony at MethodsCon Futures](/ai-in-mental-health/harmony-at-methodscon-futures/
+) in Manchester
+* 2 July 2024: [Harmony: NLP and generative models for psychology research](/psychology-ai-tool/pydata-meetup/) at Pydata London
 * 3 June 2024: [Harmony Hackathon](/hackathon/) at UCL
 * 5 May 2024: [Harmony: A global platform for harmonisation, translation and cooperation in mental health](/harmony-at-lifecourse-seminar/) at Melbourne Children’s LifeCourse Initiative seminar series.
 * 27 March 2024: [Harmony at AI Camp](/upcoming-tech-talk-at-aicamp-meetup/)

content/en/blog/back-to-the-future-retrospectively-harmonizing-questionnaire-data.md

Lines changed: 2 additions & 2 deletions
@@ -19,7 +19,7 @@ Now more than ever, the international research community are keen to determine w
 
 As an alternative to direct replication, researchers may choose to reach out to others in the field who either have access to, or are in the process of collecting, comparable data. Indeed many researchers, particularly those in the life and social sciences, routinely make use of large, ongoing studies that collect a variety of data for multiple purposes (e.g. [longitudinal](/item-harmonisation/harmony-a-free-ai-tool-to-merge-longitudinal-studies) population studies). In practice however, much of our research is designed and carried out in silos – with different research groups tackling similar research questions using widely different designs and measures. Even if a researcher is successful in identifying data that are similar to their original work, minor differences in the design or measures may limit the comparability. What are researchers to do in such situations?
 
-One increasingly popular option is retrospective [harmonisation](/data-harmonisation). This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of [education](/data-harmonisation-in-education) and [depression](/harmonisation-validation/promis-depression-subscale), and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonisation can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.
+One increasingly popular option is retrospective [harmonisation](/data-harmonisation). This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of [education](/data-harmonisation/data-harmonisation-in-education/) and [depression](/harmonisation-validation/promis-depression-subscale), and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonisation can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.
 
 However, not all constructs can be measured with such simple, categorical questions. Take the above outcome variable (depression) for instance. Depression is a complex, heterogeneous experience, characterized by a multitude of symptoms that can be experienced to various degrees and in different combinations. In large-scale surveys, depression is typically measured with standardized questionnaires – participants are asked to report on a range of symptoms, their responses are assigned numerical values, and these are summed to form a “total depression score” for each individual. Although this remains the most viable and plausible strategy for measuring something as complex as depression, there is no “gold standard” questionnaire that is universally adopted by researchers. Instead, there are well over 200 established depression scales. In a [recent review](https://www.closer.ac.uk/wp-content/uploads/210715-Harmonisation-measurement-properties-mental-health-measures-british-cohorts.pdf) (McElroy et al., 2020), we noted that the content of these questionnaires can differ markedly, e.g. different symptoms are assessed, or different response options are used.
 
@@ -31,7 +31,7 @@ An alternative approach is to apply retrospective harmonisation at the item-leve
 
 By identifying, recoding, and testing the equivalence of subsets of [items](/item-harmonisation/harmony-a-free-ai-tool-for-longitudinal-study-in-psychology) from different questionnaires (for guidelines see our previous report), researchers can derive harmonised sub-scales that are directly comparable across studies. Our group has previously used this approach to study trends in mental health across different generations (Gondek et al., 2021), and examine how socio-economic deprivation impacted adolescent mental health across different [cohorts](/item-harmonisation/harmony-a-free-ai-tool-for-cross-cohort-research) (McElroy et al., 2022).
 
-One of the main challenges to retrospectively harmonising questionnaire data is identifying the specific items that are comparable across the measures. In the above example, we used expert opinion to match candidate items based on their content, and used psychometric tests to determine how plausible it was to assume that matched items were directly comparable. Although our results were promising, this process was time-consuming, and the reliance on expert opinion introduces an element of human [bias](https://fastdatascience.com/how-can-we-eliminate-bias-from-ai-algorithms-the-pen-testing-manifesto) – i.e. different experts may disagree on which items match. As such, we are currently working on a [project](https://fastdatascience.com/starting-a-data-science-project) supported by [Wellcome](/radio-podcast-about-wellcome-data-prize), in which we aim to develop an online tool, ‘Hamony’, that uses machine learning to help researchers match items from different questionnaires based on their underlying meaning. Our overall aim is to streamline and add consistency and replicability to the harmonisation process. We plan to test the utility of this tool by using it to harmonise measures of mental health and social connectedness across two cohort of young people from the UK and and Brazil.
+One of the main challenges to retrospectively harmonising questionnaire data is identifying the specific items that are comparable across the measures. In the above example, we used expert opinion to match candidate items based on their content, and used psychometric tests to determine how plausible it was to assume that matched items were directly comparable. Although our results were promising, this process was time-consuming, and the reliance on expert opinion introduces an element of human [bias](https://fastdatascience.com/how-can-we-eliminate-bias-from-ai-algorithms-the-pen-testing-manifesto) – i.e. different experts may disagree on which items match. As such, we are currently working on a [project](https://fastdatascience.com/starting-a-data-science-project) supported by [Wellcome](/ai-in-mental-health/radio-podcast-about-wellcome-data-prize/), in which we aim to develop an online tool, ‘Hamony’, that uses machine learning to help researchers match items from different questionnaires based on their underlying meaning. Our overall aim is to streamline and add consistency and replicability to the harmonisation process. We plan to test the utility of this tool by using it to harmonise measures of mental health and social connectedness across two cohort of young people from the UK and and Brazil.
 
 Follow this blog for updates on our Harmony project!
 

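The ad-hoc recoding described in the retrospective harmonisation paragraph of back-to-the-future-retrospectively-harmonizing-questionnaire-data.md (collapsing dataset A's 10-option qualification question into dataset B's Bachelor's yes/no item) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the Harmony project: the integer codes 1 to 10 and the position of the Bachelor's degree (code 7) are hypothetical, because the post only names the two endpoint categories.

```python
from typing import Optional

# Hypothetical ordinal coding of dataset A's 10-option question:
# 1 = "no formal education", ..., 10 = "doctoral education".
# Assumption: code 7 is the Bachelor's degree; replace with the real codebook value.
BACHELOR_CODE = 7
MIN_CODE, MAX_CODE = 1, 10


def recode_to_bachelor(code: Optional[int]) -> Optional[int]:
    """Collapse dataset A's 1-10 code into dataset B's yes/no (1/0) Bachelor's item.

    Categories at or above the Bachelor's level become 1 (yes), categories
    below it become 0 (no), and missing responses stay missing.
    """
    if code is None:
        return None
    if not MIN_CODE <= code <= MAX_CODE:
        raise ValueError(f"unexpected education code: {code}")
    return 1 if code >= BACHELOR_CODE else 0


if __name__ == "__main__":
    dataset_a = [1, 4, 7, 10, None]  # made-up responses for illustration
    print([recode_to_bachelor(c) for c in dataset_a])  # [0, 0, 1, 1, None]
```

The single threshold comparison only works because the codes are assumed to be ordered from lowest to highest qualification; an unordered codebook would need an explicit lookup table instead.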
content/en/blog/contribute-open-source-project.md

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ You might find this guide helpful: https://opensource.guide/how-to-contribute as
 
 Read our [guide to contributing to Harmony](/contributing-to-harmony/).
 
-Harmony is a powerful [data harmonisation tool](/data-harmonisation-unifying-data-for-deeper-insights/) which uses [natural language processing](https://naturallanguageprocessing.com/) (NLP) to [bridge the gap between diverse research studies](/ppie-for-secondary-data-analysis/), automatically comparing and grouping similar items across datasets. Here are a few ways you can get involved in the project:
+Harmony is a powerful [data harmonisation tool](/data-harmonisation/) which uses [natural language processing](https://naturallanguageprocessing.com/) (NLP) to [bridge the gap between diverse research studies](/ai-in-mental-health/ppie-for-secondary-data-analysis/), automatically comparing and grouping similar items across datasets. Here are a few ways you can get involved in the project:
 
 ### 1. Get coding
 

content/en/blog/data-harmonisation-healthcare.md

Lines changed: 3 additions & 2 deletions
@@ -34,7 +34,8 @@ Data harmonisation is a critical endeavor in healthcare, underpinning efforts to
 ---
 ### Methodological Considerations
 
-Harmonisation methods in data science and healthcare research aim to standardize disparate data sources to ensure consistency, comparability, and reliability across datasets. These methods are critical in the context of big data and the increasing reliance on electronic health records (EHRs), where data is often collected from various sources with different standards and formats. Harmonisation can be approached retrospectively, after data collection, or prospectively, before data collection begins. The choice between these approaches depends on the constraints of the existing datasets and the theoretical [frameworks](/data-harmonisation-tools-frameworks) guiding the research or clinical objectives.
+Harmonisation methods in data science and healthcare research aim to standardize disparate data sources to ensure consistency, comparability, and reliability across datasets. These methods are critical in the context of big data and the increasing reliance on electronic health records (EHRs), where data is often collected from various sources with different standards and formats. Harmonisation can be approached retrospectively, after data collection, or prospectively, before data collection begins. The choice between these approaches depends on the constraints of the existing datasets and the theoretical [frameworks](/data-harmonisation/data-harmonisation-tools-frameworks/
+) guiding the research or clinical objectives.
 
 ---
 ## Key Strategies for Harmonisation
@@ -105,7 +106,7 @@ Despite these challenges, the benefits of harmonising health data are substantia
 ---
 ## Implementing Data Harmonisation
 
-Data harmonisation can be implemented retrospectively, after data collection, or prospectively, before data is collected. [Retrospective](/back-to-the-future-retrospectively-harmonising-questionnaire-data) harmonisation, also known as ex-post or output harmonisation, aligns existing datasets. Prospective harmonisation, or ex-ante/input harmonisation, involves planning data collection methods and standards in advance to ensure compatibility. Each approach has its merits, and the choice between them often depends on the goals of the harmonisation effort and the nature of the data involved.
+Data harmonisation can be implemented retrospectively, after data collection, or prospectively, before data is collected. [Retrospective harmonisation](/data-harmonisation/back-to-the-future-retrospectively-harmonising-questionnaire-data/), also known as ex-post or output harmonisation, aligns existing datasets. Prospective harmonisation, or ex-ante/input harmonisation, involves planning data collection methods and standards in advance to ensure compatibility. Each approach has its merits, and the choice between them often depends on the goals of the harmonisation effort and the nature of the data involved.
 
 The process involves defining the scope of harmonisation, identifying relevant data sources, standardizing data formats and terminologies, and employing technologies such as natural language processing (NLP) to ensure data quality and consistency. Numerous initiatives support data harmonisation efforts, such as the Common Data Model Harmonisation project, which aims to enhance data utility and interoperability across healthcare networks. Tools and technologies like CDASH and the NIH's Common Data Elements facilitate registry interoperability.
 

content/en/blog/hackathon.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ The Hackathon event will be held at Chandler House (UCL), providing a vibrant an
 
 **This is an in-person hackathon happening in London on 3 June 2024.**
 
-Make sure to also join our [community](/community) on Discord, check out the [ideas list](/ideas) and try our [Kaggle competition](/kaggle)!
+Make sure to also join our [community](/community) on Discord, check out the [ideas list](/ideas) and try our [Kaggle competition](/psychology-ai-tool/kaggle/)!
 
 {{< card heading="Register for the Harmony hackathon" copy="Sign up on Eventbrite" url="https://www.eventbrite.com/e/harmony-hackathon-tickets-887795278577" >}}
 

content/en/blog/measuring-the-performance-of-nlp-algorithms.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ _Harmony was able to reconstruct the matches of the questionnaire harmonisation
 
 The content of this blog post has been written up as a [preprint for publication on OSF](https://osf.io/9x5ej).
 
-Harmony is a tool for comparing questions in natural language from different surveys or instruments. In order to develop the tool, we had to be able to quantify how good it is at recognising equivalent or similar questions. You can read about how Harmony works [in my earlier blog post](/how-does-harmony-work/).
+Harmony is a tool for comparing questions in natural language from different surveys or instruments. In order to develop the tool, we had to be able to quantify how good it is at recognising equivalent or similar questions. You can read about how Harmony works [in my earlier blog post](/nlp-semantic-text-matching/how-does-harmony-work/).
 
 For example, we might consider _Tries to Stop Quarrels_ is equivalent to _Is helpful if someone is hurt, upset or feeling ill_, even though there are no words in common between the two texts. But this is subjective, and if we are using AI to make this kind of matches, how can we put a number on our AI’s performance?
 

content/en/blog/pydata.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ image: "/images/thomas-wood-pydata.jpg"
 
 aliases:
 - "/pydata/"
-url: "/psychology-ai-tool/aidl-meetup/"
+url: "/psychology-ai-tool/pydata-meetup/"
 ---
 
 ## Harmony at PyData London - 86th Meetup

content/en/blog/semantic-text-matching-with-deep-learning-transformer-models.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ In the case of Harmony, we want to measure the similarity of every item in a que
 
 Recent advancements in deep learning have enabled a new type of semantic text matching technique through [Transformer models](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)), such as [BERT](https://en.wikipedia.org/wiki/BERT_%28language_model%29), [GPT-3](https://openai.com/api/), and the recently announced [Google BARD](https://blog.google/technology/ai/bard-google-ai-search-updates/).
 
-[Transformer](/how-does-harmony-work) models operate on sequences of words, and transform entire sentences in many languages into a vector representation in high-dimensional space. Then we can quantify the similarity between sentences with a simple metric such as Euclidean or cosine distance. This enables us to measure the similarity between words.
+[Transformer](/nlp-semantic-text-matching/how-does-harmony-work/) models operate on sequences of words, and transform entire sentences in many languages into a vector representation in high-dimensional space. Then we can quantify the similarity between sentences with a simple metric such as Euclidean or cosine distance. This enables us to measure the similarity between words.
 
 In developing Harmony, the [most performant algorithm](/nlp-semantic-text-matching/measuring-the-performance-of-nlp-algorithms/) tested so far was GPT-3, however, as the field is evolving so rapidly, this is likely to be out of date very soon. So please watch our blog, and in the meantime you can [test out Harmony](https://harmonydata.ac.uk/app/) on your data.
 

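The Transformer paragraph in semantic-text-matching-with-deep-learning-transformer-models.md says that once sentences are turned into vectors, their similarity can be measured with Euclidean or cosine distance. The sketch below computes both metrics in plain Python. It is illustrative only and is not Harmony's implementation: the three-dimensional vectors are invented (real transformer embeddings have hundreds of dimensions and come from a model), the first two item texts are borrowed from the performance post, and "Often loses temper" is a hypothetical contrasting item.

```python
import math
from itertools import combinations


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: 1.0 same direction, 0.0 orthogonal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u: list[float], v: list[float]) -> float:
    """Straight-line distance between u and v (0.0 means identical vectors)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy embeddings; a real pipeline would obtain these from a model.
EMBEDDINGS = {
    "Tries to Stop Quarrels": [0.8, 0.1, 0.3],
    "Is helpful if someone is hurt, upset or feeling ill": [0.7, 0.2, 0.4],
    "Often loses temper": [-0.5, 0.9, 0.1],
}

if __name__ == "__main__":
    # Compare every pair of items, printing cosine similarity then Euclidean distance.
    for (text_a, vec_a), (text_b, vec_b) in combinations(EMBEDDINGS.items(), 2):
        print(f"{cosine_similarity(vec_a, vec_b):+.3f}  "
              f"{euclidean_distance(vec_a, vec_b):.3f}  {text_a!r} vs {text_b!r}")
```

Cosine similarity compares only the direction of two vectors and ignores their length, which is one reason it is often used for sentence embeddings; Euclidean distance also reflects differences in magnitude.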
content/en/frequently-asked-questions.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ Harmony is a tool that helps researchers automate the process of harmonisation u
 
 ## How do I cite Harmony?
 
-If you would like to cite our [validation study](/bmc-psychiatry-paper/), published in BMC Psychiatry, you can cite:
+If you would like to cite our [validation study](/ai-in-mental-health/bmc-psychiatry-paper/), published in BMC Psychiatry, you can cite:
 
 * McElroy, E., Wood, T.A., Bond, R., Mulvenna M., Shevlin M., Ploubidis G., Scopel Hoffmann M., Moltrecht B., [Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data](https://bmcpsychiatry.biomedcentral.com/articles/10.1186/s12888-024-05954-2#citeas). BMC Psychiatry 24, 530 (2024). https://doi.org/10.1186/s12888-024-05954-2
 