content/en/ada.md
2 additions & 2 deletions
@@ -13,7 +13,7 @@ The [Australian Data Archive (ADA)](https://ada.edu.au/) is a national service f
The ADA provides data access through the [ADA Dataverse](https://dataverse.ada.edu.au/). The collection includes polls on housing conditions in Australian states, political views over time across the country, questions about employment or health, and other datasets that the ADA has collected over the years (such as the Australian election study).
-
In 2023, the ADA embarked on a project to harmonise a vast collection of survey questions, seeking a solution that could effectively identify and group similar items across different studies. Researchers at the ADA found Harmony, a [data harmonisation](/data-harmonisation-unifying-data-for-deeper-insights/) tool powered by [natural language processing](https://naturallanguageprocessing.com/) (NLP), and the ADA recognised its potential to streamline this process.
+
In 2023, the ADA embarked on a project to harmonise a vast collection of survey questions, seeking a solution that could effectively identify and group similar items across different studies. Researchers at the ADA found Harmony, a [data harmonisation](/data-harmonisation/) tool powered by [natural language processing](https://naturallanguageprocessing.com/) (NLP), and the ADA recognised its potential to streamline this process.
## Challenges
@@ -25,7 +25,7 @@ The ADA faces several challenges in managing its extensive questionnaire data:
## Integrating Harmony into the ADA’s workflow
The ADA may integrate Harmony into its processes, using its powerful NLP capabilities to address the challenges and expedite questionnaire harmonisation:
-
1. Automated item comparison: Harmony's NLP algorithms [automatically compared and grouped questionnaire items based on their semantic similarity](/how-does-harmony-work/), eliminating the need for manual effort.
+
1. Automated item comparison: Harmony's NLP algorithms [automatically compared and grouped questionnaire items based on their semantic similarity](/nlp-semantic-text-matching/how-does-harmony-work/), eliminating the need for manual effort.
2. Enhanced consistency: Harmony's intelligent approach ensured consistent categorisation of questionnaire items, reducing inconsistencies and improving data integrity.
content/en/blog/aidl-meetup.md
3 additions & 2 deletions
@@ -79,8 +79,9 @@ Our session will explore the transformative potential of Generative AI, focusing
## See also our past events
-
* 11 and 12 September 2024: [Harmony at MethodsCon Futures](/harmony-at-methodscon-futures-in-manchester/) in Manchester
-
* 2 July 2024: [Harmony: NLP and generative models for psychology research](/pydata) at Pydata London
+
* 11 and 12 September 2024: [Harmony at MethodsCon Futures](/ai-in-mental-health/harmony-at-methodscon-futures/
+
) in Manchester
+
* 2 July 2024: [Harmony: NLP and generative models for psychology research](/psychology-ai-tool/pydata-meetup/) at Pydata London
* 3 June 2024: [Harmony Hackathon](/hackathon/) at UCL
* 5 May 2024: [Harmony: A global platform for harmonisation, translation and cooperation in mental health](/harmony-at-lifecourse-seminar/) at Melbourne Children’s LifeCourse Initiative seminar series.
* 27 March 2024: [Harmony at AI Camp](/upcoming-tech-talk-at-aicamp-meetup/)
content/en/blog/back-to-the-future-retrospectively-harmonizing-questionnaire-data.md
2 additions & 2 deletions
@@ -19,7 +19,7 @@ Now more than ever, the international research community are keen to determine w
As an alternative to direct replication, researchers may choose to reach out to others in the field who either have access to, or are in the process of collecting, comparable data. Indeed many researchers, particularly those in the life and social sciences, routinely make use of large, ongoing studies that collect a variety of data for multiple purposes (e.g. [longitudinal](/item-harmonisation/harmony-a-free-ai-tool-to-merge-longitudinal-studies) population studies). In practice however, much of our research is designed and carried out in silos – with different research groups tackling similar research questions using widely different designs and measures. Even if a researcher is successful in identifying data that are similar to their original work, minor differences in the design or measures may limit the comparability. What are researchers to do in such situations?
-
One increasingly popular option is retrospective [harmonisation](/data-harmonisation). This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of [education](/data-harmonisation-in-education) and [depression](/harmonisation-validation/promis-depression-subscale), and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonisation can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.
+
One increasingly popular option is retrospective [harmonisation](/data-harmonisation). This involves taking existing data from two or more disparate sources, and transforming the data in some way in order to make it directly comparable across sources. Let’s look at a simple, hypothetical example. Say a researcher wants to examine the relationship between level of [education](/data-harmonisation/data-harmonisation-in-education/) and [depression](/harmonisation-validation/promis-depression-subscale), and whether this varies across two datasets, each from a different country. In dataset A, participants were asked to report their highest qualification out of a list of 10 options ranging from “no formal education” to “doctoral education”, whereas in dataset B there was a simple question that asked participants whether they completed a Bachelor’s degree (yes/no). The 10-option question in dataset A could be recoded to match the variable in dataset B, by collapsing all of the categories above and below Bachelor’s level. In many cases, retrospective harmonisation can be applied on an ad-hoc basis, using simple, logical recoding strategies such as this.
However, not all constructs can be measured with such simple, categorical questions. Take the above outcome variable (depression) for instance. Depression is a complex, heterogeneous experience, characterized by a multitude of symptoms that can be experienced to various degrees and in different combinations. In large-scale surveys, depression is typically measured with standardized questionnaires – participants are asked to report on a range of symptoms, their responses are assigned numerical values, and these are summed to form a “total depression score” for each individual. Although this remains the most viable and plausible strategy for measuring something as complex as depression, there is no “gold standard” questionnaire that is universally adopted by researchers. Instead, there are well over 200 established depression scales. In a [recent review](https://www.closer.ac.uk/wp-content/uploads/210715-Harmonisation-measurement-properties-mental-health-measures-british-cohorts.pdf) (McElroy et al., 2020), we noted that the content of these questionnaires can differ markedly, e.g. different symptoms are assessed, or different response options are used.
@@ -31,7 +31,7 @@ An alternative approach is to apply retrospective harmonisation at the item-leve
By identifying, recoding, and testing the equivalence of subsets of [items](/item-harmonisation/harmony-a-free-ai-tool-for-longitudinal-study-in-psychology) from different questionnaires (for guidelines see our previous report), researchers can derive harmonised sub-scales that are directly comparable across studies. Our group has previously used this approach to study trends in mental health across different generations (Gondek et al., 2021), and examine how socio-economic deprivation impacted adolescent mental health across different [cohorts](/item-harmonisation/harmony-a-free-ai-tool-for-cross-cohort-research) (McElroy et al., 2022).
-
One of the main challenges to retrospectively harmonising questionnaire data is identifying the specific items that are comparable across the measures. In the above example, we used expert opinion to match candidate items based on their content, and used psychometric tests to determine how plausible it was to assume that matched items were directly comparable. Although our results were promising, this process was time-consuming, and the reliance on expert opinion introduces an element of human [bias](https://fastdatascience.com/how-can-we-eliminate-bias-from-ai-algorithms-the-pen-testing-manifesto) – i.e. different experts may disagree on which items match. As such, we are currently working on a [project](https://fastdatascience.com/starting-a-data-science-project) supported by [Wellcome](/radio-podcast-about-wellcome-data-prize), in which we aim to develop an online tool, ‘Hamony’, that uses machine learning to help researchers match items from different questionnaires based on their underlying meaning. Our overall aim is to streamline and add consistency and replicability to the harmonisation process. We plan to test the utility of this tool by using it to harmonise measures of mental health and social connectedness across two cohort of young people from the UK and and Brazil.
+
One of the main challenges to retrospectively harmonising questionnaire data is identifying the specific items that are comparable across the measures. In the above example, we used expert opinion to match candidate items based on their content, and used psychometric tests to determine how plausible it was to assume that matched items were directly comparable. Although our results were promising, this process was time-consuming, and the reliance on expert opinion introduces an element of human [bias](https://fastdatascience.com/how-can-we-eliminate-bias-from-ai-algorithms-the-pen-testing-manifesto) – i.e. different experts may disagree on which items match. As such, we are currently working on a [project](https://fastdatascience.com/starting-a-data-science-project) supported by [Wellcome](/ai-in-mental-health/radio-podcast-about-wellcome-data-prize/), in which we aim to develop an online tool, ‘Hamony’, that uses machine learning to help researchers match items from different questionnaires based on their underlying meaning. Our overall aim is to streamline and add consistency and replicability to the harmonisation process. We plan to test the utility of this tool by using it to harmonise measures of mental health and social connectedness across two cohort of young people from the UK and and Brazil.
Follow this blog for updates on our Harmony project!
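For readers of this hunk, the recoding example in the changed paragraph (collapsing dataset A's 10-option education question into dataset B's Bachelor's yes/no variable) can be sketched roughly as follows. This is a minimal illustration, not code from the Harmony project: only the two endpoint labels come from the text, and the eight intermediate labels, the names `EDUCATION_LEVELS_A` and `completed_bachelors`, and the sample responses are assumptions made up for the example.

```python
# Hypothetical ordered coding for dataset A. Only "no formal education" and
# "doctoral education" appear in the source text; the other labels are assumed.
EDUCATION_LEVELS_A = [
    "no formal education",
    "primary",
    "lower secondary",
    "upper secondary",
    "vocational certificate",
    "diploma",
    "bachelor's degree",
    "postgraduate certificate",
    "master's degree",
    "doctoral education",
]

# Position of the Bachelor's level in the ordered list.
BACHELOR_RANK = EDUCATION_LEVELS_A.index("bachelor's degree")


def completed_bachelors(level: str) -> bool:
    """Recode a dataset A response to dataset B's yes/no variable."""
    rank = EDUCATION_LEVELS_A.index(level)  # raises ValueError for unknown labels
    return rank >= BACHELOR_RANK


# Made-up responses, purely for illustration.
responses_a = ["upper secondary", "master's degree", "bachelor's degree"]
harmonised = [completed_bachelors(r) for r in responses_a]
print(harmonised)
```

Everything at or above the Bachelor's level maps to "yes" and everything below it to "no", which is the collapsing step the paragraph describes. Whether a real dataset's categories can be ordered this cleanly is something the researcher would need to check.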
content/en/blog/contribute-open-source-project.md
1 addition & 1 deletion
@@ -57,7 +57,7 @@ You might find this guide helpful: https://opensource.guide/how-to-contribute as
Read our [guide to contributing to Harmony](/contributing-to-harmony/).
-
Harmony is a powerful [data harmonisation tool](/data-harmonisation-unifying-data-for-deeper-insights/) which uses [natural language processing](https://naturallanguageprocessing.com/) (NLP) to [bridge the gap between diverse research studies](/ppie-for-secondary-data-analysis/), automatically comparing and grouping similar items across datasets. Here are a few ways you can get involved in the project:
+
Harmony is a powerful [data harmonisation tool](/data-harmonisation/) which uses [natural language processing](https://naturallanguageprocessing.com/) (NLP) to [bridge the gap between diverse research studies](/ai-in-mental-health/ppie-for-secondary-data-analysis/), automatically comparing and grouping similar items across datasets. Here are a few ways you can get involved in the project:
content/en/blog/data-harmonisation-healthcare.md
3 additions & 2 deletions
@@ -34,7 +34,8 @@ Data harmonisation is a critical endeavor in healthcare, underpinning efforts to
---
### Methodological Considerations
-
Harmonisation methods in data science and healthcare research aim to standardize disparate data sources to ensure consistency, comparability, and reliability across datasets. These methods are critical in the context of big data and the increasing reliance on electronic health records (EHRs), where data is often collected from various sources with different standards and formats. Harmonisation can be approached retrospectively, after data collection, or prospectively, before data collection begins. The choice between these approaches depends on the constraints of the existing datasets and the theoretical [frameworks](/data-harmonisation-tools-frameworks) guiding the research or clinical objectives.
+
Harmonisation methods in data science and healthcare research aim to standardize disparate data sources to ensure consistency, comparability, and reliability across datasets. These methods are critical in the context of big data and the increasing reliance on electronic health records (EHRs), where data is often collected from various sources with different standards and formats. Harmonisation can be approached retrospectively, after data collection, or prospectively, before data collection begins. The choice between these approaches depends on the constraints of the existing datasets and the theoretical [frameworks](/data-harmonisation/data-harmonisation-tools-frameworks/
+
) guiding the research or clinical objectives.
---
## Key Strategies for Harmonisation
@@ -105,7 +106,7 @@ Despite these challenges, the benefits of harmonising health data are substantia
---
## Implementing Data Harmonisation
-
Data harmonisation can be implemented retrospectively, after data collection, or prospectively, before data is collected. [Retrospective](/back-to-the-future-retrospectively-harmonising-questionnaire-data) harmonisation, also known as ex-post or output harmonisation, aligns existing datasets. Prospective harmonisation, or ex-ante/input harmonisation, involves planning data collection methods and standards in advance to ensure compatibility. Each approach has its merits, and the choice between them often depends on the goals of the harmonisation effort and the nature of the data involved.
+
Data harmonisation can be implemented retrospectively, after data collection, or prospectively, before data is collected. [Retrospective harmonisation](/data-harmonisation/back-to-the-future-retrospectively-harmonising-questionnaire-data/), also known as ex-post or output harmonisation, aligns existing datasets. Prospective harmonisation, or ex-ante/input harmonisation, involves planning data collection methods and standards in advance to ensure compatibility. Each approach has its merits, and the choice between them often depends on the goals of the harmonisation effort and the nature of the data involved.
The process involves defining the scope of harmonisation, identifying relevant data sources, standardizing data formats and terminologies, and employing technologies such as natural language processing (NLP) to ensure data quality and consistency. Numerous initiatives support data harmonisation efforts, such as the Common Data Model Harmonisation project, which aims to enhance data utility and interoperability across healthcare networks. Tools and technologies like CDASH and the NIH's Common Data Elements facilitate registry interoperability.
content/en/blog/hackathon.md
1 addition & 1 deletion
@@ -14,7 +14,7 @@ The Hackathon event will be held at Chandler House (UCL), providing a vibrant an
**This is an in-person hackathon happening in London on 3 June 2024.**
-
Make sure to also join our [community](/community) on Discord, check out the [ideas list](/ideas) and try our [Kaggle competition](/kaggle)!
+
Make sure to also join our [community](/community) on Discord, check out the [ideas list](/ideas) and try our [Kaggle competition](/psychology-ai-tool/kaggle/)!
{{< card heading="Register for the Harmony hackathon" copy="Sign up on Eventbrite" url="https://www.eventbrite.com/e/harmony-hackathon-tickets-887795278577" >}}
content/en/blog/measuring-the-performance-of-nlp-algorithms.md
1 addition & 1 deletion
@@ -15,7 +15,7 @@ _Harmony was able to reconstruct the matches of the questionnaire harmonisation
The content of this blog post has been written up as a [preprint for publication on OSF](https://osf.io/9x5ej).
-
Harmony is a tool for comparing questions in natural language from different surveys or instruments. In order to develop the tool, we had to be able to quantify how good it is at recognising equivalent or similar questions. You can read about how Harmony works [in my earlier blog post](/how-does-harmony-work/).
+
Harmony is a tool for comparing questions in natural language from different surveys or instruments. In order to develop the tool, we had to be able to quantify how good it is at recognising equivalent or similar questions. You can read about how Harmony works [in my earlier blog post](/nlp-semantic-text-matching/how-does-harmony-work/).
For example, we might consider _Tries to Stop Quarrels_ is equivalent to _Is helpful if someone is hurt, upset or feeling ill_, even though there are no words in common between the two texts. But this is subjective, and if we are using AI to make this kind of matches, how can we put a number on our AI’s performance?
content/en/blog/semantic-text-matching-with-deep-learning-transformer-models.md
1 addition & 1 deletion
@@ -27,7 +27,7 @@ In the case of Harmony, we want to measure the similarity of every item in a que
Recent advancements in deep learning have enabled a new type of semantic text matching technique through [Transformer models](https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)), such as [BERT](https://en.wikipedia.org/wiki/BERT_%28language_model%29), [GPT-3](https://openai.com/api/), and the recently announced [Google BARD](https://blog.google/technology/ai/bard-google-ai-search-updates/).
-
[Transformer](/how-does-harmony-work) models operate on sequences of words, and transform entire sentences in many languages into a vector representation in high-dimensional space. Then we can quantify the similarity between sentences with a simple metric such as Euclidean or cosine distance. This enables us to measure the similarity between words.
+
[Transformer](/nlp-semantic-text-matching/how-does-harmony-work/) models operate on sequences of words, and transform entire sentences in many languages into a vector representation in high-dimensional space. Then we can quantify the similarity between sentences with a simple metric such as Euclidean or cosine distance. This enables us to measure the similarity between words.
In developing Harmony, the [most performant algorithm](/nlp-semantic-text-matching/measuring-the-performance-of-nlp-algorithms/) tested so far was GPT-3, however, as the field is evolving so rapidly, this is likely to be out of date very soon. So please watch our blog, and in the meantime you can [test out Harmony](https://harmonydata.ac.uk/app/) on your data.
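The changed paragraph in this hunk describes comparing sentence vectors with cosine distance. A rough sketch of that comparison step is below, using only the standard library. It is an illustration under stated assumptions, not Harmony's implementation: the three-dimensional vectors are invented stand-ins for real transformer embeddings (which have hundreds of dimensions and would come from a model not shown here), and `cosine_similarity` and the example sentences are names chosen for this sketch.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Invented toy "embeddings" (assumption): a real pipeline would obtain these
# from a transformer model rather than hard-coding them.
embeddings = {
    "I feel sad": [0.9, 0.1, 0.0],
    "I feel down": [0.8, 0.2, 0.1],
    "I enjoy sport": [0.0, 0.2, 0.9],
}

# Rank every other sentence by similarity to the query, most similar first.
query = "I feel sad"
ranked = sorted(
    (
        (text, cosine_similarity(embeddings[query], vec))
        for text, vec in embeddings.items()
        if text != query
    ),
    key=lambda pair: pair[1],
    reverse=True,
)

for text, score in ranked:
    print(f"{score:.2f}  {text}")
```

Cosine distance is one minus this similarity, so ranking by highest similarity is the same as ranking by lowest distance.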
content/en/frequently-asked-questions.md
1 addition & 1 deletion
@@ -18,7 +18,7 @@ Harmony is a tool that helps researchers automate the process of harmonisation u
## How do I cite Harmony?
-
If you would like to cite our [validation study](/bmc-psychiatry-paper/), published in BMC Psychiatry, you can cite:
+
If you would like to cite our [validation study](/ai-in-mental-health/bmc-psychiatry-paper/), published in BMC Psychiatry, you can cite:
* McElroy, E., Wood, T.A., Bond, R., Mulvenna M., Shevlin M., Ploubidis G., Scopel Hoffmann M., Moltrecht B., [Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data](https://bmcpsychiatry.biomedcentral.com/articles/10.1186/s12888-024-05954-2#citeas). BMC Psychiatry 24, 530 (2024). https://doi.org/10.1186/s12888-024-05954-2