Commit 094885d

update
1 parent 3095f53 commit 094885d

2 files changed

+80
-0
lines changed

@@ -0,0 +1,80 @@
---
title: "Harmony updates in March 2025"
description: Harmony updates in March 2025
date: 2025-03-11
categories: ["development"]
image: "/images/harmony-change-model.png"

url: "/open-source-for-social-science/harmony-updates-r-and-fine-tuned-llm/"
---

We have a number of exciting updates to Harmony, including:

* improvements to the R library, requested by researchers around the world who have been using Harmony in studies on many different topics
* our own fine-tuned large language model, now available in the web UI. This is José's winning model from the [DOXA challenge which ended on 10 January 2025](/matching-challenge-winner-announcement/).

## Harmony has its own Large Language Model!
17+
18+
Over a period of a few months ending in January 2025 we made a dataset public and available for people all over the world to download and fine tune their own sentence transformer model. We ran this as a challenge on [DOXA AI](https://doxaai.com/competition/harmony-matching).
19+
20+
The motivation for this challenge was that users were reporting that Harmony often lacked psychology specific domain knowledge and tends to group items together if they appear superficially similar, e.g. items to do with the topic of "sleep", and a mental health specific model with needed which would better differentiate topics within the mental health domain. We found that Harmony and other open weights and proprietary LLMs had [varying performances on different datasets](/nlp-semantic-text-matching/measuring-the-performance-of-nlp-algorithms/) and we thought it would be nice to try making our own LLM - or rather, asking our community to do it!
21+
22+
We received a fantastic number of submissions to the matching challenge, including a flurry of changes to our model leaderboard in the last few hours before the competition closed the [winner was announced](/matching-challenge-winner-announcement/) as José Inés Martínez Berard and the second place was awarded to Rafi Ahmed Riyaz Ahmed Patel. Both received prizes. We also appreciate the many runners up who submitted models to the challenge.
23+
## Using the model on the web tool

You can now try José's winning model on Harmony on the web. Just select `harmonydata/mental_health_harmonisation_1` from the model dropdown.

{{< youtube mD5ZyDV8RDU >}}

## Using José's model from your code

If you are coding in Python, run this command in your shell to set an environment variable before you import Harmony:

```
export HARMONY_SENTENCE_TRANSFORMER_PATH=harmonydata/mental_health_harmonisation_1
```

Then you can use Harmony as usual in the Python terminal:

```
import harmony

# Two of the example instruments bundled with Harmony
instruments = [harmony.example_instruments["CES_D English"],
               harmony.example_instruments["GAD-7 Portuguese"]]

match_response = harmony.match_instruments(instruments)
similarity = match_response.similarity_with_polarity

# Print each cluster, followed by the questions it contains
for cluster in match_response.clusters:
    print(f"Cluster #{cluster.cluster_id}: {cluster.text_description}")
    for question in cluster.items:
        print("\t", question.question_text)
```
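
If you would rather not set the variable in your shell, for instance in a notebook, the same setting can be applied from Python. This is a minimal sketch that assumes Harmony reads `HARMONY_SENTENCE_TRANSFORMER_PATH` only when it is first imported, which is why the text above asks you to set it before the import. `select_harmony_model` is a hypothetical helper name, not part of Harmony.

```python
import os
import sys
import warnings

ENV_VAR = "HARMONY_SENTENCE_TRANSFORMER_PATH"


def select_harmony_model(model_id: str) -> None:
    """Set the model path for Harmony (hypothetical helper, not Harmony API)."""
    if "harmony" in sys.modules:
        # Assumption: the variable is read at import time, so a change made
        # after `import harmony` may have no effect.
        warnings.warn("harmony is already imported; the new model may be ignored")
    os.environ[ENV_VAR] = model_id


select_harmony_model("harmonydata/mental_health_harmonisation_1")
# From this point on it should be safe to run: import harmony
```

Setting the variable inside the process only affects that Python session, so it does not change your shell configuration.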

The model is also available on the Hugging Face Hub under the model ID [harmonydata/mental_health_harmonisation_1](https://huggingface.co/harmonydata/mental_health_harmonisation_1). It maps English text to 768-dimensional embeddings and has been trained by José to better differentiate mental-health-specific items.
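
To try the model outside Harmony, it can presumably be loaded with the `sentence-transformers` package, since the challenge asked entrants to fine-tune a sentence transformer. The sketch below rests on that assumption. `cosine_similarity` and `compare_items` are illustrative helper names, not part of Harmony, and the first call to `compare_items` downloads the model weights from the Hub.

```python
import math

MODEL_ID = "harmonydata/mental_health_harmonisation_1"


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return float(dot / (norm_a * norm_b))


def compare_items(text_a, text_b, model_id=MODEL_ID):
    """Embed two questionnaire items and return their cosine similarity."""
    # Imported lazily so cosine_similarity works without the package installed.
    from sentence_transformers import SentenceTransformer

    # Assumption: the Hub model is in sentence-transformers format.
    model = SentenceTransformer(model_id)
    vec_a, vec_b = model.encode([text_a, text_b])
    return cosine_similarity(vec_a, vec_b)
```

Calling `compare_items("I had trouble sleeping", "I felt nervous or on edge")` returns a score between -1 and 1, where higher values mean the model considers the two items closer in meaning.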

## Updates to the R library

Here are the updates to the R library from Omar Hassoun. The R library is on CRAN at: https://cran.r-project.org/web/packages/harmonydata/index.html

The new library:

1. lets you see the clusters found by Harmony
2. shows instrument-to-instrument similarities
3. offers an easier way to create an instrument from a list or a similar structure
4. outputs crosswalk tables
5. allows users to turn negation on or off
6. allows users to turn within-instrument matches on or off

We would be grateful if you could give it a try and let us know if anything is unclear.

There are a few other issues that we are working on, such as allowing users to harmonise response options and choose their own topics. We also welcome your contributions in these areas. Check out the repo and issue board for the [Harmony R library](https://github.com/harmonydata/harmony_r)!

## Would you like to improve Harmony?

If you missed the DOXA competition, don't worry... there's another one! We're running a competition with £1000 in vouchers as first prize, where the challenge is to improve Harmony's questionnaire parsing. [Find out more here](/doxa-parsing/).

## Find us at AI UK

If you'd like to meet us in person, Harmony is an official sponsor of AI UK 2025, run by the Alan Turing Institute, and we will have a [stand at the event](https://harmonydata.ac.uk/ai-uk/) from 17 to 18 March at the QEII Conference Centre in London.
