|
| 1 | +--- |
| 2 | +title: "Harmony updates in March 2025" |
| 3 | +description: Harmony updates in March 2025 |
| 4 | +date: 2025-03-11 |
| 5 | +categories: ["development"] |
| 6 | +image: "/images/harmony-change-model.png" |
| 7 | + |
| 8 | +url: "/open-source-for-social-science/harmony-updates-r-and-fine-tuned-llm/" |
| 9 | +--- |
| 10 | + |
| 11 | +We have a number of exciting updates to Harmony including: |
| 12 | + |
| 13 | +* improvements to the R library, requested by researchers around the world who use Harmony in studies on a wide range of topics, and |
| 14 | +* our own fine-tuned large language model, now available in the web UI. This is José's winning model from the [DOXA challenge which ended on 10 January 2025](/matching-challenge-winner-announcement/). |
| 15 | + |
| 16 | +## Harmony has its own Large Language Model! |
| 17 | + |
| 18 | +Over a few months ending in January 2025, we made a dataset publicly available so that people all over the world could download it and fine-tune their own sentence transformer model. We ran this as a challenge on [DOXA AI](https://doxaai.com/competition/harmony-matching). |
| 19 | + |
| 20 | +The motivation for this challenge was that users reported that Harmony often lacked psychology-specific domain knowledge and tended to group items together if they appeared superficially similar, e.g. items to do with the topic of "sleep". A mental-health-specific model was needed which would better differentiate topics within the mental health domain. We found that Harmony and other open-weights and proprietary LLMs had [varying performance on different datasets](/nlp-semantic-text-matching/measuring-the-performance-of-nlp-algorithms/), and we thought it would be nice to try making our own LLM, or rather, asking our community to do it! |
| 21 | + |
| 22 | +We received a fantastic number of submissions to the matching challenge, including a flurry of changes to our model leaderboard in the last few hours before the competition closed. The [winner was announced](/matching-challenge-winner-announcement/) as José Inés Martínez Berard, and second place was awarded to Rafi Ahmed Riyaz Ahmed Patel. Both received prizes. We also appreciate the many runners-up who submitted models to the challenge. |
| 23 | + |
| 24 | +## Using the model on the web tool |
| 25 | + |
| 26 | +You can now try José's winning model on Harmony on the web. Just select `harmonydata/mental_health_harmonisation_1` from the model dropdown. |
| 27 | + |
| 28 | +{{< youtube mD5ZyDV8RDU >}} |
| 29 | + |
| 30 | + |
| 31 | +## Using José's model from your code |
| 32 | + |
| 33 | +If you are coding in Python, run this command in your shell to set an environment variable before you import Harmony: |
| 34 | +``` |
| 35 | +export HARMONY_SENTENCE_TRANSFORMER_PATH=harmonydata/mental_health_harmonisation_1 |
| 36 | +``` |
| 37 | + |
| 38 | +Then you can use Harmony as usual in the Python terminal: |
| 39 | + |
| 40 | +``` |
| 41 | +import harmony |
| 42 | +instruments = [harmony.example_instruments["CES_D English"],  # two bundled example questionnaires |
| 43 | +               harmony.example_instruments["GAD-7 Portuguese"]] |
| 44 | +match_response = harmony.match_instruments(instruments)  # match items across the instruments |
| 45 | +similarity = match_response.similarity_with_polarity |
| 46 | +for cluster in match_response.clusters:  # print each cluster and the questions in it |
| 47 | +    print(f"Cluster #{cluster.cluster_id}: {cluster.text_description}") |
| 48 | +    for question in cluster.items: |
| 49 | +        print("\t", question.question_text) |
| 50 | +``` |
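
If exporting the variable in a shell is inconvenient (for example in a notebook), the same variable can be set from Python itself. This is a minimal sketch, assuming Harmony reads `HARMONY_SENTENCE_TRANSFORMER_PATH` at import time, as the shell command above implies. `select_harmony_model` is a hypothetical helper written for illustration, not part of Harmony.

```python
import os

# Model ID and variable name are taken from the post.
MODEL_ID = "harmonydata/mental_health_harmonisation_1"
ENV_VAR = "HARMONY_SENTENCE_TRANSFORMER_PATH"


def select_harmony_model(model_id: str = MODEL_ID) -> str:
    """Hypothetical helper: set the environment variable and return its value."""
    os.environ[ENV_VAR] = model_id
    return os.environ[ENV_VAR]


selected = select_harmony_model()
print("Harmony will be asked to use:", selected)

# Import Harmony only after the variable has been set, as the post describes:
# import harmony
```

The `import harmony` line is left commented out so the snippet runs even where Harmony is not installed; uncomment it in your own environment.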
| 51 | + |
| 52 | + |
| 53 | +The model is also available on the Hugging Face Hub under the model ID [harmonydata/mental_health_harmonisation_1](https://huggingface.co/harmonydata/mental_health_harmonisation_1). The model maps English text to 768-dimensional vectors and has been trained by José to better differentiate mental-health-specific items. |
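
For readers who want to experiment with the model outside Harmony, the sketch below shows one possible way to embed two questionnaire items and compare them. It assumes the model can be loaded with the third-party `sentence-transformers` package (the post describes it as a sentence transformer model, but this loading path is an assumption, not something the post confirms). The helper names `cosine_similarity`, `embed_items` and `compare_items` are hypothetical.

```python
import math
from typing import List, Sequence

MODEL_ID = "harmonydata/mental_health_harmonisation_1"  # model ID from the post


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed_items(texts: List[str], model_id: str = MODEL_ID):
    """Embed texts with the model (assumption: it loads via sentence-transformers)."""
    # Imported lazily so the rest of the sketch works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)  # downloads the model on first use
    return model.encode(texts)


def compare_items(text_a: str, text_b: str) -> float:
    """Embed two items and return their cosine similarity."""
    vec_a, vec_b = embed_items([text_a, text_b])
    return cosine_similarity(vec_a.tolist(), vec_b.tolist())
```

Calling `compare_items("I had trouble sleeping", "I felt nervous or on edge")` would download the model and return a single similarity score. Harmony's own matching also handles polarity, which this sketch does not attempt to reproduce.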
| 54 | + |
| 55 | + |
| 56 | + |
| 57 | +## Updates to the R library |
| 58 | + |
| 59 | +Here are the updates to the R library from Omar Hassoun. The R library is on CRAN at: https://cran.r-project.org/web/packages/harmonydata/index.html |
| 60 | + |
| 61 | +The new library: |
| 62 | + |
| 63 | +1. lets you see clusters found by Harmony |
| 64 | +2. shows instrument to instrument similarities |
| 65 | +3. has an easier way to create an instrument from a list or a similar structure |
| 66 | +4. outputs crosswalk tables |
| 67 | +5. allows users to turn the negation on/off |
| 68 | +6. allows users to turn within-instrument matches on/off |
| 69 | + |
| 70 | +We would be grateful if you could give it a try and let us know if anything is unclear. |
| 71 | + |
| 72 | +There are a few other issues that we are working on, such as allowing users to harmonise response options and choose their own topics. We also welcome your contributions in these areas - check out the repo and issue board for the [Harmony R library](https://github.com/harmonydata/harmony_r)! |
| 73 | + |
| 74 | +## Would you like to improve Harmony? |
| 75 | + |
| 76 | +If you missed the DOXA competition, don't worry... there's another one! We're running a competition with £1000 in vouchers as first prize, where the challenge is to improve Harmony's questionnaire parsing. [Find out more here](/doxa-parsing/). |
| 77 | + |
| 78 | +## Find us at AI UK |
| 79 | + |
| 80 | +If you'd like to meet us in person, Harmony is an official sponsor of AI UK 2025, run by the Alan Turing Institute, and we will have a [stand at the event](https://harmonydata.ac.uk/ai-uk/) from 17 to 18 March at the QEII Conference Centre in London. |