-We are excited to announce that Harmony, an [open source](/open-source-for-social-science/) Natural Language Processing tool for data harmonisation, is now available on the Comprehensive R Archive Network [CRAN](https://cran.r-project.org/)!
+We are excited to announce that Harmony, an [open source](/open-source-for-social-science/) Natural Language Processing tool for data harmonisation, is now available on the Comprehensive R Archive Network [CRAN](https://cran.r-project.org/web/packages/harmonydata/index.html)!

Previously, [Harmony R](/open-source-for-social-science/harmony-r-package/) could be installed using [devtools](https://devtools.r-lib.org/).
content/en/community.md (17 additions & 0 deletions)
@@ -44,3 +44,20 @@ If you'd like to join Harmony you can fill out the form below or message us on D
{{</ htmlcode >}}
+
+## How can I improve Harmony?
+
+There are a number of ways you can help to make this tool better for future users:
+
+* If you're doing research and found Harmony useful, please [cite us](/ai-in-mental-health/bmc-psychiatry-paper/)!
+* If you're a researcher using the tool and you encounter a problem or a bug, or there is a feature you would like us to implement, please [raise an issue on GitHub](https://github.com/harmonydata/harmony) or [message us on Discord](https://discord.gg/harmonydata).
+* If you are able to fix the issue, please feel free to do so. You can submit your code back to the project by [making a pull request](https://github.blog/developer-skills/github-education/beginners-guide-to-github-merging-a-pull-request), but if you don't know how to do that, don't worry! You can always send us your work on Discord or by email.
+* If you're a coder, feel free to contribute code to our repositories:
+  * [Python](https://github.com/harmonydata/harmony) - the main core library and the Python package, which is on [PyPI](https://pypi.org/project/harmonydata/)
+  * [R](https://github.com/harmonydata/harmony_r) - the R port is on [CRAN](https://cran.r-project.org/web/packages/harmonydata/index.html). It is slightly less mature than the Python library, so we would really appreciate it if you could give the R package some TLC.
+  * [API](https://github.com/harmonydata/harmonyapi) - the Python API is built with Pydantic and FastAPI and runs on an on-premises server, which enables the web app to work
+  * [Web front end](https://github.com/harmonydata/app) - we welcome feedback and contributions on front end and UX issues
+* You can always [message us on Discord](https://discord.gg/harmonydata) and let us know that you're interested in joining the project.
+* Please contribute to our [hackathons](/open-source-for-social-science/hackathon/) and [coding challenges](/doxa/) to help improve the tool.
+* We appreciate being invited to give talks at events such as [Women In Data™️](/open-source-for-social-science/women-in-data/), [AI|DL](/psychology-ai-tool/aidl-meetup/), [MethodsCon Futures](/ai-in-mental-health/harmony-at-methodscon-futures/), [PyData](/open-source-for-social-science/pydata-meetup/), [Lifecourse](/ai-in-mental-health/harmony-at-lifecourse-seminar/), and [AI Camp](/psychology-ai-tool/aicamp-meetup/). If you run a similar meetup or community group, we are willing to come and talk.
+* Please give feedback, no matter how informal. Knowing how people are using the tool helps us improve it.
content/en/developer-guide.md (16 additions & 4 deletions)
@@ -2,20 +2,31 @@
title: Developer guide
---

+If you're a researcher and you found Harmony useful for your research, please [cite us](/ai-in-mental-health/bmc-psychiatry-paper/). If you encounter a problem or a bug, or there is a feature you would like us to implement, please [raise an issue on GitHub](https://github.com/harmonydata/harmony) or [message us on Discord](https://discord.gg/harmonydata).
+
# Git and GitHub workflow

The preferred workflow for contributing to Harmony’s repository is to fork the [main repository](https://github.com/harmonydata/harmony/) on GitHub, clone, and develop on a new branch.

Please read our general guide about [contributing to Harmony](/open-source-for-social-science/contributing-to-harmony-nlp-project/).

-We have three main repositories on Github under the `harmonydata` organisation:
+We have four main repositories on GitHub under the `harmonydata` organisation:

-* Harmony Python library: https://github.com/harmonydata/harmony - this is everything to do with the NLP logic of Harmony
-* Harmony front end: https://github.com/harmonydata - this is everything to do with the front end and graphical interface of Harmony
+* Harmony Python library: https://github.com/harmonydata/harmony - this is everything to do with the NLP logic of Harmony. This is the main core library and the Python package, which is on [PyPI](https://pypi.org/project/harmonydata/).
+* Harmony API: https://github.com/harmonydata/harmonyapi - the Python API is built with Pydantic and FastAPI and runs on an on-premises server, which enables the web app to work.
+* Harmony front end: https://github.com/harmonydata - this is everything to do with the front end and graphical interface of Harmony. We welcome feedback and contributions on front end and UX issues.
+* R: https://github.com/harmonydata/harmony_r - the R port is on [CRAN](https://cran.r-project.org/web/packages/harmonydata/index.html). It is slightly less mature than the Python library, so we would really appreciate it if you could give the R package some TLC.

This contributor guide focuses on the Python library, but you could follow the same steps for the other repositories.

+## Hackathons, coding challenges and events
+
+Please contribute to our [hackathons](/open-source-for-social-science/hackathon/) and [coding challenges](/doxa/) to help improve the tool. We appreciate being invited to give talks at events such as [Women In Data™️](/open-source-for-social-science/women-in-data/), [AI|DL](/psychology-ai-tool/aidl-meetup/), [MethodsCon Futures](/ai-in-mental-health/harmony-at-methodscon-futures/), [PyData](/open-source-for-social-science/pydata-meetup/), [Lifecourse](/ai-in-mental-health/harmony-at-lifecourse-seminar/), and [AI Camp](/psychology-ai-tool/aicamp-meetup/). If you run a similar meetup or community group, we are willing to come and talk.
+
+## Process of forking and making a pull request
+
+If you are able to fix an issue, please feel free to submit your code back to the project by [making a pull request](https://github.blog/developer-skills/github-education/beginners-guide-to-github-merging-a-pull-request) (PR), but if you don't know how to do that, don't worry! You can always send us your work on Discord or by email. Here's a brief overview of the steps for making a pull request.
+
1. Fork the [main project repository](https://github.com/harmonydata/harmony) by clicking on the ‘Fork’ button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see [this guide](https://help.github.com/articles/fork-a-repo/).
2. [Clone](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/cloning-a-repository) your fork of the Harmony repo from your GitHub account to your local disk:
@@ -92,3 +103,4 @@ We recommend to open a pull request early, so that other contributors become awa
Note

If any of the above seems like magic to you, look up the [Git documentation](https://git-scm.com/doc). If you get stuck, chat with us on [Discord](https://discord.gg/harmonydata), or contact us at [harmonydata.ac.uk](https://harmonydata.ac.uk/contact).
content/en/frequently-asked-questions.md (42 additions & 0 deletions)
@@ -137,6 +137,24 @@ McElroy, E., Moltrecht, B., Scopel Hoffmann, M., Wood, T. A., & Ploubidis, G. (2
}
```
+
+## How can I improve Harmony?
+
+There are a number of ways you can help to make this tool better for future users:
+
+* If you're doing research and found Harmony useful, please [cite us](/ai-in-mental-health/bmc-psychiatry-paper/)!
+* If you're a researcher using the tool and you encounter a problem or a bug, or there is a feature you would like us to implement, please [raise an issue on GitHub](https://github.com/harmonydata/harmony) or [message us on Discord](https://discord.gg/harmonydata).
+* If you are able to fix the issue, please feel free to do so. You can submit your code back to the project by [making a pull request](https://github.blog/developer-skills/github-education/beginners-guide-to-github-merging-a-pull-request), but if you don't know how to do that, don't worry! You can always send us your work on Discord or by email.
+* If you're a coder, feel free to contribute code to our repositories:
+  * [Python](https://github.com/harmonydata/harmony) - the main core library and the Python package, which is on [PyPI](https://pypi.org/project/harmonydata/)
+  * [R](https://github.com/harmonydata/harmony_r) - the R port is on [CRAN](https://cran.r-project.org/web/packages/harmonydata/index.html). It is slightly less mature than the Python library, so we would really appreciate it if you could give the R package some TLC.
+  * [API](https://github.com/harmonydata/harmonyapi) - the Python API is built with Pydantic and FastAPI and runs on an on-premises server, which enables the web app to work
+  * [Web front end](https://github.com/harmonydata/app) - we welcome feedback and contributions on front end and UX issues
+* You can always [message us on Discord](https://discord.gg/harmonydata) and let us know that you're interested in joining the project.
+* Please contribute to our [hackathons](/open-source-for-social-science/hackathon/) and [coding challenges](/doxa/) to help improve the tool.
+* We appreciate being invited to give talks at events such as [Women In Data™️](/open-source-for-social-science/women-in-data/), [AI|DL](/psychology-ai-tool/aidl-meetup/), [MethodsCon Futures](/ai-in-mental-health/harmony-at-methodscon-futures/), [PyData](/open-source-for-social-science/pydata-meetup/), [Lifecourse](/ai-in-mental-health/harmony-at-lifecourse-seminar/), and [AI Camp](/psychology-ai-tool/aicamp-meetup/). If you run a similar meetup or community group, we are willing to come and talk.
+* Please give feedback, no matter how informal. Knowing how people are using the tool helps us improve it.
+
## Does Harmony store my data?

If you upload a questionnaire or instrument, Harmony does not store or save it. You can read more on our [Privacy Policy page](/privacy-policy/).
@@ -153,6 +171,16 @@ Harmony was able to reconstruct the matches of the questionnaire harmonisation t
The numbers are the cosine similarity of document vectors. The cosine similarity of two vectors can range from -1 to 1 based on the angle between the two vectors being compared. We have converted these to percentages. We have also used a preprocessing stage to convert positive sentences to negative and vice-versa (e.g. _I feel anxious_ → _I do not feel anxious_). If the match between two sentences improves once this preprocessing has been applied, then the items are assigned a negative similarity.
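
To make the score concrete, here is a minimal sketch of how a cosine similarity between two vectors can be computed and shown as a percentage. The helper names `cosine_similarity` and `as_percentage` and the toy three-dimensional vectors are illustrative assumptions, not part of Harmony's API; real sentence embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def as_percentage(score):
    """Convert a cosine score in [-1, 1] to a whole-number percentage."""
    return round(score * 100)


# Toy vectors standing in for sentence embeddings (assumed values).
vec_a = [1.0, 2.0, 3.0]
vec_same_direction = [2.0, 4.0, 6.0]   # points the same way as vec_a
vec_orthogonal = [3.0, 0.0, -1.0]      # at right angles to vec_a
vec_opposite = [-1.0, -2.0, -3.0]      # points the opposite way

for other in (vec_same_direction, vec_orthogonal, vec_opposite):
    print(as_percentage(cosine_similarity(vec_a, other)))
```

Vectors pointing the same way score close to 1, vectors at right angles score 0, and vectors pointing in opposite directions score close to -1.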

+## Which Large Language Model (LLM) does Harmony use?
+
+By default, Harmony uses the HuggingFace model [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2). In the [web tool](/app), you can switch to LLMs from a few other providers, including OpenAI.
+
+{{< image src="/images/harmony-switch-llm.png" alt="How to switch LLMs in Harmony's web UI" >}}
+
+*Above: How to switch LLMs in Harmony's web UI*
+
+However, from the [Python library](https://github.com/harmonydata/harmony), you can choose any LLM you prefer, including options from Vertex, OpenAI, IBM, HuggingFace, or any other provider. For example, we have taken the Shona model from the Masakhane project and tested Harmony using a [Shona LLM](/nlp-semantic-text-matching/harmony-on-kufungisisa-a-cultural-concept-of-distress-from-zimbabwe/). The [README on GitHub](https://github.com/harmonydata/harmony/blob/main/README.md) gives some examples of how you can switch the LLM inside Harmony.
+
## Does Harmony give p-values?

At this time Harmony does not give p-values. Harmony matches vectors using a cosine score and p-values are not applicable in this context.
@@ -161,6 +189,10 @@ At this time Harmony does not give p-values. Harmony matches vectors using a cos

Items were matched on content using the online tool [Harmony](https://harmonydata.ac.uk/), which matches items by converting text to vectors using a transformer neural network ([Reimers & Gurevych, 2019](https://arxiv.org/abs/1908.10084)). Harmony produces a cosine score ranging from -1 to 1, with values closer to 1 indicating a closer match.

+## What does it mean when Harmony outputs a negative similarity?
+
+Harmony is based mainly on a large language model, but for detecting antonyms we use a modification that is unique to Harmony. We detect negation words such as "not" and give the cosine match a negative polarity if it looks like the sentences are antonyms. This is because LLMs tend to give cosine similarity values roughly between 0 and 1, and they tend to give antonyms quite high similarity values, so we wanted to correct for that. There are negation rules for English, French and some other languages [in our GitHub repository](https://github.com/harmonydata/harmony/blob/main/src/harmony/matching/negator.py). The negation scripts came from the winning entries [from our last hackathon](https://harmonydata.ac.uk/open-source-for-social-science/hackathon/).
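
The idea can be sketched as follows. This is not Harmony's actual negation code: the functions `toy_similarity`, `toggle_negation` and `signed_similarity` are hypothetical names, the bag-of-words cosine is a simple stand-in for a transformer embedding model, the one-word English negation rule is a deliberate oversimplification, and returning the negated flipped score is an assumption about how the sign could be applied.

```python
import math
from collections import Counter


def toy_similarity(a, b):
    """Bag-of-words cosine similarity (a stand-in for sentence embeddings)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(c * c for c in ca.values())) * math.sqrt(
        sum(c * c for c in cb.values())
    )
    return dot / norm if norm else 0.0


def toggle_negation(sentence):
    """Toy English rule: 'I do not feel X' <-> 'I feel X' (hypothetical)."""
    words = sentence.split()
    lowered = [w.lower() for w in words]
    if "not" in lowered:
        i = lowered.index("not")
        # Drop "not", plus a directly preceding auxiliary "do" if present.
        start = i - 1 if i > 0 and lowered[i - 1] == "do" else i
        return " ".join(words[:start] + words[i + 1:])
    # No negation word found: insert "do not" after the first word.
    return " ".join(words[:1] + ["do", "not"] + words[1:])


def signed_similarity(a, b, similarity=toy_similarity):
    """Return a negative score when flipping b's polarity improves the match."""
    plain = similarity(a, b)
    flipped = similarity(a, toggle_negation(b))
    if flipped > plain:
        return -flipped  # the sentences look like opposites
    return plain


print(signed_similarity("I feel anxious", "I do not feel anxious"))  # negative
print(signed_similarity("I feel anxious", "I feel nervous"))         # positive
```

In this sketch the sign only records which polarity matched better; the magnitude is still a cosine score.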
+
## How does Harmony compare to human harmonisation?

Imagine that you, as a human, are trying to match items in a questionnaire. You might decide that “I feel depressed” and “I feel sad” are similar. If you had to place them on the surface of a sphere, you might place them close to each other, whereas different concepts might be far from each other.
@@ -171,10 +203,20 @@ We can represent [any concept](/nlp-semantic-text-matching/harmony-on-kufungisis
*You can try playing with a large language model in your browser [in this blog post](https://fastdatascience.com/natural-language-processing/semantic-similarity-with-sentence-embeddings/). Input two sentences and you can see the vector values and the cosine similarity.*

+If you want to understand our efforts to calibrate Harmony against human-generated harmonisation scores, please see our [validation study](/ai-in-mental-health/bmc-psychiatry-paper/).
+
## Who made Harmony?

The [Python](https://www.python.org/) code of Harmony was written by [Thomas Wood](https://freelancedatascientist.net/) (Fast Data Science) in collaboration with Eoin McElroy, Bettina Moltrecht, George Ploubidis, and Mauricio Scopel Hoffmann.

+## Where do the topics come from?
+
+By default, the Harmony Python library uses topics from the [Mental Health Catalogue](https://www.cataloguementalhealth.ac.uk/) in the `topics_auto` field. We hope to make this configurable so that you can use your own topic modelling.
+
+## Does Harmony handle response categories? (e.g. Likert scales, "very much so", etc.)
+
+Harmony does not currently support response categories, [but this is on our roadmap](https://github.com/harmonydata/harmony/issues/58).
+
## Does Harmony comply with FAIR data principles?

We have developed Harmony as an open-source and open science initiative, paying attention to the [FAIR Guiding Principles for scientific data management and stewardship](https://www.go-fair.org/fair-principles/) (**F**indability, **A**ccessibility, **I**nteroperability, and **R**euse of digital assets). You can read more on our [FAIR data page](/fair-data/).