This repository serves two purposes:
- Code storage: Contains code for converting Swedish-English dictionaries from The People's Dictionary.
- Dictionary releases: Provides the converted dictionaries in an optimized format.
The People's Dictionary is available in two formats: XML and XDXF. Each format has distinct advantages:
- XML format: Contains comprehensive information but lacks audio download URLs
- XDXF format: Includes better phonetic symbols and audio URLs but has less detailed content
However, both formats are challenging for software to parse directly.
This repository addresses these limitations by:
- Merging data: Combines audio download URLs and phonetic symbols from XDXF with comprehensive data from XML
- Format conversion: Converts the merged dictionary to JSON format for easier processing
- Character normalization: Converts HTML entities (such as those encoding `"` and `'`) to their corresponding characters in the output
Here's a sample dictionary entry to illustrate the converted format:
```json
"jord": {
    "t": [
        "soil",
        "land"
    ],
    "c": "nn",
    "i": [
        "jorden",
        "jordar"
    ],
    "e": [
        {
            "v": "odlad jord",
            "t": "cultivated soil"
        },
        {
            "v": "äga jord",
            "t": "own land"
        },
        {
            "v": "gräva i jorden",
            "t": "work the land"
        }
    ],
    "id": [
        {
            "v": "falla i god jord (\"tas emot med uppskattning\")",
            "t": "fall on fertile ground (\"be received with appreciation\")"
        }
    ],
    "s": [
        {
            "v": "mull",
            "l": "3.3"
        }
    ],
    "d": [
        "mull, mylla; odlat markområde"
    ],
    "p": "jo:r_d",
    "a": "http://lexin.nada.kth.se/sound/jord.mp3"
}
```

| Field | Description |
|---|---|
| `t` | Translation |
| `c` | Class (part of speech) |
| `i` | Inflection |
| `e` | Example |
| `id` | Idiom |
| `s` | Synonym |
| `d` | Definition |
| `p` | Phonetic transcription |
| `a` | Audio URL |
| `v` | Value |
| `l` | Level |

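
Using the field names above, the merging and character-normalization steps could look roughly like the sketch below. It is not the repository's actual conversion code: the helper names `merge_entry` and `unescape_strings` are hypothetical, and it assumes both source formats have already been parsed into plain dictionaries keyed by the short field names.

```python
import html
import json


def merge_entry(xml_entry, xdxf_entry):
    """Combine the detailed XML fields with phonetics and audio from XDXF."""
    merged = dict(xml_entry)  # start from the more comprehensive XML data
    # Assumption: XDXF is the preferred source for "p" (phonetic) and "a" (audio URL).
    for key in ("p", "a"):
        if key in xdxf_entry:
            merged[key] = xdxf_entry[key]
    return merged


def unescape_strings(value):
    """Recursively replace HTML entities in every string of a nested structure."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [unescape_strings(item) for item in value]
    if isinstance(value, dict):
        return {key: unescape_strings(item) for key, item in value.items()}
    return value


# Illustrative inputs only; real entries come from the parsed source files.
xml_entry = {
    "t": ["soil", "land"],
    "c": "nn",
    "id": [{"v": "falla i god jord (&quot;tas emot med uppskattning&quot;)"}],
}
xdxf_entry = {"p": "jo:r_d", "a": "http://lexin.nada.kth.se/sound/jord.mp3"}

entry = unescape_strings(merge_entry(xml_entry, xdxf_entry))
print(json.dumps({"jord": entry}, ensure_ascii=False))
```

`html.unescape` is part of the Python standard library and handles both named and numeric entities, so the idiom above ends up with literal quotation marks in the output.
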
By default, the dictionary is saved as compact JSON (without indentation) to minimize file size. To generate human-readable output, pass `indent=4` to `json.dump()`.
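
As a small illustration of the difference, the sketch below writes the same entry twice with `json.dump()`, once without indentation and once with `indent=4`. The in-memory buffers stand in for real output files, and `ensure_ascii=False` is an assumption made here to keep Swedish characters readable; the repository's own settings may differ.

```python
import io
import json

# Illustrative entry; a real dictionary contains many such keys.
entry = {"jord": {"t": ["soil", "land"], "c": "nn"}}

# Compact output: no indent argument, so everything lands on one line.
compact_buffer = io.StringIO()
json.dump(entry, compact_buffer, ensure_ascii=False)

# Human-readable output: indent=4 puts each item on its own indented line.
pretty_buffer = io.StringIO()
json.dump(entry, pretty_buffer, ensure_ascii=False, indent=4)

compact = compact_buffer.getvalue()
pretty = pretty_buffer.getvalue()

print(compact)
print(pretty)
```

Both strings parse back to the same data with `json.loads()`; only the whitespace, and therefore the file size, differs.
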
We extend our gratitude to The People's Dictionary for their excellent work. All dictionaries released by this repository are derivative works based on The People's Dictionary.
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 Generic License.
