Towards a new Parallels JSON structure #368
Description
At the request of Bhante @sujato, I want to propose a new, easily readable JSON structure for the Parallels.json file.
Methodology
There are four types of parallels, which are specified here: https://suttacentral.net/methodology?lang=en
In addition to these, there are parallels between parts of texts. For instance, the first part of a sutta (A) may be a full parallel to the first part of another sutta (B), while the rest of the two suttas do not match. These two suttas can then either be represented as "resembling parallels" as a whole, or their first parts on their own can be represented as a "full parallel" by using line numbers or paragraph ids to specify a range:
A is a resembling parallel to B and B is a resembling parallel to A
or
A#sc1-#sc10 is a full parallel to B#sc1-#sc10
Currently, the choice between these is sometimes clear, but sometimes a bit arbitrary.
Current JSON structure
The current JSON structure was designed to be concise, which makes it hard to read. It is documented in the wiki: https://github.com/suttacentral/suttacentral/wiki/Parallels-information
As an example, I take MN10: https://suttacentral.net/mn10?view=normal&lang=en
The first part of the parallels in this list is currently represented by:
{
"parallels": [
"dn22",
"ea12.1",
"ma98",
"mn10",
"sht-sutta11",
"~ma31",
"~mn119",
"~t32",
"~ma81"
]
},
The ~ represents a "resembling parallel".
Each of the resembling parallels then also gets an object of its own, because it might be that, for instance, MA31 is a resembling parallel to MN10 but not to MN119, etc. This has to be determined in every case.
{
"parallels": [
"mn141",
"ea27.1",
"ma31",
"t32",
"~dn22#18.1",
"~ea12.1",
"~ma98",
"~mn10",
"~sht-sutta11"
]
},
{
"parallels": [
"mn119",
"ma81",
"~dn22",
"~ea12.1",
"~ma98",
"~mn10",
"~sht-sutta11"
]
},
.... Etc. for each of the resembling parallels.
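To make the reading of the `~` marker concrete, here is a minimal Python sketch that splits one current-format "parallels" list into its full and resembling members. The function name `split_group` is made up for illustration and is not taken from the SuttaCentral code base.

```python
def split_group(refs):
    """Split one current-format "parallels" list.

    Entries without a prefix are full parallels of each other;
    entries prefixed with "~" are resembling parallels.
    """
    full = [ref for ref in refs if not ref.startswith("~")]
    resembling = [ref[1:] for ref in refs if ref.startswith("~")]
    return full, resembling


# The first MN10 group quoted above.
group = [
    "dn22", "ea12.1", "ma98", "mn10", "sht-sutta11",
    "~ma31", "~mn119", "~t32", "~ma81",
]
full, resembling = split_group(group)
print(full)
print(resembling)
```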
Then there are the parallels of parts of the text. For instance for MN 10, paragraph 10.1:
{
"parallels": [
"dn22#5.1",
"mn10#10.1"
]
},
{
"mentions": [
"dn22#5.1",
"ne17#22.1",
"vb7#2.1"
]
},
{
"mentions": [
"mn10#10.1",
"ne17#22.1",
"vb7#2.1"
]
},
In this case, mn10#10.1 is a full parallel of dn22#5.1, and each of these two is mentioned in ne17#22.1 and vb7#2.1.
And the same for the other paragraphs that are mentioned.
New JSON structure proposal
Now I propose a structure for the JSON that is radically different, namely a structure based on each sutta separately in the form of:
[
"suttanr": {
"full": [],
"resembling": [],
"mentions": [],
"retelling": [],
"sections": [
"suttanr#id-#id": {
"full": [],
"resembling": [],
"mentions": [],
"retelling": []
}
]
}
]
Where "sections" represents the part-sutta parallels.
So for MN10 (in full), this would become:
[
"mn10": {
"full": [
"dn22",
"ea12.1",
"ma98",
"sht-sutta11"
],
"resembling": [
"ma31",
"mn119",
"t32",
"ma81"
],
"mentions": [],
"retelling": [],
"sections": [ "mn10#10.1": {
"full": ["dn22#5.1"],
"resembling": [],
"mentions": [
"ne17#22.1",
"vb7#2.1"
],
"retelling": []
},
"mn10#44.1": {
"full": [
"dn22#17.1",
"mn9#14-18.1"
],
"resembling": [],
"mentions": [],
"retelling": []
},
"mn10#47.1": {
"full": [
"dn22#22.24",
"sn47.1#2.1"
],
"resembling": [],
"mentions": ["kv1.9#10.1"],
"retelling": []
}
]
}
]
We could of course remove the empty fields so it becomes:
[
"mn10": {
"full": [
"dn22",
"ea12.1",
"ma98",
"sht-sutta11"
],
"resembling": [
"ma31",
"mn119",
"t32",
"ma81"
],
"sections": [ "mn10#10.1": {
"full": ["dn22#5.1"],
"mentions": [
"ne17#22.1",
"vb7#2.1"
]
},
"mn10#44.1": {
"full": [
"dn22#17.1",
"mn9#14-18.1"
]
},
"mn10#47.1": {
"full": [
"dn22#22.24",
"sn47.1#2.1"
],
"mentions": ["kv1.9#10.1"]
}
]
}
]
And likewise for each and every sutta. So in this case, "dn22" would get its own entry in the same way as well.
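As a rough illustration of how the current groups could be turned into such per-sutta entries, here is a hedged Python sketch. It rests on several assumptions that are mine and not part of the proposal: the helper names `add_relation` and `convert` are hypothetical; "sections" is held as an object (dict) keyed by the section id, because a JSON array cannot contain key/value pairs; a relation is only recorded from the point of view of the un-prefixed members of a group, leaving the reverse direction to the group of the `~` member itself; and the first id of a "mentions" list is taken to be the text that the other ids mention.

```python
import json
from collections import defaultdict


def add_relation(result, subject, kind, target):
    """Record `target` under `kind` for `subject` (a sutta id or a section id)."""
    entry = result[subject.partition("#")[0]]
    if "#" in subject:
        # Part-sutta parallels go under "sections", keyed by the full section id.
        entry = entry.setdefault("sections", {}).setdefault(subject, {})
    targets = entry.setdefault(kind, [])
    if target not in targets:
        targets.append(target)


def convert(groups):
    """Convert current-format groups into per-sutta entries (sketch only)."""
    result = defaultdict(dict)
    for group in groups:
        if "parallels" in group:
            refs = group["parallels"]
            full = [r for r in refs if not r.startswith("~")]
            resembling = [r[1:] for r in refs if r.startswith("~")]
            for subject in full:
                for other in full:
                    if other != subject:
                        add_relation(result, subject, "full", other)
                for other in resembling:
                    add_relation(result, subject, "resembling", other)
        elif "mentions" in group:
            # Assumption: the first id is the mentioned text.
            subject, *others = group["mentions"]
            for other in others:
                add_relation(result, subject, "mentions", other)
    return dict(result)


old = [
    {"parallels": ["dn22", "ea12.1", "ma98", "mn10", "sht-sutta11",
                   "~ma31", "~mn119", "~t32", "~ma81"]},
    {"parallels": ["dn22#5.1", "mn10#10.1"]},
    {"mentions": ["mn10#10.1", "ne17#22.1", "vb7#2.1"]},
]
new = convert(old)
print(json.dumps(new["mn10"], indent=2))
```

With these three groups as input, the "mn10" entry contains the same full, resembling and mn10#10.1 section data as in the example above, with empty fields left out.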
Pros and Cons
The proposed structure is more readable and intuitive, as it closely mirrors what is shown on the website. The old structure is much more concise.
But the new structure also makes it easier for someone adding a parallel to mistakenly add it in only one place. For instance, if a full parallel X is found for mn10, you would need to add it to the entries of "dn22", "ea12.1", "ma98", "mn10" and "sht-sutta11", and check whether it has to be added to "ma31", "mn119", "t32" and "ma81". And X needs to get its own entry as well if it did not have one before.
This also had to be taken into account in the old structure, but if one adds X to the full parallels list "dn22", "ea12.1", "ma98", "mn10", "sht-sutta11", it is automatically a parallel of all of those, without the need to add it 5 times and to make an extra entry for X.
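One way to catch a parallel that was added in only one place would be an automated check. Below is a minimal Python sketch under my own assumptions: the function names are hypothetical, "sections" is assumed to be an object keyed by section id, only "full" relations are checked for a matching back-link, and "x1" is a made-up id standing in for the new parallel X.

```python
def lookup(data, ref):
    """Return the entry for a sutta id or section id, or {} if there is none."""
    entry = data.get(ref.partition("#")[0], {})
    if "#" in ref:
        entry = entry.get("sections", {}).get(ref, {})
    return entry


def iter_entries(data):
    """Yield (id, entry) for every sutta and every section."""
    for uid, entry in data.items():
        yield uid, entry
        yield from entry.get("sections", {}).items()


def missing_full_links(data):
    """List (other, ref) pairs where `ref` names `other` as a full parallel
    but `other` does not name `ref` in return."""
    missing = []
    for ref, entry in iter_entries(data):
        for other in entry.get("full", []):
            if ref not in lookup(data, other).get("full", []):
                missing.append((other, ref))
    return missing


# "x1" was added to mn10 only, so its own entry is still missing.
data = {
    "mn10": {"full": ["dn22", "x1"]},
    "dn22": {"full": ["mn10"]},
}
problems = missing_full_links(data)
print(problems)
```

Such a check could run whenever the file is edited, so that the extra bookkeeping of the per-sutta structure does not depend on the editor remembering every entry.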
Integration into SuttaCentral
A new JSON structure would require the loading code in Python to be updated in the SuttaCentral backend.
So please let me know your ideas and thoughts about this proposal.