petermr edited this page Aug 18, 2020 · 2 revisions

Validating AMI dictionaries

syntax and output

A dictionary is identified by its name (--dictionary) and its location (--directory).

Validation is run through the amidict display subcommand with the --validate option (this is not obvious and we may change it later).

$ amidict --dictionary country \
   --directory /Users/pm286/projects/openVirus/dictionaries/country \
   display --validate
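
How the name and location might combine into a file can be sketched in Java. This is only an illustration under an assumption that the ami3 source does not confirm here: that the dictionary file is <directory>/<dictionary>.<suffix>, with the default suffix xml shown in the options listing. The class and method names are hypothetical and are not part of ami3. The title comparison mirrors the "non-matching title for dictionary / filename base" message that the validator prints.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch; none of these names exist in ami3.
class DictionaryLocation {

    // Assumption: the dictionary file is <directory>/<dictionary>.<suffix>.
    static Path resolve(String directory, String dictionary, String suffix) {
        return Paths.get(directory).resolve(dictionary + "." + suffix);
    }

    // Filename without its final extension, e.g. "bugs" for "bugs.xml".
    static String baseName(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }

    // True when the dictionary title equals the filename base.
    static boolean titleMatchesFile(String title, Path file) {
        return title.equals(baseName(file));
    }

    public static void main(String[] args) {
        Path file = resolve("/Users/pm286/projects/openVirus/dictionaries/country", "country", "xml");
        System.out.println(file);
        System.out.println(titleMatchesFile("country", file));
    }
}
```

With these assumptions, a file named bugs.xml whose title is country would fail the comparison, which is the situation reported in the error messages section.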

output

Generic values (DictionaryDisplayTool)
================================
-v to see generic values

Specific values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@4c1f22f3
--fields            : d        []
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d [country]
--directory         : d /Users/pm286/projects/openVirus/dictionaries/country


Dictionary: null   // PMR, I think this is a software bug

entries: 263       // dictionary has 263 entries
dictionary: null
    Afghanistan
**** Afghanistan: unknown attributes _p297_country wikidataURL wikipediaURL.  // these messages are software bugs, ignore
    Albania
**** Albania: unknown attributes _p297_country wikidataURL wikipediaURL
    Algeria
**** Algeria: unknown attributes _p297_country wikidataURL wikipediaURL
    ....
pm286macbook:ami3 pm286$ 

error messages

The error messages are adequate but not great. The test dictionary bugs.xml contains several deliberate bugs, and the following test validates it:

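	@Test
	public void testDictionaryValidateShowBugs() {
		// dictionary name; the file is bugs.xml in the test dictionary directory
		String dictionary = "bugs";
		File directory = TEST_DICTIONARY;
		// build the same arguments that would be typed on the command line
		String args = ""
				+ " --dictionary " + dictionary
				+ " --directory " + directory
				+ " -vv"             // verbose: also print the generic values
				+ " display"
				+ " --fields id"     // id is absent, so the field listing stays minimal
				+ " --validate";
		AMIDict.execute(args);
	}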

The --fields option lists all occurrences of the given fields. It should be changed to accept "none"; as a workaround, the test asks for id, which no entry has, so the field listing is minimal.

output

Version: 

Generic values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@6e535154
--fields            : m      [id]
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d    [bugs]
--directory         : d /Users/pm286/workspace/cmdev/ami3/src/test/resources/org/contentmine/ami/dictionary

Specific values (DictionaryDisplayTool)
================================
--testString        : d      null
--wikilinks         : d [Lorg.contentmine.ami.tools.AbstractAMIDictTool$WikiLink;@6e535154
--fields            : m      [id]
--files             : d        []
--maxEntries        : d         3
--remote            : d [https://github.com/petermr/dictionary]
--suffix            : d       xml
--validate          : m      true
--help              : d     false
--version           : d     false
--dictionary        : d    [bugs]
--directory         : d /Users/pm286/workspace/cmdev/ami3/src/test/resources/org/contentmine/ami/dictionary
>/Users/pm286/workspace/cmdev/ami3/src/test/resources/org/contentmine/ami/dictionary
***non-matching title for dictionary / filename base: country != bugs*** !!! NOTE ERROR!!!

Dictionary: country

entries: 10
******Unrecognised attribute(s) on entry******                           !!! NOTE ERROR !!!
[term/name/description/id/wikipediaPage/wikipediaURL/wikidataAltLabel]
or starts-with _p<number> or _q<number> or _
<entry foo="bar" _p297_country="AF" description="sovereign state situated at the confluence of Western, Central, and South Asia" name="Afghanistan" term="Afghanistan" wikidataURL="http://www.wikidata.org/entity/Q889" wikipediaURL="https://en.wikipedia.org/wiki/Afghanistan" wikidataID="Q889" wikipediaPage="Afghanistan">
  <synonym>AFG</synonym>
  <synonym>af</synonym>
  <synonym>Islamic Republic of Afghanistan</synonym>
 </entry>
******Unrecognised attribute(s) on entry****** 
[term/name/description/id/wikipediaPage/wikipediaURL/wikidataAltLabel]
or starts-with _p<number> or _q<number> or _
<entry _p297_country="AL" description="sovereign state in Southeast Europe" name="Albania" missingterm="Albania" wikidataURL="http://www.wikidata.org/entity/Q222" wikipediaURL="https://en.wikipedia.org/wiki/Albania" wikidataID="Q222" wikipediaPage="Albania">
  <synonym>ALB</synonym>
  <synonym>al</synonym>
  <synonym>Republic of Albania</synonym>
  <synonym>Republika e Shqipërisë</synonym>
  <synonym>Shqipërisë</synonym>
 </entry>
******Unrecognised attribute(s) on entry****** 
[term/name/description/id/wikipediaPage/wikipediaURL/wikidataAltLabel]
or starts-with _p<number> or _q<number> or _
<entry _p297_country="DZ" description="sovereign state in North Africa" name="Algeria" term="Algeria" wikidata="http://www.wikidata.org/entity/Q262" wikipediaURL="https://en.wikipedia.org/wiki/Algeria" wikidataID="Q262" wikipediaPage="Algeria">
  <synonym>ALG</synonym>
  <synonym>dz</synonym>
  <synonym>People’s Democratic Republic of Algeria</synonym>
 </entry>
******Unrecognised child on entry****** <alternative>ANG</alternative>
******Unrecognised child on entry****** <foo>AQ</foo>
> fieldSummary: 
id: []                                   !!! Kludge to avoid printing fields
Attributes: @id: 0 
dictionary@title: country
======= start validation =======
dictionary@title: country                
======= end validation =======
Desc: Created from SPARQL query
    Afghanistan
    null
    Algeria
    ....
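
The attribute and child rules quoted in the messages above can be sketched in Java. This is a minimal illustration and not the ami3 validator: the allowed attribute names are copied literally from the message, the rule that any name starting with an underscore is accepted follows its last line, and treating synonym as the only permitted child is an assumption inferred from the output (synonym children are not reported, alternative and foo are). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the ami3 implementation.
class EntryCheckSketch {

    // Attribute names copied from the validator message.
    static final Set<String> ALLOWED_ATTRIBUTES = new HashSet<>(Arrays.asList(
            "term", "name", "description", "id",
            "wikipediaPage", "wikipediaURL", "wikidataAltLabel"));

    // Assumption: <synonym> is the only child element the validator accepts.
    static final String ALLOWED_CHILD = "synonym";

    static boolean isRecognisedAttribute(String name) {
        // A leading underscore also covers the _p<number> and _q<number> forms.
        return ALLOWED_ATTRIBUTES.contains(name) || name.startsWith("_");
    }

    // Returns a sorted list of unrecognised attributes and children of one <entry>.
    static List<String> problems(String entryXml) {
        Element entry;
        try {
            entry = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(entryXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse entry", e);
        }
        List<String> problems = new ArrayList<>();
        NamedNodeMap attributes = entry.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            String name = attributes.item(i).getNodeName();
            if (!isRecognisedAttribute(name)) {
                problems.add("attribute: " + name);
            }
        }
        NodeList children = entry.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && !ALLOWED_CHILD.equals(child.getNodeName())) {
                problems.add("child: " + child.getNodeName());
            }
        }
        Collections.sort(problems); // DOM attribute order is not guaranteed
        return problems;
    }

    public static void main(String[] args) {
        String entry = "<entry foo=\"bar\" _p297_country=\"AF\" term=\"Afghanistan\""
                + " wikidataID=\"Q889\"><synonym>AFG</synonym>"
                + "<alternative>ANG</alternative></entry>";
        System.out.println(problems(entry));
    }
}
```

Taken literally, the printed list also rejects wikidataURL and wikidataID, which appear on every flagged entry above, so the sketch reports them too. Whether that is intended or a software bug is not settled here.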
