[DevDoc] Notes on the API implementation
Some rough, outdated, to-be-reviewed notes of mine (MB) regarding the way the KnetMiner API is implemented. You can get a decent dev-level intro of our code from here, especially if you open the mentioned components.
- Request hub, gets DS, "mode" (ie, name of API call) and general params
- And then dispatches to handleRaw()
- TODO
- Invokes `DS.$method`, having got the method from mode
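The mode-to-method dispatch described above can be sketched with plain Java reflection. This is only an illustration, assuming the method is looked up by name: `DispatchSketch`, `DemoDataSource` and the `Map`-based parameter type are hypothetical and not taken from the KnetMiner code.

```java
import java.lang.reflect.Method;
import java.util.Map;

class DispatchSketch {
    // Hypothetical stand-in for the data source (DS); the real class has other methods and signatures.
    public static class DemoDataSource {
        public String countHits(Map<String, String> params) {
            return "countHits for keyword=" + params.get("keyword");
        }
    }

    // Resolves the API "mode" to a same-named public method of the data source and invokes it.
    static Object dispatch(Object dataSource, String mode, Map<String, String> params) {
        try {
            Method method = dataSource.getClass().getMethod(mode, Map.class);
            return method.invoke(dataSource, params);
        } catch (NoSuchMethodException ex) {
            // An unknown mode means the client asked for a non-existing API call
            throw new IllegalArgumentException("Unknown API call: " + mode, ex);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("API call failed: " + mode, ex);
        }
    }
}
```

Looking the method up by name keeps the hub free of per-call branching, at the cost of losing compile-time checks on the available calls.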
- Searches synonyms, using `UIService.renderSynonymTable()`
  - Uses `searchService.searchTopConceptsByName()` to get relevant concepts
    - Uses `luceneMgr.searchTopConceptsByIdxField()`
    - Prepares a table where, for each keyword, there is an entry conceptName, conceptType, conceptId
@param keyword

- `DS.countHits()`
  - `new SemanticMotifSearchMgr( keyword )`, assuming `keyword && ! geneList`
    - `luceneConcepts: Map<Concept -> Score>`: `SearchService.searchGeneRelatedConcepts ()`
      - Split keyword into list, get 'not query'
      - `notList = this.searchTopConceptsByName()` if necessary
      - Populates `hit2score (Concept->Score)` with a series of Lucene searches, involving `keyword` (search string) and `notList`
    - `countLinkGenes()`
      - Uses `luceneConcepts` and `SM.concepts2Genes` to count SM-linked concepts (`luceneDocumentsLinked`) and matched unique genes (`numConnectedGenes`)
  - Puts SMSearchMgr counts into the response
- `DataService.getLociGeneCount()` to count the loci in the request's QTL
  - Used in the genome regions input
@param keyword, list, listMode, qtl

- `DS.genome()`, prepares `GenomeResponse`, calls `DS._keyword()`
  - Extracts the `userGenes`, using `KGUtils.filterGenesByAccessionKeywords()`
    - This turns the list into genes, using 1) searches over accessions and names and 2) filter on taxId
    - Probably not to be filtered with user taxId (check it's valid and configured)
  - Adds `qtl` to `userGenes`, using genome regions, via `KGUtils.fetchQTLs ( ONDEXGraph graph, List<String> taxIds, List<String> qtlsStr )`
    - `QTL.fromStringList ( qtlsStr )` to build QTL region structures
    - Then double loop over all regions and all genes in the graph
  - `smSearchMgr = new SemanticMotifSearchMgr ( searchString, genes )`
    - As said above, searches concepts based on keywords and scores them
  - `candidateGenesMap = smSearchMgr.getSortedGeneCandidates() # Map<Concept->Score>`
    This is based on `SemanticMotifsSearchResult.getScoredGenes ( Lucene-scored concepts )`, which works like:
    - From Lucene-hit concepts, compute `gene2HitConcepts`, ie, a subfilter over gene->concepts map (coming from sem motifs)
    - use `gene2HitConcepts` to compute knet scores for each gene => `scoredGeneCandidates: Map<Gene -> KnetScore>`
    - return gene -> score result, ranked by score and with a filter over (unlikely) duplicated genes
  - Then, this is (possibly) filtered using user genes + QTL genes
  - Finally, we have `genesMap` and `genes`
  - Next is the chromosome view
    - what to do with the multi-species case?
  - Next is `exportService.exportGeneTable()`
  - Next is `exportService.exportEvidenceTable()`
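The "double loop over all regions and all genes" step might look like the sketch below. `Region`, `Gene` and their fields are hypothetical simplifications of the real QTL and ONDEX structures, and the full-containment test is an assumption (the actual code may use overlap instead, and also filters by taxId).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class QtlRegionSketch {
    // Hypothetical, simplified genome region (a stand-in for the parsed QTL)
    static final class Region {
        final String chromosome; final int start; final int end;
        Region(String chromosome, int start, int end) {
            this.chromosome = chromosome; this.start = start; this.end = end;
        }
    }

    // Hypothetical, simplified gene with its chromosome coordinates
    static final class Gene {
        final String id; final String chromosome; final int begin; final int end;
        Gene(String id, String chromosome, int begin, int end) {
            this.id = id; this.chromosome = chromosome; this.begin = begin; this.end = end;
        }
    }

    // Naive O(regions x genes) scan, collecting the IDs of genes that fall inside any region
    static Set<String> genesInRegions(List<Region> regions, List<Gene> genes) {
        Set<String> result = new LinkedHashSet<>();
        for (Region region : regions) {
            for (Gene gene : genes) {
                boolean sameChromosome = region.chromosome.equals(gene.chromosome);
                // ASSUMPTION: a gene matches when it is fully contained in the region
                if (sameChromosome && gene.begin >= region.start && gene.end <= region.end) {
                    result.add(gene.id);
                }
            }
        }
        return result;
    }
}
```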
- Does the same gene filtering as `_keyword()`
- `ondexServiceProvider.getSemanticMotifService ().findSemanticMotifs( keyword, seed (genes) )`
  - `Map<ONDEXConcept, Float> luceneResults = searchService.searchGeneRelatedConcepts ( keyword, seed, false )`
  - Then, `semanticMotifDataService.getGraphTraverser ()` with the seed genes => `Map<ONDEXConcept, List<EvidencePathNode>> results`
  - Splits the search string into actual keywords (`SearchUtils.getSearchWords()`)
    - get a colour map for them (`UIUtils.createHilightColorMap()`)
  - Uses the found paths to create the network view graph
    - highlights paths and node labels based on the search keywords
- General info on the current dataset
  - Served by `DatasetInfo DatasetInfoService.datasetInfo()`
  - Mostly based on the dataset section in the config YAML
- Gets per-type topological information. Used by the 'Release notes' button
  - Served by `DatasetInfoService.networkStats()`
  - Based on the JSON file produced by `KnetMinerInitializer.exportGraphStats()`
    - which mostly gets data from the Semantic Motif summary data
- Served by `DatasetInfoService.knetSpaceURL()`
  - Using a dedicated config variable
- @param keyword, used to extract an `evidenceOndexId`
- list: usual gene list (except QTL)
- Similar to /network, see #631
- No longer used, removed
- Replaced by `/dataset-info/network-stats`, see #657
- Fetches stats on the whole dataset,
  - which were computed by `ExportService.exportGraphStats()`
    - which was invoked by `OSP.initData()`
- Searches genes based on user input (uses `KGUtils.filterGenesByAccessionKeywords()` as above)
- Adds genes in QTL regions, as above
- Finds sem motifs and builds the subgraph
- exports the subgraph to JSON
- puts counts into the response
- WTH?!?!?!?
- No longer used, removed
- Prepares data to perform a network view request
- Then forwards to genepage.jsp (via MVC)
- which will know how to invoke /network
- We moved it to the client, where it belongs
- Works similarly to genepage above
- Replaced by `/dataset-info/knetspace-url`.
- returns the KnetSpace host, set in the config.
- Replaced by `/dataset-info`.
- Some general info. Very rubbish format, it puts JSON into a string, instead of the usual fields in the response class. The taxIds overwrite each other:
```java
summaryJSON.put("dbVersion", dataService.getDatasetVersion () );
summaryJSON.put("sourceOrganization", dataService.getDatasetOrganization ());
dataService.getTaxIds ().forEach( taxID -> {
  summaryJSON.put("speciesTaxid", taxID);
});
summaryJSON.put("speciesName", dataService.getSpecies());

// TODO: in future, this might come from OXL metadata (the graph descriptor)
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm");
var timestampStr = formatter.format ( oxlFile.lastModified () );
summaryJSON.put("dbDateCreated", timestampStr);
summaryJSON.put("provider", dataService.getDatasetProvider () );

String jsonString = summaryJSON.toString();
// Removing the pesky double quotes
jsonString = jsonString.substring(1, jsonString.length() - 1);
log.info("response.dataSource= " + jsonString);
response.dataSource = jsonString;
```
- It's used by `save-knet.js`, for `exportAsJson()`. This is very messy
- It's also used in `showNetworkStats.js::fetchStats()`, but only `dbVersion` is fetched from the API output
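To make the taxId overwrite concrete, here is a minimal sketch that uses a plain `Map` as a stand-in for the JSON object. `TaxIdOverwriteSketch` and the `speciesTaxids` key are hypothetical, and the list-valued key is just one possible fix, not what the code actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TaxIdOverwriteSketch {
    // Mirrors the loop in the legacy code: every iteration writes the same key,
    // so only the last taxId survives.
    static Map<String, Object> overwriting(List<String> taxIds) {
        Map<String, Object> summary = new LinkedHashMap<>();
        taxIds.forEach(taxId -> summary.put("speciesTaxid", taxId));
        return summary;
    }

    // One possible fix: keep all the taxIds under a single list-valued key
    // (the key name is an assumption made for this example).
    static Map<String, Object> collecting(List<String> taxIds) {
        Map<String, Object> summary = new LinkedHashMap<>();
        summary.put("speciesTaxids", new ArrayList<>(taxIds));
        return summary;
    }
}
```

With two configured taxIds, `overwriting()` yields a single `speciesTaxid` entry holding the second one, while `collecting()` keeps both.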
Map<ONDEXConcept, Float> scoredConcepts: the keyword-related concepts, got from Lucene
- Based on
SearchService.searchGeneRelatedConcepts()(see below)
- Based on
-
SemanticMotifsSearchResult searchResult-
Uses SearchService.getScoredGenes ( scoredConcepts, this.taxId )(see below)
-
- Counts concepts in `scoredConcepts`, just using its size
- Counts the genes linked to `scoredConcepts`
  - For each concept:
    - Get genes in `concept2Genes.get ( concept )`
    - Filter by taxId
  - Eventually, count
Case there is only a gene list:
(gene list is normalised)
for each gene in gene list: add genes2Concepts ( gene ) to the result, with score = 1
Case with keyword
- get the notQuery expression from keywords
- Search concepts via Lucene, using keywords
`Map<Integer, Set<Integer>> gene2HitConcepts`

- For each concept in `scoredConcepts`:
  - add `concept2Genes.get ( concept )` to result
  - possibly, filter by taxId
- Then, group by gene
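The inversion from scored concepts to `gene2HitConcepts` can be sketched as below, working on integer IDs as in the map signature above. `Gene2HitConceptsSketch` and the `allowedGeneIds` parameter are hypothetical; the latter is only a stand-in for the taxId filter.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Gene2HitConceptsSketch {
    // Inverts concept -> genes into gene -> hit concepts, restricted to the scored concepts.
    // allowedGeneIds is a hypothetical stand-in for the taxId filter; null means "no filter".
    static Map<Integer, Set<Integer>> groupByGene(
        Set<Integer> scoredConceptIds,
        Map<Integer, Set<Integer>> concepts2Genes,
        Set<Integer> allowedGeneIds)
    {
        Map<Integer, Set<Integer>> gene2HitConcepts = new HashMap<>();
        for (Integer conceptId : scoredConceptIds) {
            // Concepts with no semantic-motif link to any gene contribute nothing
            for (Integer geneId : concepts2Genes.getOrDefault(conceptId, Set.of())) {
                if (allowedGeneIds != null && !allowedGeneIds.contains(geneId)) continue;
                gene2HitConcepts.computeIfAbsent(geneId, k -> new HashSet<>()).add(conceptId);
            }
        }
        return gene2HitConcepts;
    }
}
```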
`Map<ONDEXConcept, Double> scoredGeneCandidates`

- for each gene in `gene2HitConcepts`:
  - for concept in `gene2HitConcepts.get ( gene )`
    - `luceneScore = scoredEvidenceConcepts.get ( concept )`
    - `igf = log ( genesCount / concepts2Gene.get ( concept ).size () )`
    - `invGraphDist = 1 / genes2PathLens.get ( gene, concept )`
    - `knetScore` = the three above combined
  - Sum of `knetScore` for each concept is `knetScore ( gene )`
- `scoredGeneCandidates` are sorted
- The final `SemanticMotifsSearchResult` result contains:
  - `geneId2RelatedConceptIds = gene2HitConcepts`
  - `gene2Score = sorted scoredGeneCandidates`
- `genesCount` is the total no of genes in the traverser seed, which belong to one of the configured species
  - In Neo4j: needs to be stored?
- `concepts2Gene.get ( concept ).size ()`, needs to be stored in Neo4j?
- `genes2PathLens.get ( gene, concept )` in Neo4j, is in the gene/concept link
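A rough sketch of the per-gene score follows. The notes only say the three factors are "combined", so multiplying them is an assumption here, as are the class name and the use of integer concept IDs with pre-extracted per-concept maps.

```java
import java.util.Map;
import java.util.Set;

class KnetScoreSketch {
    // Computes the score of one gene from its hit concepts.
    // conceptGeneCounts stands for concepts2Gene.get ( concept ).size (),
    // pathLengths for genes2PathLens.get ( gene, concept ) of this gene.
    static double knetScore(
        Set<Integer> hitConcepts,
        Map<Integer, Float> luceneScores,
        Map<Integer, Integer> conceptGeneCounts,
        Map<Integer, Integer> pathLengths,
        int genesCount)
    {
        double total = 0;
        for (int conceptId : hitConcepts) {
            double luceneScore = luceneScores.get(conceptId);
            // Inverse gene frequency: concepts linked to few genes weigh more
            double igf = Math.log((double) genesCount / conceptGeneCounts.get(conceptId));
            // Concepts closer to the gene in the graph weigh more
            double invGraphDist = 1.0 / pathLengths.get(conceptId);
            // ASSUMPTION: the three factors are combined by multiplication
            total += luceneScore * igf * invGraphDist;
        }
        return total;
    }
}
```

For example, with 100 genes in total, a concept with Lucene score 2, linked to 10 genes and at path length 2 contributes 2 × ln(10) × 0.5 = ln(10) to the gene's score.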
Params:
* List<ONDEXConcept> candidateGenes
* Set<ONDEXConcept> userGenes
* List<String> userQtlsStr
* String listMode
* SemanticMotifsSearchResult searchResult
- Best name function in ondex
- The gene's evidences are got from `searchResult.getGeneId2RelatedConceptIds()`
- The gene score is got from `searchResult.getGene2Score ()`
- The graph distances are got from `genes2PathLengths` (SemMotif summaries)
  - In Neo4j, gene/concept links
Params:
* String keywords // To be removed, not used
* Map<ONDEXConcept, Float> foundConcepts
* Set<ONDEXConcept> userGenes
* List<String> userQtlsStr
* boolean doSortResult
- score is summed up for each evidence concept using `foundConcepts`
- For each concept, `conceptGenes` are fetched from `concepts2Genes`
  - `startGenesSize` = `conceptGenes.size()`
- For each gene in `conceptGenes`:
  - `matchedInGeneList++` if the gene is in `userGenes`
- At the end:
  - `notMatchedInGeneList = userGenes.size - matchedInGeneList`
  - `matchedNotInGeneList = startGenesSize - matchedInGeneList`
  - `notMatchedNotInGeneList = genes2Concepts.size - matchedNotInGeneList - matchedInGeneList - notMatchedInGeneList`
  - These are used for Fisher test, from which pvalue is computed
- returns the found concept
- returns the concept score as Lucene score
- returns pvalue as computed above (ie, Fisher test)
- returns `startGenesSize` (the no of SM genes associated to the concept)
- returns the matching user genes
- Sorts by pvalue, score and others
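The contingency counts above, and a p-value derived from them, can be sketched as follows. The class and method names are hypothetical, and the right-tailed (enrichment) Fisher test is an assumption, since the notes do not say which variant the code uses. Genes are represented by integer IDs.

```java
import java.util.Set;

class EvidenceFisherSketch {
    // Returns { matchedInGeneList, matchedNotInGeneList, notMatchedInGeneList, notMatchedNotInGeneList }.
    // totalGenes plays the role of genes2Concepts.size in the notes.
    static int[] contingencyTable(Set<Integer> conceptGenes, Set<Integer> userGenes, int totalGenes) {
        int matchedInGeneList = 0;
        for (Integer gene : conceptGenes) {
            if (userGenes.contains(gene)) matchedInGeneList++;
        }
        int notMatchedInGeneList = userGenes.size() - matchedInGeneList;
        int matchedNotInGeneList = conceptGenes.size() - matchedInGeneList;
        int notMatchedNotInGeneList =
            totalGenes - matchedNotInGeneList - matchedInGeneList - notMatchedInGeneList;
        return new int[] {
            matchedInGeneList, matchedNotInGeneList, notMatchedInGeneList, notMatchedNotInGeneList };
    }

    static double logFactorial(int n) {
        double sum = 0;
        for (int i = 2; i <= n; i++) sum += Math.log(i);
        return sum;
    }

    static double logChoose(int n, int k) {
        return logFactorial(n) - logFactorial(k) - logFactorial(n - k);
    }

    // ASSUMPTION: right-tailed Fisher exact test on the 2x2 table [[a, b], [c, d]],
    // ie, the hypergeometric probability of seeing a or more matches in the first cell.
    static double rightTailPValue(int a, int b, int c, int d) {
        int row1 = a + b, col1 = a + c, n = a + b + c + d;
        double pValue = 0;
        for (int k = a; k <= Math.min(row1, col1); k++) {
            pValue += Math.exp(
                logChoose(col1, k) + logChoose(n - col1, row1 - k) - logChoose(n, row1));
        }
        return pValue;
    }
}
```

The four counts map to the table cells in the order returned by `contingencyTable()`, so the p-value for a concept would be `rightTailPValue(t[0], t[1], t[2], t[3])`.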