feedbackfruits-knowledge-engine/thoughts.md at master · feedbackfruits/feedbackfruits-knowledge-engine

Annotate video with captions --> yt-annotator
Merge captions to text --> text-annotator (do we have the captions? get them from api, maybe in engine)
Tag and annotate text --> text-annotator
Cache all the things --> Expose from engine base?
Expose HTTP endpoint --> Again, something with the engine
Call endpoint from frontend --> Should this be aggregated in api? Probably...
Query with all of this info --> api, search-broker

Is the bus just a notification mechanism? It used to be (when it was just quads), but now it's entire docs, with their 'relationships' embedded by id at the least. That allows us to reason not just about the doc but what it has related to it. The bus gives us deduplication of things by key, in this case docs by id. That only works for single docs though, so no graphs. This makes it difficult to combine the labeled quads in Cayley with the bus, because it effectively forces a doc to be present entirely within one label or have data duplication.

And then there is provenance, which tells us how things came to be. https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-spaces/index.html#

Are documents ever in more than one space?

Flatten before sending
Compact before sending
Expand after receiving
Unflatten (?) after receiving? Basically, take the flattened document, get all of the ids for relationships and get all of those documents. Maybe let the engines handle this themselves.
Offer global (derived) measures from engine? (count, centrality) Aggregate in api? Or just build an annotator for it so it can be done on bus-level?

How to store in ES? Index can have multiple types. Filter by type, search by document, relationships with id, basically store flattened compacted docs.

The point of compacting the docs before sending is to use less characters to encode the same information. The point of flattening the docs before sending is to avoid redundancy on the bus. The unflattening is needed because some engines require more context than just a document, i.e. the annotations and tags and their entities.

Some engines might be better off not dealing with JSON-LD. Since the docs are compacted (and expanded before that), it can be verified that everything is the doc is compatible with the context. That would allow docs to call send with plain JSON that can still be contextualized (possible with mapping of "@id" and "@type").

Engines:

Can send and/or receive
Can react to the data they receive
Can send data whenever
Be started and stopped
Have a speed
Have a progress/status
Have side effects

Data:

Is streamed through Kafka
Is stored in Cayley
Is often incomplete (both with quads and docs)
Is sometimes measured globally (count, centrality)
Can be re-contextualized/framed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FilesExpand file tree

thoughts.md

Latest commit

History

thoughts.md

File metadata and controls