Add data set for text analysis

Taken from the MLJText.jl requirements for transformers:

 Generate a vector whose elements are either tokenized documents or bags of words/ngrams. Specifically, each element would be one of the following:
 
- A vector of abstract strings (tokens), e.g., `["I", "like", "Sam", ".", "Sam", "is", "nice", "."]` (scitype `AbstractVector{Textual}`)

- A dictionary of counts, indexed on abstract strings, e.g., `Dict("I"=>1, "Sam"=>2, "Sam is"=>1)` (scitype `Multiset{Textual}`)
 
- A dictionary of counts, indexed on plain ngrams, e.g., `Dict(("I",)=>1, ("Sam",)=>2, ("I", "Sam")=>1)` (scitype `Multiset{<:NTuple{N,Textual} where N}`); here a plain ngram is a tuple of abstract strings.
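As a rough illustration of what such a data set could look like, here is a minimal sketch that builds all three representations from a tiny corpus using only Julia's Base library. The corpus, the naive period-splitting tokenizer, and the helper names `tokenize`, `ngrams`, `bag_of_strings` and `bag_of_ngrams` are hypothetical and not part of MLJText.jl. The sketch does not attach scitypes, which would require ScientificTypes.jl.

```julia
# Hypothetical toy corpus (not a real data set): two short "documents".
const CORPUS = ["I like Sam. Sam is nice.", "Sam likes me."]

# Assumed naive tokenizer: put a space before each period, then split on whitespace.
tokenize(doc::AbstractString) = String.(split(replace(doc, "." => " .")))

# All contiguous n-grams of `tokens`, each returned as a plain tuple of strings.
ngrams(tokens::AbstractVector{<:AbstractString}, n::Integer) =
    [Tuple(tokens[i:i+n-1]) for i in 1:length(tokens)-n+1]

# Representation 2: counts keyed on strings. An n-gram with n > 1 is joined
# with spaces, so ("Sam", "is") is counted under the key "Sam is".
function bag_of_strings(tokens, ns)
    counts = Dict{String,Int}()
    for n in ns, g in ngrams(tokens, n)
        k = join(g, " ")
        counts[k] = get(counts, k, 0) + 1
    end
    return counts
end

# Representation 3: counts keyed on plain n-grams (tuples of strings).
function bag_of_ngrams(tokens, ns)
    counts = Dict{Tuple{Vararg{String}},Int}()
    for n in ns, g in ngrams(tokens, n)
        counts[g] = get(counts, g, 0) + 1
    end
    return counts
end

tokenized  = tokenize.(CORPUS)                         # representation 1
bags       = [bag_of_strings(t, 1:2) for t in tokenized] # unigrams + bigrams
ngram_bags = [bag_of_ngrams(t, 1:2) for t in tokenized]  # unigrams + bigrams
```

With this input, `tokenized[1]` is the token vector from the first bullet, and `bags[1]` contains the entries `"I"=>1`, `"Sam"=>2` and `"Sam is"=>1` from the second bullet's example, alongside the remaining unigrams and bigrams.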

