1 file changed: +2 -0 lines changed
"""
BagOfWordsTransformer()
+
Convert a collection of raw documents to a matrix representing a bag-of-words structure.
Essentially, a bag-of-words approach to representing documents in a matrix consists of
a count of every word in the document corpus/collection for every document. This is a simple
but often quite powerful way of representing documents as vectors. The resulting representation is
a matrix with rows representing every document in the corpus and columns representing every word
in the corpus. The value for each cell is the raw count of a particular word in a particular
document.
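
As a rough illustration of the matrix described above, here is a minimal sketch in plain Julia. It is not part of this change and does not use the `BagOfWordsTransformer` API; the helper name `bag_of_words_counts` and the pre-tokenized input are assumptions made only for this example.

```julia
# Hypothetical helper (an assumption for illustration, not the library's code):
# builds a raw count matrix with one row per document and one column per word.
function bag_of_words_counts(docs::Vector{Vector{String}})
    # Sorted vocabulary over the whole corpus, so the column order is deterministic.
    vocab = sort!(unique(reduce(vcat, docs)))
    col = Dict(word => j for (j, word) in enumerate(vocab))
    counts = zeros(Int, length(docs), length(vocab))
    for (i, doc) in enumerate(docs), word in doc
        counts[i, col[word]] += 1   # raw count of `word` in document i
    end
    return vocab, counts
end

docs = [["the", "cat", "sat"], ["the", "cat", "saw", "the", "dog"]]
vocab, counts = bag_of_words_counts(docs)
# vocab  == ["cat", "dog", "sat", "saw", "the"]
# counts == [1 0 1 0 1; 1 1 0 1 2]
```

Each row is one document and each column one vocabulary word, so the `2` in the last column of the second row is the count of "the" in the second document.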
+
Similarly to the `TfidfTransformer`, the vocabulary considered can be restricted
to words occurring in a maximum or minimum portion of documents.
The parameters `max_doc_freq` and `min_doc_freq` restrict the vocabulary