
Commit 30b470a

pazzo83 and storopoli authored
Apply suggestions from code review - spacing
Co-authored-by: Jose Storopoli <[email protected]>
1 parent 5237a88 commit 30b470a

2 files changed: 11 additions and 11 deletions

src/bagofwords_transformer.jl

Lines changed: 7 additions & 7 deletions
@@ -4,21 +4,21 @@
 Convert a collection of raw documents to matrix representing a bag-of-words structure.

 Essentially, a bag-of-words approach to representing documents in a matrix is comprised of
-a count of every word in the document corpus/collection for every document. This is a simple
-but often quite powerful way of representing documents as vectors. The end representation is
+a count of every word in the document corpus/collection for every document. This is a simple
+but often quite powerful way of representing documents as vectors. The resulting representation is
 a matrix with rows representing every document in the corpus and columns representing every word
-in the corpus. The value for each cell is the raw count of a particular word in a particular
+in the corpus. The value for each cell is the raw count of a particular word in a particular
 document.

 Similarly to the `TfidfTransformer`, the vocabulary considered can be restricted
 to words occuring in a maximum or minimum portion of documents.

 The parameters `max_doc_freq` and `min_doc_freq` restrict the vocabulary
-that the transformer will consider. `max_doc_freq` indicates that terms in only
-up to the specified percentage of documents will be considered. For example, if
+that the transformer will consider. `max_doc_freq` indicates that terms in only
+up to the specified percentage of documents will be considered. For example, if
 `max_doc_freq` is set to 0.9, terms that are in more than 90% of documents
-will be removed. Similarly, the `min_doc_freq` parameter restricts terms in the
-other direction. A value of 0.01 means that only terms that are at least in 1% of
+will be removed. Similarly, the `min_doc_freq` parameter restricts terms in the
+other direction. A value of 0.01 means that only terms that are at least in 1% of
 documents will be included.
 """
 mutable struct BagOfWordsTransformer <: AbstractTextTransformer
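
For readers unfamiliar with the representation the docstring above describes, here is a minimal sketch in plain Julia. It is not the package's implementation: the function name `bag_of_words`, its keyword defaults, and the toy corpus are assumptions made for illustration, and only `max_doc_freq` and `min_doc_freq` come from the docstring. It builds the document-by-word count matrix and applies the two document-frequency bounds as the docstring words them (terms in more than `max_doc_freq` of documents are dropped, terms in at least `min_doc_freq` are kept).

```julia
# Hypothetical helper for illustration only; not the package's API.
# Each document is assumed to be already tokenized into a vector of words.
function bag_of_words(docs::Vector{Vector{String}};
                      max_doc_freq::Float64 = 1.0,
                      min_doc_freq::Float64 = 0.0)
    n_docs = length(docs)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Dict{String,Int}()
    for doc in docs, term in unique(doc)
        doc_freq[term] = get(doc_freq, term, 0) + 1
    end

    # Keep terms whose share of documents lies in [min_doc_freq, max_doc_freq].
    vocab = sort([t for (t, df) in doc_freq
                  if min_doc_freq <= df / n_docs <= max_doc_freq])
    index = Dict(t => j for (j, t) in enumerate(vocab))

    # Rows are documents, columns are vocabulary terms, cells are raw counts.
    counts = zeros(Int, n_docs, length(vocab))
    for (i, doc) in enumerate(docs), term in doc
        j = get(index, term, 0)
        j == 0 || (counts[i, j] += 1)   # skip terms filtered out of the vocabulary
    end
    return counts, vocab
end

docs = [["the", "cat", "sat"],
        ["the", "dog", "sat"],
        ["the", "bird", "flew"]]

# "the" occurs in 100% of documents, so max_doc_freq = 0.9 removes it.
counts, vocab = bag_of_words(docs; max_doc_freq = 0.9)
```

With this toy corpus the remaining vocabulary is sorted alphabetically, and each row of `counts` holds the raw word counts for one document.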

src/bm25_transformer.jl

Lines changed: 4 additions & 4 deletions
@@ -26,11 +26,11 @@ For more explanations, please see:
 - https://nlp.stanford.edu/IR-book/html/htmledition/okapi-bm25-a-non-binary-model-1.html

 The parameters `max_doc_freq` and `min_doc_freq` restrict the vocabulary
-that the transformer will consider. `max_doc_freq` indicates that terms in only
-up to the specified percentage of documents will be considered. For example, if
+that the transformer will consider. `max_doc_freq` indicates that terms in only
+up to the specified percentage of documents will be considered. For example, if
 `max_doc_freq` is set to 0.9, terms that are in more than 90% of documents
-will be removed. Similarly, the `min_doc_freq` parameter restricts terms in the
-other direction. A value of 0.01 means that only terms that are at least in 1% of
+will be removed. Similarly, the `min_doc_freq` parameter restricts terms in the
+other direction. A value of 0.01 means that only terms that are at least in 1% of
 documents will be included.
 """
 mutable struct BM25Transformer <: AbstractTextTransformer
