Commit f44ad05

Merge pull request #27 from JuliaAI/a
Fix a docstring and the whitespace in the docstrings
2 parents 5ab8f37 + 07b6a34 commit f44ad05

3 files changed: +19 -19 lines changed

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 name = "MLJText"
 uuid = "5e27fcf9-6bac-46ba-8580-b5712f3d6387"
 authors = ["Chris Alexander <[email protected]>, Anthony D. Blaom <[email protected]>"]
-version = "0.2.1"
+version = "0.2.2"
 
 [deps]
 CorpusLoaders = "214a0ac2-f95b-54f7-a80b-442ed9c2c9e8"

src/bm25_transformer.jl

Lines changed: 9 additions & 9 deletions
@@ -137,21 +137,21 @@ In MLJ or MLJBase, bind an instance `model` to data with
 
 mach = machine(model, X)
 
-$DOC_IDF
+$DOC_TRANSFORMER_INPUTS
 
 Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider.
-Terms that occur in `> max_doc_freq` documents will not be considered by the
-transformer. For example, if `max_doc_freq` is set to 0.9, terms that are in more than
-90% of the documents will be removed.
+- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider. Terms
+that occur in `> max_doc_freq` documents will not be considered by the transformer. For
+example, if `max_doc_freq` is set to 0.9, terms that are in more than 90% of the
+documents will be removed.
 
-- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider.
-Terms that occur in `< max_doc_freq` documents will not be considered by the
-transformer. A value of 0.01 means that only terms that are at least in 1% of the
-documents will be included.
+- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider. Terms
+that occur in `< max_doc_freq` documents will not be considered by the transformer. A
+value of 0.01 means that only terms that are at least in 1% of the documents will be
+included.
 
 - `κ=2`: The term frequency saturation characteristic. Higher values represent slower
 saturation. What we mean by saturation is the degree to which a term occurring extra

src/count_transformer.jl

Lines changed: 9 additions & 9 deletions
@@ -94,15 +94,15 @@ Train the machine using `fit!(mach, rows=...)`.
 
 # Hyper-parameters
 
-- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider.
-Terms that occur in `> max_doc_freq` documents will not be considered by the
-transformer. For example, if `max_doc_freq` is set to 0.9, terms that are in more than
-90% of the documents will be removed.
-
-- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider.
-Terms that occur in `< max_doc_freq` documents will not be considered by the
-transformer. A value of 0.01 means that only terms that are at least in 1% of the
-documents will be included.
+- `max_doc_freq=1.0`: Restricts the vocabulary that the transformer will consider. Terms
+that occur in `> max_doc_freq` documents will not be considered by the transformer. For
+example, if `max_doc_freq` is set to 0.9, terms that are in more than 90% of the
+documents will be removed.
+
+- `min_doc_freq=0.0`: Restricts the vocabulary that the transformer will consider. Terms
+that occur in `< max_doc_freq` documents will not be considered by the transformer. A
+value of 0.01 means that only terms that are at least in 1% of the documents will be
+included.
 
 # Operations
 
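
As a rough illustration of what the `max_doc_freq` and `min_doc_freq` hyper-parameters in the docstrings above describe, the sketch below filters a vocabulary by document frequency in plain Julia. `filter_vocabulary` is a hypothetical helper, not MLJText's implementation. It assumes both thresholds are proportions of documents, and that the `min_doc_freq` bullet intends `< min_doc_freq` where the docstring text reads `< max_doc_freq`.

```julia
# Hypothetical helper (not part of MLJText): keep the terms whose document
# frequency, i.e. the fraction of documents containing the term, lies in
# the closed interval [min_doc_freq, max_doc_freq].
function filter_vocabulary(docs::Vector{Vector{String}};
                           max_doc_freq::Real=1.0, min_doc_freq::Real=0.0)
    n_docs = length(docs)
    doc_counts = Dict{String,Int}()
    for doc in docs
        # Count each term at most once per document.
        for term in Set(doc)
            doc_counts[term] = get(doc_counts, term, 0) + 1
        end
    end
    kept = [term for (term, c) in doc_counts
            if min_doc_freq <= c / n_docs <= max_doc_freq]
    return sort(kept)
end

# Toy corpus of four tokenized documents (made up for illustration).
docs = [["the", "cat", "sat"],
        ["the", "dog", "sat"],
        ["the", "bird", "flew"],
        ["the", "cat", "ran"]]

# "the" occurs in 100% of documents (above 0.9) and is dropped; terms that
# occur in only one of the four documents (25%, below 0.5) are dropped too.
vocab = filter_vocabulary(docs; max_doc_freq=0.9, min_doc_freq=0.5)
println(vocab)
```

With these thresholds only the terms found in exactly half of the documents, "cat" and "sat", remain in the vocabulary.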
