Commit b7abc37

minor update to readme
1 parent b8bbb6a commit b7abc37

1 file changed: +3 -3 lines changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -17,7 +17,7 @@ Currently, we have a TF-IDF Transformer which converts a collection of raw docum
The goal of using TF-IDF instead of the raw frequencies of occurrence of a token in a given document is to scale down the impact of tokens that occur very frequently in a given corpus and that are hence empirically less informative than features that occur in a small fraction of the training corpus.
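
As a rough illustration of the scaling described above, one common smoothed formulation of TF-IDF is sketched here. It is the variant popularized by scikit-learn; whether `TfidfTransformer` uses exactly this smoothing is an assumption, and the symbols $n$ (number of documents), $\mathrm{df}(t)$ (number of documents containing token $t$), and $\mathrm{tf}(t, d)$ (count of $t$ in document $d$) are introduced only for this sketch.

$$
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
$$

Under this form, a token that appears in every document has $\mathrm{df}(t) = n$ and therefore $\mathrm{idf}(t) = 1$, the smallest possible weight, while rarer tokens receive larger weights.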

-### Uses
+### Usage
The TF-IDF Transformer accepts a variety of inputs for the raw documents that one wishes to convert into a TF-IDF matrix.

Raw documents can simply be provided as tokenized documents.
@@ -57,7 +57,7 @@ tfidf_mat = transform(mach, ngram_docs)
## BM25 Transformer
BM25 is an approach similar to that of TF-IDF in terms of representing documents in a vector space. The BM25 scoring function uses both term frequency (TF) and inverse document frequency (IDF) so that, for each term in a document, its relative concentration in the document is scored (like TF-IDF). However, BM25 improves upon TF-IDF by incorporating probability - particularly, the probability that a user will consider a search result relevant based on the terms in the search query and those in each document.
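
For reference, the standard Okapi BM25 weight for a term $t$ in a document $d$ can be sketched as follows. The symbols are assumptions made for this sketch rather than the names used by `BM25Transformer`: $f(t, d)$ is the count of $t$ in $d$, $|d|$ is the document length, $\mathrm{avgdl}$ is the average document length in the corpus, and $k_1$ and $b$ are the usual saturation and length-normalization parameters.

$$
\mathrm{score}(t, d) = \mathrm{idf}(t) \cdot
\frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1 \left(1 - b + b\,\dfrac{|d|}{\mathrm{avgdl}}\right)}
$$

Unlike plain TF-IDF, the term-frequency factor saturates as $f(t, d)$ grows (it approaches $k_1 + 1$), and longer-than-average documents are penalized when $b > 0$.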

-### Uses
+### Usage
This transformer is used in much the same way as the `TfidfTransformer`.

```julia
@@ -92,7 +92,7 @@ Please see [http://ethen8181.github.io/machine-learning/search/bm25_intro.html](
## Bag-of-Words Transformer
The `MLJText` package also offers a way to represent documents using the simpler bag-of-words representation. This returns a document-term matrix (as you would get in `TextAnalysis`) that consists of the count for every word in the corpus for each document in the corpus.
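
To make the document-term matrix concrete, here is a small worked example with a hypothetical two-document corpus, $d_1$ = "the cat sat" and $d_2$ = "the dog sat". The alphabetical vocabulary order and the rows-as-documents orientation are assumptions for this sketch; the actual layout depends on the implementation.

$$
\begin{array}{c|cccc}
 & \text{cat} & \text{dog} & \text{sat} & \text{the} \\
\hline
d_1 & 1 & 0 & 1 & 1 \\
d_2 & 0 & 1 & 1 & 1
\end{array}
$$

Each entry is simply the number of times the column's word occurs in the row's document, with no reweighting by how common the word is across the corpus.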

-### Uses
+### Usage
```julia
using MLJ, MLJText, TextAnalysis
