Commit 28ce37c

Author: Robert Kruszewski
Revert "[SPARK-26133][ML][FOLLOWUP] Fix doc for OneHotEncoder"
This reverts commit 169d9ad.
1 parent e595072 commit 28ce37c

1 file changed: +11 -11 lines changed

python/pyspark/ml/feature.py

Lines changed: 11 additions & 11 deletions
@@ -1679,30 +1679,30 @@ class OneHotEncoder(JavaEstimator, HasInputCols, HasOutputCols, HasHandleInvalid
  at most a single one-value per row that indicates the input category index.
  For example with 5 categories, an input value of 2.0 would map to an output vector of
  `[0.0, 0.0, 1.0, 0.0]`.
- The last category is not included by default (configurable via :py:attr:`dropLast`),
+ The last category is not included by default (configurable via `dropLast`),
  because it makes the vector entries sum up to one, and hence linearly dependent.
  So an input value of 4.0 maps to `[0.0, 0.0, 0.0, 0.0]`.

- .. note:: This is different from scikit-learn's OneHotEncoder, which keeps all categories.
- The output vectors are sparse.
+ Note: This is different from scikit-learn's OneHotEncoder, which keeps all categories.
+ The output vectors are sparse.

- When :py:attr:`handleInvalid` is configured to 'keep', an extra "category" indicating invalid
- values is added as last category. So when :py:attr:`dropLast` is true, invalid values are
- encoded as all-zeros vector.
+ When `handleInvalid` is configured to 'keep', an extra "category" indicating invalid values is
+ added as last category. So when `dropLast` is true, invalid values are encoded as all-zeros
+ vector.

- .. note:: When encoding multi-column by using :py:attr:`inputCols` and
- :py:attr:`outputCols` params, input/output cols come in pairs, specified by the order in
- the arrays, and each pair is treated independently.
+ Note: When encoding multi-column by using `inputCols` and `outputCols` params, input/output
+ cols come in pairs, specified by the order in the arrays, and each pair is treated
+ independently.

- .. seealso:: :py:class:`StringIndexer` for converting categorical values into category indices
+ See `StringIndexer` for converting categorical values into category indices

  >>> from pyspark.ml.linalg import Vectors
  >>> df = spark.createDataFrame([(0.0,), (1.0,), (2.0,)], ["input"])
  >>> ohe = OneHotEncoder(inputCols=["input"], outputCols=["output"])
  >>> model = ohe.fit(df)
  >>> model.transform(df).head().output
  SparseVector(2, {0: 1.0})
- >>> ohePath = temp_path + "/ohe"
+ >>> ohePath = temp_path + "/oheEstimator"
  >>> ohe.save(ohePath)
  >>> loadedOHE = OneHotEncoder.load(ohePath)
  >>> loadedOHE.getInputCols() == ohe.getInputCols()
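
For readers of the docstring in this diff, the `dropLast` and `handleInvalid='keep'` behaviour it describes can be sketched in plain Python. This is only an illustration of the documented semantics, not Spark's implementation: the function `one_hot` and its parameters `num_categories`, `drop_last`, and `keep_invalid` are hypothetical names, and it returns a dense list where Spark would return a sparse vector.

```python
def one_hot(index, num_categories, drop_last=True, keep_invalid=False):
    """Hypothetical helper mirroring the docstring's description (not Spark code)."""
    # With handleInvalid='keep', an extra "invalid" category is added as the last one.
    size = num_categories + 1 if keep_invalid else num_categories

    if not (0 <= index < num_categories):
        if not keep_invalid:
            raise ValueError("invalid category index: %r" % (index,))
        index = size - 1  # invalid values map to the extra last category

    # With dropLast, the last category has no slot and encodes as all zeros.
    if drop_last:
        size -= 1

    vec = [0.0] * size
    if index < size:
        vec[int(index)] = 1.0
    return vec


print(one_hot(2.0, 5))                     # category 2 of 5, last one dropped
print(one_hot(4.0, 5))                     # the dropped last category
print(one_hot(9.0, 5, keep_invalid=True))  # invalid value with 'keep' and dropLast
```

Under these assumptions the first two calls reproduce the vectors quoted in the docstring, and the third shows why an invalid value becomes an all-zeros vector when `dropLast` is true: the extra "invalid" category is itself the last category, so it is the one that gets dropped.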
