@@ -1679,30 +1679,30 @@ class OneHotEncoder(JavaEstimator, HasInputCols, HasOutputCols, HasHandleInvalid
at most a single one-value per row that indicates the input category index.
For example with 5 categories, an input value of 2.0 would map to an output vector of
`[0.0, 0.0, 1.0, 0.0]`.
- The last category is not included by default (configurable via :py:attr:`dropLast`),
+ The last category is not included by default (configurable via `dropLast`),
because it makes the vector entries sum up to one, and hence linearly dependent.
So an input value of 4.0 maps to `[0.0, 0.0, 0.0, 0.0]`.
- .. note:: This is different from scikit-learn's OneHotEncoder, which keeps all categories.
- The output vectors are sparse.
+ Note: This is different from scikit-learn's OneHotEncoder, which keeps all categories.
+ The output vectors are sparse.
- When :py:attr:`handleInvalid` is configured to 'keep', an extra "category" indicating invalid
- values is added as last category. So when :py:attr:`dropLast` is true, invalid values are
- encoded as all-zeros vector.
+ When `handleInvalid` is configured to 'keep', an extra "category" indicating invalid values is
+ added as last category. So when `dropLast` is true, invalid values are encoded as all-zeros
+ vector.
- .. note:: When encoding multi-column by using :py:attr:`inputCols` and
- :py:attr:`outputCols` params, input/output cols come in pairs, specified by the order in
- the arrays, and each pair is treated independently.
+ Note: When encoding multi-column by using `inputCols` and `outputCols` params, input/output
+ cols come in pairs, specified by the order in the arrays, and each pair is treated
+ independently.
- .. seealso:: :py:class:`StringIndexer` for converting categorical values into category indices
+ See `StringIndexer` for converting categorical values into category indices
>>> from pyspark.ml.linalg import Vectors
>>> df = spark.createDataFrame([(0.0,), (1.0,), (2.0,)], ["input"])
>>> ohe = OneHotEncoder(inputCols=["input"], outputCols=["output"])
>>> model = ohe.fit(df)
>>> model.transform(df).head().output
SparseVector(2, {0: 1.0})
- >>> ohePath = temp_path + "/ohe"
+ >>> ohePath = temp_path + "/oheEstimator"
>>> ohe.save(ohePath)
>>> loadedOHE = OneHotEncoder.load(ohePath)
>>> loadedOHE.getInputCols() == ohe.getInputCols()
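For anyone checking the docstring's examples by hand, the encoding rules it describes (`dropLast`, and the extra "invalid" category added when `handleInvalid` is 'keep') can be sketched in plain Python. This is only an illustration of the documented behavior, not Spark's implementation. The helper `one_hot` and its parameters `drop_last` and `keep_invalid` are hypothetical names, `None` stands in for an invalid input value, and the result is a dense list rather than a sparse vector.

```python
def one_hot(index, num_categories, drop_last=True, keep_invalid=False):
    """Dense sketch of the documented encoding (hypothetical helper, not the Spark API)."""
    # With keep_invalid, one extra category for invalid values is appended last.
    size = num_categories + (1 if keep_invalid else 0)
    if index is None:  # assumption: None stands in for an invalid input value
        if not keep_invalid:
            raise ValueError("invalid value and keep_invalid is False")
        index = size - 1
    if not 0 <= index < size:
        raise ValueError("category index out of range")
    vec = [0.0] * size
    vec[index] = 1.0
    # Dropping the last category turns that category into the all-zeros vector.
    return vec[:-1] if drop_last else vec


print(one_hot(2, 5))                        # 5 categories, input 2.0
print(one_hot(4, 5))                        # last category
print(one_hot(None, 5, keep_invalid=True))  # invalid value, kept and then dropped
```

The first two calls reproduce the docstring's `[0.0, 0.0, 1.0, 0.0]` and `[0.0, 0.0, 0.0, 0.0]` cases. The third shows why an invalid value encodes as all zeros when both options are on.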