|
205 | 205 | "id": "xNxJ41MafiB-"
|
206 | 206 | },
|
207 | 207 | "source": [
|
208 |
| - "If your data has a uniform datatype, or `dtype`, it's possible use a pandas DataFrame anywhere you could use a NumPy array. This works because the `pandas.DataFrame` class supports the `__array__` protocol, and TensorFlow's `tf.convert_to_tensor` function accepts objects that support the protocol.\n", |
| 208 | + "If your data has a uniform datatype, or `dtype`, it's possible to use a pandas DataFrame anywhere you could use a NumPy array. This works because the `pandas.DataFrame` class supports the `__array__` protocol, and TensorFlow's `tf.convert_to_tensor` function accepts objects that support the protocol.\n", |
209 | 209 | "\n",
|
210 | 210 | "Take the numeric features from the dataset (skip the categorical features for now):"
|
211 | 211 | ]
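The `__array__` behavior described in this hunk can be sketched with NumPy alone (the column names are illustrative, not the tutorial's full dataset); `tf.convert_to_tensor` accepts the DataFrame through the same protocol:

```python
import numpy as np
import pandas as pd

# A DataFrame with a single uniform dtype (float64 for every column).
df = pd.DataFrame({"age": [63.0, 67.0], "thalach": [150.0, 108.0]})

# Because pandas.DataFrame implements the __array__ protocol, np.asarray
# consumes it directly; tf.convert_to_tensor accepts it the same way.
arr = np.asarray(df)
print(arr.shape, arr.dtype)  # (2, 2) float64
```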
|
|
428 | 428 | "id": "NQcp7kiPF8TP"
|
429 | 429 | },
|
430 | 430 | "source": [
|
431 |
| - "When you start dealing with heterogenous data, it is no longer possible to treat the DataFrame as if it were a single array. TensorFlow tensors require that all elements have the same `dtype`.\n", |
| 431 | + "When you start dealing with heterogeneous data, it is no longer possible to treat the DataFrame as if it were a single array. TensorFlow tensors require that all elements have the same `dtype`.\n", |
432 | 432 | "\n",
|
433 |
| - "So, in this case, you need to start treating it as a dictionary of columns, where each column has a uniform dtype. A DataFrame is a lot like a dictionary of arrays, so typically all you need to do is cast the DataFrame to a Python dict. Many important TensorFlow APIs support (nested-)dictionaries of arrays as inputs." |
| 433 | + "So, in this case, you need to start treating it as a dictionary of columns, where each column has a uniform `dtype`. A DataFrame is a lot like a dictionary of arrays, so typically all you need to do is cast the DataFrame to a Python dict. Many important TensorFlow APIs support (nested-)dictionaries of arrays as inputs." |
434 | 434 | ]
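The "dictionary of columns" idea above amounts to one `dict` call; a minimal sketch with illustrative columns:

```python
import pandas as pd

# A heterogeneous DataFrame: integer, float, and string columns.
df = pd.DataFrame({"age": [63, 67],
                   "oldpeak": [2.3, 1.5],
                   "thal": ["fixed", "normal"]})

# Casting to a plain Python dict yields one column per key, and each
# column has its own uniform dtype.
columns = dict(df)
print({name: str(col.dtype) for name, col in columns.items()})
```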
|
435 | 435 | },
|
436 | 436 | {
|
|
491 | 491 | "source": [
|
492 | 492 | "Typically, Keras models and layers expect a single input tensor, but these classes can accept and return nested structures of dictionaries, tuples and tensors. These structures are known as \"nests\" (refer to the `tf.nest` module for details).\n",
|
493 | 493 | "\n",
|
494 |
| - "There are two equivalent ways you can write a keras model that accepts a dictionary as input." |
| 494 | + "There are two equivalent ways you can write a Keras model that accepts a dictionary as input." |
495 | 495 | ]
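One of those two ways (model subclassing) can be sketched as follows; the feature names and the stack-then-dense body are placeholders, not the tutorial's exact model:

```python
import tensorflow as tf

# A minimal sketch of a subclassed Keras model whose call() accepts a
# dictionary of tensors. Feature names are illustrative.
class DictModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        # Stack the dictionary's values into one [batch, features] tensor.
        stacked = tf.stack(
            [tf.cast(v, tf.float32) for v in inputs.values()], axis=-1)
        return self.dense(stacked)

model = DictModel()
out = model({"age": tf.constant([63.0, 67.0]),
             "thalach": tf.constant([150.0, 108.0])})
print(out.shape)  # (2, 1)
```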
|
496 | 496 | },
|
497 | 497 | {
|
|
545 | 545 | " ])\n",
|
546 | 546 | "\n",
|
547 | 547 | " def adapt(self, inputs):\n",
|
548 |
| - " # Stach the inputs and `adapt` the normalization layer.\n", |
| 548 | + " # Stack the inputs and `adapt` the normalization layer.\n", |
549 | 549 | " inputs = stack_dict(inputs)\n",
|
550 | 550 | " self.normalizer.adapt(inputs)\n",
|
551 | 551 | "\n",
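The `stack_dict` helper called in this hunk is defined elsewhere in the notebook; a plausible implementation, consistent with how it is used here (sorted keys for a deterministic column order, cast to a common float dtype, stack along the last axis), is:

```python
import tensorflow as tf

# Plausible sketch of the stack_dict helper: sort feature names so the
# column order is deterministic, cast each column to float32, and stack
# them into one [batch, num_features] tensor.
def stack_dict(inputs, fun=tf.stack):
    values = []
    for key in sorted(inputs.keys()):
        values.append(tf.cast(inputs[key], tf.float32))
    return fun(values, axis=-1)

features = {"b": tf.constant([1, 2]), "a": tf.constant([3.0, 4.0])}
print(stack_dict(features).shape)  # (2, 2); columns ordered a, b
```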
|
|
728 | 728 | "id": "zYQ5fDaRxRWQ"
|
729 | 729 | },
|
730 | 730 | "source": [
|
731 |
| - "It you're passing a heterogenous `DataFrame` to Keras, each column may need unique preprocessing. You could do this preprocessing directly in the DataFrame, but for a model to work correctly, inputs always need to be preprocessed the same way. So, the best approach is to build the preprocessing into the model. [Keras preprocessing layers](https://www.tensorflow.org/guide/keras/preprocessing_layers) cover many common tasks." |
| 731 | + "If you're passing a heterogeneous DataFrame to Keras, each column may need unique preprocessing. You could do this preprocessing directly in the DataFrame, but for a model to work correctly, inputs always need to be preprocessed the same way. So, the best approach is to build the preprocessing into the model. [Keras preprocessing layers](https://www.tensorflow.org/guide/keras/preprocessing_layers) cover many common tasks." |
732 | 732 | ]
|
733 | 733 | },
|
734 | 734 | {
|
|
746 | 746 | "id": "C6aVQN4Gw-Va"
|
747 | 747 | },
|
748 | 748 | "source": [
|
749 |
| - "In this dataset some of the \"integer\" features in the raw data are actually Categorical indices. These indices are not really ordered numeric values (refer to the <a href=\"https://archive.ics.uci.edu/ml/datasets/heart+Disease\" class=\"external\">the dataset description</a> for details). Because these are unordered they are inapropriate to feed directly to the model; the model would interpret them as being ordered. To use these inputs you'll need to encode them, either as one-hot vectors or embedding vectors. The same applies to string-categorical features.\n", |
| 749 | + "In this dataset, some of the \"integer\" features in the raw data are actually categorical indices. These indices are not really ordered numeric values (refer to the <a href=\"https://archive.ics.uci.edu/ml/datasets/heart+Disease\" class=\"external\">dataset description</a> for details). Because these are unordered, they are inappropriate to feed directly to the model; the model would interpret them as being ordered. To use these inputs you'll need to encode them, either as one-hot vectors or embedding vectors. The same applies to string-categorical features.\n", |
750 | 750 | "\n",
|
751 |
| - "Note: If you have many features that need identical preprocessing it's more efficient to concatenate them together befofre applying the preprocessing.\n", |
| 751 | + "Note: If you have many features that need identical preprocessing it's more efficient to concatenate them together before applying the preprocessing.\n", |
752 | 752 | "\n",
|
753 | 753 | "Binary features, on the other hand, do not generally need to be encoded or normalized.\n",
|
754 | 754 | "\n",
|
|
783 | 783 | "id": "HRcC8WkyamJb"
|
784 | 784 | },
|
785 | 785 | "source": [
|
786 |
| - "The next step is to build a preprocessing model that will apply apropriate preprocessing to each to each input and concatenate the results.\n", |
| 786 | + "The next step is to build a preprocessing model that will apply appropriate preprocessing to each input and concatenate the results.\n", |
787 | 787 | "\n",
|
788 | 788 | "This section uses the [Keras Functional API](https://www.tensorflow.org/guide/keras/functional) to implement the preprocessing. You start by creating one `tf.keras.Input` for each column of the DataFrame:"
|
789 | 789 | ]
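Building one `tf.keras.Input` per column can be sketched like this; the column names and dtypes are assumptions, standing in for the full dataset:

```python
import tensorflow as tf

# Illustrative: one symbolic Input per DataFrame column. Numeric columns
# get float32; string-categorical columns get tf.string.
inputs = {}
for name, dtype in [("age", tf.float32), ("thal", tf.string)]:
    inputs[name] = tf.keras.Input(shape=(1,), name=name, dtype=dtype)
print(sorted(inputs))  # ['age', 'thal']
```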
|
|
926 | 926 | "id": "Z3wcFs1oKVao"
|
927 | 927 | },
|
928 | 928 | "source": [
|
929 |
| - "To use categorical features you'll first need to encode them into either binary vectors or embeddings. Since these features only contain a small number of categories, convert the inputs directly to one-hot vectors using the `output_mode='one_hot'` option, supported byy both the `tf.keras.layers.StringLookup` and `tf.keras.layers.IntegerLookup` layers.\n", |
| 929 | + "To use categorical features you'll first need to encode them into either binary vectors or embeddings. Since these features only contain a small number of categories, convert the inputs directly to one-hot vectors using the `output_mode='one_hot'` option, supported by both the `tf.keras.layers.StringLookup` and `tf.keras.layers.IntegerLookup` layers.\n", |
930 | 930 | "\n",
|
931 | 931 | "Here is an example of how these layers work:"
|
932 | 932 | ]
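Outside the notebook, the one-hot behavior of a lookup layer can be sketched as follows (the vocabulary is illustrative):

```python
import tensorflow as tf

# StringLookup with output_mode='one_hot'. Index 0 is reserved for
# out-of-vocabulary tokens, so the output width is len(vocabulary) + 1.
lookup = tf.keras.layers.StringLookup(
    vocabulary=["fixed", "normal", "reversible"], output_mode="one_hot")
encoded = lookup(tf.constant(["normal", "fixed"]))
print(encoded.shape)  # (2, 4)
```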
|
|