Commit 0d35d0d

Replace feature extractor by image processor (#1510)
* Replace feature extractor
* More improvements
1 parent e9fe1c7 commit 0d35d0d

11 files changed (+77, -78 lines)

fine-tune-segformer.md

Lines changed: 13 additions & 13 deletions
@@ -197,31 +197,31 @@ label2id = {v: k for k, v in id2label.items()}
 num_labels = len(id2label)
 ```

-## Feature extractor & data augmentation
+## Image processor & data augmentation

-A SegFormer model expects the input to be of a certain shape. To transform our training data to match the expected shape, we can use `SegFormerFeatureExtractor`. We could use the `ds.map` function to apply the feature extractor to the whole training dataset in advance, but this can take up a lot of disk space. Instead, we'll use a *transform*, which will only prepare a batch of data when that data is actually used (on-the-fly). This way, we can start training without waiting for further data preprocessing.
+A SegFormer model expects the input to be of a certain shape. To transform our training data to match the expected shape, we can use `SegFormerImageProcessor`. We could use the `ds.map` function to apply the image processor to the whole training dataset in advance, but this can take up a lot of disk space. Instead, we'll use a *transform*, which will only prepare a batch of data when that data is actually used (on-the-fly). This way, we can start training without waiting for further data preprocessing.

 In our transform, we'll also define some data augmentations to make our model more resilient to different lighting conditions. We'll use the [`ColorJitter`](https://pytorch.org/vision/main/generated/torchvision.transforms.ColorJitter.html) function from `torchvision` to randomly change the brightness, contrast, saturation, and hue of the images in the batch.


 ```python
 from torchvision.transforms import ColorJitter
-from transformers import SegformerFeatureExtractor
+from transformers import SegformerImageProcessor

-feature_extractor = SegformerFeatureExtractor()
+processor = SegformerImageProcessor()
 jitter = ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1)

 def train_transforms(example_batch):
     images = [jitter(x) for x in example_batch['pixel_values']]
     labels = [x for x in example_batch['label']]
-    inputs = feature_extractor(images, labels)
+    inputs = processor(images, labels)
     return inputs


 def val_transforms(example_batch):
     images = [x for x in example_batch['pixel_values']]
     labels = [x for x in example_batch['label']]
-    inputs = feature_extractor(images, labels)
+    inputs = processor(images, labels)
     return inputs


@@ -324,7 +324,7 @@ def compute_metrics(eval_pred):
         references=labels,
         num_labels=len(id2label),
         ignore_index=0,
-        reduce_labels=feature_extractor.do_reduce_labels,
+        reduce_labels=processor.do_reduce_labels,
     )

     # add per category metrics as individual key-value pairs
@@ -359,7 +359,7 @@ Now that our trainer is set up, training is as simple as calling the `train` fun
 trainer.train()
 ```

-When we're done with training, we can push our fine-tuned model and the feature extractor to the Hub.
+When we're done with training, we can push our fine-tuned model and the image processor to the Hub.

 This will also automatically create a model card with our results. We'll supply some extra information in `kwargs` to make the model card more complete.

@@ -371,7 +371,7 @@ kwargs = {
     "dataset": hf_dataset_identifier,
 }

-feature_extractor.push_to_hub(hub_model_id)
+processor.push_to_hub(hub_model_id)
 trainer.push_to_hub(**kwargs)
 ```

@@ -396,9 +396,9 @@ However, you can also try out your model directly on the Hugging Face Hub, thank
 We'll first load the model from the Hub using `SegformerForSemanticSegmentation.from_pretrained()`.

 ```python
-from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
+from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

-feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
+processor = SegformerImageProcessor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
 model = SegformerForSemanticSegmentation.from_pretrained(f"{hf_username}/{hub_model_id}")
 ```

@@ -411,15 +411,15 @@ gt_seg = test_ds[0]['label']
 image
 ```

-To segment this test image, we first need to prepare the image using the feature extractor. Then we forward it through the model.
+To segment this test image, we first need to prepare the image using the image processor. Then we forward it through the model.

 We also need to remember to upscale the output logits to the original image size. In order to get the actual category predictions, we just have to apply an `argmax` on the logits.


 ```python
 from torch import nn

-inputs = feature_extractor(images=image, return_tensors="pt")
+inputs = processor(images=image, return_tensors="pt")
 outputs = model(**inputs)
 logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)
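
Not part of the commit, but for readers tracing the hunks above: a minimal, self-contained sketch of what the updated inference path looks like with `SegformerImageProcessor`, including the logit-upscaling step the post describes but whose code sits outside these hunks. The random image and the public `nvidia/segformer-b0-finetuned-ade-512-512` checkpoint are stand-ins so the snippet runs on its own; they are not taken from the post.

```python
import numpy as np
import torch
from torch import nn
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = SegformerImageProcessor.from_pretrained(ckpt)
model = SegformerForSemanticSegmentation.from_pretrained(ckpt)

# Dummy RGB image standing in for the post's test image.
image = Image.fromarray(np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8))

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SegFormer logits come out at 1/4 of the input resolution, so upscale them
# to the original (height, width) before taking the per-pixel argmax.
upsampled = nn.functional.interpolate(
    outputs.logits,
    size=image.size[::-1],  # PIL size is (width, height)
    mode="bilinear",
    align_corners=False,
)
pred_seg = upsampled.argmax(dim=1)[0]  # (height, width) tensor of class ids
```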

fine-tune-vit.md

Lines changed: 11 additions & 12 deletions
@@ -171,29 +171,28 @@ From what I'm seeing,
 - Bean Rust: Has circular brown spots surrounded with a white-ish yellow ring
 - Healthy: ...looks healthy. 🤷‍♂️

-## Loading ViT Feature Extractor
+## Loading ViT Image Processor

 Now we know what our images look like and better understand the problem we're trying to solve. Let's see how we can prepare these images for our model!

 When ViT models are trained, specific transformations are applied to images fed into them. Use the wrong transformations on your image, and the model won't understand what it's seeing! 🖼 ➡️ 🔢

-To make sure we apply the correct transformations, we will use a [`ViTFeatureExtractor`](https://huggingface.co/docs/transformers/model_doc/vit#transformers.ViTFeatureExtractor) initialized with a configuration that was saved along with the pretrained model we plan to use. In our case, we'll be using the [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) model, so let's load its feature extractor from the Hugging Face Hub.
+To make sure we apply the correct transformations, we will use a [`ViTImageProcessor`](https://huggingface.co/docs/transformers/model_doc/vit#transformers.ViTImageProcessor) initialized with a configuration that was saved along with the pretrained model we plan to use. In our case, we'll be using the [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) model, so let's load its image processor from the Hugging Face Hub.


 ```python
-from transformers import ViTFeatureExtractor
+from transformers import ViTImageProcessor

 model_name_or_path = 'google/vit-base-patch16-224-in21k'
-feature_extractor = ViTFeatureExtractor.from_pretrained(model_name_or_path)
+processor = ViTImageProcessor.from_pretrained(model_name_or_path)
 ```

-You can see the feature extractor configuration by printing it.
+You can see the image processor configuration by printing it.


-ViTFeatureExtractor {
+ViTImageProcessor {
   "do_normalize": true,
   "do_resize": true,
-  "feature_extractor_type": "ViTFeatureExtractor",
   "image_mean": [
     0.5,
     0.5,
@@ -210,14 +209,14 @@ You can see the feature extractor configuration by printing it.



-To process an image, simply pass it to the feature extractor's call function. This will return a dict containing `pixel values`, which is the numeric representation to be passed to the model.
+To process an image, simply pass it to the image processor's call function. This will return a dict containing `pixel values`, which is the numeric representation to be passed to the model.

 You get a NumPy array by default, but if you add the `return_tensors='pt'` argument, you'll get back `torch` tensors instead.



 ```python
-feature_extractor(image, return_tensors='pt')
+processor(image, return_tensors='pt')
 ```

 Should give you something like...
@@ -235,7 +234,7 @@ Now that you know how to read images and transform them into inputs, let's write

 ```python
 def process_example(example):
-    inputs = feature_extractor(example['image'], return_tensors='pt')
+    inputs = processor(example['image'], return_tensors='pt')
     inputs['labels'] = example['labels']
     return inputs
 ```
@@ -263,7 +262,7 @@ ds = load_dataset('beans')

 def transform(example_batch):
     # Take a list of PIL images and turn them to pixel values
-    inputs = feature_extractor([x for x in example_batch['image']], return_tensors='pt')
+    inputs = processor([x for x in example_batch['image']], return_tensors='pt')

     # Don't forget to include the labels!
     inputs['labels'] = example_batch['labels']
@@ -399,7 +398,7 @@ trainer = Trainer(
     compute_metrics=compute_metrics,
     train_dataset=prepared_ds["train"],
     eval_dataset=prepared_ds["validation"],
-    tokenizer=feature_extractor,
+    tokenizer=processor,
 )
 ```
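
Not in the diff, just a hedged sketch of the renamed API in this file: `ViTImageProcessor` loads the same saved preprocessing configuration the old feature extractor did, and passing it to `Trainer` as `tokenizer=processor` keeps it saved and pushed alongside the model. The dummy image below stands in for an example from the `beans` dataset.

```python
import numpy as np
from PIL import Image
from transformers import ViTImageProcessor

model_name_or_path = "google/vit-base-patch16-224-in21k"
processor = ViTImageProcessor.from_pretrained(model_name_or_path)

# Dummy PIL image in place of one example from the beans dataset.
image = Image.fromarray(np.random.randint(0, 256, (500, 500, 3), dtype=np.uint8))

# The processor resizes and normalizes the image into model-ready pixel values.
inputs = processor(image, return_tensors="pt")
print(inputs["pixel_values"].shape)  # torch.Size([1, 3, 224, 224])
```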

image-similarity.md

Lines changed: 2 additions & 2 deletions
@@ -45,11 +45,11 @@ To compute the embeddings from the images, we'll use a vision model that has som
 For loading the model, we leverage the [`AutoModel` class](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModel). It provides an interface for us to load any compatible model checkpoint from the Hugging Face Hub. Alongside the model, we also load the processor associated with the model for data preprocessing.

 ```py
-from transformers import AutoFeatureExtractor, AutoModel
+from transformers import AutoImageProcessor, AutoModel


 model_ckpt = "nateraw/vit-base-beans"
-extractor = AutoFeatureExtractor.from_pretrained(model_ckpt)
+processor = AutoImageProcessor.from_pretrained(model_ckpt)
 model = AutoModel.from_pretrained(model_ckpt)
 ```
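
For completeness, a sketch (not from the commit) of how the renamed `AutoImageProcessor` slots into the embedding-extraction flow this post builds. The CLS-token pooling shown here is an assumption for illustration, not necessarily the exact pooling the post uses.

```python
import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

model_ckpt = "nateraw/vit-base-beans"
processor = AutoImageProcessor.from_pretrained(model_ckpt)
model = AutoModel.from_pretrained(model_ckpt)

# Dummy image standing in for a candidate image from the dataset.
image = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
inputs = processor(image, return_tensors="pt")

with torch.no_grad():
    # Use the [CLS] hidden state as a fixed-size image embedding.
    embedding = model(**inputs).last_hidden_state[:, 0]

print(embedding.shape)  # torch.Size([1, 768]) for this ViT-base checkpoint
```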

notebooks/111_tf_serving_vision.ipynb

Lines changed: 6 additions & 6 deletions
@@ -67,7 +67,7 @@
 },
 "outputs": [],
 "source": [
-"from transformers import ViTFeatureExtractor, TFViTForImageClassification\n",
+"from transformers import ViTImageProcessor, TFViTForImageClassification\n",
 "import tensorflow as tf\n",
 "import tempfile\n",
 "import requests\n",
@@ -288,8 +288,8 @@
 }
 ],
 "source": [
-"feature_extractor = ViTFeatureExtractor()\n",
-"feature_extractor"
+"processor = ViTImageProcessor()\n",
+"processor"
 ]
 },
 {
@@ -301,7 +301,7 @@
 "outputs": [],
 "source": [
 "CONCRETE_INPUT = \"pixel_values\"\n",
-"SIZE = feature_extractor.size\n",
+"SIZE = processor.size[\"height\"]\n",
 "INPUT_SHAPE = (SIZE, SIZE, 3)"
 ]
 },
@@ -314,7 +314,7 @@
 "outputs": [],
 "source": [
 "def normalize_img(\n",
-"    img, mean=feature_extractor.image_mean, std=feature_extractor.image_std\n",
+"    img, mean=processor.image_mean, std=processor.image_std\n",
 "):\n",
 "    # Scale to the value range of [0, 1] first and then normalize.\n",
 "    img = img / 255\n",
@@ -1609,4 +1609,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}
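
One change in this notebook goes beyond the rename: the old `ViTFeatureExtractor.size` was a single integer (224 by default), while `ViTImageProcessor.size` is a dict, which is why the diff switches to `processor.size["height"]`. A quick sketch of the difference:

```python
from transformers import ViTImageProcessor

processor = ViTImageProcessor()
print(processor.size)  # {'height': 224, 'width': 224}

# The serving code therefore reads the target resolution out of the dict.
SIZE = processor.size["height"]
INPUT_SHAPE = (SIZE, SIZE, 3)
print(INPUT_SHAPE)  # (224, 224, 3)
```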

notebooks/112_vertex_ai_vision.ipynb

Lines changed: 6 additions & 6 deletions
@@ -132,7 +132,7 @@
 }
 ],
 "source": [
-"from transformers import ViTFeatureExtractor, TFViTForImageClassification\n",
+"from transformers import ViTImageProcessor, TFViTForImageClassification\n",
 "import tensorflow as tf\n",
 "import tempfile\n",
 "import requests\n",
@@ -442,8 +442,8 @@
 }
 ],
 "source": [
-"feature_extractor = ViTFeatureExtractor()\n",
-"feature_extractor"
+"processor = ViTImageProcessor()\n",
+"processor"
 ]
 },
 {
@@ -456,7 +456,7 @@
 "outputs": [],
 "source": [
 "CONCRETE_INPUT = \"pixel_values\"\n",
-"SIZE = feature_extractor.size\n",
+"SIZE = processor.size[\"height\"]\n",
 "INPUT_SHAPE = (SIZE, SIZE, 3)"
 ]
 },
@@ -469,7 +469,7 @@
 },
 "outputs": [],
 "source": [
-"def normalize_img(img, mean=feature_extractor.image_mean, std=feature_extractor.image_std):\n",
+"def normalize_img(img, mean=processor.image_mean, std=processor.image_std):\n",
 "    # Scale to the value range of [0, 1] first and then normalize.\n",
 "    img = img / 255\n",
 "    mean = tf.constant(mean)\n",
@@ -1019,4 +1019,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 5
-}
+}
}

notebooks/56_fine_tune_segformer.ipynb

Lines changed: 13 additions & 13 deletions
@@ -1201,7 +1201,7 @@
 "id": "EobXJvy2EAQy"
 },
 "source": [
-"## Feature extractor & data augmentation"
+"## Image processor & data augmentation"
 ]
 },
 {
@@ -1210,7 +1210,7 @@
 "id": "Za3n6MH1UuDb"
 },
 "source": [
-"A SegFormer model expects the input to be of a certain shape. To transform our training data to match the expected shape, we can use `SegFormerFeatureExtractor`. We could use the `ds.map` function to apply the feature extractor to the whole training dataset in advance, but this can take up a lot of disk space. Instead, we'll use a *transform*, which will only prepare a batch of data when that data is actually used (on-the-fly). This way, we can start training without waiting for further data preprocessing.\n",
+"A SegFormer model expects the input to be of a certain shape. To transform our training data to match the expected shape, we can use `SegFormerImageProcessor`. We could use the `ds.map` function to apply the image processor to the whole training dataset in advance, but this can take up a lot of disk space. Instead, we'll use a *transform*, which will only prepare a batch of data when that data is actually used (on-the-fly). This way, we can start training without waiting for further data preprocessing.\n",
 "\n",
 "In our transform, we'll also define some data augmentations to make our model more resilient to different lighting conditions. We'll use the [`ColorJitter`](https://pytorch.org/vision/main/generated/torchvision.transforms.ColorJitter.html) function from `torchvision` to randomly change the brightness, contrast, saturation, and hue of the images in the batch."
 ]
@@ -1243,23 +1243,23 @@
 "source": [
 "from torchvision.transforms import ColorJitter\n",
 "from transformers import (\n",
-"    SegformerFeatureExtractor,\n",
+"    SegformerImageProcessor,\n",
 ")\n",
 "\n",
-"feature_extractor = SegformerFeatureExtractor()\n",
+"processor = SegformerImageProcessor()\n",
 "jitter = ColorJitter(brightness=0.25, contrast=0.25, saturation=0.25, hue=0.1) \n",
 "\n",
 "def train_transforms(example_batch):\n",
 "    images = [jitter(x) for x in example_batch['pixel_values']]\n",
 "    labels = [x for x in example_batch['label']]\n",
-"    inputs = feature_extractor(images, labels)\n",
+"    inputs = processor(images, labels)\n",
 "    return inputs\n",
 "\n",
 "\n",
 "def val_transforms(example_batch):\n",
 "    images = [x for x in example_batch['pixel_values']]\n",
 "    labels = [x for x in example_batch['label']]\n",
-"    inputs = feature_extractor(images, labels)\n",
+"    inputs = processor(images, labels)\n",
 "    return inputs\n",
 "\n",
 "\n",
@@ -1488,7 +1488,7 @@
 "        references=labels,\n",
 "        num_labels=len(id2label),\n",
 "        ignore_index=0,\n",
-"        reduce_labels=feature_extractor.do_reduce_labels,\n",
+"        reduce_labels=processor.do_reduce_labels,\n",
 "    )\n",
 "    \n",
 "    # add per category metrics as individual key-value pairs\n",
@@ -1565,7 +1565,7 @@
 "id": "YlOal7giORmw"
 },
 "source": [
-"When we're done with training, we can push our fine-tuned model and the feature extractor to the Hugging Face hub.\n",
+"When we're done with training, we can push our fine-tuned model and the image processor to the Hugging Face hub.\n",
 "\n",
 "This will also automatically create a model card with our results. We'll supply some extra information in `kwargs` to make the model card more complete."
 ]
@@ -1584,7 +1584,7 @@
 "    \"dataset\": hf_dataset_identifier,\n",
 "}\n",
 "\n",
-"feature_extractor.push_to_hub(hub_model_id)\n",
+"processor.push_to_hub(hub_model_id)\n",
 "trainer.push_to_hub(**kwargs)"
 ]
 },
@@ -1645,9 +1645,9 @@
 },
 "outputs": [],
 "source": [
-"from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation\n",
+"from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation\n",
 "\n",
-"feature_extractor = SegformerFeatureExtractor.from_pretrained(\"nvidia/segformer-b0-finetuned-ade-512-512\")\n",
+"processor = SegformerImageProcessor.from_pretrained(\"nvidia/segformer-b0-finetuned-ade-512-512\")\n",
 "model = SegformerForSemanticSegmentation.from_pretrained(f\"{hf_username}/{hub_model_id}\")"
 ]
 },
@@ -1679,7 +1679,7 @@
 "id": "7m7IfMv6R3_5"
 },
 "source": [
-"To segment this test image, we first need to prepare the image using the feature extractor. Then we forward it through the model.\n",
+"To segment this test image, we first need to prepare the image using the image processor. Then we forward it through the model.\n",
 "\n",
 "We also need to remember to upscale the output logits to the original image size. In order to get the actual category predictions, we just have to apply an `argmax` on the logits."
 ]
@@ -1694,7 +1694,7 @@
 "source": [
 "from torch import nn\n",
 "\n",
-"inputs = feature_extractor(images=image, return_tensors=\"pt\")\n",
+"inputs = processor(images=image, return_tensors=\"pt\")\n",
 "outputs = model(**inputs)\n",
 "logits = outputs.logits  # shape (batch_size, num_labels, height/4, width/4)\n",
 "\n",
