[Google Colab](https://colab.research.google.com/) is the go-to place for many data scientists and machine learning engineers who are looking to perform quick analysis or training for free. Below are some snippets that can be useful in Colab.
- If you are getting `NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968` or a similar error when trying to run `!pip install` or other CLI commands in Google Colab, you can fix it by running the following command before `!pip install`. Note that this might break some imports, so make sure to import all the packages you need before running it.
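    A minimal sketch of the commonly shared workaround, which overrides Python's preferred encoding (verify it behaves as expected on your runtime):

    ``` python linenums="1"
    import locale

    # force Python to report UTF-8 as the preferred encoding,
    # regardless of the locale the Colab VM was started with
    locale.getpreferredencoding = lambda *args: "UTF-8"
    ```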
- If you want to continuously monitor GPU usage, you can use the `watch -n 2 nvidia-smi --id=0` command. This refreshes the `nvidia-smi` output every 2 seconds.
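    Since `watch` expects a terminal, a rough notebook-friendly alternative is to poll from Python (a sketch; adjust the iteration count and interval as needed):

    ``` python linenums="1"
    import subprocess
    import time

    # print GPU stats 5 times, 2 seconds apart (crude stand-in for `watch`)
    for _ in range(5):
        subprocess.run(["nvidia-smi", "--id=0"])
        time.sleep(2)
    ```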
## HuggingFace Tokenizer
- Tokenization is a pre-processing step that converts text into a sequence of tokens. The [HuggingFace tokenizer](https://huggingface.co/docs/transformers/main_classes/tokenizer) is a wrapper around the [tokenizers library](https://github.com/huggingface/tokenizers), which contains multiple base algorithms for fast tokenization.
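    A minimal usage sketch (assuming the `transformers` package is installed; `bert-base-uncased` is just an example checkpoint):

    ``` python linenums="1"
    from transformers import AutoTokenizer

    # download and load a pretrained tokenizer
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

    text = "HuggingFace tokenizers are fast!"
    encoded = tokenizer(text)
    print(tokenizer.tokenize(text))                 # subword tokens
    print(encoded["input_ids"])                     # token ids
    print(tokenizer.decode(encoded["input_ids"]))   # ids back to text
    ```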
- You can use the `summary` method to check a model's architecture. It will show the layers, their output shapes, and the number of parameters in each layer.
=== "Keras"

    ``` python linenums="1"
    # import
    from keras.models import Sequential
    from keras.layers import Dense, Conv2D, MaxPooling2D, Flatten

    # define a tiny CNN and print its layers, output shapes and parameter counts
    model = Sequential([Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
                        MaxPooling2D(2), Flatten(), Dense(10, activation="softmax")])
    model.summary()
    ```
`docs/machine_learning/interview_questions.md`
    === "Answer"

        XGBoost (Extreme Gradient Boosting) is a specific implementation of the Gradient Boosting method that uses a more efficient tree-based model and a number of techniques to speed up training and reduce overfitting. XGBoost is commonly used in machine learning competitions, and it is one of the most popular libraries for gradient boosting. It is used for both classification and regression problems.
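        A minimal usage sketch (assuming the `xgboost` and `scikit-learn` packages are installed):

        ``` python linenums="1"
        from xgboost import XGBClassifier
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split

        # toy binary classification dataset
        X, y = make_classification(n_samples=500, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # fit gradient-boosted trees and report held-out accuracy
        model = XGBClassifier(n_estimators=100, max_depth=3)
        model.fit(X_train, y_train)
        print(model.score(X_test, y_test))
        ```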
!!! Question ""
    === "Question"
        #### What is `group_size` in context of Quantization?

    === "Answer"

        Group size is a parameter used in the quantization process that determines the number of weights or activations *(imagine weights in a row of a matrix)* that are quantized together. A smaller group size can lead to better quantization accuracy, but it can also increase the memory and computational requirements of the model. Group size is an important hyperparameter that needs to be tuned to achieve the best trade-off between accuracy and efficiency. Note that the default `group_size` for GPTQ is 1024. [Refer to this interesting Reddit discussion](https://www.reddit.com/r/LocalLLaMA/comments/12rtg82/what_is_group_size_128_and_why_do_30b_models_give/?rdt=46348) for more details.
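        A toy NumPy sketch of what grouping means, with one scale and zero point per group of weights (illustrative only; real quantizers such as GPTQ are far more involved):

        ``` python linenums="1"
        import numpy as np

        def quantize_grouped(row, group_size=4, bits=4):
            """Round-trip quantize a 1-D weight row with one scale/zero per group."""
            qmax = 2 ** bits - 1
            w = row.reshape(-1, group_size)                  # split row into groups
            zero = w.min(axis=1, keepdims=True)
            scale = (w.max(axis=1, keepdims=True) - zero) / qmax
            q = np.round((w - zero) / scale)                 # integer codes in [0, qmax]
            return (q * scale + zero).reshape(-1)            # dequantized weights

        row = np.random.randn(1024).astype(np.float32)
        for g in (256, 64, 4):                               # smaller groups -> lower error
            err = np.abs(row - quantize_grouped(row, g)).mean()
            print(f"group_size={g}: mean abs error {err:.4f}")
        ```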