Commit 5c82c91

Fix: question33 broken link
1 parent: 9f996a9

1 file changed with 1 addition and 1 deletion

README.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ We are so thankful for every contribution, which makes sure we can deliver top-n
| 30 | [When submitting Amazon SageMaker training jobs using one of the built-in algorithms, which common parameters MUST be specified? (Choose three.)](#when-submitting-amazon-sagemaker-training-jobs-using-one-of-the-built-in-algorithms-which-common-parameters-must-be-specified-choose-three)
| 31 | [A monitoring service generates 1 TB of scale metrics record data every minute. A Research team performs queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the team requires better performance. How should the records be stored in Amazon S3 to improve query performance?](#a-monitoring-service-generates-1-tb-of-scale-metrics-record-data-every-minute-a-research-team-performs-queries-on-this-data-using-amazon-athena-the-queries-run-slowly-due-to-the-large-volume-of-data-and-the-team-requires-better-performance-how-should-the-records-be-stored-in-amazon-s3-to-improve-query-performance)
| 32 | [Machine Learning Specialist is working with a media company to perform classification on popular articles from the company's website. The company is using random forests to classify how popular an article will be before it is published. A sample of the data being used is below. Given the dataset, the Specialist wants to convert the Day_Of_Week column to binary values. What technique should be used to convert this column to binary values?](#machine-learning-specialist-is-working-with-a-media-company-to-perform-classification-on-popular-articles-from-the-companys-website-the-company-is-using-random-forests-to-classify-how-popular-an-article-will-be-before-it-is-published-a-sample-of-the-data-being-used-is-below-given-the-dataset-the-specialist-wants-to-convert-the-day_of_week-column-to-binary-values-what-technique-should-be-used-to-convert-this-column-to-binary-values)
-| 33 | [A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users. The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features including user age, device, location, and play patterns. Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory. Which of the following approaches should the Data Science team take to mitigate this issue? (Choose two.)](#a-gaming-company-has-launched-an-online-game-where-people-can-start-playing-for-free-but-they-need-to-pay-if-they-choose-to-use-certain-features-the-company-needs-to-build-an-automated-system-to-predict-whether-or-not-a-new-user-will-become-a-paid-user-within-1-year-the-company-has-gathered-a-labeled-dataset-from-1-million-users-the-training-dataset-consists-of-1000-positive-samples-from-users-who-ended-up-paying-within-1-year-and-999000-negative-samples-from-users-who-did-not-use-any-paid-features-each-data-sample-consists-of-200-features-including-user-age-device-location-and-play-patterns-using-this-dataset-for-training-the-data-science-team-trained-a-random-forest-model-that-converged-with-over-99%25-accuracy-on-the-training-set-however-the-prediction-results-on-a-test-dataset-were-not-satisfactory-which-of-the-following-approaches-should-the-data-science-team-take-to-mitigate-this-issue-choose-two)
+| 33 | [A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users. The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features including user age, device, location, and play patterns. Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory. Which of the following approaches should the Data Science team take to mitigate this issue? (Choose two.)](#a-gaming-company-has-launched-an-online-game-where-people-can-start-playing-for-free-but-they-need-to-pay-if-they-choose-to-use-certain-features-the-company-needs-to-build-an-automated-system-to-predict-whether-or-not-a-new-user-will-become-a-paid-user-within-1-year-the-company-has-gathered-a-labeled-dataset-from-1-million-users-the-training-dataset-consists-of-1000-positive-samples-from-users-who-ended-up-paying-within-1-year-and-999000-negative-samples-from-users-who-did-not-use-any-paid-features-each-data-sample-consists-of-200-features-including-user-age-device-location-and-play-patterns-using-this-dataset-for-training-the-data-science-team-trained-a-random-forest-model-that-converged-with-over-99-accuracy-on-the-training-set-however-the-prediction-results-on-a-test-dataset-were-not-satisfactory-which-of-the-following-approaches-should-the-data-science-team-take-to-mitigate-this-issue-choose-two)
| 34 | [A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age. Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population. How should the Data Scientist correct this issue?](#a-data-scientist-is-developing-a-machine-learning-model-to-predict-future-patient-outcomes-based-on-information-collected-about-each-patient-and-their-treatment-plans-the-model-should-output-a-continuous-value-as-its-prediction-the-data-available-includes-labeled-outcomes-for-a-set-of-4000-patients-the-study-was-conducted-on-a-group-of-individuals-over-the-age-of-65-who-have-a-particular-disease-that-is-known-to-worsen-with-age-initial-models-have-performed-poorly-while-reviewing-the-underlying-data-the-data-scientist-notices-that-out-of-4000-patient-observations-there-are-450-where-the-patient-age-has-been-input-as-0-the-other-features-for-these-observations-appear-normal-compared-to-the-rest-of-the-sample-population-how-should-the-data-scientist-correct-this-issue)
| 35 | [A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL. Which storage scheme is MOST adapted to this scenario?](#a-data-science-team-is-designing-a-dataset-repository-where-it-will-store-a-large-amount-of-training-data-commonly-used-in-its-machine-learning-models-as-data-scientists-may-create-an-arbitrary-number-of-new-datasets-every-day-the-solution-has-to-scale-automatically-and-be-cost-effective-also-it-must-be-possible-to-explore-the-data-using-sql-which-storage-scheme-is-most-adapted-to-this-scenario)
| 36 | [Tom has been tasked to install Check Point R80 in a distributed deployment. Before Tom installs the systems this way, how many machines will he need if he does NOT include a SmartConsole machine in his calculations?](#tom-has-been-tasked-to-install-check-point-r80-in-a-distributed-deployment-before-tom-installs-the-systems-this-way-how-many-machines-will-he-need-if-he-does-not-include-a-smartconsole-machine-in-his-calculations)
