🖥️Operating system: Windows 11 Home
🛠️Setting up WSL:
-
Open Command Prompt on the Windows machine as an administrator to avoid permission issues.
-
Use the command “wsl --install” to install the Windows Subsystem for Linux (WSL). Ubuntu is the distribution installed by default.
-
Close Command Prompt once Ubuntu is installed.
-
Ubuntu will be visible in the Start menu.
🐳Setting up docker and MongoDB:
-
Launch the Ubuntu application.
-
Switch to the root user using the command “sudo su”.
-
Start the Docker service and enable it at startup using the commands “sudo systemctl start docker” and “sudo systemctl enable docker”. These commands assume Docker is already installed in Ubuntu.
-
Use the command “docker pull mongo” to download the MongoDB image from which the container is created.
-
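To run the MongoDB container, use the command “docker run -d --name my-mongo-container -p 27017:27017 -v /root/AIT614:/data mongo --bind_ip 0.0.0.0”. Here, “-d” runs the container in the background, “my-mongo-container” is the name of the container, and “-p 27017:27017” publishes the MongoDB port 27017 of the container on the same port of the host. “-v /root/AIT614:/data” mounts the host folder “/root/AIT614”, where the dataset is located, at “/data” inside the container. “--bind_ip 0.0.0.0” makes MongoDB listen on all network interfaces, so that it accepts connections coming from outside the container.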
-
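Import the data into MongoDB using the command “docker exec -i my-mongo-container mongoimport --db my_database --collection my_collection --type csv --file /data/telecom_churn.csv --headerline”. In this command, “mongoimport” is the command-line tool used to import the data, “my_database” is the database name, and “my_collection” is the collection name, since the data in MongoDB is saved as documents in a collection. “--type csv” together with “--headerline” tells mongoimport to read a CSV file and use its first row as the field names. The file path “/data/telecom_churn.csv” is the path inside the container, which corresponds to “/root/AIT614/telecom_churn.csv” on the host because of the mounted folder.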
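The two Docker commands above can also be assembled programmatically, which makes it easier to see what each flag does. The following is only a minimal Python sketch, not part of the original setup: the helper names “docker_run_args” and “mongoimport_args” are hypothetical, and the default values are simply the ones used in this guide.

```python
import shlex


def docker_run_args(container="my-mongo-container",
                    host_dir="/root/AIT614",
                    port=27017):
    """Build the argument list that starts the MongoDB container."""
    return [
        "docker", "run",
        "-d",                       # run in the background (detached)
        "--name", container,        # container name
        "-p", f"{port}:{port}",     # publish container port on the host
        "-v", f"{host_dir}:/data",  # mount the dataset folder at /data
        "mongo",                    # image name
        "--bind_ip", "0.0.0.0",     # mongod listens on all interfaces
    ]


def mongoimport_args(container="my-mongo-container",
                     db="my_database",
                     collection="my_collection",
                     csv_path="/data/telecom_churn.csv"):
    """Build the argument list that imports the CSV inside the container."""
    return [
        "docker", "exec", "-i", container,
        "mongoimport",
        "--db", db,
        "--collection", collection,
        "--type", "csv",
        "--file", csv_path,         # path as seen inside the container
        "--headerline",             # first CSV row holds the field names
    ]


if __name__ == "__main__":
    # Print the commands so they can be compared with the ones in this guide.
    print(shlex.join(docker_run_args()))
    print(shlex.join(mongoimport_args()))
```

To actually execute a command from Python, the list can be passed to “subprocess.run(args, check=True)” inside Ubuntu, where Docker is available.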
🌍Setting up Ngrok:
-
To use Ngrok, set up a free account and add credit card information to gain access (there is no charge for up to 1 GB of transfers). Then run the command shown in the Ngrok dashboard, which includes your authentication token, to enable Ngrok on this machine.
-
Use the command “ngrok tcp 27017” to open a TCP tunnel to the MongoDB port. The forwarding address that Ngrok prints is the url that enables the connection between MongoDB and Databricks.
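The forwarding address has the form “tcp://host:port”, while MongoDB clients expect a “mongodb://” connection string. The sketch below shows one possible way to convert between the two and to check that the tunnel accepts TCP connections before running the notebook. It is an illustration under assumptions: the function names are hypothetical, the address “tcp://0.tcp.ngrok.io:12345” is a made-up example, and the exact form of the url expected by the notebook is not shown in this guide.

```python
import socket
from urllib.parse import urlsplit


def mongo_uri_from_ngrok(forwarding, database="my_database"):
    """Turn an Ngrok TCP forwarding address into a MongoDB connection string."""
    parts = urlsplit(forwarding)
    if parts.scheme != "tcp" or not parts.hostname or parts.port is None:
        raise ValueError(f"expected tcp://host:port, got {forwarding!r}")
    return f"mongodb://{parts.hostname}:{parts.port}/{database}"


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical address; replace it with the one printed by Ngrok.
    print(mongo_uri_from_ngrok("tcp://0.tcp.ngrok.io:12345"))
```

A successful “is_reachable” check only confirms that the port accepts connections. It does not verify that MongoDB itself is answering.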
📜Running the ipynb file:
Upload the ipynb file to Databricks.
Create a cluster and connect it to the ipynb notebook.
Cell 3: This cell holds the url for MongoDB. Set it to the url provided by Ngrok before running the cell.
The rest of the cells can be run as they are. Note: Because the SVM cells take a long time and the cluster has usage limits, each SVM kernel needs its own run on a new cluster. For each kernel, create a new cluster and run cells 1-11, 17 and 21 again.