
Commit 14ee811

Updated inotify leak in kubelet post
1 parent a8861b1 commit 14ee811

10 files changed: +141 -30 lines changed

_posts/2024-04-11-inotify-watcher-leaks-in-kubelet.md

Lines changed: 57 additions & 1 deletion
@@ -77,7 +77,7 @@ Inode Pathname
 258855 /etc/srv/kubernetes/pki/ca-certificates.crt
 ```
 
-Put all processes above into one single script, I can retrieve all target files, that would help to understand if there's a real leakage. Also, I count the unique inode amount, this could also help to know which inode are monitored multiple times.
+Putting all the steps above into one single script (please see the [updated version in the appendix](#updated-script-to-get-inotify-watchers-initiated-by-kubelet)), I can retrieve all target files, which helps to determine whether there is a real leak. I also count the number of unique inodes, which shows which inodes are watched multiple times.
 
 ```bash
 cat << EOF | sudo tee -a test.sh
@@ -149,6 +149,62 @@ This turns things easy, because I can just use pod ID to compare between running
 - [How Kubelet leaked inotify watchers?]()
 - [debugfs]()
 
+## Appendix
+
+### Updated script to get inotify watchers initiated by kubelet
+
+Thanks to [yujuhong@](https://github.com/yujuhong) for the momentum and help in finishing the updated script.
+
+```bash
+# Locate the kubelet process; exclude the grep itself from the match.
+PID=$(ps aux | grep "/home/kubernetes/bin/kubelet" | grep -v grep | head -1 | awk '{print $2}')
+echo "Kubelet Pid:" ${PID}
+
+# Every inotify watch held by kubelet shows up as a line starting with
+# "inotify" (wd:... ino:... sdev:...) in /proc/<pid>/fdinfo/<fd>.
+inums_raw=$(find /proc/${PID}/fdinfo -type f 2>/dev/null | xargs grep ^inotify)
+echo "Count: $(echo "${inums_raw}" | wc -l)"
+
+while read -r line; do
+    # ino and sdev are hex: the watched inode and its superblock's device.
+    reg="ino:([0-9a-f]*) sdev:([0-9a-f]*)"
+    if [[ ${line} =~ $reg ]]; then
+        ino="${BASH_REMATCH[1]}"
+        sdev="${BASH_REMATCH[2]}"
+    else
+        echo "wrong line"
+        continue
+    fi
+
+    # Split the kernel-internal dev_t into major:minor. The kernel keeps
+    # the minor number in the low 20 bits (include/linux/kdev_t.h), not
+    # in 8 bits as the legacy /256 split assumes.
+    sdev_in_dec=$((16#$sdev))
+    minor=$((sdev_in_dec & 0xfffff))
+    major=$((sdev_in_dec >> 20))
+
+    in_fds_sdev+=("${ino}-${major}:${minor}")
+done <<< "${inums_raw}"
+
+# Deduplicate the (inode, device) pairs; print one per line so sort -u works.
+uniq_pairs=($(printf '%s\n' "${in_fds_sdev[@]}" | sort -u))
+echo "Unique target" ${#uniq_pairs[@]}
+
+printf "%-10s %-10s %-6s %s\n" "INUM" "DEV" "COUNT" "TARGET"
+for pair in "${uniq_pairs[@]}"; do
+    # How many watches point at this exact (inode, device) pair.
+    count=$(printf '%s\n' "${in_fds_sdev[@]}" | grep -c -x "${pair}")
+    ino_hex=$(echo ${pair} | cut -d "-" -f 1)
+    dev=$(echo ${pair} | cut -d "-" -f 2)
+    ino_dec="$((16#${ino_hex}))"
+
+    # In /proc/<pid>/mountinfo, field 3 is major:minor and field 5 is the
+    # mount point of that device.
+    mount_info=$(grep " ${dev} " /proc/${PID}/mountinfo | head -1)
+    if [[ -z $mount_info ]]; then
+        echo "Can't find mount info for" $dev
+    else
+        mount_path=$(echo $mount_info | cut -d " " -f 5)
+        # Resolve the inode number back to a path under that mount.
+        loc=$(find ${mount_path} -inum ${ino_dec} 2>/dev/null)
+        printf "%-10s %-10s %-6s %s\n" "${ino_dec}" "${dev}" "${count}" "${loc}"
+    fi
+done
+```
+
 ## References
 
 [^flbit_ino]: [Fluentbit error "cannot adjust chunk size" on GKE](https://stackoverflow.com/a/76712244)
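For a quick sanity check of the appendix script, here is a minimal usage sketch. The file name `kubelet-inotify.sh` is only an illustrative choice (it is not part of the commit), and inode 258855 is the ca-certificates.crt example shown earlier in the post:

```bash
# Save the appendix script as kubelet-inotify.sh and run it as root;
# reading /proc/<pid>/fdinfo of the kubelet process requires privileges.
sudo bash kubelet-inotify.sh

# Cross-check one row of the output table by resolving an inode by hand.
# -xdev keeps find on a single filesystem, since inode numbers are only
# unique per device.
sudo find / -xdev -inum 258855
```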

_site/feed.xml

Lines changed: 67 additions & 12 deletions
Large diffs are not rendered by default.

_site/posts/charles-is-not-a-good-tool.html

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 <meta name="twitter:card" content="summary" />
 <meta property="twitter:title" content="Using charles proxy to monitor mobile SSL traffics" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"Pengzhan Hao"},"dateModified":"2024-04-19T00:41:23-04:00","datePublished":"2016-10-27T22:50:33-04:00","description":"In this blog, I will generally talk about how to use proper tools to monitor SSL traffics of a mobile devices. Currently, I only can dealing with those SSL traffics which use an obviously certification. Some applications may not using system root cert or they doesn’t provide us a method to modify their own certs. For these situation, I still didn’t find a good solutions for it. But I’ll keep updating this if I get one. My current solution is using AP to forward all SSL traffic to a proxy, charles proxy is my first choice (Prof asked). It’s a non-free software which still update new versions now. So mainly, I’ll talk about how to charles SSL proxy.","headline":"Using charles proxy to monitor mobile SSL traffics","mainEntityOfPage":{"@type":"WebPage","@id":"https://blog.pengzhan.dev/posts/charles-is-not-a-good-tool"},"url":"https://blog.pengzhan.dev/posts/charles-is-not-a-good-tool"}</script>
+{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"Pengzhan Hao"},"dateModified":"2024-04-24T00:32:03-04:00","datePublished":"2016-10-27T22:50:33-04:00","description":"In this blog, I will generally talk about how to use proper tools to monitor SSL traffics of a mobile devices. Currently, I only can dealing with those SSL traffics which use an obviously certification. Some applications may not using system root cert or they doesn’t provide us a method to modify their own certs. For these situation, I still didn’t find a good solutions for it. But I’ll keep updating this if I get one. My current solution is using AP to forward all SSL traffic to a proxy, charles proxy is my first choice (Prof asked). It’s a non-free software which still update new versions now. So mainly, I’ll talk about how to charles SSL proxy.","headline":"Using charles proxy to monitor mobile SSL traffics","mainEntityOfPage":{"@type":"WebPage","@id":"https://blog.pengzhan.dev/posts/charles-is-not-a-good-tool"},"url":"https://blog.pengzhan.dev/posts/charles-is-not-a-good-tool"}</script>
 <!-- End Jekyll SEO tag -->
 
 <!-- end custom head snippets -->

_site/posts/eddl-how-do-we-train-on-limited-edge-devices-part2.html

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 <meta name="twitter:card" content="summary" />
 <meta property="twitter:title" content="EDDL: How do we train neural networks on limited edge devices - PART 2" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"Pengzhan Hao"},"dateModified":"2024-04-19T00:41:23-04:00","datePublished":"2021-10-31T13:01:14-04:00","description":"In the last post, part1, our idea of distributed learning on edge environment was generally addressed. I introduced the reason why edge distributed learning is needed and what improvements it can achieve. In this post, I will talk about our motivation study and how our framework works. How does data support us training on edge? Before designing and implementing our framework, we first need confirmation that training on edge resource-limited devices is worthwhile. We were using a malware detection neural network to show why a small, customized neural network is better. We collected 32000+ mobile apps feature as global data. With these data records, we trained a multilayer perceptron called “PerNet” to determine whether a given feature belongs to a benign or malware app. We called this detection. As well, PerNet can also classify malware apps into different types of attacks. We called this classification. The global model can achieve 93% above recall rate and 96.93% above accuracy. With all these data, we selected two community app usage sub-dataset for local model generations. Large categories (Scenario 1) We chose the 5 largest categories of apps, including entertainment, tools, brain&amp;Puzzle, Lifestyle, and Education, as well as the 5 largest malware categories. All together, 12000+ apps were included in this sub-dataset, almost 50 to 50 between benign and malware. Campus-community categories (Scenario 2) We chose the 5 most downloaded categories from college students as benign groups, as well as a similar amount of 5 malware categories. To ensure that malware apps are included in 5 benign categories, we also considered synthesizing some other malware apps within categories of 5 most downloaded(benign) categories. With these two types of sub-dataset, we used the same PerNet to generate multiple local models. Under each scenarios experiment, we compared global and local models on the preserved test dataset. In all classification performances, local beat global in every scenario. In detection performances, local also share the same accuracy as global does. In summary, local models were trained on special occasions. Under the same circumstance, a global model can achieve no better accuracy than local models. The reason why local is better might be because of overfitting. I believe this issue also be considered in the machine learning communities that they brought transfer learning, a technique to optimize global models to special scenarios but performing more training to a global model once it’s shipped to local. Design and Implementation Overall design The basic EDDL distributed training setup consists of 3 parts. EDDL training cluster, a device cluster that consists of edge or mobile devices that are participating in training. EDDL manager, the initial driver program that works as collect training data, relay data to training devices and initial training clusters. Training data entry (TDE), a data storage for all training data. Dynamic training data distribution Existing distributed DNN training solutions usually statically partition training data among workers. It can be a problem when the training node joins and exits. We designed our framework that can dynamically distribute training data during learning. Before every training batch started, a batch of TDE will be sent to devices. In our experiments, we found that by applying this design, overall training time was shortened by doing. Especially in large amount devices cases, this optimization can be 50% less than statically divided. Scaling up cluster size Our framework was designed to have both sync and async parameter aggregation. Asynchronous aggregation can allow a high outcome of training batch but with a sacrifice or converge time. Synchronous aggregation allows a quick converge time in epochs, however can’t ensure performance when there’s a struggler worker. As showed in experiments, we chose sync as default because the converging time is dominant in overall training time. But, we also considered the possibilities of that async with more workers can achieve similar overall training time. We introduced a formula to determine whether adding more training nodes can help or not. Here we used bandwidth usage coefficient (BUC) as [BUC = \\dfrac{n}{T_{sync}}] In this formula, \\(n\\) is the number of devices, and \\(T_{sync}\\) is the transmission time of parameters. With an increasing number of workers, n increase linearly but transmission time does not. When \\(BUC\\) increases, the cluster can speed up training time by adding workers. Otherwise, adding more workers won’t help with overall training time. Adaptive leader role splitting The idea of role splitting is simple that a device can work as a worker as well leader. The advantage of doing this is straightforward that we can transfer 1 less parameter and training time will be shortened. However, in our current settings, it can’t perform much better help since only 1 leader role is in a cluster. We can benefit from this in our future works. Overall architecture Details were given in the image. Prototype hardware and software EDDL was designed to be run on two single-board computer embedded platforms. One such platform is ODROID-XU4, which is equipped with a 2.1/1.4 GHz 32-bit ARM processor and 2GB memory. The other platform is the Raspberry Pi 3 Model B board, which comes with an ARM 1.2 GHz 64-bit quad-core processor and 1GB memory. The operating system running on the above platforms is Ubuntu 18.04 with Linux kernel 4.14. We used Dlib, a C++ library that provides implementations for a wide range of machine learning algorithms. We chose the Dlib library because it is written in C/C++, and can be easily and natively used in embedded devices.","headline":"EDDL: How do we train neural networks on limited edge devices - PART 2","mainEntityOfPage":{"@type":"WebPage","@id":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"},"url":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"}</script>
+{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"Pengzhan Hao"},"dateModified":"2024-04-24T00:32:03-04:00","datePublished":"2021-10-31T13:01:14-04:00","description":"In the last post, part1, our idea of distributed learning on edge environment was generally addressed. I introduced the reason why edge distributed learning is needed and what improvements it can achieve. In this post, I will talk about our motivation study and how our framework works. How does data support us training on edge? Before designing and implementing our framework, we first need confirmation that training on edge resource-limited devices is worthwhile. We were using a malware detection neural network to show why a small, customized neural network is better. We collected 32000+ mobile apps feature as global data. With these data records, we trained a multilayer perceptron called “PerNet” to determine whether a given feature belongs to a benign or malware app. We called this detection. As well, PerNet can also classify malware apps into different types of attacks. We called this classification. The global model can achieve 93% above recall rate and 96.93% above accuracy. With all these data, we selected two community app usage sub-dataset for local model generations. Large categories (Scenario 1) We chose the 5 largest categories of apps, including entertainment, tools, brain&amp;Puzzle, Lifestyle, and Education, as well as the 5 largest malware categories. All together, 12000+ apps were included in this sub-dataset, almost 50 to 50 between benign and malware. Campus-community categories (Scenario 2) We chose the 5 most downloaded categories from college students as benign groups, as well as a similar amount of 5 malware categories. To ensure that malware apps are included in 5 benign categories, we also considered synthesizing some other malware apps within categories of 5 most downloaded(benign) categories. With these two types of sub-dataset, we used the same PerNet to generate multiple local models. Under each scenarios experiment, we compared global and local models on the preserved test dataset. In all classification performances, local beat global in every scenario. In detection performances, local also share the same accuracy as global does. In summary, local models were trained on special occasions. Under the same circumstance, a global model can achieve no better accuracy than local models. The reason why local is better might be because of overfitting. I believe this issue also be considered in the machine learning communities that they brought transfer learning, a technique to optimize global models to special scenarios but performing more training to a global model once it’s shipped to local. Design and Implementation Overall design The basic EDDL distributed training setup consists of 3 parts. EDDL training cluster, a device cluster that consists of edge or mobile devices that are participating in training. EDDL manager, the initial driver program that works as collect training data, relay data to training devices and initial training clusters. Training data entry (TDE), a data storage for all training data. Dynamic training data distribution Existing distributed DNN training solutions usually statically partition training data among workers. It can be a problem when the training node joins and exits. We designed our framework that can dynamically distribute training data during learning. Before every training batch started, a batch of TDE will be sent to devices. In our experiments, we found that by applying this design, overall training time was shortened by doing. Especially in large amount devices cases, this optimization can be 50% less than statically divided. Scaling up cluster size Our framework was designed to have both sync and async parameter aggregation. Asynchronous aggregation can allow a high outcome of training batch but with a sacrifice or converge time. Synchronous aggregation allows a quick converge time in epochs, however can’t ensure performance when there’s a struggler worker. As showed in experiments, we chose sync as default because the converging time is dominant in overall training time. But, we also considered the possibilities of that async with more workers can achieve similar overall training time. We introduced a formula to determine whether adding more training nodes can help or not. Here we used bandwidth usage coefficient (BUC) as [BUC = \\dfrac{n}{T_{sync}}] In this formula, \\(n\\) is the number of devices, and \\(T_{sync}\\) is the transmission time of parameters. With an increasing number of workers, n increase linearly but transmission time does not. When \\(BUC\\) increases, the cluster can speed up training time by adding workers. Otherwise, adding more workers won’t help with overall training time. Adaptive leader role splitting The idea of role splitting is simple that a device can work as a worker as well leader. The advantage of doing this is straightforward that we can transfer 1 less parameter and training time will be shortened. However, in our current settings, it can’t perform much better help since only 1 leader role is in a cluster. We can benefit from this in our future works. Overall architecture Details were given in the image. Prototype hardware and software EDDL was designed to be run on two single-board computer embedded platforms. One such platform is ODROID-XU4, which is equipped with a 2.1/1.4 GHz 32-bit ARM processor and 2GB memory. The other platform is the Raspberry Pi 3 Model B board, which comes with an ARM 1.2 GHz 64-bit quad-core processor and 1GB memory. The operating system running on the above platforms is Ubuntu 18.04 with Linux kernel 4.14. We used Dlib, a C++ library that provides implementations for a wide range of machine learning algorithms. We chose the Dlib library because it is written in C/C++, and can be easily and natively used in embedded devices.","headline":"EDDL: How do we train neural networks on limited edge devices - PART 2","mainEntityOfPage":{"@type":"WebPage","@id":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"},"url":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"}</script>
 <!-- End Jekyll SEO tag -->
 
 <!-- end custom head snippets -->

_site/posts/eddl-how-do-we-train-on-limited-edge-devices.html

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 <meta name="twitter:card" content="summary" />
 <meta property="twitter:title" content="EDDL: How do we train neural networks on limited edge devices - PART 1" />
 <script type="application/ld+json">
-{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"Pengzhan Hao"},"dateModified":"2024-04-19T00:41:23-04:00","datePublished":"2021-10-13T16:53:20-04:00","description":"This post introduces our previous milestone in project “Edge trainer”, as the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” was published. As the first part of the introductions, I focus only on the motivation and summary of our works. More details in design and implementation can be found in late posts.","headline":"EDDL: How do we train neural networks on limited edge devices - PART 1","mainEntityOfPage":{"@type":"WebPage","@id":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices"},"url":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices"}</script>
+{"@context":"https://schema.org","@type":"BlogPosting","author":{"@type":"Person","name":"Pengzhan Hao"},"dateModified":"2024-04-24T00:32:03-04:00","datePublished":"2021-10-13T16:53:20-04:00","description":"This post introduces our previous milestone in project “Edge trainer”, as the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” was published. As the first part of the introductions, I focus only on the motivation and summary of our works. More details in design and implementation can be found in late posts.","headline":"EDDL: How do we train neural networks on limited edge devices - PART 1","mainEntityOfPage":{"@type":"WebPage","@id":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices"},"url":"https://blog.pengzhan.dev/posts/eddl-how-do-we-train-on-limited-edge-devices"}</script>
 <!-- End Jekyll SEO tag -->
 
 <!-- end custom head snippets -->
