Learn.md: 10 additions & 10 deletions
@@ -75,12 +75,13 @@ As a first step, we will clean the data by removing null values and outliers in
The questions that we answered as part of the analysis are listed in the `Data analysis and visualization` section. Please refer to the Jupyter notebook for all of the code. This `readme.md` file explains the key steps and results of our project.

-# <a name="2 Data Source">Data source:</a>
+<h1 id="2 Data Source">Data Source</h1>

The dataset is diverse: it comes from the Stack Overflow developer survey, which drew respondents from 180 countries. Stack Overflow has collected survey data from 2011 to 2020; we chose the 2018, 2019, and 2020 surveys for this project. The participants were mostly from the US, India, and the EMEA region. The majority of respondents had a development or coding background. We performed various analyses, and our key results are given in the `Data Analysis` section.
The dataset can be downloaded from the link below:
**Available in GitHub Community Exchange** -> https://education.github.com/globalcampus/exchange?utf8=%E2%9C%93&q=sanjay
@@ -89,7 +90,7 @@ The data are available in the CSV format ranging from 40 to 150 MB with data of
We chose this dataset because of its diverse nature and because it was completely uncleaned. As developers, we use Stack Overflow to find answers to most of the questions we run into, which encouraged us to explore the survey results and derive key insights from them. The insights can also give a better understanding of the information technology field, helping employers with hiring and job seekers with building their resumes.

-# <a name="3 Key Insights">Key Insights</a>
+<h1 id="3 Key Insights">Key Insights</h1>

1. JavaScript has maintained its stronghold as the most commonly used programming language. Almost 70% of the respondents use JavaScript. HTML/CSS is the second most popular language, at about 63%.
2. About `55%` of respondents identify themselves as **full-stack developers**, and about `20%` consider themselves as **mobile developers**.
@@ -101,10 +102,8 @@ The data are available in the CSV format ranging from 40 to 150 MB with data of
8. Most of the data scientist respondents came from the United States (1550). The country with the second highest number of data scientists is India (540).
9. The country that pays data scientists the highest salary is Ireland ($275,851). The second highest is Luxembourg ($272,796). Australia pays about $146,803.
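
Shares such as the JavaScript figure in item 1 come from a multi-select question, so each answer has to be split before counting. The sketch below shows one way to do that with the standard library. It assumes the languages in each answer are separated by semicolons and that unanswered rows are empty or `None`; the function name and the sample rows are made up for illustration and are not taken from the notebook or the survey.

```python
from collections import Counter


def language_shares(responses):
    """Return {language: percent of answering respondents who listed it}.

    Assumes each response is a semicolon-separated string of languages.
    Empty or None responses are ignored and do not count toward the total.
    """
    answered = [r for r in responses if r]
    counts = Counter()
    for response in answered:
        # A set avoids double-counting a language repeated within one answer.
        counts.update({lang.strip() for lang in response.split(";") if lang.strip()})
    total = len(answered)
    if total == 0:
        return {}
    return {lang: round(100 * n / total, 1) for lang, n in counts.items()}


# Made-up sample rows for illustration only.
sample = [
    "JavaScript;HTML/CSS;Python",
    "JavaScript;SQL",
    "Python",
    None,
    "JavaScript;HTML/CSS",
]
shares = language_shares(sample)
```

Because one respondent can list several languages, the shares can add up to more than 100%.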
As our first step, we gathered information on all three datasets and looked into the columns that answer our research questions. The columns listed below were chosen as key factors for our analysis:
@@ -126,7 +125,8 @@ Some of the column names were not easily understandable, for example, the column
| JobSat | CurrentJobSatis |
| JobSeek | JobStatus |
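
A rename like the one in the table can be sketched as a simple old-to-new mapping applied to every row. The example below uses plain dictionaries so that it runs without extra packages; the helper name `rename_columns`, the `Country` column, and the sample row are assumptions for illustration, and only the two name pairs come from the table rows shown here.

```python
# Old-to-new column names, taken from the two table rows shown above.
RENAMES = {"JobSat": "CurrentJobSatis", "JobSeek": "JobStatus"}


def rename_columns(rows, renames):
    """Return new row dicts with keys renamed; keys not in `renames` pass through."""
    return [
        {renames.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Made-up row for illustration only.
rows = [{"Country": "India", "JobSat": "Very satisfied", "JobSeek": "Not looking"}]
renamed = rename_columns(rows, RENAMES)
```

If the data is held in a pandas DataFrame, `df.rename(columns=RENAMES)` performs the same kind of rename.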

-## <a name="4.1 Data Refactoring">4.1) Data Refactoring</a>
+
+<h2 id="4.1 Data Refactoring">4.1) Data Refactoring</h2>

Most of the column values were very detailed, which made them difficult to analyze. For instance, the values in the `EdLevel` column were as follows.
@@ -184,7 +184,7 @@ Professional 1037
Similarly, we followed the same approach for other columns such as `Gender`, `Profession`, `UndergradMajor`, `JobStatus`, and `Employment`.
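
The refactoring described in this section, collapsing long survey answers into short labels, can be sketched as a keyword lookup. The rules, labels, and sample answers below are hypothetical and only illustrate the idea; the mapping actually used in the notebook may differ.

```python
from collections import Counter

# Hypothetical keyword-to-label rules; the notebook's real mapping may differ.
EDLEVEL_RULES = [
    ("bachelor", "Bachelors"),
    ("master", "Masters"),
    ("professional", "Professional"),
    ("doctoral", "Doctorate"),
]


def simplify_edlevel(value, rules=EDLEVEL_RULES, default="Other"):
    """Map a verbose survey answer to a short label by keyword match.

    Missing answers become "Unknown"; answers matching no rule become `default`.
    """
    if not value:
        return "Unknown"
    lowered = value.lower()
    for keyword, label in rules:
        if keyword in lowered:
            return label
    return default


# Illustrative answers written in the style of the survey, not copied from the data.
answers = [
    "Bachelor's degree (BA, BS, B.Eng., etc.)",
    "Master's degree (MA, MS, M.Eng., MBA, etc.)",
    "Professional degree (JD, MD, etc.)",
    "Some college/university study without earning a degree",
    None,
]
simplified = [simplify_edlevel(a) for a in answers]
label_counts = Counter(simplified)
```

The same function shape would work for the other columns by swapping in a different rule list, and counting the resulting labels gives a quick check that no category was missed.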

-## <a name="4.2 Categorising the data">4.2) Categorising the data</a>
+<h2 id="4.2 Categorising the data">4.2) Categorising the data</h2>

One of our columns, `Ethnicity`, had 173 values with various subcategories. Some of the values are given below for reference.
@@ -516,7 +516,7 @@ Top 2 features negatively effecting Job Satisfaction are age, country. So, in th
- UndergradMajor and other Science, are mostly satisfied.
- Most satisfied countries: Malta, Ghana, Cyprus.

-# <a name="7 Conclusion">Conclusion</a>
+<h1 id="7 Conclusion">Conclusion:</h1>

Overall, we performed various analyses on the Stack Overflow developer survey and derived insights from it.
We found which country has the highest number of respondents, which language is the most popular, the education levels of respondents, the different developer roles, and so on.