Commit 25d1c28

links pt1
1 parent 25398d2 commit 25d1c28

1 file changed: +10 -10 lines changed

Learn.md

Lines changed: 10 additions & 10 deletions
@@ -75,12 +75,13 @@ As a first step, we will clean the data by removing null values and outliers in
7575
The questions that we answered as part of the analysis are listed in the `Data analysis and visualization` section. Please refer to the Jupyter notebook file for all the code. This `readme.md` file explains the key steps and results of our project.
7676

7777

78-
# <a name="2 Data Source">Data source:</a>
78+
<h1 id="2 Data Source">Data Source</h1>
7979

8080
The dataset is very diverse and comes from the Stack Overflow developer survey, which covers respondents from 180 countries. Stack Overflow has collected survey data from 2011 to 2020. We chose the 2018, 2019, and 2020 surveys to analyze for this project. The participants were mostly from the US, India, and EMEA regions. The majority of the survey respondents had a developer/coding background. We performed various analyses, and our key results are given in the `Data Analysis` section.
8181

8282
The dataset can be downloaded from the link below:
8383

84+
8485
**Download Link** -> https://insights.stackoverflow.com/survey
8586

8687
**Available in GitHub community Exchange** ->https://education.github.com/globalcampus/exchange?utf8=%E2%9C%93&q=sanjay
@@ -89,7 +90,7 @@ The data are available in the CSV format ranging from 40 to 150 MB with data of
8990

9091
We chose this dataset because of its diverse nature and because it was completely uncleaned. As developers, we use Stack Overflow to find answers to most of the questions we have, which encouraged us to explore the survey results and derive key insights from them. The insights can also help with understanding the information technology industry, and can help employers with hiring and job seekers with building their resumes.
9192

92-
# <a name="3 Key Insights">Key Insights</a>
93+
<h1 id="3 Key Insights">Key Insights</h1>
9394

9495
1. JavaScript has maintained its stronghold as the most commonly used programming language. Almost 70% of the respondents use JavaScript. HTML/CSS is the second most popular language, at about 63%.
9596
2. About `55%` of respondents identify themselves as **full-stack developers**, and about `20%` consider themselves as **mobile developers**.
@@ -101,10 +102,8 @@ The data are available in the CSV format ranging from 40 to 150 MB with data of
101102
8. Most of the data scientist respondents came from the United States (1550). The country with the second-highest number of data scientists is India (540).
102103
9. The country that pays the highest salary for data scientists is Ireland ($275,851). The second highest is Luxembourg ($272,796). Australia pays about $146,803.
103104

104-
105-
106-
# <a name="4 Data Cleaning">Data Cleaning</a>
107-
105+
<h1 id="4 Data Cleaning">Data Cleaning</h1>
106+
108107
<img src="https://recodehive.com/wp-content/uploads/2021/05/Data-Cleaning-1024x361.png">
109108

110109
As our first step, we gathered information on all three datasets and looked into the columns that answer our research questions. The columns listed below were chosen as key factors for our analysis.
@@ -126,7 +125,8 @@ Some of the column names were not easily understandable, for example, the column
126125
| JobSat | CurrentJobSatis |
127126
| JobSeek | JobStatus |
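
The renaming step in the table above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's actual code: only the two pairs visible in this part of the table are included, and the `COLUMN_RENAMES` dictionary, the `rename_survey_columns` helper, and the sample row are assumptions made for the example.

```python
import pandas as pd

# Assumed mapping: only the two rename pairs visible in the table above.
COLUMN_RENAMES = {
    "JobSat": "CurrentJobSatis",
    "JobSeek": "JobStatus",
}


def rename_survey_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the survey columns renamed.

    Keys that are not present in df are silently ignored by DataFrame.rename.
    """
    return df.rename(columns=COLUMN_RENAMES)


# Hypothetical one-row sample standing in for a survey data frame.
sample = pd.DataFrame(
    {
        "JobSat": ["Very satisfied"],
        "JobSeek": ["Not looking"],
        "Country": ["India"],
    }
)
renamed = rename_survey_columns(sample)
print(list(renamed.columns))
```

Keeping the mapping in one dictionary makes it easy to apply the same renames to each year's data frame.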
128127

129-
## <a name="4.1 Data Refactoring">4.1) Data Refactoring</a>
128+
129+
<h2 id="4.1 Data Refactoring">4.1) Data Refactoring</h2>
130130

131131
Most of the column values were very detailed and difficult to analyze. For instance, the values in the `EdLevel` column were as below.
132132

@@ -184,7 +184,7 @@ Professional 1037
184184

185185
Similarly, we followed the same approach for other columns such as `Gender`, `Profession`, `UndergradMajor`, `JobStatus`, and `Employment`.
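
One way to collapse verbose survey answers into short categories is a keyword-based mapping, sketched below. The rules in `ED_LEVEL_RULES`, the short labels, the `simplify_ed_level` helper, and the sample answers are all assumptions for illustration; the notebook may use different labels or a different technique.

```python
import pandas as pd

# Assumed keyword -> short label rules; the first matching keyword wins.
ED_LEVEL_RULES = [
    ("bachelor", "Bachelors"),
    ("master", "Masters"),
    ("professional", "Professional"),
    ("doctoral", "Doctorate"),
]


def simplify_ed_level(value):
    """Map a verbose education answer to a short label, keeping missing values."""
    if pd.isna(value):
        return value
    lowered = value.lower()
    for keyword, label in ED_LEVEL_RULES:
        if keyword in lowered:
            return label
    return "Other"


# Hypothetical sample answers standing in for the real `EdLevel` column.
ed_level = pd.Series(
    [
        "Bachelor's degree (B.A., B.S., B.Eng., etc.)",
        "Master's degree (M.A., M.S., M.Eng., MBA, etc.)",
        "Professional degree (JD, MD, etc.)",
        None,
        "Primary/elementary school",
    ]
)
simplified = ed_level.map(simplify_ed_level)
print(simplified.value_counts(dropna=False))
```

The same pattern, with a different rule list per column, would apply to the other columns mentioned above.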
186186

187-
## <a name="4.2 Categorising the data">4.2) Categorising the data</a>
187+
<h2 id="4.2 Categorising the data">4.2) Categorising the data</h2>
188188

189189
One of our columns, `Ethnicity`, had 173 values with various subcategories. Some of the values are given below for reference.
190190

@@ -238,7 +238,7 @@ df2020.loc[df['Ethnicity'].str.match('Multiracial') == True, 'Ethnicity'] = 'Mul
238238

239239
The above process was carried out for all three data frames: `2018`, `2019`, and `2020`.
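
Applying the same categorisation to several data frames can be wrapped in a small helper, as in the sketch below. The `collapse_category` function and the sample rows are hypothetical. The helper builds the boolean mask from the same frame it assigns to, which avoids index misalignment when the yearly frames differ in length.

```python
import pandas as pd


def collapse_category(df, column, prefix, label):
    """Set `column` to `label` wherever its value starts with `prefix`.

    `prefix` is treated as a regular expression by Series.str.match;
    missing values never match because of na=False.
    """
    mask = df[column].str.match(prefix, na=False)
    df.loc[mask, column] = label
    return df


# Hypothetical miniature frames standing in for the 2018-2020 surveys.
frames = {
    2018: pd.DataFrame({"Ethnicity": ["Multiracial;White or of European descent", "South Asian", None]}),
    2019: pd.DataFrame({"Ethnicity": ["East Asian", "Multiracial;Biracial"]}),
    2020: pd.DataFrame({"Ethnicity": ["Multiracial", "Middle Eastern"]}),
}

for year, frame in frames.items():
    collapse_category(frame, "Ethnicity", "Multiracial", "Multiracial")

print(frames[2018]["Ethnicity"].tolist())
```

Looping over a dictionary of frames keeps the rule in one place, so a change to the prefix or label is picked up by every year.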
240240

241-
## <a name="4.3 Handling the null values">4.3) Handling the null values</a>
241+
<h2 id="4.3 Handling the null values">4.3) Handling the null values</h2>
242242

243243
<img src="https://recodehive.com/wp-content/uploads/2021/05/Message-from-Founder-1024x576.png">
244244

@@ -516,7 +516,7 @@ Top 2 features negatively effecting Job Satisfaction are age, country. So, in th
516516
- UndergradMajor and other Science,are mostly satisfied.
517517
- The most satisfied countries are Malta, Ghana, and Cyprus.
518518

519-
# <a name="7 Conclusion">Conclusion</a>
519+
<h1 id="7 Conclusion">Conclusion:</h1>
520520

521521
Overall, we performed various analyses on the Stack Overflow developer survey and derived insights from it.
522522
We found which country has the highest number of respondents, which language is the most popular, the education level of respondents, the different roles of developers, and so on.
