dfg-13.github.io/data.html at master · dfg-13/dfg-13.github.io · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
<!DOCTYPE html>
<html xml:lang="en" lang ="en">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" >

    <title>How to organize and analyze research data</title>
    <link rel="stylesheet" type="text/css" media="screen" href="style.css">

</head>
<body>
    <nav class="horizonNav">
            <ul class="horizonBar">
                <li><a href="index.html">Home</a></li>
                <li><a href="template.html">Topic</a></li>
                <li><a href="research.html">Research</a></li>
                <li><a href="data.html">Data</a></li>
                <li><a href="template.html">References</a></li>
                <li><a href="template.html">Style</a></li>
                <li><a href="template.html">Layout</a></li>
                <li><a href="template.html">Modality</a></li>
                <li><a href="template.html">Professionalization</a></li>
            </ul>
    </nav>

    <h1>How to organize and analyze research data</h1>

    <aside>
        <p><b>For this lesson, you'll need the following:</b></p>
        <ul>
            <li><b>Microsoft Excel</b></li>
            <li><b>PASW (formerly SPSS) statistics software, or free trial download</b></li>
        </ul>
        <p><b>You should already be familiar with:</b></p>
        <ul>
            <li><b>The basic functions and tools of Excel</b></li>
            <li><b>Descriptive statistics and inferential statistics</b></li>
            <li><b>Research design and how to identify independent variables and dependent variables</b></li>
        </ul>
    </aside>


    <p>This short lesson is intended for graduate and advanced undergraduate students undertaking original research projects. After completing the 30-minute lesson, you will be able to:</p>

    <ul>
        <li>Format raw research data in an Excel spreadsheet for efficient analysis</li>
        <li>Run basic descriptive statistics in Excel</li>
        <li>Prepare the document for statistical analysis in the software package PASW</li>
    </ul>

    <h2>Format data for analysis</h2>

    <p>
        Make sure the computer you are working on has Microsoft Excel installed.
        Open the document ToolBox_DataLesson.xls [[[NOTE: NEED LINK TO DOWNLOAD TUTORIAL EXCEL FILE]]] in Excel and follow along with the lesson, taking action as instructed.
        In the Raw_Dataset spreadsheet, the name of each variable has been entered in the first row of each column.
    </p>

    <ol type="1">
        <li>Each variable name must be different from other variable names.</li>
        <li>The first variable (in column A) is a <b> unique identifier</b>.</li>
        <li>Variable names must start with a letter (not numbers or special characters), so change <i> 4tutorial_types</i> to <i> tutorial_types</i>.</li>
        <li>Name variables so that they are intuitive to you. Therefore, change <i>use</i> and <i>useful</i> to <i> perc_ease_of_use</i> and <i> perc_usefulness</i>, respectively.</li>
        <li>Your spreadsheet should look like Figure 1, and it should now be readily apparent that Column F refers to <i>Perceived Usefulness</i> and Column G refers to <i>Perceived Ease of Use</i>.</li>
    </ol>

    <figure>
    <img src="./illustrations/datalesson_fig3.png" alt="figure3datalesson" class="center">
    <figcaption>Tutorial type, gender, experience, and task categorical values have been formatted as dichotomous variables.</figcaption>
    </figure>

    <p>Data values, or rows, have been input for each subject in the experiment.</p>

    <ol>
        <li>Decide on input conventions and stick to them. In gender, change the "female" value to "F".</li>
        <li>Separate data into component values whenever possible by adding new columns. For example, tasks_completed values (Y/Y/N) can be broken up into three components. So, add columns task1, task2, and task3, and reformat values appropriately.</li>
        <li>Double-check to ensure no data entry errors have been made, then delete the old tasks_completed column.</li>
        <li>Your Raw_Dataset spreadsheet should now look like Figure 2.</li>
    </ol>

    <figure>
    <img src="./illustrations/datalesson_fig2.png" alt="figure2datalesson" class="center">
    <figcaption>Values for each variable are entered in a consistent format.</figcaption>
    </figure>

    <p>Replace categorical data values with "0" or "1" (0=no, 1=yes) to indicate whether or not the value is represented for the given subject/item. This makes the categorical values <b>dichotomous</b>, which gives the researcher maximum flexibility in testing for relationships or correlations.</p>

    <aside>
        <p class="sb2"><b> Dichotomous variables allow for the easy isolation of characteristics for analysis. For example, separating experience levels allows the researcher to test in PASW whether subjects with low experience have perceived ease of use scores significantly different from those with medium and high experience levels.</b></p>
    </aside>

    <ol>
        <li>Copy the Raw_Dataset values into the Sheet2 tab. Rename this sheet Formatted_Data.</li>
        <li>In the Formatted_Data sheet, add new columns with <b>Insert > Columns</b> for each distinct value in categorical data columns (tutorial_types, gender, exp_level, task1, task2, and task3). You do not have to add a new column for task1, task2, and task3 because their values are already dichotomous.</li>
        <li>Rename columns according to each possible categorical value. Copy and paste the values from the original column.</li>
        <li>Replace dichotomous data values with "0" or "1" (0=no, 1=yes) using the replace function: <b>Edit > Replace</b>.</li>
        <li>Your Formatted_Data spreadsheet should now look like Figure 3.</li>
    </ol>

    <figure>
    <img src="./illustrations/datalesson_fig3(1).png" alt="figure3(1)datalesson" class="center">
    <figcaption>Tutorial type, gender, experience, and task categorical values have been formatted as dichotomous variables.</figcaption>
    </figure>

    <h2>Run descriptive statistics</h2>

    <aside>
        <p class="sb3"><b> Analyzing descriptive statistics may lead you to ask new questions. For example, <i> perc_enjoy</i> is much higher on average than <i> perc_usefulness</i>, which also has a large variance. This suggests that a segment of users may not have considered the system useful even though they enjoyed using it.</b></p>
    </aside>

    <p>Run descriptive (summary) statistics on the dataset and output results in a new spreadsheet. If necessary, rearrange columns so that data requiring summary, those with continuous values, are adjacent. (In the spreadsheet, the <b>continuous</b> values of <i> perc_enjoy</i>, <i> perc_usefulness</i>, and <i> perc_ease_of_use</i> are already adjacent.)
        Write down the "input range" (i.e. H2:J26) of the data requiring summary, making sure not to include variable labels.
    </p>

    <ol>
        <li>Go to <b> Tools > Add-Ins... </b>and make sure <b> Analysis ToolPak </b>is checked. Click OK.</li>
        <li>Go to <b> Tools > Data Analysis </b> and select <b> Descriptive Statistics</b>. Click OK.</li>
        <li>Enter the input range in the blank and check <b> Summary Statistics</b>. Click OK.</li>
    </ol>

    <p>The descriptive statistics summary will output in a new sheet. Rename this spreadsheet Descriptive_Stats. Label the output according to the variables. For example, Column 1 should be perc_enjoy. Your Descriptive_Stats spreadsheet should now look like Figure 4.
        Analyzing descriptive statistics is a great way to start appraising a dataset before running inferential statistics. What insights about the dataset can you glean from the summary statistics for perc_usefulness and perc_ease_of_use?
    </p>

    <figure>
    <img src="./illustrations/datalesson_fig4.png" alt="figure4datalesson" class="center">
    <figcaption>Excel will not run descriptive statistics with non-numeric characters, so the variable labels must be re-entered.</figcaption>
    </figure>

    <h2>Run inferential statistics</h2>

    <p>Open PASW and upload the spreadsheet. (IIT labs in Stuart Building, room 112, have PASW. You can also download a 30-day free trial.)</p>

    <ol>
        <li>When PASW starts, it will prompt you for a data source. Select <b> Open an existing data source > More Files...</b></li>
        <li>Set the file type to Excel, then find and open the file.</li>
        <li>From the list, select the Formatted_Data worksheet. Click OK. You should now see the dataset.</li>
    </ol>

    <p>The most commonly used statistical methods and tests are found under the <b> Analyze </b>menu in the standard toolbar.</p>

    <p>Figure 5 shows where to find some common statistical methods and tests. Under PASW's <b>Analyze</b> menu, see if you can also find the following:</p>

    <ol>
        <li>Wilcoxon-Mann-Whitney test</li>
        <li>Simple linear regression</li>
        <li>Non-parametric correlation</li>
    </ol>

    <figure>
    <img src="./illustrations/datalesson_fig5.png" alt="figure5datalesson" class="center">
    <figcaption>Don't assume PASW doesn't run a given test. Many are located under broader statistical classifications.</figcaption>
    </figure>

    <p>Use your prior knowledge or the <i><a href="http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm"> Choosing the Correct Statistic</a></i> resource to determine which statistic to run to find whether there is a significant difference in perc_usefulness between free_play and other tutorial types.</p>


    <aside>
        <p class="sb4"><b>Consult <a href="http://www.ats.ucla.edu/stat/mult_pkg/whatstat/default.htm"> UCLA's <i>Choosing the Correct Statistic </i></a> to ensure the statistical anlaysis is appropriate for your research question(s), variables of interest, and associated data types.</b></p>
    </aside>


    <p>You should have determined that the Kruskal-Wallis one-way analysis of variance is the appropriate test. Go to <b>Analyze > Nonparametric Tests > K Independent Samples...</b> A new window will open prompting you to select the variables (by column name) to be tested.</p>

    <ol>
        <li>Select the <i> perc_usefulness </i> variable from the scrollable list on the left, and click the top arrow to move it to the Test Variable List (i.e. dependent variable).</li>
        <li>Select the <i> free_play </i> variable from the scrollable list, and click the bottom arrow to move it to the Grouping Variable section (i.e. independent variable).</li>
        <li>Click <b>Define Range...</b>, and enter "0" in the Minimum field and "1" in the Maximum field. This specifies the two categorical variables to be tested. Your screen should now look like Figure 6. Click Continue.</li>
    </ol>

    <figure>
    <img src="./illustrations/datalesson_fig6.png" alt="figure6datalesson" class="center">
    <figcaption> For Kruskal-Wallis, an independent (free_play) and a dependent variable (perc_usefulness) are selected.</figcaption>
    </figure>

    <p>Click OK to display the statistical analysis results to new viewing window. Your results screen should look like Figure 7. Because statistical analysis output contains many numerical components (some of which must be cited in formal reports), it is a good idea to save it to a format independent of the PASW software. To save the results, go to <b>File > Export</b>. Change Document Type to the desired output format (Word, Excel, PDF, etc.), then click OK.</p>

    <h2>Test your knowledge</h2>

    <p>Now that you've finished the lesson, please complete a short quiz to test your understanding of formatting research datasets for comprehensive statistical analysis.</p>

</body>
</html>