# (PART) Asking research questions {-}


# Research questions {#RQs}


<!-- Introductions; easier to separate by format -->
```{r, child = if (knitr::is_html_output()) {'./introductions/02-RQs-HTML.Rmd'} else { './introductions/02-RQs-LaTeX.Rmd'}}
```


<!-- Define colours as appropriate -->
```{r, child = if (knitr::is_html_output()) {'./children/coloursHTML.Rmd'} else {'./children/coloursLaTeX.Rmd'}}
```


## Introduction {#Chap2-Intro}

The research question (RQ) directs all other components of the research.
Since quantitative research summarises and analyses data using numerical methods (like averages or percentages), the RQ must be *written* carefully so it can be *answered* effectively.
Four different types of RQs are studied:

* descriptive RQs (Sect.\ \@ref(RQsDescriptive)).
* relational RQs (Sect.\ \@ref(RQsRelational)).
* repeated-measures RQs (Sect.\ \@ref(RQsRepeatedMeasures)).
* correlational RQs (Sect.\ \@ref(RQsCorrelational)).


Since the RQ directs all other components of the research, writing RQs should be the first step of any research study.
Specifically, RQs should be asked before data are collected.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
RQs should be written *before data are collected*.
:::


## Descriptive RQs {#RQsDescriptive}
\index{Research question!descriptive|(}

All RQs identify a large group of interest to be studied (called a *population*),\index{Population} and study something *about* that population (called the *outcome*).\index{Outcome}

The population is any broad group of interest; for example:

* all German males between\ $18$ and\ $35$ years of age.
* all bamboo flooring materials manufactured in China.
* all elderly females with glaucoma in Canada.
* all *Pinguicula grandiflora* growing in Europe.


::: {.definition #Population name="Population"}
\index{Population}\index{Individuals}
A *population* is a group of *individuals* from which the total set of observations of interest *could* be made, and to which the results will generalise.
:::


Populations comprise many *individuals* (or *cases*).\index{Individuals}\index{Cases}
If the individuals are people, individuals may also be called *subjects*.\index{Subjects}


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
The words *population*, *individuals* and *cases* do *not* just refer to people, though they may be commonly used that way in general conversation.
:::


Data are rarely taken from all the individuals in the population: *all* individuals are rarely accessible in practice.
For example, testing a new drug cannot possibly study *all* people who might use the drug (some may not even be born yet).
In contrast, a *sample* is a *subset* of the population from which data are obtained (Chap.\ \@ref(Sampling)).
Countless samples are possible from any given population, but only one is studied.


<div style="float:right; width: 75px; padding:10px">
<img src="Pics/iconmonstr-sitemap-20-240.png" width="50px"/>
</div>


::: {.definition #Sample name="Sample"}
\index{Sample}
A *sample* is a subset of individuals from the population.
The data are collected from the sample.
:::


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
The *population* in an RQ is *not* just those studied; it is the whole group to which results could generalise.
:::


::: {.example #Samples name="Samples"}
A study of American college women [@data:woolf:ironstatus] compared iron status in highly-active and sedentary women.

The study compared $28$ active and\ $28$ sedentary American college women, from which data were collected.
The *population* was *all* active and sedentary American college women.
The group of $56$\ subjects was the *sample*.
:::
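
The distinction between a population and a sample can be sketched in R.
This is only an illustration under stated assumptions: the 'population' is a made-up vector of simulated values (a real population is rarely fully observable), and the object names (`population`, `sample_1`, `sample_2`) are hypothetical.

```r
# Hypothetical population: simulated values for 10,000 individuals.
# (In real studies, the whole population is rarely observable.)
set.seed(2024)  # only so the sketch is reproducible
population <- rnorm(10000, mean = 50, sd = 10)

# Two of the countless possible samples of 56 individuals
sample_1 <- sample(population, size = 56)
sample_2 <- sample(population, size = 56)

# Each sample is a subset of the population...
all(sample_1 %in% population)

# ...but different samples contain different individuals,
# so their numerical summaries generally differ
mean(sample_1)
mean(sample_2)
```

Only one sample is studied in practice; the second sample is drawn here simply to show that a different sample would usually give a different summary.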


Descriptive RQs study something *about* the identified population, called the *outcome*.
Because the RQ concerns a large group (the population), the outcome numerically describes a *group* of individuals (not single individuals).
The outcome is, for example, an *average*\index{Averages} or *proportion*\index{Proportions} summarising a group of individuals.


<div style="float:right; width: 75px; padding:10px">
<img src="Pics/iconmonstr-process-1-240.png" width="50px"/>
</div>


::: {.definition #Outcome name="Outcome"}
\index{Outcome}
The *outcome* in an RQ is the result, output, consequence or effect of interest in a study, numerically summarised for a group of individuals.
:::


The outcome of interest in a population may be (for example) the

* *average* amount of wear after\ $1\,000\hs$ of use.
* *proportion* of people whose pupils dilate.
* *average* weight loss after three weeks on a diet.
* *percentage* of seedlings that die.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
\index{Outcome}\index{Population}\index{Individuals}
The *outcome* in an RQ summarises a *population*; it does not describe the *individuals* in the population.
:::


Descriptive RQs can now be introduced.


::: {.definition #DescriptiveRQ  name="Descriptive RQ"}
*Descriptive RQs* have a population and an outcome.
:::


Some RQs ask about the *value* of some population quantity (such as: what is the average internal body temperature?); these are called *estimation* RQs.
Some RQs require *making a decision* about the population (such as: is the average internal body temperature the same for females and males?); these are called *decision-making* RQs.
Descriptive RQs have one of these forms, depending on what information is sought (Sect.\ \@ref(TwoPurposesOfRQs)):

* *estimation* RQs: Among {*the population*}, what is {*the outcome*}?\index{Research question!estimation}
* *decision-making* RQs: Among {*the population*}, is {*the outcome*} equal to {*a given value*}?\index{Research question!decision-making}


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
These templates are *not* 'recipes', but guidelines.
:::


Answering *estimation* descriptive RQs is studied in Chaps.\ \@ref(CIOneProportion) and\ \@ref(OneMeanConfInterval).
Answering *decision-making* descriptive RQs is studied in Chaps.\ \@ref(TestOneProportion) and\ \@ref(TestOneMean).


::: {.example #DescriptiveRQBodyTemp name="Descriptive RQ"}
@data:mackowiak:bodytemp studied men and women aged\ $18$ to\ $40$; this is the *population*.
The *outcome* of interest in this population is the *average body temperature*.
The sample comprised\ $148$ 'healthy men and women' aged\ $18$ to\ $40$.
One descriptive RQ was:

> What is the average body temperature?


This is an *estimation* RQ.\spacex
They also studied a *decision-making* descriptive RQ (where\ $98.6$^o^F (or\ $37.0$^o^C) is a commonly-accepted value for the internal body temperature):

> Is the average body temperature really\ $98.6$^o^F ($37.0$^o^C)?
:::

\index{Research question!descriptive|)}


## Relational RQs {#RQsRelational}
\index{Research question!relational|(}

Studying relationships is usually more interesting than simply describing a population.
*Relational RQs* compare the outcome for groups of different individuals in the population, or compare two different sub-populations.
These comparisons are called *between-individuals* comparisons,\index{Comparison!between individuals} as they compare the outcome *between* (or among) groups of *different* individuals.
Examples include:

* comparing the average amount of wear in floorboards *between* two different groups: standard wooden floorboards, and bamboo floorboards.
* comparing the average heart rate *across* three groups of people: those not receiving the drug, those receiving a weekly dose, and those receiving a daily dose of the drug.


::: {.definition #ComparisonBetween name="Comparison (between individuals)"}
The *between-individuals comparison* in an RQ identifies the small number of groups of different individuals for which the outcome is compared.
:::


:::{.example #BetweenPossums name="Between-individuals comparison"}
@data:Williams2022:Possums compared the average weight of female and male Leadbeater's possums.
'Sex of the possum' is the *between-individuals* comparison; average weight is the outcome.
:::


Relational RQs can now be introduced.

::: {.definition #RelationalRQ name="Relational RQ"}
*Relational RQs* have a population, outcome, and a *between*-individuals comparison.
:::


Relational RQs have one of these forms, depending on what information is sought:

* *estimation* RQ: Among {*the population*}, what is the difference in {*the outcome*} for {*the groups being compared*}?
* *decision-making* RQ: Among {*the population*}, is {*the outcome*} the same for {*the groups being compared*}?


::: {.example #RelationalRQ name="Relational RQ"}
Consider this RQ (based on @estevez2019influence):

> Among Cubans between\ $13$ and\ $20$ years of age, is the average heart rate the same for females and males?

The *population* is 'Cubans between\ $13$ and\ $20$ years of age', the *outcome* is '*average* heart rate', and the *between-individuals comparison* is between two separate groups: 'between females and males'.
This is a *relational RQ*.

This is a *decision-making RQ*,\index{Research question!decision-making} since it asks if the average heart rate is the same for females and males.
An *estimation*-type relational RQ would ask about the *size* of difference in the average heart rate between females and males.
:::
\index{Research question!relational|)}
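
A between-individuals comparison can be sketched in R using a small, invented data set with one row per individual.
The data values and the names (`heart`, `sex`, `heart_rate`, `group_means`) are hypothetical, and are not taken from the study cited above.

```r
# Hypothetical data: one row per individual (values invented)
heart <- data.frame(
  sex        = c("F", "F", "F", "F", "M", "M", "M", "M"),
  heart_rate = c(78, 82, 75, 81, 72, 70, 76, 74)  # beats per minute
)

# The outcome (average heart rate) for each group being compared
group_means <- tapply(heart$heart_rate, heart$sex, mean)
group_means

# Difference between the two sample averages (females minus males)
group_means[["F"]] - group_means[["M"]]
```

Each individual appears in exactly one group, which is what makes the comparison *between* individuals.
The difference between the two sample averages describes this sample only; it does not, by itself, answer either the estimation or the decision-making RQ about the population.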


## Repeated-measures RQs {#RQsRepeatedMeasures}
\index{Research question!repeated-measures|(}

Rather than comparing the outcome for groups of different individuals, *repeated-measures RQs* compare the outcome multiple times within the *same* individuals.

These comparisons are called *within-individuals* comparisons,\index{Comparison!within individuals} as they compare the outcome *within the same individuals*, not across groups of *different* individuals.
The multiple measurements may be different points in time (e.g., the height of the same trees at one, two and five years after planting), but do not have to be time points.

<!-- The comparisons may be made when researchers manipulate the individuals between measurements or observations (e.g., recording pulse rate *before* and *after* being given caffeine to drink), or with researchers not manipulating the individuals (e.g., recording the activity of lizards at 4a.m. and 4p.m.).  -->

\clearpage
Examples include:

* comparing the average strength of hind legs of horses to the forelegs of the same horses.
* comparing the average thickness of the cornea in left eyes and right eyes of the same individuals.
* comparing the average amount of wear in many individual floorboards after one, five and ten years of use.


::: {.definition #ComparisonWithin name="Within-individuals comparison"}
The *within-individuals comparison* in the RQ identifies the small number of different, distinct situations for which the outcome is compared for each individual.
:::


:::{.example #WithinBetweenComparison name="Between- and within-individual comparisons"}
Consider comparing the strength of the dominant and non-dominant legs of professional football players.

A *between*-individuals comparison would compare the average leg strength *between different* groups of footballers: one group would have their dominant-leg strength measured, and the other would have their non-dominant-leg strength measured.

In contrast, the strengths of the dominant and non-dominant legs could be recorded on the *same* individuals.
This study examines *within*-individuals changes: the average differences between the strengths of the dominant and non-dominant legs *within* the same individuals.
In this study, *no between-individuals comparison* exists: different groups are not being compared.
:::


Studies may use *both* within- and between-individuals comparisons (see Sect.\ \@ref(ChamomileTea-TwoMeans)).
For instance, a study may examine the *change* in individuals' heart rate (the *within*-individuals comparison), for two drugs given to different groups (the *between*-individuals comparison).

Repeated-measures RQs can now be introduced.


::: {.definition #RepeatedMeasuresRQ name="Repeated-measures RQ"}
*Repeated-measures RQs* have a population, outcome and a *within*-individuals comparison.
:::


Repeated-measures RQs have one of these forms, depending on what information is sought:

* *estimation* RQ: Among {*the population*}, what is the change in {*the outcome*} for {*the alternatives being compared within individuals*}?
* *decision-making* RQ: Among {*the population*}, is {*there a change in the outcome*} for {*the alternatives being compared within individuals*}?


::: {.example #WithinRelationalRQ name="Repeated-measures RQ"}
@rowland2017comparing compared the temperature in the *same* tree hollows in summer and winter:

> For tree hollows in the Strathbogie Ranges, Australia, what is the average temperature difference between summer and winter?

The comparison is *within individuals*, as the temperature is measured for the *same* tree hollows at the two times.
This is a repeated-measures, estimation-type RQ.
:::


Repeated-measures RQs that compare just two within-individuals situations are often called *paired*.\index{Data!paired}\index{Study types!paired}


:::{.example #RepeatedMeasuresPaired name="Paired repeated-measures study"}
@levitsky2004freshman compared the weights of the same university students at the beginning of university, and then after $12$\ weeks.
The comparison is *within* individuals, and the study is a *repeated-measures* study.
Since each student has a *pair* of weight measurements, this is a *paired* study.
:::
\index{Research question!repeated-measures|)}
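
A paired, within-individuals comparison can be sketched in R as follows.
The weights are invented for illustration (they are not data from the study cited above), and the names (`weights`, `week_0`, `week_12`, `change`) are hypothetical.

```r
# Hypothetical weights (in kg) for the same five students (values invented)
weights <- data.frame(
  student = 1:5,
  week_0  = c(62.0, 71.5, 58.0, 80.0, 66.5),
  week_12 = c(63.0, 72.0, 59.5, 80.5, 68.0)
)

# Within-individuals change: one difference for each student
weights$change <- weights$week_12 - weights$week_0
weights$change

# The average change across the students in the sample
mean(weights$change)
```

Each row holds a *pair* of measurements from the same individual, so the change is computed within each row before it is summarised.
In a between-individuals comparison, by contrast, each individual would contribute a single measurement to just one group.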


## Variables {#Variables}
\index{Variables}

RQs are about *populations*.
However, the data to answer an RQ come from *individuals* in that population.
The aspects or characteristics of the individuals that can *vary* are called *variables*.


::: {.definition #Variable name="Variable"}
A *variable* is a single aspect or characteristic, associated with the individuals, whose values can vary.
:::

::: {.example #Variables2 name="Variables"}
Examples of variables include: the duration of cold symptoms; sex; tree girth; response to a survey question (Yes, Maybe, No); city of birth; hair colour.
:::


Some variables change from one individual to another individual, such as sex and height.
These are called *between*-individuals variables.
In repeated-measures studies, some variables of interest change over repeated measurements from the same individuals; these are called *within*-individuals variables.


::: {.definition #BetweenWithinVariable name="Between- and within-individuals variables"}
*Between*-individuals variables vary from one individual to another individual.\index{Variables!between-individuals}
*Within*-individuals variables vary from one recording or measurement to another *within* the same individuals.\index{Variables!within-individuals}
:::


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
A between-individuals variable is a single aspect that can vary from *individual to individual*.
While *your* city of birth does not change, 'city of birth' is a variable because it varies from *individual* to *individual*.
:::


::: {.example #WithinIndividualsVariables name="Within-individuals variables"}
@rowland2017comparing compared the temperature in the *same* tree hollows in summer and winter (Example\ \@ref(exm:WithinRelationalRQ)).
The comparison is *within individuals*: the temperature is measured for the *same* tree hollows (the *individuals*) at two different times.

'Season' is a within-individuals variable, as each tree hollow is studied for two different seasons.
'Temperature' is also a within-individuals variable, as it is measured twice for each tree hollow.
:::


:::{.example #BetweenPossums2 name="Between-individuals comparison"}
@data:Williams2022:Possums compared the average weight of female and male Leadbeater's possums (Example\ \@ref(exm:BetweenPossums)).

'Sex of the possum' is a *between-individuals* variable; it can vary from possum to possum.
'Weight' is also a *between-individuals* variable; it can vary from possum to possum.
:::


<div style="float:right; width: 222px; border: 1px; padding:10px">
<img src="Illustrations/pexels-andrea-piacquadio-3807629.jpg" width="200px"/>
</div>


::: {.example #Variables  name="Variables"}
'Duration of cold symptoms' is a between-individuals *variable*: its value can vary from individual to individual.
The '*average* duration of cold symptoms' is the *outcome*, a numerical summary of many individuals' cold durations.
:::
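
The difference between a variable (recorded for individuals) and an outcome (a numerical summary of a group) can be sketched in R.
All values are invented, and the names (`cold_duration`, `sprouted`) are hypothetical.

```r
# A variable: cold duration (in days), one value per individual (invented)
cold_duration <- c(5, 7, 6, 9, 4, 8, 6, 7)

# An outcome: the average, summarising many individuals
mean(cold_duration)

# A yes/no variable: whether each individual seedling sprouted (invented)
sprouted <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)

# An outcome: the proportion that sprouted
# (the mean of a logical vector is the proportion of TRUE values)
mean(sprouted)
```

The vectors hold information about *individuals*; the averages and proportions describe the *group*.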


While many variables can be recorded, two essential variables are (Table\ \@ref(tab:RQsPopulationIndividuals)):

* the *response variable*, which records information to determine the outcome.\index{Response variable}
* the *explanatory variable*, which records information to determine the comparison.\index{Explanatory variable}

Usually, one variable can be considered as potentially influencing the value of the other variable.
This variable is called the *explanatory variable* (which may *explain* changes in the other variable).
The other is the *response variable* (whose values *respond* to changes in the explanatory variable).
To be able to influence the response variable, the explanatory variable must occur before (or at the same time as) the response variable.


```{r RQsPopulationIndividuals}
if( knitr::is_latex_output() ) {

  PopInd <- array( dim = c(2, 3) )
  colnames(PopInd) <- c("Population",
                        "",
                        "Individuals")
  PopInd[1, ] <- c("Outcome",
                   "$\\rightarrow$",
                   "Response variable")
  PopInd[2, ] <- c("Comparison",
                   "$\\rightarrow$",
                   "Explanatory variable")

  kable(PopInd,
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        escape = FALSE, # For latex to work in \rightarrow
        linesep  =  c("", "", "", "\\addlinespace", "", "", ""), # Otherwise adds a space after five lines...
        caption = "The relationship between the population and the individuals.",
        align = c("r", "c", "l"))   %>%
    kable_styling(full_width = FALSE) %>%
    kable_styling(font_size = 8) %>%
    row_spec(0, bold = TRUE) # Columns headings in bold
}

if( knitr::is_html_output() ) {

  PopInd <- array( dim = c(4, 3) )

  PopInd[1, ] <- c("![](./Pics/iconmonstr-friend-5-240.png){#id .class height=100px}",
                   "",
                   "![](./Pics/iconmonstr-generation-16-240.png){#id .class height=100px}")
  PopInd[2, ] <- c("Population",
                   "",
                   "Individuals")

  PopInd[3, ] <- c("Outcome:",
                   "$\\rightarrow$",
                   "Response variable")
  PopInd[4, ] <- c("Comparison:",
                   "$\\rightarrow$",
                   "Explanatory variable")
  PopInd[, 2] <- "$\\rightarrow$"

  out <- kable(PopInd,
               format = "html",
               align = c("r", "c", "l"),
               longtable = FALSE,
               caption = "The relationship between the population and the individuals.",
               booktabs = TRUE)

    row_spec(out,
             2,
             bold = TRUE) # Columns headings in bold
}
```


The value of the *response* variable may change in *response* to the value of the explanatory variable.
The value of the *explanatory* variable may *explain* changes in the value of the response variable.


::: {.definition #ExplanatoryVariable name="Explanatory variable"}
An *explanatory variable* may (partially) explain or be associated with changes in another variable of interest (the response variable).
:::


::: {.definition #ResponseVariable name="Response variable"}
A *response variable* records the result, output, consequence or effect of interest from changes in another variable (the explanatory variable).
:::


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
The *response variable* is sometimes called the *dependent variable*,\index{Dependent variable} and the *explanatory variable* is sometimes called the *independent variable*.\index{Independent variable}
We avoid these terms, since the words 'dependent' and 'independent' have many meanings in research.
:::


The RQ cannot be answered without data for the response and explanatory variables.
The *outcome* is a numerical summary of the values of the response variable (Table\ \@ref(tab:RQsPopulationIndividualsExamplesOutcome)) recorded from many individuals.
The values of the explanatory variable identify which of the groups or situations in the *comparison* each observation belongs to (Tables\ \@ref(tab:RQsPopulationIndividualsExamplesComparison) and\ \@ref(tab:RQsPopulationIndividualsExamplesComparisonWithin)).\index{POCI}


```{r RQsPopulationIndividualsExamplesOutcome}
PopInd2 <- array( dim = c(7, 3) )

if( knitr::is_latex_output() ) {
  PopInd2[1, ] <- c("![](./Pics/iconmonstr-friend-5-240.png){#id .class height=100px}",
                    "",
                    "![](./Pics/iconmonstr-generation-16-240.png){#id .class height=100px}")
  PopInd2[2, ] <- c("Outcome describing the population",
                      "",
                      "Response variable in individuals")

  PopInd2[3, ] <- c("\\emph{Average} diastolic blood pressure",
                    "",
                    "Diastolic blood pressure of \\emph{individuals}")
  PopInd2[4, ] <- c("\\emph{Percentage} of seedlings that sprout",
                    "",
                    "Whether an \\emph{individual} seedling sprouts")
  PopInd2[5, ] <- c("\\emph{Proportion} owning iPad",
                    "",
                    "Whether an \\emph{individual} owns an iPad")
  PopInd2[6, ] <- c("\\emph{Average} cold duration",
                    "",
                    "Cold duration for \\emph{individuals}")
  PopInd2[7, ] <- c("\\emph{Percentage} of concrete cylinders having fissures",
                    "",
                    "Whether an \\emph{individual} cylinder has fissures")

  PopInd2[, 2] <- "$\\rightarrow$"

  kable(PopInd2[3:6, ],
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        col.names = c("Outcome describing the population",
                      "",
                      "Response variable in individuals"),
        escape = FALSE, # For latex to work in \rightarrow
        #linesep = c( "\\addlinespace"), # Add a bit of space between all rows
        caption = "Outcomes and corresponding response variable.",
        align = c("r", "c", "l")
  )   %>%
    kable_styling(full_width = FALSE,
                  font_size = 8) %>%
    row_spec(0, bold = TRUE) #%>% # Columns headings in bold
#    column_spec(column = 1, width = "46mm") %>%
#    column_spec(column = 3, width = "60mm")
}

if( knitr::is_html_output() ) {
  PopInd2[1, ] <- c("![](./Pics/iconmonstr-friend-5-240.png){#id .class height=100px}",
                    "",
                    "![](./Pics/iconmonstr-generation-16-240.png){#id .class height=100px}")
  PopInd2[2, ] <- c("Outcome describing the population",
                    "",
                    "Response variable in individuals")

  PopInd2[3, ] <- c("*Average* increase in diastolic blood pressure, from before to after exercise",
                    "",
                    "Increase in diastolic blood pressure of *individuals*, from before to after exercise")
  PopInd2[4, ] <- c("*Percentage* of seedlings that sprout",
                    "",
                    "Whether an *individual* seedling sprouts")
  PopInd2[5, ] <- c("*Proportion* owning iPad",
                    "",
                    "Whether an *individual* owns an iPad")
  PopInd2[6, ] <- c("*Average* cold duration",
                    "",
                    "Cold duration for *individuals*")
  PopInd2[7, ] <- c("*Percentage* of concrete cylinders having fissures",
                    "",
                    "Whether an *individual* cylinder has fissures")

  PopInd2[, 2] <- "$\\rightarrow$"

  out <- kable(PopInd2,
               format = "html",
               align = c("r", "c", "l"),
               longtable = FALSE,
               caption = "Examples of the outcome and the corresponding response variable.",
               booktabs = TRUE)

    row_spec(out, 2, bold = TRUE) # Columns headings in bold
}
```


(ref:CompareBetween) *Between-individuals* comparisons and the corresponding *between-individuals* explanatory variable.

```{r RQsPopulationIndividualsExamplesComparison}
PopInd3 <- array( dim = c(3, 3) )
colnames(PopInd3) <- c("Comparison being made",
                       "",
                       "Explanatory variable in individuals")
PopInd3[1, ] <- c("Between jarrah, beech, bamboo boards",
                  "",
                  "Type of floorboard in \\emph{different} individual homes")
PopInd3[2, ] <- c("Between $3\\kgs$/ha, $4\\kgs$/ha fertiliser rates",
                  "",
                  "Application rate in \\emph{different} individual paddocks")
PopInd3[3, ] <- c("Between people in $20$s, $30$s and\\ $40$s",
                  "",
                  "Age group for each \\emph{different} individual person")

PopInd3[, 2] <- "$\\rightarrow$"

if( knitr::is_latex_output() ) {
  kable(PopInd3,
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        escape = FALSE, # For latex to work in \rightarrow
        #linesep  =  c( "\\addlinespace"), # Add a bit of space between all rows
        caption = "(ref:CompareBetween)",
        align = c("r", "c", "l"))   %>%
    kable_styling(full_width = FALSE,
                  font_size = 8) %>%
    row_spec(0, bold = TRUE) # Columns headings in bold
}

if( knitr::is_html_output() ) {
  kable(PopInd3,
        format = "html",
        align = c("r", "c", "l"),
        longtable = FALSE,
        caption = "(ref:CompareBetween)",
        booktabs = TRUE)
}
```


(ref:CompareWithin) *Within-individuals* comparison and corresponding *within-individuals* explanatory variable.


```{r RQsPopulationIndividualsExamplesComparisonWithin}
PopInd3 <- array( dim = c(3, 3) )
colnames(PopInd3) <- c("Comparison being made",
                       "",
                       "Explanatory variable in individuals")
PopInd3[1, ] <- c("Before and after receiving a drug",
                  "",
                  "When measured on \\emph{each} individual person")
PopInd3[2, ] <- c("Between left and right arms",
                  "",
                  "Which arm in \\emph{each} individual person is used")
PopInd3[3, ] <- c("Between forelegs and hind legs",
                  "",
                  "Which legs are measured on \\emph{each} individual horse")

PopInd3[, 2] <- "$\\rightarrow$"

if( knitr::is_latex_output() ) {
  kable(PopInd3,
        format = "latex",
        longtable = FALSE,
        booktabs = TRUE,
        escape = FALSE, # For latex to work in \rightarrow
        #linesep  =  c( "\\addlinespace"), # Add a bit of space between all rows
        caption = "(ref:CompareWithin)",
        align = c("r", "c", "l"))   %>%
    kable_styling(full_width = FALSE,
                  font_size = 8) %>%
    row_spec(0, bold = TRUE) # Columns headings in bold
}

if( knitr::is_html_output() ) {
  kable(PopInd3,
        format = "html",
        align = c("r", "c", "l"),
        longtable = FALSE,
        caption = "(ref:CompareWithin)",
        booktabs = TRUE)
}
```


<!-- ::: {.example #Variables2 name="Variables"} -->
<!-- For the final RQ for the echinacea study (Sect.\ \@ref(Writing-RQs)), 'the duration of cold symptoms' is the *response variable*, and 'whether echinacea is taken or not' is the *explanatory variable*. -->
<!-- The type of medication is taken *before* the cold symptoms disappear, and may even partially explain the duration of the cold symptoms. -->
<!-- ::: -->


`r if (knitr::is_latex_output()) '<!--'`
<iframe src='https://www.ferendum.com/en/embeded.php?pregunta_ID=1249763&sec_digit=365747699&embeded_digit=874385320' style='width:100%; height:550px; overflow: auto; background: #badaff33;' frameBorder='0'></iframe><BR>
<A href='https://www.ferendum.com' target='_blank'>Free Online Poll Maker</A>


`r webexercises::hide()`
The *Population* is 'carrots grown in Buderim' 8 weeks after planting.
From these carrots, we *need* to collect *whether Thrive fertiliser was applied* and the *weight of the carrots $8$\ weeks after planting*.

The *response* variable is 'the weight of each individual carrot\ $8$\ weeks after planting', and the *explanatory* variable is 'whether Thrive was used on each carrot'.

('The *number* of carrots planted' is not even a variable: it is not information recorded about the individuals, but a summary of information.)
`r webexercises::unhide()`
`r if (knitr::is_latex_output()) '-->'`


::: {.example #POCIplaygrounds name="Variables"}
Consider a study of the ground surface temperature of public playgrounds in Boston in summer.

The *population* comprises all public playgrounds in Boston; each public playground is an *individual*.
The *outcome* is the *average* ground surface temperature in summer over many playgrounds; the *response variable* is the ground surface temperature for *individual* playgrounds in summer.

The between-individuals *comparison* is between the four types of ground surfaces (rubber, soil, sand, mulch).
The *explanatory variable* is the type of surface for individual playgrounds.
:::


## Correlational RQs {#RQsCorrelational}
\index{Research question!correlational|(}

*Correlational RQs* are not concerned with summarising outcomes in comparison *groups*.
Instead, correlational RQs explore relationships between two variables measured or observed on or about the individuals.


::: {.definition #CorrelationalRQ name="Correlational RQ"}
*Correlational RQs* explore the relationship between two variables.
:::


Correlational RQs have one of these forms, depending on what information is sought:

* *estimation* RQ: Among {*the population*}, how strong is the relationship between {*the response variable*} and {*the explanatory variable*}?
* *decision-making* RQ: Among {*the population*}, is {*the response variable*} related to {*the explanatory variable*}?

Examples include studying the relationship between:

* the height of plants (response variable) and the number of hours of sunlight per day (explanatory variable).
* heart rate (response variable) and the number of grams of caffeine consumed that day (explanatory variable).

Usually, one variable can be considered as the explanatory variable, and the other as the response variable (Sect.\ \@ref(Variables)).
To be able to influence the response variable, the explanatory variable must occur before (or at the same time as) the response variable.
Explanatory and response variables may be either within- or between-individuals variables.
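
The two forms of correlational RQ can be sketched in R using the plant-height example above.
The data below are simulated (made up purely for illustration), and the object names (`sunlight`, `height`, `r`, `test`) are hypothetical; `cor()` and `cor.test()` are standard functions in base R.

```r
# Simulated (made-up) data: hours of sunlight per day and plant height (in cm)
set.seed(1)
sunlight <- runif(30, min = 2, max = 10)           # explanatory variable
height   <- 10 + 3 * sunlight + rnorm(30, sd = 4)  # response variable

# Estimation RQ: how strong is the relationship in the sample?
r <- cor(height, sunlight)

# Decision-making RQ: is the response related to the explanatory variable?
test <- cor.test(height, sunlight)

r              # sample correlation coefficient
test$conf.int  # interval estimate for the population correlation
test$p.value   # evidence used for the decision
```

The same sample is used for both forms of the RQ; only the question asked of the data changes.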


:::{.example #CorrelationalRunners name="Correlational RQ"}
Consider studying marathon runners.
An RQ exploring the relationship between the individuals' water intake on the day before the race and the individuals' race times would be a correlational RQ.\spacex
The water intake on the day before the race *may* influence the race time.

The water intake on the day before the race is the explanatory variable, and the race time is the response variable.
:::


:::{.example #CorrelationalPine name="Correlational RQ"}
The Wollemi pine was discovered by science in\ 1994.
@offord2023home studied the growth of these rare plants.

One correlational RQ concerned the relationship between the diameter of trees at breast height (DBH; response variable), and the pH of the soil (explanatory variable).
The two variables are the DBH and pH, both recorded for many trees.

Also studied was how the DBH of each tree changed at various times after the planting date (a repeated-measure RQ).
Each tree has the DBH measured over time, for many time points.
Time is the *within*-individuals comparison.
:::


In some situations, neither variable is naturally the response or the explanatory variable; the interest is just in the association between the two variables.


::: {.example #ResearchDesignFishSize name="Correlational RQ"}
@gonzalez2024length recorded the length and weight of $14\,040$\ fish for\ $39$ demersal fish species.
The study has two variables (fish length; fish weight), but identifying a response variable and explanatory variable is meaningless.
The estimation-type correlational RQ is:

> Among demersal fish, how strong is the relationship between length and weight?
:::
\index{Research question!correlational|)}


## Interventions {#Intervention}
\index{Intervention}

Sometimes, the explanatory variable naturally occurs without manipulation by the researchers (e.g., the height of people; the sex of oxen; the pH of forest soil).
Sometimes, however, the explanatory variable is manipulated by researchers (e.g., the dose of fertiliser applied; the dose of drug given); this is called an *intervention*.


::: {.definition #Intervention name="Intervention"}
An *intervention* is present when *researchers* can manipulate (or impose) the values of the *explanatory variable* on the individuals to determine the impact on the response variable.
:::


When an intervention is present, the values of the explanatory variable are *manipulated* by the researchers, and are called *treatments*.
When an intervention is *not* present, the values of the explanatory variable are *not* manipulated by the researchers, and are called *conditions*.
The *analysis* is the same whether an intervention is used or not, but the *interpretation* of the results depends on whether an intervention is used (Sect.\ \@ref(CompareStudyTypes)).


::: {.definition #Treatments name="Treatments"}
\index{Treatments}
The *treatments* are the values of the explanatory variable that the researchers can manipulate and impose upon the individuals.
:::


::: {.definition #Conditions name="Conditions"}
\index{Conditions}
The *conditions* are the values of the explanatory variable that those in the study have or experience, but are not manipulated or imposed by the researchers.
:::


An intervention is present when the researchers:

* explicitly give a dose of a new drug to patients.
* explicitly apply wear-testing loads to two different flooring materials.
* explicitly expose people to different stimuli.
* explicitly apply different doses of fertiliser.


:::{.example #InterventionHimalaya name="Intervention"}
@data:Bird2008:wholegrain *supplied* one group of participants with a diet using refined flour, and *supplied* another group of participants with a diet using a new flour variety.
'Type of diet' is the (between-individuals) explanatory variable.
Since the researchers manipulated which subjects ate which flour, this study has an intervention.
The two types of diet are the treatments.
:::
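
As a rough sketch of what 'imposing' treatments could look like in practice, the R code below allocates two diets to participants.
Everything here is hypothetical: the participant labels, the group sizes, and the use of random allocation via `sample()` are assumptions for illustration only, and do not describe how the study above allocated its diets.

```r
set.seed(2)

# Hypothetical participant labels (not from any real study)
participants <- paste0("P", 1:20)

# Intervention: the *researchers* decide which diet each participant receives.
# Here the allocation is random (an assumption), with ten participants per diet.
diet <- sample(rep(c("Refined flour", "New flour"), each = 10))

allocation <- data.frame(participant = participants,
                         diet = diet)

table(allocation$diet)  # number of participants given each treatment
```

In contrast, when no intervention is present, the values of the explanatory variable (the conditions) would simply be recorded for each individual rather than assigned.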


::: {.example #Interventions name="No intervention"}
To compare the average blood pressure in female and male Scots, blood pressure was measured using a blood pressure machine (a sphygmomanometer).
The researchers interact with the participants to measure blood pressure, but there is *no* intervention.
Using the sphygmomanometer is just a way to measure blood pressure, to *obtain* the data.

The *comparison* is between females and males (the conditions), which cannot be manipulated or imposed on the individuals by the researchers; *there is no intervention*.
:::


Often, one of the comparison groups is the *control group*.
The *control group* is a comparison group *not* receiving the treatment being studied, or *not* having the condition being studied, but *as similar as possible* to the other individuals in all other ways.
The control group is like a benchmark for detecting changes in the outcome due to the treatment or condition of interest (Sect.\ \@ref(PlaceboEffect)).
Sometimes the control group is given a *placebo*: a non-effective treatment that appears to be the real treatment.


::: {.definition #Control name="Control"}
A *control* is an individual without the treatment or condition of interest, but as similar as possible in *every other way* to other individuals.
A *control group* is a group of controls.
:::


::: {.definition #Placebo name="Placebo"}
A *placebo* is a treatment with no intended effect or active ingredient, but appears to be the real treatment.
:::


::: {.example #ControlGroup name="Control group"}
To test the effectiveness of a new medication, patients report to a doctor to receive injections of the new drug.
Patients assigned to the *control group* do not receive the drug.
The controls should also report to a doctor and receive an injection (like those receiving the drug); the injection, however, would contain no active ingredients (a placebo).
:::


Together, the **P**opulation, **O**utcome, **C**omparison and **I**ntervention form the POCI acronym\index{POCI} (sometimes written as PICO) to aid remembering the elements of RQs.\spacex
The POCI acronym is not helpful for correlational RQs.


::: {.example #POCIWomen name="POCI"}
@data:woolf:ironstatus measured iron status in highly-active and sedentary American college women.

The *outcome* is the 'average iron status'.
The between-individuals *comparison* is between highly-active and sedentary women.
For this comparison to be an intervention, the *researchers* would need to tell each individual woman to be highly active or sedentary.
This seems unlikely, so the study does not have an intervention.
:::


## Estimation and decision-making RQs {#TwoPurposesOfRQs}
\index{Research question!estimation}\index{Research question!decision-making}

As noted earlier, RQs can be written with one of two purposes.
*Estimation RQs* ask how precisely an unknown *value* in the *population* is estimated by the *sample*.
Estimation RQs are answered using *confidence intervals*, which are discussed in Chaps.\ \@ref(CIOneProportion) to\ \@ref(OneMeanConfInterval), Chaps.\ \@ref(AnalysisPaired) to\ \@ref(AnalysisOddsRatio), plus Sects.\ \@ref(CorrelationTesting) and\ \@ref(RegressionHT).

*Decision-making RQs* require a decision to be made about the unknown values in the population.
They are answered using *hypothesis tests*, which are discussed in Chaps.\ \@ref(TestOneProportion) to\ \@ref(TestOneMean), Chaps.\ \@ref(AnalysisPaired) to\ \@ref(AnalysisOddsRatio), plus Sects.\ \@ref(CorrelationTesting) and\ \@ref(RegressionHT).
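
The following minimal sketch shows how the same sample can be used for either purpose.
The data are simulated (made up), the group labels and object names are hypothetical, and the two-sample $t$-test from base R's `t.test()` is just one of many possible tools, used here as an assumption.

```r
set.seed(3)

# Simulated (made-up) response values for two comparison groups
study <- data.frame(
  group    = rep(c("A", "B"), each = 40),
  response = c(rnorm(40, mean = 7.0, sd = 2),
               rnorm(40, mean = 6.5, sd = 2))
)

result <- t.test(response ~ group, data = study)

# Estimation RQ: how large is the difference between the population means?
result$conf.int  # a confidence interval for the difference

# Decision-making RQ: are the population means the same?
result$p.value   # used in the hypothesis test
```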


::: { .example #TypesOfRQS name="Decision-making RQs"}
@data:Thane2004:ZincVitA studied 'British young people aged\ $4$--$18$' and asked numerous RQs.
One *decision-making* relational RQ was:

> In British young people aged\ $4$--$18$, is the average daily zinc intake the same for boys and girls?
:::


Decision-making RQs have two possible answers.\index{Decision making}
For the example above, the average zinc intake either *is* the same for boys and girls, or *is not* the same for boys and girls, in the *population* (Fig.\ \@ref(fig:ZincRQ)).
These two options are *hypotheses*: potential answers to the RQ.\spacex\index{Hypotheses}
However, answers are rarely clear in practice, since only one of the countless possible samples from the population is studied.
Instead, researchers decide *how strongly* the sample evidence supports a particular hypothesis about the *population*.\index{Hypotheses}

Evidence may *support* or *contradict* a hypothesis; evidence rarely *proves* a hypothesis (at least, without any other support, such as theoretical support).
Ultimately, after collecting data from a *sample*, a decision must be made about which explanation about the *population* is more consistent with the data collected.


```{r ZincRQ, fig.align="center", fig.cap="Two possible answers to the RQ (two hypotheses) about zinc intake in children.", out.width='100%', fig.width=8.5, fig.height=4}
par( mar = c(0.05, 0.15, 0.75, 0.15))

openplotmat()

pos <- array(NA,
             dim = c(4, 2))
pos[1, ] <- c(0.15, 0.5) # RQ
pos[2, ] <- c(0.55, 0.70) # Yes
pos[3, ] <- c(0.55, 0.30)   # No
pos[4, ] <- c(0.90, 0.50)   # Data

straightarrow(from = pos[1,], # From RQ...
              to = pos[2,],   # ... to YES
              lty = 2,
              lcol = "grey")
straightarrow(from = pos[1,], # From RQ...
              to = pos[3,],   # ... to NO
              lty = 2,
              lcol = "grey")

#straightarrow(from = pos[4,], # From Data...
#              to = pos[2,],   # ... to YES
#              lty = 2,
#              lcol = "grey")
#straightarrow(from = pos[4,], # From DATA...
#              to = pos[3,],   # ... to NO
#              lty = 2,
#              lcol = "grey")
bentarrow(from = pos[4,], # From Data...
          to = c( 0.75, pos[2, 2]),   # ... to YES
          lty = 2,
          path = "V",
          arr.pos = 1,
          lcol = "grey")
bentarrow(from = pos[4,], # From DATA...
          to = c( 0.75, pos[3, 2]),   # ... to NO
          lty = 2,
          path = "V",
          arr.pos = 1,
          lcol = "grey")

textplain( c( 0.75,
              0.5 ),
           cex = 4.5,
           lab = "?",
           col = "grey")


textrect( pos[1,],
          lab = "In British young people aged\n4--18, is the average daily\nzinc intake the same\nfor boys and girls?",
          radx = 0.15,
          rady = 0.15,
          shadow.size = 0,
          box.col = ExplanatoryColour ,
          lcol = ExplanatoryColour )

textrect( pos[2,],
          lab = expression( atop(bold(YES)*":"~the~average~zinc~intake,
                                bold(is)~the~same~"for"~boys~and~girls)),
          radx = 0.16,
          rady = 0.1,
          shadow.size = 0,
          box.col = IndividualColour,
          lcol = IndividualColour)
textrect( pos[3,],
          lab = expression( atop(bold(NO)*":"~the~average~zinc~intake,
                                bold(is~not)~the~same~"for"~boys~and~girls)),
          radx = 0.16,
          rady = 0.1,
          shadow.size = 0,
          box.col = IndividualColour,
          lcol = IndividualColour)

textrect( pos[4,],
          lab = "Which does\ndata support?",
          radx = 0.10,
          rady = 0.1,
          shadow.size = 0,
          box.col = GroupColour,
          lcol = GroupColour)

text(x = pos[1, 1],
     y = 0.90,
     font = 2, # Bold
     labels = "Research question")
text(x = pos[2, 1],
     y = 0.90,
     font = 2, # Bold
     labels = "Hypotheses")
text(x = pos[4, 1],
     y = 0.90,
     font = 2, # Bold
     labels = "Data")
```


Decision-making RQs can be asked in different ways.\index{Research question!one- and two-tailed}
For the zinc-intake study above (Fig.\ \@ref(fig:ZincRQ)), the RQ could ask (about the population):

* is the average zinc intake *the same* for boys and girls?
* is the average zinc intake *different* for boys and girls?
* is the average zinc intake *lower* for boys, compared to girls?
* is the average zinc intake *higher* for boys, compared to girls?

The first two are *two-tailed RQs* (and are essentially asking the same question but in different ways): the average zinc intake could be higher for girls or higher for boys.
We are just interested in whether any difference is present; that is, two options are being considered.
The last two are *one-tailed RQs*, since they ask specifically about a difference in just one direction: boys lower than girls, or boys higher than girls.

Most RQs are two-tailed, unless a good reason exists to ask a one-tailed RQ *before* the data are collected (e.g., a drug has been developed specifically to *reduce* blood pressure).
RQs should be formed before the data are collected.


::: {.importantBox .important data-latex="{iconmonstr-warning-8-240.png}"}
In general, RQs should be two-tailed RQs, unless a justifiable reason exists for asking a one-tailed question *before data are collected*.
:::
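
The difference between two-tailed and one-tailed RQs can be sketched in R using the `alternative` argument of `t.test()`.
The zinc intakes below are simulated (made-up values, not the data from the study above), and the object names are hypothetical.

```r
set.seed(4)

# Simulated (made-up) daily zinc intakes
boys  <- rnorm(50, mean = 7.0, sd = 2)
girls <- rnorm(50, mean = 6.5, sd = 2)

# Two-tailed RQ: is the average intake different for boys and girls?
two_tailed <- t.test(boys, girls, alternative = "two.sided")

# One-tailed RQs: a difference in just one direction
lower  <- t.test(boys, girls, alternative = "less")     # boys lower than girls
higher <- t.test(boys, girls, alternative = "greater")  # boys higher than girls

c(two_tailed = two_tailed$p.value,
  lower      = lower$p.value,
  higher     = higher$p.value)
```

The three $P$-values differ even though the data are identical, which is one reason the direction of the RQ should be chosen before the data are collected.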


## Units of observation and analysis {#UnitsObsAnalysis}
\index{Units of observation}\index{Units of analysis}

*Units of observation* and *units of analysis* are closely related concepts that must be distinguished to properly identify a population.


<div style="float:right; width: 222px; border: 1px; padding:10px">
<img src="Illustrations/pexels-moose-photos-1036627.jpg" width="200px"/>
</div>


Consider this descriptive RQ:

> In English $20$-something men, what is the average thickness of head-hair strands?

To answer this question, the thickness of individual hair strands needs to be measured.
The 'things' from or about which measurements are taken are called *units of observation*.


::: {.definition #UnitOfObservation name="Unit of observation"}
The *unit of observation* is the entity that is observed, from or about which measurements are taken and data collected.
:::

For this RQ, the unit of observation is the hair strand: the thickness measurements are taken from the hair strands.
Suppose the thickness of $100$ hair strands is recorded.
These $100$ hair strands could be obtained in many different ways.
Two options are to:

* take $100$ hair strands, all from the same man.
* take one hair strand from each of $100$ different men.