Inline code renders with a styled pill background (gray-100 light / slate-800 dark) via `.prose :not(pre) > code` in `tailwind.css`. The Tailwind Typography quote pseudo-elements are suppressed.
156
+
157
+
---
158
+
144
159
## Self-Correction
145
160
146
161
-**Stale code map**: If you discover that a file path, export name, or directory described above no longer exists or has moved, update the relevant section of this file immediately before proceeding with the task.
147
162
-**User corrections**: If the user corrects how work should be done in this repo (workflow, tooling preferences, naming conventions, patterns to avoid), add the correction to the **Local norms** section above so future sessions inherit it.
163
+
-**After editing this file**: Run `npm run fix` to apply Prettier formatting before proceeding.
imageAlt: "Papers tell you WHAT was used. Showcase tells you what's AVAILABLE NOW."
@@ -30,12 +30,42 @@ seo:
30
30
<em>The Showcase returns at least three different field IDs for BMI. This information is difficult to find in any publication.</em>
31
31
</figcaption>
32
32
33
-
Consider this post a love letter to the Showcase.
33
+
Consider this post a love letter to the UK Biobank Showcase.
34
34
35
-
When I first started working with UK Biobank, I did what I always did in graduate school. I dug through methods sections and supplemental materials to track down the features used in the study. It worked, but it was slow. Part of the problem is that papers rarely cite [The UK Biobank Showcase](https://biobank.ndph.ox.ac.uk/showcase/) directly. It is so foundational to the field that experienced researchers treat it as assumed knowledge. Coming from a different domain, I had no idea it existed. Once my manager pointed me to the Showcase, I discovered measurements beyond what had been published and no longer had to spend hours on detective work.
35
+
When I first started working with UK Biobank, I fell back on what I knew from graduate school. I dug through methods sections and supplemental materials to track down the features used in the study. It worked, but it was slow. Part of the problem is that papers rarely cite [The UK Biobank Showcase](https://biobank.ndph.ox.ac.uk/showcase/) directly. It is so foundational to the field that experienced researchers treat it as assumed knowledge. Coming from a different domain, I had no idea it existed. Once my manager pointed me to the Showcase, I discovered measurements beyond what had been published and no longer had to spend hours on detective work.
36
36
37
37
To understand why the Showcase is so useful, it helps to know the scale of what UK Biobank actually contains. The resource spans clinical measurements, survey data, genomics, imaging, proteomics, metabolomics, and more. For a comprehensive overview of all available data types, see [What types of data are available in UK Biobank?](https://community.ukbiobank.ac.uk/hc/en-gb/articles/23472796568861-What-types-of-data-are-available-in-UK-Biobank) on the UK Biobank Community site. The Showcase is your tool for navigating all of it.
38
38
39
+
## Reading Between the Data Fields: Arrays, Instances, and Codes
40
+
41
+
The Showcase doesn't just catalog measurements. It lovingly documents the shape of the data itself. On the main page for each field, the **Data** tab provides key details about coding, instances, and array indices that will save you real headaches downstream.
42
+
43
+
### Coding
44
+
45
+
Coding is how categorical responses are stored. Rather than storing "Yes" or "No", many fields store numeric codes: `1` for "Yes", `0` for "No", and `-3` for "Prefer not to answer". The Showcase provides a data coding table for each such field. I'll cover working with complex codes in the next post.
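As a minimal sketch of what decoding looks like in practice, the snippet below maps raw codes to labels using only the three codes quoted above. The `YES_NO_CODING` table and the `decode` helper are hypothetical names for illustration, and treating `-3` as missing is an assumption about your analysis, not a Showcase rule. Always take the real mapping from the field's own data coding table.

```python
# Hypothetical coding table built from the three codes quoted in the text.
YES_NO_CODING = {1: "Yes", 0: "No", -3: "Prefer not to answer"}


def decode(values, coding, missing_codes=(-3,)):
    """Map raw numeric codes to labels.

    Codes listed in `missing_codes` (an analysis choice, assumed here)
    and None values are returned as None. Unknown codes raise an error
    so that an unexpected coding table does not pass silently.
    """
    decoded = []
    for value in values:
        if value is None or value in missing_codes:
            decoded.append(None)
        elif value in coding:
            decoded.append(coding[value])
        else:
            raise ValueError(f"Unknown code: {value!r}")
    return decoded


# Toy values, not real participant data.
raw = [1, 0, -3, 1, None]
labels = decode(raw, YES_NO_CODING)
print(labels)
```

Failing loudly on unknown codes is a deliberate choice here: negative codes often carry special meanings, and silently coercing them to numbers can distort summary statistics.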
46
+
47
+
### Instances
48
+
49
+
Instances are timepoints. For example, a field that uses instancing type 2 can hold measurements collected at up to four visits:
50
+
51
+
- instance `0`: the initial assessment
52
+
53
+
- instance `1`: the first repeat visit
54
+
55
+
- instance `2`: the imaging visit
56
+
57
+
- instance `3`: the first repeat imaging visit
58
+
59
+
For most phenotypes, instance `0` has the largest sample size. If you need longitudinal data, expect far fewer participants at later instances.
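To see how quickly sample size drops across instances, you can count non-missing values per instance before committing to a longitudinal design. The sketch below assumes columns named `p<field>_i<instance>` (the pattern is an assumption about how your extract is labeled), and uses a made-up field ID `99999` with toy values.

```python
import re
from collections import Counter

# Instance labels as listed above for instancing type 2.
INSTANCE_LABELS = {
    0: "initial assessment",
    1: "first repeat visit",
    2: "imaging visit",
    3: "first repeat imaging visit",
}

# Assumed column naming: p<field>_i<instance>, optionally followed by _a<array>.
COLUMN_PATTERN = re.compile(r"^p(\d+)_i(\d+)(?:_a(\d+))?$")


def count_by_instance(rows, field_id):
    """Count participants with at least one non-missing value per instance."""
    counts = Counter()
    for row in rows:
        seen = set()
        for column, value in row.items():
            match = COLUMN_PATTERN.match(column)
            if match and int(match.group(1)) == field_id and value is not None:
                seen.add(int(match.group(2)))
        # Each participant contributes at most once per instance.
        counts.update(seen)
    return dict(sorted(counts.items()))


# Toy rows, one dict per participant. Field 99999 and all values are made up.
rows = [
    {"p99999_i0": 132, "p99999_i1": None, "p99999_i2": 128, "p99999_i3": None},
    {"p99999_i0": 141, "p99999_i1": None, "p99999_i2": None, "p99999_i3": None},
    {"p99999_i0": 118, "p99999_i1": 121, "p99999_i2": None, "p99999_i3": None},
]

counts = count_by_instance(rows, 99999)
for instance, n in counts.items():
    print(f"instance {instance} ({INSTANCE_LABELS[instance]}): n={n}")
```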
60
+
61
+
### Arrays
62
+
63
+
Arrays are repeated measurements within a single visit. Diastolic and systolic blood pressure (`4079`, `4080`), for example, are taken twice in one sitting. Each repeat is stored as a separate array index (`0`, `1`). The Showcase tells you how many array values a field has so you can plan how to handle them.
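One common way to handle array values is to average the non-missing repeats within a visit. The sketch below does that for the two blood pressure fields mentioned above. The column naming `p<field>_i<instance>_a<array>` is an assumption about your extract, `mean_over_arrays` is a hypothetical helper, and the readings are toy values. Averaging is only one option; taking the first reading or the median may suit your phenotype better.

```python
from statistics import mean


def mean_over_arrays(row, field_id, instance, n_arrays=2):
    """Average the non-missing array values of one field at one instance.

    Returns None when every array value is missing.
    """
    values = []
    for array_index in range(n_arrays):
        # Assumed column naming: p<field>_i<instance>_a<array>.
        value = row.get(f"p{field_id}_i{instance}_a{array_index}")
        if value is not None:
            values.append(value)
    return mean(values) if values else None


# Toy participant: two systolic readings (4080), one diastolic reading (4079).
participant = {
    "p4080_i0_a0": 134,
    "p4080_i0_a1": 130,
    "p4079_i0_a0": 82,
    "p4079_i0_a1": None,
}

systolic = mean_over_arrays(participant, 4080, instance=0)
diastolic = mean_over_arrays(participant, 4079, instance=0)
print(systolic, diastolic)
```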
64
+
65
+
The Data tab gives you the architecture of a field: how its values are structured, repeated, and encoded. What it does not tell you is whether those values are reliable, comparable, or the best available option for your phenotype. For that, you need the category-level context, and the Showcase delivers it.
66
+
67
+
## Example: Not All Field IDs Are Created Equal
68
+
39
69
For example, searching "Left Ventricular Ejection Fraction" returns [multiple relevant fields](https://biobank.ndph.ox.ac.uk/showcase/search.cgi?wot=0&srch=Left+Ventricular+Ejection+Fraction&yfirst=2000&ylast=2025) and three with the exact correct description (22420, 24103, and 31060). But which one should you use? This is where the Showcase becomes essential.
40
70
41
71

@@ -52,13 +82,9 @@ Notice the three highlighted fields measure the same thing but belong to differe
52
82
Comparison of three LVEF fields showing participant counts and category quality warnings
53
83
</figcaption>
54
84
55
-
Field 22420 ([category 133](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=133)) has 39,649 measurements but includes a warning: "Quality issues may exist in this data. Researchers may wish to consider using data available in Category 157 or Category 162 as an alternative." Field 24103 ([category 157](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=157)) contains 80,974 measurements and references a published methodology, but warns these fields "should not be considered together" with Category 162 without quality assessment. Field 31060 ([category 162](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=162)) has only 4,868 participants, fewer than the flagged field 22420.
56
-
57
-
For my cardiomyopathy work ([Klasfeld _et al_ 2025](<https://www.cell.com/hgg-advances/fulltext/S2666-2477(25)00063-6>)), I chose field 24103 for its sample size and data quality. However, other practical information provided by the showcase includes the date of which the data was reported (Debut) and the distribution of the data (shown in the data tab in the second section of the Field ID entry).
85
+
Field `22420` ([category 133](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=133)) has 39,649 measurements but includes a warning: "Quality issues may exist in this data. Researchers may wish to consider using data available in Category 157 or Category 162 as an alternative." Field `24103` ([category 157](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=157)) contains 80,974 measurements and references a published methodology, but warns these fields "should not be considered together" with Category 162 without quality assessment. Field `31060` ([category 162](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=162)) has only 4,868 participants, fewer than the flagged field `22420`.
58
86
59
-
**Another critical detail:** Many UK Biobank measurements were collected at multiple timepoints (instances). The Showcase shows you not just the field ID, but which instances have data. For most phenotypes, the initial assessment (Instance 0) has the largest sample size, with subsequent visits having progressively fewer participants. For covariates, I typically use the initial visit value. If you need longitudinal data, expect much smaller sample sizes.
60
-
61
-
Additionally, some fields contain multiple measurements per participant within a single visit (arrays). For example, blood pressure taken three times. The Showcase specifies these array structures so you can plan your handling strategy.
87
+
For my cardiomyopathy work ([Klasfeld _et al_ 2025](<https://www.cell.com/hgg-advances/fulltext/S2666-2477(25)00063-6>)), I chose field `24103` for its sample size and data quality. The Showcase also provides other practical information, including the date on which the data was reported (Debut) and the distribution of the data (shown in the Data tab in the second section of the field's entry).
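It can help to write the comparison down as data so the reasoning is easy to revisit. The sketch below records the three LVEF candidates with the counts and warnings quoted above and applies one simple rule: drop fields whose category carries a quality warning, then take the largest remaining count. The `FieldCandidate` class and `pick_field` function are hypothetical, and the rule is an illustration, not the procedure used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldCandidate:
    field_id: int
    category: int
    n_measurements: int
    quality_warning: bool  # True if the category page flags quality issues


# Counts and warnings as quoted from the Showcase in the text above.
CANDIDATES = [
    FieldCandidate(22420, 133, 39_649, quality_warning=True),
    FieldCandidate(24103, 157, 80_974, quality_warning=False),
    FieldCandidate(31060, 162, 4_868, quality_warning=False),
]


def pick_field(candidates):
    """Illustrative rule: skip flagged fields, then prefer the largest count."""
    unflagged = [c for c in candidates if not c.quality_warning]
    if not unflagged:
        raise ValueError("Every candidate carries a quality warning")
    return max(unflagged, key=lambda c: c.n_measurements)


chosen = pick_field(CANDIDATES)
print(f"Field {chosen.field_id} (category {chosen.category}), n={chosen.n_measurements:,}")
```

A rule this simple ignores caveats such as the note that Categories 157 and 162 should not be combined without quality assessment, so treat it as a record of the decision and keep reading the category pages.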
62
88
63
89
## UK Biobank Showcase Tips
64
90
@@ -67,7 +93,7 @@ After working with the Showcase on multiple projects, I've developed a workflow
67
93
**Before selecting a field:**
68
94
69
95
- Always check the category warnings, not just the field description
70
-
- Look at the data distribution tab: Is it normally distributed? Heavy missingness? Homogenous values? Sampling bias?
96
+
- Look at the data distribution tab: Is it normally distributed? Heavy missingness? Homogeneous values? Sampling bias?
71
97
- Check the total participants to plan your sample size accordingly
72
98
73
99
**For reproducibility:**
@@ -76,11 +102,11 @@ After working with the Showcase on multiple projects, I've developed a workflow
76
102
- Record the version date (last import/update)
77
103
- Check stability rating (how data may change in future releases)
78
104
79
-
**To save time:**
80
-
81
-
- Use algorithmically-defined outcomes in Category [42](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=42) instead of manually classifying from multiple sources
82
-
- Watch for "Not available" status because sometimes fields are listed before release
105
+
**Watch out:**
83
106
84
-
Publications tell you which field _was_ used, but they rarely tell you which field to use. The Showcase is how you figure out which one you should use. Spending a few minutes there before starting your analysis can save weeks of downstream headaches. For more details on features I didn't cover, see Part III of the [Showcase user guide](https://biobank.ndph.ox.ac.uk/showcase/ukb/exinfo/ShowcaseUserGuide.pdf) (page 4).
107
+
- Like any great love, the Showcase is not perfect. Sometimes a field's status is set to "Not available", meaning it is listed before its release. If a release date is listed and has already passed, reach out to UK Biobank support for clarification.
108
+
- Sometimes data that appears in the Showcase is missing from your RAP workspace entirely. This can happen if your project is running an outdated version of the UK Biobank data release.
109
+
- If you are the project admin, go to the `Settings` page of your dispensed project and click `Check for Updates` in the UK Biobank section.
110
+
- If you are not the admin or the update does not resolve it, reach out to the UK Biobank support team directly. Tell them upfront if you have already searched the community forums. They are genuinely helpful and worth contacting.
85
111
86
-
Finding the right field is half the battle. In the next post, we'll dive into actually loading this data for analysis.
112
+
The more time I spend with the Showcase, the more I appreciate what it actually is: not just a catalog, but a guide to making good decisions about your data. For features not covered here, Part III of the [Showcase user guide](https://biobank.ndph.ox.ac.uk/showcase/ukb/exinfo/ShowcaseUserGuide.pdf) (page 4) is worth bookmarking. In the next post, we'll dive into actually loading this data for analysis.