
Commit 7530935

Merge branch 'main' into fundraiser-event
2 parents f58cbe1 + 7607c99 commit 7530935

File tree

3 files changed

+75
-16
lines changed


AGENTS.md

Lines changed: 16 additions & 0 deletions
@@ -141,7 +141,23 @@ No test suite (no Jest/Vitest/Playwright config). Quality is enforced via `astro
141141

142142
---
143143

144+
## Markdown Formatting Conventions (Blog Posts)
145+
146+
Use the following conventions consistently in blog post `.md` files:
147+
148+
| Element | Format | Example |
149+
| ----------------------------------------- | ------------------------------------ | ------------------------------------------------------------ |
150+
| Variable/field names, IDs, numeric values | Inline code (backticks) | `` `22420` ``, `` `1` ``, `` `-3` `` |
151+
| UI tab names, button labels, page names | Bold | `**Data** tab`, `**Settings** page`, `**Check for Updates**` |
152+
| Key domain terms being defined | Bold (first use) or `###` subheading | `**Coding**` or `### Coding` |
153+
| File paths, code identifiers | Inline code (backticks) | `` `src/utils/blog.ts` `` |
154+
155+
Inline code renders with a styled pill background (gray-100 in light mode, slate-800 in dark mode) via `.prose :not(pre) > code` in `tailwind.css`. The backtick `::before`/`::after` pseudo-elements that Tailwind Typography adds around inline code are suppressed.
156+
157+
---
158+
144159
## Self-Correction
145160

146161
- **Stale code map**: If you discover that a file path, export name, or directory described above no longer exists or has moved, update the relevant section of this file immediately before proceeding with the task.
147162
- **User corrections**: If the user corrects how work should be done in this repo (workflow, tooling preferences, naming conventions, patterns to avoid), add the correction to the **Local norms** section above so future sessions inherit it.
163+
- **After editing this file**: Run `npm run fix` to apply Prettier formatting before proceeding.

src/assets/styles/tailwind.css

Lines changed: 17 additions & 0 deletions
@@ -117,6 +117,23 @@
117117
@apply bg-slate-900 md:bg-[#030621e6] border-b border-gray-500/20;
118118
box-shadow: none;
119119
}
120+
/* Inline code styling in prose */
121+
.prose :not(pre) > code {
122+
background-color: #f3f4f6; /* gray-100 */
123+
color: #1f2937; /* gray-800 */
124+
border-radius: 0.25rem;
125+
padding: 0.1em 0.35em;
126+
font-size: 0.875em;
127+
}
128+
.prose :not(pre) > code::before,
129+
.prose :not(pre) > code::after {
130+
content: none;
131+
}
132+
.dark .prose :not(pre) > code {
133+
background-color: #1e293b; /* slate-800 */
134+
color: #e2e8f0; /* slate-200 */
135+
}
136+
120137
/* Make sure bullets are visible in dark mode */
121138
.dark .prose ul > li::marker,
122139
.dark .prose ol > li::marker {

src/content/post/biobankSeries/20260310_post_Sammy.md

Lines changed: 42 additions & 16 deletions
@@ -1,7 +1,7 @@
11
---
22
publishDate: 2026-03-10T00:00:00-05:00
33
title: 'Biobank Intro Series: UK Biobank Observational Data (Part I)'
4-
excerpt: 'Save a clock tick with the UK Biobank Showcase'
4+
excerpt: 'An ode to the UK Biobank Showcase'
55
slug: blog/biobank-intro-series/03-ukb-observational-data-partI
66
image: /blog_images/biobank1/ukbShowcaseGraphic.png
77
imageAlt: "Papers tell you WHAT was used. Showcase tells you what's AVAILABLE NOW."
@@ -30,12 +30,42 @@ seo:
3030
<em>The Showcase returns at least three different field IDs for BMI. It is difficult to find this information in any publication.</em>
3131
</figcaption>
3232

33-
Consider this post a love letter to the Showcase.
33+
Consider this post a love letter to the UK Biobank Showcase.
3434

35-
When I first started working with UK Biobank, I did what I always did in graduate school. I dug through methods sections and supplemental materials to track down the features used in the study. It worked, but it was slow. Part of the problem is that papers rarely cite [The UK Biobank Showcase](https://biobank.ndph.ox.ac.uk/showcase/) directly. It is so foundational to the field that experienced researchers treat it as assumed knowledge. Coming from a different domain, I had no idea it existed. Once my manager pointed me to the Showcase, I discovered measurements beyond what had been published and no longer had to spend hours on detective work.
35+
When I first started working with UK Biobank, I fell back on what I knew from graduate school. I dug through methods sections and supplemental materials to track down the features used in each study. It worked, but it was slow. Part of the problem is that papers rarely cite [the UK Biobank Showcase](https://biobank.ndph.ox.ac.uk/showcase/) directly. It is so foundational to the field that experienced researchers treat it as assumed knowledge. Coming from a different domain, I had no idea it existed. Once my manager pointed me to the Showcase, I discovered measurements beyond what had been published and no longer had to spend hours on detective work.
3636

3737
To understand why the Showcase is so useful, it helps to know the scale of what UK Biobank actually contains. As a resource, UK Biobank spans clinical measurements, survey data, genomics, imaging, proteomics, metabolomics, and more. For a comprehensive overview of all available data types, see [What types of data are available in UK Biobank?](https://community.ukbiobank.ac.uk/hc/en-gb/articles/23472796568861-What-types-of-data-are-available-in-UK-Biobank) on the UK Biobank Community site. The Showcase is your tool for navigating all of it.
3838

39+
## Reading Between the Data Fields: Arrays, Instances, and Codes
40+
41+
The Showcase doesn't just catalog measurements. It lovingly documents the shape of the data itself. On the main page for each field, the **Data** tab provides key details about coding, instances, and array indices that will save you real headaches downstream.
42+
43+
### Coding
44+
45+
Coding is how categorical responses are stored. Rather than storing "Yes" or "No", many fields store numeric codes: `1` for "Yes", `0` for "No", and `-3` for "Prefer not to answer". The Showcase provides a data coding table for each such field. More on working with complex codes in the next post.
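As a minimal sketch of what decoding looks like once the data is loaded, the Python below assumes a hypothetical coding table that mirrors the example above (`1` = "Yes", `0` = "No", `-3` = "Prefer not to answer"). The names `CODING`, `NON_ANSWERS`, and `decode` are illustrative only; always copy the real table from the field's data coding page in the Showcase.

```python
from typing import Optional

# Hypothetical coding table mirroring the example in the text.
# Replace it with the field's actual data coding table from the Showcase.
CODING = {1: "Yes", 0: "No", -3: "Prefer not to answer"}

# Codes treated as "no usable answer" and converted to missing values.
NON_ANSWERS = {-3}


def decode(values: list[Optional[int]]) -> list[Optional[str]]:
    """Map raw integer codes to labels; non-answers and blanks become None."""
    labels: list[Optional[str]] = []
    for value in values:
        if value is None or value in NON_ANSWERS:
            labels.append(None)
        else:
            # A KeyError here means the data holds a code missing from CODING.
            labels.append(CODING[value])
    return labels


print(decode([1, 0, -3, None, 1]))
```

Raising on unknown codes (rather than passing them through) is a deliberate choice in this sketch: an unexpected code usually means the wrong coding table was copied.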
46+
47+
### Instances
48+
49+
Instances are timepoints. For example, a field that uses instancing type 2 can hold measurements from up to four assessment visits:
50+
51+
- instance `0`: the initial assessment
52+
53+
- instance `1`: the first repeat visit
54+
55+
- instance `2`: the imaging visit
56+
57+
- instance `3`: the first repeat imaging visit
58+
59+
For most phenotypes, instance `0` has the largest sample size. If you need longitudinal data, expect much smaller numbers at later instances.
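To see that drop-off in your own extract, a sketch like the one below counts how many participants have at least one value for a field at each instance. It assumes columns named `<field>-<instance>.<array>` (for example `4080-0.0`), as in UK Biobank's converted CSV extracts; RAP exports may name columns differently. The function name and the toy rows are hypothetical and not real data.

```python
import re
from collections import defaultdict

# Assumed column naming: "<field>-<instance>.<array>", e.g. "4080-0.0".
COLUMN = re.compile(r"^(\d+)-(\d+)\.(\d+)$")

# Instance labels for instancing type 2, as listed above.
INSTANCE_LABELS = {
    0: "initial assessment",
    1: "first repeat visit",
    2: "imaging visit",
    3: "first repeat imaging visit",
}


def participants_per_instance(rows: list[dict], field: int) -> dict[int, int]:
    """Count participants with at least one non-missing value for `field` per instance."""
    seen: dict[int, set] = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            match = COLUMN.match(column)
            if match is None or value is None:
                continue  # skips the "eid" column and empty cells
            col_field, instance, _array = map(int, match.groups())
            if col_field == field:
                seen[instance].add(row["eid"])
    return {instance: len(eids) for instance, eids in sorted(seen.items())}


# Invented toy rows: three participants, only one with imaging-visit data.
rows = [
    {"eid": 1, "4080-0.0": 132, "4080-0.1": 128, "4080-2.0": 125, "4080-2.1": None},
    {"eid": 2, "4080-0.0": 141, "4080-0.1": 139, "4080-2.0": None, "4080-2.1": None},
    {"eid": 3, "4080-0.0": None, "4080-0.1": 118, "4080-2.0": None, "4080-2.1": None},
]

for instance, n in participants_per_instance(rows, 4080).items():
    print(f"instance {instance} ({INSTANCE_LABELS[instance]}): n = {n}")
```

Counting distinct participant IDs rather than non-empty cells keeps a participant with two array values at one visit from being counted twice.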
60+
61+
### Arrays
62+
63+
Arrays are repeated measurements within a single visit. Diastolic and systolic blood pressure (`4079`, `4080`), for example, are taken twice in one sitting. Each repeat is stored as a separate array index (`0`, `1`). The Showcase tells you how many array values a field has so you can plan how to handle them.
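One common way to handle a two-reading field like blood pressure is to average whatever array values are present within a visit. This is only one option and not necessarily the approach used in this post. The sketch assumes the same `<field>-<instance>.<array>` column naming as converted UK Biobank CSV extracts; `mean_over_arrays` and the toy participant are hypothetical.

```python
import re
from statistics import mean
from typing import Optional

# Assumed column naming: "<field>-<instance>.<array>", e.g. "4079-0.1".
COLUMN = re.compile(r"^(\d+)-(\d+)\.(\d+)$")


def mean_over_arrays(row: dict, field: int, instance: int) -> Optional[float]:
    """Average the non-missing array values of `field` at `instance` for one participant."""
    values = []
    for column, value in row.items():
        match = COLUMN.match(column)
        if match is None or value is None:
            continue  # skips the "eid" column and empty cells
        col_field, col_instance, _array = map(int, match.groups())
        if (col_field, col_instance) == (field, instance):
            values.append(value)
    # With one reading this returns that reading; with none it returns None.
    return float(mean(values)) if values else None


# Invented toy participant: two diastolic readings at the initial assessment.
row = {"eid": 1, "4079-0.0": 82, "4079-0.1": 78, "4079-2.0": 80}
print(mean_over_arrays(row, 4079, 0))
```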
64+
65+
The **Data** tab gives you the architecture of a field: how its values are structured, repeated, and encoded. What it does not tell you is whether those values are reliable, comparable, or the best available option for your phenotype. For that, you need the category-level context, and the Showcase delivers it.
66+
67+
## Example: Not All Field IDs Are Created Equal
68+
3969
Searching "Left Ventricular Ejection Fraction" returns [multiple relevant fields](https://biobank.ndph.ox.ac.uk/showcase/search.cgi?wot=0&srch=Left+Ventricular+Ejection+Fraction&yfirst=2000&ylast=2025), three of which carry exactly that description (`22420`, `24103`, and `31060`). But which one should you use? This is where the Showcase becomes essential.
4070

4171
![UK Biobank Showcase search interface showing search results for left ventricular ejection fraction, with three data fields highlighted: 22420 (Left ventricular size and function category), 24103 (Cardiac and aortic function #1 category), and 31060 (Cardiac and aortic function #2 category)](/blog_images/biobank1/lvef_search.png)
@@ -52,13 +82,9 @@ Notice the three highlighted fields measure the same thing but belong to differe
5282
Comparison of three LVEF fields showing participant counts and category quality warnings
5383
</figcaption>
5484

55-
Field 22420 ([category 133](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=133)) has 39,649 measurements but includes a warning: "Quality issues may exist in this data. Researchers may wish to consider using data available in Category 157 or Category 162 as an alternative." Field 24103 ([category 157](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=157)) contains 80,974 measurements and references a published methodology, but warns these fields "should not be considered together" with Category 162 without quality assessment. Field 31060 ([category 162](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=162)) has only 4,868 participants, fewer than the flagged field 22420.
56-
57-
For my cardiomyopathy work ([Klasfeld _et al_ 2025](<https://www.cell.com/hgg-advances/fulltext/S2666-2477(25)00063-6>)), I chose field 24103 for its sample size and data quality. However, other practical information provided by the showcase includes the date of which the data was reported (Debut) and the distribution of the data (shown in the data tab in the second section of the Field ID entry).
85+
Field `22420` ([category 133](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=133)) has 39,649 measurements but includes a warning: "Quality issues may exist in this data. Researchers may wish to consider using data available in Category 157 or Category 162 as an alternative." Field `24103` ([category 157](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=157)) contains 80,974 measurements and references a published methodology, but warns these fields "should not be considered together" with Category 162 without quality assessment. Field `31060` ([category 162](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=162)) has only 4,868 participants, fewer than the flagged field `22420`.
5886

59-
**Another critical detail:** Many UK Biobank measurements were collected at multiple timepoints (instances). The Showcase shows you not just the field ID, but which instances have data. For most phenotypes, the initial assessment (Instance 0) has the largest sample size, with subsequent visits having progressively fewer participants. For covariates, I typically use the initial visit value. If you need longitudinal data, expect much smaller sample sizes.
60-
61-
Additionally, some fields contain multiple measurements per participant within a single visit (arrays). For example, blood pressure taken three times. The Showcase specifies these array structures so you can plan your handling strategy.
87+
For my cardiomyopathy work ([Klasfeld _et al_ 2025](<https://www.cell.com/hgg-advances/fulltext/S2666-2477(25)00063-6>)), I chose field `24103` for its sample size and data quality. The Showcase also surfaces other practical details, including the date the data was released (Debut) and the distribution of the values (shown on the **Data** tab, in the second section of the field's entry).
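Writing the selection down as an explicit rule can make the choice easier to revisit later. This is a sketch, not a general recipe: the counts and warnings are transcribed from the comparison above, and the rule "drop fields whose category carries a quality warning, then take the largest sample" is an illustrative simplification of the judgment described here. `CandidateField` and `pick_field` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateField:
    field_id: int
    category: int
    n: int  # count as reported in the comparison above
    quality_warning: bool  # category states that quality issues may exist


# Transcribed from the LVEF comparison above.
CANDIDATES = [
    CandidateField(22420, 133, 39_649, quality_warning=True),
    CandidateField(24103, 157, 80_974, quality_warning=False),
    CandidateField(31060, 162, 4_868, quality_warning=False),
]


def pick_field(candidates: list[CandidateField]) -> CandidateField:
    """Drop fields with a quality warning, then prefer the largest sample."""
    usable = [c for c in candidates if not c.quality_warning]
    if not usable:
        raise ValueError("every candidate carries a quality warning")
    return max(usable, key=lambda c: c.n)


print(pick_field(CANDIDATES).field_id)  # 24103
```

A rule like this cannot capture Category 157's caveat about combining its fields with Category 162, which is why reading the category pages still matters.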
6288

6389
## UK Biobank Showcase Tips
6490

@@ -67,7 +93,7 @@ After working with the Showcase on multiple projects, I've developed a workflow
6793
**Before selecting a field:**
6894

6995
- Always check the category warnings, not just the field description
70-
- Look at the data distribution tab: Is it normally distributed? Heavy missingness? Homogenous values? Sampling bias?
96+
- Look at the data distribution tab: Is it normally distributed? Heavy missingness? Homogeneous values? Sampling bias?
7197
- Check the total participants to plan your sample size accordingly
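Once the data is loaded, a quick summary can back up two of the checks above: heavy missingness and homogeneous values. The sketch uses only the standard library; `summarize` and the toy column are hypothetical, and the Showcase's own distribution plot remains the first place to look.

```python
from statistics import mean, stdev
from typing import Optional


def summarize(values: list[Optional[float]]) -> dict:
    """Report missingness, number of distinct values, and simple moments for one column."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    summary = {
        "n_total": len(values),
        "n_missing": n_missing,
        "missing_fraction": n_missing / len(values) if values else 0.0,
        "n_distinct": len(set(present)),  # very few distinct values hints at homogeneity
    }
    if len(present) >= 2:  # stdev needs at least two values
        summary["mean"] = mean(present)
        summary["sd"] = stdev(present)
    return summary


# Invented toy column with heavy missingness and few distinct values.
print(summarize([55.0, None, 60.0, None, None, 55.0]))
```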
7298

7399
**For reproducibility:**
@@ -76,11 +102,11 @@ After working with the Showcase on multiple projects, I've developed a workflow
76102
- Record the version date (last import/update)
77103
- Check stability rating (how data may change in future releases)
78104

79-
**To save time:**
80-
81-
- Use algorithmically-defined outcomes in Category [42](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=42) instead of manually classifying from multiple sources
82-
- Watch for "Not available" status because sometimes fields are listed before release
105+
**Watch out:**
83106

84-
Publications tell you which field _was_ used, but they rarely tell you which field to use. The Showcase is how you figure out which one you should use. Spending a few minutes there before starting your analysis can save weeks of downstream headaches. For more details on features I didn't cover, see Part III of the [Showcase user guide](https://biobank.ndph.ox.ac.uk/showcase/ukb/exinfo/ShowcaseUserGuide.pdf) (page 4).
107+
- Like any great love, the Showcase is not perfect. Sometimes a data field has its status set to "Not available", meaning it is listed before its release. If a release date is listed and it has already passed, reach out to UK Biobank support for clarification.
108+
- Sometimes data that appears in the Showcase is missing from your Research Analysis Platform (RAP) workspace entirely. This can happen if your project is using an outdated version of the UK Biobank data release.
109+
- If you are the project admin, go to the **Settings** page of your dispensed project and click **Check for Updates** in the UK Biobank section.
110+
- If you are not the admin or the update does not resolve it, reach out to the UK Biobank support team directly. Tell them upfront if you have already searched the community forums. They are genuinely helpful and worth contacting.
85111

86-
Finding the right field is half the battle. In the next post, we'll dive into actually loading this data for analysis.
112+
The more time I spend with the Showcase, the more I appreciate what it actually is: not just a catalog, but a guide to making good decisions about your data. For features not covered here, Part III of the [Showcase user guide](https://biobank.ndph.ox.ac.uk/showcase/ukb/exinfo/ShowcaseUserGuide.pdf) (page 4) is worth bookmarking. In the next post, we'll dive into actually loading this data for analysis.

0 commit comments

Comments
 (0)