Merge branch 'main' into fundraiser-event

sklasfeld · web-flow · commit 7530935c7c10 · 2026-03-12T08:35:40.000-04:00
diff --git a/AGENTS.md b/AGENTS.md
@@ -141,7 +141,23 @@ No test suite (no Jest/Vitest/Playwright config). Quality is enforced via `astro
 
 ---
 
+## Markdown Formatting Conventions (Blog Posts)
+
+Use the following conventions consistently in blog post `.md` files:
+
+| Element                                   | Format                               | Example                                                      |
+| ----------------------------------------- | ------------------------------------ | ------------------------------------------------------------ |
+| Variable/field names, IDs, numeric values | Inline code (backticks)              | `` `22420` ``, `` `1` ``, `` `-3` ``                         |
+| UI tab names, button labels, page names   | Bold                                 | `**Data** tab`, `**Settings** page`, `**Check for Updates**` |
+| Key domain terms being defined            | Bold (first use) or `###` subheading | `**Coding**` or `### Coding`                                 |
+| File paths, code identifiers              | Inline code (backticks)              | `` `src/utils/blog.ts` ``                                    |
+
+Inline code renders with a styled pill background (gray-100 light / slate-800 dark) via `.prose :not(pre) > code` in `tailwind.css`. The Tailwind Typography quote pseudo-elements are suppressed.
+
+---
+
 ## Self-Correction
 
 - **Stale code map**: If you discover that a file path, export name, or directory described above no longer exists or has moved, update the relevant section of this file immediately before proceeding with the task.
 - **User corrections**: If the user corrects how work should be done in this repo (workflow, tooling preferences, naming conventions, patterns to avoid), add the correction to the **Local norms** section above so future sessions inherit it.
+- **After editing this file**: Run `npm run fix` to apply Prettier formatting before proceeding.
diff --git a/src/assets/styles/tailwind.css b/src/assets/styles/tailwind.css
@@ -117,6 +117,23 @@
   @apply bg-slate-900 md:bg-[#030621e6] border-b border-gray-500/20;
   box-shadow: none;
 }
+/* Inline code styling in prose */
+.prose :not(pre) > code {
+  background-color: #f3f4f6; /* gray-100 */
+  color: #1f2937; /* gray-800 */
+  border-radius: 0.25rem;
+  padding: 0.1em 0.35em;
+  font-size: 0.875em;
+}
+.prose :not(pre) > code::before,
+.prose :not(pre) > code::after {
+  content: none;
+}
+.dark .prose :not(pre) > code {
+  background-color: #1e293b; /* slate-800 */
+  color: #e2e8f0; /* slate-200 */
+}
+
 /* Make sure bullets are visible in dark mode */
 .dark .prose ul > li::marker,
 .dark .prose ol > li::marker {
diff --git a/src/content/post/biobankSeries/20260310_post_Sammy.md b/src/content/post/biobankSeries/20260310_post_Sammy.md
@@ -1,7 +1,7 @@
 ---
 publishDate: 2026-03-10T00:00:00-05:00
 title: 'Biobank Intro Series: UK Biobank Observational Data (Part I)'
-excerpt: 'Save a clock tick with the UK Biobank Showcase'
+excerpt: 'An ode to the UK Biobank Showcase'
 slug: blog/biobank-intro-series/03-ukb-observational-data-partI
 image: /blog_images/biobank1/ukbShowcaseGraphic.png
 imageAlt: "Papers tell you WHAT was used. Showcase tells you what's AVAILABLE NOW."
@@ -30,12 +30,42 @@ seo:
    <em>The showcase returns at least three different field IDs for BMI. It is difficult to find this information in any publication.</em>
 </figcaption>
 
-Consider this post a love letter to the Showcase.
+Consider this post a love letter to the UK Biobank Showcase.
 
-When I first started working with UK Biobank, I did what I always did in graduate school. I dug through methods sections and supplemental materials to track down the features used in the study. It worked, but it was slow. Part of the problem is that papers rarely cite [The UK Biobank Showcase](https://biobank.ndph.ox.ac.uk/showcase/) directly. It is so foundational to the field that experienced researchers treat it as assumed knowledge. Coming from a different domain, I had no idea it existed. Once my manager pointed me to the Showcase, I discovered measurements beyond what had been published and no longer had to spend hours on detective work.
+When I first started working with UK Biobank, I fell back on what I knew from graduate school. I dug through methods sections and supplemental materials to track down the features used in the study. It worked, but it was slow. Part of the problem is that papers rarely cite [The UK Biobank Showcase](https://biobank.ndph.ox.ac.uk/showcase/) directly. It is so foundational to the field that experienced researchers treat it as assumed knowledge. Coming from a different domain, I had no idea it existed. Once my manager pointed me to the Showcase, I discovered measurements beyond what had been published and no longer had to spend hours on detective work.
 
 To understand why the showcase is so useful, it helps to know the scale of what UK Biobank actually contains. The UK Biobank as a resource spans clinical measurements, survey data, genomics, imaging, proteomics, metabolomics, and more. For a comprehensive overview of all available data types, see [What types of data are available in UK Biobank?](https://community.ukbiobank.ac.uk/hc/en-gb/articles/23472796568861-What-types-of-data-are-available-in-UK-Biobank) on the UK Biobank Community site. The Showcase is your tool for navigating all of it.
 
+## Reading Between the Data Fields: Arrays, Instances, and Codes
+
+The Showcase doesn't just catalog measurements. It lovingly documents the shape of the data itself. On the main page for each field, the **Data** tab provides key details about coding, instances, and array indices that will save you real headaches downstream.
+
+### Coding
+
+Coding is how categorical responses are stored. Rather than storing "Yes" or "No", many fields store numeric codes: `1` for "Yes", `0` for "No", and `-3` for "Prefer not to answer". The Showcase provides a data coding table for each such field. More on working with complex codes in the next post.
+
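As a rough illustration of the coding idea above, here is a minimal Python sketch that maps stored codes back to labels. The table `YES_NO_CODING` holds only the three codes named in the paragraph, and the helpers `decode` and `decode_column` are hypothetical names for this example, not part of any UK Biobank tooling. Real fields should use the data coding table the Showcase lists for that field.

```python
# Hypothetical coding table, built only from the three codes named in the text.
YES_NO_CODING = {1: "Yes", 0: "No", -3: "Prefer not to answer"}


def decode(value, coding):
    """Return the label for a stored code; a missing value (None) stays None."""
    if value is None:
        return None
    if value not in coding:
        # Fail loudly rather than silently passing through an unknown code.
        raise ValueError(f"code {value!r} is not in the coding table")
    return coding[value]


def decode_column(values, coding):
    """Decode every stored value in a column."""
    return [decode(v, coding) for v in values]


# Made-up raw values for illustration only.
raw = [1, 0, -3, None, 1]
decoded = decode_column(raw, YES_NO_CODING)
print(decoded)
```

Raising on an unknown code is a deliberate choice in this sketch: it surfaces a mismatched coding table early instead of leaving stray numbers in the decoded column.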
+### Instances
+
+Instances are timepoints. For example, a field that uses instancing type 2 reports measurements collected at four visits:
+
+- instance `0`: the initial assessment
+
+- instance `1`: the first repeat visit
+
+- instance `2`: the imaging visit
+
+- instance `3`: the first repeat imaging visit
+
+For most phenotypes, instance `0` has the largest sample size. If you need longitudinal data, expect much smaller numbers at later instances.
+
+### Arrays
+
+Arrays are repeated measurements within a single visit. Diastolic and systolic blood pressure (`4079`, `4080`), for example, are taken twice in one sitting. Each repeat is stored as a separate array index (`0`, `1`). The Showcase tells you how many array values a field has so you can plan how to handle them.
+
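To make the instance and array structure concrete, here is a small Python sketch that addresses one value by field, instance, and array index, then averages the repeats within a visit. The column naming pattern `p<field>_i<instance>_a<array>` is an assumption for this example, so check the names in your own dataset. The helper names and the blood pressure readings are made up for illustration, and averaging is only one possible way to handle repeats.

```python
from statistics import mean


def column_name(field_id, instance, array_index):
    """Build a column name; the p/i/a pattern is an assumed convention."""
    return f"p{field_id}_i{instance}_a{array_index}"


def mean_over_arrays(row, field_id, instance, n_arrays):
    """Average the non-missing array values of one field at one instance."""
    values = [
        row.get(column_name(field_id, instance, a)) for a in range(n_arrays)
    ]
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# One hypothetical participant: two readings per field at instance 0.
participant = {
    "p4080_i0_a0": 128,   # systolic, first reading
    "p4080_i0_a1": 124,   # systolic, second reading
    "p4079_i0_a0": 82,    # diastolic, first reading
    "p4079_i0_a1": None,  # diastolic, second reading missing
}

systolic = mean_over_arrays(participant, 4080, 0, 2)
diastolic = mean_over_arrays(participant, 4079, 0, 2)
repeat_visit = mean_over_arrays(participant, 4080, 1, 2)  # no data at instance 1
print(systolic, diastolic, repeat_visit)
```

Returning `None` when every array value is missing keeps participants without a later visit distinguishable from participants with a real measurement.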
+The **Data** tab gives you the architecture of a field: how its values are structured, repeated, and encoded. What it does not tell you is whether those values are reliable, comparable, or the best available option for your phenotype. For that, you need the category-level context, and the Showcase delivers it.
+
+## Example: Not All Field IDs Are Created Equal
+
 For example, searching "Left Ventricular Ejection Fraction" returns [multiple relevant fields](https://biobank.ndph.ox.ac.uk/showcase/search.cgi?wot=0&srch=Left+Ventricular+Ejection+Fraction&yfirst=2000&ylast=2025) and three with the exact correct description (22420, 24103, and 31060). But which one should you use? This is where the Showcase becomes essential.
 
 ![UK Biobank Showcase search interface showing search results for left ventricular ejection fraction, with three data fields highlighted: 22420 (Left ventricular size and function category), 24103 (Cardiac and aortic function #1 category), and 31060 (Cardiac and aortic function #2 category)](/blog_images/biobank1/lvef_search.png)
@@ -52,13 +82,9 @@ Notice the three highlighted fields measure the same thing but belong to differe
    Comparison of three LVEF fields showing participant counts and category quality warnings
 </figcaption>
 
-Field 22420 ([category 133](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=133)) has 39,649 measurements but includes a warning: "Quality issues may exist in this data. Researchers may wish to consider using data available in Category 157 or Category 162 as an alternative." Field 24103 ([category 157](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=157)) contains 80,974 measurements and references a published methodology, but warns these fields "should not be considered together" with Category 162 without quality assessment. Field 31060 ([category 162](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=162)) has only 4,868 participants, fewer than the flagged field 22420.
-
-For my cardiomyopathy work ([Klasfeld _et al_ 2025](<https://www.cell.com/hgg-advances/fulltext/S2666-2477(25)00063-6>)), I chose field 24103 for its sample size and data quality. However, other practical information provided by the showcase includes the date of which the data was reported (Debut) and the distribution of the data (shown in the data tab in the second section of the Field ID entry).
+Field `22420` ([category 133](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=133)) has 39,649 measurements but includes a warning: "Quality issues may exist in this data. Researchers may wish to consider using data available in Category 157 or Category 162 as an alternative." Field `24103` ([category 157](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=157)) contains 80,974 measurements and references a published methodology, but warns these fields "should not be considered together" with Category 162 without quality assessment. Field `31060` ([category 162](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=162)) has only 4,868 participants, fewer than the flagged field `22420`.
 
-**Another critical detail:** Many UK Biobank measurements were collected at multiple timepoints (instances). The Showcase shows you not just the field ID, but which instances have data. For most phenotypes, the initial assessment (Instance 0) has the largest sample size, with subsequent visits having progressively fewer participants. For covariates, I typically use the initial visit value. If you need longitudinal data, expect much smaller sample sizes.
-
-Additionally, some fields contain multiple measurements per participant within a single visit (arrays). For example, blood pressure taken three times. The Showcase specifies these array structures so you can plan your handling strategy.
+For my cardiomyopathy work ([Klasfeld _et al_ 2025](<https://www.cell.com/hgg-advances/fulltext/S2666-2477(25)00063-6>)), I chose field `24103` for its sample size and data quality. The Showcase also provides other practical information, including the date on which the data was reported (Debut) and the distribution of the data (shown in the **Data** tab in the second section of the Field ID entry).
 
 ## UK Biobank Showcase Tips
 
@@ -67,7 +93,7 @@ After working with the Showcase on multiple projects, I've developed a workflow
 **Before selecting a field:**
 
 - Always check the category warnings, not just the field description
-- Look at the data distribution tab: Is it normally distributed? Heavy missingness? Homogenous values? Sampling bias?
+- Look at the data distribution tab: Is it normally distributed? Heavy missingness? Homogeneous values? Sampling bias?
 - Check the total participants to plan your sample size accordingly
 
 **For reproducibility:**
@@ -76,11 +102,11 @@ After working with the Showcase on multiple projects, I've developed a workflow
 - Record the version date (last import/update)
 - Check stability rating (how data may change in future releases)
 
-**To save time:**
-
-- Use algorithmically-defined outcomes in Category [42](https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=42) instead of manually classifying from multiple sources
-- Watch for "Not available" status because sometimes fields are listed before release
+**Watch out:**
 
-Publications tell you which field _was_ used, but they rarely tell you which field to use. The Showcase is how you figure out which one you should use. Spending a few minutes there before starting your analysis can save weeks of downstream headaches. For more details on features I didn't cover, see Part III of the [Showcase user guide](https://biobank.ndph.ox.ac.uk/showcase/ukb/exinfo/ShowcaseUserGuide.pdf) (page 4).
+- Like any great love, the Showcase is not perfect. Sometimes a data field has its status set to "Not available", meaning it is listed before release. If a release date is listed and is not in the future, reach out to UK Biobank support for clarification.
+- Sometimes data that appears in the Showcase is missing from your RAP workspace entirely. This can happen if your project is running an outdated version of the UK Biobank data release.
+  - If you are the project admin, go to the **Settings** page of your dispensed project and click **Check for Updates** in the UK Biobank section.
+  - If you are not the admin or the update does not resolve it, reach out to the UK Biobank support team directly. Tell them upfront if you have already searched the community forums. They are genuinely helpful and worth contacting.
 
-Finding the right field is half the battle. In the next post, we'll dive into actually loading this data for analysis.
+The more time I spend with the Showcase, the more I appreciate what it actually is: not just a catalog, but a guide to making good decisions about your data. For features not covered here, Part III of the [Showcase user guide](https://biobank.ndph.ox.ac.uk/showcase/ukb/exinfo/ShowcaseUserGuide.pdf) (page 4) is worth bookmarking. In the next post, we'll dive into actually loading this data for analysis.