# Current state of the art in data linkage, federation and AI/ML/LLMs across infrastructures: federation, governance and safe output methods

## Overview

### Summary

Participants discussed issues in federating datasets, including how to identify different datasets across multiple systems, how to collect identifiable information robustly, and how to link up the different approaches taken across the four nations effectively.

There was further discussion on how to check ML models effectively within TREs.

On governance, it was suggested that a project working across multiple TREs should go through a single governance process.

### Next steps

- Create a 'panel' focused on a specific type of data or research (e.g. health, crime, financial) that can oversee specific research projects within that field

## Raw notes

### Data Linkage

#### How do you handle the NHS Number?

- Uses NHS Standard NF5; after 3, they switched to tracing manually through the system
- Issues with linking health and non-health data
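
The notes do not say what the NF5 standard involves. The sketch below therefore shows only one publicly documented, generic step that commonly comes before linking on the NHS Number. That step is validating the number's Modulus 11 check digit, so that mistyped numbers are caught before they cause false matches or missed links. The function name `is_valid_nhs_number` is hypothetical, and this is not a description of NF5.

```python
def is_valid_nhs_number(nhs_number: str) -> bool:
    """Check an NHS Number's Modulus 11 check digit (a sketch; not the NF5 standard)."""
    digits = nhs_number.replace(" ", "")
    if len(digits) != 10 or not digits.isdecimal():
        return False
    # Multiply the first nine digits by weights 10, 9, ..., 2 and sum the products.
    total = sum(int(d) * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0  # a remainder of 0 maps to check digit 0
    if check == 10:
        return False  # no valid NHS Number needs a check digit of 10
    return check == int(digits[9])


print(is_valid_nhs_number("943 476 5919"))  # True
print(is_valid_nhs_number("943 476 5918"))  # False: wrong check digit
```

A valid check digit only shows that the number is well formed. It does not show that the number belongs to the person on the record.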

#### Names such as Dave / David can cause problems

- Linksmart is a solution for this
- Collecting crime data
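
The notes cite Linksmart for handling name variants but do not describe how it works. As a generic illustration only, a linkage pipeline might first map known nicknames to a canonical forename. It could then fall back to a string-similarity score to catch spelling variants. The nickname table is a tiny hypothetical sample, and the helper names are illustrative. Python's standard `difflib.SequenceMatcher` stands in for the more specialised comparators, such as Jaro-Winkler, that linkage tools often use.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table; a real one would be curated and far larger.
NICKNAMES = {
    "DAVE": "DAVID",
    "DAVEY": "DAVID",
    "DEB": "DEBRA",
    "DEBBIE": "DEBRA",
}


def canonical_forename(name: str) -> str:
    """Upper-case, drop non-letters and map known nicknames to one canonical form."""
    cleaned = "".join(ch for ch in name.upper() if ch.isalpha())
    return NICKNAMES.get(cleaned, cleaned)


def forename_similarity(a: str, b: str) -> float:
    """Return 1.0 for nickname-equivalent names, otherwise a 0-1 similarity ratio."""
    ca, cb = canonical_forename(a), canonical_forename(b)
    if ca == cb:
        return 1.0
    return SequenceMatcher(None, ca, cb).ratio()


print(forename_similarity("Dave", "David"))    # 1.0
print(forename_similarity("Davidd", "David"))  # about 0.91, a likely typo
```

Neither step helps once a source keeps only initials, because 'David' and 'Debra' both reduce to 'D'.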

#### Scotland's Approach

- A national ID number

### Federation between datasets

- Identifying individuals with confidence across TREs is important
- Problem: linking health data with other data is difficult, because records have to be matched up using addresses and names
- Separation of functions
- The person who holds all the identifying information does not have the data
- Communication between TREs needs specific criteria; Scotland has 5 TREs
- Having more than two TREs, and introducing a central one, is a possibility
- Issues with identifying matching records in data sets A and B across multiple systems
- Seeding death data: David Smith and Debra Smith both appear as 'D. Smith', which causes gender incompatibility issues
- National Drug Treatment data: at source, only initials ('D.S.'), gender and month/year of birth (MM/YYYY) were collected. De-identifying data can cause linkage problems
- Linking education to non-education data, where there is no common 'number': if names and addresses are not shared, how confident can we be that Participant A is the same participant in another TRE?
- Bringing in NHS data and also pseudonymising it: how can you work with it without a key?
- Once a data linkage is in place, the different data types are brought together into one data set (in a TRE). E.g. when linking mental health data with shopping data, if each source is anonymised and has its own key, the linkage can be done anonymously for external sources
- Education data in England, Scotland and Wales might use different notations
- Residential data can be used as a key
- 'E-child': an effort to link NHS data with Department for Education data
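
One minimal sketch can illustrate several of the points above together:

- linking on partial identifiers such as initials, sex and MM/YYYY of birth;
- pseudonymising before linkage;
- keeping the holder of identifiers separate from the data.

Each source replaces every linkage field with a keyed hash (HMAC-SHA256) under a secret shared by the linkage parties. Pseudonyms therefore agree across TREs without raw identifiers being exchanged. Candidate pairs are then scored with Fellegi-Sunter style match weights.

The key, field names, m/u probabilities and helper names (`pseudonymise`, `match_weight`) are all hypothetical. In practice the probabilities would be estimated from the data, and a trusted party would hold the key.

```python
import hashlib
import hmac
import math

# Hypothetical secret shared by the linkage parties; never stored alongside the data.
SHARED_KEY = b"hypothetical-project-key"

# Hypothetical (m, u) per field: m = P(fields agree | same person),
# u = P(fields agree | different people).
FIELD_PROBS = {
    "initials": (0.95, 0.01),
    "sex": (0.98, 0.5),
    "dob_month_year": (0.97, 0.005),
}


def normalise(value: str) -> str:
    """Upper-case and keep only letters and digits, so 'D.S.' and 'ds' agree."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def pseudonymise(record: dict) -> dict:
    """Replace each linkage field with an HMAC-SHA256 digest of its normalised value."""
    return {
        field: hmac.new(
            SHARED_KEY, normalise(record[field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        for field in FIELD_PROBS
    }


def match_weight(a: dict, b: dict) -> float:
    """Sum log2(m/u) for agreeing fields and log2((1-m)/(1-u)) for disagreeing ones."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Records as they might arrive in two TREs (hypothetical values).
tre_a = pseudonymise({"initials": "D.S.", "sex": "M", "dob_month_year": "03/1970"})
tre_b_same = pseudonymise({"initials": "DS", "sex": "M", "dob_month_year": "03/1970"})
tre_b_other = pseudonymise({"initials": "D.S.", "sex": "F", "dob_month_year": "03/1970"})

print(round(match_weight(tre_a, tre_b_same), 2))   # about 15.14
print(round(match_weight(tre_a, tre_b_other), 2))  # about 9.53
```

With these made-up probabilities, the opposite-sex 'D. Smith' still scores well above zero. That is because only one field disagrees, and that field carries little information. This is the kind of false match the death-data and drug-treatment examples warn about.

Keyed hashes also support only exact agreement, so any fuzzy name comparison has to happen before hashing. If the key leaks, fields with few possible values can be reversed simply by hashing every candidate value.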

### AI & ML

- People confuse the terms AI and ML with 'statistical modelling'
- Based on risk factors, a model can predict the chance of being pre-diabetic with 70% precision
- Accessing 'clinical-like data' that uses similar terminology, to mimic clinical systems
- Offline AI: can you have an offline machine learning model? Yes
- Would multiple AIs learn the same thing on the same data sets? No
- You can make it work with a shared API, though (e.g. stroke prediction)
- APRs: 8-9 expensive centre
- ML is interpreted differently depending on context: ML on health data is seen as something that 'takes your job', whereas ML in other scenarios might be socially acceptable
- Pattern-finding models are popular and precise; this is lacking in statistical modelling
- Ultimately, with ML on medical data, it is not understood why the model gives a particular result
- Checking models is problematic and difficult: uncertain results and unknown model contents raise questions about the model's authenticity
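
To make the '70% precision' figure concrete: precision is the share of people flagged as pre-diabetic by the model who really are pre-diabetic. Recall is a different quantity: the share of truly pre-diabetic people that the model catches. The minimal sketch below computes both from hypothetical labels chosen to give 70% precision. It is not the model discussed in the session, and the function name is illustrative.

```python
def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (precision, recall) for binary labels where 1 means pre-diabetic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical cohort of 15: the model flags 10, of whom 7 are truly pre-diabetic;
# 2 further pre-diabetic people are missed.
y_pred = [1] * 10 + [0] * 5
y_true = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 3
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.70, recall=0.78
```

Precision alone says nothing about how many cases are missed, so both figures are usually needed to judge a screening model.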

### Governance

- The process is repeated many times; committees do not talk to each other and act as separate entities
- Work cannot start until it is approved
- For a project spanning several TREs, each TRE has its own approval process; ideally, a multi-TRE project would go through a single approval process whose decision is accepted by the other TREs

#### What would a solution to this problem look like?

- 'Current state of the art' is the overarching question: a TRE panel is needed to decide what counts as state of the art
- A single 'panel' per specialty (e.g. health, crime) that deals with specific projects, with additional members drawn from national TRE supervision