Wikibase for UMIL? #213
Replies: 3 comments
-
Thanks for this thorough overview! A very interesting find... Reading this, I'm finding there are a number of product requirements that are nebulous to me that are separate from, and primary to, questions about any particular implementation. And I think we would do well to have our technical discussions relative to clarified requirements. Which is to say that I'm not necessarily convinced by the pain points you identify, or rather, that I am not convinced that right now these are technical challenges (relative to which we can identify advantages or disadvantages of particular technologies) as opposed to (not sure if this is the right word, but let's say) "functional" pain points. To take them in turn (leaving out the "Reconciliation complexity" one because I don't think I understand it):
Agreed. Based on what we know about Wikidata, what we imagine about the users of UMIL, and our goals, what does review mean for UMIL? What has to "pass" or "fail" and what does that mean? Is it about content? About process? Is it relative to a general code of conduct or some other set of criteria? Let's say we got a whole bunch of users in a room -- i.e. it's a process now, not an application -- and that brain trust is called "UMIL" -- what does review look like then?
What provenance metadata are we trying to track?
The experts have joined our UMIL brain trust: what is their role in the room? Who are our imagined/real collaborators? What difficulties are they running into in collaboration?
So I think this is saying that as internal LinkedMusic users, we want to be able to test UMIL's functionality in development without worrying about making edits to Wikidata itself. What are the criteria we want to implement, then, that will satisfy us internally that something "works" during the development process? What are the criteria/procedures that will ultimately convince us that something is working in production? None of this is particular to whether or not Wikibase (or a relational database, or a search engine) is some part of the UMIL architecture. But it does, I think, raise the questions against which any technological decision (including all of the "default" decisions we've already made -- "oh, we need a web app running on its own server because that's just what we do here at DDMAL") needs to be made.
-
Let’s discuss all of these at tomorrow’s meeting on this topic at 11 am in the conference room. We’ll set up the Owl and use the lab meeting Zoom link.
-
Not me being messy, but despite my raising of some questions/concerns, I am particularly interested in this bit. Let's keep to the current plan we outlined (simple verification process on both Wikibase and Django app), but keep in the back pocket these other possible applications... ;P
-
Warning
Hi everyone! I wanted to share some findings from investigating Wikibase as a potential intermediary platform for our UMIL (Universal Musical Instrument Lexicon) workflow, and get thoughts on whether this approach might benefit the broader LinkedMusic initiative.
The Wikibase I've set up and been experimenting with, if you want to click around as you read: https://umilex.wikibase.cloud/wiki/Main_Page (It's called "UMILex" because they require a 5-character minimum, so I went for UMI Lex(icon).)
Background: Current UMIL Workflow Challenges
Our current pipeline looks (something) like:
While functional, we've identified several pain points:
Quality control bottleneck: No clear systematic review process before Wikidata upload
Data provenance gaps: Limited tracking of user contributions vs. expert curation
Reconciliation complexity: Using a separate OpenRefine + Virtuoso workflow (more a LinkedMusic issue than a UMIL one)
Collaboration friction: Difficult for experts to review and validate data, and for users to access Wikidata; Wikibase doesn't have the stringent requirements Wikidata does (we control user requirements directly)
Duplicate risk: No staging environment to test Wikidata integration. The Wikidata API can be used for this, but Wikibase offers one too, allows systematic checking, and already stores data in Wikidata's format
Why Investigate Wikibase?
Note
Given UMIL's goal of systematically improving musical instrument representation in Wikidata, I wanted to explore whether Wikibase could serve as a "gated community" for data quality control before public contribution, as well as a potential workplace for tracking additions and edits.
What I've Discovered
Technical Feasibility ✅
Seamless import: PostgreSQL → OpenRefine → QuickStatements → Wikibase works smoothly
Native multilingual support: Perfect for instrument names across languages/cultures
Custom property system: Can model domain-specific needs (HBS classification, regional variants, etc.)
Built-in SPARQL endpoint: Replaces our separate Virtuoso instance for many use cases
Reconciliation integration: OpenRefine's Wikibase extension handles cross-system matching
Tracking: Using the built-in SPARQL endpoint, we can query for items "Needing Review" (Q3), as well as items "under review" (Q4), "reviewed" (Q5), or with "errors" (Q6).
Similarly, we can look for items that carry none of these status values, such as those added directly by a user. Paste the query into the UMILex Wikibase query endpoint, or open it here: https://tinyurl.com/24bxjnz4 . 😁
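For anyone who wants to see the shape of such queries without opening the link, here is a rough sketch that builds them as strings in Python. It is not the query behind the short link. The status items Q3 to Q6 come from this post, but the status property ID (`P1`) is a placeholder, and the URI prefixes assume the usual Wikibase layout under the instance URL.

```python
# Sketch only: builds SPARQL strings; it does not contact the endpoint.
BASE = "https://umilex.wikibase.cloud"  # instance named in the post
STATUS_PROPERTY = "P1"  # hypothetical ID: the post does not name the status property

# Status items as listed in the post.
STATUSES = {
    "needing review": "Q3",
    "under review": "Q4",
    "reviewed": "Q5",
    "errors": "Q6",
}

# Assumption: the instance uses the usual Wikibase URI layout for entities
# and for direct ("truthy") property claims.
PREFIXES = (
    f"PREFIX wd: <{BASE}/entity/>\n"
    f"PREFIX wdt: <{BASE}/prop/direct/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def items_with_status(status: str) -> str:
    """Return a query for items whose status property points at the given status item."""
    qid = STATUSES[status]
    return (
        PREFIXES
        + "SELECT ?item WHERE {\n"
        + f"  ?item wdt:{STATUS_PROPERTY} wd:{qid} .\n"
        + "}"
    )


def items_without_status() -> str:
    """Return a query for labelled items that carry no status statement at all."""
    return (
        PREFIXES
        + "SELECT DISTINCT ?item WHERE {\n"
        + "  ?item rdfs:label ?label .\n"
        + f'  FILTER(STRSTARTS(STR(?item), "{BASE}/entity/Q"))\n'
        + f"  FILTER NOT EXISTS {{ ?item wdt:{STATUS_PROPERTY} ?status . }}\n"
        + "}"
    )


print(items_with_status("needing review"))
```

Note that the second query would also return the status items themselves (Q3 to Q6) unless they carry a status, so a real version would probably restrict the search to instruments.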
Workflow Consolidation Benefits
Instead of: PostgreSQL + OpenRefine + (Maybe Virtuoso) + Manual Curation + Wikidata
We get: PostgreSQL + OpenRefine + Wikibase (with built-in curation tools) + Wikidata
Data Quality Control Features
Wikibase enables systematic quality workflows:
Review status tracking: Items progress through "Needs Review" → "Reviewed" → "Export Ready"
Provenance metadata: Track whether data came from users, experts, databases, or literature
Expert collaboration: Domain specialists can review instrument classifications in a structured environment
Pre-export validation: Test Wikidata integration before making public contributions
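To make the review progression concrete, here is a minimal Python sketch of the three statuses as a one-way sequence. The status names come from the list above; the function names are made up for illustration, and nothing here reflects how Wikibase itself stores or enforces the status.

```python
from enum import Enum


class ReviewStatus(Enum):
    NEEDS_REVIEW = "Needs Review"
    REVIEWED = "Reviewed"
    EXPORT_READY = "Export Ready"


# Only the forward path named in the post; a "send back" step would be an addition.
NEXT_STATUS = {
    ReviewStatus.NEEDS_REVIEW: ReviewStatus.REVIEWED,
    ReviewStatus.REVIEWED: ReviewStatus.EXPORT_READY,
}


def advance(current: ReviewStatus) -> ReviewStatus:
    """Move an item one step forward, refusing to go past the final status."""
    if current not in NEXT_STATUS:
        raise ValueError(f"{current.value!r} is the final status")
    return NEXT_STATUS[current]


def can_export(status: ReviewStatus) -> bool:
    """Only items that completed the whole path may go to Wikidata."""
    return status is ReviewStatus.EXPORT_READY
```

Whether reviewers should be able to move an item backwards (for example after spotting an error) is one of the open questions this sketch deliberately leaves out.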
Concrete Example: HBS Classification Enhancement
Our prototype automatically:
Imports instruments with existing Wikidata connections
Flags items for expert review based on data source
Enables systematic enhancement (missing languages, corrected classifications, etc.)
Prepares clean export batches distinguishing "new items" vs "enhancements to existing items"
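The last step can be sketched roughly as follows. This is not the prototype's code: the `Instrument` class and both function names are hypothetical, and the output uses QuickStatements V1 tab-separated commands only for label statements. Items without a Wikidata QID go into the "new items" batch, the rest into "enhancements".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Instrument:
    """Hypothetical record shape, for illustration only."""
    label_en: str
    wikidata_qid: Optional[str] = None  # None: no known Wikidata item yet
    extra_labels: dict = field(default_factory=dict)  # language code -> label


def split_batches(instruments):
    """Separate instruments that need a new Wikidata item from existing ones."""
    new_items = [i for i in instruments if i.wikidata_qid is None]
    enhancements = [i for i in instruments if i.wikidata_qid is not None]
    return new_items, enhancements


def to_quickstatements(new_items, enhancements):
    """Emit QuickStatements V1 lines: CREATE/LAST for new items, QID rows otherwise."""
    lines = []
    for inst in new_items:
        lines.append("CREATE")
        lines.append(f'LAST\tLen\t"{inst.label_en}"')
        for lang, label in sorted(inst.extra_labels.items()):
            lines.append(f'LAST\tL{lang}\t"{label}"')
    for inst in enhancements:
        for lang, label in sorted(inst.extra_labels.items()):
            lines.append(f'{inst.wikidata_qid}\tL{lang}\t"{label}"')
    return lines
```

Labels containing double quotes are not escaped here, so a real export would need to handle that before upload.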
SPARQL query example (finding all idiophones needing review): https://tinyurl.com/22nocu8c
Strengths Discovered
For UMIL Specifically:
Quality gatekeeper: Gives us more control over whether unvetted data reaches Wikidata
Expert review workflow: Musicologists can systematically validate classifications
Systematic Wikidata improvement: Both adds new instruments AND fixes existing ones
Full data lineage: Complete provenance from user submission to Wikidata contribution
Technical Advantages:
Consolidates toolchain: Reduces complexity vs. current multi-tool workflow
Native collaborative editing: Multiple experts can work simultaneously
Version control: Built-in change tracking and history; all in one place
Cross-system reconciliation: Maintains connections to original Wikidata items
Flexible data modeling: Can experiment with properties before proposing to Wikidata
Challenges & Limitations
Technical Considerations:
Additional infrastructure: Requires hosting/maintaining Wikibase instance
Learning curve: Team needs to understand Wikibase-specific workflows
Property management: Need to maintain mappings between Wikibase and Wikidata properties
Performance questions: Unclear how Wikibase scales vs. dedicated Virtuoso for very large datasets
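As a rough idea of what the property-mapping upkeep could look like, here is a small Python sketch. The local property IDs are placeholders, not the ones in UMILex; P31 (instance of) and P279 (subclass of) are real Wikidata properties. Claims whose property has no Wikidata counterpart, such as an internal review status, are set aside instead of being exported.

```python
# Local property ID -> Wikidata property ID.
# The local IDs are hypothetical; P31 and P279 exist on Wikidata.
LOCAL_TO_WIKIDATA = {
    "P2": "P31",   # instance of
    "P3": "P279",  # subclass of
}


def translate_claims(claims):
    """Split (local_property, value) pairs into exportable and unmapped claims."""
    exportable, unmapped = [], []
    for local_pid, value in claims:
        wikidata_pid = LOCAL_TO_WIKIDATA.get(local_pid)
        if wikidata_pid is None:
            # No Wikidata counterpart: keep it local and report it.
            unmapped.append((local_pid, value))
        else:
            exportable.append((wikidata_pid, value))
    return exportable, unmapped
```

Item values would need the same kind of mapping, since a local Q-number does not refer to the same thing on Wikidata; this sketch only translates the property side.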
Workflow Complexity:
Extra step: Adds another layer between user input and final publication
Synchronization: Need processes to keep Wikibase and Wikidata in sync
Export management: Requires systematic batching and reconciliation for Wikidata uploads
🎊 🏁 You've made it to the end! One concluding statement and then I promise I will finish —
I know this sounds a lot like a sell, and I've gotta say I'm pretty swayed by what it might offer us. I'm looking forward to picking through this with folks here and in ensuing meetings. I'll be throwing notes from these discussions into this thread as well, so please don't hesitate to shout if you have a question, idea, tomato to throw, etc.
cheers! ✨ 🏁