4 changes: 2 additions & 2 deletions docs/alpha.md
@@ -4,13 +4,13 @@ We are rolling out an initial version of Underlay dubbed `build-target-1`. This

1. The definition and use of shared schemas
2. The workflow and interfaces for adding data
3. The interfaces for using data stored in different collections How do you want to use data?
3. The interfaces for using data stored in different collections (How do you want to use data?)


## 1. Shared schemas
Some questions we're interested in understanding:
- Are there common schemas you already use in practice?
- Are there specific things that you wish were consistentl represented in a common way across groups?
- Are there specific things that you wish were consistently represented in a common way across groups?

## 2. Data workflows
- What tools do you use for data work at the moment?
6 changes: 3 additions & 3 deletions docs/collections.md
@@ -4,14 +4,14 @@ Collections are the primary element in Underlay and its associated protocols. Mo

A collection should be portable - it should contain everything about itself (e.g. [Discussions](/discussions) should be stored as data within the collection, not in another format outside of the collection itself).

Collections have a human-readbale name and a canonical `shortId`. These are concatenated to produce the URL of the collection: `underlay.org/jordan/${human-readable-name}-${shortId}`.
Collections have a human-readable name and a canonical `shortId`. These are concatenated to produce the URL of the collection: `underlay.org/jordan/${human-readable-name}-${shortId}`.

The canonical `shortId` provides a persistent means for routing to a collection across changes in namespace, transfers of collections to other owners, collection renames, and namespace or collection-name typos.
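To make the URL pattern above concrete, here is a minimal sketch of how such a URL could be assembled. The `collection_url` helper and the slug rule (lower-case the name and join words with hyphens) are illustrative assumptions, not part of the spec:

```python
def collection_url(namespace: str, name: str, short_id: str) -> str:
    """Build a collection URL from a namespace, a human-readable name,
    and the canonical shortId. The slugification rule here is an
    illustrative assumption."""
    # Lower-case and hyphenate the display name to make it URL-safe.
    slug = "-".join(name.lower().split())
    return f"underlay.org/{namespace}/{slug}-{short_id}"

print(collection_url("jordan", "Map Data", "hsbga72"))
# underlay.org/jordan/map-data-hsbga72
```

Because the `shortId` travels with the URL, a resolver can ignore a stale slug and still route to the right collection.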

## Collections as building blocks
In the early days of the Underlay, we envisioned a singular, monolithic knowledge graph akin to Freebase or Google's knowledge graph. As the project matured, we realized that approach was misaligned with our mission: a singular graph can only possibly represent a single curatorial perspective. We don't believe such a singular perspective can exist ethically or logistically (who is going to curate such a thing?!).

Our current vision for the Underlay is one where many such graphs are created by using collections as building blocks. Each collection represents a focused, curated set of data. Piecing many of them together, like using Legos to construct a larger structure, it is possible to build a large, expert-curated, deeply provenanced knowledge graph. We envision there will many large collections that are simply curated perspectives on which sub-collections are trustworthy, verifiable, and appropriate. Similarly to how a single open-source codebase can have a deeply nested tree of dependencies, we envision collections that have a deeply nested tree of dependency-collections.
Our current vision for the Underlay is one where many such graphs are created by using collections as building blocks. Each collection represents a focused, curated set of data. Piecing many of them together, like using Legos to construct a larger structure, it is possible to build a large, expert-curated, deeply provenanced knowledge graph. We envision there will be many large collections that are simply curated perspectives on which sub-collections are trustworthy, verifiable, and appropriate. Similar to how a single open-source codebase can have a deeply nested tree of dependencies, we envision collections that have a deeply nested tree of dependency-collections.

Similarly to how an opensource code package defines an API that is used to integrate it into a larger codebase, Underlay collections define a schema that allows the data to be mapped into a larger database appropriately.
Similar to how an open-source code package defines an API that is used to integrate it into a larger codebase, Underlay collections define a schema that allows the data to be mapped into a larger database appropriately.

7 changes: 4 additions & 3 deletions docs/data.md
@@ -1,14 +1,15 @@
# Data

Data in Underlay can be stored in any manner technically appropriate for a given architecture. The key conssistency is making data available to other collections and to [exports](using.md) with a consistent interface. Whether data is stored in flat files, a database, or some future architecture should be irrelevant. In this way, we say that Underlay data is 'storage agnostic'. Underlay does not prescribe how data is stored, rather, the interfaces that must be implemented in making that data avialable.
Data in Underlay can be stored in any manner technically appropriate for a given architecture. The key consistency is making data available to other collections and to [exports](using.md) with a consistent interface. Whether data is stored in flat files, a database, or some future architecture should be irrelevant. In this way, we say that Underlay data is 'storage agnostic'. Underlay does not prescribe how data is stored, rather, the interfaces that must be implemented in making that data available.

## Current implementation
At present, data is added to collections by uploading CSV files. This basic approach is the most common one we've heard requested. A user with sufficient permissions to a collection can upload a CSV file which, along with its Mapping, produces an [assertion](protocol.md).
<!-- What is a Mapping? -->

Our intent is to implement many modes of adding data that all generate a compliant assertion. Some example input approaches:

- **Web UX:** Building a table-like data editor directly into underlay.org. Values in the table can we edited, new rows can be inserted, or deletions can be made. Such an interface could be made available to different permission levels, some requiring approval by an administrator before being included as a viable assertion.
- **API:** Building an API to allow programmatic insertion, editing, and deletion of collection data. This would allow scripting to be written that automate the process of shaping and uploading new data into a collection.
- **Web UX:** Building a table-like data editor directly into underlay.org. Values in the table can be edited, new rows can be inserted, or deletions can be made. Such an interface could be made available to different permission levels, some requiring approval by an administrator before being included as a viable assertion.
- **API:** Building an API to allow programmatic insertion, editing, and deletion of collection data. This would allow scripts to be written that automate the process of shaping and uploading new data into a collection.
- **Web Forms:** Using the API to provide hosted web forms that can be used to generate schema-compliant data additions. Analogous to a web form populating a new row in a spreadsheet, we can have web forms populate a new set of entities in a collection.
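As a purely illustrative sketch of the kind of shaping such scripts might do, the snippet below turns CSV rows into an assertion-like payload. None of this exists yet: the `rows_to_assertion` helper, its mapping convention, and the payload field names are all hypothetical:

```python
import csv
import io
import json

def rows_to_assertion(csv_text: str, mapping: dict) -> dict:
    """Shape CSV rows into a hypothetical assertion payload.
    `mapping` renames CSV columns to schema attribute names.
    The payload structure is illustrative, not a specification."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entities = [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in reader
    ]
    return {"type": "assertion", "entities": entities}

payload = rows_to_assertion(
    "name,born\nRosalind Franklin,1920\n",
    {"name": "fullName", "born": "birthYear"},
)
print(json.dumps(payload))
```

Whatever the input mode (web UX, API, or web form), the idea is that each path converges on the same kind of schema-compliant assertion.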

## Collaborative data
6 changes: 3 additions & 3 deletions docs/discussions.md
@@ -6,7 +6,7 @@ Discussions are implemented by automatically adding a `Discussion` type to the c

## Discussion UI

- Discussions can be created from the discussion tab
- [Show image]
- Discussions can be created from the discussion tab.
<!-- - [Show image] -->
- Discussions can be created from the data viewer to scope it to a specific data point.
- [Show image]
<!-- - [Show image] -->
2 changes: 1 addition & 1 deletion docs/federation.md
@@ -1,5 +1,5 @@
# Federation
Over the course of the project, we've explored maybe different architectures for Underlay. At the core of this consideration is a tension between the simplicity and accessibility of a system, and the eventual power dynamics and restrictions that such a system imposes. Centralized systems are cheaper, more efficient, and quickly to develop, but they risk placing too much power in the hands of the central hosting entity.
Over the course of the project, we've explored many different architectures for Underlay. At the core of this consideration is a tension between the simplicity and accessibility of a system, and the eventual power dynamics and restrictions that such a system imposes. Centralized systems are cheaper, more efficient, and quick to develop, but they risk placing too much power in the hands of the central hosting entity.

Purely distributed systems offer technical assurances against such risks, but in practice are difficult to use, grow, and develop. In the most extreme case, they play into a fallacy of trustless systems based on technical proofs. We do not advocate for trustless systems, as we simply don't believe such things are possible once you enter the realm of social structures and culture.

4 changes: 2 additions & 2 deletions docs/history.md
@@ -10,7 +10,7 @@ There were several catalysts that led discussions about a new project called the
- Danny's experience with founding [Metaweb](https://en.wikipedia.org/wiki/Metaweb) and their development (and eventual sale) of [Freebase](https://en.wikipedia.org/wiki/Freebase_(database)). Freebase was [shut down by Google in 2016](https://groups.google.com/g/freebase-discuss/c/WEnyO8f7xOQ), and Danny knew an open version of a global knowledge graph was still possible and critical.
- Travis and Thariq Shihipar had been developing early versions of [PubPub](https://www.pubpub.org) in 2015 and 2016. Initially, PubPub was both a frontend for fast iteration of scholarly articles, and a backend for long-term archival of such documents. Merging these two things created a very uncomfortable interface — one that was simultaneously trying to reduce friction to make quick, iterative changes while also notifying people that all changes would be permanently catalogued in a distributed database forever. Yikes. The archival layer of PubPub was broken off as it became clear there was value in having a separate system for long-term collaborative storage of persistent data.
- SJ's experience and involvement with WikiData led to enthusiasm about the opportunities in this space and insight into what was still lacking.
- As part of his PhD general exames, Travis built [DbDb](https://notes.knowledgefutures.org/pub/hevceylu). The idea behind DbDb was to allow users to publish not just datasets, but the lineage of how a dataset was processed and transformed over time, allowing alternative analyses to be 'forked' from any point in a datasets processing timeline.
- As part of his PhD general exams, Travis built [DbDb](https://notes.knowledgefutures.org/pub/hevceylu). The idea behind DbDb was to allow users to publish not just datasets, but the lineage of how a dataset was processed and transformed over time, allowing alternative analyses to be 'forked' from any point in a dataset's processing timeline.

## Support from Protocol Labs
Beginning in 2018, Joel Gustafson began working at [Protocol Labs](https://protocol.ai/), who generously allowed him to work full-time on research and development of the Underlay project. Until 2021, Joel was the only person working full-time on the technical components of the project.
@@ -64,4 +64,4 @@ With time, we realized that the idea of a singular knowledge graph was rather fr

Instead of a single, global, distributed knowledge graph, the notion of Collections emerged. A collection can be thought of as a contained, singularly curated knowledge graph. This could be as broad as 'all human knowledge' or as narrow as 'taco shops in my neighborhood'. The critical feature of collections, though, is that they can be designed to be composable. That is, larger and broader knowledge graphs could be built by pulling together smaller knowledge graphs curated by experts on the given topic. And as an extension, many such large and broad knowledge graphs could exist simultaneously based on which collections they decide are relevant for their purpose.

Collections thus can be viewed as a knowledge graph of specific size, authority, trustworthyness, and purpose that represent a singular curational perspective. So rather than a singular, global knowledge graph - we have a network of knowledge graphs that can be assembled, composed, and re-mixed to match a given purpose and perspective.
Collections thus can be viewed as a knowledge graph of specific size, authority, trustworthiness, and purpose that represents a singular curatorial perspective. So rather than a singular, global knowledge graph - we have a network of knowledge graphs that can be assembled, composed, and re-mixed to match a given purpose and perspective.
2 changes: 1 addition & 1 deletion docs/incentives.md
@@ -1,6 +1,6 @@
# Incentives

A key to Underlay's success is being able to provide incentives for creating, curatoring, and using collections. Currently, the incentives to create and maintain a public dataset are underwhelming when compared to the cost of actually doing so. There are a few key features we think can improve the situation:
A key to Underlay's success is being able to provide incentives for creating, curating, and using collections. Currently, the incentives to create and maintain a public dataset are underwhelming when compared to the cost of actually doing so. There are a few key features we think can improve the situation.

## Incentivizing useful datasets.
A common practice when publishing a public dataset is to simply upload a flat file to a data server. This server can then give you a download count to suggest how useful people find the dataset to be. We think we can do better with a few key features:
2 changes: 1 addition & 1 deletion docs/index.md
@@ -14,4 +14,4 @@ Our first step in building towards that vision is a central underlay.org platfor
While there are significant technical elements to the project, we are heavily focused on the social and cultural requirements that must be addressed in order for public, collaborative data to flourish. As such, we strive to prioritize simple and accessible design patterns that can be understood and used by many, rather than complex technical patterns that may provide higher customization or efficiency but require deep expertise.

## Why is this work important?
Knowledge is often exchanged in formats optimized for computers, and then used to render webpages, maps, diagrams, tables and text for human consumption. It is also used directly by machines to navigate vehicles, trade stocks, control appliances, design structures, formulate scientific hypotheses, order search results, and much more. But today, most of these machine-readable data sources are privately held and controlled, and the ones that do exist publicly are fragmented and don’t work well with each other. Indeed, at the moment it seems the only way to leverage the power of a large dataset is to control it privately and implement business operations around staffing its maintenance and usage for the purpose of private benefit. The goal of the Underlay is to improve the way that public data is created, curated, and used such that it can be shared and used for public benefit.
Knowledge is often exchanged in formats optimized for computers, and then used to render webpages, maps, diagrams, tables, and text for human consumption. It is also used directly by machines to navigate vehicles, trade stocks, control appliances, design structures, formulate scientific hypotheses, order search results, and much more. But today, most of these machine-readable data sources are privately held and controlled, and the ones that do exist publicly are fragmented and don’t work well with each other. Indeed, at the moment it seems the only way to leverage the power of a large dataset is to control it privately and implement business operations around staffing its maintenance and usage for the purpose of private benefit. The goal of the Underlay is to improve the way that public data is created, curated, and used such that it can be shared and used for public benefit.
6 changes: 3 additions & 3 deletions docs/namespaces.md
@@ -1,10 +1,10 @@
# Namespaces

Namespaces in Underlay are human-readable strings that give context to the authority behind a given collection. A single namespace is given to each User and Community. The pool of available namespaces is shared amonst all users and communities.
Namespaces in Underlay are human-readable strings that give context to the authority behind a given collection. A single namespace is given to each User and Community. The pool of available namespaces is shared amongst all users and communities.

Navigating to a namespace URL (e.g. `underlay.org/jordan` or `underlay.org/nasa`) will lead to a user's or organization's profile. The profile will list all collections associated with that namespace and other profile details. Collections live at a URL path after a namespace, e.g. `underlay.org/${namespace}/${collection-slug}-${collection-shortId}`.

Namespaces are not persistent! Users or communities may change the namespace over time (though, many won't), so they can not do not guarantee a persistent, permanent address.
Namespaces are not persistent! Users or communities may change the namespace over time (though, many won't), so they cannot and do not guarantee a persistent, permanent address.

It will be common practice to refer to a schema or collection using namespaces and collection slugs, but permanent URIs will use collection shortIds or full ids of the collection or namespace. For example:

@@ -25,7 +25,7 @@ collectionShortId: hsbga72

This allows us to resolve collections and specific schema or collection versions despite a namespace or collection-string changing, as long as the collection `shortId` is maintained.
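A hypothetical resolver might split such a human-readable reference into its parts as follows. The `parse_schema_ref` helper, and its assumption that the shortId is the final hyphen-separated segment of the slug, are illustrative only; the real resolution rules may differ:

```python
import re

def parse_schema_ref(ref: str) -> dict:
    """Split a reference like 'jordan/map-data-hsbga@2.1' into
    namespace, name, shortId, and version. Assumes the shortId is
    the last hyphen-separated segment before '@' (an illustrative
    convention, not the spec)."""
    m = re.fullmatch(r"([^/]+)/(.+)-([^-@]+)@([\d.]+)", ref)
    if m is None:
        raise ValueError(f"unrecognized reference: {ref}")
    namespace, name, short_id, version = m.groups()
    return {"namespace": namespace, "name": name,
            "shortId": short_id, "version": version}

print(parse_schema_ref("jordan/map-data-hsbga@2.1"))
```

The key property is that the `shortId` survives renames: whatever the namespace or slug says today, resolution keys on the shortId.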

Schemas will typically be addressed be referenced by their most recent human-readable URI. For example:
Schemas will typically be addressed and referenced by their most recent human-readable URI. For example:

```
schema: jordan/map-data-hsbga@2.1
```
4 changes: 2 additions & 2 deletions docs/protocol.md
@@ -8,7 +8,7 @@ Our current answer is that the purpose of the Underlay is to make public data mo

With that in mind, it is often easier to identify what parts of that problem Underlay does _not_ address. For example, we are not trying to increase query-speed of data (i.e. we're not a database), and we're not trying to improve transfer sizes (i.e. we're not a compression algorithm).

In fact, nearly all of the technical components that one typically thinks of when considering public datasets have already been considered and addressed by past efforts around the Semantic Web, RDF, and modern efforts on IPFS, IPLD, Dat, and other open data projects. However, despite the technical expertise brought to these projects, the reality of public data still leaves us wanting. As such, we identify that a missing piece we can address is the social dynamics of using public data. Our approach is to identify simple, well-established technical components that can serve as the basis for facilitating more effective, equitable, and sustainble processes.
In fact, nearly all of the technical components that one typically thinks of when considering public datasets have already been considered and addressed by past efforts around the Semantic Web, RDF, and modern efforts on IPFS, IPLD, Dat, and other open data projects. However, despite the technical expertise brought to these projects, the reality of public data still leaves us wanting. As such, we identify that a missing piece we can address is the social dynamics of using public data. Our approach is to identify simple, well-established technical components that can serve as the basis for facilitating more effective, equitable, and sustainable processes.

The Underlay is premised on the idea that a knowledge graph can be constructed from a series of distributed transactions called assertions. Multiple assertions are combined through a process called reduction and can be curated into useful groupings using collections.
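A toy illustration of reduction, using the document's running example: it assumes assertions carry per-entity attribute claims and that later assertions win on conflicts. This is a sketch of the idea, not the actual reduction algorithm:

```python
def reduce_assertions(assertions: list) -> dict:
    """Toy reduction: merge a sequence of assertions into one view,
    with later assertions overriding earlier values for the same
    (entity, attribute) pair. The real protocol is more involved."""
    graph = {}
    for assertion in assertions:
        for entity, attrs in assertion["claims"].items():
            graph.setdefault(entity, {}).update(attrs)
    return graph

merged = reduce_assertions([
    {"source": "Jude", "claims": {"Rosalind Franklin": {"birthYear": 1920}}},
    {"source": "Ada", "claims": {"Rosalind Franklin": {"field": "chemistry"}}},
])
print(merged)
```

Collections then curate which assertions are fed into a reduction, which is what lets different collections present different perspectives over overlapping data.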

@@ -25,7 +25,7 @@ A toy example of an assertion is simply something like
```
}
```

In english: 'Jude says that Rosalind Franklin was born in 1920'.
In English: 'Jude says that Rosalind Franklin was born in 1920'.

## Reduction
