diff --git a/docs/components/index.md b/docs/components/index.md index b0ab07e..ab114a8 100644 --- a/docs/components/index.md +++ b/docs/components/index.md @@ -1,31 +1,35 @@ Component library ================= -On reviewing a range of policy-related data standards, including those that we maintain and those maintained by others, we identified approximately 60 components that at least two standards have chosen to create or adopt. +On reviewing a range of policy-related data standards, including those that we maintain and those maintained by others, we identified approximately 70 components that at least two standards have chosen to create or adopt. Whether or not these are important to a particular standard depends on lots of factors, including but not limited to the maturity of the standard, whether the standard’s adoption is being driven co-operatively or adversarially, political factors and the character of the people and organisations in the domain where the standard operates. -We conducted an exercise with representatives of several standards, asking them to conduct a diamond ranking exercise of the components. Diamond ranking was chosen to allow a ‘fat middle’ while forcing a decision on the highest and lowest priority items. Participants were asked to consider a standard at different levels of maturity, using Charles Handy’s Second Curve model for describing maturity. +During our research in 2016/17, we conducted an exercise with representatives of several standards, asking them to conduct a diamond ranking exercise of the components. Diamond ranking was chosen to allow a ‘fat middle’ while forcing a decision on the highest and lowest priority items. Participants were asked to consider a standard at different levels of maturity, using Charles Handy’s Second Curve model for describing maturity. + +This list was revised in 2021 to take account of developments in open data standards in the intervening period. 
The list of components is below, with TODO: guidance as to what may make them more or less important for a particular standard. --- ```eval_rst -.. _component-advocacy-plan: +.. _component-advocacy-tools: ``` ## Components -### Advocacy Plan +### Advocacy Tools #### Summary -Setting out steps to encourage organizations to adopt the standard. +Tools to support advocacy for the standard, including plans, arguments and evidence. #### Description An advocacy plan provides the resources and sets out the steps that will be followed to encourage organisations to adopt a standard, as well as key arguments. It should be updated regularly as the standard matures and as the standard starts to have an impact. +Evidence should be gathered over time, to help with advocacy efforts. + #### Examples #### Prioritisation Factors @@ -43,6 +47,35 @@ An advocacy plan provides the resources and sets out the steps that will be foll #### Related Patterns +--- +```eval_rst +.. _component-advocacy-applications: +``` + +### Advocacy Applications + +#### Summary + +Applications that provide a tangible benefit to publishers or users of data that uses a standard + +#### Description + +Advocacy applications are ways to both demonstrate the value of open data and provide value from publication or use. + +Care should be taken to avoid creating applications that crowd out other potential applications; advocacy applications will often have limited features so as to ensure that there's space for innovation. + +#### Examples + +#### Prioritisation Factors + +#### Deprioritisation Factors + +* If there is existing innovation around the data that stimulates publication and use + +#### Related Components + +#### Related Patterns + --- ```eval_rst .. _component-blog: @@ -104,6 +137,43 @@ Standards that have developed a brand need to ensure that it is used in a way th #### Related Patterns +--- +```eval_rst +.. 
_component-brand-assets: +``` + +### Brand Assets + +#### Summary + +Designs, patterns and files to reinforce the brand + +#### Description + +Alongside key assets such as the logo and icons, other brand assets such as document templates, event stand designs, photos and website templates help anyone creating media for the standard to do so in a consistent and high-quality way. + +If there are partners who are particularly high-profile in their support of the initiative, their logos may form part of the brand's assets. + +#### Examples + +#### Prioritisation Factors + +* Where a professional image is important in order to create legitimacy + +#### Deprioritisation Factors + +* Where there is distrust of strong brands +* If the community has a DIY ethos + +#### Related Components + +[Brand Agreements](component-brand-agreements) +[Brand Guidance](component-brand-guidance) +[Logo](component-logo) +[Icons](component-icons) + +#### Related Patterns + --- ```eval_rst .. _component-brand-guidance: @@ -164,7 +234,7 @@ Case studies give real-world examples of when use of a standard has enabled a pa #### Related Components -Advocacy Plan +[Advocacy Tools](component-advocacy-tools) #### Related Patterns @@ -177,11 +247,15 @@ Advocacy Plan #### Summary -Setting out steps to get media coverage of the standard. +Setting out steps to raise the profile of the standard. #### Description -A communications plan sets out the steps that are planned to encourage media coverage of the standard. Having a plan ensures that media opportunities are sought, and that representatives of the standard are well-equipped when taking advantage of opportunities. It can ensure that the standard is properly represented, setting expectations among potential users and beneficaries. +A communications plan sets out the steps that are planned to raise the profile of the standard, across various media and platforms. 
+ +It can encompass social media, broadcast media and direct communications to important parties. + +Having a plan ensures that opportunities are sought, and that representatives of the standard are well-equipped when taking advantage of opportunities. It can ensure that the standard is properly represented, setting expectations among potential users and beneficiaries. #### Examples @@ -196,6 +270,47 @@ A communications plan sets out the steps that are planned to encourage media cov #### Related Patterns +--- +```eval_rst +.. _component-data-store: +``` + +### Data Store + +#### Summary + +A store of all the known data published using a standard, optionally aggregated, enhanced or processed. + +#### Description + +A data store brings together all of the data that is known to be published using a standard. Often, it will process the data in some way to make it easier to use, such as through aggregation, normalisation or enrichment. + +Aggregations allow complex data to be presented in forms that meet specific analysis needs, while normalisation helps to ensure that all the data is structurally similar (if the standard allows structural divergence), and enrichment can help potential users avoid duplication of effort or common pitfalls when trying to work with the data in practical ways. + +Access to the data store might be public, available on application, or restricted to key users. Consideration should be given to the ongoing commitment that a data store may create for a standards organisation if it starts to become a central part of products using the standard. + +Data stores reduce the effort required for a developer, researcher or other interested party to start using data that's published to a standard, allowing them to start experimenting more quickly and to understand what the data might be suitable for. 
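The normalisation and enrichment steps described above can be sketched in a few lines of Python. This is an illustrative sketch only - the field names, identifier scheme and lookup table are invented, not drawn from any particular standard:

```python
# Illustrative sketch of data store processing: normalisation and
# enrichment. The field names ("tags", "org_id") and the organisation
# identifier scheme are invented for this example.

ORG_NAMES = {"GB-CHC-123456": "Example Charity"}  # hypothetical external lookup


def normalise(record):
    """Coerce structurally divergent records into a single shape."""
    out = dict(record)
    tags = out.get("tags", [])
    # Some publishers supply a bare string, others a list of strings.
    out["tags"] = [tags] if isinstance(tags, str) else list(tags)
    return out


def enrich(record):
    """Add a human-readable organisation name from an external lookup."""
    out = dict(record)
    name = ORG_NAMES.get(out.get("org_id"))
    if name is not None:
        out["org_name"] = name
    return out


raw = [
    {"org_id": "GB-CHC-123456", "tags": "education"},
    {"org_id": "XX-999", "tags": ["health", "sport"]},
]
store = [enrich(normalise(r)) for r in raw]
```

A real data store would chain many such steps in a pipeline, and would record the provenance of anything it adds.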
+ +#### Examples + +The [IATI Datastore](https://iatistandard.org/en/iati-tools-and-resources/iati-datastore/) offers public access to IATI data. + +Access to the [360Giving Data store](https://www.threesixtygiving.org/data/360giving-datastore/) is available on application, as this allows the team to discuss individual needs and understand the demand for data. The data store includes enrichment with names for charities and location information, which is uncontroversial in its application but has extensive edge cases that have taken considerable investment to address. + +[Kingfisher Summarize](https://kingfisher-summarize.readthedocs.io/en/latest/) carries out aggregation of data in the [OCDS Kingfisher](https://ocdskingfisher.readthedocs.io/en/latest/) data store, to prepare forms of the data that meet specific analysis needs, along with data quality information. + + +#### Prioritisation Factors + +* If there are lots of organisations publishing to a standard +* If published data proves to be hard to use for structural reasons + +#### Deprioritisation Factors + +#### Related Components + +#### Related Patterns + --- ```eval_rst .. _component-demonstration-applications: ``` @@ -209,7 +324,7 @@ Showcasing what can be done with data when it is published to a standard #### Description -Demonstration applications are either real-world or contrived applications using standardised data to illustrate what the data could be used for. They can be used to demonstrate the advantages of using the standard at all, or using particular parts of the stardard. +Demonstration applications are either real-world or contrived applications using standardised data to illustrate what the data could be used for. They can be used to demonstrate the advantages of using the standard at all, or using particular parts of the standard. #### Examples @@ -226,12 +341,43 @@ Demonstration applications are either real-world or contrived applications using #### Related Patterns +--- +```eval_rst +.. 
_component-technical-tools: +``` + +### Technical Tools + +#### Summary + +Tools that address specific technical challenges associated with working with the standard + +#### Description + +The process of working with a standard can involve tasks which are complex and technical; sometimes these call for a deep understanding of a mechanism that is rarely needed otherwise. These tasks are often ideal candidates for tooling, so that people working with the standard can do their work more effectively. + +These are often command-line tools, as that is typically the environment where such tasks are carried out. Command-line tools are often easier to create and develop, but the command line can present a considerable barrier to users who aren't familiar with it, and so web-based versions are often important as a standard matures. + +#### Examples + +[ocdskit](https://github.com/open-contracting/ocdskit) carries out a range of helpful functions for OCDS on the command line. + +[OCDS Toucan](https://toucan.open-contracting.org/) presents a web interface for the most commonly-used functions from ocdskit and [flatten-tool](https://github.com/OpenDataServices/flatten-tool/). + +#### Prioritisation Factors + +#### Deprioritisation Factors + +#### Related Components + +#### Related Patterns + --- ```eval_rst .. _component-discourse-forum: ``` -### Discourse Forum +### Forum #### Summary @@ -243,6 +389,8 @@ An online space for community discussion of the standard, adoption and data use. 
#### Examples +[Discourse](https://www.discourse.org/) is a common choice. + #### Prioritisation Factors * When a standard has multiple stakeholders who don't meet regularly in person @@ -600,14 +748,22 @@ Public microblogging services such as Twitter, Tumblr and Instagram allow a stan #### Summary -A shop-window on the standard, setting in context of wider goals +A shop-window on the initiative, setting out wider goals that the initiative is seeking to achieve and introducing the technical aspects in context. #### Description -A standard's website brings together the various component of a standard, giving a single place where the standard can be explained to different audiences, and will act as a place that adopters go to in order to discover resources. +A standard's website brings together the various components of a standard, giving a single place where the standard can be explained to different audiences, and will act as a place that adopters go to in order to discover resources. + +Often, standards are created as part of a wider initiative, and so the website can set out the initiative's wider goals, and put the technical work in context. + +As a community grows, it may be appropriate to separate out technical resources into their own space, so that developers are able to focus on what's immediately important to them during implementation. 
+ +#### Examples + +[360Giving](https://threesixtygiving.org) +[Open Contracting](https://open-contracting.org) +[OpenActive](https://openactive.io) + #### Prioritisation Factors * If there are multiple resources relating to a standard @@ -620,6 +776,8 @@ A standard's website brings together the various component of a standard, giving #### Related Components +[Developer Guidelines](component-developer-guidelines) + #### Related Patterns --- @@ -667,7 +825,15 @@ Classifications used in the standard #### Description -Codelists are lists of terms that are provided as part of a standard in order to ensure that values of fields where there are a limited range of options are properly limited in the data, and that concepts map correctly between datasets. For example, a codelist might specify currency codes, to avoid US Dollars being referred to as "$" in one data set and "USD" in another. Codelists can be open or closed - open codelists allow values to be added, while closed codelists do not permit additions +Codelists are lists of terms that are provided as part of a standard in order to ensure that values of fields where there is a limited range of options are properly limited in the data, and that concepts map correctly between datasets. For example, a codelist might specify currency codes, to avoid US Dollars being referred to as "$" in one data set and "USD" in another. + +Codelists are traditionally lists of codes which are given specific meaning in a particular context - such as the ISO 3166 country code lists, where the code "GB" is given the meaning "United Kingdom". + +In data standards, the term has often been expanded to include lists where the codes and the terms are the same (e.g. “English” stands alone, rather than having a code “EN”). + +Codelists can be *open* or *closed* - open codelists allow values to be added, while closed codelists do not permit additions. + +Codelists can be either *internal* or *external* to a data standard. 
Internal codelists are supplied by, and governed alongside, the data standard. External codelists are supplied by, and governed by, a body that’s separate from the data standard. #### Examples @@ -710,7 +876,7 @@ Contributor guidelines set out the expectations of external contributions to the #### Related Components -Developer Guidelines +[Developer Guidelines](component-developer-guidelines) #### Related Patterns @@ -727,7 +893,7 @@ Describing the coding practices and workflows for contributing to the standard o #### Description -Developer guidelines set out the expectactions of external contributions to the standard or the tools that are provided to support adopters. They typically cover licensing, procedure for contributions to be reviewed, expectations around process, and technical expectations such as comments, naming conventions, tests and coding style. +Developer guidelines set out the expectations of external contributions to the standard or the tools that are provided to support adopters. They typically cover licensing, procedure for contributions to be reviewed, expectations around process, and technical expectations such as comments, naming conventions, tests and coding style. #### Examples @@ -743,6 +909,34 @@ Contributor Guidelines #### Related Patterns +--- +```eval_rst +.. _component-developer-tools: +``` + +### Software Libraries + +#### Summary + +Tools to help developers implement the standard in their own software + +#### Description + +Developers using a standard will often require tooling to support their work, and a standard can help by providing common components. Common examples include libraries to: +* validate against schema and assess data quality +* convert between formats +* download data from publishers + +#### Examples + +#### Prioritisation Factors + +#### Deprioritisation Factors + +#### Related Components + +#### Related Patterns + --- ```eval_rst .. 
_component-contributors-agreement: @@ -855,6 +1049,37 @@ Many standards consider one of the best arguments for standardisation is being a #### Related Components +#### Related Patterns +--- +```eval_rst +.. _component-stakeholder-analysis: +``` + +### Stakeholder Analysis + +#### Summary + +An understanding of the stakeholders in the initiative + +#### Description + +Decisions around a standard will always require consideration of the impact on, and requirements of, the stakeholders in a standard. Ensuring that this is well-articulated and easily available to everyone making decisions helps to improve decisions. + +In particular, this helps to ensure that voices that aren't present during development are heard. + +#### Examples + +#### Prioritisation Factors + +* If the standard is operating in a complex environment +* If the standard is being developed away from where it's likely to have impact + +#### Deprioritisation Factors + +* If the standard is being developed by people who are part of a community that contains most, if not all, of the stakeholders in the standard. + +#### Related Components + #### Related Patterns --- @@ -1091,7 +1316,7 @@ The Implementation Plan Template provides an overview of the planning required f #### Summary -For the team maintaining and updating the standard. +Documentation for the team maintaining and updating the standard. #### Description @@ -1128,7 +1353,63 @@ Part of a standard is often schema, and reporting on technical validity against #### Prioritisation Factors -* If there +#### Deprioritisation Factors + +#### Related Components + +#### Related Patterns + +--- +```eval_rst +.. _component-password-store: +``` + +### Password Store + +#### Summary + +Ensure that all relevant people have access to online services + +#### Description + +Standards development will often involve multiple online services, each of which will have its own password. 
Although individual user accounts should be strongly preferred, there are circumstances where this isn't possible. A password store serves both as a record of the services that the standard uses and as a means of granting access to those services to the people who require it. It should be encrypted, and passwords changed regularly - and always when someone's access to the store is revoked. + +Some online services support team permissions, giving individual users access to the passwords that they require. In the absence of this, segmenting large password stores by function limits risk. + +#### Examples + +#### Prioritisation Factors + +#### Deprioritisation Factors + +#### Related Components + +#### Related Patterns + +--- +```eval_rst +.. _component-quality-framework: +``` + +### Quality Framework + +#### Summary + +A framework to describe what constitutes quality in the context of a particular standard. + +#### Description + +What constitutes "quality" in data is, to a degree, subjective. + +Certain factors (for example, whether or not specific fields are provided) are important for some applications, but not others; therefore, their absence isn't necessarily an indicator of poor overall quality, but rather of poor suitability for a particular purpose. + +Other factors are more universally applicable - such as dates being realistic, or descriptions and titles being of appropriate length. + +A data quality framework allows all participants in an open data ecosystem to understand the quality of data in a more nuanced and targeted way. + +#### Examples + +#### Prioritisation Factors #### Deprioritisation Factors @@ -1145,12 +1426,45 @@ Part of a standard is often schema, and reporting on technical validity against #### Summary -Providing feedback on the content of datasets, based on a set of data quality rules. +Providing feedback on the quality of datasets, based on a set of rules or a framework. 
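The universally applicable factors mentioned above, such as realistic dates and appropriately sized titles, are the kind of rules that quality tooling can apply. A minimal sketch in Python, with rule names, thresholds and record fields invented for illustration:

```python
from datetime import date

# Illustrative sketch of machine-checkable quality rules. The rule
# names, thresholds and record fields are invented for this example.


def realistic_date(value, lo=date(1900, 1, 1), hi=date(2100, 1, 1)):
    """A date far outside this window is probably a data-entry error."""
    return lo <= value <= hi


def title_length(value, min_len=3, max_len=200):
    """Titles should be neither empty nor essay-length."""
    return min_len <= len(value.strip()) <= max_len


def run_checks(record):
    """Return (rule name, passed) pairs for a single record."""
    return [
        ("realistic_date", realistic_date(record["date"])),
        ("title_length", title_length(record["title"])),
    ]


report = run_checks({"date": date(1066, 10, 14), "title": "Battle grants"})
# The date rule fails (1066 is outside the window); the title rule passes.
```

A real quality tool would report which rules failed and why, so that publishers can decide whether a failure matters for their intended uses.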
+ +#### Description + +In many data ecosystems, data is published automatically from a system. When the system is being developed, a quality tool allows developers and other stakeholders in that system to understand the data that is being created in the context of the data ecosystem, so that they can ensure that it is of appropriate quality for the uses to which they intend to put it. + +Open data standards advocates can use quality tools to demonstrate where existing data excels, and where improvements can be made. + +#### Examples + +#### Prioritisation Factors + +#### Deprioritisation Factors + +#### Related Components + +#### Related Patterns + +--- +```eval_rst +.. _component-quality-monitoring-tool: +``` + +### Quality Monitoring Tool + +#### Summary + +Providing continuous feedback on the quality of datasets, based on a set of rules or a framework. #### Description +In many data ecosystems, data is published automatically from systems, and quality can change over time due to changes in those systems. For example, a data aggregator might start to include new sources, or existing sources might change in ways that aren't compatible with the publication. Or, simple tweaks made over time might improve the data as underlying systems improve. + +Continuous quality monitoring allows open data advocates to detect changes in quality and investigate them, as well as helping create a sense of community and transparency around the data. + #### Examples +[OCDS Pelican](https://www.open-contracting.org/2020/01/28/meet-pelican-our-new-tool-for-assessing-the-quality-of-open-contracting-data/) + #### Prioritisation Factors #### Deprioritisation Factors @@ -1172,6 +1486,8 @@ A list of fields that publishers are encouraged (but not required) to provide #### Description +When modelling a concept, there is always a balance to be found between requiring so many fields as to make publication unduly arduous, and requiring so few fields as to make the data difficult to use. 
Recommended Fields allow a standard to highlight fields which are helpful to publish, but that may not exist in all systems, or be relevant in all contexts. + +#### Examples #### Prioritisation Factors @@ -1180,6 +1496,8 @@ A list of fields that publishers are encouraged (but not required) to provide #### Related Components +[Extensions Mechanism](component-extensions-mechanism) + #### Related Patterns --- @@ -1310,6 +1628,8 @@ Machine and human-readable rules used to check data quality. #### Description +Building on the data quality framework, Additional Checks can be implemented both by tools created by the initiative and by publishers and users of the standard in their own tooling. + #### Examples #### Prioritisation Factors @@ -1318,6 +1638,8 @@ Machine and human-readable rules used to check data quality. #### Related Components +[Data Quality Framework](component-quality-framework) + #### Related Patterns --- @@ -1394,7 +1716,7 @@ Offering a public archive of meeting minutes, reports, presentations and other r .. _component-slack: ``` -### Slack +### Instant Messaging #### Summary For chat-type conversations with the community about development, adoption and use. #### Examples +Slack, Discord and IRC are all popular choices. + #### Prioritisation Factors #### Deprioritisation Factors @@ -1448,6 +1772,31 @@ An editorialised template that can be filled in to provide data that meets the s #### Description #### Examples + +#### Prioritisation Factors + +#### Deprioritisation Factors + +#### Related Components + +#### Related Patterns + +--- +```eval_rst +.. 
_component-sustainability-plan: +``` + +### Sustainability Plan + +#### Summary + +A plan to ensure that the standard can continue to be maintained as circumstances change + +#### Description + + + #### Examples #### Prioritisation Factors @@ -1628,7 +1977,7 @@ A mechanism for having optional new codelists, schema and documentation added to #### Summary -A mechanism for having optional new codelists, schema and documentation added to the standard. +Regular or one-off events where the community can learn more about how to use the standard #### Description diff --git a/docs/primers/accuracy_vs_precision.md b/docs/primers/accuracy_vs_precision.md new file mode 100644 index 0000000..e94d953 --- /dev/null +++ b/docs/primers/accuracy_vs_precision.md @@ -0,0 +1,15 @@ +Accuracy vs Precision +===================== + +When producing data, key considerations include accuracy and precision. + +Accuracy is how close to the truth a given value is. + +Precision is the level of detail included in the value. + +For example, a boxer might step on the scales and record a value of 87kg. If the boxer’s true weight is 87.43kg, then the scales are accurate to +/- 1kg, and have a degree of precision of 1kg. If the scales were to record a weight of 98.7285kg, they would have a high degree of precision, but a low degree of accuracy. + +For data standards, this concept can be applied to other concepts as well. For example, if an event is described as “Usually happening every Monday at 9am” then its degree of accuracy is relatively high (because the potential for it to not happen is described by the word “usually”), but its precision is relatively low (because it doesn’t tell us under what conditions it might not be happening). Conversely, an event that is described as happening on Monday 3rd May 2021 at 9am is very precise, but may not be accurate if the event doesn’t actually happen on bank holidays. 
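The boxer example can also be expressed numerically. A small Python sketch, using the readings from the text:

```python
# The boxer example, expressed numerically.
true_weight = 87.43  # kg - the boxer's actual weight

# Scales that read to the nearest kilogram: precision of 1 kg.
coarse_reading = round(true_weight)
coarse_error = abs(coarse_reading - true_weight)

# Mis-calibrated scales reading to 0.0001 kg: highly precise,
# but inaccurate.
fine_reading = 98.7285
fine_error = abs(fine_reading - true_weight)

# The coarse reading is less precise but far more accurate.
assert coarse_error < fine_error
```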
+ +Conveying the level of precision in a data standard (as part of the design, or the metadata) can be important for ensuring that its accuracy is understood by data users. Typically, the more precise data needs to be, the higher the costs involved in creating it accurately. + diff --git a/docs/primers/aggregators_data_stores.md b/docs/primers/aggregators_data_stores.md new file mode 100644 index 0000000..c732eaf --- /dev/null +++ b/docs/primers/aggregators_data_stores.md @@ -0,0 +1,19 @@ +Aggregators & Data Stores +========================= + + +Aggregators are online tools which bring together multiple sources of data, and present them to users as one complete feed. This means that someone wanting to use the data only has to connect to one source, rather than multiple, reducing the complexity of their system. + +Aggregators will often keep a copy of the data that they’ve downloaded, so that if the original source encounters an outage, the data from that source is still available for users of the aggregator. + +Aggregators might collect all of the data available in a domain, or only some (such as that relating to a particular audience or region). + +Aggregators will sometimes carry out a degree of processing of the data. This might include: +* De-duplication: identifying and removing data items that have been provided by multiple data sources. This can happen if an aggregator consumes data from other aggregators - such as one that covers a particular sport, and another that covers a particular region. +* Normalisation: converting data that is in multiple formats to the same format. This can happen if there isn’t a standard, or if a standard is quite loose. By normalising in the aggregator, individual data users can receive more consistent data. +* Filtering: removing data that isn’t relevant. For example, an aggregator might remove data that’s outdated or too far in the future to be relevant, or might only include data that meets certain criteria - such as a certain baseline of data quality, or use of particular fields. + +Aggregators may be part of, or used to provide the first part of the pipeline for, data stores. + +Data stores download all of the data that’s available, and then store it in a way that’s useful for querying; this often involves considerable processing and the creation of policies to handle retention and deletion of data. Data stores can be used to understand the data at a point in time, to generate statistics about the data, and to observe how the data has changed over time (e.g. number of activities, which fields are used, how the data quality has changed). + diff --git a/docs/primers/customisations.md b/docs/primers/customisations.md new file mode 100644 index 0000000..262a2b2 --- /dev/null +++ b/docs/primers/customisations.md @@ -0,0 +1,38 @@ +Customisations +============== + +An open data standard comprises an agreed-upon "common ground" around a particular subject or domain. This is a nuanced balance to strike - too little common ground, and the standard doesn't actually shape the data sufficiently to be used; too much and the standard is overly burdensome to use, or inappropriate for some potential users. + +In practical terms, this will affect which fields are required and which are optional, what constraints are placed on the contents of fields (such as length, conformance to a particular format, or reference to an external data source), and how fields are used together. If too few fields are required, then publishers of data may not actually provide the information that users need. + +A standard with too little common ground defined may also model concepts that are too abstract for their intended use case. 
This results in implementers having to create ways to use the standard in their own contexts, without them necessarily doing so in the same way. For example, a data standard that models lectures might not enforce using the provided way to model a course of lectures (because lectures can be standalone) - so users of that data then find that each publisher describes a course of lectures in a different way. + +The decisions that are made around modelling are a product of the immediate and future needs of the users of the standard - an elegant technical solution may be unworkable in practice, while a solution that's easy to publish is likely to be hard to use. + +In the communities around standards, it's common to find that there are members who are more aligned with each other than others. If they work in the same sub-sector of an industry or just conceive of the domain in the same way, then it's likely that they will be able to share more information with each other, and share that information in a more aligned way. Giving these sub-communities a structured way to do this, one that results in useful data for all users of the standard, is something that standards approach in different ways. + +## Extensions + +The most formal way of making a standard customisable is to allow the creation of extensions. These are a set of technical constraints (usually schema) which can: +* Add fields +* Add additional constraints to existing fields +* Make optional fields compulsory +* Combine new and existing fields and constraints into new models, such as a more specific instance of an abstract concept. 
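As an illustrative sketch of the first and third capabilities, an extension can be thought of as a transformation of a base schema. The JSON Schema-style structures and field names here are invented, and real extension mechanisms are considerably richer:

```python
import copy

# Illustrative sketch of an extension applied to a JSON Schema-style
# base schema. The schema and field names are invented for this example.

base = {
    "type": "object",
    "required": ["id"],
    "properties": {
        "id": {"type": "string"},
        "title": {"type": "string"},  # optional in the base standard
    },
}


def apply_extension(schema, new_properties=None, newly_required=None):
    """Extensions may add fields or tighten constraints, never remove them."""
    out = copy.deepcopy(schema)
    out["properties"].update(new_properties or {})
    for field in newly_required or []:
        if field not in out["required"]:
            out["required"].append(field)
    return out


extended = apply_extension(
    base,
    new_properties={"lectureCount": {"type": "integer"}},  # add a field
    newly_required=["title"],  # make an optional field compulsory
)
```

Note that the base schema is left untouched: data valid against the extended schema remains valid against the base, which is what preserves the common ground.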
+ +How these extensions are governed varies, but it can include: +* "Official" extensions which are part of the standard, but only applicable in certain circumstances +* A way for a community to publish and maintain extensions, which might only be applicable to that community +* As a matter of good practice, individual publishers describing the modifications that they've made to the standard, or extra data that they've provided + +Typically, extensions aren't allowed to remove fields or constraints, as this would undermine the "common ground" that can usually be assumed around a standard. + +A standards initiative might create a list of known extensions and recommend their use, so that future publishers can align with existing ones when modelling the same concepts. + +## Profiles + +Less formal than extensions, profiles are a collection of artefacts (potentially including schema, documentation, case studies and guidance) that describe how a standard can be put to use in a particular way. + +Profiles allow a group of users of a standard to describe the ways that they've resolved ambiguity or used flexibility in a standard, with the aim that others like them will follow the same approach. + + + diff --git a/docs/primers/four_types_of_documentation.md b/docs/primers/four_types_of_documentation.md new file mode 100644 index 0000000..5b2fbf7 --- /dev/null +++ b/docs/primers/four_types_of_documentation.md @@ -0,0 +1,6 @@ +The Four Types of Documentation +=============================== + +![The Four Types of Documentation](four_types_of_documentation.png) + +The “four types” model helps to describe the different needs that people bring to documentation at different times. For most projects, all four types of documentation are required. 
\ No newline at end of file
diff --git a/docs/primers/four_types_of_documentation.png b/docs/primers/four_types_of_documentation.png
new file mode 100644
index 0000000..e28d2c4
Binary files /dev/null and b/docs/primers/four_types_of_documentation.png differ
diff --git a/docs/primers/linked_data_semantic_markup.md b/docs/primers/linked_data_semantic_markup.md
new file mode 100644
index 0000000..ba0b334
--- /dev/null
+++ b/docs/primers/linked_data_semantic_markup.md
@@ -0,0 +1,14 @@
+Linked Data & Semantic Markup
+=============================
+
+Linked Data has evolved from the domain of knowledge management, which studies how the things that are known can be organised and discovered, and how the relationships between them can be described.
+
+Semantic markup is the practice of including machine-readable information in web pages alongside the human-readable parts, so that the information can be used in linked data applications as well as by people.
+
+Since the mid-2000s, this “semantic web” approach has been advocated for by many leaders in web technology, most notably Sir Tim Berners-Lee.
+
+Semantic markup allows for high levels of automation and machine reasoning - computer systems can act in smart ways with the data that they consume. If a website presents a table of opening hours, a computer doesn’t “know” what it means - it’s just a table with some text in it. With semantic markup, a computer can “know” that the text is a series of times, that those times represent when the physical place referred to by the website is open, and it can therefore decide to warn someone using a mapping application that the place they’re planning a route to might be closed.
+
+Linked Data approaches are most commonly found in contexts that are close to knowledge management, such as academia, museums, libraries, search engines and certain AI / Machine Learning businesses.
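The opening-hours example above can be made concrete with schema.org vocabulary. A sketch of the JSON-LD a page might embed - the shop and its hours are invented for illustration:

```python
import json

# Machine-readable opening hours, using schema.org types. Embedded in a
# page as <script type="application/ld+json">...</script>, this is what
# lets a computer "know" when the place is open.
markup = {
    "@context": "https://schema.org",
    "@type": "Store",
    "name": "Example Shop",  # invented name
    "openingHoursSpecification": [{
        "@type": "OpeningHoursSpecification",
        "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
        "opens": "09:00",
        "closes": "17:30",
    }],
}

print(json.dumps(markup, indent=2))
```

A consuming application can now treat `opens` and `closes` as times attached to a physical place, rather than as opaque text in a table.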
+Although the technologies are well-developed, linked data approaches are relatively rare outside of these contexts, and so developers approaching Linked Data projects for the first time often have a steep learning curve.
+
+Schema.org is a public project (W3C-held, search-engine funded) that provides the most widely-used classification framework for the semantic web.
diff --git a/docs/primers/pace_layering.md b/docs/primers/pace_layering.md
new file mode 100644
index 0000000..2cdcfc6
--- /dev/null
+++ b/docs/primers/pace_layering.md
@@ -0,0 +1,21 @@
+Pace Layering
+=============
+
+![Pace Layering](pace_layering.png)
+
+Pace layering describes how different components of a system change at different rates, and how these layers interact. Layers closer to the centre move more slowly, and provide a stabilising effect. Layers on the outside change more quickly, responding to change in the environment almost immediately.
+
+Layers further out are also:
+* Easier to describe - you can demonstrate that “red clothes are popular in my city this season” much more readily than any statement that is universally true of nature
+* More applicable to immediate circumstances - warm coats are in fashion in some countries for a few weeks or months when the weather is colder, and then out of fashion again as the weather warms up
+* Where innovation and experimentation are easier - a new technique, a new textile or a new machine can be tried out without changing government or culture
+* Small drivers for change in lower layers - with decreasing influence the lower down the stack you go
+* Stabilised by lower layers - the bounds of what can be in fashion are set by the lower layers
+
+A data standard and its tooling can be positioned using pace layering, and this positioning allows us to understand the expected properties of the standard, as well as what else is required around it in order for it to be impactful.
+
+Typically, a data standard can be positioned in pace layers by the concept that it’s modelling - if the concept changes rapidly, then the standard belongs further up. The further up the layers a standard is, the more it will benefit from using standards that model concepts further down in order to help to stabilise it. Conversely, standards that are lower down will often need to be adapted or put to use by models further up in order to be meaningful.
+
+For example, ISO 8601 is a standard that describes how to model dates using the Gregorian calendar. Calendars typically change over multi-century timescales, so it’s clearly low down in the layers. A date always needs context in order to mean anything - and so ISO 8601 is usually used by other standards to describe when a particular thing happens.
+
+(Image reproduced from https://blog.longnow.org/02015/01/27/stewart-brand-pace-layers-thinking-at-the-interval/ under license CC BY-SA 3.0)
\ No newline at end of file
diff --git a/docs/primers/pace_layering.png b/docs/primers/pace_layering.png
new file mode 100644
index 0000000..69cb3c1
Binary files /dev/null and b/docs/primers/pace_layering.png differ
diff --git a/docs/primers/software_lifecycle.md b/docs/primers/software_lifecycle.md
new file mode 100644
index 0000000..80bfbad
--- /dev/null
+++ b/docs/primers/software_lifecycle.md
@@ -0,0 +1,15 @@
+Software Lifecycle
+==================
+
+There is no single, universally agreed set of terms for describing how well-developed a piece of software is. Some organisations, such as GDS, have defined a lifecycle using common terms, while many are content to use terms quite loosely.
+
+There are two common sets of terminology, which are sometimes used interchangeably.
+
+Alpha / Beta / Release Candidate / Release or Live
+
+This set of terminology is normally used where there’s a well-defined product that’s being built - all experimentation and discovery is focussed on the details, rather than the fundamentals.
+The Alpha stage will normally be where the high-level architecture is defined and tested, and where any particularly challenging technical problems are identified and candidate solutions tried. Beta will be when the software is largely ready to use, and it’s tried out by its intended users to identify any parts that don’t work or are confusing. Release Candidate stage code should be ready to go but, in recognition that last-minute problems often crop up, software frequently undergoes multiple RC rounds before being released. Any feature requests identified at Beta stage or later are put aside for the next round of development, while bugs identified at RC stage may be dealt with straight away or left for later, depending on severity.
+
+Discovery / Prototype / MVP / Iteration
+
+This set of terminology is usually used when there’s a well-defined problem to be solved, but multiple solutions might be acceptable. Discovery is when an initial understanding of the problem is developed, and Prototypes are built to try out specific ideas, to see whether potential solutions might work. Using the learning from prototypes, an MVP can be developed to validate the solution further, which can then be iterated on to improve it (usually using further discovery and prototyping to understand potential improvements).
+
diff --git a/docs/primers/tooling_in_open_data_ecosystems.md b/docs/primers/tooling_in_open_data_ecosystems.md
new file mode 100644
index 0000000..4b0ad18
--- /dev/null
+++ b/docs/primers/tooling_in_open_data_ecosystems.md
@@ -0,0 +1,20 @@
+Tooling in Open Data Ecosystems
+===============================
+
+Open data ecosystems usually develop a range of tools: to help to advocate for more or better publication of data, to help promote use of the data, and to ease the technical processes of working with the data.
+
+Advocacy tools exist primarily to help convince someone of something.
+In open data, that’s often to demonstrate to potential publishers why they should publish, to show existing publishers why they should improve their publication, or to demonstrate to potential users what the data might be able to do for them.
+
+Examples include 360Giving’s GrantNav, which as well as being a useful tool in its own right is a potent advocacy tool for publishers - they’re proud to see their data appear the next day in a well-known and well-respected tool, and the absence of an organisation from the list can be a source of mild embarrassment.
+
+Demonstration tools exist to stimulate innovation and to encourage people to think about what might be possible with the data. They can either offer incentives for interesting use of real, existing data (which also leads to valuable insight about the challenges of working with that data), or use fictitious data as a way of demonstrating what would be possible if the data existed, or was of a certain quality. There’s some overlap with advocacy tools, although demonstration tools are usually less directly targeted at particular groups of (potential) users.
+
+Examples include “proof of concept” tools, data use challenges and hack-and-learn events.
+
+Technical tools exist to reduce the costs incurred by anyone wanting to work with the data; this is particularly valuable where a whole sector can work in the same way, or where new users can be helped to understand the possibilities of the data very quickly.
+
+Examples include validators, aggregators, converters and visualisation tools.
+
+Data infrastructure is the set of tools and services that are required in order for an ecosystem to continue to operate. There’s often a lot of overlap with technical tools - an instance of a technical tool that’s run for the continued benefit of the community is often part of the data infrastructure.
+
+Examples include registries, online conversion/validation tools and datastores.
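As a sketch of the simplest kind of technical tool, a validator can be as little as a check for required fields, reporting problems back to the publisher. The field names here are invented, not taken from any particular standard:

```python
# Hypothetical required fields for an imagined grants standard.
REQUIRED_FIELDS = {"id", "title", "amount"}

def validate(record):
    """Return a list of human-readable problems with a record."""
    missing = sorted(REQUIRED_FIELDS - record.keys())
    return [f"missing required field: {field}" for field in missing]

# A complete record passes; an incomplete one gets actionable feedback.
print(validate({"id": "grant-1", "title": "Example Grant", "amount": 100}))
print(validate({"id": "grant-2"}))
```

Production validators for real standards do far more - type checks, codelist checks, quality hints - but the shape is the same: data in, actionable feedback out.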