Re-evaluate implications of augmenting the context for JSON-LD extensibility (JSON-only processors get confused)

From https://github.com/swicg/activitypub-trust-and-safety/issues/98 a tangential issue was raised regarding context augmentation and whether contexts can/should be "trusted".

In AS2-Core we have the following sections:

- 2.1 "JSON-LD" https://www.w3.org/TR/activitystreams-core/#jsonld
- 5 "Extensibility" https://www.w3.org/TR/activitystreams-core/#extensibility

Support for "extension" properties using JSON-LD is somewhat poor in the current fedi ecosystem. Many publishers and consumers hardcode certain terms with the expectation that they expand to certain IRIs or concepts, without actually verifying that they do.

When compact terms are not expanded to full IRIs, there is a potential for semantic confusion. AS2 hardcodes `actor` to mean `https://www.w3.org/ns/activitystreams#actor`, i.e. "who performed an activity", but in other formats it could expand to `http://schema.org/actor` which is defined as "who performed a role in a movie or creative work".

## Potential approaches

We have a few options for dealing with this:

### Do not augment the context

The most straightforward thing to do is to say that AS2 documents SHOULD NOT augment the context, and should use partially-compacted JSON-LD for any "extension" properties. This makes the representation of AS2 documents unambiguous.

```json
{
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "type": "http://schema.org/Movie",
    "http://schema.org/name": "Ghostbusters",
    "http://schema.org/actor": {
      "type": "http://schema.org/Person",
      "http://schema.org/name": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}
```

A consumer doesn't have to be aware of JSON-LD context or IRI expansion here; they can just look for the full IRI.
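As a rough illustration of what that looks like for a JSON-only consumer, here is a minimal Python sketch. The `movie_cast` helper is hypothetical, and it assumes `type` is a single string rather than an array:

```python
SCHEMA_MOVIE = "http://schema.org/Movie"
SCHEMA_NAME = "http://schema.org/name"
SCHEMA_ACTOR = "http://schema.org/actor"

activity = {
    "actor": "https://alice.example/",
    "type": "View",
    "object": {
        "type": "http://schema.org/Movie",
        "http://schema.org/name": "Ghostbusters",
        "http://schema.org/actor": {
            "type": "http://schema.org/Person",
            "http://schema.org/name": "Bill Murray",
        },
    },
}


def movie_cast(activity: dict) -> list[str]:
    """Return the names of schema.org actors on the activity's object."""
    obj = activity.get("object")
    if not isinstance(obj, dict) or obj.get("type") != SCHEMA_MOVIE:
        return []
    # The key is the full IRI, so it cannot be confused with the AS2 "actor".
    cast = obj.get(SCHEMA_ACTOR, [])
    if not isinstance(cast, list):
        cast = [cast]
    return [p[SCHEMA_NAME] for p in cast if isinstance(p, dict) and SCHEMA_NAME in p]
```

The top-level `activity["actor"]` is still read as the AS2 actor, and plain string comparison on keys is all that is needed.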

### Preload trusted contexts out-of-band

The newer JSON-LD-adjacent specifications (VC, DID, CID) have language to the following effect:

> Implementations that perform JSON-LD processing MUST treat the following JSON-LD context URL as already resolved, where the resolved document matches the corresponding hash value below

...and then describe context injection for JSON-LD consumers, with the semantics coming from the IANA media type.
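A minimal sketch of what "already resolved, and matching a known hash" could look like in a consumer. The `PreloadedContexts` class is hypothetical, and the hex-encoded SHA-256 digest is an assumption here rather than something taken from those specifications:

```python
import hashlib


class UnknownContext(Exception):
    """Raised for a context URL that was never preloaded."""


class PreloadedContexts:
    """Resolves context URLs from local copies only; never touches the network."""

    def __init__(self) -> None:
        self._docs: dict[str, bytes] = {}

    def register(self, url: str, body: bytes, expected_sha256: str) -> None:
        # Refuse to store a copy whose bytes do not match the pinned digest.
        actual = hashlib.sha256(body).hexdigest()
        if actual != expected_sha256:
            raise ValueError(f"hash mismatch for {url}")
        self._docs[url] = body

    def resolve(self, url: str) -> bytes:
        try:
            return self._docs[url]
        except KeyError:
            raise UnknownContext(url) from None
```

A consumer would call `register` at build or startup time with the context documents it ships, and treat `UnknownContext` as "I do not know these semantics" instead of fetching anything.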

We have the `application/activity+json` IANA media type which provides the semantics nominally described by the normative activitystreams context currently hosted at https://www.w3.org/ns/activitystreams -- although, see #416 for another issue with that. We also have `application/ld+json; profile="https://www.w3.org/ns/activitystreams"` as a slightly-different-but-ostensibly-equivalent media type. There are a few missing features here like #659 and #661, but those are also separate issues. For now, we just need to know that a document with the AS2 media type is going to have AS2 semantics.

What we need beyond that is for publishers and consumers to negotiate additional context(s) out of band. Right now, it is popular for publishers to inline term definitions and for consumers to either ignore it or perform naive checks for it. Consumers need more guidance on how to deal with semantics outside of AS2.

Our earlier example could be serialized like this...

```json
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "@context": "https://schema.org",
    "type": "Movie",
    "name": "Ghostbusters",
    "actor": {
      "type": "Person",
      "name": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}
```

...in which case the consumer needs to recognize that the inner properties of the `"object"` are NOT AS2 terms. This can be more complicated, because the consumer needs to track which context is active for the current node. For example, a JSON-LD processor treats the `object` as if it had `"@context": ["https://www.w3.org/ns/activitystreams", "https://schema.org"]`, because context is additive throughout the document[^1]. The later context overrides any earlier term definitions[^2], so `"object"."actor"` here means the schema.org Movie actor.

[^1]: It maybe should not be, but that is an issue for the JSON-LD WG to consider... right now, the "solution" is to declare `"@context": [null, "https://schema.org"]`.

[^2]: ...which would also be a problem in JSON-LD if the activitystreams context were defined to be `@protected`, but it currently is not. If it were, it would break the earlier solution unless the activitystreams context also used scoped contexts or context propagation, which could be disruptive for publishers who currently only use a top-level `@context` applied to the whole document.

Beyond recognizing which contexts are active at which level, consumers also need to know what context identifiers represent. We can generally expect AS2 consumers to understand the semantics described in `https://www.w3.org/ns/activitystreams` but not everyone will know what `https://schema.org` means, or that it is equivalent to `http://schema.org`, or fully understand what each and every term means (even if they can tell when those terms should be applied).
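To make the "which contexts are active at which level" bookkeeping concrete, here is a small Python sketch. The `active_contexts` and `walk` helpers are hypothetical. It only tracks context entries as opaque values, treats `null` as a reset, and ignores scoped contexts, `@protected`, and `@propagate`:

```python
def active_contexts(inherited: list, node: dict) -> list:
    """Combine inherited contexts with the node's own @context, if present."""
    if "@context" not in node:
        return list(inherited)
    local = node["@context"]
    entries = local if isinstance(local, list) else [local]
    active = list(inherited)
    for entry in entries:
        if entry is None:
            active = []  # an explicit null clears everything inherited so far
        else:
            active.append(entry)
    return active


def walk(node, inherited=(), path="$"):
    """Yield (path, active contexts) for every JSON object in the document."""
    if isinstance(node, dict):
        active = active_contexts(list(inherited), node)
        yield path, active
        for key, value in node.items():
            if key != "@context":
                yield from walk(value, active, f"{path}.{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from walk(item, inherited, f"{path}[{i}]")
```

Running `dict(walk(doc))` on the example above reports both the activitystreams and schema.org contexts as active at `$.object` and `$.object.actor`, which is why `actor` there has to be read with the later (schema.org) definition.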

### Maintain a central registry of terms

If JSON-LD is not used, then terms still need to be defined. Without JSON-LD, these terms would be defined centrally instead of decentrally. An "activitystreams-extensions" context could be injected just like the normative activitystreams context, or the normative activitystreams context could be updated. I don't think this should necessarily be done, though -- at least, not for the W3C Activity Streams 2.0 recommendation. If peers in a network want to agree to use a specific augmented context, then they can, but this shouldn't automatically apply to everyone who uses AS2.

### Dynamically load remote contexts at runtime over HTTP

This generally SHOULD NOT be done. The ideal way to handle contexts is to assign an identifier to an immutable context, then distribute knowledge of what that identifier means (in-band or out-of-band). With in-band remote contexts, you have to be careful about remote contexts *changing*, because changing the context document also changes the meaning of any document referring to that remote context.

Imagine an HTTP resource whose text/plain content is simply "I like cats". If I link to that resource and say "I agree!", then the original resource is updated to say "I hate cats", then I will be in trouble with my local cat fan-club.

### Embed any additional context in the document

We can say that, for security reasons, remote contexts are discouraged unless they are frozen and preloaded out-of-band. Alternatively or additionally, we can recommend embedding a `@context` object that defines the "extension" terms, if we expect consumers to understand them and we also expect consumers to not dynamically load or out-of-band preload additional contexts.

In our earlier example, we can minimally define something like this:

```json
{
  "@context": [
    {
      "schema": "http://schema.org/"
    },
    "https://www.w3.org/ns/activitystreams"
  ],
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "type": "schema:Movie",
    "schema:name": "Ghostbusters",
    "schema:actor": {
      "type": "schema:Person",
      "schema:name": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}
```

...and this works as long as you understand expanding terms by their prefix / "compact IRI". But you can also do something like this:

```json
{
  "@context": [
    {
      "schema": "http://schema.org/",
      "Movie": "schema:Movie",
      "movieTitle": "schema:name",
      "actorsInMovie": {
        "@id": "schema:actor",
        "@type": "@id",
        "@context": {
          "nameOfActor": "schema:name",
          "schemaPerson": "schema:Person"
        }
      }
    },
    "https://www.w3.org/ns/activitystreams"
  ],
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "type": "Movie",
    "movieTitle": "Ghostbusters",
    "actorsInMovie": {
      "type": "schemaPerson",
      "nameOfActor": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}
```

...and get arbitrarily complex with it. So it might be worth considering recommending keeping it simple:

- Supporting prefixes is relatively easy.
- Supporting terms in `@context` is slightly more complicated, because you have to look for the `@id` and then potentially expand the `@id` according to a prefix.
- Supporting nested or scoped contexts seems like it would get pretty complicated pretty fast, and at this point it's worth using an existing JSON-LD processor.
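
The first two tiers can be sketched in a few lines of Python. The `build_term_map` and `expand_name` helpers are hypothetical and deliberately incomplete. The sketch only reads inline context objects, skips context URLs (assumed to be preloaded or hardcoded elsewhere), expands prefixes one level deep, and gives up on scoped contexts instead of trying to emulate a JSON-LD processor:

```python
class TooComplex(Exception):
    """Raised when a context needs a real JSON-LD processor."""


def _expand_prefix(value: str, terms: dict[str, str]) -> str:
    prefix, sep, suffix = value.partition(":")
    # No colon, or something like "http://..." that is already an absolute IRI.
    if not sep or suffix.startswith("//") or prefix not in terms:
        return value
    return terms[prefix] + suffix


def build_term_map(context) -> dict[str, str]:
    """Map each inline term to an IRI, following "@id" and one prefix level."""
    entries = context if isinstance(context, list) else [context]
    raw: dict[str, str] = {}
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # context URLs are out of scope for this sketch
        for term, definition in entry.items():
            if term.startswith("@"):
                continue  # keywords such as @vocab are not handled here
            if isinstance(definition, dict):
                if "@context" in definition:
                    raise TooComplex(f"scoped context on term {term!r}")
                definition = definition.get("@id")
            if isinstance(definition, str):
                raw[term] = definition
    return {term: _expand_prefix(iri, raw) for term, iri in raw.items()}


def expand_name(name: str, terms: dict[str, str]) -> str:
    """Expand a property or type name: exact term first, then prefix."""
    if name in terms:
        return terms[name]
    return _expand_prefix(name, terms)
```

With the first embedded context above, `expand_name("schema:actor", terms)` gives `http://schema.org/actor`, while `summary` is returned unchanged and left to the consumer's hardcoded AS2 handling. The second embedded context raises `TooComplex` because of the scoped context on `actorsInMovie`.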

## Summary

Which brings us back to "don't augment the context" and "preload well-known contexts out of band" as the two options worth considering. If JSON-LD `@context` processing is sufficiently confusing to naive JSON-only processors, then we may want to place some limits on what those naive processors are expected to understand.

When you preload well-known contexts out-of-band, this typically involves either grabbing a context document and loading it into your JSON-LD processor, or if you don't have a JSON-LD processor, then reading the specification document and loading it into your human brain or your codebase's hardcoded processor.

When you don't augment the context at all, then people can use full IRIs, which can be verbose but is at least unambiguous.

### Recommendations

- Primer page, probably?
- Something to discuss for Next Version.

