Re-evaluate implications of augmenting the context for JSON-LD extensibility (JSON-only processors get confused) #663

@trwnh

From swicg/activitypub-trust-and-safety#98 a tangential issue was raised regarding context augmentation and whether contexts can/should be "trusted".

In AS2-Core we have the following sections:

Support for "extension" properties using JSON-LD is somewhat poor in the current fedi ecosystem. Many publishers and consumers hardcode certain terms with the expectation that they expand to certain IRIs or concepts, without actually verifying that they do.

When compact terms are not expanded to full IRIs, there is a potential for semantic confusion. AS2 hardcodes actor to mean https://www.w3.org/ns/activitystreams#actor, i.e. "who performed an activity", but in other formats it could expand to http://schema.org/actor which is defined as "who performed a role in a movie or creative work".

Potential approaches

We have a few options for dealing with this:

Do not augment the context

The most straightforward thing to do is to say that AS2 documents SHOULD NOT augment the context, and should use partially-compacted JSON-LD for any "extension" properties. This makes the representation of AS2 documents unambiguous.

{
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "type": "http://schema.org/Movie",
    "http://schema.org/name": "Ghostbusters",
    "http://schema.org/actor": {
      "type": "http://schema.org/Person",
      "http://schema.org/name": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}

A consumer doesn't have to be aware of JSON-LD context or IRI expansion here; they can just look for the full IRI.
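
As a rough sketch of what that looks like for a JSON-only consumer, here is a Python helper that reads the cast of the viewed movie without touching `@context` at all. The function name `cast_names` is hypothetical, and it assumes the document shape from the example above.

```python
import json

# Full IRIs are used directly as dictionary keys; no context lookup is needed.
SCHEMA_ACTOR = "http://schema.org/actor"
SCHEMA_NAME = "http://schema.org/name"

doc = json.loads("""
{
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "type": "http://schema.org/Movie",
    "http://schema.org/name": "Ghostbusters",
    "http://schema.org/actor": {
      "type": "http://schema.org/Person",
      "http://schema.org/name": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}
""")


def cast_names(activity: dict) -> list[str]:
    """Return the schema.org names of the movie's actors (hypothetical helper)."""
    obj = activity.get("object")
    if not isinstance(obj, dict):
        return []
    cast = obj.get(SCHEMA_ACTOR, [])
    if not isinstance(cast, list):
        cast = [cast]  # a single value and a one-element array are treated alike
    return [
        member[SCHEMA_NAME]
        for member in cast
        if isinstance(member, dict) and isinstance(member.get(SCHEMA_NAME), str)
    ]


print(doc["actor"])     # the AS2 actor: who performed the activity
print(cast_names(doc))  # the schema.org actor: who performed in the movie
```

The two meanings of "actor" never collide, because the AS2 one is the bare term and the schema.org one is spelled out as a full IRI.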

Preload trusted contexts out-of-band

The newer JSON-LD-adjacent work (VC, DID, CID) has language to the following effect:

Implementations that perform JSON-LD processing MUST treat the following JSON-LD context URL as already resolved, where the resolved document matches the corresponding hash value below

...and then describe context injection for JSON-LD consumers, with the semantics coming from the IANA media type.
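
To make the "already resolved, matching a hash" idea concrete, here is a minimal Python sketch of a context store that only ever serves pinned documents. Everything in it is an assumption for illustration: the `PinnedContexts` class is hypothetical, and the context bytes are a stand-in rather than the real AS2 context, with the hash computed from that stand-in instead of taken from any specification.

```python
import hashlib
import json

AS2_CONTEXT_URL = "https://www.w3.org/ns/activitystreams"


class PinnedContexts:
    """Context documents loaded out-of-band and checked against a known hash."""

    def __init__(self) -> None:
        self._documents: dict[str, dict] = {}

    def pin(self, url: str, raw: bytes, expected_sha256: str) -> None:
        # Refuse to register a document whose bytes do not match the pinned hash.
        actual = hashlib.sha256(raw).hexdigest()
        if actual != expected_sha256:
            raise ValueError(f"hash mismatch for {url}")
        self._documents[url] = json.loads(raw)

    def resolve(self, url: str) -> dict:
        # Never fall back to an HTTP fetch: an unknown context is an error.
        if url not in self._documents:
            raise LookupError(f"context not preloaded: {url}")
        return self._documents[url]


# Stand-in bytes for illustration only, NOT the real AS2 context document.
STAND_IN = b'{"@context": {"as": "https://www.w3.org/ns/activitystreams#"}}'
STAND_IN_HASH = hashlib.sha256(STAND_IN).hexdigest()

contexts = PinnedContexts()
contexts.pin(AS2_CONTEXT_URL, STAND_IN, STAND_IN_HASH)
print(contexts.resolve(AS2_CONTEXT_URL))
```

A JSON-LD processor would plug something like `resolve` in as its document loader; a JSON-only consumer could use the same table simply to decide which context identifiers it recognizes.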

We have the application/activity+json IANA media type which provides the semantics nominally described by the normative activitystreams context currently hosted at https://www.w3.org/ns/activitystreams -- although, see #416 for another issue with that. We also have application/ld+json; profile="https://www.w3.org/ns/activitystreams" as a slightly-different-but-ostensibly-equivalent media type. There are a few missing features here like #659 and #661, but those are also separate issues. For now, we just need to know that a document with the AS2 media type is going to have AS2 semantics.
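
A consumer that wants to rely on the media type for AS2 semantics first has to recognize both spellings. The following Python sketch shows one way to do that with the standard library's header parser. The function name `is_as2_media_type` is hypothetical, and treating `profile` as a space-separated list is an assumption of this sketch.

```python
from email.message import Message

AS2_PROFILE = "https://www.w3.org/ns/activitystreams"


def is_as2_media_type(content_type: str) -> bool:
    """Return True if a Content-Type header value denotes AS2 (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    media_type = msg.get_content_type()  # lowercased, parameters stripped
    if media_type == "application/activity+json":
        return True
    if media_type == "application/ld+json":
        profile = msg.get_param("profile")  # surrounding quotes are removed
        # Assumption: the profile parameter may list several space-separated IRIs.
        return isinstance(profile, str) and AS2_PROFILE in profile.split()
    return False


print(is_as2_media_type("application/activity+json"))
print(is_as2_media_type(f'application/ld+json; profile="{AS2_PROFILE}"'))
print(is_as2_media_type("application/ld+json"))
```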

What we need beyond that is for publishers and consumers to negotiate additional context(s) out of band. Right now, it is popular for publishers to inline term definitions and for consumers to either ignore them or perform naive checks for them. Consumers need more guidance on how to deal with semantics outside of AS2.

Our earlier example could be serialized like this...

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "@context": "https://schema.org",
    "type": "Movie",
    "name": "Ghostbusters",
    "actor": {
      "type": "Person",
      "name": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}

...in which case the consumer needs to recognize that the inner properties of the "object" are NOT AS2 terms. This can be more complicated, as the consumer needs to track whatever the current loaded context is for the current node; for example, the object more appropriately has "@context": ["https://www.w3.org/ns/activitystreams", "https://schema.org"] in an equivalent JSON-LD processor, because context is additive throughout the document [1]; the later context overrides any earlier term definitions [2], so "object"."actor" here means the schema.org Movie actor.
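
The bookkeeping described here can be sketched in a few lines of Python. This is a simplified illustration, not a JSON-LD implementation: the names `apply_context` and `walk` are hypothetical, embedded context objects are carried along as opaque entries, and scoped contexts, `@protected`, and `@propagate` are ignored entirely.

```python
AS2 = "https://www.w3.org/ns/activitystreams"
SCHEMA = "https://schema.org"

doc = {
    "@context": AS2,
    "actor": "https://alice.example/",
    "type": "View",
    "object": {
        "@context": SCHEMA,
        "type": "Movie",
        "name": "Ghostbusters",
        "actor": {"type": "Person", "name": "Bill Murray"},
    },
}


def apply_context(active: tuple, local) -> tuple:
    """Combine the inherited contexts with a node's own @context value."""
    entries = local if isinstance(local, list) else [local]
    result = list(active)
    for entry in entries:
        if entry is None:
            result = []  # null resets everything inherited so far
        else:
            result.append(entry)  # later entries take precedence over earlier ones
    return tuple(result)


def walk(node, active: tuple = (), path: str = "$"):
    """Yield (path, active contexts) for every JSON object in the document."""
    if isinstance(node, dict):
        if "@context" in node:
            active = apply_context(active, node["@context"])
        yield path, active
        for key, value in node.items():
            if key != "@context":
                yield from walk(value, active, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk(item, active, f"{path}[{index}]")


for path, active in walk(doc):
    print(path, active)
```

For the inner "actor" node, the active list ends with the schema.org context, which is why a consumer should not read it as an AS2 actor. Declaring `"@context": [null, "https://schema.org"]` on the object would leave only the schema.org entry.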

Beyond recognizing which contexts are active at which level, consumers also need to know what context identifiers represent. We can generally expect AS2 consumers to understand the semantics described in https://www.w3.org/ns/activitystreams but not everyone will know what https://schema.org means, or that it is equivalent to http://schema.org, or fully understand what each and every term means (even if they can tell when those terms should be applied).

Maintain a central registry of terms

If JSON-LD is not used, then terms still need to be defined. Without JSON-LD, these terms would be defined centrally instead of decentrally. An "activitystreams-extensions" context could be injected just like the normative activitystreams context, or the normative activitystreams context could be updated. I don't think this should necessarily be done, though -- at least, not for the W3C Activity Streams 2.0 recommendation. If peers in a network want to agree to use a specific augmented context, then they can, but this shouldn't automatically apply to everyone who uses AS2.

Dynamically load remote contexts at runtime over HTTP

This generally SHOULD NOT be done. The ideal way to handle contexts is to assign an identifier to an immutable context, then distribute knowledge of what that identifier means (in-band or out-of-band). With in-band remote contexts, you have to be careful about remote contexts changing, because changing the context document also changes the meaning of any document referring to that remote context.

Imagine an HTTP resource whose text/plain content is simply "I like cats". If I link to that resource and say "I agree!", then the original resource is updated to say "I hate cats", then I will be in trouble with my local cat fan-club.

Embed any additional context in the document

We can say that, for security reasons, remote contexts are discouraged unless they are frozen and preloaded out-of-band. Alternatively or additionally, we can recommend embedding a @context object that defines the "extension" terms, if we expect consumers to understand them and we also expect consumers to neither dynamically load nor out-of-band preload additional contexts.

In our earlier example, we can minimally define something like this:

{
  "@context": [
    {
      "schema": "http://schema.org/"
    },
    "https://www.w3.org/ns/activitystreams"
  ],
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "type": "schema:Movie",
    "schema:name": "Ghostbusters",
    "schema:actor": {
      "type": "schema:Person",
      "schema:name": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}

...and this works as long as you understand expanding terms by their prefix / "compact IRI". But you can also do something like this:

{
  "@context": [
    {
      "schema": "http://schema.org/",
      "Movie": "schema:Movie",
      "movieTitle": "schema:name",
      "actorsInMovie": {
        "@id": "schema:actor",
        "@type": "@id",
        "@context": {
          "nameOfActor": "schema:name",
          "schemaPerson": "schema:Person"
        }
      }
    },
    "https://www.w3.org/ns/activitystreams"
  ],
  "actor": "https://alice.example/",
  "type": "View",
  "object": {
    "type": "Movie",
    "movieTitle": "Ghostbusters",
    "actorsInMovie": {
      "type": "schemaPerson",
      "nameOfActor": "Bill Murray"
    }
  },
  "summary": "Alice watched Ghostbusters, starring Bill Murray."
}

...and get arbitrarily complex with it. So it might be worth recommending that embedded contexts be kept simple:

  • Supporting prefixes is relatively easy.
  • Supporting terms in @context is slightly more complicated, because you have to look for the @id and then potentially expand the @id according to a prefix.
  • Supporting nested or scoped contexts seems like it would get pretty complicated pretty fast, and at this point it's worth using an existing JSON-LD processor.
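
The first two levels of support in the list above can be sketched in Python as follows, with the third level rejected outright. This is an illustration under stated assumptions, not a JSON-LD expansion algorithm: the helper names are hypothetical, string entries in `@context` are assumed to be handled elsewhere (for example, preloaded), and keywords such as `@vocab` are skipped.

```python
def expand_compact_iri(value: str, defs: dict[str, str]) -> str:
    """Expand 'prefix:suffix' when the prefix is defined; otherwise return as-is."""
    prefix, sep, suffix = value.partition(":")
    if sep and not suffix.startswith("//") and prefix in defs:
        return defs[prefix] + suffix
    return value  # absolute IRI, or a term this sketch does not know


def collect_definitions(context) -> dict[str, str]:
    """Gather simple term and prefix definitions from embedded context objects."""
    entries = context if isinstance(context, list) else [context]
    raw: dict[str, str] = {}
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # context URLs are assumed to be preloaded elsewhere
        for term, definition in entry.items():
            if term.startswith("@"):
                continue  # keywords are out of scope for this sketch
            if isinstance(definition, dict):
                if "@context" in definition:
                    # Scoped contexts: better left to a real JSON-LD processor.
                    raise NotImplementedError(f"scoped context on {term!r}")
                definition = definition.get("@id")
            if isinstance(definition, str):
                raw[term] = definition
    # A definition may itself be a compact IRI, so expand it by its prefix.
    return {term: expand_compact_iri(iri, raw) for term, iri in raw.items()}


def expand_term(name: str, defs: dict[str, str]) -> str:
    """Look up a term first, then fall back to prefix expansion."""
    return defs[name] if name in defs else expand_compact_iri(name, defs)


context = [
    {"schema": "http://schema.org/", "movieTitle": {"@id": "schema:name"}},
    "https://www.w3.org/ns/activitystreams",
]
defs = collect_definitions(context)
print(expand_term("schema:actor", defs))
print(expand_term("movieTitle", defs))
```

Terms that are not defined in an embedded context object, such as the bare AS2 "actor", come back unchanged and are left to the consumer's hardcoded AS2 handling.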

Summary

Which brings us back to "don't augment the context" and "preload well-known contexts out of band" as the two options worth considering. If JSON-LD @context processing is sufficiently confusing to naive JSON-only processors, then we may want to place some limits on what those naive processors are expected to understand.

When you preload well-known contexts out-of-band, this typically involves either grabbing a context document and loading it into your JSON-LD processor, or if you don't have a JSON-LD processor, then reading the specification document and loading it into your human brain or your codebase's hardcoded processor.

When you don't augment the context at all, then people can use full IRIs, which can be verbose but is at least unambiguous.

Recommendations

  • Primer page, probably?
  • Something to discuss for Next Version.

Footnotes

  1. Maybe it should not be, but that is an issue for the JSON-LD WG to consider... right now, the "solution" is to declare "@context": [null, "https://schema.org"].

  2. ...which would also be a problem in JSON-LD if the activitystreams context were defined to be @protected, but it currently is not. If it were, it would break the earlier solution unless the activitystreams context also used scoped contexts or context propagation, which could be disruptive for publishers who currently only use a top-level @context applied to the whole document.
