request to add typed text/csv to registry, in parity with major databases & languages #40

@coolaj86

Rationale

A variety of languages support CSV as a first-class citizen for objects - Swift and PowerShell come immediately to mind - but many (perhaps most) languages support typed object serializers/deserializers for CSV through 3rd-party libraries or simple map operations as well: Go, JavaScript w/ JSDoc, Python, etc.

Since CSV is much simpler, smaller, and less server-intensive than JSON - and even than many binary formats - for representing many kinds of typed objects, it's often the ideal choice for easy-to-cache, easy-to-update data.

Creating complex permutations in a database - which explodes the size of the data that needs to be loaded into RAM, transmitted to the application, and then sent across the Internet just to satisfy a frontend rendering contract - is a little crazy when you stop and think about it: it's trivial to iterate over flat rows and build in-memory graphs client-side (whether on the Web, iOS, etc.).

If you use CSV, however, you can write very simple queries that satisfy business logic, often at 1/5 or 1/10 the size before gzip - which reduces stress across the entire pipeline.

Not only that, but when API endpoints are exposed as CSV they can be integrated very easily with Excel and Google Sheets, which drastically increases user productivity and drives down development cost and time, depending on the data.

Typical CSV Hydration

In general, there are 3 ways in which CSV hydration is performed, which parallel how Postgres, MariaDB, etc. store and hydrate data:

1. Standard RFC

id,name,dob,ssn
1,John Smith,4/1/1969,12-34-5678
2,Jane Doe,5/5/1971,10-20-3456
  • fields separated by commas - or an alternate character, such as TAB or US (ASCII Unit Separator)
  • records separated by newlines - or an alternate character, such as RS (ASCII Record Separator)
  • values containing quote characters, field separators, or record separators are quoted
    (a doubled double quote "" escapes to a single double quote)
    (it's less common for the quote character itself to be replaced, but sometimes ` is used)
  • empty lines are ignored, or treated as comments
  • lines beginning with a prescribed comment character (usually #) are ignored, or treated as comments
  • most parsers do NOT allow multi-character sequences (e.g. //) as a separator or comment, with the exception of \r\n
  • query parameters may be used to choose delimiters
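To make those rules concrete, here's a minimal quote-aware parser sketch in TypeScript (the function name and the comma/double-quote defaults are my own assumptions; comment-line handling is omitted for brevity - this is illustrative, not a reference implementation):

// Minimal sketch of the quoting rules above (illustrative, not a spec)
function parseCsv(text: string, fieldSep = ',', quote = '"'): string[][] {
  const records: string[][] = [];
  let record: string[] = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === quote && text[i + 1] === quote) { field += quote; i++; } // "" -> "
      else if (ch === quote) inQuotes = false;
      else field += ch;
    } else if (ch === quote && field === '') {
      inQuotes = true; // a quote only opens a quoted field at the field's start
    } else if (ch === fieldSep) {
      record.push(field); field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // \r\n is the one allowed sequence
      record.push(field); field = '';
      if (record.length > 1 || record[0] !== '') records.push(record); // skip empty lines
      record = [];
    } else {
      field += ch;
    }
  }
  if (field !== '' || record.length > 0) { record.push(field); records.push(record); }
  return records;
}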

In this configuration fields map exactly to a typical JSON object - except that:

  • strings, booleans, numbers, and nulls require extra information to distinguish (which OpenAPI already provides)
  • (optional) the CSV header format may differ from the object key format
    • use index rather than name
    • map name to a different form of the name

Just for the purpose of illustration, here's a schema with csv_field_name and csv_field_index (CSV is 1-indexed):

components:
  schemas:
    Thingy:
      type: object
      properties:
        id: { type: integer, format: uint32, csv_field_name: "I.D." }
        slug: { type: string, csv_field_index: 2 }
        name: { type: string, csv_field_name: "Name" }
        is_unique: { type: boolean, csv_field_name: "Is Unique" }
        created_at: { type: string, format: date-time }
        updated_at: { type: string, format: date-time }

I realize that's probably not exactly where this belongs (it probably goes with something related to the text/csv registration itself), but it's just to show the idea.
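In the same illustrative spirit, here's a rough sketch of how a client might hydrate typed objects from that schema - csv_field_name / csv_field_index are the hypothetical keywords above, and the ColumnSpec shape is made up for this example:

type ColumnSpec = {
  key: string;                        // object property name
  type: 'integer' | 'string' | 'boolean';
  csvName?: string;                   // hypothetical csv_field_name
  csvIndex?: number;                  // hypothetical csv_field_index (1-indexed)
};

function hydrate(rows: string[][], specs: ColumnSpec[]): Record<string, unknown>[] {
  const header = rows[0];
  return rows.slice(1).map((row) => {
    const obj: Record<string, unknown> = {};
    for (const spec of specs) {
      const i = spec.csvIndex !== undefined
        ? spec.csvIndex - 1                          // an explicit index wins
        : header.indexOf(spec.csvName ?? spec.key);  // else match the (renamed) header
      const raw = row[i];
      obj[spec.key] =
        spec.type === 'integer' ? Number(raw) :
        spec.type === 'boolean' ? raw === 'true' :
        raw; // dates, nulls, etc. would need schema-driven rules as well
    }
    return obj;
  });
}

// e.g. hydrate(parseCsv(body), [
//   { key: 'id', type: 'integer', csvName: 'I.D.' },
//   { key: 'slug', type: 'string', csvIndex: 2 },
// ])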

2. RFC, but with Lists

id,name,friends,ssn
1,John Smith,"2,3",12-34-5678
2,Jane Doe,"1,3",10-20-3456
3,Bob McMaster,"1,2",44-33-5555

Another common format is simply to embed lists - perhaps using \t as the field separator and a space as the subrecord separator, or (as above) quoting a comma-separated list.
(This is also how OIDC scopes and many other simple list types use spaces to create lists in query params.)

This is similar to Array column types in a database.

For example, that might be represented something like this with a list_delimiter:

components:
  schemas:
    Thingy:
      type: object
      properties:
        # ...
        codes:
          type: array
          list_delimiter: "\t"
          items:
            type: integer
            format: int32

Again, not that I think that metadata belongs exactly there - it's just to illustrate the idea.
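As a sketch of what a parser would do with such a keyword - split the already-unquoted field on the declared delimiter (the function name and the comma default are assumptions, chosen to match the friends example above):

// Hypothetical list-field handling: split the unquoted field on list_delimiter
function parseListField(raw: string, listDelimiter = ','): number[] {
  if (raw === '') return []; // empty field -> empty list
  return raw.split(listDelimiter).map((s) => parseInt(s, 10));
}

parseListField('2,3'); // [2, 3] - the quoted "2,3" from the friends column above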

3. Nested JSON

id,name,options,ssn
1,John Smith,"{""foo"":""bar"",""n"":42}",12-34-5678
2,Jane Doe,"{""baz"":""qux"",""n"":37}",10-20-3456

Just as with a JSON column in a database, occasionally you need something free-form that can hold awkward or inconsistent information without adding a dozen fields that might typically be empty.

components:
  schemas:
    Thingy:
      type: object
      properties:
        # ...
        options:
          nested_json: true
          $ref: '#/components/schemas/Options'
    Options:
      type: object
      properties:
        color:
          type: string
          enum: [red, green, blue]
        size:
          type: integer
          minimum: 1
      required: [color]

All of the same quoting rules apply - the implementor simply needs to know that the field should be parsed as JSON, as opposed to it merely happening to start and end with [ and ] or { and }.
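In other words, the schema flag (the hypothetical nested_json above) drives the decision, and the parse itself is trivial once the field has been unquoted:

// The schema - not the leading character - says the field is JSON
function parseNestedJsonField(raw: string): unknown {
  return raw === '' ? null : JSON.parse(raw);
}

parseNestedJsonField('{"foo":"bar","n":42}'); // { foo: 'bar', n: 42 }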

Bonus: Base64 & Hex, like JSON

id,name,icon
1,John Smith,"data:image/gif;base64,R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw=="
2,Jane Doe,"data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' width='1' height='1'/>"

It's pretty common to use data URLs or raw Base64 or Hex to encode binary data. This is no different from how it would already be handled in JSON.
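For example, a browser client might decode such a field like this (a sketch using standard browser APIs - atob and TextEncoder; nothing CSV-specific is needed):

function dataUrlToBytes(dataUrl: string): Uint8Array {
  const comma = dataUrl.indexOf(',');
  const meta = dataUrl.slice(0, comma);     // e.g. "data:image/gif;base64"
  const payload = dataUrl.slice(comma + 1);
  if (meta.endsWith(';base64')) {
    const bin = atob(payload);
    return Uint8Array.from(bin, (c) => c.charCodeAt(0));
  }
  return new TextEncoder().encode(decodeURIComponent(payload)); // plain-text data URL
}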

Super Bonus: Multiplexed

This is a bit out there, but there's also an idea that's been floating around of allowing multiple data types in a single CSV - though I don't know of any actual implementations yet:

type,person,id,name
type,book,id,title,author
person,1,John Smith
book,1,The Art of Computer Programming,Donald E. Knuth

The theory is that you can sort the data before sending it downstream such that all data relevant to an object graph is frontloaded, and the UI can begin rendering while streaming.
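If it were implemented, demultiplexing could be as simple as routing each row by its first field - a purely hypothetical sketch, since (as noted) I don't know of real implementations:

function demux(rows: string[][]): Map<string, Record<string, string>[]> {
  const headers = new Map<string, string[]>();
  const out = new Map<string, Record<string, string>[]>();
  for (const row of rows) {
    if (row[0] === 'type') {
      headers.set(row[1], row.slice(2)); // e.g. 'person' -> ['id', 'name']
      out.set(row[1], []);
    } else {
      const cols = headers.get(row[0]) ?? [];
      const obj: Record<string, string> = {};
      cols.forEach((col, i) => { obj[col] = row[i + 1]; });
      out.get(row[0])?.push(obj);
    }
  }
  return out; // e.g. out.get('book') -> [{ id: '1', title: '...', author: '...' }]
}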

However, in my own testing of a similar concept, I don't think there's any significant savings here - in fact it may be worse.

Tens of thousands of gzipped records can be loaded in parallel via CSV endpoints and reconstructed into a graph on the client in a matter of dozens of milliseconds.

The extra work to reshape the data beyond what a database query's ORDER BY already does might even slow it down.

Considering how easily the data is cached and how easily just the updated records can be gathered - most of that time being spent waiting for the query to finish in the DB - it might be a mistake to go this far.

Other considerations

These types would be useful both for GETting data to render or list in a client and for POSTing bulk data (such as something a user copied from a spreadsheet, or a list of items that needs some properties adjusted).

Having first-class CSV support in OpenAPI 3 will make it easier to document, build, and support simpler, data-efficient APIs (and maybe even encourage them).
