
Staging


This is a technical overview of the process of staging a case.

Each version of each staging algorithm consists of a set of StagingSchema and StagingTable entities. All logic and validation are contained within those entities. In other words, there is no hidden logic within the library. The schemas and tables are represented internally by JSON files (which are the same JSON files that SEER*API provides). For example, here are the entities for algorithm "cs" and version "02.05.50":

Schemas
Tables

The process of staging is actually the processing of a schema and all tables it references to produce output. It starts with calling stage(StagingData data). The StagingData passed to the API contains the user input for the staging call. At the end of the call, it will also contain the results.

At the start of staging, a "context" is created. The context consists of a map of key/value pairs. The context starts off with the input supplied to the stage call. Each step in the staging process can add entries to the context or modify existing entries.

For the purpose of this example, assume this input is supplied to the stage call:

{
   "site": "C161",
   "hist": "8000",
   "behavior": "3",
   "grade": "9",
   "year_dx": "2013",
   "cs_input_version_original": "020550",
   "size": "075",
   "extension": "100",
   "extension_eval": "9",
   "nodes": "100",
   "nodes_eval": "9",
   "nodes_pos": "99",
   "nodes_exam": "99",
   "mets": "10",
   "mets_eval": "9",
   "lvi": "9",
   "age_dx": "060",
   "ssf1": "100",
   "ssf25": "100"
}
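
For reference, supplying this input through the Java API could look roughly like the sketch below. The stage(StagingData data) entry point is the one described above; the provider class and version constant names are assumptions for this example and may differ slightly.

import com.imsweb.staging.Staging;
import com.imsweb.staging.StagingData;
import com.imsweb.staging.cs.CsDataProvider;
import com.imsweb.staging.cs.CsDataProvider.CsVersion;

// get a Staging instance for algorithm "cs", version 02.05.50
Staging staging = Staging.getInstance(CsDataProvider.getInstance(CsVersion.v020550));

// the StagingData holds the user input and, after the call, the results
StagingData data = new StagingData();
data.setInput("site", "C161");
data.setInput("hist", "8000");
data.setInput("behavior", "3");
data.setInput("year_dx", "2013");
data.setInput("cs_input_version_original", "020550");
data.setInput("ssf25", "100");
// ... the remaining inputs from the example above are set the same way

// perform staging; results, errors and the table path end up on the same StagingData
staging.stage(data);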

At the end of the staging process, the data in the context represents the results of the staging call.

In addition to the staging output, the result in the StagingData is set to a value indicating whether the case was staged or why staging could not be performed. The values include:

// list of all Staging Result types
public enum Result {
    // staging was performed
    STAGED,

    // both primary site and histology must be supplied
    FAILED_MISSING_SITE_OR_HISTOLOGY,

    // no matching schema was found
    FAILED_NO_MATCHING_SCHEMA,

    // multiple matching schemas were found; a discriminator is probably needed
    FAILED_MULITPLE_MATCHING_SCHEMAS,

    // year of DX out of valid range
    FAILED_INVALID_YEAR_DX,

    // a field that was flagged as "fail_on_invalid" has an invalid value
    FAILED_INVALID_INPUT
}
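
A caller would normally check this result before reading any output. A minimal sketch, assuming a getResult() accessor on StagingData:

// after staging.stage(data) returns
if (data.getResult() != StagingData.Result.STAGED) {
    // the case was not staged; the result value explains why
    // (e.g. FAILED_NO_MATCHING_SCHEMA)
}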

Staging can be broken down into these steps.

  1. Initial Validation
  2. Schema Selection
  3. Staging Errors
  4. Input Validation and Defaults
  5. Initial Context
  6. Process Mappings
  7. Results

Initial Validation

There are certain field requirements that must be met to even attempt staging. The full list of required and valid inputs needed to stage is not known until the schema is determined. Primary site and histology are the minimum requirements for schema selection, so they are validated as a first step. If they are not supplied, then staging stops with a result of Result.FAILED_MISSING_SITE_OR_HISTOLOGY.

In this example, "site" is "C161" and "hist" is "8000".

Schema Selection

The next step in staging is to determine which schema should be used. The schemas can be thought of as sets of instructions for staging; each has its own inputs and rules. Each schema defines a table used for schema selection. For example, the Stomach schema defines the following selection table:

"schema_selection_table": "schema_selection_stomach"

And here is what that schema selection table looks like. This particular example uses a discriminator (ssf25), but many only use "site" and "hist".

{
   "id": "schema_selection_stomach",
   "algorithm": "cs",
   "version": "02.05.50",
   "name": "Schema Selection Stomach",
   "title": "Schema selection for Stomach",
   "last_modified": "2015-04-16T13:43:34.098Z",
   "definition": [
      { "key": "site", "name": "Primary Site", "type": "INPUT" },
      { "key": "hist", "name": "Histology", "type": "INPUT" },
      { "key": "ssf25", "name": "Schema Discriminator: EsophagusGEJunction (EGJ)/Stomach", "type": "INPUT" },
      { "key": "result", "name": "Result", "type": "ENDPOINT" }
   ],
   "rows": [
      [ "C161-C162", "8000-8152,8154-8231,8243-8245,8247,8248,8250-8934,8940-9136,9141-9582,9700-9701", "000,030,100,981,999", "MATCH" ],
      [ "C163-C166,C168-C169", "8000-8152,8154-8231,8243-8245,8247,8248,8250-8934,8940-9136,9141-9582,9700-9701", "*", "MATCH" ]
   ]
}

The context is matched against all the schema selection tables to determine a list of matching schemas. For a complete description of how tables are processed, see [Table Processing]. If no matching schemas are found, then staging stops with a result of Result.FAILED_NO_MATCHING_SCHEMA. If multiple matching schemas are found, then staging stops with a result of Result.FAILED_MULITPLE_MATCHING_SCHEMAS. If a single schema is found, processing continues.

In our example, "site", "hist" and "ssf25" match the first row of the stomach selection table and do not match any other schema's selection table. A single schema is found, so processing continues with that schema.
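
The cells in the selection rows use ranges ("C161-C162"), comma-separated lists and the "*" wildcard. The real matching rules are described in [Table Processing], but the following sketch (illustrative only, not the library's code) shows the idea of evaluating a single cell against a single context value:

// Illustrative only: does a table cell such as "C161-C162" or
// "000,030,100,981,999" match one context value?
static boolean cellMatches(String cell, String value) {
    if (cell.isEmpty() || cell.equals("*"))
        return true;                                   // blank or "*" matches anything
    for (String part : cell.split(",")) {
        int dash = part.indexOf('-', 1);
        if (dash < 0) {
            if (part.equals(value))
                return true;                           // single value
        }
        else {
            String low = part.substring(0, dash);
            String high = part.substring(dash + 1);
            // codes in a range are fixed-width, so string comparison works here
            if (value.compareTo(low) >= 0 && value.compareTo(high) <= 0)
                return true;                           // low-high range
        }
    }
    return false;
}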

Once the schema is found, the year of diagnosis is validated. The valid values are determined by the input definition in the schema. If the year of diagnosis is not valid, then staging stops with a result of Result.FAILED_INVALID_YEAR_DX.

In the Stomach schema, here is the definition of the year of diagnosis:

{
   "key": "year_dx",
   "name": "Year of Diagnosis",
   "naaccr_item": 390,
   "table": "cs_year_validation",
   "used_for_staging": true
}

The "cs_year_validation" table is used to check the incoming "year_dx" value of "2013".

{
   "id": "cs_year_validation",
   "algorithm": "cs",
   "version": "02.05.50",
   "name": "CS Year Validation",
   "title": "CS Year Validation",
   "notes": "",
   "last_modified": "2015-04-16T13:42:33.446Z",
   "definition": [
      { "key": "year_dx", "name": "Year of Diagnosis", "type": "INPUT" },
      { "key": "cs_input_version_original", "name": "CS Version Input Original", "type": "INPUT" },
      { "key": "result", "name": "Result", "type": "ENDPOINT" }
   ],
   "rows": [
      [ "2004-{{ctx_year_current}}", "*", "MATCH" ],
      [ "", "020500-999999,020440,020302,020200,020100,020001,010401,010400,010300,010200,010100,010005,010004,010003,010002,010000,000937", "MATCH" ]
   ]
}

This table contains a special variable called "ctx_year_current". That value will automatically be replaced with the actual current year when matching the table. Our context matches the first row since "2013" is between "2004" and "2015" (the current year at the time this page was written). The second input has a value of "*", which means it matches any value. The year is considered valid and processing continues. For more information about matching tables, see [Table Processing].
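
The placeholder substitution itself is simple; illustratively (not the library's code):

// Illustrative only: resolve the {{ctx_year_current}} placeholder before matching
String cell = "2004-{{ctx_year_current}}";
String currentYear = String.valueOf(java.time.Year.now().getValue());
String resolved = cell.replace("{{ctx_year_current}}", currentYear);
// at the time this page was written, resolved would have been "2004-2015"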

Staging Errors

As the schema is processed, there are various conditions that trigger an "error". By default, errors will NOT stop the staging process. The StagingData entity contains a list of errors that is returned when staging is complete. Here is the complete list of error types:

// list of all Error types
public enum Type {
    // a required input value does not conform to the table or allowed values
    INVALID_REQUIRED_INPUT,

    // a non-required input value does not conform to the table or allowed values
    INVALID_NON_REQUIRED_INPUT,

    // an input mapping from value did not exist
    UNKNOWN_INPUT_MAPPING,

    // an ERROR endpoint was hit during staging processing
    STAGING_ERROR,

    // a table was processed during staging and no match was found
    MATCH_NOT_FOUND,

    // a specified table does not exist
    UNKNOWN_TABLE,

    // processing a table ended up in an infinite loop due to JUMPs
    INFINITE_LOOP
}
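
Because errors do not necessarily stop processing, callers should check for them even when the result is STAGED. A minimal sketch, assuming a getErrors() accessor on StagingData:

// after staging.stage(data) returns
if (!data.getErrors().isEmpty()) {
    // the case may still have been staged, but one or more inputs or tables
    // produced an error with one of the Type values listed above
}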

Input Validation and Defaults

The next step is that all supplied inputs in the context are trimmed of trailing spaces. There is no difference in processing between "", " ", or "     "; all will be evaluated as "".

At this stage, all the inputs in the selected Stomach schema are iterated over and the following steps are taken for each one (an illustrative sketch of this loop appears after the list):

  1. If the input was not supplied in the context, add that key to the context. For example:

    {
       "key": "ssf3",
       "name": "CS Site-Specific Factor 3",
       "naaccr_item": 2900,
       "default": "988",
       "table": "ssf3_lna",
       "used_for_staging": false,
       "metadata": [ "UNDEFINED_SSF" ]
    }

    If ssf3 is not supplied in the context, it will be added with a value of "988", which is specified as the default. Some inputs do not specify a default value:

    {
       "key": "cs_input_version_original",
       "name": "CS Version Input Original",
       "naaccr_item": 2935,
       "table": "cs_input_version_original",
       "used_for_staging": true
    }

    If cs_input_version_original is not supplied in the context, it will be added with a value of "" since there is no default.

    In the end, every input specified in the schema will have a key in our context.

  2. Validate all input. Fields are optionally validated using a table. For example:

    {
       "key": "behavior",
       "name": "Behavior ICD-O-3",
       "naaccr_item": 523,
       "table": "behavior",
       "used_for_staging": false
    }

    The "behavior" field in this case must match a row in the "behavior" table.

    {
        "id": "behavior",
        "algorithm": "cs",
        "version": "02.05.50",
        "name": "Behavior",
        "title": "Behavior ICD-O-3",
        "last_modified": "2015-04-16T13:42:33.009Z",
        "definition": [
            { "key": "behavior", "name": "Behavior", "type": "INPUT" },
            { "key": "desc", "name": "Description", "type": "DESCRIPTION" }
        ],
        "rows": [
            [ "0", "Benign" ],
            [ "1", "Uncertain Benign/Malig" ],
            [ "2", "In Situ" ],
            [ "3", "Malignant Primary" ]
        ]
    }

    The "behavior" table has a single INPUT column which matches the key of the input we are validating. If "behavior" is "0", "1", "2" or "3" then the field is considered valid. Otherwise, an error is added to the process. The type of error depends on the input. If the input definition has a value of true for used_for_staging, an error of Type.INVALID_REQUIRED_INPUT will be added. Otherwise an error of Type.INVALID_NON_REQUIRED_INPUT will be added. Non-required input errors are less important since they do not affect the staging outputs.

    For complete information about matching tables, see [Table Processing].

    Inputs may also indicate that staging should stop when they do not have a valid value. If an input definition includes this:

    "fail_on_invalid": true

    then an invalid input for that key will stop processing and return a result of Result.FAILED_INVALID_INPUT. If fail_on_invalid is false or not specified, invalid inputs will add an error to the StagingData but not stop processing.
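
Putting these two steps together, the per-input loop behaves roughly like the sketch below. This is illustrative only; Input, tableContains and addError are hypothetical stand-ins, not the library's actual API.

// Illustrative only: apply defaults and validate every input defined by the schema
for (Input input : schema.getInputs()) {
    String key = input.getKey();

    // step 1: default any input that was not supplied ("" when no default is defined)
    if (!context.containsKey(key))
        context.put(key, input.getDefault() != null ? input.getDefault() : "");

    // step 2: validate against the input's lookup table, if one is defined
    if (input.getTable() != null && !tableContains(input.getTable(), context.get(key))) {
        if (input.isFailOnInvalid())
            return Result.FAILED_INVALID_INPUT;              // stop staging entirely
        addError(input.isUsedForStaging()
                ? Type.INVALID_REQUIRED_INPUT                // used_for_staging = true
                : Type.INVALID_NON_REQUIRED_INPUT);          // used_for_staging = false
    }
}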

Initial Context

The next step is to add the initial context values. A schema may define initial_context as a set of key/value pairs to put into the context at the start of the staging process.

"initial_context": [
    { "key": "schema_number", "value": "44" },
    { "key": "csver_derived", "value": "020550" }
]

For the Stomach schema, two keys will be added to the context: schema_number and csver_derived.
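
In terms of the context map, this step is just a couple of puts; illustratively:

// Illustrative only: initial_context entries become plain context entries
context.put("schema_number", "44");
context.put("csver_derived", "020550");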

Process Mappings

The next step is to process each "mapping". A mapping represents a list of tables to be processed with the purpose of adding output to the context.

Here is a mapping from the Stomach schema:

{
    "id": "mapping_ajcc7",
    "name": "AJCC 7",
    "inclusion_tables": [
        { "id": "ajcc7_inclusions_tqj" }
    ],
    "initial_context": [
        { "key": "stor_ajcc7_stage", "value": "" },
        { "key": "ajcc7_stage", "value": "" }
    ],
    "tables": [
        { "id": "ssf25_spv" },
        {
            "id": "ajcc7_stage_uam",
            "input_mapping": [
                { "from": "ajcc7_t", "to": "t" },
                { "from": "ajcc7_n", "to": "n" },
                { "from": "ajcc7_m", "to": "m" }
            ],
            "output_mapping": [
                { "from": "stage", "to": "ajcc7_stage" }
            ]
        },
        { "id": "ajcc7_stage_codes" }
    ]
}

Each mapping is processed in order. For each mapping, here are the steps:

  1. If there are any inclusion_tables specified, verify that the current context matches ALL inclusion tables. In the example above, there is a single inclusion table, "ajcc7_inclusions_tqj":

    {
        "id": "ajcc7_inclusions_tqj",
        "algorithm": "cs",
        "version": "02.05.50",
        "name": "AJCC7 Inclusions",
        "title": "Histology Inclusion Table AJCC 7th ed.",
        "last_modified": "2015-04-16T13:42:20.345Z",
        "definition": [
            { "key": "hist", "name": "Histology", "type": "INPUT" }
        ],
        "rows": [
            [ "8000-8152" ],
            [ "8154-8231" ],
            [ "8243-8245" ],
            [ "8247" ],
            [ "8248" ],
            [ "8250-8576" ],
            [ "8940-8950" ],
            [ "8980-8990" ]
        ]
    }

    If the hist value in the context matches one of the rows in this table, then the mapping will be processed. If there is no match, this mapping is skipped and processing moves to the next mapping.

  2. If there are any exclusion_tables, verify that the current context does NOT match any of the exclusion tables. This is the opposite behavior of the inclusion_tables: instead of verifying the context matches a table, it verifies the context does not match the table. If none of the tables match, then the mapping will continue processing; otherwise this mapping is skipped and processing moves to the next mapping. One way this can be used is to have one mapping use a table as an inclusion_table and another mapping use the same table as an exclusion_table. This is equivalent to saying: in some cases execute the first mapping, otherwise execute the second mapping.

  3. Add initial_context if mapping has it defined. This is similar to the top level initial_context in that it defines outputs to put into the context. The difference is that it only happens if the mapping is processed based on inclusion_tables and exclusion_tables.

  4. Process each table, specified by id, in the mapping. For a detailed description of how tables are processed, see [Table Processing].

    Mapping tables may define input_mapping or output_mapping. This allows a single table to use different keys for its inputs and to remap its output to new keys (see the sketch after this list).

    For example, here is the second table from the mapping above:

    {
        "id": "ajcc7_stage_uam",
        "input_mapping": [
            { "from": "ajcc7_t", "to": "t" },
            { "from": "ajcc7_n", "to": "n" },
            { "from": "ajcc7_m", "to": "m" }
        ],
        "output_mapping": [
            { "from": "stage", "to": "ajcc7_stage" }
        ]
    }

    The input_mapping states that during the processing of the table, the context keys labeled from supply the values for the table keys specified as to. In the above example, when the "ajcc7_stage_uam" table is processed and it looks for the key "t", it will get its value from "ajcc7_t".

    The output_mapping states that specific output that results from processing the table will be mapped to a different key. In the above example, the "ajcc7_stage_uam" table has an ENDPOINT that produces a key called "stage". The output_mapping above specifies that instead of creating "stage" in the context, put that value under the key "ajcc7_stage" instead.

    Here is the table "ajcc7_stage_uam" for reference:

    {
        "id": "ajcc7_stage_uam",
        "algorithm": "cs",
        "version": "02.05.50",
        "name": "AJCC7 Stage",
        "title": "AJCC TNM 7 Stage",
        "last_modified": "2015-04-16T13:42:21.938Z",
        "definition": [
            { "key": "t", "name": "T", "type": "INPUT" },
            { "key": "n", "name": "N", "type": "INPUT" },
            { "key": "m", "name": "M", "type": "INPUT" },
            { "key": "stage", "name": "Stage", "type": "ENDPOINT" }
        ],
        "rows": [
            [ "T0", "N0", "M0", "ERROR:" ],
            [ "T0", "N1", "M0", "VALUE:UNK" ],
            [ "T0", "N2", "M0", "VALUE:UNK" ],
            [ "T0", "N3a", "M0", "VALUE:UNK" ],
            [ "T0", "N3b", "M0", "VALUE:UNK" ],
            [ "T0", "N3NOS", "M0", "VALUE:UNK" ],
            [ "T0", "NX", "M0", "VALUE:UNK" ],
            [ "Tis", "N0", "M0", "VALUE:0" ],
        ]
    }

    The input_mapping and output_mapping allow a single table to be processed at different times with different inputs and outputs. To do the same without this concept would require multiple copies of the table.
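
To make the remapping concrete, here is a sketch (illustrative only, not the library's code) of how input_mapping and output_mapping move values around the table lookup, using plain maps and the values from this example:

import java.util.HashMap;
import java.util.Map;

// context values already produced by the earlier T/N/M mappings
Map<String, String> context = new HashMap<>();
context.put("ajcc7_t", "T1a");
context.put("ajcc7_n", "N1");
context.put("ajcc7_m", "M1");

// input_mapping: the table reads "t", "n" and "m", so copy each "from" key to its "to" key
Map<String, String> tableInput = new HashMap<>();
tableInput.put("t", context.get("ajcc7_t"));
tableInput.put("n", context.get("ajcc7_n"));
tableInput.put("m", context.get("ajcc7_m"));

// the "ajcc7_stage_uam" table is processed with tableInput (see [Table Processing]);
// for T1a/N1/M1 its ENDPOINT produces {"stage": "IV"}, matching the final results below
Map<String, String> tableOutput = new HashMap<>();
tableOutput.put("stage", "IV");

// output_mapping: store the table's "stage" result under "ajcc7_stage" in the context
context.put("ajcc7_stage", tableOutput.get("stage"));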

Every table that is processed during the mappings is added to the path in StagingData, giving a record of all tables in the order they were processed.

Results

After all mappings have been processed, staging is complete. The StagingData object now includes the following data:

  1. result - a code indicating whether staging was successful
  2. schema_id - the identifier of the schema used for staging
  3. input - the original input passed to the staging call
  4. output - the resulting output from the staging call
  5. errors - a list of errors that were encountered during staging
  6. path - a list of tables that were processed during staging in the order they were processed
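
In code, these pieces are read back from the same StagingData object that was passed to stage(). A minimal sketch, assuming the accessor names mirror the fields listed above:

// after staging.stage(data) returns
StagingData.Result result = data.getResult();         // e.g. STAGED
String schemaId = data.getSchemaId();                  // e.g. "stomach"
String ajcc7Stage = data.getOutput("ajcc7_stage");     // e.g. "IV"
List<Error> errors = data.getErrors();                 // empty in this example
List<String> path = data.getPath();                    // tables processed, in order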

Here are the final results from staging the stomach case:

{
    "result": "STAGED",
    "schema_id": "stomach",
    "input": {
        "site": "C161",
        "hist": "8000",
        "behavior": "3",
        "grade": "9",
        "year_dx": "2013",
        "cs_input_version_original": "020550",
        "size": "075",
        "extension": "100",
        "extension_eval": "9",
        "nodes": "100",
        "nodes_eval": "9",
        "nodes_pos": "99",
        "nodes_exam": "99",
        "mets": "10",
        "mets_eval": "9",
        "lvi": "9",
        "age_dx": "060",
        "ssf1": "100",
        "ssf25": "100"
    },
    "output": {
        "ajcc6_n": "N1",
        "ajcc6_m": "M1",
        "schema_number": "44",
        "stor_ss77": "7",
        "stor_ajcc6_t": "10",
        "n2000": "RN",
        "m77": "D",
        "stor_ajcc6_m": "10",
        "stor_ajcc6_n": "10",
        "stor_ajcc6_ndescriptor": "c",
        "stor_ajcc7_ndescriptor": "c",
        "ajcc6_stage": "IV",
        "stor_ajcc6_stage": "70",
        "stor_ajcc6_mdescriptor": "c",
        "ajcc6_ndescriptor": "c",
        "ajcc7_ndescriptor": "c",
        "ajcc7_stage": "IV",
        "csver_derived": "020550",
        "ss2000": "D",
        "stor_ss2000": "7",
        "stor_ajcc7_mdescriptor": "c",
        "stor_ajcc7_tdescriptor": "c",
        "ajcc7_t": "T1a",
        "m2000": "D",
        "stor_ajcc6_tdescriptor": "c",
        "schema": "stomach",
        "ajcc7_n": "N1",
        "ajcc7_m": "M1",
        "ajcc6_tdescriptor": "c",
        "stor_ajcc7_t": "120",
        "t2000": "L",
        "ajcc7_tdescriptor": "c",
        "stor_ajcc7_n": "100",
        "stor_ajcc7_stage": "700",
        "n77": "RN",
        "stor_ajcc7_m": "100",
        "ajcc7_mdescriptor": "c",
        "t77": "L",
        "ajcc6_mdescriptor": "c",
        "ss77": "D",
        "ajcc6_t": "T1"
    },
    "errors": [ ],
    "path": [
        "mapping_t.extension_bal",
        "mapping_t.extension_eval_cpa",
        "mapping_t.ajcc_descriptor_codes",
        "mapping_t.ajcc_tdescriptor_cleanup",
        "mapping_t.ajcc7_t_codes",
        "mapping_t.extension_eval_cpa",
        "mapping_t.ajcc_descriptor_codes",
        "mapping_t.ajcc_tdescriptor_cleanup",
        "mapping_t.ajcc6_t_codes",
        "mapping_n.nodes_dak",
        "mapping_n.determine_correct_table_for_ajcc7_n_ns27",
        "mapping_n.lymph_nodes_clinical_eval_v0205_ajcc7_xam",
        "mapping_n.determine_correct_table_for_ajcc6_n_ns26",
        "mapping_n.lymph_nodes_clinical_evaluation_ajcc6_xbe",
        "mapping_n.nodes_eval_epa",
        "mapping_n.ajcc_descriptor_codes",
        "mapping_n.ajcc_ndescriptor_cleanup",
        "mapping_n.ajcc7_n_codes",
        "mapping_n.nodes_eval_epa",
        "mapping_n.ajcc_descriptor_codes",
        "mapping_n.ajcc_ndescriptor_cleanup",
        "mapping_n.ajcc6_n_codes",
        "mapping_m.mets_hac",
        "mapping_m.mets_eval_ipa",
        "mapping_m.ajcc_descriptor_codes",
        "mapping_m.ajcc_mdescriptor_cleanup",
        "mapping_m.ajcc7_m_codes",
        "mapping_m.mets_eval_ipa",
        "mapping_m.ajcc_descriptor_codes",
        "mapping_m.ajcc_mdescriptor_cleanup",
        "mapping_m.ajcc6_m_codes",
        "mapping_ajcc7.ajcc7_inclusions_tqj",
        "mapping_ajcc7.ssf25_spv",
        "mapping_ajcc7.ajcc7_stage_uam",
        "mapping_ajcc7.ajcc7_stage_codes",
        "mapping_ajcc6.ajcc6_exclusions_ppd",
        "mapping_ajcc6.ssf25_spv",
        "mapping_ajcc6.ajcc6_stage_qpl",
        "mapping_ajcc6.ajcc6_stage_codes",
        "mapping_summary_stage.summary_stage_rpa",
        "mapping_summary_stage.ss_codes",
        "mapping_summary_stage.summary_stage_rpa",
        "mapping_summary_stage.ss_codes"
    ]
}
