Generalize handling of multiple GTFS-RT feeds referencing the same GTFS schedule in fct_observed_trip


### Description

We’ve identified a recurring pattern where **multiple GTFS-Realtime feeds reference the same GTFS schedule feed**, which can result in trips from different RT feeds resolving to the same `trip_instance_key`. When this happens, trip-level metrics in downstream tables (e.g., `fct_observed_trip`) and dashboards may unintentionally **accumulate data from multiple RT feeds**.
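To make the failure mode concrete, here is a minimal Python sketch of how the accumulation happens. The feed names, keys, and the `tu_message_count` column are hypothetical stand-ins for illustration, not the actual warehouse schema.

```python
from collections import defaultdict

# Hypothetical per-feed trip summaries. Two RT feeds reference the same
# schedule feed, so the same trip resolves to the same trip_instance_key.
rt_trip_summaries = [
    {"rt_feed": "feed_prod", "trip_instance_key": "abc123", "tu_message_count": 120},
    {"rt_feed": "feed_test", "trip_instance_key": "abc123", "tu_message_count": 115},
    {"rt_feed": "feed_prod", "trip_instance_key": "def456", "tu_message_count": 90},
]


def aggregate_by_trip(rows):
    """Sum message counts per trip_instance_key, ignoring which RT feed they came from."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["trip_instance_key"]] += row["tu_message_count"]
    return dict(totals)


observed = aggregate_by_trip(rt_trip_summaries)
# Trip abc123 now carries counts from both feeds (120 + 115).
print(observed)
```

Because the aggregation groups only on `trip_instance_key`, the trip seen by both feeds ends up with a combined count that matches neither feed on its own.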

We’ve seen this scenario at multiple agencies (e.g., when agencies are transitioning systems, running parallel feeds, or exposing both customer-facing and non-customer-facing/test feeds). In at least one historical case (Torrance Transit), this was addressed via a **hard-coded exclusion** of a specific RT feed in the ETL/model logic feeding `fct_observed_trip`.

<img width="992" height="546" alt="Image" src="https://github.com/user-attachments/assets/f7d6cc6d-92b2-4c67-9df2-cbdced9fe3d8" />

While that approach resolves the immediate issue, it does not scale well and requires maintaining one-off exclusions by agency or feed identifier.

### Proposed direction

Instead of hard-coded exclusions, we should **generalize the filtering logic** in the formation of `fct_observed_trip` by:

* Systematically excluding **non-customer-facing RT feeds** using existing metadata/flags (e.g., `customer_facing`, `private_dataset`, or equivalent), and
* Applying this logic uniformly so that trip-level metrics are derived only from the intended customer-facing feed(s).
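The filtering described above could look roughly like the following Python sketch. It assumes a per-feed metadata lookup with a `customer_facing` boolean; the lookup shape, feed names, and `tu_message_count` column are hypothetical, and treating a feed with missing metadata as non-customer-facing is a design assumption rather than established behavior.

```python
from collections import defaultdict

# Hypothetical feed metadata; the real flag may live elsewhere.
rt_feed_metadata = {
    "feed_prod": {"customer_facing": True},
    "feed_test": {"customer_facing": False},
}

# Hypothetical per-feed trip summaries (same shape for illustration only).
rt_trip_summaries = [
    {"rt_feed": "feed_prod", "trip_instance_key": "abc123", "tu_message_count": 120},
    {"rt_feed": "feed_test", "trip_instance_key": "abc123", "tu_message_count": 115},
    {"rt_feed": "feed_prod", "trip_instance_key": "def456", "tu_message_count": 90},
]


def customer_facing_rows(rows, feed_metadata):
    """Keep only rows whose RT feed is flagged customer_facing.

    Assumption: a feed with no metadata entry is treated as NOT customer-facing.
    """
    return [
        row
        for row in rows
        if feed_metadata.get(row["rt_feed"], {}).get("customer_facing", False)
    ]


def aggregate_by_trip(rows):
    """Sum message counts per trip_instance_key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["trip_instance_key"]] += row["tu_message_count"]
    return dict(totals)


filtered_totals = aggregate_by_trip(
    customer_facing_rows(rt_trip_summaries, rt_feed_metadata)
)
print(filtered_totals)
```

The point of the sketch is that the exclusion is driven by feed metadata rather than by a feed identifier written into the model, so a new parallel or test feed would be handled without a code change.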

This would make the pipeline more robust to similar scenarios in the future and reduce the need for agency-specific exceptions.

### Open questions

* Which flag(s) should be considered authoritative for determining whether an RT feed should contribute to `fct_observed_trip`?
* Should this filtering be enforced strictly at the `fct_observed_trip` layer, or earlier in the pipeline (e.g., `fct_trip_updates_trip_summaries`)?

### Considerations / Caveats

In some cases, we intentionally define a GTFS-RT feed as a **temporary or test feed** for research or exploratory analysis (e.g., feeds where the `private_dataset` flag is set to `true`). Automatically excluding these feeds from `fct_observed_trip` could limit analysts’ ability to perform certain analyses or validations that rely on test data.

Because of this, any generalized filtering approach should consider:
* whether test / private datasets need to be accessible for analysis in specific contexts, and
* whether there should be a documented way to **opt in** to including these feeds (e.g., via alternative tables, query overrides, or explicit flags) when needed for research or debugging.

This suggests that the filtering logic should be systematic but flexible, rather than a hard exclusion with no escape hatch.
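One possible shape for such an escape hatch, sketched in Python: private feeds are excluded by default, and the caller must opt in explicitly. The `include_private` parameter, the function name, and the metadata layout are all hypothetical; only the `private_dataset` flag name comes from the discussion above.

```python
# Hypothetical feed metadata keyed by RT feed name.
rt_feed_metadata = {
    "feed_prod": {"private_dataset": False},
    "feed_test": {"private_dataset": True},
}


def select_rt_feeds(feed_metadata, include_private=False):
    """Return the set of RT feed names allowed to contribute to trip metrics.

    Private feeds are dropped unless the caller opts in explicitly.
    Assumption: a feed with no private_dataset value is treated as public.
    """
    return {
        name
        for name, meta in feed_metadata.items()
        if include_private or not meta.get("private_dataset", False)
    }


default_feeds = select_rt_feeds(rt_feed_metadata)
research_feeds = select_rt_feeds(rt_feed_metadata, include_private=True)
print(sorted(default_feeds), sorted(research_feeds))
```

Keeping the default strict and making inclusion an explicit argument means dashboards get the filtered view automatically, while research or debugging queries have to state that they want test data.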


Generalize handling of multiple GTFS-RT feeds referencing the same GTFS schedule in fct_observed_trip #4732
