Opportunities to improve the FRED API #3

@leaderanalytics

From: Sam Wheat
Sent: Thursday, January 5, 2023 8:03 PM
To: [email protected]

Subject: Opportunities to improve the FRED API

As I expand my usage of the API, I continue to find inconsistencies that increasingly support my suspicion that the API is built on concepts that are themselves correct but are misapplied throughout. I also believe the API returns data that is incorrect, both by my definition and as defined by the current published documentation.

I apologize for the verbosity of this email. It's been nearly four years since I raised my original questions and I have not received a clarifying response from St. Louis Fed. Given that my prior communications have failed to garner attention, I feel it is incumbent on me to provide additional evidence that improvements can and should be made. I also hope to participate in suggesting a path forward.

I am confident that the concepts I present below are well known and understood by all of you. My purpose in articulating them is to demonstrate detailed examples where I believe the API diverges from known practice.

My background in economics is limited but my experience as a software developer is extensive (see samwheat.com). If you would like to further discuss the issues I've raised from the perspective of a developer, please feel free to contact me. I will be glad to discuss these issues via phone or Zoom at your convenience.

This email is a draft for several articles I will be writing for my personal blog. My goal is to share my experience using the API and to assist developers such as myself with learning and using the API effectively. Your feedback is requested, valued, and appreciated. It will be shared with the developer community via my blog as well as relevant social media sites.

Real-time periods versus Vintage dates

Definition of real-time periods

The definition of real-time periods as stated here:

"The real-time period marks when facts were true or when information was known until it changed".

This definition is accurate but far from complete. Specifically, it fails to identify several criteria that separate the concept of a real-time period from a vintage date.

Real-time periods are:

• A period of time when facts were true or when information was known until it changed.
• Delimited by any arbitrary historical range of dates/times. Real-time periods are unrelated to and unconstrained by any vintage date.
• User defined. Real-time periods are conceived by the user. They are not stored in any FRED database or computed by the API.
• Not identifiers. No unit of information that is available from the API can be identified by a real-time period by itself or as part of a compound key.
• Inputs to queries. Real-time start and end dates are compared to vintage dates to determine if a data element was valid during the user defined real-time period.

Definition of vintage dates

The definition of vintage date as stated here:

"Vintage dates are the release dates for a series excluding release dates when the data for the series did not change."

This definition is accurate but also incomplete.

Vintage dates are:

• The moment in time when information is released.
• Not arbitrary. With respect to a given series, not every day in history is a vintage date. Only the dates when new information is released are vintage dates.
• Mark a moment in history when an event occurred and as such are immutable historical facts.
• Not user defined - vintage dates are historical records. Their values are stored in the FRED database.
• Identifiers - a vintage date uniquely identifies a release of data for a given series.
• Outputs from queries - Vintage dates are compared to real-time dates to determine if a data element was valid during the user defined real-time period. If the vintage is valid within the real-time period, the observation is returned in the result set. It is identified by its vintage date.

Having defined real-time periods and vintage dates, we can see that both concepts share two common characteristics: 1.) both identify moments in time and 2.) both can be expressed as calendar date/times (e.g. Jan 1 1980 2:30pm). The fact that these attributes are shared does not in any way give license to use these two concepts interchangeably. They are fundamentally different and must always be used and identified correctly.

It is possible that the term "Real-time" has some other ordained usage in the domain of economics. If that is the case, then my argument still stands and different terminology should be used in this context. In fact, I will suggest that a term such as "User defined period of interest" is much more meaningful to the common man than Real-time. While lacking the mystique enjoyed by Real-time, a term such as "User defined period of interest" is practical and clear in purpose. For example, it is tempting to say "Vintage dates delimit the real-time period when facts were known." This is a misapplication of concepts and is incorrect. The correct statement is "Vintage dates coincide with a real-time period of matching dates when facts were known". If we use the term "User defined period of interest" the missing component becomes more apparent: "Vintage dates delimit the user defined period of interest when facts were known." What user defined period of interest would that be?

Functional Requirement

If we accept the preceding concepts and definitions as correct, we can use them to construct a requirement that describes how the API should function. We can also create criteria for determining if the data that is returned by the API is correct:

Requirement 1. Query inputs (parameters) are user defined real-time periods. The API may assist the user in constructing real-time periods that coincide with Vintage dates but behind the scenes the logic to limit data to a real-time period is the same.

Requirement 2. Data elements returned by the API are identified by valid Vintage dates. Vintage dates are labeled as vintage dates. Real-time start and end dates are not used as identifiers.

Requirement 3. Vintage dates are never interchangeable with real-time periods.
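
A minimal sketch of how the three requirements could be expressed as types, written in Python for illustration only. Every name here (RealTimePeriod, VintageDate, Observation, key) is hypothetical and is not part of the FRED API; the point is simply that the query input and the data identifier are distinct types that cannot be swapped by accident.

```python
import datetime as dt
from dataclasses import dataclass
from typing import NewType

# Hypothetical names for illustration; none of these exist in the FRED API.
# A distinct type for vintage dates lets a type checker flag accidental mixing
# with ordinary dates (Requirement 3).
VintageDate = NewType("VintageDate", dt.date)


@dataclass(frozen=True)
class RealTimePeriod:
    """User defined query input (Requirement 1)."""
    start: dt.date
    end: dt.date

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("real-time period end precedes start")


@dataclass(frozen=True)
class Observation:
    """Returned data element, identified by its vintage date (Requirement 2)."""
    series_id: str
    vintage_date: VintageDate
    observation_date: dt.date
    value: float

    @property
    def key(self):
        # Compound identifier: no real-time date takes part in it.
        return (self.series_id, self.vintage_date, self.observation_date)
```

With this shape, a RealTimePeriod only ever appears as a parameter to a query, while an Observation carries a vintage date and nothing else as its release identifier.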

Testing the current implementation against the requirement

If we accept the preceding requirements as correct we can use them to test the validity of the API and the correctness of the documentation as it exists now:

Documentation

"Economic data sources, releases, series, and observations are all assigned a real-time period."

This statement is incorrect. The correct statement is "...data sources, releases, series, and observations are all assigned a vintage date." (Requirement 2)

"Sources, releases, and series can change their names, and observation data values can be revised."

This statement is incomplete and ambiguous. The correct statement is "Historical sources, releases, and series data are immutable. When a new vintage is created, sources, releases, and series can change their names, and observation data values can be revised." (Requirement 2)

Documentation

Despite the clearly identified output formats, all data returned by this API is in fact identified by vintage date (as all data provided by FRED should be). The documentation in the section for Observations by Real-Time Period confirms this assertion:

"The real-time period start date defines the first vintage date for which a data value is the latest revision available. The real-time period end date defines the last vintage date for which a data value is the latest revision available."

It is confusing to the user that, despite choosing to receive data by Real-Time Period, there is no facility for the user to input a real-time period. How the API determines a real-time period (one that differs from the real-time periods that correspond to vintage dates) is unknown.

Documentation

"Sometimes it may be useful to enter a vintage date that is not a date when the data values were revised."

This is an invalid instruction. A vintage date is an identifier. When a user requests data using an invalid identifier the API should return an error. (Requirement 2)

Consider the following sentence: "Sometimes it may be useful to enter a series identifier for a series that is not maintained by FRED." This instruction is functionally equivalent to the instruction provided for vintage dates. It is equally nonsensical. If a user requests observations for series 123XYZ the API will return an error because 123XYZ is an invalid series identifier. Why then, should the API not return an error when an invalid vintage identifier is requested? When the user requests vintages that are valid between a range of dates that they define they should use the real-time start and real-time end date parameters as previously defined.
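
The behavior argued for above can be sketched as a simple identifier check. This is an illustration in Python under stated assumptions, not the API's implementation: require_vintage and UnknownVintageError are hypothetical names, and the set of known vintages is assumed to be available to the caller. The two dates used are the GDP vintage dates cited elsewhere in this email.

```python
import datetime as dt


class UnknownVintageError(LookupError):
    """Raised when a requested vintage date is not a release date of the series."""


def require_vintage(known_vintages, requested):
    # Treat the vintage date as an identifier: it either exists or it does not.
    if requested not in known_vintages:
        raise UnknownVintageError(
            f"{requested.isoformat()} is not a vintage date for this series"
        )
    return requested


# GDP vintage dates cited elsewhere in this email.
GDP_VINTAGES = {dt.date(1991, 12, 4), dt.date(1991, 12, 20)}

require_vintage(GDP_VINTAGES, dt.date(1991, 12, 20))  # accepted
try:
    require_vintage(GDP_VINTAGES, dt.date(1991, 12, 15))  # arbitrary date
except UnknownVintageError as error:
    print(error)
```

Under this design an arbitrary date is rejected in the same way an unknown series identifier is rejected, and date ranges remain the job of the real-time start and end parameters.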

Functional Inconsistency

The following request supplies no real-time start or end dates so the API (correctly) assumes the current date (2023-01-04) as the real-time period:

Request:

https://api.stlouisfed.org/fred/series/observations?series_id=GDP&api_key=123&observation_start=1975-01-01&observation_end=1975-01-01

Response:

 <observations realtime_start="2023-01-04" realtime_end="2023-01-04" observation_start="1975-01-01" observation_end="1975-01-01" units="lin" output_type="1" file_type="xml" order_by="observation_date" sort_order="asc" count="1" offset="0" limit="100000">
    <observation realtime_start="2023-01-04" realtime_end="2023-01-04" date="1975-01-01" value="1616.116"/>
</observations>

The data above is incorrect because the realtime_start date is used to identify the vintage instead of the vintage date, which is 2018-07-27 (Requirements 2/3). Note that the realtime_start and realtime_end dates are reported in the response header. The correct response is:

<observation vintage_date="2018-07-27" date="1975-01-01" value="1616.116"/>

Functional Inconsistency

The following request specifies realtime_start and realtime_end dates that span multiple vintages. The realtime_start and realtime_end dates do not exactly match any vintage dates:

Request:

https://api.stlouisfed.org/fred/series/observations?series_id=GDP&api_key=123&observation_start=1975-01-01&observation_end=1975-01-01&realtime_start=1991-12-15&realtime_end=1992-01-15

Response:

<observations realtime_start="1991-12-15" realtime_end="1992-01-15" observation_start="1975-01-01" observation_end="1975-01-01" units="lin" output_type="1" file_type="xml" order_by="observation_date" sort_order="asc" count="2" offset="0" limit="100000">
    <observation realtime_start="1991-12-15" realtime_end="1991-12-19" date="1975-01-01" value="1512.7"/>
    <observation realtime_start="1991-12-20" realtime_end="1992-01-15" date="1975-01-01" value="1513.6"/>
</observations>

The response shown above is arguably the clearest example of how the API confuses real-time dates with vintage dates. Note that realtime_start and realtime_end dates are correctly reported in the header of the response. This is where they belong as they are inputs to a query (Requirement 1). The data elements, however, are badly constructed:

1991-12-15 is a random, arbitrary historical date. It does not mark the start or end of any vintage. Nothing related to GDP happened on this day.
1991-12-20 is a meaningful historical date. It is a vintage date. On this day in history a revision to GDP was released.

These two dates are profoundly different, yet the API reports both as realtime_start dates and the user is left to wonder whether either, both, or neither is of any significance (Requirements 2/3). The correct response is:

<observation vintage_date="1991-12-04" date="1975-01-01" value="1512.7"/>
<observation vintage_date="1991-12-20" date="1975-01-01" value="1513.6"/>

When the data is reported this way the user receives accurate, useful data. The definitions of real-time versus vintage dates are respected.
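
The relationship between the reported realtime_start values and the true vintage dates can be sketched as a lookup. The following Python is an illustration, assuming the caller already holds the list of vintage dates for the series (obtained separately); vintage_in_effect is a hypothetical helper name.

```python
import bisect
import datetime as dt


def vintage_in_effect(vintage_dates, day):
    """Return the latest vintage date on or before `day`, or None if there is none."""
    ordered = sorted(vintage_dates)
    index = bisect.bisect_right(ordered, day)
    return ordered[index - 1] if index else None


# The two GDP vintage dates from the corrected response above.
vintages = [dt.date(1991, 12, 4), dt.date(1991, 12, 20)]

for realtime_start in (dt.date(1991, 12, 15), dt.date(1991, 12, 20)):
    print(realtime_start, "->", vintage_in_effect(vintages, realtime_start))
# 1991-12-15 -> 1991-12-04
# 1991-12-20 -> 1991-12-20
```

The arbitrary date 1991-12-15 resolves to the 1991-12-04 vintage, while 1991-12-20 resolves to itself because it is a vintage date.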

Example of Expected Functionality
[Image: QueryDates]

Example request:

https://api.stlouisfed.org/fred/series/observations?series_id=GDP&api_key=123&realtime_start=1980-01-01&realtime_end=2010-01-01

Expected response (abbreviated):

<observations realtime_start="1980-01-01" realtime_end="2010-01-01" >
    <observation vintage_date="1975-01-01" date="1975-01-01" value="1"/>
    <observation vintage_date="1985-01-01" date="1975-01-01" value="2"/>
    <observation vintage_date="2000-01-01" date="1975-01-01" value="3"/>
</observations>
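
The selection logic behind the expected response can be sketched as an interval overlap test. This Python is a sketch under two assumptions that are not stated by the API: a vintage is valid from its own date until the day before the next vintage, and the latest vintage is valid indefinitely. The function name is hypothetical, and the 2015-01-01 vintage is invented purely to show a vintage that falls outside the period.

```python
import datetime as dt


def vintages_valid_in_period(vintage_dates, period_start, period_end):
    """Return vintage dates whose validity interval overlaps the real-time period.

    Assumption: a vintage is valid from its own date until the day before the
    next vintage; the latest vintage is treated as valid indefinitely.
    """
    ordered = sorted(vintage_dates)
    selected = []
    for index, vintage in enumerate(ordered):
        if index + 1 < len(ordered):
            valid_until = ordered[index + 1] - dt.timedelta(days=1)
        else:
            valid_until = dt.date.max
        # Standard closed-interval overlap test.
        if vintage <= period_end and valid_until >= period_start:
            selected.append(vintage)
    return selected


vintages = [
    dt.date(1975, 1, 1),
    dt.date(1985, 1, 1),
    dt.date(2000, 1, 1),
    dt.date(2015, 1, 1),  # hypothetical vintage, not in the example above
]
selected = vintages_valid_in_period(vintages, dt.date(1980, 1, 1), dt.date(2010, 1, 1))
print([v.isoformat() for v in selected])
# ['1975-01-01', '1985-01-01', '2000-01-01']
```

The 1975-01-01 vintage is included even though it predates realtime_start, because under the stated assumption it was still the current vintage on 1980-01-01.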

Data formatting issues

This page describes four output types:

1 = Observations by Real-Time Period
2 = Observations by Vintage Date, All Observations
3 = Observations by Vintage Date, New and Revised Observations Only
4 = Observations, Initial Release Only

The name "output type" is something of a misnomer, as the setting controls both the format of the output and which values are returned.

Problem 1: Output type 1 (Observations by Real-Time Period) returns mixed data with respect to all observations vs. new and revised observations.
Steps to reproduce:
Submit this request:

https://api.stlouisfed.org/fred/series/observations?series_id=gdp&vintage_dates=2022-09-29,2022-10-27&output_type=1&file_type=json&api_key=123

The first vintage returned contains all observations while the second vintage returns only new and revised observations. This mixed result is functionally useless in all cases except two:
1.) The user requests every vintage for the series. This will return all observations for the first vintage and new/revisions only for every subsequent vintage. Unfortunately this option is impractical as many series have too many vintages to include in a single request.
2.) The user includes only one vintage date per request. This allows the user to at least obtain a consistent result - every response from the API will include all observations for the series. However, if the user wants only new/revised data this option will not work.
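
Option 2 can be sketched as a small helper that builds one request per vintage date. This Python only constructs URLs from the parameters shown in the request above and sends nothing; urls_per_vintage is a hypothetical name, and "123" is the placeholder key used throughout this email.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.stlouisfed.org/fred/series/observations"


def urls_per_vintage(series_id, vintage_dates, api_key):
    """Yield one output_type=1 request URL per vintage date."""
    for vintage in vintage_dates:
        query = urlencode({
            "series_id": series_id,
            "vintage_dates": vintage,  # exactly one vintage per request
            "output_type": 1,
            "file_type": "json",
            "api_key": api_key,
        })
        yield f"{BASE_URL}?{query}"


for url in urls_per_vintage("gdp", ["2022-09-29", "2022-10-27"], "123"):
    print(url)
```

Each response then contains all observations for its single vintage, at the cost of one request per vintage.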

Problem 2: Output type 3 returns unparsable json/xml: { "date":"2017-01-01","GDP_20220929":"19148.194"}
Steps to reproduce:
Submit this request:

https://api.stlouisfed.org/fred/series/observations?series_id=gdp&vintage_dates=2022-09-29,2022-10-27&output_type=3&file_type=json&api_key=123

I am not going to quibble over whether the json/xml is invalid per the spec. I can tell you that most deserializers cannot handle this format elegantly, as the vintage date column cannot be mapped to a property on a statically defined object.
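
To illustrate the extra work this format pushes onto the client, here is a minimal Python sketch that turns one such row into records with fixed fields. The key shape <SERIES_ID>_<YYYYMMDD> is an assumption inferred from the single sample row above, and normalize_row is a hypothetical helper name.

```python
import datetime as dt
import json
import re

# Assumed key shape, inferred from the sample row: <SERIES_ID>_<YYYYMMDD>.
VINTAGE_KEY = re.compile(r"^(?P<series>.+)_(?P<stamp>\d{8})$")


def normalize_row(row):
    """Turn one output type 3 row into (series_id, vintage_date, date, value) tuples."""
    observation_date = dt.date.fromisoformat(row["date"])
    records = []
    for key, value in row.items():
        match = VINTAGE_KEY.match(key)
        if match is None:
            continue  # skips "date" and any key that is not a vintage column
        vintage = dt.datetime.strptime(match["stamp"], "%Y%m%d").date()
        # The value is kept as the string the API returned.
        records.append((match["series"], vintage, observation_date, value))
    return records


row = json.loads('{"date":"2017-01-01","GDP_20220929":"19148.194"}')
print(normalize_row(row))
# [('GDP', datetime.date(2022, 9, 29), datetime.date(2017, 1, 1), '19148.194')]
```

The property name has to be parsed at run time to recover the vintage date, which is exactly what a statically typed deserializer cannot do without custom code.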

Suggestion:
I suggest a new output type be introduced with the behavior of output type 3 and data format of output type 1. This can be a non-breaking change that gives the user both a consistent result and a usable format.
