Porter is a data import abstraction library to import any data from anywhere. To achieve this she must be able to generalize about the structure of data. Porter believes all data sets are either a single record, repeating collection of records with consistent structure, or both, where *record* is either a list or tree of name and value pairs.
Porter [reliably](#durability) and efficiently imports data into applications. Since data may not arrive in the format we want, Porter can also apply a series of [transformations](#transformers) during import. She is commonly used to integrate third party APIs, but she can import any data from anywhere.

-Porter's interfaces use arrays, called *records*, and array iterators, called *record collections*. Arrays allow us to store any data type and iterators allow us to iterate over an unlimited number of records, thus allowing Porter to stream any data format of any size.
+Porter's interfaces use arrays, called *records*, and iterators, called [*record collections*](#record-collections). Records may store any data type and record collections may iterate over any number of records, allowing Porter to stream any data format of any size in a memory efficient manner.

-The [Provider organization][Provider] hosts ready-to-use Porter providers to help quickly gain access to popular third-party APIs and data services. For example, the [Stripe provider][Stripe provider] allows an application to make online payments whilst the [European Central Bank provider][ECB provider] imports the latest currency exchange rates into an application. Anyone writing new providers is encouraged to contribute them to the organization to share with other Porter users.
+The [Provider organization][Provider] hosts ready-to-use Porter providers to help quickly gain access to popular third-party APIs and data services. As examples, the [Stripe provider][Stripe provider] allows an application to make online payments, whereas the [European Central Bank provider][ECB provider] imports the latest currency exchange rates. Anyone writing new providers is encouraged to contribute them to the organization to share with other Porter users.

 Contents
 --------
@@ -45,12 +49,12 @@ Porter is useful for anyone wanting a [simple API](#porters-api) to import data
 Benefits
 --------

-* Provides a [framework](#architecture) of inter-operable components for structuring data imports.
-* Defines structured import concepts, such as [providers](#providers) that provide data via one or more [resources](#resources).
+* Provides a [framework](#architecture) of structured data import concepts: [providers](#providers) provide data via [resources](#resources) fetched from [connectors](#connectors).
+* Defines efficient in-memory data processing interfaces to handle large data sets.
 * Offers post-import [transformations](#transformers), such as [filtering](#filtering) and [mapping][MappingTransformer], to transform third-party data into useful data.
 * Protects against intermittent network failure with [durability](#durability) features.
 * Supports raw data [caching](#caching), at the connector level, for each import.
-* Joins many disparate data sets together using [sub-imports][Sub-imports].
+* Joins two or more separate data sets together using [sub-imports][Sub-imports].

 Quick start
 -----------
@@ -121,13 +125,15 @@ Options may be configured by some of the methods listed below.
 Record collections
 ------------------

-Record collections are a type of `Iterator`, whose values are arrays of imported data, and are sometimes `Countable`. The result of a successful `Porter::import` call is an instance of `PorterRecords` or one of its specialisations, guaranteeing the collection is enumerable using `foreach`. That's all you need to know! The following details are just for nerds.
+Record collections are a type of `Iterator`, guaranteeing imported data is enumerable using `foreach`. The result of a successful `Porter::import` call is either an instance of `PorterRecords` or `CountablePorterRecords`, depending on whether the number of records is known. That's all you need to know! The following details are just for debugging and nerds.

 ### Details

 Record collections may be `Countable`, depending on whether the imported data was countable and whether any destructive operations were performed after import. Filtering is a destructive operation since it may remove records and therefore the count reported by a `ProviderResource` would no longer be accurate. It is the responsibility of the resource to supply the number of records in its collection by returning an iterator that implements `Countable`, such as `ArrayIterator` or `CountableProviderRecords`. When a countable iterator is detected, Porter returns `CountablePorterRecords` as long as no destructive operations were performed, which is possible because all non-destructive operation's collection types have a countable analogue.

-Record collections are composed by Porter using the decorator pattern. If provider data is not modified, `PorterRecords` will decorate the `ProviderRecords` returned from a `ProviderResource`. That is, `PorterRecords` has a pointer back to the previous collection, which could be written as: `PorterRecords` → `ProviderRecords`. If a [filter](#filtering) was applied, the collection stack would be `PorterRecords` → `FilteredRecords` → `ProviderRecords`. In general this is an unimportant detail for most users but it can be useful for debugging. The stack of record collection types informs us of the transformations a collection has undergone and each type holds a pointer to relevant objects that participated in the transformation, for example, `PorterRecords` holds a reference to the `ImportSpecification` that was used to create it and can be accessed using `PorterRecords::getSpecification`.
+Record collections are composed by Porter using the decorator pattern. If provider data is not modified, `PorterRecords` will decorate the `ProviderRecords` returned from a `ProviderResource`. That is, `PorterRecords` has a pointer back to the previous collection, which could be written as: `PorterRecords` → `ProviderRecords`. If a [filter](#filtering) was applied, the collection stack would be `PorterRecords` → `FilteredRecords` → `ProviderRecords`. Generally this is an unimportant detail for most users but can be useful for debugging.
+
+The stack of record collection types informs us of the transformations a collection has undergone and each type holds a pointer to relevant objects that participated in the transformation. For example, `PorterRecords` holds a reference to the `ImportSpecification` that was used to create it and can be accessed using `PorterRecords::getSpecification`.

 Transformers
 ------------
@@ -173,7 +179,7 @@ $records = $porter->import(
 Durability
 ----------

-Porter automatically retries connections when an exception occurs during `Connector::fetch`. This helps mitigate intermittent network conditions that cause data fetches to fail temporarily. The number of retry attempts can be configured by calling the `setMaxFetchAttempts` method of an `ImportSpecification`.
+Porter automatically retries connections when an exception occurs during `Connector::fetch`. This helps mitigate intermittent network conditions that cause temporary data fetch failures. The number of retry attempts can be configured by calling the `setMaxFetchAttempts` method of an [`ImportSpecification`](#import-specifications).

 The default exception handler, `ExponentialBackoffExceptionHandler`, causes a failed import to pause for an exponentially increasing series of delays. Given that the default number of retry attempts is *five*, the exception handler may be called up to *four* times, delaying each retry attempt for ~0.1, ~0.2, ~0.4, and finally, ~0.8 seconds.
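
The delay schedule described above can be illustrated with a minimal sketch. This is not the implementation of `ExponentialBackoffExceptionHandler`: the function `nominalBackoffDelays` and its `$initialDelay` parameter are hypothetical names invented for this example, and the sketch assumes each delay simply doubles from 0.1 seconds, ignoring whatever variation the approximate (~) figures imply.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of Porter: lists the nominal delay, in
 * seconds, before each retry when the delay doubles after every failure.
 *
 * @return float[]
 */
function nominalBackoffDelays(int $maxFetchAttempts, float $initialDelay = 0.1): array
{
    $delays = [];

    // The first attempt is not a retry, so N attempts allow at most N - 1 delays.
    for ($retry = 0; $retry < $maxFetchAttempts - 1; ++$retry) {
        $delays[] = $initialDelay * 2 ** $retry;
    }

    return $delays;
}

// Five attempts, the default stated above, give four nominal delays.
foreach (nominalBackoffDelays(5) as $retry => $delay) {
    printf("Retry %d waits about %.1f seconds\n", $retry + 1, $delay);
}
```

Under these assumptions, the worst case for the default of five attempts adds roughly 1.5 seconds of waiting in total before the import finally fails.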