Provenance Generators
A provenance generator extracts or infers the entities, agents, activities, and relations from a data source. Each generator is implemented as a standalone service. The built-in generators that Origins provides are listed below.
- Relational Databases
- Document Stores
- Delimited Files
- Data Dictionaries
- Excel
- File System
- Variant Call Format (VCF) Files
- REDCap (MySQL)
- REDCap (REST API)
- REDCap (CSV)
- Harvest (REST API)
Relational Databases
- `sqlite`
- `postgresql` - requires psycopg2
- `mysql` - requires PyMySQL or MySQL-python
- `oracle` - requires cx_Oracle
Options
- `database` - name of the database
- `host` - host of the server
- `port` - port of the server
- `user` - user for authentication
- `password` - password for authentication
Hierarchy
- database
  - schemas (PostgreSQL only)*
    - tables
      - columns
Note: In addition to the PostgreSQL backend supporting schemas, it also provides direct access to the tables under the public schema via the tables property.
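To make the database, tables, columns hierarchy concrete, here is a minimal sketch that reads it from a SQLite file using only the Python standard library. The function name `describe_sqlite` is hypothetical and this is not the Origins backend itself; it only illustrates the kind of structure a relational generator would walk.

```python
import sqlite3


def describe_sqlite(path):
    """Return {table_name: [column_name, ...]} for a SQLite database file.

    Hypothetical helper for illustration; not part of Origins.
    """
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        # Skip SQLite's internal bookkeeping tables.
        tables = [name for (name,) in rows if not name.startswith("sqlite_")]

        hierarchy = {}
        for table in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is its name.
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            hierarchy[table] = [column[1] for column in columns]
        return hierarchy
    finally:
        conn.close()
```

The other relational backends would need their own catalog queries (and, for PostgreSQL, an extra schema level), which this sketch does not attempt.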
Document Stores
Options
- `database` - name of the database
- `host` - host of the server
- `port` - port of the server
- `user` - user for authentication
- `password` - password for authentication
Hierarchy
- database
  - collections
    - fields*
Note: the fields of nested documents are not included.
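The note about nested documents can be illustrated with a small sketch. It assumes documents are available as plain Python dicts and collects only their top-level keys; the function name `top_level_fields` is hypothetical and the real backend may gather fields differently.

```python
def top_level_fields(documents):
    """Collect the top-level field names seen across a collection's documents.

    Hypothetical helper for illustration; not part of Origins.
    """
    fields = []
    seen = set()
    for doc in documents:
        for key in doc:
            # Only the document's own keys are recorded. Nested values
            # (dicts, or lists of dicts) are not descended into.
            if key not in seen:
                seen.add(key)
                fields.append(key)
    return fields


docs = [
    {"_id": 1, "name": "a", "address": {"city": "x", "zip": "00000"}},
    {"_id": 2, "age": 3},
]
fields = top_level_fields(docs)  # "address" is listed, "city" and "zip" are not
```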
Delimited Files
Backends for fixed width delimited files.
- `delimited`
- `csv` - Alias to the `delimited` backend with a `,` (comma) delimiter
- `tab` - Alias to the `delimited` backend with a `\t` (tab) delimiter
Options
- `path` - Path to the file
- `delimiter` - The delimiter between fields. Defaults to comma.
- `header` - A list/tuple of column names. If not specified, the header will be detected if it exists; otherwise the column names will be the indices.
- `sniff` - The number of bytes to read from the file when detecting the header.
- `dialect` - A `csv.Dialect` instance. This will be detected if not specified.
Hierarchy
- file
  - columns
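The interplay of `header`, `sniff`, and `dialect` can be sketched with the standard library's `csv.Sniffer`. This is a minimal sketch assuming detection works roughly this way; the function name `resolve_columns` and the default of 1024 are hypothetical, and the sample is read in text mode, so `sniff` counts characters rather than bytes here.

```python
import csv


def resolve_columns(path, header=None, sniff=1024, dialect=None):
    """Return column names for a delimited file.

    Hypothetical helper for illustration; not part of Origins.
    """
    with open(path, newline="") as f:
        sample = f.read(sniff)
        f.seek(0)
        sniffer = csv.Sniffer()
        if dialect is None:
            # Guess the delimiter and quoting rules from the sample.
            dialect = sniffer.sniff(sample)
        first_row = next(csv.reader(f, dialect))

    if header is not None:
        # Explicit column names always win.
        return list(header)
    if sniffer.has_header(sample):
        return first_row
    # No header detected: fall back to column indices.
    return list(range(len(first_row)))
```

`csv.Sniffer` relies on heuristics, so a short or irregular sample can be misdetected; passing `header` or `dialect` explicitly avoids the guesswork.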
Data Dictionaries
Backends for data dictionary-style delimited files.
Options
- `path` - Path to the file
- `delimiter` - The delimiter between fields. Defaults to comma.
- `header` - A list/tuple of column names. If not specified, the header will be detected if it exists; otherwise the field names will be the indices.
- `sniff` - The number of bytes to read from the file when detecting the header.
- `dialect` - A `csv.Dialect` instance. This will be detected if not specified.
Hierarchy
- file
  - fields
Excel
Options
- `path` - Path to the file
- `headers` - If `True`, the first row on each sheet is assumed to be the header. If `False`, the column indices are used. If a list/tuple, the column names apply to the first sheet. If a dict, the keys are sheet names and the values are lists/tuples of column names for each sheet.
Hierarchy
- workbook
  - sheets
    - columns
Note: Sheets are assumed to be fixed width based on the first row.
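The four forms of the `headers` option can be summarized in a short sketch. The function name `headers_for_sheet` is hypothetical, and the fallback to column indices for sheets that a list or dict does not cover is an assumption; the option description does not say what happens in that case.

```python
def headers_for_sheet(headers, sheet_name, sheet_index, first_row):
    """Return the column names for one sheet given the `headers` option.

    Hypothetical helper for illustration; not part of Origins.
    """
    # The width is taken from the first row, as the note above describes.
    indices = list(range(len(first_row)))

    if headers is True:
        return list(first_row)
    if headers is False:
        return indices
    if isinstance(headers, dict):
        # Keys are sheet names; uncovered sheets fall back to indices (assumption).
        names = headers.get(sheet_name)
        return list(names) if names is not None else indices
    # A list/tuple applies to the first sheet only; others use indices (assumption).
    return list(headers) if sheet_index == 0 else indices
```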
File System
Backends for interacting with files and directories.
Options
- `path` - Path to the directory
- `recurse` - If true, directories will be recursed. Default is true.
- `depth` - If `recurse` is true, this defines the maximum depth of recursion. Default is None (no maximum depth).
- `pattern` - Glob-style pattern for matching files. Default is '*' (everything).
- `hidden` - If true, hidden files will be matched as well. Default is false.
Hierarchy
- directory
  - files
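A minimal sketch of how these options could combine, using `os.walk` and `fnmatch` from the standard library. The function name `find_files` is hypothetical, and two details are assumptions rather than documented behavior: `depth` is counted in directory levels below `path`, and hidden means a name starting with a dot, with hidden directories skipped as well.

```python
import fnmatch
import os


def find_files(path, recurse=True, depth=None, pattern="*", hidden=False):
    """Return the sorted paths of files under `path` that match the options.

    Hypothetical helper for illustration; not part of Origins.
    """
    root = os.path.abspath(path)
    matches = []
    for current, dirs, files in os.walk(root):
        rel = os.path.relpath(current, root)
        # Level 0 is `path` itself, level 1 its subdirectories, and so on.
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1

        if not hidden:
            # Drop dot-prefixed names; pruning `dirs` in place stops the descent.
            dirs[:] = [d for d in dirs if not d.startswith(".")]
            files = [f for f in files if not f.startswith(".")]
        if not recurse or (depth is not None and level >= depth):
            dirs[:] = []

        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(current, name))
    return sorted(matches)
```

For example, `find_files("data", pattern="*.csv", depth=1)` would return the CSV files in `data` and its immediate subdirectories.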
Variant Call Format (VCF) Files
Options
- `path` - Path to the VCF file
Hierarchy
- file
  - field
REDCap (MySQL)
- `redcap-mysql` - depends on the MySQL backend
Options
- `project` - name of the project to access
- `database` - name of the database (defaults to 'redcap')
- `host` - host of the server
- `port` - port of the server
- `user` - user for authentication
- `password` - password for authentication
Hierarchy
- project
  - forms
    - fields
REDCap (REST API)
- `redcap-api` - depends on PyCap
Options
- `url` - REDCap API URL
- `token` - REDCap API token for the project
- `name` - Name of the project being accessed (this is merely an identifier). This is required because PyCap does not currently expose the name of the project through its APIs.
Hierarchy
- project
  - forms
    - fields
REDCap (CSV)
Options
- `path` - Path to the REDCap data dictionary CSV file
Hierarchy
- project
  - forms
    - fields
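As an illustration of the project, forms, fields hierarchy, the sketch below groups the fields of a REDCap data dictionary CSV by form. It assumes the standard export column names "Variable / Field Name" and "Form Name"; both are exposed as parameters in case a file differs. The function name `forms_from_data_dictionary` is hypothetical and this is not the Origins backend itself.

```python
import csv


def forms_from_data_dictionary(
    path,
    field_col="Variable / Field Name",  # assumed standard REDCap column name
    form_col="Form Name",               # assumed standard REDCap column name
):
    """Return {form_name: [field_name, ...]} in file order.

    Hypothetical helper for illustration; not part of Origins.
    """
    forms = {}
    # utf-8-sig drops a leading byte-order mark if the export includes one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            forms.setdefault(row[form_col], []).append(row[field_col])
    return forms
```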
Harvest (REST API)
Options
- `url` - Harvest API URL
- `token` - Harvest API token, if authentication is required
Hierarchy
- application
  - categories
    - concepts
      - fields