
Provenance Generators

bruth edited this page Nov 18, 2014 · 1 revision

A provenance generator extracts or infers the entities, agents, activities, and relations from a data source. Below is the set of built-in generators that Origins provides. It is implemented as a standalone service.

Relational Databases

Options

  • database - name of the database
  • host - host of the server
  • port - port of the server
  • user - user for authentication
  • password - password for authentication

Hierarchy

  • database
  • schemas (PostgreSQL only)*
  • tables
  • columns

Note: In addition to supporting schemas, the PostgreSQL backend provides direct access to the tables under the public schema via the tables property.
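
The database, tables, columns hierarchy can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a server; the function name describe_database is hypothetical, and this is not how Origins itself is implemented.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: [column_name, ...]} for a SQLite connection.

    Hypothetical helper for illustration only; not part of Origins.
    """
    hierarchy = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA arguments cannot be bound parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        # PRAGMA table_info yields one row per column; index 1 is the name.
        columns = conn.execute("PRAGMA table_info(%s)" % quoted).fetchall()
        hierarchy[table] = [col[1] for col in columns]
    return hierarchy


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE visit (id INTEGER, patient_id INTEGER, date TEXT)")
hierarchy = describe_database(conn)
conn.close()
```

A server-backed database would take the host, port, user, and password options listed above in place of the in-memory connection.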

Document Stores

Options

  • database - name of the database
  • host - host of the server
  • port - port of the server
  • user - user for authentication
  • password - password for authentication

Hierarchy

  • database
  • collections
  • fields*

Note: the fields of nested documents are not included.
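
As a rough illustration of the note above, the sketch below collects only the top-level keys of each document in a collection. The documents are plain dictionaries and top_level_fields is a hypothetical name; treating a nested document's own key (address here) as a field while skipping its inner keys is an assumption about what the note means.

```python
def top_level_fields(documents):
    """Collect the distinct top-level keys across a collection's documents."""
    fields = []
    for doc in documents:
        for key in doc:
            # Only the key itself is recorded; the keys inside a nested
            # document are never descended into.
            if key not in fields:
                fields.append(key)
    return fields


collection = [
    {"_id": 1, "name": "a", "address": {"city": "x", "zip": "1"}},
    {"_id": 2, "name": "b", "tags": ["p", "q"]},
]
fields = top_level_fields(collection)
```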

Delimited Files

Backends for fixed-width delimited files.

  • delimited
  • csv - Alias to the delimited backend with a , (comma) delimiter
  • tab - Alias to the delimited backend with a \t (tab) delimiter

Options

  • path - Path to the file
  • delimiter - The delimiter between fields, defaults to comma
  • header - A list/tuple of column names. If not specified, the header will be detected if it exists, otherwise the column names will be the indices.
  • sniff - The number of bytes of the file to read when detecting the header.
  • dialect - csv.Dialect instance. This will be detected if not specified.

Hierarchy

  • file
  • columns
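
The header, sniff, and dialect options map naturally onto Python's csv.Sniffer. The following is a minimal sketch under that assumption; detect_columns is a hypothetical function, not the Origins implementation, and the delimiter option is left out for brevity.

```python
import csv
import os
import tempfile


def detect_columns(path, header=None, sniff=1024, dialect=None):
    """Return column names for a delimited file (illustrative only)."""
    with open(path, newline="") as f:
        # Only the first `sniff` characters are used for detection.
        sample = f.read(sniff)
        f.seek(0)
        sniffer = csv.Sniffer()
        if dialect is None:
            dialect = sniffer.sniff(sample)
        first_row = next(csv.reader(f, dialect))
    if header is not None:
        # An explicit header always wins.
        return list(header)
    if sniffer.has_header(sample):
        return first_row
    # No header detected: fall back to the column indices.
    return list(range(len(first_row)))


def write_temp(text):
    """Write text to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        f.write(text)
    return path


with_header = write_temp("id,name,age\n1,alice,30\n2,bobby,41\n3,carol,25\n")
without_header = write_temp("1,alice,30\n2,bobby,41\n3,carol,25\n")

detected = detect_columns(with_header)
indexed = detect_columns(without_header)
explicit = detect_columns(without_header, header=("id", "name", "age"))

os.remove(with_header)
os.remove(without_header)
```

Header detection in csv.Sniffer is heuristic, which is one reason to pass header explicitly when the column names are known.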

Data Dictionaries

Backends for data dictionary-style delimited files.

Options

  • path - Path to the file
  • delimiter - The delimiter between fields, defaults to comma
  • header - A list/tuple of column names. If not specified, the header will be detected if it exists, otherwise the field names will be the indices.
  • sniff - The number of bytes of the file to read when detecting the header.
  • dialect - csv.Dialect instance. This will be detected if not specified.

Hierarchy

  • file
  • fields

Excel

Options

  • path - Path to the file
  • headers - If True, the first row on each sheet will be assumed to be the header. If False, the column indices will be used. If a list/tuple, the columns will apply to the first sheet. If a dict, the keys are the sheet names and the values are a list/tuple of column names for the sheet.

Hierarchy

  • workbook
  • sheets
  • columns

Note: Sheets are assumed to be fixed-width, with the width taken from the first row.
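
The four forms of the headers option can be sketched as a small resolver over in-memory sheets. This is an illustration, not the Origins code: resolve_headers is a hypothetical name, sheets are represented as lists of rows, and falling back to column indices for sheets that a list/tuple or dict does not cover is an assumption.

```python
def resolve_headers(headers, sheets):
    """Map each sheet name to its column names.

    `sheets` is an ordered mapping {sheet_name: rows}, where rows is a
    list of lists. The width of a sheet is taken from its first row.
    """
    resolved = {}
    for position, (name, rows) in enumerate(sheets.items()):
        first_row = rows[0] if rows else []
        indices = list(range(len(first_row)))
        if headers is True:
            # The first row of every sheet is the header.
            resolved[name] = list(first_row)
        elif isinstance(headers, dict):
            # Per-sheet names; uncovered sheets use indices (assumption).
            resolved[name] = list(headers.get(name, indices))
        elif isinstance(headers, (list, tuple)) and position == 0:
            # A list/tuple applies to the first sheet only.
            resolved[name] = list(headers)
        else:
            resolved[name] = indices
    return resolved


sheets = {
    "people": [["id", "name"], [1, "a"]],
    "visits": [["visit", "date", "site"], [1, "2014-11-18", "x"]],
}
from_first_row = resolve_headers(True, sheets)
from_indices = resolve_headers(False, sheets)
from_list = resolve_headers(("pid", "label"), sheets)
from_dict = resolve_headers({"visits": ["v", "d", "s"]}, sheets)
```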

File System

Backends for interacting with files and directories.

Options

  • path - Path to directory
  • recurse - If true, directories will be recursed. Default is true.
  • depth - If recurse is true, this defines the maximum depth of recursion. Default is None (no maximum depth).
  • pattern - Glob-style pattern for matching files. Default is '*' (everything).
  • hidden - If true, hidden files will be matched as well. Default is false.

Hierarchy

  • directory
  • files
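
How the recurse, depth, pattern, and hidden options might interact can be sketched with os.walk and fnmatch. The function find_files is hypothetical, and two behaviors are assumptions rather than documented facts: depth counts directory levels below path (so depth=1 includes the immediate subdirectories), and "hidden" means a name starting with a dot, applied to directories as well as files.

```python
import fnmatch
import os
import tempfile


def find_files(path, recurse=True, depth=None, pattern="*", hidden=False):
    """Return matching file paths relative to `path` (illustrative only)."""
    matches = []
    root = os.path.abspath(path)
    for current, dirs, files in os.walk(root):
        rel = os.path.relpath(current, root)
        level = 0 if rel == "." else rel.count(os.sep) + 1
        if not hidden:
            # Editing dirs in place stops os.walk from entering them.
            dirs[:] = [d for d in dirs if not d.startswith(".")]
            files = [f for f in files if not f.startswith(".")]
        if not recurse or (depth is not None and level >= depth):
            dirs[:] = []
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                full = os.path.join(current, name)
                matches.append(os.path.relpath(full, root))
    return sorted(matches)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub", "deep"))
    for rel_path in [
        "a.txt",
        ".secret.txt",
        os.path.join("sub", "b.txt"),
        os.path.join("sub", "c.csv"),
        os.path.join("sub", "deep", "d.txt"),
    ]:
        open(os.path.join(root, rel_path), "w").close()

    everything = find_files(root)
    top_only = find_files(root, recurse=False)
    one_level = find_files(root, depth=1)
    all_text = find_files(root, pattern="*.txt", hidden=True)
```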

Variant Call Format (VCF) Files

Options

  • path - Path to VCF file

Hierarchy

  • file
  • fields
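
This page does not say which parts of a VCF file are reported as fields. As a sketch of two plausible candidates, the snippet below reads the fixed column names from the #CHROM header line and the INFO identifiers from the ##INFO meta-information lines. The function vcf_fields is hypothetical and is not taken from Origins.

```python
import re

# Matches meta-information lines such as:
# ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
INFO_ID = re.compile(r"^##INFO=<ID=([^,>]+)")


def vcf_fields(lines):
    """Return (column_names, info_ids) from the header of a VCF file."""
    columns, info = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            match = INFO_ID.match(line)
            if match:
                info.append(match.group(1))
        elif line.startswith("#"):
            # The single-# line names the tab-separated columns and
            # ends the header.
            columns = line[1:].split("\t")
            break
    return columns, info


header_lines = [
    "##fileformat=VCFv4.1\n",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">\n',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]
columns, info = vcf_fields(header_lines)
```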

REDCap (MySQL)

Options

  • project - name of the project to access
  • database - name of the database (defaults to 'redcap')
  • host - host of the server
  • port - port of the server
  • user - user for authentication
  • password - password for authentication

Hierarchy

  • project
  • forms
  • fields

REDCap (REST API)

Options

  • url - REDCap API URL
  • token - REDCap API token for the project
  • name - Name of the project being accessed (this is merely an identifier). Note: this is required because PyCap does not currently expose the name of the project itself through its API.

Hierarchy

  • project
  • forms
  • fields

REDCap (CSV)

Options

  • path - Path to the REDCap data dictionary CSV file

Hierarchy

  • project
  • forms
  • fields
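
The forms and fields levels can be recovered from a data dictionary CSV by grouping rows on the form name. The sketch below assumes the column headings "Variable / Field Name" and "Form Name", as commonly seen in REDCap data dictionary exports; both are parameters in case a given export differs, and forms_from_dictionary is a hypothetical name, not part of Origins.

```python
import csv
import io
from collections import OrderedDict


def forms_from_dictionary(text,
                          field_col="Variable / Field Name",
                          form_col="Form Name"):
    """Group field names by form, preserving the order in the file."""
    forms = OrderedDict()
    for row in csv.DictReader(io.StringIO(text)):
        forms.setdefault(row[form_col], []).append(row[field_col])
    return forms


# Made-up sample rows for illustration.
sample = (
    '"Variable / Field Name","Form Name","Field Type","Field Label"\n'
    'record_id,demographics,text,"Record ID"\n'
    'dob,demographics,text,"Date of birth"\n'
    'hemoglobin,labs,text,"Hemoglobin"\n'
)
forms = forms_from_dictionary(sample)
```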

Harvest (REST API)

Options

  • url - Harvest API URL
  • token - Harvest API token if authentication is required

Hierarchy

  • application
  • categories
  • concepts
  • fields
