Commit 34ce74f

papaparse implementation - read v2 (#100)
* papaparse implementation on read action
1 parent 73bae3b commit 34ce74f

11 files changed, +508 -903 lines changed

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -1,3 +1,13 @@
+## 3.0.0 (June 25, 2021)
+* Deleted old action: `Read CSV file from URL`
+* Deleted old trigger: `Read CSV attachment`
+* New action renamed: from `Read CSV file from URL v2` to `Read CSV attachment`
+* Fixed memory leak on big CSV files
+
+## 2.3.0-dev.1 (June 16, 2021)
+* Add action: `Read CSV file from URL v2`
+* Add trigger: `Read CSV attachment v2`
+
 ## 2.2.0 (April 23, 2021)
 * Add pipe to list of separators in `Write CSV attachment from JSON Array` and `Write CSV attachment from JSON Object` actions
 * Bump dependencies

README.md

Lines changed: 9 additions & 22 deletions
@@ -27,36 +27,23 @@ Name|Mandatory|Description|Values|
 
 The component does not require credentials to function.
 
-
-## Triggers
-
-### Read CSV file from URL
-
-This trigger will fetch the CSV file from a given URL. The address must be accessible
-to the component. The fetched CSV file will be placed in the attachment part of the
-outgoing message.
-
-* `CSV URL` - the full URL to the file for retrieving data.
-* `Emit all messages` - this checkbox configures output behavior of the component. If the option is checked - the component emits an array of messages, otherwise - the component emits a message per row.
-* `CSV Header` - this is a required field. Input the names of headers separated with a comma.
-* `Separators` - Specify the separator type. Usually it is a comma (`,`) but values like Semicolon (`;`), Space (` `), Tab (`\t`) and Hash (`#`) are also supported.
-* `Skip rows` - if you know that the incoming CSV file has certain number of headers you can indicate to skip them. The supported values are `None`, `First row`, `First two`, `First three` and `First four`.
-* `Data columns` - here the values will be added dynamically based on the values in the `CSV Header` field. Here each data column will be listed with the name, Data Type and the Format to enable further configuration.
-
 ## Actions
 
 ### Read CSV attachment
 
 This action will read the CSV attachment of the incoming message or from the specified URL and output a JSON object.
 To configure this action the following fields can be used:
 
+#### Config Fields
+
+* `Emit Behavior` - this selector configures output behavior of the component. If the option is `Fetch All` - the component emits an array of messages, otherwise (`Emit Individually`) - the component emits a message per row.
+
+#### Input Metadata
 
-* `CSV URL` - the full URL to the file for retrieving data. Leave the field blank and action will read CSV attachment of the incoming message (if any). Error will be thrown if URL of the CSV is missing and no CSV file in incoming message found
-* `Emit all messages` - this checkbox configures output behavior of the component. If the option is checked - the component emits an array of messages, otherwise - the component emits a message per row.
-* `CSV Header` - this is a required field. Input the names of headers separated with a comma.
-* `Separators` - Specify the separator type. Usually it is a comma (`,`) but values like Semicolon (`;`), Space (` `), Tab (`\t`) and Hash (`#`) are also supported.
-* `Skip rows` - if you know that the incoming CSV file has certain number of headers you can indicate to skip them. The supported values are `None`, `First row`, `First two`, `First three` and `First four`.
-* `Data columns` - here the values will be added dynamically based on the values in the `CSV Header` field. Here each data column will be listed with the name, Data Type and the Format to enable further configuration.
+* `URL` - We will fetch this URL and parse it as CSV file
+* `Contains headers` - if true, the first row of parsed data will be interpreted as field names, false by default.
+* `Delimiter` - The delimiting character. Leave blank to auto-detect from a list of most common delimiters.
+* `Convert Data types` - numeric, date and boolean data will be converted to their type instead of remaining strings, false by default.
 
 ### Write CSV attachment
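
The two `Emit Behavior` settings described in the new README text can be illustrated with a small sketch. It follows what `lib/actions/read.js` in this commit appears to do: with `Fetch All`, a single message whose body wraps all rows in a `result` array; with `Emit Individually`, one message per row. The helper name `outgoingBodies` and the sample rows are hypothetical and are not part of the component.

```javascript
// Hypothetical helper (not part of the component): returns the list of
// message bodies that would be emitted for already-parsed rows.
function outgoingBodies(rows, emitAll) {
  // "Fetch All": a single message body that wraps every row in `result`.
  if (emitAll) return [{ result: rows }];
  // "Emit Individually": one message body per parsed row.
  return rows;
}

// Sample rows, as they might look with "Contains headers" set to true.
const rows = [{ name: 'Ann', age: '30' }, { name: 'Bob', age: '41' }];

console.log(outgoingBodies(rows, true).length); // 1
console.log(outgoingBodies(rows, false).length); // 2
```

With `Contains headers` left false, each row would instead be keyed as `column0`, `column1`, and so on, as the `arrayToObj` comment in the action describes.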

component.json

Lines changed: 12 additions & 42 deletions
@@ -3,59 +3,29 @@
   "description": "A comma-separated values (CSV) file stores tabular data (numbers and text) in plain-text form",
   "docsUrl": "https://github.com/elasticio/csv-component",
   "buildType": "docker",
-  "version": "2.2.0",
-  "triggers": {
-    "read": {
-      "main": "./lib/triggers/read.js",
-      "title": "Read CSV file from URL",
-      "help": {
-        "description": "Fetch a CSV file from a given URL and store it in the attachment storage.",
-        "link": "/components/csv/triggers#read-csv-file-from-url"
-      },
-      "type": "polling",
-      "fields": {
-        "url": {
-          "label": "CSV URL",
-          "required": true,
-          "placeholder": "http://my-url.com/foo.csv",
-          "note": "We will fetch this URL and parse it as CSV file",
-          "viewClass": "TextFieldWithNoteView"
-        },
-        "reader": {
-          "viewClass": "CSVReadView"
-        }
-      },
-      "metadata": {
-        "out": {}
-      }
-    }
-  },
+  "version": "3.0.0",
   "actions": {
     "read_action": {
-      "main": "./lib/triggers/read.js",
+      "main": "./lib/actions/read.js",
       "title": "Read CSV attachment",
       "help": {
-        "description": "Read a CSV attachment of an incoming message.",
+        "description": "This action will read the CSV attachment of the incoming message or from the specified URL and output a JSON object.",
         "link": "/components/csv/actions#read-csv-attachment"
       },
       "fields": {
-        "url": {
-          "label": "CSV URL",
-          "required": false,
-          "placeholder": "http://my-url.com/foo.csv",
-          "note": "We will fetch this URL and parse it as CSV file, leave the field blank if you expect CSV attachment from previous step",
-          "viewClass": "TextFieldWithNoteView"
-        },
         "emitAll": {
-          "label": "Emit all messages",
-          "viewClass": "CheckBoxView"
-        },
-        "reader": {
-          "viewClass": "CSVReadView",
-          "required": true
+          "label": "Emit Behavior",
+          "required": true,
+          "viewClass": "SelectView",
+          "model": {
+            "true": "Fetch All",
+            "false": "Emit Individually"
+          },
+          "prompt": "Select Emit Behavior"
         }
       },
       "metadata": {
+        "in": "./lib/schemas/read.in.json",
         "out": {}
       }
     },
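
Because `emitAll` changes from a checkbox to a `SelectView` whose model keys are the strings `"true"` and `"false"`, the configured value may plausibly arrive as either a boolean or a string. The sketch below mirrors the comparison used in `lib/actions/read.js` in this commit; the helper name `isFetchAll` is hypothetical.

```javascript
// Hypothetical helper mirroring the check in lib/actions/read.js:
// accepts both the boolean and the string form of the configured value.
function isFetchAll(cfg) {
  return cfg.emitAll === true || cfg.emitAll === 'true';
}

console.log(isFetchAll({ emitAll: 'true' })); // true  ("Fetch All")
console.log(isFetchAll({ emitAll: 'false' })); // false ("Emit Individually")
```

Any other value, including a missing `emitAll`, falls through to per-row emission under this check.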

lib/actions/read.js

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
+/* eslint-disable no-restricted-syntax,semi,comma-dangle,class-methods-use-this */
+
+const { AttachmentProcessor } = require('@elastic.io/component-commons-library')
+const { Writable } = require('stream');
+const { messages } = require('elasticio-node')
+const stream = require('stream')
+const util = require('util')
+const papa = require('papaparse')
+
+const pipeline = util.promisify(stream.pipeline);
+const attachmentProcessor = new AttachmentProcessor()
+
+// transform array to obj, for example:
+// ['aa', 'bb', 'cc'] => {column0: 'aa', column1: 'bb', column2: 'cc'}
+function arrayToObj(arr) {
+  let columns = {}
+  arr.forEach((value, index) => {
+    columns = { ...columns, ...{ [`column${index}`]: value } }
+  })
+  return columns
+}
+
+async function errHelper(text) {
+  await this.logger.error(text)
+  await this.emit('error', text)
+  await this.emit('end')
+}
+
+async function readCSV(msg, cfg) {
+  const that = this
+  const emitAll = cfg.emitAll === true || cfg.emitAll === 'true'
+  const { body } = msg
+
+  // check if url provided in msg
+  if (body.url && body.url.length > 0) {
+    this.logger.info('URL found')
+  } else {
+    await errHelper.call(this, 'URL of the CSV is missing')
+    return
+  }
+
+  if (body.header !== undefined
+    && body.header !== ''
+    && (typeof body.header) !== 'boolean') {
+    await errHelper.call(this, 'Non-boolean values are not supported by "Contains headers" field')
+    return
+  }
+
+  if (body.dynamicTyping !== undefined
+    && body.dynamicTyping !== ''
+    && (typeof body.dynamicTyping) !== 'boolean') {
+    await errHelper.call(this, 'Non-boolean values are not supported by "Convert Data types" field')
+    return
+  }
+
+  const parseOptions = {
+    header: body.header,
+    dynamicTyping: body.dynamicTyping,
+    delimiter: body.delimiter
+  }
+
+  // if set "Fetch All" create object with results
+  const outputMsg = {
+    result: [],
+  }
+
+  let dataStream
+  const parseStream = papa.parse(papa.NODE_STREAM_INPUT, parseOptions)
+
+  try {
+    dataStream = await attachmentProcessor.getAttachment(body.url, 'stream')
+    this.logger.info('File received, trying to parse CSV')
+  } catch (err) {
+    this.logger.error(`URL - "${body.url}" unreachable: ${err}`);
+    this.emit('error', `URL - "${body.url}" unreachable: ${err}`)
+    this.emit('end')
+    return
+  }
+  // control of node data stream
+  class CsvWriter extends Writable {
+    async write(chunk) {
+      let data = {}
+      if (parseOptions.header) {
+        data = chunk
+      } else {
+        data = arrayToObj(chunk)
+      }
+      if (emitAll) {
+        outputMsg.result.push(data)
+      } else {
+        parseStream.pause()
+        await that.emit('data', messages.newMessageWithBody(data))
+        parseStream.resume()
+      }
+    }
+  }
+  const writerStream = new CsvWriter()
+  writerStream.logger = this.logger
+
+  try {
+    await pipeline(
+      dataStream.data,
+      parseStream,
+      writerStream
+    )
+    this.logger.info('File parsed successfully')
+  } catch (err) {
+    this.logger.error(`error during file parse: ${err}`);
+    this.emit('error', `error during file parse: ${err}`)
+    this.emit('end')
+    return
+  }
+
+  if (emitAll) {
+    await this.emit('data', messages.newMessageWithBody(outputMsg))
+  }
+  this.logger.info(`Complete, memory used: ${process.memoryUsage().heapUsed / 1024 / 1024} Mb`)
+}
+
+module.exports.process = readCSV
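
The action pauses the parser while each row is emitted so that rows are handled one at a time. For comparison, here is a minimal sketch of the same idea using only Node's built-in stream API: an object-mode `Writable` whose `write` option invokes its callback only after the per-row handler settles, which is how Node streams conventionally express backpressure. This is not the commit's code. `createRowSink` and `processRows` are hypothetical names, and `Readable.from` stands in for the attachment stream piped through papaparse.

```javascript
const { Readable, Writable, pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// Hypothetical helper: a sink that accepts parsed rows one at a time and
// waits for `handleRow` (for example an emit call) before taking the next.
function createRowSink(handleRow) {
  return new Writable({
    objectMode: true, // rows arrive as arrays or objects, not Buffers
    write(row, _encoding, callback) {
      // Calling `callback` signals readiness for the next row;
      // passing an error makes the surrounding pipeline reject.
      Promise.resolve()
        .then(() => handleRow(row))
        .then(() => callback(), callback);
    },
  });
}

// Assumption: `rows` stands in for the parsed CSV stream.
async function processRows(rows, handleRow) {
  await pipelineAsync(Readable.from(rows), createRowSink(handleRow));
}
```

A call such as `processRows([['aa', 'bb']], async (row) => { /* emit row */ })` resolves once every row has been handled, and rejects if the handler throws for any row.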

lib/schemas/read.in.json

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+{
+  "type": "object",
+  "properties": {
+    "url": {
+      "type": "string",
+      "required": true,
+      "title": "URL"
+    },
+    "header": {
+      "type": "boolean",
+      "required": false,
+      "title": "Contains headers"
+    },
+    "delimiter": {
+      "type": "string",
+      "required": false,
+      "title": "Delimiter"
+    },
+    "dynamicTyping": {
+      "type": "boolean",
+      "required": false,
+      "title": "Convert Data types"
+    }
+  }
+}
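
To make the input schema concrete, the sketch below shows an example message body that fits it, plus a small validator that mirrors the checks `lib/actions/read.js` performs before parsing. The function name `validateReadInput` and the sample values are hypothetical. The error strings are copied from the action.

```javascript
// Hypothetical helper mirroring the pre-parse checks in lib/actions/read.js.
// Returns an error message, or null when the body is acceptable.
function validateReadInput(body) {
  if (!body.url || body.url.length === 0) {
    return 'URL of the CSV is missing';
  }
  const booleanFields = [
    ['header', 'Contains headers'],
    ['dynamicTyping', 'Convert Data types'],
  ];
  for (const [key, title] of booleanFields) {
    const value = body[key];
    // undefined and '' are treated as "not set" and pass through.
    if (value !== undefined && value !== '' && typeof value !== 'boolean') {
      return `Non-boolean values are not supported by "${title}" field`;
    }
  }
  return null;
}

// Sample body matching the schema; the values are illustrative only.
const exampleBody = {
  url: 'http://my-url.com/foo.csv',
  header: true,
  delimiter: ';',
  dynamicTyping: false,
};

console.log(validateReadInput(exampleBody)); // null
```

Only `url` is required by the schema. The other three properties map one-to-one onto the `header`, `delimiter` and `dynamicTyping` parse options the action passes to papaparse.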
