How to initialize channel from CSV file but only after CSV file created? #2154

bhpersonal · 2021-06-05T11:00:52Z

I'm trying to do something relatively simple: have one process produce a CSV file (external to the work directory), then feed the CSV rows into a channel once that process has completed.

Process A runs a Python script that generates a CSV file at an absolute path, say "/mnt/x/blah.csv".

Process B needs to consume the CSV file line by line, after process A has finished.

The trouble is that the input channel for B is initialized at startup, before A has created "/mnt/x/blah.csv", so it fails. This happens even if I collect the result of process A and pass it as an input to process B, because the CSV channel is initialized regardless.

nextflow.enable.dsl=2


def get_the_csv_records() {
    Channel
        .fromPath("/mnt/x/blah.csv")
        .splitCsv(header:true, quote: '\"')
        .map{ it.TheFieldIWant }
}

process process_a() {
    
    output:
    val("I'm done!")

    '''
    python.exe ./make_the_csv_file.py --path=/mnt/x/blah.csv
    '''
}

process process_b() {

    input:
    val(previous_step_is_done_dummy_variable)
    val(the_field)

    """
    echo ${the_field}
    """

}

workflow do_a_then_b{
    process_a()
    process_b(process_a.out.collect(), get_the_csv_records())
}

This fails immediately with "No such file: /mnt/x/blah.csv" because the channel is initialized before process_a has even started.

Same behavior occurs when using concat:

    process_b(process_a.out.collect().concat(get_the_csv_records()))

Question:

How can I make process_b() only read the input channel from the CSV file after it is created?
Alternatively, can I make process_b() trigger when process_a() is finished some other way?
Alternatively, can I add an intermediate step which reads process_a.out.collect(), then reads /mnt/x/blah.csv and outputs it into some existing channel which process_b() takes as input?
Another approach: how do I create a channel that emits the results of splitCsv when the file does not exist yet at the time the workflow is set up?

Answered by bentsherman

bentsherman · 2022-07-15T01:58:07Z

Maintainer

Hi @bhpersonal, I know you asked this a while ago, but I'll answer for the benefit of anyone who comes across this post.

Ideally, you should declare the CSV file as an output of process_a, then pass that output channel to process_b. That way, process_b will start only after process_a is done, and it will receive the CSV file directly in its work directory. Meanwhile, you can publish the CSV file from process_a to an output folder if you want.
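For anyone who wants to see what that could look like, here is a minimal sketch of the first suggestion. It is not tested against the original pipeline and rests on a few assumptions: that make_the_csv_file.py lives in the pipeline's project directory, that it accepts a relative path for --path, and that copying the result to /mnt/x with publishDir is an acceptable replacement for writing there directly. The channel name `fields` is made up for illustration.

```nextflow
nextflow.enable.dsl=2

process process_a {
    // Assumption: copy the finished CSV to the external folder from the question
    publishDir '/mnt/x', mode: 'copy'

    output:
    path 'blah.csv'

    script:
    // Assumption: the script sits in the project directory and
    // writes to the relative path given by --path (the task work directory)
    """
    python.exe ${projectDir}/make_the_csv_file.py --path=blah.csv
    """
}

process process_b {
    input:
    val the_field

    script:
    // Double quotes so that Nextflow interpolates the_field
    """
    echo "${the_field}"
    """
}

workflow {
    process_a()

    // process_a.out emits the CSV only after process_a has finished,
    // so splitCsv never sees a missing file
    fields = process_a.out
        .splitCsv(header: true, quote: '"')
        .map { row -> row.TheFieldIWant }

    process_b(fields)
}
```

Because the rows are derived from process_a.out, process_b runs once per CSV row and no separate "done" signal is needed.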

Alternatively, you can pass the "ready" channel you made for process_a to process_b as an input, which will at least ensure that process_b waits for process_a. But really, you should do the first thing I suggested.
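If the CSV really has to be written straight to the external path, one way to make the "ready" approach work is to build the CSV channel from the ready signal itself instead of from a separate Channel.fromPath call. The following is only a sketch under that assumption, not something taken from the original pipeline; the names `ready` and `fields` are hypothetical, and the script location in the project directory is assumed.

```nextflow
nextflow.enable.dsl=2

process process_a {
    output:
    val true   // "ready" signal, emitted once the task has completed

    script:
    // Assumption: the script sits in the project directory
    """
    python.exe ${projectDir}/make_the_csv_file.py --path=/mnt/x/blah.csv
    """
}

process process_b {
    input:
    val the_field

    script:
    """
    echo "${the_field}"
    """
}

workflow {
    process_a()

    // The file path is only resolved and read after process_a has emitted,
    // i.e. after the CSV exists
    fields = process_a.out
        .map { ready -> file('/mnt/x/blah.csv') }
        .splitCsv(header: true, quote: '"')
        .map { row -> row.TheFieldIWant }

    process_b(fields)
}
```

Note that Nextflow does not track /mnt/x/blah.csv as a task output here, which is one reason the first approach is preferable.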
