4 changes: 2 additions & 2 deletions docs/reference/stdlib-types.md
@@ -600,10 +600,10 @@ The following methods are available for writing to files:
: Appends text to a file without replacing existing content.

`setText( text: String )`
: Writes text to a file. Equivalent to setting the `text` property.
: Writes text to a file, replacing any existing content. Equivalent to setting the `text` property.

`write( text: String )`
: Writes a string to a file, replacing any existing content.
: Writes text to a file, replacing any existing content. Equivalent to `setText()`.

<h3>Filesystem operations</h3>

162 changes: 57 additions & 105 deletions docs/working-with-files.md
@@ -2,51 +2,52 @@

# Working with files

## Opening files
## Retrieving files

To access and work with files, use the `file()` method, which returns a file system object given a file path string:
Use the `file()` function to obtain a reference to a file by name:

```nextflow
myFile = file('some/path/to/my_file.file')
```

The `file()` method can reference both files and directories, depending on what the string path refers to in the file system.
The `file()` function can reference both files and directories.

When using the wildcard characters `*`, `?`, `[]` and `{}`, the argument is interpreted as a [glob](http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob) path matcher and the `file()` method returns a list object holding the paths of files whose names match the specified pattern, or an empty list if no match is found:
Use the `files()` function to obtain a list of files. When using the wildcard characters `*`, `?`, `[]` and `{}`, the file name is treated as a [glob](http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob) pattern, returning all files that match the given pattern, or an empty list if no matching files are found:

```nextflow
listOfFiles = file('some/path/*.fa')
listOfFiles = files('some/path/*.fa')
```
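
As a minimal sketch of how the list returned by `files()` might be consumed (the `some/path/*.fa` pattern is only a placeholder, and the variable names are illustrative), the result can be checked for emptiness and then iterated like any other list:

```nextflow
// 'some/path/*.fa' is a placeholder pattern, not a real location
def fastaFiles = files('some/path/*.fa')

// files() returns a list, so an empty match can be handled explicitly
if( fastaFiles.isEmpty() ) {
    println 'No matching files found'
}

// each element is a Path, so Path properties and methods are available
fastaFiles.each { fa ->
    println "${fa.name} (${fa.size()} bytes)"
}
```

Because `files()` returns a list, the same code should work whether the pattern matches zero, one, or many files.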

:::{note}
The `file()` method does not return a list if only one file is matched. Use the `files()` method to always return a list.
The `file()` function can also be called with a glob pattern, as long as the pattern is intended to match exactly one file.
:::

:::{note}
A double asterisk (`**`) in a glob pattern works like `*` but also searches through subdirectories.
:::
A double asterisk (`**`) in a glob pattern works like `*` but also searches through subdirectories:

By default, wildcard characters do not match directories or hidden files. For example, if you want to include hidden files in the result list, enable the `hidden` option:
```nextflow
deeplyNestedFiles = files('some/path/**/*.fa')
```

By default, wildcard characters do not match directories or hidden files. Use the `hidden` option to include hidden files:

```nextflow
listWithHidden = file('some/path/*.fa', hidden: true)
```

:::{note}
To compose paths, instead of string interpolation, use the `resolve()` method or the `/` operator:
Given a file reference, you can use the `resolve()` method or the `/` operator to obtain files relative to that path:

```nextflow
def dir = file('s3://bucket/some/data/path')
def sample1 = dir.resolve('sample.bam') // correct
def sample2 = dir / 'sample.bam'
def sample3 = file("$dir/sample.bam") // correct (but verbose)
def sample4 = "$dir/sample.bam" // incorrect

dir.resolve('sample.bam') // correct
dir / 'sample.bam'
file("$dir/sample.bam") // correct (but verbose)
"$dir/sample.bam" // incorrect
```
:::

## Getting file attributes

The `file()` method returns a [Path](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/Path.html), which has several methods for retrieving metadata about the file:
The `file()` function returns a {ref}`Path <stdlib-types-path>`, which has several methods for retrieving metadata about the file:

```nextflow
def path = file('/some/path/file.txt')
@@ -57,159 +58,110 @@ assert path.name == 'file.txt'
assert path.parent == '/some/path'
```

:::{tip}
When calling an object method, any method that looks like `get*()` can also be accessed as a field. For example, `path.getName()` is equivalent to `path.name`, `path.getBaseName()` is equivalent to `path.baseName`, and so on.
:::

See the {ref}`stdlib-types-path` reference for the list of available methods.

## Reading and writing

### Reading and writing an entire file

Given a file variable, created with the `file()` method as shown previously, reading a file is as easy as getting the file's `text` property, which returns the file content as a string:
Reading a file is as easy as using the file's `text` property, which returns the file contents as a string:

```nextflow
print myFile.text
```

Similarly, you can save a string to a file by assigning it to the file's `text` property:
Similarly, you can write text to a file by assigning it to the file's `text` property:

```nextflow
myFile.text = 'Hello world!'
```

Binary data can be managed in the same way, just using the file property `bytes` instead of `text`. Thus, the following example reads the file and returns its content as a byte array:

```nextflow
binaryContent = myFile.bytes
```
This approach overwrites any existing file contents, and implicitly creates the file if it doesn't exist.

Or you can save a byte array to a file:
:::{tip}
The `text` property is shorthand for the `getText()` and `setText()` methods:

```nextflow
myFile.bytes = binaryContent
println myFile.getText()
myFile.setText('Hello world!')
```

:::{note}
The above assignment overwrites any existing file contents, and implicitly creates the file if it doesn't exist.
:::

:::{warning}
The above methods read and write the **entire** file contents at once, in a single variable or buffer. For this reason, when dealing with large files it is recommended that you use a more memory efficient approach, such as reading/writing a file line by line or using a fixed size buffer.
The above methods read and write the *entire* file contents at once, requiring the entire file to be loaded into memory. Consider using a more memory-efficient approach for large files, such as reading/writing the file line by line.
:::

### Appending to a file

In order to append a string value to a file without erasing existing content, you can use the `append()` method:

```nextflow
myFile.append('Add this line\n')
```

Or use the left shift operator, a more idiomatic way to append text content to a file:

```nextflow
myFile << 'Add a line more\n'
```

### Reading a file line by line

In order to read a text file line by line you can use the method `readLines()` provided by the file object, which returns the file content as a list of strings:

```nextflow
myFile = file('some/my_file.txt')
allLines = myFile.readLines()
for( line : allLines ) {
println line
}
```

This can also be written in a more idiomatic syntax:
You can use the `readLines()` method to read a text file line by line:

```nextflow
file('some/my_file.txt')
.readLines()
.each { println it }
.each { line ->
println line
}
```

:::{warning}
The method `readLines()` reads the **entire** file at once and returns a list containing all the lines. For this reason, do not use it to read big files.
:::
The `readLines()` method loads the *entire* file into memory, so it is not ideal for large files.

To process a big file, use the method `eachLine()`, which reads only a single line at a time into memory:
You can use the `eachLine()` method to read line by line while only loading one line at a time into memory:

```nextflow
count = 0
myFile.eachLine { str ->
println "line ${count++}: $str"
myFile.eachLine { line ->
println "line ${count++}: $line"
}
```
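
The following sketch shows one way `eachLine()` could be used to accumulate a result without holding the whole file in memory. The file name `example.txt` and the variable `nonEmpty` are hypothetical, and the file is created inline only so that the snippet is self-contained:

```nextflow
// hypothetical file, created here so the example can run on its own
def myFile = file('example.txt')
myFile.text = 'first\n\nthird\n'

// count the non-blank lines, one line at a time
def nonEmpty = 0
myFile.eachLine { line ->
    if( line.trim() ) {
        nonEmpty += 1
    }
}

println "non-empty lines: ${nonEmpty}"
```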

### Advanced file reading

The classes `Reader` and `InputStream` provide fine-grained control for reading text and binary files, respectively.
The `withReader()` method creates a [Reader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/Reader.html) that you can use to read the file line by line, or even character by character. It is useful when you don't need to read the entire file.

The method `newReader()` creates a [Reader](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/Reader.html) object for the given file that allows you to read the content as single characters, lines or arrays of characters:
For example, to read only the first line of a file:

```nextflow
myReader = myFile.newReader()
String line
while( line = myReader.readLine() ) {
println line
myFile.withReader { r ->
def firstLine = r.readLine()
println firstLine
}
myReader.close()
```

The method `withReader()` works similarly, but automatically calls the `close()` method for you when you have finished processing the file. So, the previous example can be written more simply as:
### Writing a file line by line

You can use the `append()` method or left shift (`<<`) operator to append text to a file without erasing the existing contents:

```nextflow
myFile.withReader {
String line
while( line = it.readLine() ) {
println line
}
}
myFile.append('Add this line\n')
myFile << 'Add a line more\n'
```

The methods `newInputStream()` and `withInputStream()` work similarly. The main difference is that they create an [InputStream](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html) object useful for writing binary data.

See the {ref}`stdlib-types-path` reference for the list of available methods.

### Advanced file writing

The `Writer` and `OutputStream` classes provide fine-grained control for writing text and binary files, respectively, including low-level operations for single characters or bytes, and support for big files.

For example, given two file objects `sourceFile` and `targetFile`, the following code copies the first file's content into the second file, replacing all `U` characters with `X`:
For example, the following snippet copies the contents of a source file into a target file, replacing all `U` characters with `X`:

```nextflow
sourceFile.withReader { source ->
targetFile.withWriter { target ->
String line
while( line=source.readLine() ) {
target << line.replaceAll('U','X')
}
}
sourceFile.eachLine { line ->
targetFile << line.replaceAll('U', 'X')
}
```
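
One detail worth keeping in mind: `eachLine()` passes each line to the closure without its line terminator, so appending the lines unchanged would likely join them into a single line in the target file. The sketch below is one possible variant that re-adds the newline. The file names `source.txt` and `target.txt` are hypothetical, and both files are created inline only to keep the snippet self-contained:

```nextflow
// hypothetical files, created here so the example can run on its own
def sourceFile = file('source.txt')
def targetFile = file('target.txt')
sourceFile.text = 'AUG\nUUU\n'
targetFile.text = ''    // start from an empty target file

sourceFile.eachLine { line ->
    // re-add the line terminator that eachLine() strips
    targetFile << (line.replaceAll('U', 'X') + '\n')
}

print targetFile.text
```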

See the {ref}`stdlib-types-path` reference for the list of available methods.

## Filesystem operations

Methods for performing filesystem operations such as copying, deleting, and directory listing are documented in the {ref}`stdlib-types-path` reference.
See the {ref}`stdlib-types-path` reference for the complete list of methods for performing filesystem operations.

### Listing directories

The simplest way to list a directory is to use `list()` or `listFiles()`, which return a collection of first-level elements (files and directories) of a directory:
You can use the `listFiles()` method to list the contents of a directory:

```nextflow
for( def file : file('any/path').list() ) {
children = file('any/path').list()
children.each { file ->
println file
}
```

Additionally, the `eachFile()` method allows you to iterate through the first-level elements only (just like `listFiles()`). As with other `each*()` methods, `eachFile()` takes a closure as a parameter:
:::{versionchanged} 26.04.0
The `listFiles()` method is deprecated -- use `listDirectory()` instead.
:::

You can use the `eachFile()` method to iterate through the contents of a directory:

```nextflow
myDir.eachFile { item ->