diff --git a/docs/reference/stdlib-types.md b/docs/reference/stdlib-types.md index 5d46dca09f..6c91752015 100644 --- a/docs/reference/stdlib-types.md +++ b/docs/reference/stdlib-types.md @@ -600,10 +600,10 @@ The following methods are available for writing to files: : Appends text to a file without replacing existing content. `setText( text: String )` -: Writes text to a file. Equivalent to setting the `text` property. +: Writes text to a file, replacing any existing content. Equivalent to setting the `text` property. `write( text: String )` -: Writes a string to a file, replacing any existing content. +: Writes text to a file, replacing any existing content. Equivalent to `setText()`.

Filesystem operations

diff --git a/docs/working-with-files.md b/docs/working-with-files.md index 28cc987206..25e33a040d 100644 --- a/docs/working-with-files.md +++ b/docs/working-with-files.md @@ -2,51 +2,52 @@ # Working with files -## Opening files +## Retrieving files -To access and work with files, use the `file()` method, which returns a file system object given a file path string: +Use the `file()` function to obtain a reference to a file by name: ```nextflow myFile = file('some/path/to/my_file.file') ``` -The `file()` method can reference both files and directories, depending on what the string path refers to in the file system. +The `file()` function can reference both files and directories. -When using the wildcard characters `*`, `?`, `[]` and `{}`, the argument is interpreted as a [glob](http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob) path matcher and the `file()` method returns a list object holding the paths of files whose names match the specified pattern, or an empty list if no match is found: +Use the `files()` function to obtain a list of files. When using the wildcard characters `*`, `?`, `[]` and `{}`, the file name is treated as a [glob](http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob) pattern, returning all files that match the given pattern, or an empty list if no matching files are found: ```nextflow -listOfFiles = file('some/path/*.fa') +listOfFiles = files('some/path/*.fa') ``` :::{note} -The `file()` method does not return a list if only one file is matched. Use the `files()` method to always return a list. +The `file()` function can also be called with a glob pattern, as long as the pattern is intended to match exactly one file. ::: -:::{note} -A double asterisk (`**`) in a glob pattern works like `*` but also searches through subdirectories. 
-::: +A double asterisk (`**`) in a glob pattern works like `*` but also searches through subdirectories: -By default, wildcard characters do not match directories or hidden files. For example, if you want to include hidden files in the result list, enable the `hidden` option: +```nextflow +deeplyNestedFiles = files('some/path/**/*.fa') +``` + +By default, wildcard characters do not match directories or hidden files. Use the `hidden` option to include hidden files: ```nextflow -listWithHidden = file('some/path/*.fa', hidden: true) +listWithHidden = files('some/path/*.fa', hidden: true) ``` -:::{note} -To compose paths, instead of string interpolation, use the `resolve()` method or the `/` operator: +Given a file reference, you can use the `resolve()` method or the `/` operator to obtain files relative to that path: ```nextflow def dir = file('s3://bucket/some/data/path') -def sample1 = dir.resolve('sample.bam') // correct -def sample2 = dir / 'sample.bam' -def sample3 = file("$dir/sample.bam") // correct (but verbose) -def sample4 = "$dir/sample.bam" // incorrect + +dir.resolve('sample.bam') // correct +dir / 'sample.bam' // correct +file("$dir/sample.bam") // correct (but verbose) +"$dir/sample.bam" // incorrect ``` -::: ## Getting file attributes -The `file()` method returns a [Path](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/Path.html), which has several methods for retrieving metadata about the file: +The `file()` function returns a {ref}`Path <stdlib-types-path>`, which has several methods for retrieving metadata about the file: ```nextflow def path = file('/some/path/file.txt') @@ -57,159 +58,110 @@ assert path.name == 'file.txt' assert path.parent == '/some/path' ``` -:::{tip} -When calling an object method, any method that looks like `get*()` can also be accessed as a field. For example, `path.getName()` is equivalent to `path.name`, `path.getBaseName()` is equivalent to `path.baseName`, and so on. -::: - See the {ref}`stdlib-types-path` reference for the list of available methods.
## Reading and writing ### Reading and writing an entire file -Given a file variable, created with the `file()` method as shown previously, reading a file is as easy as getting the file's `text` property, which returns the file content as a string: +Reading a file is as easy as using the file's `text` property, which returns the file contents as a string: ```nextflow print myFile.text ``` -Similarly, you can save a string to a file by assigning it to the file's `text` property: +Similarly, you can write text to a file by assigning it to the file's `text` property: ```nextflow myFile.text = 'Hello world!' ``` -Binary data can be managed in the same way, just using the file property `bytes` instead of `text`. Thus, the following example reads the file and returns its content as a byte array: - -```nextflow -binaryContent = myFile.bytes -``` +This approach overwrites any existing file contents, and implicitly creates the file if it doesn't exist. -Or you can save a byte array to a file: +:::{tip} +The `text` property is shorthand for the `getText()` and `setText()` methods: ```nextflow -myFile.bytes = binaryContent +println myFile.getText() +myFile.setText('Hello world!') ``` - -:::{note} -The above assignment overwrites any existing file contents, and implicitly creates the file if it doesn't exist. ::: :::{warning} -The above methods read and write the **entire** file contents at once, in a single variable or buffer. For this reason, when dealing with large files it is recommended that you use a more memory efficient approach, such as reading/writing a file line by line or using a fixed size buffer. +The above methods read and write the *entire* file contents at once, requiring the entire file to be loaded into memory. Consider using a more memory-efficient approach for large files, such as reading/writing the file line by line. 
-### Appending to a file - -In order to append a string value to a file without erasing existing content, you can use the `append()` method: - -```nextflow -myFile.append('Add this line\n') -``` - -Or use the left shift operator, a more idiomatic way to append text content to a file: - -```nextflow -myFile << 'Add a line more\n' -``` - ### Reading a file line by line -In order to read a text file line by line you can use the method `readLines()` provided by the file object, which returns the file content as a list of strings: - -```nextflow -myFile = file('some/my_file.txt') -allLines = myFile.readLines() -for( line : allLines ) { - println line -} -``` - -This can also be written in a more idiomatic syntax: +You can use the `readLines()` method to read all lines of a text file into a list of strings: ```nextflow file('some/my_file.txt') .readLines() - .each { println it } + .each { line -> + println line + } ``` -:::{warning} -The method `readLines()` reads the **entire** file at once and returns a list containing all the lines. For this reason, do not use it to read big files. -::: +The `readLines()` method loads the *entire* file into memory, so it is not ideal for large files. -To process a big file, use the method `eachLine()`, which reads only a single line at a time into memory: +You can use the `eachLine()` method to iterate over a file while loading only one line at a time into memory: ```nextflow count = 0 -myFile.eachLine { str -> - println "line ${count++}: $str" +myFile.eachLine { line -> + println "line ${count++}: $line" } ``` -### Advanced file reading - -The classes `Reader` and `InputStream` provide fine-grained control for reading text and binary files, respectively. +The `withReader()` method creates a [Reader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/Reader.html) that you can use to read the file line by line, or even character by character, and closes it automatically when the closure completes. It is useful when you only need to read part of the file.
-The method `newReader()` creates a [Reader](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/Reader.html) object for the given file that allows you to read the content as single characters, lines or arrays of characters: +For example, to read only the first line of a file: ```nextflow -myReader = myFile.newReader() -String line -while( line = myReader.readLine() ) { - println line +myFile.withReader { r -> + def firstLine = r.readLine() + println firstLine } -myReader.close() ``` -The method `withReader()` works similarly, but automatically calls the `close()` method for you when you have finished processing the file. So, the previous example can be written more simply as: +### Writing a file line by line + +You can use the `append()` method or the left shift (`<<`) operator to append text to a file without erasing the existing contents: ```nextflow -myFile.withReader { - String line - while( line = it.readLine() ) { - println line - } -} +myFile.append('Add this line\n') +myFile << 'Add another line\n' ``` -The methods `newInputStream()` and `withInputStream()` work similarly. The main difference is that they create an [InputStream](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html) object useful for writing binary data. - -See the {ref}`stdlib-types-path` reference for the list of available methods. - -### Advanced file writing - -The `Writer` and `OutputStream` classes provide fine-grained control for writing text and binary files, respectively, including low-level operations for single characters or bytes, and support for big files.
-For example, given two file objects `sourceFile` and `targetFile`, the following code copies the first file's content into the second file, replacing all `U` characters with `X`: +For example, the following snippet appends each line of a source file to a target file, replacing all `U` characters with `X`. Since `eachLine()` strips line terminators, a newline is added back to each line: ```nextflow -sourceFile.withReader { source -> - targetFile.withWriter { target -> - String line - while( line=source.readLine() ) { - target << line.replaceAll('U','X') - } - } +sourceFile.eachLine { line -> + targetFile << line.replaceAll('U', 'X') + '\n' } ``` -See the {ref}`stdlib-types-path` reference for the list of available methods. - ## Filesystem operations -Methods for performing filesystem operations such as copying, deleting, and directory listing are documented in the {ref}`stdlib-types-path` reference. +See the {ref}`stdlib-types-path` reference for the complete list of methods for performing filesystem operations. ### Listing directories -The simplest way to list a directory is to use `list()` or `listFiles()`, which return a collection of first-level elements (files and directories) of a directory: +You can use the `list()` method to list the contents of a directory: ```nextflow -for( def file : file('any/path').list() ) { +children = file('any/path').list() +children.each { file -> println file } ``` +:::{versionchanged} 26.04.0 +The `listFiles()` method is deprecated -- use `listDirectory()` instead. +::: + +You can use the `eachFile()` method to iterate through the contents of a directory: ```nextflow myDir.eachFile { item ->