
Commit 2b05e15

bentsherman authored and pditommaso committed
Update "Working with files" docs page (#6801)
1 parent c7f7025 commit 2b05e15

2 files changed: 59 additions, 107 deletions

docs/reference/stdlib-types.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -600,10 +600,10 @@ The following methods are available for writing to files:
 : Appends text to a file without replacing existing content.
 
 `setText( text: String )`
-: Writes text to a file. Equivalent to setting the `text` property.
+: Writes text to a file, replacing any existing content. Equivalent to setting the `text` property.
 
 `write( text: String )`
-: Writes a string to a file, replacing any existing content.
+: Writes text to a file, replacing any existing content. Equivalent to `setText()`.
 
 <h3>Filesystem operations</h3>
 
````
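
The updated descriptions state that `setText()`, `write()`, and assignment to the `text` property all replace existing content, while `append()` adds to it. The sketch below illustrates that behavior under those stated semantics; the helper name `writeThenAppend` is hypothetical (not part of Nextflow), and it assumes `f` is a `Path` obtained from `file()`.

```nextflow
// Hypothetical helper (not part of Nextflow): exercises the write methods
// described in the updated reference and returns the final file contents.
def writeThenAppend(f, first, second) {
    f.text = 'stale content\n'   // assigning `text` replaces any existing content
    f.setText(first)             // equivalent to assigning the `text` property
    f.write(first)               // also replaces the contents, like setText()
    f.append(second)             // adds to the end without erasing existing content
    return f.text                // read the whole file back as a string
}
```

Called as `writeThenAppend(file('write_demo.txt'), 'hello\n', 'world\n')`, it returns `'hello\nworld\n'`: each replacing write discards the previous contents, so only the final `append()` accumulates.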

docs/working-with-files.md

Lines changed: 57 additions & 105 deletions
````diff
@@ -2,51 +2,52 @@
 
 # Working with files
 
-## Opening files
+## Retrieving files
 
-To access and work with files, use the `file()` method, which returns a file system object given a file path string:
+Use the `file()` function to obtain a reference to a file by name:
 
 ```nextflow
 myFile = file('some/path/to/my_file.file')
 ```
 
-The `file()` method can reference both files and directories, depending on what the string path refers to in the file system.
+The `file()` function can reference both files and directories.
 
-When using the wildcard characters `*`, `?`, `[]` and `{}`, the argument is interpreted as a [glob](http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob) path matcher and the `file()` method returns a list object holding the paths of files whose names match the specified pattern, or an empty list if no match is found:
+Use the `files()` function to obtain a list of files. When using the wildcard characters `*`, `?`, `[]` and `{}`, the file name is treated as a [glob](http://docs.oracle.com/javase/tutorial/essential/io/fileOps.html#glob) pattern, returning all files that match the given pattern, or an empty list if no matching files are found:
 
 ```nextflow
-listOfFiles = file('some/path/*.fa')
+listOfFiles = files('some/path/*.fa')
 ```
 
 :::{note}
-The `file()` method does not return a list if only one file is matched. Use the `files()` method to always return a list.
+The `file()` function can also be called with a glob pattern, as long as the pattern is intended to match exactly one file.
 :::
 
-:::{note}
-A double asterisk (`**`) in a glob pattern works like `*` but also searches through subdirectories.
-:::
+A double asterisk (`**`) in a glob pattern works like `*` but also searches through subdirectories:
 
-By default, wildcard characters do not match directories or hidden files. For example, if you want to include hidden files in the result list, enable the `hidden` option:
+```nextflow
+deeplyNestedFiles = files('some/path/**/*.fa')
+```
+
+By default, wildcard characters do not match directories or hidden files. Use the `hidden` option to include hidden files:
 
 ```nextflow
 listWithHidden = file('some/path/*.fa', hidden: true)
 ```
 
-:::{note}
-To compose paths, instead of string interpolation, use the `resolve()` method or the `/` operator:
+Given a file reference, you can use the `resolve()` method or the `/` operator to obtain files relative to that path:
 
 ```nextflow
 def dir = file('s3://bucket/some/data/path')
-def sample1 = dir.resolve('sample.bam') // correct
-def sample2 = dir / 'sample.bam'
-def sample3 = file("$dir/sample.bam") // correct (but verbose)
-def sample4 = "$dir/sample.bam" // incorrect
+
+dir.resolve('sample.bam') // correct
+dir / 'sample.bam'
+file("$dir/sample.bam") // correct (but verbose)
+"$dir/sample.bam" // incorrect
 ```
-:::
 
 ## Getting file attributes
 
-The `file()` method returns a [Path](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/Path.html), which has several methods for retrieving metadata about the file:
+The `file()` function returns a {ref}`Path <stdlib-types-path>`, which has several methods for retrieving metadata about the file:
 
 ```nextflow
 def path = file('/some/path/file.txt')
````
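
The glob rules described above (`*` skips hidden files, `hidden: true` includes them, `**` also descends into subdirectories) can be checked with a sketch like the following. The helper `globDemo`, the directory layout it creates, and the file names are all hypothetical; the patterns are built as strings because `files()` takes a glob pattern string rather than a `Path`.

```nextflow
// Hypothetical helper (not part of Nextflow): builds a small directory tree
// under `dir` and returns the file names matched by several glob patterns.
def globDemo(dir) {
    def nested = dir.resolve('nested')
    nested.mkdirs()                              // also creates `dir` if needed
    dir.resolve('a.fa').text = '>a\n'
    dir.resolve('.hidden.fa').text = '>h\n'      // hidden file (leading dot)
    nested.resolve('b.fa').text = '>b\n'

    // `*` matches files directly in `dir` and skips hidden files by default
    def shallow = files("$dir/*.fa").collect { p -> p.name }.sort()
    // `hidden: true` makes the wildcard match hidden files as well
    def withHidden = files("$dir/*.fa", hidden: true).collect { p -> p.name }.sort()
    // `**` also searches through subdirectories
    def deep = files("$dir/**/*.fa").collect { p -> p.name }.sort()

    return [shallow: shallow, withHidden: withHidden, deep: deep]
}
```

With `globDemo(file('glob_demo'))`, the `shallow` list is expected to contain only `a.fa`, `withHidden` to add `.hidden.fa`, and `deep` to include `b.fa` from the nested directory.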

````diff
@@ -57,159 +58,110 @@ assert path.name == 'file.txt'
 assert path.parent == '/some/path'
 ```
 
-:::{tip}
-When calling an object method, any method that looks like `get*()` can also be accessed as a field. For example, `path.getName()` is equivalent to `path.name`, `path.getBaseName()` is equivalent to `path.baseName`, and so on.
-:::
-
 See the {ref}`stdlib-types-path` reference for the list of available methods.
 
 ## Reading and writing
 
 ### Reading and writing an entire file
 
-Given a file variable, created with the `file()` method as shown previously, reading a file is as easy as getting the file's `text` property, which returns the file content as a string:
+Reading a file is as easy as using the file's `text` property, which returns the file contents as a string:
 
 ```nextflow
 print myFile.text
 ```
 
-Similarly, you can save a string to a file by assigning it to the file's `text` property:
+Similarly, you can write text to a file by assigning it to the file's `text` property:
 
 ```nextflow
 myFile.text = 'Hello world!'
 ```
 
-Binary data can be managed in the same way, just using the file property `bytes` instead of `text`. Thus, the following example reads the file and returns its content as a byte array:
-
-```nextflow
-binaryContent = myFile.bytes
-```
+This approach overwrites any existing file contents, and implicitly creates the file if it doesn't exist.
 
-Or you can save a byte array to a file:
+:::{tip}
+The `text` property is shorthand for the `getText()` and `setText()` methods:
 
 ```nextflow
-myFile.bytes = binaryContent
+println myFile.getText()
+myFile.setText('Hello world!')
 ```
-
-:::{note}
-The above assignment overwrites any existing file contents, and implicitly creates the file if it doesn't exist.
 :::
 
 :::{warning}
-The above methods read and write the **entire** file contents at once, in a single variable or buffer. For this reason, when dealing with large files it is recommended that you use a more memory efficient approach, such as reading/writing a file line by line or using a fixed size buffer.
+The above methods read and write the *entire* file contents at once, requiring the entire file to be loaded into memory. Consider using a more memory-efficient approach for large files, such as reading/writing the file line by line.
 :::
 
-### Appending to a file
-
-In order to append a string value to a file without erasing existing content, you can use the `append()` method:
-
-```nextflow
-myFile.append('Add this line\n')
-```
-
-Or use the left shift operator, a more idiomatic way to append text content to a file:
-
-```nextflow
-myFile << 'Add a line more\n'
-```
-
 ### Reading a file line by line
 
-In order to read a text file line by line you can use the method `readLines()` provided by the file object, which returns the file content as a list of strings:
-
-```nextflow
-myFile = file('some/my_file.txt')
-allLines = myFile.readLines()
-for( line : allLines ) {
-    println line
-}
-```
-
-This can also be written in a more idiomatic syntax:
+You can use the `readLines()` method to read a text file line by line:
 
 ```nextflow
 file('some/my_file.txt')
     .readLines()
-    .each { println it }
+    .each { line ->
+        println line
+    }
 ```
 
-:::{warning}
-The method `readLines()` reads the **entire** file at once and returns a list containing all the lines. For this reason, do not use it to read big files.
-:::
+The `readLines()` method loads the *entire* file into memory, so it is not ideal for large files.
 
-To process a big file, use the method `eachLine()`, which reads only a single line at a time into memory:
+You can use the `eachLine()` method to read line by line while only loading one line at a time into memory:
 
 ```nextflow
 count = 0
-myFile.eachLine { str ->
-    println "line ${count++}: $str"
+myFile.eachLine { line ->
+    println "line ${count++}: $line"
 }
 ```
 
-### Advanced file reading
-
-The classes `Reader` and `InputStream` provide fine-grained control for reading text and binary files, respectively.
+The `withReader()` method creates a [Reader](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/io/Reader.html) that you can use to read the file line by line, or even character by character. It is useful when you don't need to read the entire file.
 
-The method `newReader()` creates a [Reader](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/Reader.html) object for the given file that allows you to read the content as single characters, lines or arrays of characters:
+For example, to read only the first line of a file:
 
 ```nextflow
-myReader = myFile.newReader()
-String line
-while( line = myReader.readLine() ) {
-    println line
+myFile.withReader { r ->
+    def firstLine = r.readLine()
+    println firstLine
 }
-myReader.close()
 ```
 
-The method `withReader()` works similarly, but automatically calls the `close()` method for you when you have finished processing the file. So, the previous example can be written more simply as:
+### Writing a file line by line
+
+You can use the `append()` method or left shirt (`<<`) operator to append text to a file without erasing the existing contents:
 
 ```nextflow
-myFile.withReader {
-    String line
-    while( line = it.readLine() ) {
-        println line
-    }
-}
+myFile.append('Add this line\n')
+myFile << 'Add a line more\n'
 ```
 
-The methods `newInputStream()` and `withInputStream()` work similarly. The main difference is that they create an [InputStream](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/InputStream.html) object useful for writing binary data.
-
-See the {ref}`stdlib-types-path` reference for the list of available methods.
-
-### Advanced file writing
-
-The `Writer` and `OutputStream` classes provide fine-grained control for writing text and binary files, respectively, including low-level operations for single characters or bytes, and support for big files.
-
-For example, given two file objects `sourceFile` and `targetFile`, the following code copies the first file's content into the second file, replacing all `U` characters with `X`:
+For example, the following snippet copies the contents of a source file into a target file, replacing all `U` characters with `X`:
 
 ```nextflow
-sourceFile.withReader { source ->
-    targetFile.withWriter { target ->
-        String line
-        while( line=source.readLine() ) {
-            target << line.replaceAll('U','X')
-        }
-    }
+sourceFile.eachLine { line ->
+    targetFile << line.replaceAll('U', 'X')
 }
 ```
 
-See the {ref}`stdlib-types-path` reference for the list of available methods.
-
 ## Filesystem operations
 
-Methods for performing filesystem operations such as copying, deleting, and directory listing are documented in the {ref}`stdlib-types-path` reference.
+See the {ref}`stdlib-types-path` reference for the complete list of methods for performing filesystem operations.
 
 ### Listing directories
 
-The simplest way to list a directory is to use `list()` or `listFiles()`, which return a collection of first-level elements (files and directories) of a directory:
+You can use the `listFiles()` method to list the contents of a directory:
 
 ```nextflow
-for( def file : file('any/path').list() ) {
+children = file('any/path').list()
+children.each { file ->
     println file
 }
 ```
 
-Additionally, the `eachFile()` method allows you to iterate through the first-level elements only (just like `listFiles()`). As with other `each*()` methods, `eachFile()` takes a closure as a parameter:
+:::{versionchanged} 26.04.0
+The `listFiles()` method is deprecated -- use `listDirectory()` instead.
+:::
+
+You can use the `eachFile()` method to iterate through the contents of a directory:
 
 ```nextflow
 myDir.eachFile { item ->
````
myDir.eachFile { item ->

0 commit comments

Comments
 (0)