
Commit 17a434b

Create stdlib/fs.jou and DirIter to loop through directory contents (#1273)
1 parent bfab484 commit 17a434b

8 files changed, +568 -7 lines changed


.github/workflows/windows.yml

Lines changed: 2 additions & 1 deletion
@@ -171,10 +171,11 @@ jobs:
         shell: bash
       - run: cp -r repo/tests repo/runtests.sh "test dir"
         shell: bash
-      - name: "Delete tests that depend on the compiler and cannot work without it"
+      - name: "Delete tests that depend on files outside the tests folder"
         run: |
           rm -v "test dir"/tests/should_succeed/compiler_unit_tests.jou
           rm -v "test dir"/tests/should_succeed/keywords.jou
+          rm -v "test dir"/tests/should_succeed/fs_test.jou
         shell: bash
       - run: cd "test dir" && ./jou.exe --verbose examples/hello.jou
         shell: bash

doc/fs.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+# File system utilities
+
+This file documents `stdlib/fs.jou`.
+
+
+## Iterating the contents of a directory
+
+TL;DR:
+
+```python
+iter = DirIter{dir = "path/to/some/directory"}
+while iter.next():
+    printf("%s\n", iter.path) # path/to/some/directory/file.txt
+    printf("%s\n", iter.name) # file.txt
+
+if iter.error_code != 0:
+    printf("Error: %s\n", iter.error_message)
+```
+
+The `DirIter` class can be used to loop through the files and folders in a directory
+(also known as folder).
+
+When creating a `DirIter`, you should set all unused fields to zero
+by e.g. using [the `ClassName{}` syntax](classes.md#instantiating-syntax) as shown above.
+You can set the following fields:
+- `dir: byte*` is a path to the directory being listed. This is the only field that you must set.
+- `include_dot_and_dotdot: bool` can be set to `True`
+    if you want to get the special `.` and `..` entries when iterating the directory.
+    They are skipped by default.
+
+You should call `iter.next()` repeatedly until it returns `False`.
+Return value `True` means that a file or subdirectory was found,
+and `iter.path` and `iter.name` were updated accordingly.
+Return value `False` means that either an error occurred or the end of the directory was reached.
+If `.next()` has already returned `False`, calling `.next()` again returns `False` without doing anything.
+
+The memory used for iterating is freed when `.next()` returns `False`.
+This means that you don't need any cleanup,
+but to avoid leaking memory and the underlying directory handle,
+you shouldn't stop calling `.next()` until you get the `False`.
+Please [create an issue on GitHub](https://github.com/Akuli/jou/issues/new)
+if you want to stop the iterating early.
+
+After calling `.next()`, you can use the following fields:
+- `path: byte*` is the path to the file or subdirectory inside the given `dir`.
+    It consists of `dir`, a slash if `dir` does not already end with a slash, and a file or subdirectory name.
+    The string in `iter.path` is only valid until the following call to `.next()`,
+    so if you want to use the string after the following call to `.next()`,
+    you need to make a copy of the string.
+    This field is `NULL` if `iter.next()` returned `False`.
+- `name: byte*` is the file or subdirectory name without the rest of the path.
+    Similarly to `iter.path`, this is only valid until the following call to `.next()`
+    and you may need to make a copy.
+    This field is `NULL` if `iter.next()` returned `False`.
+- `error_code: int` is nonzero if `iter.next()` returned `False` due to an error,
+    and zero if no error has occurred.
+    This is [a Windows API error number](https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-) on Windows
+    and an [errno value](../stdlib/errno.jou) on other systems.
+- `error_message: byte[512]` is a human-readable error message,
+    or an empty string if no error has occurred.
+
+The iteration order is whatever the operating system and file system happen to produce,
+and you shouldn't rely on it.
+For example, you can [sort the strings](sorting.md#sorting-strings):
+
+```python
+import "stdlib/fs.jou"
+import "stdlib/io.jou"
+import "stdlib/list.jou"
+import "stdlib/mem.jou"
+import "stdlib/sort.jou"
+import "stdlib/str.jou"
+
+def main() -> int:
+    results = List[byte*]{}
+
+    iter = DirIter{dir = "doc/images"}
+    while iter.next():
+        results.append(strdup(iter.name))
+
+    if iter.error_code != 0:
+        printf("Error: %s\n", iter.error_message)
+        return 1
+
+    sort_strings(results.ptr, results.len)
+
+    # Output: 64bit-meme-small.jpg
+    # Output: 64bit-meme.jpg
+    # Output: sources.txt
+    for i = 0; i < results.len; i++:
+        puts(results.ptr[i])
+        free(results.ptr[i]) # Free the copy created with strdup()
+
+    free(results.ptr)
+    return 0
+```
+
+
+## Windows support
+
+On Windows, paths containing non-ASCII characters and very long paths may not work properly.
+The reason is that `stdlib/fs.jou` uses the ANSI versions of Windows API functions,
+such as `FindFirstFileA` and `FindNextFileA`.
+Please [create an issue on GitHub](https://github.com/Akuli/jou/issues/new)
+if you need to work with arbitrary Windows paths.
+A proper fix for this is planned, but not implemented.
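
Reviewer note: the calling protocol described in `doc/fs.md` can be modelled in a few lines of Python. This is only an illustrative sketch, not part of the commit and not how `stdlib/fs.jou` is implemented. The names `DirIterSketch` and `sorted_names` are hypothetical, `None` stands in for `NULL`, and because `os.listdir()` never reports `.` and `..`, the sketch adds them itself when `include_dot_and_dotdot` is set (their position in the order is an assumption).

```python
import os


class DirIterSketch:
    """Python model of the DirIter protocol from doc/fs.md (illustrative only)."""

    def __init__(self, dir, include_dot_and_dotdot=False):
        self.dir = dir
        self.include_dot_and_dotdot = include_dot_and_dotdot
        self.path = None          # None plays the role of NULL
        self.name = None
        self.error_code = 0
        self.error_message = ""
        self._names = None        # internal state, created by the first next()
        self._done = False

    def next(self):
        if self._done:
            return False          # after the first False, further calls do nothing
        if self._names is None:
            try:
                names = os.listdir(self.dir)
            except OSError as e:
                self._finish(e.errno, e.strerror)
                return False
            if self.include_dot_and_dotdot:
                names = [".", ".."] + names
            self._names = iter(names)
        name = next(self._names, None)
        if name is None:
            self._finish(0, "")   # end of directory, no error
            return False
        self.name = name
        self.path = self.dir + ("" if self.dir.endswith("/") else "/") + name
        return True

    def _finish(self, code, message):
        # Mirrors the documented behavior: outputs are cleared, error fields are set.
        self._done = True
        self._names = None
        self.path = None
        self.name = None
        self.error_code = code
        self.error_message = message


def sorted_names(dir):
    """Collect every name, then sort, since the iteration order is unspecified."""
    it = DirIterSketch(dir)
    names = []
    while it.next():
        names.append(it.name)     # a Python str is already an independent copy
    if it.error_code != 0:
        raise OSError(it.error_code, it.error_message)
    return sorted(names)
```

The loop in `sorted_names` always runs until `next()` returns `False`, which is the point where the documentation says resources are released.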

doctest.sh

Lines changed: 18 additions & 6 deletions
@@ -40,7 +40,7 @@ function generate_expected_output()
 {
     local joufile="$1"
 
-    (grep -onH '# Warning: .*' "$joufile" || true) | sed -E s/'(.*):([0-9]*):# Warning: '/'compiler warning for file "test.jou", line \2: '/
+    (grep -onH '# Warning: .*' "$joufile" || true) | sed -E s/'(.*):([0-9]*):# Warning: '/'compiler warning for file "\1", line \2: '/
     (grep -onH '# Error: .*' "$joufile" || true) | sed -E s/'(.*):([0-9]*):# Error: '/'compiler error in file "\1", line \2: '/
     (grep -oE '# Output:.*' "$joufile" || true) | sed -E s/'^# Output: ?'//
 }
@@ -67,14 +67,26 @@ done
 ntotal=0
 nfail=0
 
-cd tmp/doctest
-for file in */*.jou; do
+for file in tmp/doctest/*/*.jou; do
     # Print file and line number, as in "doc/foo.md:123: "
     # Newline is deleted to avoid warning on NetBSD 9.3, see issue #500
-    echo -n "$(basename "$(dirname "$file")" | tr -d '\n' | base64 -d):$(basename "$file" | cut -d'.' -f1 | sed 's/^0*//'): "
+    md_file="$(basename "$(dirname "$file")" | tr -d '\n' | base64 -d)"
+    md_lineno=$(basename "$file" | cut -d'.' -f1 | sed 's/^0*//')
+    echo -n "$md_file:$md_lineno: "
 
-    cp "$file" test.jou
-    if $diff --text -u <(generate_expected_output test.jou | tr -d '\r') <( ("$jou" test.jou 2>&1 || true) | tr -d '\r'); then
+    cp "$file" tmp/doctest/test.jou
+
+    if [[ $md_file =~ fs.md ]]; then
+        # These doctests refer files by path
+        working_dir="."
+        relative_path="tmp/doctest/test.jou"
+    else
+        # Some doctests contain assertion failures that mention "test.jou"
+        working_dir="tmp/doctest"
+        relative_path="test.jou"
+    fi
+
+    if $diff --text -u <(cd "$working_dir" && generate_expected_output "$relative_path" | tr -d '\r') <( (cd "$working_dir" && "$jou" "$relative_path" 2>&1 || true) | tr -d '\r'); then
         echo "ok"
     else
         ((nfail++)) || true
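
Reviewer note: the new loop body first recovers the Markdown file and line number from the extracted doctest path, then picks a working directory. The Python sketch below restates those two steps for readers less fluent in bash. It is an illustration under assumptions, not part of the commit: the function names `doctest_origin` and `run_location` are hypothetical, and the example file name `0067.jou` used in testing is made up.

```python
import base64
import os
import re


def doctest_origin(jou_path):
    """Return (md_file, md_lineno) for a path like tmp/doctest/<base64>/<digits>.jou.

    Mirrors the md_file and md_lineno assignments: the directory name is the
    base64-encoded Markdown path, and the file name is the line number with
    leading zeros, which are stripped here as `sed 's/^0*//'` does.
    """
    encoded_dir = os.path.basename(os.path.dirname(jou_path))
    md_file = base64.b64decode(encoded_dir).decode("utf-8")
    md_lineno = os.path.basename(jou_path).split(".")[0].lstrip("0")
    return md_file, md_lineno


def run_location(md_file):
    """Return (working_dir, relative_path) as chosen by the if/else in the loop."""
    # `[[ $md_file =~ fs.md ]]` is an unanchored regex match in which `.`
    # matches any character, so re.search() with the same pattern is the
    # closest Python equivalent.
    if re.search(r"fs.md", md_file):
        return ".", "tmp/doctest/test.jou"
    return "tmp/doctest", "test.jou"
```

Because the pattern is unanchored, any Markdown path that contains `fs`, one arbitrary character, and `md` takes the first branch, not only `doc/fs.md`.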

stdlib/fs.jou

Lines changed: 232 additions & 0 deletions
@@ -0,0 +1,232 @@
+import "stdlib/list.jou"
+import "stdlib/mem.jou"
+import "stdlib/str.jou"
+
+if WINDOWS:
+    import "stdlib/assert.jou"
+
+    # TODO: This should really use W functions, not A functions.
+    # But then we would need a way to convert between byte* and uint16*.
+    # I have an idea for that (WTF-8) but I haven't implemented it yet.
+    # The rest of the standard library will need changes too.
+    class WIN32_FIND_DATAA:
+        dwFileAttributes: uint32
+        ftCreationTime: uint32[2]
+        ftLastAccessTime: uint32[2]
+        ftLastWriteTime: uint32[2]
+        nFileSizeHigh: uint32
+        nFileSizeLow: uint32
+        dwReserved0: uint32
+        dwReserved1: uint32
+        cFileName: byte[260] # TODO: this can be quite limiting
+        cAlternateFileName: byte[14]
+
+    declare FindFirstFileA(FileName: byte*, FindFileData: WIN32_FIND_DATAA*) -> int64
+    declare FindNextFileA(hFindFile: int64, FindFileData: WIN32_FIND_DATAA*) -> int
+    declare FindClose(hFindFile: int64) -> int
+    const INVALID_HANDLE_VALUE: int64 = -1
+
+    declare GetLastError() -> uint32
+    const ERROR_PATH_NOT_FOUND: uint32 = 3
+    const ERROR_NO_MORE_FILES: uint32 = 18
+
+    declare FormatMessageA(
+        dwFlags: uint32,
+        lpSource: void*,
+        dwMessageId: uint32,
+        dwLanguageId: uint32,
+        lpBuffer: byte*,
+        nSize: uint32,
+        Arguments: void*, # actually va_list*
+    ) -> uint32
+    const FORMAT_MESSAGE_IGNORE_INSERTS: uint32 = 0x00000200
+    const FORMAT_MESSAGE_FROM_SYSTEM: uint32 = 0x00001000
+
+else:
+    import "stdlib/errno.jou"
+    import "stdlib/intnative.jou"
+
+    if LINUX:
+        # There are two versions of strerror_r(), and the one actually named
+        # strerror_r() is the wrong one.
+        declare __xpg_strerror_r(errnum: int, buf: byte*, buflen: intnative) -> int
+        def strerror_r(errnum: int, buf: byte*, buflen: intnative) -> int:
+            return __xpg_strerror_r(errnum, buf, buflen)
+    else:
+        declare strerror_r(errnum: int, buf: byte*, buflen: intnative) -> int
+
+    class DIR:
+        pass
+
+    if LINUX:
+        class dirent:
+            d_ino: uint64
+            d_off: int64
+            d_reclen: uint16
+            d_type: byte
+            d_name: byte[256]
+    elif MACOS:
+        # This struct definition was a bit painful to find. One way to find it
+        # is to run C preprocessor on '#include <dirent.h>' in GitHub Actions.
+        assert not IS_32BIT
+        class dirent:
+            d_ino: uint64
+            d_seekoff: uint64
+            d_reclen: uint16
+            d_namlen: uint16
+            d_type: byte
+            d_name: byte[1024]
+    elif NETBSD:
+        class dirent:
+            d_fileno: uint64
+            d_reclen: uint16
+            d_namlen: uint16
+            d_type: byte
+            d_name: byte[512]
+    else:
+        assert False # unsupported system
+
+    if NETBSD:
+        # On NetBSD, "opendir" and "readdir" are legacy functions.
+        # We can't use them because they generate a linker warning.
+        # The dirent.h header magically renames them at compile time to the following names.
+        declare __opendir30(name: byte*) -> DIR*
+        declare __readdir30(dirp: DIR*) -> dirent*
+        def opendir(name: byte*) -> DIR*:
+            return __opendir30(name)
+        def readdir(dirp: DIR*) -> dirent*:
+            return __readdir30(dirp)
+    else:
+        declare opendir(name: byte*) -> DIR*
+        declare readdir(dirp: DIR*) -> dirent*
+
+    declare closedir(dirp: DIR*) -> int
+
+
+# Iterating directory contents
+@public
+class DirIter:
+    # Inputs given by user
+    dir: byte*
+    include_dot_and_dotdot: bool
+
+    # Output
+    path: byte*
+    name: byte*
+
+    error_code: int # errno or GetLastError
+    error_message: byte[512]
+
+    # Internal state
+    path_list: List[byte]
+    if WINDOWS:
+        handle: int64
+    else:
+        dir_ptr: DIR*
+
+    def set_name(self, name: byte*) -> bool:
+        if (not self.include_dot_and_dotdot) and (strcmp(name, ".") == 0 or strcmp(name, "..") == 0):
+            return False
+
+        self.path_list.len = 0
+        self.path_list.extend_from_ptr(self.dir, strlen(self.dir))
+
+        if WINDOWS:
+            if not (ends_with(self.dir, "/") or ends_with(self.dir, "\\")):
+                self.path_list.append('\\')
+        else:
+            if not ends_with(self.dir, "/"):
+                self.path_list.append('/')
+
+        name_start_index = self.path_list.len
+        self.path_list.extend_from_ptr(name, strlen(name))
+        self.path_list.append('\0')
+
+        self.path = self.path_list.ptr
+        self.name = &self.path[name_start_index]
+        return True
+
+    @public
+    def next(self) -> bool:
+        if self.dir == NULL:
+            return False
+
+        if WINDOWS:
+            if self.dir[0] == '\0':
+                # "\\*" would search the root of the current drive. Let's not do that.
+                self.error_code = ERROR_PATH_NOT_FOUND as int
+                FormatMessageA(
+                    FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                    NULL, self.error_code as uint32, 0, self.error_message, sizeof(self.error_message) as uint32, NULL,
+                )
+                return False
+
+            # Jou initializes everything to zero, but INVALID_HANDLE_VALUE is
+            # more appropriate for this. Windows never uses zero as a find
+            # handle value, so this is fine.
+            if self.handle == 0:
+                self.handle = INVALID_HANDLE_VALUE
+
+            find_data: WIN32_FIND_DATAA
+            if self.handle == INVALID_HANDLE_VALUE:
+                # First file
+                pattern: byte* = NULL
+                asprintf(&pattern, "%s\\*", self.dir)
+                assert pattern != NULL
+
+                self.handle = FindFirstFileA(pattern, &find_data)
+                free(pattern)
+                found = (self.handle != INVALID_HANDLE_VALUE)
+            else:
+                # Not first file
+                found = (FindNextFileA(self.handle, &find_data) != 0)
+
+            while found:
+                if self.set_name(find_data.cFileName):
+                    return True
+                found = (FindNextFileA(self.handle, &find_data) != 0)
+
+            e = GetLastError() as int
+
+            free(self.path_list.ptr)
+            if self.handle != INVALID_HANDLE_VALUE:
+                FindClose(self.handle)
+            *self = DirIter{}
+
+            if e != ERROR_NO_MORE_FILES:
+                # It failed
+                self.error_code = e
+                FormatMessageA(
+                    FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                    NULL, self.error_code as uint32, 0, self.error_message, sizeof(self.error_message) as uint32, NULL,
+                )
+
+            return False
+
+        else:
+            if self.dir_ptr == NULL:
+                # This is the first time this is called
+                self.dir_ptr = opendir(self.dir)
+                if self.dir_ptr == NULL:
+                    # It failed
+                    *self = DirIter{error_code = get_errno()}
+                    strerror_r(self.error_code, self.error_message, sizeof(self.error_message))
+                    return False
+
+            while True:
+                set_errno(0)
+                entry = readdir(self.dir_ptr)
+                if entry == NULL:
+                    # End of directory, or error reading directory
+                    e = get_errno()
+                    free(self.path_list.ptr)
+                    closedir(self.dir_ptr)
+                    *self = DirIter{}
+                    if e != 0:
+                        # It failed
+                        self.error_code = e
+                        strerror_r(e, self.error_message, sizeof(self.error_message))
+                    return False
+
+                if self.set_name(entry.d_name):
+                    return True
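
Reviewer note: `set_name` is the part of this file that decides whether an entry is skipped and how `path` and `name` relate to each other. The Python sketch below restates that logic with strings in place of the `List[byte]` buffer. It is an illustration, not the committed code: the function name `entry_path` and its `windows` flag are hypothetical stand-ins for the compile-time `WINDOWS` constant.

```python
def entry_path(dir, name, windows=False, include_dot_and_dotdot=False):
    """Return (path, name_start_index), or None when the entry is skipped.

    Mirrors set_name(): "." and ".." are skipped unless requested, a separator
    is appended only when `dir` does not already end with one, and the name
    begins at name_start_index inside the joined path.
    """
    if not include_dot_and_dotdot and name in (".", ".."):
        return None

    # Windows accepts both separators at the end of `dir`; other systems only "/".
    separators = ("/", "\\") if windows else ("/",)
    path = dir
    if not path.endswith(separators):
        path += "\\" if windows else "/"

    name_start_index = len(path)
    return path + name, name_start_index
```

Slicing the returned path at `name_start_index` gives back the bare name, which corresponds to `self.name = &self.path[name_start_index]` pointing into the same buffer as `self.path`. That shared buffer is why both strings become invalid on the next call to `.next()`.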
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+hello there
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+hi
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+this is a file
