feat(output): implement JSON output by Dustin-Jiang · Pull Request #1800 · sharkdp/fd

Dustin-Jiang · 2025-09-30T15:01:56Z

Implement --yaml switch for YAML format output, which allows using yq or nushell to interact with the result.

Actually I was initially working on #1765, but could not reach to a perfect state. JSON afraids trailing commas, making it not so easy to statelessly stream the result without a buffer. Considering a large number of potiential results, it would be memory-consuming to store all lines and print them at last.

On the other side, the YAML format, while infamous for its complexity on parsing, is friendly for streaming output. No need for serializing or extra dependencies, the simple and fast write! is all you need.

There are tools like yq that work just like beloved jq, and nushell supports YAML as well, so I suggest this PR might be able to close #1765.

tmccombs · 2025-09-30T15:37:03Z

I think i would really prefer json output. There are more tools that would support that as input.

As for the streaming problem, I think there are a couple of ways to handle that:

Write the comma at the beginning of each entry instead of the end
Use newline delimited json (ndjson) instead of a json array. Tools like jq can still parse this.
Just don't write the last item until we either have the next one, or have reached the end, so we know if we need a comma or not

Dustin-Jiang · 2025-09-30T18:17:33Z

I think i would really prefer json output. There are more tools that would support that as input.

Yeah definitely, but the original functions in output.rs were stateless, so I have to refactor it to make it stateful.

The good news is, fortunately, that we gain a little bit performance improvement after refactoring, maybe because of less param passing (I guess)?

tmccombs

Should also have some tests, and the man page should be updated.

As well as the things I commented on above.

src/config.rs

src/output.rs

src/walk.rs

doc/fd.1

src/cli.rs

src/output.rs

tmccombs · 2025-10-15T07:21:54Z

src/output.rs

+    // Try to convert to UTF-8 first
+    match std::str::from_utf8(bytes) {
+        Ok(utf8_str) => {
+            let escaped: String = utf8_str.escape_default().collect();


I don't think collecting into an intermediate string is strictly necessary, if PathEncoding included a lifetime based on the path.

collecting into an intermediate string

I'm a little bit confused, since the escape_default always allocates new memory.

In my understanding, to avoid an intermediate string, I should borrow the utf8_str and store in the PathEncoding as Cow<'a, str>, and escape it when write!().

But the PathEncoding is created per file and released after the certain file is printed (or not?), and it is a must (or not?) to .escape_default().collet() into a String when escaping, so I can't see differences between.

But I'm just a newbie to Rust and I'm pretty sure you are right :), so looking forward to your further suggestion.

So, you would define PathEncoding like

enum<'a> PathEncoding<'a> { Utf8(EscapeDefault<'a>), Bytes(&'a [u8]), }

Then this would be defined like:

fn encode_path(path: &Path) -> PathEncoding<'_> { match path.to_str() { Some(utf8) => PathEncoding::Utf8(utf8.escape_default()), None => PathEncoding::Bytes(path.as_os_str().as_encoded_bytes()), } }

And FileDetail would need a lifetime parameter as well.

I would strongly encourage y'all to look at how ripgrep encodes paths. And in particular, you really really really should not be using WTF-8 in your output format. From the WTF-8 spec:

Any WTF-8 data must be converted to a Unicode encoding at the system’s boundary before being emitted. UTF-8 is recommended. WTF-8 must not be used to represent text in a file format or for transmission over the Internet.

You'll also want to check how PathEncoding serializes. Sometimes just serializing a &[u8] leads to something sub-optimal (like an array of integers or something). This is why ripgrep base64 encodes data that isn't valid UTF-8.

src/output.rs

tavianator · 2025-11-04T13:33:17Z

Just want to throw out a reference to libxo as my favourite way for command line programs to support structured output formats. Not sure there's something like that in the Rust ecosystem.

tmccombs

I few minor suggestions, but I think it's close.

doc/fd.1

src/walk.rs

CHANGELOG.md

tmccombs · 2025-11-12T06:11:58Z

src/output.rs

+        write!(self.stdout, "}}")
+    }
+
+    fn print_entry_detail(&mut self, format: OutputFormat, entry: &DirEntry) -> io::Result<()> {


nit: We could potentially combine this with the print_entry_json_obj method now, since it is only used for jsonl.

OTOH, it may be useful in the future if we add a yaml format later, or maybe a tabular format?

src/output.rs

phiresky · 2025-11-24T12:18:10Z

It would be useful if there was a distinction between fd --json invocations that do stat calls vs ones that don't. Because for large directories, the stat calls are what dominates runtime, and fd can run e.g. 10x faster without them.

Compare for example

strace -f --summary-only fd -I >/dev/null

with

strace -f --summary-only fd -I -S +0B >/dev/null

The second call is much slower, because the -S forces fd to use stat to find file sizes, without it does not.

For JSON output, there's no reason to not output all the info we have - but we don't have the stat info by default.

For example the param could be called basic --json=basic and --json=stat.

tmccombs · 2025-11-26T07:50:38Z

There isn't much we can output without a full stat call. Just the filename, the file type, and the inode number (on unix).

tmccombs · 2025-11-26T07:52:41Z

@sharkdp since this is a pretty significant change, I'd like to confirm you are ok with this change before merging it.

sharkdp · 2025-11-28T19:28:26Z

I'd like to confirm you are ok with this change before merging it.

Thank you for asking. This looks like a great feature to have! And thank you for your contribution, @Dustin-Jiang!

While I'm here… the format that we introduce here is something that users will depend on, so it is worth investing some time to come up with a first version of this format that we can hopefully depend on for a long time:

In particular, I would like us to consider using the same format as ripgrep for paths (see this comment by BurntSushi).
Also, I would really like to see a unit being included in the size-field (size_bytes).
I was also wondering if "mode" should be a string? It's not supposed to be read as a decimal number, so serializing it as such feels wrong(?)

tmccombs · 2025-11-29T08:32:54Z

I was also wondering if "mode" should be a string? It's not supposed to be read as a decimal number, so serializing it as such feels wrong

I'm kind of split on this. Decimal is a pretty bad format for human consumption. But if this json is then consumed programatically, it is probably more convenient to have it as a number than a string.

Perhaps using the octal encoding as a string could be a good middle graound?

tmccombs · 2025-11-29T08:33:20Z

@Dustin-Jiang, if you want, I could help make those changes to this PR.

sharkdp · 2025-11-29T18:53:17Z

I was also wondering if "mode" should be a string? It's not supposed to be read as a decimal number, so serializing it as such feels wrong

I'm kind of split on this. Decimal is a pretty bad format for human consumption. But if this json is then consumed programatically, it is probably more convenient to have it as a number than a string.

Yes, exactly. The problem with human consumption is that, even if unlikely, it is technically possible for a mode to be ----rwxrwx, and the current format would serialize this as "mode": 77, which is confusing/ambiguous. "mode": "077" would be pretty clear.

I agree that we should probably optimize for programmatic consumption, though. But even then, the string seems more practical to me? Because I will probably want to split the number by owner/group/others, and that is much easier if it's already in this string notation. Converting a single digit from a character to an integer should be an operation that is easily available in every programming language.

It gets more tricky actually if we consider stricky/setgid/setuid where the mode can be something like 4777. That suggests that we should always serialize it as a four-character string, possibly with a leading zero?

Perhaps using the octal encoding as a string could be a good middle graound?

How is that different from what I am proposing? Something like "0644" or "4777" would be the octal encoding as a (fixed length) string, right? Or would you suggest "0o0644" / "0o4777"?

tmccombs · 2025-11-30T01:49:13Z

How is that different from what I am proposing?

Sorry, I meant as opposed to something like "rwxr-x---", which would probably be the most uset friendly but not very computer friendly in most cases.

This also addresses feedback from json PR: - mode is output as a string, using the octal representation - path uses the same format as the ripgrep output - use "size_bytes" instead of "size" to make the unit more clear Also, I fixed an issue where the mode included high bytes that are actually used to encode the filetype (at least on Linux).

tmccombs · 2025-12-23T09:03:42Z

src/fmt/json.rs

+    // as_encoded_bytes() isn't necessarily stable between rust versions
+    // so the best we can really do is a lossy string
+    #[cfg(not(unix))]
+    write!(out, r#""path":{{"text":{:?}}}"#, path.to_string_lossy())?;


This is what ripgrep does. Although I'm not sure if it would be better to do one of the following:

Use OsStr::as_encoded_bytes. Currently, on windows, I think this uses WTF-8, but the documentation makes it clear this isn't guaranteed, and could change in a future version of rust.

Output as (invalid) UTF-16, base64 encoded on windows. Probably not what most people would expect, but at least doesn't lose data.

Explicitly output using WTF-8 by converting the OsStr to [u16] (or an iterater over wide chars), then back to wtf8. Possibly using the "wtf8" crate. It seems wasteful, and not very performant, but at least we aren't reliant on rust's current implementation.

Yes, ripgrep has been doing this lossy encoding for years. So far there hasn't been a single complaint. I think non-UTF-16 paths are extremely rare on Windows. Much rarer than non-UTF-8 paths on Unix.

2. Output as (invalid) UTF-16, base64 encoded on windows. Probably not what most people would expect, but at least doesn't lose data.

If you want to avoid lossiness, I think this is the best option. ripgrep also does this for invalid UTF-8. (It looks like this PR does it too.) Namely, this avoids making it too easy to spread WTF-8 as an interchange format:

WTF-8 must not be used to represent text in a file format or for transmission over the Internet.

This also addresses feedback from json PR: - mode is output as a string, using the octal representation - path uses the same format as the ripgrep output - use "size_bytes" instead of "size" to make the unit more clear Also, I fixed an issue where the mode included high bytes that are actually used to encode the filetype (at least on Linux).

Unless we use binary output

Dustin-Jiang added 4 commits September 30, 2025 22:13

feat(output): add yaml output

ffec773

docs: update CHANGELOG

97d6bfb

fix: escape path, and disable permission display on Windows

cdb7bc0

fix(ci): fix warnings in cargo clippy

c9cbb25

Dustin-Jiang added 4 commits October 1, 2025 01:47

refactor: make output stateful

192ca92

feat(output): add json output

602b389

fix(ci): fix warnings in cargo clippy

eb2f10d

fix: fix function calling in Windows

4ea0ed3

Dustin-Jiang changed the title ~~feat(output): implement YAML output~~ feat(output): implement YAML and JSON output Sep 30, 2025

fix: fix reference mutable type annotation in Windows

e225946

tmccombs requested changes Oct 2, 2025

View reviewed changes

tmccombs requested a review from sharkdp October 2, 2025 07:54

Dustin-Jiang added 8 commits October 10, 2025 23:33

fix: resolve suggested changes

977ee0e

fix: move JSON array printing to Printer

6434ee5

feat: implement NDJSON output

b2d385f

fix(ci): fix warnings in cargo clippy

703b32f

tests: add tests for --output flags

650e86c

docs: update manpage for --output flags

2e463c7

Merge branch 'master' into feature-yaml

e46cce0

tests: fix invalid utf8 base64 test

10570e9

Dustin-Jiang requested a review from tmccombs October 15, 2025 07:22

tmccombs reviewed Oct 15, 2025

View reviewed changes

Dustin-Jiang added 3 commits October 15, 2025 16:39

docs: update manpage to change "ndjson" to "jsonl"

949a5aa

fix: change FileDetail creating logic and base64 import

cb3ef97

fix: change ndjson flag to commonly used jsonl

60ecc09

tmccombs reviewed Oct 28, 2025

View reviewed changes

src/output.rs Outdated Show resolved Hide resolved

Dustin-Jiang added 2 commits October 29, 2025 23:37

fix: replace String to &str with lifetime, adopt as_encoded_bytes

7c9f1d8

feat: add --json flag for JSONL output

2c1bdb5

Merge remote-tracking branch 'origin/master' into feature-yaml

e4741c0

Dustin-Jiang added 3 commits November 8, 2025 20:07

Merge remote-tracking branch 'origin/master' into feature-yaml

4a0ecc5

fix: remove the --output flag

56d347e

tests: fix --output tests to --json

6db4409

tmccombs requested changes Nov 12, 2025

View reviewed changes

Dustin-Jiang added 4 commits November 13, 2025 18:54

docs: add fields explaination in manual

c2c8497

docs: change the flag to --json in CHANGELOG.md

13d0868

fix(printer): make Priner.stdout private

47ee6ce

Merge remote-tracking branch 'origin/master' into feature-yaml

58b90f7

tmccombs approved these changes Nov 13, 2025

View reviewed changes

phiresky reviewed Nov 20, 2025

View reviewed changes

src/output.rs Outdated Show resolved Hide resolved

Dustin-Jiang added 2 commits November 20, 2025 22:14

fix: use jiff::Timestamp::try_from to process SystemTime

8f36886

Merge remote-tracking branch 'origin/master' into feature-yaml

64da6b5

tmccombs changed the title ~~feat(output): implement YAML and JSON output~~ feat(output): implement JSON output Nov 23, 2025

tmccombs mentioned this pull request Nov 23, 2025

Implement --list-details natively #1845

Open

tmccombs added 2 commits December 23, 2025 01:49

Merge branch 'master' into feature-yaml

766d684

tmccombs reviewed Dec 23, 2025

View reviewed changes

tmccombs added 2 commits December 24, 2025 01:01

fix: Use path separator in json output

2adb30d

Unless we use binary output

tmccombs mentioned this pull request Dec 29, 2025

feat(output): Add --json option #1866

Open

Uh oh!

Conversation

Dustin-Jiang commented Sep 30, 2025

Uh oh!

tmccombs commented Sep 30, 2025

Uh oh!

Dustin-Jiang commented Sep 30, 2025

Uh oh!

tmccombs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tmccombs Oct 15, 2025

Choose a reason for hiding this comment

Uh oh!

Dustin-Jiang Oct 15, 2025

Choose a reason for hiding this comment

Uh oh!

tmccombs Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

BurntSushi Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

tavianator commented Nov 4, 2025

Uh oh!

tmccombs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tmccombs Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

phiresky commented Nov 24, 2025

Uh oh!

tmccombs commented Nov 26, 2025

Uh oh!

tmccombs commented Nov 26, 2025

Uh oh!

sharkdp commented Nov 28, 2025

Uh oh!

tmccombs commented Nov 29, 2025

Uh oh!

tmccombs commented Nov 29, 2025

Uh oh!

sharkdp commented Nov 29, 2025

Uh oh!

tmccombs commented Nov 30, 2025

Uh oh!

tmccombs Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

BurntSushi Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

BurntSushi Dec 24, 2025 •

edited

Loading