charfreq-rs 🦀

Count the occurrences of characters in a codebase or any directory.

A Rust rewrite of https://github.com/plumj-am/char-freq.

The original Python implementation was created to determine the symbols I use most when writing code so I could optimise the layout on my split keyboard.

This is my first actual project written in Rust outside of learning exercises, so it was mostly for practice.

If improvements can be made, please open a PR or issue!

Usage:

Install

cargo install charfreq

Run

Usage: charfreq [OPTIONS] --dir <REPO_PATH>

Options:
  -d, --dir <REPO_PATH>            Path to the repository
  -t, --top <TOP>                  Number of top characters to display [default: 20]
  -s, --show-spaces                Include spaces and whitespace characters in the output
  -e, --exclude-letters            Exclude all letters (A-Z, a-z) from the output
  -c, --csv                        Save results as CSV in the current working directory
  -v, --verbose                    Show files with errors during the scan (usually invalid file types)
  -i, --ignore <IGNORE_FILETYPES>  Additional filetypes to ignore (comma-separated or once for each filetype)
  -I, --ignore-dir <IGNORE_DIRS>   Additional directories to ignore (comma-separated or once for each directory)
  -h, --help                       Print help

Example:

$ ./charfreq -d ~/projects/charfreq-rs --top 5 --exclude-letters

This shows the top 5 non-alphabetic characters in the given codebase.
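To illustrate what the counting and the `--top`, `--exclude-letters` and `--show-spaces` options amount to, here is a minimal sketch. It is not the code in this repository: the function names `count_chars` and `top_chars` are hypothetical, and it works on an in-memory string rather than scanning a directory.

```rust
use std::collections::HashMap;

/// Count how often each character occurs in `text`.
fn count_chars(text: &str) -> HashMap<char, u64> {
    let mut counts = HashMap::new();
    for ch in text.chars() {
        *counts.entry(ch).or_insert(0) += 1;
    }
    counts
}

/// Return the `top` most frequent characters.
/// `exclude_letters` drops A-Z and a-z (like `--exclude-letters`);
/// whitespace is only kept when `show_spaces` is true (like `--show-spaces`).
/// Hypothetical helper for illustration, not the project's actual API.
fn top_chars(
    counts: &HashMap<char, u64>,
    top: usize,
    exclude_letters: bool,
    show_spaces: bool,
) -> Vec<(char, u64)> {
    let mut entries: Vec<(char, u64)> = counts
        .iter()
        .map(|(&ch, &n)| (ch, n))
        .filter(|&(ch, _)| !(exclude_letters && ch.is_ascii_alphabetic()))
        .filter(|&(ch, _)| show_spaces || !ch.is_whitespace())
        .collect();
    // Highest count first; ties are broken by character so the order is deterministic.
    entries.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    entries.truncate(top);
    entries
}
```

Calling `top_chars(&count_chars(source), 5, true, false)` on a source string roughly mirrors `--top 5 --exclude-letters` for a single file.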

[!NOTE] Many filetypes (e.g. .exe, .mp3) and directories (e.g. node_modules/, .idea/) are ignored by default.

A full list of ignored filetypes and directories can be found in src/scanner.rs.
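As a rough sketch of how such filtering can work (assuming matching on the file extension and on directory names in the path; the real logic and the full lists are in src/scanner.rs and may differ), the constants and the `is_ignored` function below are hypothetical and only contain the examples mentioned above:

```rust
use std::path::Path;

// Illustrative defaults only; the actual lists in src/scanner.rs are much longer.
const IGNORED_EXTENSIONS: &[&str] = &["exe", "mp3"];
const IGNORED_DIRS: &[&str] = &["node_modules", ".idea"];

/// True if `name` appears in the default list or in the user-supplied extras.
fn listed(name: &str, defaults: &[&str], extra: &[&str]) -> bool {
    defaults.contains(&name) || extra.contains(&name)
}

/// Decide whether `path` should be skipped.
/// `extra_exts` and `extra_dirs` stand in for the values passed via
/// `--ignore` and `--ignore-dir` (hypothetical wiring).
fn is_ignored(path: &Path, extra_exts: &[&str], extra_dirs: &[&str]) -> bool {
    // Extension check: "song.mp3" has the extension "mp3".
    let ext_ignored = path
        .extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| listed(e, IGNORED_EXTENSIONS, extra_exts));

    // Directory check: any path component with an ignored name skips the path.
    let dir_ignored = path.components().any(|c| {
        c.as_os_str()
            .to_str()
            .map_or(false, |name| listed(name, IGNORED_DIRS, extra_dirs))
    });

    ext_ignored || dir_ignored
}
```

In this sketch extensions are compared without the leading dot and matching is case-sensitive; both are assumptions rather than documented behaviour.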

Benchmarks

Test

Tool: hyperfine

Tested on:

Linux kernel source tree: torvalds/linux
90_958 files
1_533_310_419 characters

Hardware:

CPU: i5-13600KF @5.3GHz (OC)
RAM: 2x16GB DDR5 G.Skill Z5 Trident @7000MT/s (OC)
MOBO: Gigabyte Z790 AORUS ELITE AX
SSD: Kingston SKC3000S1024G NVME SSD
OS: NixOS 25.11 (Xantusia) x86_64
KERNEL: Linux 6.17.2-zen1

$ hyperfine --warmup=10 --runs=10 --shell=NONE \
  'python3 ./char-freq/char_freq.py ./linux' \
  './charfreq-rs/target/release/charfreq -d ./linux'

^ Compares the latest version to the original Python script.

Latest results

Benchmark 1: python3 ./char-freq/char_freq.py ./linux
  Time (mean ± σ):     35.116 s ±  0.169 s    [User: 34.792 s, System: 0.284 s]
  Range (min … max):   34.886 s … 35.351 s    10 runs

Benchmark 2: ./charfreq-rs/target/release/charfreq -d ./linux
  Time (mean ± σ):     168.5 ms ±  18.0 ms    [User: 2005.4 ms, System: 573.9 ms]
  Range (min … max):   152.0 ms … 206.5 ms    10 runs

Summary
  ./charfreq-rs/target/release/charfreq -d ./linux ran
  208.45 ± 22.25 times faster than python3 ./char-freq/char_freq.py ./linux

TL;DR: The latest Rust version is ~208x faster than the original Python script.
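For anyone wanting to sanity-check the summary line, the ratio and its uncertainty can be approximated from the means and standard deviations printed above. The sketch below assumes standard first-order error propagation for a ratio of two independent measurements; the function name `speedup` is hypothetical.

```rust
/// Speedup of the fast run over the slow run, with propagated uncertainty.
/// Assumes the two measurements are independent and uses first-order
/// error propagation: rel_err = sqrt((sd_a / mean_a)^2 + (sd_b / mean_b)^2).
fn speedup(slow_mean: f64, slow_sd: f64, fast_mean: f64, fast_sd: f64) -> (f64, f64) {
    let ratio = slow_mean / fast_mean;
    let rel_err = ((slow_sd / slow_mean).powi(2) + (fast_sd / fast_mean).powi(2)).sqrt();
    (ratio, ratio * rel_err)
}
```

With the rounded values from the output, `speedup(35.116, 0.169, 0.1685, 0.0180)` gives roughly 208.4 ± 22.3, which is close to the reported 208.45 ± 22.25; the small gap is presumably because the printed values are rounded.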

Improvements

Testing
Push performance further

License

This project is licensed under the MIT license (LICENSE or http://opensource.org/licenses/MIT).
