Add LLM benchmarking proposal. by crertel · Pull Request #31 · NixOS/GSoC

crertel · 2026-02-07T07:39:07Z

No description provided.

trueNAHO · 2026-02-14T18:31:16Z

ideas/2026.md

+
+## Generative Nix: Surveying LLM Proficiency In NixOS
+
+Effort: small (90 hours)


I would not trust any survey that took less than 100 hours to conduct.

For reference, see NixOS/nixpkgs#410741 (comment) for a possible "survey", although this one cannot be conducted externally.

Survey in the sense of "see what's out there", as one might survey a landscape to make a map--not survey as in "let's poll a bunch of people". Sorry for any miscommunication.

> Survey in the sense of "see what's out there", as one might survey a landscape to make a map--not survey as in "let's poll a bunch of people".

IIUC, you want to benchmark and rank LLMs to determine the currently best one for Nix. With LLMs constantly being obsoleted by better ones, would it not be better to establish a benchmark suite for continuously updating the ranking instead of providing a one-time ranking?

Delegating this to the Nix community sounds like a lot of effort when, IMHO, LLMs should be the ones promoting and declaring their own domain proficiencies.

Either way, take my input with a grain of salt because I am not really interested in using LLMs.

The deliverables for this project include exactly that: a reusable set of benchmarks for that purpose.

Eveeifyeve · 2026-02-23T09:45:15Z

This would be useful for #32

crertel added 3 commits February 7, 2026 01:38

Add LLM benchmarking proposal.

acd1b62

Update 2026.md

0812004

Update 2026.md

d352f58

trueNAHO suggested changes Feb 14, 2026


tomberek approved these changes Feb 15, 2026


crertel wants to merge 3 commits into NixOS:main from crertel:patch-1

crertel commented Feb 7, 2026


trueNAHO Feb 14, 2026


crertel Feb 14, 2026


trueNAHO Feb 14, 2026


crertel Feb 15, 2026


Eveeifyeve commented Feb 23, 2026



4 participants
