This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Member

@SamSaffron SamSaffron commented Feb 18, 2025

  • Add non-contiguous search/replace support using ... syntax
  • Add judge support for evaluating LLM outputs with ratings
  • Improve error handling and reporting in eval runner
  • Add full section replacement support without search blocks
  • Add fabricators and specs for artifact diffing
  • Track failed searches to improve debugging
  • Add JS syntax validation for artifact versions in eval system
  • Update prompt documentation with clear guidelines

@SamSaffron SamSaffron changed the title from "eval edit" to "DEV: improve artifact editing and eval system" Feb 19, 2025
@SamSaffron SamSaffron marked this pull request as ready for review February 19, 2025 03:35
Comment on lines +64 to +69
content =
  DiscourseAi::Utils::DiffUtils::SimpleDiff.apply(
    content,
    block[:search],
    block[:replace],
  )
Contributor

@nattsw nattsw Feb 19, 2025

Is there a chance different blocks might target the same search section?

Member Author

Yeah, LLMs are crazy... but since this applies the blocks one at a time, later replaces targeting the same section will just fail.

My next iteration is going to hijack the "stream" and interrupt the LLM if it starts misbehaving... but I'm going to leave that for next time.
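
To illustrate the point about blocks being applied one at a time, here is a minimal sketch. It assumes a plain exact-match replace; `SketchDiff`, `NoMatchError`, and `apply_blocks` are hypothetical names made up for this example and are not the plugin's `DiscourseAi::Utils::DiffUtils::SimpleDiff`, whose actual matching rules are not shown in this thread.

```ruby
# Hypothetical stand-in for the real diff utility (assumption: exact matching only).
module SketchDiff
  class NoMatchError < StandardError; end

  # Replace the first exact occurrence of `search` in `content`,
  # raising if the search text is not present.
  def self.apply(content, search, replace)
    index = content.index(search)
    raise NoMatchError, "search text not found" if index.nil?

    content[0...index] + replace + content[(index + search.length)..]
  end
end

# Apply each block in order against the result of the previous one.
# Blocks whose search text no longer matches are recorded, not applied.
def apply_blocks(content, blocks)
  failed = []
  blocks.each do |block|
    begin
      content = SketchDiff.apply(content, block[:search], block[:replace])
    rescue SketchDiff::NoMatchError
      failed << block[:search]
    end
  end
  [content, failed]
end

source = "let total = 1;\nconsole.log(total);\n"
blocks = [
  { search: "let total = 1;", replace: "let total = 2;" },
  { search: "let total = 1;", replace: "let total = 3;" }, # same target again
]

result, failed = apply_blocks(source, blocks)
```

In this sketch the first block rewrites the line, so the second block's search text is gone by the time it runs: `result` keeps `let total = 2;` and `failed` holds the one unmatched search string. Collecting the failures rather than aborting is a design choice of the sketch, loosely mirroring the "track failed searches" item in the PR description.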

Contributor

@nattsw nattsw left a comment

Other than that replace question, we can try this out.

@SamSaffron SamSaffron merged commit 0c94660 into main Feb 19, 2025
6 checks passed
@SamSaffron SamSaffron deleted the eval-edit branch February 19, 2025 04:44