2 changes: 2 additions & 0 deletions .gitignore
@@ -94,6 +94,8 @@ __testing__
# New spans
apps/web/ingest
apps/workers/workspaces/*/traces
provider-logs/*
Contributor (author) commented:
I might have gotten the "missing completion metadata" issue because I didn't have the FILES_STORAGE_PATH, FILE_PUBLIC_PATH, and PUBLIC_FILES_STORAGE_PATH env vars set. Adding them generated lots of new files in these folders, so I'm adding the folders to the .gitignore.
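For anyone who hits the same "missing completion metadata" error locally, this is the kind of env config I mean. A minimal sketch — every value below is a hypothetical placeholder, not a documented default:

# .env — hypothetical placeholder values; adjust to your local setup
FILES_STORAGE_PATH=/tmp/latitude/files
FILE_PUBLIC_PATH=files
PUBLIC_FILES_STORAGE_PATH=/tmp/latitude/public-files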

workspaces/*

# Helm chart secrets (do not commit)
charts/latitude/values.secrets.yaml
packages/core/src/data-access/issues/getSpanMessagesAndEvaluationResultsByIssue.test.ts
@@ -122,8 +122,8 @@
provider: 'openai',
model: 'gpt-4o',
configuration: {},
input: input as any,
output: output as any,
}

const disk = diskFactory('private')
@@ -167,7 +167,7 @@
span: {
...span,
type: SpanType.Prompt,
} as any,
hasPassed,
...(error ? { error } : {}),
...(metadata
@@ -178,10 +178,10 @@
expectedOutput: metadata.expectedOutput ?? 'expected output',
datasetLabel: metadata.datasetLabel ?? 'default',
reason: metadata.reason,
} as any,
}
: {}),
} as any)

await createIssueEvaluationResult({
workspace,
@@ -221,6 +221,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -256,6 +257,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -283,6 +285,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -307,6 +310,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -327,7 +331,7 @@
})

await createEvaluationResultForIssue({
span: httpSpan as any,
metadata: {
reason: 'This should be skipped',
},
@@ -337,6 +341,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -366,6 +371,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(false)
@@ -379,6 +385,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -426,6 +433,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -535,6 +543,7 @@
workspace,
commit: commit2,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -642,6 +651,7 @@
workspace,
commit: draftCommit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -713,14 +723,14 @@
span: {
...span2,
type: SpanType.Prompt,
} as any,
hasPassed: false,
metadata: {
configuration: llmEvaluation.configuration,
actualOutput: 'actual output',
expectedOutput: 'expected output',
reason: 'LLM evaluation reason',
} as any,
} as any)

await createIssueEvaluationResult({
Expand All @@ -733,6 +743,7 @@
workspace,
commit,
issue,
existingEvaluations: [evaluation],
})

expect(result.ok).toBe(true)
@@ -5,13 +5,18 @@ import {
EvaluationResultsV2Repository,
} from '../../repositories'
import { Result } from '../../lib/Result'
import { EvaluationResultV2 } from '../../constants'
import {
EvaluationResultSuccessValue,
EvaluationResultV2,
EvaluationV2,
} from '../../constants'
import { Issue } from '../../schema/models/types/Issue'
import { Message as LegacyMessage } from '@latitude-data/constants/legacyCompiler'
import { assembleTraceWithMessages } from '../../services/tracing/traces/assemble'
import { adaptCompletionSpanMessagesToLegacy } from '../../services/tracing/spans/fetching/findCompletionSpanFromTrace'
import { UnprocessableEntityError } from '../../lib/errors'
import { getHITLSpansByIssue } from './getHITLSpansByIssue'
import { getEvaluationMetricSpecification } from '../../services/evaluationsV2/specifications'

export type SpanMessagesWithEvalResultReason = {
messages: LegacyMessage[]
@@ -29,10 +34,12 @@ export async function getSpanMessagesAndEvaluationResultsByIssue({
workspace,
commit,
issue,
existingEvaluations,
}: {
workspace: Workspace
commit: Commit
issue: Issue
existingEvaluations: EvaluationV2[]
}) {
// Three is enough, as we don't want to overfit or add too many tokens to the prompt
const spansResult = await getHITLSpansByIssue({
@@ -79,30 +86,37 @@
}

// There will always be exactly one evaluation result for a span and trace id
const evaluationResults = evaluationResultsResult.filter(
const evaluationResult = evaluationResultsResult.find(
(result) =>
result.evaluatedSpanId === span.id &&
result.evaluatedTraceId === span.traceId,
)[0]
)

const evaluation = evaluationResult
? existingEvaluations.find(
(e) => e.uuid === evaluationResult.evaluationUuid,
)
: undefined

messagesAndEvaluationResults.push({
messages: adaptCompletionSpanMessagesToLegacy(completionSpan),
reason: getReasonFromEvaluationResult(evaluationResults),
reason: getReasonFromEvaluationResult(evaluationResult, evaluation),
})
}

return Result.ok(messagesAndEvaluationResults)
}

// We need an efficient way of extracting reasons directly from metadata without fetching evaluations
function getReasonFromEvaluationResult(result: EvaluationResultV2) {
if (result.error || !result.metadata) {
function getReasonFromEvaluationResult(
result: EvaluationResultV2 | undefined,
evaluation: EvaluationV2 | undefined,
): string {
if (!result || !evaluation || result.error) {
return ''
}

// LLM, Rule, and Human evaluations all have a reason field (required for LLM, optional for Rule/Human)
if ('reason' in result.metadata && result.metadata.reason) {
return result.metadata.reason ?? ''
}
return ''
const specification = getEvaluationMetricSpecification(evaluation)
const resultReason = specification.resultReason as (
result: EvaluationResultSuccessValue,
) => string | undefined
return resultReason(result) ?? ''
Contributor commented on lines +118 to +121:
Suggested change
const resultReason = specification.resultReason as (
result: EvaluationResultSuccessValue,
) => string | undefined
return resultReason(result) ?? ''
const resultReason = specification.resultReason(result as EvaluationResultSuccessValue)
return resultReason ?? ''

Contributor (author) replied:
TS type complaint with this:

rule.ts(78, 5): 'pattern' is declared here.

}
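To make the type complaint concrete, here is a minimal standalone sketch — the types are made-up stand-ins, not the real metric specifications. With a union of specifications, TypeScript intersects the parameter types of resultReason, so a plain success value is rejected with an error pointing at a variant-specific property such as pattern; the explicit cast in the diff widens the signature to work around that.

// Made-up stand-in types; not the real evaluation metric specifications.
type RuleResult = { score: number; pattern: string }
type LlmResult = { score: number; reason: string }

type RuleSpec = { resultReason: (r: RuleResult) => string | undefined }
type LlmSpec = { resultReason: (r: LlmResult) => string | undefined }

const ruleSpec: RuleSpec = { resultReason: (r) => `matched ${r.pattern}` }
const llmSpec: LlmSpec = { resultReason: (r) => r.reason }

// Which specification applies is only known at runtime.
const spec: RuleSpec | LlmSpec = Math.random() > 0.5 ? ruleSpec : llmSpec
const result = { score: 1 }

// spec.resultReason(result)
// ^ Error: the union's call signature expects RuleResult & LlmResult,
//   so TypeScript reports that 'pattern' is missing ("'pattern' is declared here").

// The workaround used above: widen the method's signature explicitly.
const resultReason = spec.resultReason as (r: { score: number }) => string | undefined
console.log(resultReason(result) ?? '')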
@@ -99,24 +99,28 @@ export async function generateEvaluationConfigFromIssueWithCopilot(
const copilot = copilotResult.unwrap()

// Get the existing evaluation names for the same commit and document to avoid generating evals with the same name (unique key)
const existingEvaluationNamesResult = await getExistingEvaluationNames({
workspace: workspace,
commit: commit,
issue: issue,
})

if (!Result.isOk(existingEvaluationNamesResult)) {
return existingEvaluationNamesResult
const evaluationsRepository = new EvaluationsV2Repository(workspace.id)
const evaluationsFromSameCommitAndDocumentResult =
await evaluationsRepository.listAtCommitByDocument({
projectId: commit.projectId,
commitUuid: commit.uuid,
documentUuid: issue.documentUuid,
})
if (!Result.isOk(evaluationsFromSameCommitAndDocumentResult)) {
return evaluationsFromSameCommitAndDocumentResult
}
const existingEvaluations =
evaluationsFromSameCommitAndDocumentResult.unwrap()

const existingEvaluationNames = existingEvaluationNamesResult.unwrap()
const existingEvaluationNames = existingEvaluations.map((e) => e.name)

// Getting failed examples (evaluation results with the issue attached) to feed the copilot
const messagesAndReasonWhyFailedForIssueResult =
await getSpanMessagesAndEvaluationResultsByIssue({
workspace: workspace,
commit: commit,
issue: issue,
existingEvaluations,
})

if (!Result.isOk(messagesAndReasonWhyFailedForIssueResult)) {
@@ -197,31 +201,6 @@
return Result.ok(evaluationConfigWithProviderAndModel)
}

async function getExistingEvaluationNames({
workspace,
commit,
issue,
}: {
workspace: Workspace
commit: Commit
issue: Issue
}) {
const evaluationsRepository = new EvaluationsV2Repository(workspace.id)
const evaluationsFromSameCommitAndDocumentResult =
await evaluationsRepository.listAtCommitByDocument({
projectId: commit.projectId,
commitUuid: commit.uuid,
documentUuid: issue.documentUuid,
})
if (!Result.isOk(evaluationsFromSameCommitAndDocumentResult)) {
return evaluationsFromSameCommitAndDocumentResult
}
const existingEvaluations =
evaluationsFromSameCommitAndDocumentResult.unwrap()

return Result.ok(existingEvaluations.map((e) => e.name))
}

export async function getSpansFromSpanAndTraceIdPairs({
spanAndTraceIdPairs,
workspace,
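Net effect of this file's change: the document's evaluations are now fetched once and reused, both for the name-uniqueness check and as the new existingEvaluations input to getSpanMessagesAndEvaluationResultsByIssue, whose reason extraction can then consult each evaluation's metric specification without a second repository query. A condensed sketch of the flow — the wrapper function name is mine, import paths are approximate, and Result error handling is elided:

// Condensed sketch only; names and paths outside the diff are assumptions.
import { EvaluationsV2Repository } from '../../repositories'
import { getSpanMessagesAndEvaluationResultsByIssue } from '../../data-access/issues/getSpanMessagesAndEvaluationResultsByIssue'
import type { Issue } from '../../schema/models/types/Issue'
import type { Commit, Workspace } from '../../schema/types' // approximate path

async function collectCopilotContext(workspace: Workspace, commit: Commit, issue: Issue) {
  // Fetch the document's evaluations once.
  const repo = new EvaluationsV2Repository(workspace.id)
  const existingEvaluations = (
    await repo.listAtCommitByDocument({
      projectId: commit.projectId,
      commitUuid: commit.uuid,
      documentUuid: issue.documentUuid,
    })
  ).unwrap()

  // Derive the names already taken (unique-name check for generated evals).
  const existingEvaluationNames = existingEvaluations.map((e) => e.name)

  // Thread the same list into the span/reason helper instead of refetching.
  const failedExamples = (
    await getSpanMessagesAndEvaluationResultsByIssue({
      workspace,
      commit,
      issue,
      existingEvaluations,
    })
  ).unwrap()

  return { existingEvaluationNames, failedExamples }
}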