Skip to content

Conversation

bnodir
Copy link
Contributor

@bnodir bnodir commented Oct 19, 2024

Purpose

Fix for:

In this PR, I implemented the following approach:

From #2049 (comment)
A third approach: We could always send back a list of the possible citations in the context, and require that a valid citation matches one of those. They're already there now, I believe, in the data_points, but they could also be in a separate list like citations: ["bla.pdf", ...]

Before
image

After
image

Test strings
◇Testing [km/h], [speed.pdf], [Menu]-[Sub-menu]-[Command], report.txt [summary.pdf], [[document.docx]], [[[file.docx]]], [note1]], [[[note2]], [cm], [m], [kg], [g], [L], [ml], [°C], [°F].◆
console-export_[Streaming]=ON.log
console-export_[Streaming]=OFF.log

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[x] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[x] No

Type of change

[x] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

@pamelafox pamelafox merged commit fd81b69 into Azure-Samples:main Oct 23, 2024
10 checks passed
@bnodir bnodir deleted the bugfix/citation-v2 branch October 24, 2024 10:41
@bnodir bnodir restored the bugfix/citation-v2 branch October 24, 2024 10:41
@bnodir bnodir deleted the bugfix/citation-v2 branch October 24, 2024 10:41
@pamelafox
Copy link
Collaborator

@bnodir I've had a reported regression for customers with whitespace in their filenames. This PR used /^[^\s]+.[a-zA-Z0-9]+/ for the regex, could we loosen that to allow internal whitespace?

@bnodir bnodir restored the bugfix/citation-v2 branch October 25, 2024 11:18
@bnodir
Copy link
Contributor Author

bnodir commented Oct 25, 2024

@pamelafox - Sorry for the regression. I was able to reproduce it on my side. Here is my suggestion, what do you think:

  • Reproduce:
    With const regex = /^[^\s]+\.[a-zA-Z0-9]+/;

image

  • Fix for regression:
    Adjust the regex to allow internal whitespace in the filenames by using const regex = /^[^\s]+(\s[^\s]+)*\.[a-zA-Z0-9]+(#.*)?$/;

image
image
→ There’s still an issue where, if the PDF filename contains whitespace, the fragment identifier (e.g., #page=8) isn’t included at the end. This seems to be a pre-existing issue—the PDF opens when clicking the citation, but it doesn’t jump to the specified page. Could you please check it?

@nickmachairas
Copy link

Pardon my jumping in. This is definitely an issue for my new deployments, ever since AnswerParser.tsx was significantly changed. I tried const regex = /^[^\s]+(\s[^\s]+)*\.[a-zA-Z0-9]+(#.*)?$/; and it is very inconsistent, with or without whitespaces and/or pages at the end.

For example, [GSDG Ex. B - 6Cle Rx. Klepper.pdf#page=54] and [GSDG Ex. B - 6Cle Rx. Klepper.pdf] are both parsed as expected and converted to links, however, [6-4-14_1455x_RE STAGNEW Rex CAT 16.pdf] is not.

What's even more baffling is that the same sources that were parsed to links sometimes will not on a new chat session and vise versa.

@bnodir
Copy link
Contributor Author

bnodir commented Oct 26, 2024

I apologize for the inconvenience caused by the recent changes. I’ll work on resolving this issue to the best of my ability over the weekend, though I can’t guarantee a complete solution. If needed, I may also consider requesting a reversion of the changes from this PR. Thank you for your patience.

@nickmachairas
Copy link

No worries, please let me know if you'd like me to test any changes. Thanks

@bnodir
Copy link
Contributor Author

bnodir commented Oct 27, 2024

@nickmachairas - I came up with const regex = /.+\.\w{1,}(?:#\S*)?$/ and tested it on my end, confirming that it works. Could you please verify it as well?

Tested Strings.txt

[Streaming=ON]console-export-2024-10-27_12-59-18.txt

image

[Streaming=OFF]console-export-2024-10-27_13-4-16.txt
image

// Function to validate citation format and check if dataPoint includes citationCandidate
function isCitationValid(contextDataPoints: any, citationCandidate: string): boolean {
    const regex = /.+\.\w{1,}(?:#\S*)?$/;
    console.log("🌐 Checking citation format for:", citationCandidate);
    if (!regex.test(citationCandidate)) {
        console.log("✖ Citation format is invalid:", citationCandidate);
        return false;
    }
    console.log("✔ Citation format is valid.");

    // Check if contextDataPoints is an object with a text property that is an array
    let dataPointsArray: string[];
    if (Array.isArray(contextDataPoints)) {
        dataPointsArray = contextDataPoints;
        console.log("📂 contextDataPoints is an array:", dataPointsArray);
    } else if (contextDataPoints && Array.isArray(contextDataPoints.text)) {
        dataPointsArray = contextDataPoints.text;
        console.log("📂 contextDataPoints.text is an array:", dataPointsArray);
    } else {
        console.log("✖ contextDataPoints does not contain a valid array.");
        return false;
    }

    // Check if any data point includes the citationCandidate
    const isValidCitation = dataPointsArray.some(dataPoint => {
        console.log("🔍 Checking dataPoint:", dataPoint);
        return dataPoint.startsWith(citationCandidate);
    });

    console.log(isValidCitation ? "✔ Citation candidate found in data points." : "✖ Citation candidate not found in data points.");
    return isValidCitation;
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants