Skip to content

Conversation

@ParthSareen
Copy link

@ParthSareen ParthSareen commented Dec 12, 2024

Rules are currently sorted by key name as they're in a map. If there is a nested schema, the wrong key might get sampled first, reducing accuracy of the generation.

This PR introduces an ordered_map to maintain the ordering of rules in the SchemaConverter

@ParthSareen ParthSareen force-pushed the enable-key-ordering-grammars branch from dccb5f5 to 3b2601d Compare December 12, 2024 01:27
@ParthSareen ParthSareen changed the title llama: enable key ordering in rules for grammar llama: enable maintaining key ordering in rules for grammar Dec 12, 2024
@ParthSareen ParthSareen marked this pull request as ready for review December 12, 2024 02:34
@ggerganov
Copy link
Member

Could you provide an example to understand when the problem occurs?

@ParthSareen
Copy link
Author

ParthSareen commented Dec 12, 2024

Sure @ggerganov !

Pulled out a portion of json-schema-to-grammar.cpp:

class SchemaConverter {
private:
    std::map<std::string, std::string> _rules;

public:
    std::string _add_rule(const std::string & name, const std::string & rule) {
        std::string esc_name = name;
        if (_rules.find(esc_name) == _rules.end() || _rules[esc_name] == rule) {
            _rules[esc_name] = rule;
            return esc_name;
        } else {
            int i = 0;
            while (_rules.find(esc_name + "-" + std::to_string(i)) != _rules.end() && 
                   _rules[esc_name + "-" + std::to_string(i)] != rule) {
                i++;
            }
            std::string key = esc_name + "-" + std::to_string(i);
            _rules[key] = rule;
            return key;
        }
    }

    const std::map<std::string, std::string>& get_rules() const {
        return _rules;
    }
};

void test_order() {
    SchemaConverter converter;
    std::string rule_name;
    rule_name = converter._add_rule("test", "rule1");
    rule_name = converter._add_rule("test", "rule2");
    rule_name = converter._add_rule("beta", "rule3");
    rule_name = converter._add_rule("alpha", "rule4");

    for (const auto& [key, value] : converter.get_rules()) {
        std::cout << key << " = " << value << "\n";
    }
}

The output of this map loses order and gets sorted by key name:

alpha = rule4
beta = rule3
test = rule1
test-0 = rule2

Through the testing of this I actually found that even unordered_map will lose insertion order and orders keys based on an internal hash. This means that currently when grammars are generated and then used for sampling, the incorrect key might get sampled first which depends on the data of a nested or prior key.

Going to close this PR out as it does not address this issue. json-schema-to-grammar.cpp needs to maintain the ordering as rules are added into the map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants