Proposal of improvements both in the guidelines of the dataset collection and in the tasks for dataset collection #3185

@gabrielmfern

I have been contributing to the dataset for some time now and I have noticed a few things that lead to the following problems:

  • A model that hallucinates too much when the user changes topics or provides random prompts
  • Good messages being rated as low quality or downvoted, either because the people classifying them misunderstand them or because there are no editing capabilities
  • People not using markdown properly, which has been somewhat fixed by the new message editor
  • Some people copying directly from ChatGPT, even though this is against the guidelines

I hereby propose the following solutions to the problems I have just brought to your attention:

  • Add editing capabilities to ALL messages where you are the author
  • Add vote-based editing capabilities for people who are not the author
  • Add a feature for adding comments to messages where you are the author, so that everyone understands the purpose of something when it is unclear (this does happen)
  • Build a model that detects messages created by ChatGPT. Because such models have poor accuracy, it should only warn people who paste or write dubious text that copying directly from ChatGPT will lead to sanctions
    • I know that OpenAI has a similar model that tries to do this, but with very low accuracy; here is a demo.
  • Build a model that detects when written text is not using proper markdown, and that either warns the user or (if the accuracy of the model is very high) does not allow them to post at all.
    • I suppose training data for this is very simple to get, since we just need access to some markdown text and its rendered result (this can be done with some JavaScript and only requires markdown text).
  • Change the guidelines to allow either changing topics (at least make it obvious that it's allowed) or asking unrelated questions
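
To illustrate the training-data idea for the markdown detector, here is a minimal sketch of how labeled pairs could be generated from existing markdown text. It is written in Python rather than JavaScript and uses a few regular expressions instead of a real markdown renderer, so it only approximates the "rendered result". The names `strip_markdown` and `make_pairs` and the 1/0 labels are hypothetical and not part of the project.

```python
import re


def strip_markdown(text: str) -> str:
    """Remove common markdown markers to imitate an unformatted message.

    Rough approximation only: a real pipeline would render the markdown
    and extract the visible text instead.
    """
    text = re.sub(r"^`{3}.*$", "", text, flags=re.MULTILINE)              # code fence lines
    text = re.sub(r"`([^`]*)`", r"\1", text)                              # inline code
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)         # headings
    text = re.sub(r"(\*\*|__)(.+?)\1", r"\2", text)                       # bold
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.MULTILINE)    # bullets
    return text.strip()


def make_pairs(markdown_texts):
    """Build (text, label) examples: 1 = original markdown, 0 = stripped copy.

    Assumes every input text is properly formatted markdown.
    """
    pairs = []
    for md in markdown_texts:
        pairs.append((md, 1))
        stripped = strip_markdown(md)
        # Only add a negative example when stripping changed something.
        if stripped != md.strip():
            pairs.append((stripped, 0))
    return pairs


if __name__ == "__main__":
    for text, label in make_pairs(["# Title\n\nUse `print` like **this**."]):
        print(label, repr(text))
```

A classifier trained on such pairs would still need real examples of badly formatted messages, since stripped text is only a stand-in for what people actually post.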
