Skip to content

Provide compound ID for all files #55

@davidlmobley

Description

@davidlmobley

I think we should probably move towards a model where all ligands (or guests) in each benchmark set have an appropriate, unique, paper-specific numerical compound ID, rather than the current model where this is dependent on what set we're looking at. For example:

  • CB7 Tables 1&2: Has unique CID we assigned
  • GDCC Tables 3: Has unique CID we assigned, but will get broken if we want to provide structures docked into hosts as there are two hosts but only one set of compound IDs
  • GDCC Table 4: Has unique CID we assigned
  • CD Table 5 and 6: Has unique CID we assigned
  • lysozyme Tables 7 and 8: No CIDs, uses compound names only
  • BRD4(1) Table 9: Uses heterogeneous identifiers -- "Compound 4", "alprazolam", "Bzt-7", "JQ1(+)" etc.; this is probably the worst offender since some of these are pretty unsuitable as filenames due to special characters and/or spaces (e.g. some tools can't load files with spaces in their filenames and/or handle some of these special characters).

@GHeinzelmann @nhenriksen - thoughts? My preference I think is to make sure every set has a unique numerical compound ID in the tables and that this is used for all of the relevant files.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions