Open
Description
I think we should move towards a model where every ligand (or guest) in each benchmark set has an appropriate, unique, paper-specific numerical compound ID (CID), rather than the current model, where the identifier scheme depends on which set we're looking at. For example:
- CB7 Tables 1&2: Has unique CID we assigned
- GDCC Table 3: Has unique CID we assigned, but this will break if we want to provide structures docked into hosts, since there are two hosts but only one set of compound IDs
- GDCC Table 4: Has unique CID we assigned
- CD Tables 5 and 6: Has unique CID we assigned
- lysozyme Tables 7 and 8: No CIDs, uses compound names only
- BRD4(1) Table 9: Uses heterogeneous identifiers -- "Compound 4", "alprazolam", "Bzt-7", "JQ1(+)" etc.; this is probably the worst offender since some of these are pretty unsuitable as filenames due to special characters and/or spaces (e.g. some tools can't load files with spaces in their filenames and/or handle some of these special characters).
@GHeinzelmann @nhenriksen - thoughts? My preference is to make sure every set has a unique numerical compound ID in the tables, and that this ID is used for all of the relevant files.
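
To make the proposal concrete, here is a rough Python sketch of what this might look like for the BRD4(1) names quoted above. Everything in it is an assumption for illustration only: the function names, the `set-host-cid.ext` filename pattern, the `mol2` extension, the set of "safe" filename characters, and the placeholder host labels `hostA`/`hostB` are all hypothetical, not something already in the repo.

```python
import re

# Assumption: only letters, digits, '.', '_' and '-' are treated as safe in filenames.
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def is_filename_safe(name):
    """Return True if the name contains no spaces or special characters."""
    return bool(name) and _UNSAFE.search(name) is None


def assign_compound_ids(names, start=1):
    """Map each distinct compound name to a sequential integer ID,
    numbered in order of first appearance."""
    ids = {}
    for name in names:
        if name not in ids:
            ids[name] = start + len(ids)
    return ids


def structure_filename(set_name, cid, host=None, ext="mol2"):
    """Build a filename from set name, optional host label, and numerical CID.
    The naming pattern is hypothetical."""
    parts = [set_name]
    if host:
        parts.append(host)
    parts.append(str(cid))
    return "-".join(parts) + "." + ext


# Identifiers quoted from the BRD4(1) table in this issue.
brd4_names = ["Compound 4", "alprazolam", "Bzt-7", "JQ1(+)"]
brd4_ids = assign_compound_ids(brd4_names)

# Names that would be awkward as filenames today.
unsafe_names = [n for n in brd4_names if not is_filename_safe(n)]

# One file per compound, named by numerical CID rather than by compound name.
brd4_files = [structure_filename("brd4", cid) for cid in brd4_ids.values()]

# Docked structures for a set with two hosts: one CID, one file per host.
docked_files = [structure_filename("gdcc", 5, host=h) for h in ("hostA", "hostB")]
```

Including the host label in the filename is one possible way around the GDCC Table 3 problem, since the same CID can then appear once per host without collisions; a separate directory per host would work just as well.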