Proposed design:
ProteoFAV's main features:
1 - Reading/parsing formatted files to pandas DataFrames (e.g. mmCIF, PDB, SIFTS XML, DSSP files)
2 - Downloading data files on the fly (e.g. mmCIF, PDB, SIFTS XML, DSSP files)
3 - Fetching sequence annotations (features) (e.g. variants from Ensembl and UniProt)
4 - Merging all the previous data onto a main DataFrame
With this in mind, I think it would be great to have a structure like this:
proteofav.mmCIF.read()
proteofav.mmCIF.write()
proteofav.mmCIF.download()
proteofav.mmCIF.select()
proteofav.PDB.read()
proteofav.PDB.write()
proteofav.PDB.download()
proteofav.PDB.select()
proteofav.DSSP.read()
proteofav.DSSP.download()
proteofav.DSSP.select()
proteofav.SIFTS.read()
proteofav.SIFTS.download()
proteofav.SIFTS.select()
proteofav.Validation.read()
proteofav.Validation.download()
proteofav.Validation.select()
proteofav.Annotations.read()
proteofav.Annotations.download()
proteofav.Annotations.select()
proteofav.Variants.fetch()
proteofav.Variants.select()
proteofav.Tables.merge()
proteofav.Tables.generate()
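One way the per-format namespaces above could be laid out is a shared base class with class-level methods, so that `DSSP.read(...)` works without creating an instance. This is only a rough sketch: the `Format` base class, the `extension` attribute and the `target_path` helper are hypothetical names, and the parser is a toy stand-in that returns lists of fields rather than a DataFrame.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Format(ABC):
    """Hypothetical base class; each file format would subclass it."""

    extension = ""  # assumed file suffix, overridden per format

    @classmethod
    @abstractmethod
    def read(cls, filename):
        """Parse `filename` into a table (a pandas DataFrame in the proposal)."""

    @classmethod
    def target_path(cls, identifier, folder="."):
        """Where download() would store the file; no network access here."""
        return Path(folder) / f"{identifier}{cls.extension}"


class DSSP(Format):
    extension = ".dssp"

    @classmethod
    def read(cls, filename):
        # Toy stand-in for a real parser: one list of fields per non-empty line.
        with open(filename) as handle:
            return [line.split() for line in handle if line.strip()]
```

With this layout, adding a new format would mostly mean adding one subclass, and the shared methods (`download`, `select`, ...) could live on the base class where their behaviour is identical.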
Classes generally have the following basic methods:
- read - read/parse from file
- write - write output to a file
- download - downloads data to a file (mmCIF, etc.)
- fetch - downloads data into a handle (in memory) rather than to a file; the response can be cached (JSON, etc.)
- merge - merge any set of DataFrames, so each DataFrame should be aware of what type of data it contains
- generate - automated table generation from the given input (i.e. a PDB ID and chain ID, or a UniProt ID)
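To illustrate the point that each DataFrame should be aware of what type of data it contains, here is a minimal sketch of a `merge` that picks its join columns from the kinds of the two tables. Everything named here is an assumption for illustration: the `Table` wrapper, the `JOIN_KEYS` registry and the column names (`chain`, `pdb_resnum`) are not existing ProteoFAV code.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Table:
    """Hypothetical wrapper: a DataFrame plus a label for the data it holds."""
    kind: str            # e.g. "mmcif", "sifts", "dssp"
    data: pd.DataFrame


# Assumed join keys per pair of table kinds; column names are illustrative only.
JOIN_KEYS = {
    ("mmcif", "sifts"): ["chain", "pdb_resnum"],
}


def merge(left: Table, right: Table, how: str = "left") -> Table:
    """Merge two tables on the keys registered for their kinds."""
    try:
        keys = JOIN_KEYS[(left.kind, right.kind)]
    except KeyError:
        raise ValueError(f"no merge rule for {left.kind} + {right.kind}") from None
    merged = left.data.merge(right.data, on=keys, how=how)
    return Table(kind=f"{left.kind}+{right.kind}", data=merged)
```

Keeping the join rules in one registry means `merge` can reject combinations it does not know how to align, instead of silently joining on whatever columns happen to share a name.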