Developing a more modular approach #45

@biomadeira

Proposed design:

ProteoFAV's main features:
1 - Reading/parsing formatted files to pandas DataFrames (e.g. mmCIF, PDB, SIFTS XML, DSSP files)
2 - Downloading data files on the fly (e.g. mmCIF, PDB, SIFTS XML, DSSP files)
3 - Fetching sequence annotations (features) (e.g. variants from Ensembl and UniProt)
4 - Merging all the previous data onto a main DataFrame

With this in mind, I think it would be great to have a structure like this:

proteofav.mmCIF.read()
proteofav.mmCIF.write()
proteofav.mmCIF.download()
proteofav.mmCIF.select()
proteofav.PDB.read()
proteofav.PDB.write()
proteofav.PDB.download()
proteofav.PDB.select()
proteofav.DSSP.read()
proteofav.DSSP.download()
proteofav.DSSP.select()
proteofav.SIFTS.read()
proteofav.SIFTS.download()
proteofav.SIFTS.select()
proteofav.Validation.read()
proteofav.Validation.download()
proteofav.Validation.select()
proteofav.Annotations.read()
proteofav.Annotations.download()
proteofav.Annotations.select()
proteofav.Variants.fetch()
proteofav.Variants.select()
proteofav.Tables.merge()
proteofav.Tables.generate()
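
To make the layout above more concrete, here is a minimal sketch of how one of these namespaces could be built. It is only an illustration, not existing ProteoFAV code: `BaseFormat`, the `parse` helper and the column names (`atom`, `res_name`, `chain`, `res_seq`, `x`, `y`, `z`) are assumptions. The only external fact it relies on is the fixed-width column layout of PDB ATOM/HETATM records.

```python
import pandas as pd


class BaseFormat:
    """Hypothetical base class; each data source would subclass it."""

    @classmethod
    def parse(cls, lines):
        raise NotImplementedError("each format supplies its own parser")

    @classmethod
    def read(cls, path):
        """Read a file from disk and parse it into a DataFrame."""
        with open(path) as handle:
            return cls.parse(handle)

    @staticmethod
    def select(table, **filters):
        """Keep rows whose columns match the given value or list of values."""
        mask = pd.Series(True, index=table.index)
        for column, wanted in filters.items():
            values = wanted if isinstance(wanted, (list, tuple, set)) else [wanted]
            mask &= table[column].isin(list(values))
        return table[mask]


class PDB(BaseFormat):
    @classmethod
    def parse(cls, lines):
        """Parse ATOM/HETATM records using the fixed PDB column positions."""
        rows = []
        for line in lines:
            if line.startswith(("ATOM", "HETATM")):
                rows.append({
                    "atom": line[12:16].strip(),      # assumed column names
                    "res_name": line[17:20].strip(),
                    "chain": line[21],
                    "res_seq": int(line[22:26]),
                    "x": float(line[30:38]),
                    "y": float(line[38:46]),
                    "z": float(line[46:54]),
                })
        return pd.DataFrame(rows)
```

With something like this, `PDB.select(PDB.read(path), chain="A")` would give the rows for chain A, and the other formats would only need to supply their own `parse`.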

Classes generally have the following basic methods:

  • read - read/parse from file
  • write - write output to a file
  • download - download data to a file (mmCIF, etc.)
  • fetch - download data into an in-memory handle rather than a file; the result can be cached (JSON, etc.)
  • merge - merge any set of DataFrames, so each DataFrame should be aware of what type of data it contains
  • generate - automated table generation by input (i.e. input PDB ID/CHAIN ID or input UniProt ID)
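
As a rough illustration of the `merge` idea, the sketch below tags each DataFrame with the kind of data it holds and looks up the join columns from a small registry. Everything in it is an assumption made for the example: the `Table` wrapper, the `LINKS` registry, the kind names and the column names are hypothetical, and a real implementation might attach the type to the DataFrame in some other way.

```python
from dataclasses import dataclass

import pandas as pd

# Hypothetical registry: (left kind, right kind) -> (left columns, right columns).
LINKS = {
    ("structure", "sifts"): (["chain", "res_seq"], ["pdb_chain", "pdb_res_seq"]),
    ("sifts", "variants"): (["uniprot_pos"], ["position"]),
}


@dataclass
class Table:
    """A DataFrame plus a label saying what type of data it contains."""
    kind: str
    data: pd.DataFrame


def merge(tables):
    """Left-join the tables in order, using the first known link for each one."""
    first, *rest = tables
    merged, seen = first.data, [first.kind]
    for table in rest:
        for kind in seen:
            if (kind, table.kind) in LINKS:
                left_on, right_on = LINKS[(kind, table.kind)]
                break
        else:
            raise ValueError(f"no known link from {seen} to {table.kind!r}")
        merged = merged.merge(table.data, how="left",
                              left_on=left_on, right_on=right_on)
        seen.append(table.kind)
    return merged


structure = Table("structure", pd.DataFrame(
    {"chain": ["A", "A"], "res_seq": [1, 2], "res_name": ["MET", "ALA"]}))
sifts = Table("sifts", pd.DataFrame(
    {"pdb_chain": ["A", "A"], "pdb_res_seq": [1, 2], "uniprot_pos": [10, 11]}))
variants = Table("variants", pd.DataFrame(
    {"position": [11], "variant": ["A11T"]}))

main_table = merge([structure, sifts, variants])
```

A `generate` step could then be a thin wrapper that downloads or fetches the inputs for a given ID, reads them, and passes the resulting tables to `merge`.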
