All data and source code can be obtained from the project GitHub repository: https://github.com/anchitaaghag/Codon-Optimization-for-Biopharming
The data and code in this repository are intended to be used in the following order:
For an overview of the project and a quick how-to.
If license information is required.
The current file with repository details.
A. Kazusa_Codon_Usage_Data Sub-Folder
1. The Using_Command_Line_to_Process_Kazuza_CU.md document describes how to process the codon count data obtained from the Kazusa database. The associated files are also provided in this sub-folder:
a) Kazuza_CU_for_each_CDS_Format.txt
b) Kazuza_CU_for_each_CDS_in_N_benthamiana.txt
c) Kazuza_CU_for_each_CDS_in_N_tabacum.txt
d) N_benthamiana_Codon_Counts_Only.txt
e) N_tabacum_Codon_Counts_Only.txt
B. Build_Codon_Usage_Table.R
C. Functions_For_Stat.R
D. Statistical_Analysis_of_Codon_Usage.R
E. Features_Data.csv (if required)
F. NCBI_Data.csv (if required)
G. Updated_Codon_Usage_Information.csv (to be used in the Statistical Analysis R file)
A. Updated_Codon_Usage.txt
B. Reverse_Translate_Function.R
C. Codon_Optimization_Script.R
D. Example_Protein_Sequences.txt
E. Example_Results.txt
A. The Tool_Testing_Data.xlsx spreadsheet provides an overview of the steps taken and the data obtained from the SolGenomics tBLASTn and CAIcal Server outputs.
B. CO_Tool_Input_Protein_Sequences.txt
C. The CO_Tool_Output_DNA_Sequences Sub-Folder contains the results of running the CO_Tool_Input_Protein_Sequences.txt file through the codon optimization tool ten times. The resulting files are:
1. Run_1_Results.txt
2. Run_2_Results.txt
3. Run_3_Results.txt
4. Run_4_Results.txt
5. Run_5_Results.txt
6. Run_6_Results.txt
7. Run_7_Results.txt
8. Run_8_Results.txt
9. Run_9_Results.txt
10. Run_10_Results.txt
D. Sol_Genomics_DNA_Sequences.txt
E. CAI_Values.txt (created from the Tool_Testing_Data.xlsx spreadsheet)
F. GC_Values.txt (created from the Tool_Testing_Data.xlsx spreadsheet)
G. Tool_Testing.R
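
For readers unfamiliar with the GC metric referenced by GC_Values.txt, the sketch below shows one common way to compute the GC percentage of a DNA sequence. It is only an illustration, not the repository's method: it assumes that GC_Values.txt holds GC percentages, the function name gc_content and the example sequence are hypothetical, and it is written in Python although the repository's own scripts are in R.

```python
def gc_content(dna: str) -> float:
    """Return the GC percentage of a DNA sequence (0.0 if no A/C/G/T bases)."""
    seq = dna.upper()
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        return 0.0
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    # Hypothetical sequence, not taken from the repository's files.
    print(round(gc_content("ATGGCC"), 1))
```

Ignoring ambiguous bases is a design choice made for this sketch; a tool that divides by the full sequence length instead would report slightly different values for sequences containing N.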