fthiemer/BT_NF-kB

BT_NF-kB

The scripts used in my bachelor thesis "Bioinformatic Prediction of NF-kB Target Genes in the Human Genome". The resulting datasets are not published online, as they are intended for internal use.

The figures were reused in the paper by B. Pucker & H. Kadhim et al., with different numbering: thesis Figure 4 -> paper Figure 2, thesis Figure 6 -> paper Figure 4, thesis Figure 5 -> paper Figure 6.

For a guide on how to install the third party Python modules (listed below) that the scripts need, see https://automatetheboringstuff.com/appendixa/ . This resource is taken from the book "Automate the Boring Stuff with Python" by Al Sweigart (available for free at https://automatetheboringstuff.com/#toc ), my main resource for learning the Python used in this thesis.

Dependencies:

:: OS: Everything should run on Windows and macOS (tested only partially); Linux is not tested.

:: Firefox must be installed

:: Python standard library modules (shipped with Python, no installation needed):

                    - re

                    - csv

                    - sys

                    - os

                    - time

                    - platform

                    - itertools

:: Third party modules (must be installed):

                    - requests

                    - selenium

                    - lxml

                    - numpy

                    - matplotlib

                    - scipy
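
The modules listed above can be checked for availability before running any of the scripts. The following is only a minimal sketch and not part of the repository: the function name `find_missing_modules` is hypothetical, and the check only tests whether Python can locate each module, not whether its version is suitable.

```python
import importlib.util

# Module names copied from the dependency list above.
REQUIRED_MODULES = [
    "re", "requests", "csv", "sys", "os", "selenium", "time",
    "lxml", "platform", "numpy", "matplotlib", "scipy", "itertools",
]


def find_missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    # find_spec returns None when a top-level module cannot be found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are available.")
```

Any module reported as missing can then be installed as described in the guide linked above.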

:: Two folders in the scripts directory (names are mandatory):

- "geckodriver" -> holds the geckodriver executable for your OS (downloadable at https://github.com/mozilla/geckodriver/releases )

- "source_files" -> must hold:

                               - "gene2ensembl" mapping file from ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/

                               - "cross_references.txt" and "promoter_ensembl.txt" from ftp://ccg.vital-it.ch/epdnew/H_sapiens/006/db/

                               - "all_data.txt" and "promoters.gff3", the results of the scripts by bpucker at https://github.com/bpucker/NFkB

                               - "fasta_sequences_from_-5000_to_+100.fa", downloadable via EPD's select/download tool at https://epd.vital-it.ch/EPDnew_select.php : do not enter any IDs, press select, then specify a range of -5000 to +100 on the following page. The downloaded file must be renamed to the name given here.
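
The folder layout described above can be verified with a short check before starting. This is a sketch under assumptions, not code from the repository: the function name `find_missing_inputs` is hypothetical, and because the exact file name of the geckodriver executable differs between operating systems, the check only tests that the "geckodriver" folder exists and is not empty.

```python
from pathlib import Path

# File names as required in the "source_files" folder (see the list above).
REQUIRED_SOURCE_FILES = [
    "gene2ensembl",
    "cross_references.txt",
    "promoter_ensembl.txt",
    "all_data.txt",
    "promoters.gff3",
    "fasta_sequences_from_-5000_to_+100.fa",
]


def find_missing_inputs(scripts_dir):
    """Return the required folders/files that are missing below scripts_dir."""
    scripts_dir = Path(scripts_dir)
    missing = []

    # Assumption: any file inside "geckodriver" counts as the driver being present.
    gecko_dir = scripts_dir / "geckodriver"
    if not gecko_dir.is_dir() or not any(gecko_dir.iterdir()):
        missing.append("geckodriver/")

    # Each source file must exist under "source_files" with its exact name.
    for name in REQUIRED_SOURCE_FILES:
        if not (scripts_dir / "source_files" / name).is_file():
            missing.append(f"source_files/{name}")
    return missing


if __name__ == "__main__":
    # Checks the current working directory; pass another path if needed.
    for entry in find_missing_inputs("."):
        print("Missing:", entry)
```

Run from the scripts directory, it prints one line per missing item and nothing if the layout is complete.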

:: "Figure_6.py" MUST be executed FIRST and requires some manual steps, which are described in the script's comments and printed to the console.

:: Most of the scripts depend on the "Bielefeld_basic_python_dicts.py" script, so it must be downloaded and placed in the same directory as the script you want to execute.
