Installation ======================================= Install from source -------------------------------------- .. code-block:: bash git clone https://github.com/algaebrown/Metadensity.git cd Metadensity # Install dependencies conda env create -n Metadensity --file environment.yaml conda activate Metadensity # Install Metadensity cd Metadensity pip install -e . Build annoatations -------------------------------------- 1. Step 1: Download annoatations .. code-block:: bash # on bash, download gencode wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/gencode.v40.primary_assembly.annotation.gff3.gz wget https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_40/GRCh38.primary_assembly.genome.fa.gz # Unzip it gunzip gencode.v40.primary_assembly.annotation.gff3.gz 2. compile config.ini file (Using gencode 40 as an example) Example file is in `here `_ This is a file that stores all the annoatations .. code-block:: bash [FILES] GENOME_FA=/home/hsher/gencode_coords/GRCh38.p13.genome.fa GENCODE=/home/hsher/gencode_coords/gencode.v33.annotation.gff3 # OTHER BRANCHPOINT=/home/hsher/gencode_coords/branchpoint_hg38.bed BRANCHPOINT_PRED=/home/hsher/gencode_coords/gencode_v26_branchpoints_pred.csv POLYA=/home/hsher/gencode_coords/polyA.atlas.clusters.2.0.GRCh38.96.bed MIRNA=/home/hsher/gencode_coords/miR/hsa_hg38.gff3 SNORNA=/home/hsher/gencode_coords/snoDB_hg38.tsv LNCRNA=/home/hsher/gencode_coords/lncipedia_5_2_hc_hg38.gff # processed from GENCODE TRANSCRIPT=/home/hsher/gencode_coords/gencode.v33.transcript.gff3 FEATURE=/home/hsher/gencode_coords/gencode.v33.combine.sorted.gff3 # PARSED PICKLE DATADIR=/home/hsher/projects/Metadensity/metadensity/data/hg38 * Fill in GENOME_FA, GENCODE as the files that you just downloaded. * The fields under OTHER are optional. See here for download `link `_ * TRANSCRIPT and FEATURE is generated from the following steps. Simply replace .gff with .transcript.gff and combined.sorted.gff3 * Lastly, DATADIR is also generated from the steps below. Simply put any directory that can hold lots of data. 3. Generate DATADIR use `this notebook `_ to generate. This notebooks takes CONFIG.ini and parse all the annotations into a dictionary. 4. Generate TRANSCRIPT and FEATURE from GENCODE and UCSC canonical transcripts Download canonical transcripts `here `_. Use `this script `_ to generate TRANSCRIPT and FEATURE files. .. code-block:: module load bedtools bash gencode_canon_filtering.sh gencode.v40.primary_assembly.annotation.gff3 knownCanonical.txt Run the test notebook -------------------------------------- Run this `test notebook `_.