diff --git a/notebooks/CapeTown_Genomics_Tutorial_partI.ipynb b/notebooks/CapeTown_Genomics_Tutorial_partI.ipynb index f35cf2fb4dcd8fe83e3be74e1b60572fef3715fe..016f3ee156a87322702c3ad9ffaab270ca7ca026 100644 --- a/notebooks/CapeTown_Genomics_Tutorial_partI.ipynb +++ b/notebooks/CapeTown_Genomics_Tutorial_partI.ipynb @@ -17,11 +17,13 @@ "source": [ "## 0. Getting started\n", "### How to start the jupyter notebook\n", - "1. Access the cloud: ssh student01@86.119.40.206\n", + "1. Access the cloud: ssh studentXX@86.119.40.206\n", "2. Your password is: stphcourse2018\n", - "3. cp -r ../Workshop_SA.git\n", - "4. singularity exec ../container.img jupyter notebook --no-browser --ip='*' --port=YourPortNumber eg.30000\n", - "5. Type in the browser: http://86.119.40.206:YourPortNumber/?token=c0669c145a630ea14b6ec3b29b870811844fefe12c375feb\n" + "3. Copy this folder to your home directory: cp -r /home/Workshop_SA/ .\n", + "4. In your home directory, type: singularity exec /home/container.img jupyter notebook --no-browser --ip='*' --port=YourPortNumber (e.g. 30000)\n", + "\n", + "\n", + "If you wish to access the git repository from your web browser, the URL is: https://git.scicore.unibas.ch/TBRU/Workshop_SA" ] }, { @@ -39,7 +41,7 @@ "- It is your bioinformatics 'lab book'.\n", "\n", "### Useful tips to use in the jupyter notebook\n", - "- Run the command in the 'code cell': Shift + Return\n", + "- Run the command in the 'code cell': Shift + Enter\n", "- You can change the cell type from Code to Markdown to include explanatory text in your notebook\n", "- Use the \"tab\" key to autocomplement commands\n", "- https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/\n", @@ -53,19 +55,47 @@ "### Magics\n", "Taken from: https://blog.dominodatalab.com/lesser-known-ways-of-using-notebooks/\n", "\n", - "You can start notebooks with different kernels (e.g., R, Julia) — not just Python. What you might not know is that even within a notebook, you can run different types of code in different cells. 
With \"magics\", it is possible to use different languages \n", + "You can start notebooks with different kernels (e.g., R, Shell) — not just Python. What you might not know is that even within a notebook, you can run different types of code in different cells. With \"magics\", it is possible to use different languages \n", "By running % lsmagic in a cell you get a list of all the available magics. You can use % to start a single-line expression to run with the magics command. Or you can use a double %% to run a multi-line expression.\n", "\n", "Some of my favorites are:\n", "\n", - "!: to run a shell command.\n", - "% bash to run cell with bash in a subprocess.\n", + " ! to run a shell command.\n", + "\n", + " % bash to run cell with bash in a subprocess.\n", "\n", "### Using shell commands\n", "\n", "Any command that works at the command-line can be used in IPython by prefixing it with the ! character. For example, the ls, pwd, and echo commands can be run as follows:\n" ] }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "/scicore/home/gagneux/loiseau/Workshop_SA/notebooks\n", + "The files in my working directory are:\n", + "adapters\t\t\t\t Drug_resistance_mutations_MTBC.txt\n", + "annotation\t\t\t\t images\n", + "CapeTown_Genomics_Tutorial_partIII.ipynb Locus_to_exclude_Mtb.txt\n", + "CapeTown_Genomics_Tutorial_partII.ipynb reference_genome\n", + "CapeTown_Genomics_Tutorial_partI.ipynb\t slurm_scripts\n" + ] + } + ], + "source": [ + "! pwd\n", + "! echo 'The files in my working directory are:'\n", + "! 
ls" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -75,19 +105,19 @@ " - Perform essential steps of a Illumina whole-genome sequencing analysis pipeline of MTBC genomes.\n", "\n", "## Content of this tutorial:\n", - "- Finding genetic variants from raw sequencing data:\n", - " - Looking into a fastq file: reads, Phred Quality scores\n", - " - Raw read processing and quality assessment\n", + "- **Finding genetic variants from raw sequencing data**:\n", + " - Looking into a fastq file: quality assessment of the reads\n", + " - Raw read processing: trimming of illumina adapters and low quality bases \n", " - Mapping processed reads to a reference genome (creation of a BAM file)\n", " - BAM post-processing \n", " - BAM quality assesment\n", " - Variant identification (creation of a VCF file)\n", " - Variant Annotation\n", - "<img src=\"images/Pipeline1.png\" width=\"500\">\n", + "<img src=\"images/Pipeline1.png\" width=\"600\">\n", "- You want to find genetic variants (SNPs, insertion, deletions) in these sequences.\n", "- To do so, you need to perform the following bioinformatics steps:\n", "\n", - "<img src=\"images/Pipeline2.png\" width=\"500\">\n", + "<img src=\"images/Pipeline2.png\" width=\"600\">\n", "\n" ] }, @@ -114,7 +144,7 @@ " Forward read: ~/Workshop_SA/data_Eldholm/ERR760779_1.fastq.gz\n", " Reverse read: ~/Workshop_SA/data_Eldholm/ERR760779_2.fastq.gz\n", " \n", - "The fastq files are compressed (.gz) to save space. Let's have a look at the first read of the file.\n", + "The fastq files are compressed (.gz) to save space. 
Let's have a look at the first read of the file using zcat.\n", "For this, read the first 4 lines of the file:" ] }, @@ -222,7 +252,11 @@ "metadata": {}, "source": [ "Go back to the terminal and run from the command line type:\n", - " - sbatch ~/Workshop_SA/notebooks/slurm_scripts/launch_fastqc.slurm" + " - sbatch ~/Workshop_SA/notebooks/slurm_scripts/launch_fastqc.slurm\n", + " \n", + "For this you will have to open a new terminal window and reconnect:\n", + " - ssh studentXX@86.119.40.206\n", + " - password: stphcourse2018" ] }, { @@ -244,11 +278,11 @@ " - an html which you can visualise using firefox for example\n", " - a compressed folder (.zip). You can see the content of this folder by using the command 'unzip'. \n", " \n", - "You can visualise the html file using firefox. \n", + "To visualise the html file open a new terminal on MobaXterm and type:\n", + " - scp studentXX@86.119.40.206:/home/studentXX/ERR760779_1_fastqc.html Desktop\n", "\n", - "From the terminal, type:\n", - " \n", - " firefox ERR760779_1_fastqc.html" + "\n", + "The html file is now on your local computer, on your desktop. Double click on it." ] }, { @@ -342,10 +376,10 @@ "metadata": {}, "source": [ "### Exercise: \n", - "- How many reads were dropped by Trimmomatic ? \n", - "- Why are complete reads dropped ? \n", - "- What is the percentage of reads we will find in the files ERR760779_**1P**.trimmed.fastq.gz and ERR760779_**2P**.trimmed.fastq.gz ?\n", - "- What is the percentage of reads we will find in the files ERR760779_**1U**.trimmed.fastq.gz and ERR760779_**2U**.trimmed.fastq.gz ?" + "- How many reads were dropped by Trimmomatic ? ................\n", + "- Why are complete reads dropped ? ................\n", + "- What is the percentage of reads we will find in the files ERR760779_**1P**.trimmed.fastq.gz and ERR760779_**2P**.trimmed.fastq.gz ? 
................\n", + "- What is the percentage of reads we will find in the files ERR760779_**1U**.trimmed.fastq.gz and ERR760779_**2U**.trimmed.fastq.gz ? ................" ] }, { @@ -389,9 +423,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As before, to visualise the html file produce, open it with firefox:\n", - " - firefox ERR760779_1P.html" + "As before, to visualise the html file produced:\n", + " - scp studentXX@86.119.40.206:/home/studentXX/ERR760779_1P.trimmed.html Desktop" ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/notebooks/Locus_to_exclude_Mtb.txt b/notebooks/Locus_to_exclude_Mtb.txt new file mode 100755 index 0000000000000000000000000000000000000000..a66d26a19a68837274535d5796695adc15a6869e --- /dev/null +++ b/notebooks/Locus_to_exclude_Mtb.txt @@ -0,0 +1,506 @@ +Chrom ChromStart ChromEnd locus tag Comment +NC_000962 23182 23269 IG18_Rv0018c-Rv0019c +NC_000962 33582 33794 Rv0031 remnant of A transposase +NC_000962 80194 80623 IG71_Rv0071-Rv0072 +NC_000962 103710 104663 Rv0094c 50bp_duplicated +NC_000962 104663 104805 IG_Rv0094c-Rv0095c +NC_000962 104805 105215 Rv0095c 50bp_duplicated +NC_000962 105215 105324 IG_Rv0095c-Rv0096 +NC_000962 105324 106715 Rv0096 PPE family protein +NC_000962 131382 132872 Rv0109 PE-PGRS family protein +NC_000962 149533 150996 Rv0124 PE-PGRS family protein +NC_000962 154130 154231 IG127_Rv0126-Rv0127 +NC_000962 177543 179309 Rv0151c PE family protein +NC_000962 179309 179319 IG_Rv0151c-Rv0152c +NC_000962 179319 180896 Rv0152c PE family protein +NC_000962 187433 188839 Rv0159c PE family protein +NC_000962 188839 188931 IG_Rv0159c-Rv0160c +NC_000962 188931 190439 Rv0160c PE family protein +NC_000962 307877 309547 Rv0256c PPE family protein +NC_000962 309547 309699 IG_Rv0256c-Rv0257 +NC_000962 309699 310073 Rv0257 50bp_duplicated +NC_000962 332708 333136 Rv0277c 50bp_duplicated +NC_000962 
333136 333437 IG_Rv0277c-Rv0278c +NC_000962 333437 336310 Rv0278c PE-PGRS family protein +NC_000962 336310 336560 IG_Rv0278c-Rv0279c +NC_000962 336560 339073 Rv0279c PE-PGRS family protein +NC_000962 339073 339364 IG_Rv0279c-Rv0280 +NC_000962 339364 340974 Rv0280 PPE family protein +NC_000962 349624 349932 Rv0285 PE family protein +NC_000962 349935 351476 Rv0286 PPE family protein +NC_000962 361334 363109 Rv0297 PE-PGRS family protein +NC_000962 366150 372764 Rv0304c PPE family protein +NC_000962 372764 372820 IG_Rv0304c-Rv0305c +NC_000962 372820 375711 Rv0305c PPE family protein +NC_000962 399535 400050 Rv0335c PE family protein +NC_000962 400050 400192 IG_Rv0335c-Rv0336 +NC_000962 400192 401703 Rv0336 50bp_duplicated +NC_000962 423639 424019 Rv0353 50bp_duplicated +NC_000962 424019 424269 IG_Rv0353-Rv0354c +NC_000962 424269 424694 Rv0354c PPE family protein +NC_000962 424694 424777 IG_Rv0354c-Rv0355c +NC_000962 424777 434679 Rv0355c PPE family protein +NC_000962 466672 467406 Rv0387c PPE family protein +NC_000962 467406 467459 IG_Rv0387c-Rv0388c +NC_000962 467459 468001 Rv0388c PPE family protein +NC_000962 472781 474106 Rv0393 50bp_duplicated +NC_000962 475816 476184 Rv0397 50bp_duplicated +NC_000962 530751 532214 Rv0442c PPE family protein +NC_000962 543174 544730 Rv0453 PPE family protein +NC_000962 576787 577338 Rv0487 50bp_duplicated +NC_000962 579349 580581 Rv0490 50bp_duplicated +NC_000962 606551 608062 Rv0515 50bp_duplicated +NC_000962 616832 616845 IG533_Rv0525-Rv0526 +NC_000962 622793 624577 Rv0532 PE-PGRS family protein +NC_000962 630040 631686 Rv0538 50bp_duplicated +NC_000962 642812 642888 IG559_Rv0551c-Rv0552 +NC_000962 671996 675916 Rv0578c PE-PGRS family protein +NC_000962 701406 702014 Rv0605 repeat region +NC_000962 706930 706947 IG622_Rv0612-Rv0613c +NC_000962 831776 832303 Rv0740 50bp_duplicated +NC_000962 832303 832534 IG_Rv0740-Rv0741 +NC_000962 832534 832848 Rv0741 transposase +NC_000962 832848 832981 IG_Rv0741-Rv0742 +NC_000962 832981 
833508 Rv0742 PE-PGRS family protein +NC_000962 835701 838052 Rv0746 PE-PGRS family protein +NC_000962 838052 838451 IG_Rv0746-Rv0747 +NC_000962 838451 840856 Rv0747 PE-PGRS family protein +NC_000962 842033 842278 Rv0750 50bp_duplicated +NC_000962 846159 847913 Rv0754 PE-PGRS family protein +NC_000962 847913 850527 IG_Rv0754-Rv0755A +NC_000962 848103 850040 Rv0755c PPE family protein +NC_000962 850342 850527 Rv0755A transposase +NC_000962 863159 863255 IG784_Rv0769-Rv0770 +NC_000962 889072 889398 Rv0795 transposase IS6110 +NC_000962 889347 890333 Rv0796 transposase IS6110 +NC_000962 889347 889398 IG_Rv0795-Rv0796 +NC_000962 890333 890388 IG_Rv0796-Rv0797 +NC_000962 890388 891482 Rv0797 50bp_duplicated +NC_000962 908181 908483 Rv0814c 50bp_duplicated +NC_000962 916477 917646 Rv0823c 50bp_duplicated +NC_000962 921575 921865 Rv0829 50bp_duplicated +NC_000962 924951 925364 Rv0832 PE-PGRS family protein +NC_000962 925361 927610 Rv0833 PE-PGRS family protein +NC_000962 927610 927837 IG_Rv0833-Rv0834c +NC_000962 927837 930485 Rv0834c PE-PGRS family protein +NC_000962 947312 947644 Rv0850 transposase +NC_000962 960152 960341 IG877_Rv0861c-Rv0862c +NC_000962 964312 965535 Rv0867c 50bp_duplicated +NC_000962 968424 970244 Rv0872c PE-PGRS family protein +NC_000962 976872 978203 Rv0878c PPE family protein +NC_000962 1020058 1021329 Rv0915c PPE family protein +NC_000962 1021329 1021344 IG_Rv0915c-Rv0916c +NC_000962 1021344 1021643 Rv0916c PE family protein +NC_000962 1025497 1026816 Rv0920c transposase +NC_000962 1026816 1027104 IG_Rv0920c-Rv0921 +NC_000962 1027104 1027685 Rv0921 resolvase +NC_000962 1027685 1029337 Rv0922 transposase +NC_000962 1090373 1093144 Rv0977 PE-PGRS family protein +NC_000962 1093144 1093361 IG_Rv0977-Rv0978c +NC_000962 1093361 1094356 Rv0978c PE-PGRS family protein +NC_000962 1095078 1096451 Rv0980c PE-PGRS family protein +NC_000962 1158918 1159307 Rv1034c transposase +NC_000962 1159307 1159375 IG_Rv1034c-Rv1035c +NC_000962 1159375 1160061 Rv1035c 
transposase +NC_000962 1160061 1160095 IG_Rv1035c-Rv1036c +NC_000962 1160095 1160433 Rv1036c truncated IS1560 transposase +NC_000962 1160433 1160544 IG_Rv1036c-Rv1037c +NC_000962 1160544 1160828 Rv1037c 50bp_duplicated +NC_000962 1160828 1160855 IG_Rv1037c-Rv1038c +NC_000962 1160855 1161151 Rv1038c 50bp_duplicated +NC_000962 1161151 1161297 IG_Rv1038c-Rv1039c +NC_000962 1161297 1162472 Rv1039c PPE family protein +NC_000962 1162472 1162549 IG_Rv1039c-Rv1040c +NC_000962 1162549 1163376 Rv1040c PE family protein +NC_000962 1163376 1164572 IG_Rv1040c-Rv1041c +NC_000962 1164572 1165435 Rv1041c IS like-2 transposase +NC_000962 1165092 1165499 Rv1042c IS like-2 transposase +NC_000962 1169423 1170670 Rv1047 transposase +NC_000962 1188421 1190424 Rv1067c PE-PGRS family protein +NC_000962 1190424 1190757 IG_Rv1067c-Rv1068c +NC_000962 1190757 1192148 Rv1068c PE-PGRS family protein +NC_000962 1211560 1213863 Rv1087 PE-PGRS family protein +NC_000962 1213863 1214513 IG_Rv1087-Rv1088 +NC_000962 1214513 1214947 Rv1088 PE family protein +NC_000962 1214769 1215131 Rv1089 PE family protein +NC_000962 1216469 1219030 Rv1091 PE-PGRS family protein +NC_000962 1251617 1252972 Rv1128c repeat_region +NC_000962 1262272 1264128 Rv1135c PPE family protein +NC_000962 1276300 1277748 Rv1148c 50bp_duplicated +NC_000962 1277748 1277893 IG_Rv1148c-Rv1149 +NC_000962 1277893 1278300 Rv1149 transposase +NC_000962 1278269 1278820 Rv1150 Possible fragment of transposase +NC_000962 1298764 1299804 Rv1168c PPE family protein +NC_000962 1299804 1299822 IG_Rv1168c-Rv1169c +NC_000962 1299822 1300124 Rv1169c PE family protein +NC_000962 1301755 1302681 Rv1172c PE family protein +NC_000962 1306002 1306201 IG1195_Rv1174c-Rv1175c +NC_000962 1339003 1339302 Rv1195 PE family protein +NC_000962 1339302 1339349 IG_Rv1195-Rv1196 +NC_000962 1339349 1340524 Rv1196 PPE family protein +NC_000962 1340524 1340659 IG_Rv1196-Rv1197 +NC_000962 1340659 1340955 Rv1197 50bp_duplicated +NC_000962 1340955 1341006 IG_Rv1197-Rv1198 
+NC_000962 1341006 1341290 Rv1198 50bp_duplicated +NC_000962 1341290 1341358 IG_Rv1198-Rv1199c +NC_000962 1341358 1342605 Rv1199c transposase +NC_000962 1357293 1357625 Rv1214c PE family protein +NC_000962 1384989 1386677 Rv1243c PE-PGRS family protein +NC_000962 1441348 1442718 Rv1288 50bp_duplicated +NC_000962 1450697 1451779 Rv1295 50bp_duplicated +NC_000962 1468171 1469505 Rv1313c transposase +NC_000962 1479199 1480824 Rv1318c 50bp_duplicated +NC_000962 1480824 1480894 IG_Rv1318c-Rv1319c +NC_000962 1480894 1482501 Rv1319c 50bp_duplicated +NC_000962 1488154 1489965 Rv1325c PE-PGRS family protein +NC_000962 1532443 1533633 Rv1361c PPE family protein +NC_000962 1541994 1542980 Rv1369c transposase +NC_000962 1542929 1543255 Rv1370c transposase +NC_000962 1561464 1561772 Rv1386 PE family protein +NC_000962 1561769 1563388 Rv1387 PPE family protein +NC_000962 1572127 1573857 Rv1396c PE-PGRS family protein +NC_000962 1606386 1607972 Rv1430 PE family protein +NC_000962 1618209 1619684 Rv1441c PE-PGRS family protein +NC_000962 1630638 1634627 Rv1450c PE-PGRS family protein +NC_000962 1636004 1638229 Rv1452c PE-PGRS family protein +NC_000962 1643319 1644260 Rv1458c 50bp_duplicated +NC_000962 1655609 1656721 Rv1468c PE-PGRS family protein +NC_000962 1678942 1679172 Rv1489A 50bp_duplicated +NC_000962 1684005 1686257 Rv1493 50bp_duplicated +NC_000962 1751297 1753333 Rv1548c PPE family protein +NC_000962 1761744 1762937 Rv1557 repeat_region +NC_000962 1762937 1762947 IG_Rv1557-Rv1558 +NC_000962 1762947 1763393 Rv1558 repeat_region +NC_000962 1779194 1779298 Rv1572c repeat_region +NC_000962 1779298 1779314 IG_Rv1572c-Rv1573 +NC_000962 1779314 1779724 Rv1573 phiRV1 phage protein +NC_000962 1779724 1779930 IG_Rv1573-Rv1574 +NC_000962 1779930 1780241 Rv1574 repeat_region +NC_000962 1779930 1780241 Rv1574 phiRV1 phage related protein +NC_000962 1780199 1780699 Rv1575 repeat_region +NC_000962 1780199 1780699 Rv1575 phiRV1 phage protein +NC_000962 1780643 1782064 Rv1576c phiRV1 
phage protein +NC_000962 1782064 1782072 IG_Rv1576c-Rv1577c +NC_000962 1782072 1782584 Rv1577c phiRv1 phage protein +NC_000962 1782584 1782758 IG_Rv1577c-Rv1578c +NC_000962 1782758 1783228 Rv1578c phiRv1 phage protein +NC_000962 1783228 1783309 IG_Rv1578c-Rv1579c +NC_000962 1783309 1783623 Rv1579c phiRv1 phage protein +NC_000962 1783620 1783892 Rv1580c phiRv1 phage protein +NC_000962 1783892 1783906 IG_Rv1580c-Rv1581c +NC_000962 1783906 1784301 Rv1581c phiRv1 phage protein +NC_000962 1784301 1784497 IG_Rv1581c-Rv1582c +NC_000962 1784497 1785912 Rv1582c phiRv1 phage protein +NC_000962 1785912 1786310 Rv1583c phiRv1 phage protein +NC_000962 1786307 1786528 Rv1584c phiRv1 phage protein +NC_000962 1786528 1786584 IG_Rv1584c-Rv1585c +NC_000962 1786584 1787099 Rv1585c phiRv1 phage protein +NC_000962 1787096 1788505 Rv1586c phiRv1 integrase +NC_000962 1788162 1789163 Rv1587c REP13E12 repeat-containing protein +NC_000962 1789163 1789168 IG_Rv1587c-Rv1588c +NC_000962 1789168 1789836 Rv1588c REP13E12 repeat-containing protein +NC_000962 1855764 1856696 Rv1646 PE family protein +NC_000962 1862347 1865382 Rv1651c PE-PGRS family protein +NC_000962 1907321 1907593 IG1711_Rv1682-Rv1683 +NC_000962 1927211 1928575 Rv1702c repeat_region +NC_000962 1931497 1932654 Rv1705c PPE family protein +NC_000962 1932654 1932694 IG_Rv1705c-Rv1706c +NC_000962 1932694 1933878 Rv1706c PPE family protein +NC_000962 1981614 1984775 Rv1753c PPE family protein +NC_000962 1987745 1988731 Rv1756c putative transposase +NC_000962 1988680 1989006 Rv1757c putative transposase +NC_000962 1989006 1989042 IG_Rv1757c-Rv1758 +NC_000962 1989042 1989566 Rv1758 putative transposase +NC_000962 1989566 1989833 IG_Rv1758-Rv1759c +NC_000962 1989833 1992577 Rv1759c PE-PGRS family protein +NC_000962 1996152 1996478 Rv1763 putative transposase +NC_000962 1996427 1997413 Rv1764 putative transposase +NC_000962 1997413 1997418 IG_Rv1764-Rv1765c +NC_000962 1997418 1998515 Rv1765c 50bp_duplicated +NC_000962 1998515 1999142 
IG_Rv1765c-Rv1765A +NC_000962 1999142 1999357 Rv1765A transposase +NC_000962 2000614 2002470 Rv1768 PE-PGRS family protein +NC_000962 2025301 2026398 Rv1787 PPE family protein +NC_000962 2026398 2026477 IG_Rv1787-Rv1788 +NC_000962 2026477 2026776 Rv1788 PE family protein +NC_000962 2026776 2026790 IG_Rv1788-Rv1789 +NC_000962 2026790 2027971 Rv1789 PPE family protein +NC_000962 2027971 2028425 IG_Rv1789-Rv1790 +NC_000962 2028425 2029477 Rv1790 PPE family protein +NC_000962 2029477 2029904 IG_Rv1790-Rv1791 +NC_000962 2029904 2030203 Rv1791 PE family protein +NC_000962 2030694 2030978 Rv1793 50bp_duplicated +NC_000962 2039453 2041420 Rv1800 PPE family protein +NC_000962 2041420 2042001 IG_Rv1800-Rv1801 +NC_000962 2042001 2043272 Rv1801 PPE family protein +NC_000962 2043272 2043384 IG_Rv1801-Rv1802 +NC_000962 2043384 2044775 Rv1802 PPE family protein +NC_000962 2044775 2044923 IG_Rv1802-Rv1803c +NC_000962 2044923 2046842 Rv1803c PE-PGRS family protein +NC_000962 2048072 2048371 Rv1806 PE family protein +NC_000962 2048371 2048398 IG_Rv1806-Rv1807 +NC_000962 2048398 2049597 Rv1807 PPE family protein +NC_000962 2049597 2049921 IG_Rv1807-Rv1808 +NC_000962 2049921 2051150 Rv1808 PPE family protein +NC_000962 2051150 2051282 IG_Rv1808-Rv1809 +NC_000962 2051282 2052688 Rv1809 PPE family protein +NC_000962 2061178 2062674 Rv1818c PE-PGRS family protein +NC_000962 2073943 2074437 Rv1829 50bp_duplicated +NC_000962 2087971 2089518 Rv1840c PE-PGRS family protein +NC_000962 2156706 2157299 Rv1910c 50bp_duplicated +NC_000962 2157299 2157382 IG_Rv1910c-Rv1911c +NC_000962 2157382 2157987 Rv1911c 50bp_duplicated +NC_000962 2162932 2167311 Rv1917c PPE family protein +NC_000962 2167311 2167649 IG_Rv1917c-Rv1918c +NC_000962 2167649 2170612 Rv1918c PPE family protein +NC_000962 2195989 2197353 Rv1945 repeat_region +NC_000962 2226244 2227920 Rv1983 PE-PGRS family protein +NC_000962 2260665 2261144 Rv2013 transposase +NC_000962 2261098 2261688 Rv2014 transposase +NC_000962 2261688 2261816 
IG_Rv2014-Rv2015c +NC_000962 2261816 2263072 Rv2015c 50bp_duplicated +NC_000962 2294531 2306986 Rv2048c 50bp_duplicated +NC_000962 2338709 2340874 Rv2082 50bp_duplicated +NC_000962 2343027 2343332 Rv2085 repeat_region +NC_000962 2347373 2348554 Rv2090 50bp_duplicated +NC_000962 2356729 2358206 Rv2098c PE-PGRS family protein +NC_000962 2365465 2365791 Rv2105 transposase +NC_000962 2365740 2366726 Rv2106 transposase +NC_000962 2366726 2367359 IG_Rv2106-Rv2107 +NC_000962 2367359 2367655 Rv2107 PE family protein +NC_000962 2367655 2367711 IG_Rv2107-Rv2108 +NC_000962 2367711 2368442 Rv2108 PPE family protein +NC_000962 2370905 2372569 Rv2112c 50bp_duplicated +NC_000962 2381071 2382492 Rv2123 PPE family protein +NC_000962 2387202 2387972 Rv2126c PE-PGRS family protein +NC_000962 2423240 2424838 Rv2162c PE-PGRS family protein +NC_000962 2430159 2431145 Rv2167c transposase +NC_000962 2431094 2431420 Rv2168c transposase +NC_000962 2439282 2439947 Rv2177c transposase +NC_000962 2459678 2461327 Rv2196 50bp_duplicated +NC_000962 2530836 2531897 Rv2258c 50bp_duplicated +NC_000962 2549124 2550029 Rv2277c 50bp_duplicated +NC_000962 2550029 2550065 IG_Rv2277c-Rv2278 +NC_000962 2550065 2550391 Rv2278 transposase +NC_000962 2550340 2551326 Rv2279 transposase +NC_000962 2600731 2601879 Rv2328 PE family protein +NC_000962 2617667 2618908 Rv2340c PE-PGRS family protein +NC_000962 2625888 2626172 Rv2346c 50bp_duplicated +NC_000962 2626172 2626223 IG_Rv2346c-Rv2347c +NC_000962 2626223 2626519 Rv2347c 50bp_duplicated +NC_000962 2632923 2634098 Rv2352c PPE family protein +NC_000962 2634098 2634528 IG_Rv2352c-Rv2353c +NC_000962 2634528 2635592 Rv2353c PPE family protein +NC_000962 2635592 2635628 IG_Rv2353c-Rv2354 +NC_000962 2635628 2635954 Rv2354 transposase +NC_000962 2635903 2636889 Rv2355 transposase +NC_000962 2636889 2637688 IG_Rv2355-Rv2356c +NC_000962 2637688 2639535 Rv2356c PPE family protein +NC_000962 2651753 2651938 Rv2371 PE-PGRS family protein +NC_000962 2692799 2693884 Rv2396 
PE-PGRS family protein +NC_000962 2706017 2706736 Rv2408 PE family protein +NC_000962 2720776 2721777 Rv2424c transposase +NC_000962 2727336 2727920 Rv2430c PPE family protein +NC_000962 2727920 2727967 IG_Rv2430c-Rv2431c +NC_000962 2727967 2728266 Rv2431c PE family protein +NC_000962 2762531 2763175 Rv2460c repeat_region +NC_000962 2763172 2763774 Rv2461c repeat_region +NC_000962 2784657 2785643 Rv2479c transposase +NC_000962 2785592 2785918 Rv2480c transposase +NC_000962 2795301 2797385 Rv2487c PE-PGRS family protein +NC_000962 2800846 2801145 Rv2489c repeat_region +NC_000962 2801145 2801254 IG_Rv2489c-Rv2490c +NC_000962 2801254 2806236 Rv2490c PE-PGRS family protein +NC_000962 2828556 2829803 Rv2512c IS1081 transposase +NC_000962 2835785 2837263 Rv2519 PE family protein +NC_000962 2866468 2867127 Rv2543 50bp_duplicated +NC_000962 2867124 2867786 Rv2544 50bp_duplicated +NC_000962 2921551 2923182 Rv2591 PE-PGRS family protein +NC_000962 2935046 2936788 Rv2608 PPE family protein +NC_000962 2943600 2944985 Rv2615c PE-PGRS family protein +NC_000962 2960105 2962441 Rv2634c PE-PGRS family protein +NC_000962 2972160 2972486 Rv2648 transposase IS6110 +NC_000962 2972435 2973421 Rv2649 transposase IS6110 +NC_000962 2973421 2973795 IG_Rv2649-Rv2650c +NC_000962 2973795 2975234 Rv2650c phiRv2 prophage protein +NC_000962 2975234 2975242 IG_Rv2650c-Rv2651c +NC_000962 2975242 2975775 Rv2651c phiRv2 prophage protease +NC_000962 2975775 2975928 IG_Rv2651c-Rv2652c +NC_000962 2975928 2976554 Rv2652c phiRv2 prophage protein +NC_000962 2976554 2976586 IG_Rv2652c-Rv2653c +NC_000962 2976586 2976909 Rv2653c phiRv2 prophage protein +NC_000962 2976909 2976989 IG_Rv2653c-Rv2654c +NC_000962 2976989 2977234 Rv2654c phiRv2 prophage protein +NC_000962 2977231 2978658 Rv2655c phiRv2 prophage protein +NC_000962 2978658 2978660 IG_Rv2655c-Rv2656c +NC_000962 2978660 2979052 Rv2656c phiRv2 prophage protein +NC_000962 2979049 2979309 Rv2657c phiRv2 prophage protein +NC_000962 2979691 2980818 Rv2659c 
phiRv2 prophage integrase +NC_000962 2982699 2982980 Rv2665 50bp_duplicated +NC_000962 2982980 2983071 IG_Rv2665-Rv2666 +NC_000962 2983071 2983874 Rv2666 truncated IS1081 transposase +NC_000962 2989291 2990592 Rv2673 50bp_duplicated +NC_000962 2996105 2996737 Rv2680 50bp_duplicated +NC_000962 3005845 3007062 Rv2689c 50bp_duplicated +NC_000962 3007062 3007236 IG_Rv2689c-Rv2690c +NC_000962 3007236 3009209 Rv2690c repeat_region +NC_000962 3053914 3055491 Rv2741 PE-PGRS family protein +NC_000962 3076894 3078078 Rv2768c PPE family protein +NC_000962 3078078 3078158 IG_Rv2768c-Rv2769c +NC_000962 3078158 3078985 Rv2769c PE family protein +NC_000962 3078985 3079309 IG_Rv2769c-Rv2770c +NC_000962 3079309 3080457 Rv2770c PPE family protein +NC_000962 3082352 3082756 Rv2774c 50bp_duplicated +NC_000962 3100202 3101581 Rv2791c transposase +NC_000962 3101581 3102162 Rv2792c resolvase +NC_000962 3112867 3113271 Rv2805 50bp_duplicated +NC_000962 3113658 3114812 Rv2807 50bp_duplicated +NC_000962 3115741 3116142 Rv2810c transposase +NC_000962 3116818 3118227 Rv2812 transposase +NC_000962 3120566 3121552 Rv2814c transposase +NC_000962 3121501 3121827 Rv2815c transposase +NC_000962 3132892 3133539 Rv2825c 50bp_duplicated +NC_000962 3135788 3136333 Rv2828c 50bp_duplicated +NC_000962 3162268 3164115 Rv2853 PE-PGRS family protein +NC_000962 3170720 3171646 Rv2859c 50bp_duplicated +NC_000962 3191644 3192201 Rv2882c 50bp_duplicated +NC_000962 3194166 3195548 Rv2885c transposase +NC_000962 3195545 3196432 Rv2886c resolvase +NC_000962 3200794 3202020 Rv2892c PPE family protein +NC_000962 3245445 3251075 Rv2931 50bp_duplicated +NC_000962 3251072 3255688 Rv2932 50bp_duplicated +NC_000962 3288464 3289705 Rv2943 IS1533 transposase +NC_000962 3289705 3290235 Rv2943A transposase +NC_000962 3289790 3290506 Rv2944 IS1533 transposase +NC_000962 3313283 3313672 Rv2961 transposase +NC_000962 3318816 3318900 IG3012_Rv2965c-Rv2966c +NC_000962 3319468 3319662 IG3013_Rv2966c-Rv2967c +NC_000962 3332787 
3333788 Rv2977c 50bp_duplicated +NC_000962 3333785 3335164 Rv2978c transposase +NC_000962 3335164 3335748 Rv2979c resolvase +NC_000962 3335748 3335960 IG_Rv2979c-Rv2980 +NC_000962 3335960 3336505 Rv2980 50bp_duplicated +NC_000962 3376939 3378243 Rv3018c PPE family protein +NC_000962 3378243 3378329 IG_Rv3018c-Rv3018A +NC_000962 3378329 3378415 Rv3018A PE family protein +NC_000962 3379376 3380452 Rv3021c PPE family protein +NC_000962 3380440 3380682 Rv3022c PPE family protein +NC_000962 3380679 3380993 Rv3022A PE family protein +NC_000962 3380993 3381375 IG_Rv3022A-Rv3023c +NC_000962 3381375 3382622 Rv3023c transposase +NC_000962 3465778 3467091 Rv3097c PE-PGRS family protein +NC_000962 3481451 3482698 Rv3115 transposase +NC_000962 3490476 3491651 Rv3125c PPE family protein +NC_000962 3501334 3501732 Rv3135 PPE family protein +NC_000962 3501732 3501794 IG_Rv3135-Rv3136 +NC_000962 3501794 3502936 Rv3136 PPE family protein +NC_000962 3510088 3511317 Rv3144c PPE family protein +NC_000962 3527391 3529163 Rv3159c PPE family protein +NC_000962 3551281 3551607 Rv3184 transposase +NC_000962 3551556 3552542 Rv3185 transposase +NC_000962 3552542 3552764 IG_Rv3185-Rv3186 +NC_000962 3552764 3553090 Rv3186 transposase +NC_000962 3553039 3554025 Rv3187 transposase +NC_000962 3557311 3558345 Rv3191c transposase +NC_000962 3663689 3664222 Rv3281 50bp_duplicated +NC_000962 3710433 3710759 Rv3325 transposase +NC_000962 3710708 3711694 Rv3326 transposase +NC_000962 3711694 3711749 IG_Rv3326-Rv3327 +NC_000962 3711749 3713461 Rv3327 transposase +NC_000962 3729364 3736935 Rv3343c PPE family protein +NC_000962 3736935 3736984 IG_Rv3343c-Rv3344c +NC_000962 3736984 3738438 Rv3344c PE-PGRS family protein +NC_000962 3738158 3742774 Rv3345c PE-PGRS family protein +NC_000962 3742774 3743198 IG_Rv3345c-Rv3346c +NC_000962 3743198 3743455 Rv3346c 50bp_duplicated +NC_000962 3743455 3743711 IG_Rv3346c-Rv3347c +NC_000962 3743711 3753184 Rv3347c PPE family protein +NC_000962 3753184 3753765 
IG_Rv3347c-Rv3348 +NC_000962 3753765 3754256 Rv3348 transposase +NC_000962 3754256 3754293 IG_Rv3348-Rv3349c +NC_000962 3754293 3755033 Rv3349c transposase +NC_000962 3755033 3755952 IG_Rv3349c-Rv3350c +NC_000962 3755952 3767102 Rv3350c PPE family protein +NC_000962 3769514 3769807 Rv3355c 50bp_duplicated +NC_000962 3778568 3780334 Rv3367 PE-PGRS family protein +NC_000962 3795100 3796086 Rv3380c transposase +NC_000962 3796035 3796361 Rv3381c transposase +NC_000962 3800092 3800796 Rv3386 transposase +NC_000962 3800786 3801463 Rv3387 transposase +NC_000962 3801463 3801653 IG_Rv3387-Rv3388 +NC_000962 3801653 3803848 Rv3388 PE-PGRS family protein +NC_000962 3841714 3842076 Rv3424c 50bp_duplicated +NC_000962 3842076 3842239 IG_Rv3424c-Rv3425 +NC_000962 3842239 3842769 Rv3425 PPE family protein +NC_000962 3842769 3843036 IG_Rv3425-Rv3426 +NC_000962 3843036 3843734 Rv3426 PPE family protein +NC_000962 3843734 3843885 IG_Rv3426-Rv3427c +NC_000962 3843885 3844640 Rv3427c transposase +NC_000962 3844640 3844738 IG_Rv3427c-Rv3428c +NC_000962 3844738 3845970 Rv3428c transposase +NC_000962 3845970 3847165 IG_Rv3428c-Rv3429 +NC_000962 3847165 3847701 Rv3429 PPE family protein +NC_000962 3847642 3848805 Rv3430c transposase +NC_000962 3848805 3849294 IG_Rv3430c-Rv3431c +NC_000962 3849294 3850139 Rv3431c repeat_region +NC_000962 3883525 3884193 Rv3466 repeat_region +NC_000962 3883964 3884917 Rv3467 repeat_region +NC_000962 3890830 3891156 Rv3474 transposase IS6110 +NC_000962 3891105 3892091 Rv3475 transposase IS6110 +NC_000962 3894093 3894389 Rv3477 PE family protein +NC_000962 3894389 3894426 IG_Rv3477-Rv3478 +NC_000962 3894426 3895607 Rv3478 PE family protein +NC_000962 3926569 3930714 Rv3507 PE-PGRS family protein +NC_000962 3930714 3931005 IG_Rv3507-Rv3508 +NC_000962 3931005 3936710 Rv3508 PE-PGRS family protein +NC_000962 3939617 3941761 Rv3511 PE-PGRS family protein +NC_000962 3941724 3944963 Rv3512 PE-PGRS family protein +NC_000962 3944963 3945092 IG_Rv3512-Rv3513c +NC_000962 
3945092 3945748 Rv3513c 50bp_duplicated +NC_000962 3945748 3945794 IG_Rv3513c-Rv3514 +NC_000962 3945794 3950263 Rv3514 PE-PGRS family protein +NC_000962 3950263 3950824 IG_Rv3514-Rv3515c +NC_000962 3950824 3952470 Rv3515c 50bp_duplicated +NC_000962 3969343 3970563 Rv3532 PPE family protein +NC_000962 3970563 3970705 IG_Rv3532-Rv3533c +NC_000962 3970705 3972453 Rv3533c PPE family protein +NC_000962 3978059 3979498 Rv3539 PPE family protein +NC_000962 3997980 3999638 Rv3558 PPE family protein +NC_000962 4031404 4033158 Rv3590c PE-PGRS family protein +NC_000962 4036731 4038050 Rv3595c PE-PGRS family protein +NC_000962 4052950 4053603 Rv3611 50bp_duplicated +NC_000962 4059984 4060268 Rv3619c 50bp_duplicated +NC_000962 4060268 4060295 IG_Rv3619c-Rv3620c +NC_000962 4060295 4060591 Rv3620c 50bp_duplicated +NC_000962 4060591 4060648 IG_Rv3620c-Rv3621c +NC_000962 4060648 4061889 Rv3621c PPE family protein +NC_000962 4061889 4061899 IG_Rv3621c-Rv3622c +NC_000962 4061899 4062198 Rv3622c PE family protein +NC_000962 4075752 4076099 Rv3636 transposase +NC_000962 4076099 4076484 IG_Rv3636-Rv3637 +NC_000962 4076484 4076984 Rv3637 transposase +NC_000962 4076984 4077730 Rv3638 transposase +NC_000962 4077730 4077884 IG_Rv3638-Rv3639c +NC_000962 4077884 4078450 Rv3639c 50bp_duplicated +NC_000962 4078450 4078520 IG_Rv3639c-Rv3640c +NC_000962 4078520 4079749 Rv3640c transposase +NC_000962 4091233 4091517 Rv3650 PE family protein +NC_000962 4093632 4093946 Rv3652 PE-PGRS family protein +NC_000962 4093940 4094527 Rv3653 PE-PGRS family protein +NC_000962 4119795 4120955 Rv3680 50bp_duplicated +NC_000962 4153740 4155674 Rv3710 50bp_duplicated +NC_000962 4189285 4190232 Rv3738c PPE family protein +NC_000962 4190232 4190284 IG_Rv3738c-Rv3739c +NC_000962 4190284 4190517 Rv3739c PPE family protein +NC_000962 4196171 4196506 Rv3746c PE family protein +NC_000962 4252993 4254327 Rv3798 transposase +NC_000962 4276571 4278085 Rv3812 PE-PGRS family protein +NC_000962 4299812 4301566 Rv3826 
50bp_duplicated +NC_000962 4301563 4302789 Rv3827c transposase +NC_000962 4302786 4303397 Rv3828c resolvase +NC_000962 4318775 4319266 Rv3844 transposase +NC_000962 4351075 4352181 Rv3873 PPE family protein +NC_000962 4353010 4355010 Rv3876 50bp_duplicated +NC_000962 4374484 4375683 Rv3892c PPE family protein +NC_000962 4375683 4375762 IG_Rv3892c-Rv3893c +NC_000962 4375762 4375995 Rv3893c PE family protein diff --git a/notebooks/slurm_scripts/launch_GATK.slurm b/notebooks/slurm_scripts/launch_GATK.slurm new file mode 100644 index 0000000000000000000000000000000000000000..b7a694649b52bb25bcd310e2479c4d0a40b41ed0 --- /dev/null +++ b/notebooks/slurm_scripts/launch_GATK.slurm @@ -0,0 +1,13 @@ +#!/bin/bash + +#SBATCH --job-name=GATK +#SBATCH --cpus-per-task=1 +#SBATCH --mem-per-cpu=4G +#SBATCH --time=6:00:00 +#SBATCH --output=GATK.o +#SBATCH --error=GATK.e + +singularity exec container.img gatk-launch -T RealignerTargetCreator -nt 1 -R ~/Workshop_SA/notebooks/reference_genome/MTB_ancestor_reference.fasta -o ERR760779.intervals -I ERR760779.dedup.bam + + +singularity exec container.img gatk-launch --disable_bam_indexing -T IndelRealigner -R ~/Workshop_SA/notebooks/reference_genome/MTB_ancestor_reference.fasta -targetIntervals ERR760779.intervals -I ERR760779.dedup.bam -o ERR760779.dedup.realigned.bam diff --git a/notebooks/slurm_scripts/launch_index2.slurm b/notebooks/slurm_scripts/launch_index2.slurm new file mode 100644 index 0000000000000000000000000000000000000000..ed6f3c865d0f321007d8787b8b3f657f4f761624 --- /dev/null +++ b/notebooks/slurm_scripts/launch_index2.slurm @@ -0,0 +1,7 @@ +#!/bin/bash + +#SBATCH --job-name=index +#SBATCH --cpus-per-task=1 +#SBATCH --mem-per-cpu=2G + +singularity exec /home/container.img samtools index ERR760779.dedup.realigned.bam
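Note on Locus_to_exclude_Mtb.txt: the diff adds the table but does not show how it is consumed. The following is a minimal sketch, not the workshop's actual filtering step, of how the table could be loaded and queried from a notebook cell. It assumes the columns are whitespace- or tab-separated in the order Chrom, ChromStart, ChromEnd, and it treats both coordinates as inclusive; the real coordinate convention is not stated in the file. The function names and the two-row sample are hypothetical.

```python
import io

# Two rows copied from Locus_to_exclude_Mtb.txt, plus its header line.
SAMPLE = """Chrom ChromStart ChromEnd locus tag Comment
NC_000962 23182 23269 IG18_Rv0018c-Rv0019c
NC_000962 33582 33794 Rv0031 remnant of A transposase
"""


def load_excluded_regions(handle):
    """Return a list of (start, end) tuples, skipping header and blank lines."""
    regions = []
    for line in handle:
        fields = line.split()
        # The header has non-numeric text in the coordinate columns.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        regions.append((int(fields[1]), int(fields[2])))
    return regions


def is_excluded(regions, position):
    """True if position lies inside any region (ends assumed inclusive)."""
    # Some regions in the table overlap, so a plain linear scan is used here.
    return any(start <= position <= end for start, end in regions)


regions = load_excluded_regions(io.StringIO(SAMPLE))
print(is_excluded(regions, 23200))
print(is_excluded(regions, 30000))
```

To use the full table, the same loader could be given an open file handle, for example `open("Locus_to_exclude_Mtb.txt")`, in place of the in-memory sample.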