diff --git a/notebooks/CapeTown_Genomics_Tutorial_partI.pdf b/notebooks/CapeTown_Genomics_Tutorial_partI.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..8ff53be0030036c641750c37b79b84bfa69cb0c7
Binary files /dev/null and b/notebooks/CapeTown_Genomics_Tutorial_partI.pdf differ
diff --git a/notebooks/CapeTown_Genomics_Tutorial_partII.ipynb b/notebooks/CapeTown_Genomics_Tutorial_partII.ipynb
index c7e25353a370357cb5f71a48c799ceda5e75dc06..f3c955e463022a0b5f8cae85a240db02cde0d6ae 100644
--- a/notebooks/CapeTown_Genomics_Tutorial_partII.ipynb
+++ b/notebooks/CapeTown_Genomics_Tutorial_partII.ipynb
@@ -503,58 +503,6 @@
     "    - sbatch ~/Workshop_SA/notebooks/slurm_scripts/launch_index.slurm"
    ]
   },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### 3.2.3 InDel Realignment with GATK\n",
-    "\n",
-    "it is common to realign reads around small indels. Since, differences in resolving the indels may\n",
-    "cause artificial SNPs in the downstream analysis.\n",
-    "The\n",
-    "GATK software for instance offers the possibility to realign\n",
-    "\n",
-    "\n",
-    "\n",
-    "Please run in the command line:\n",
-    "    - sbatch ~/Workshop_SA/notebooks/slurm_scripts/launch_GATK.slurm\n",
-    "\n",
-    "\n",
-    "Then you have to index the newly produced BAM:\n",
-    "    - sbatch ~/Workshop_SA/notebooks/slurm_scripts/launch_index2.slurm"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "#!/bin/bash \r\n",
-      "\r\n",
-      "#SBATCH --job-name=GATK\r\n",
-      "#SBATCH --cpus-per-task=1\r\n",
-      "#SBATCH --mem-per-cpu=4G\r\n",
-      "#SBATCH --time=6:00:00\r\n",
-      "#SBATCH --output=GATK.o\r\n",
-      "#SBATCH --error=GATK.e\r\n",
-      "\r\n",
-      "singularity exec container.img gatk-launch -T RealignerTargetCreator -nt 1 -R ~/Workshop_SA/notebooks/reference_genome/MTB_ancestor_reference.fasta -o ERR760779.intervals -I ERR760779.dedup.bam\r\n",
-      "\r\n",
-      "\r\n",
-      "singularity exec container.img gatk-launch --disable_bam_indexing -T IndelRealigner R ~/Workshop_SA/notebooks/reference_genome/MTB_ancestor_reference.fasta -targetIntervals ERR760779.intervals -I ERR760779.dedup.bam -o ERR760779.dedup.realigned.bam\r\n"
-     ]
-    }
-   ],
-   "source": [
-    "! cat /scicore/home/gagneux/loiseau/Workshop_SA/notebooks/slurm_scripts/launch_GATK.slurm"
-   ]
-  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -618,7 +566,7 @@
    },
    "outputs": [],
    "source": [
-    "%cd ERR760779.dedup_stats"
+    "%cd ~/ERR760779.dedup_stats/"
    ]
   },
   {
diff --git a/notebooks/CapeTown_Genomics_Tutorial_partII.pdf b/notebooks/CapeTown_Genomics_Tutorial_partII.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..60dcda564a853069a8cab2c9841a22f93ba60905
Binary files /dev/null and b/notebooks/CapeTown_Genomics_Tutorial_partII.pdf differ
diff --git a/notebooks/CapeTown_Genomics_Tutorial_partIII.ipynb b/notebooks/CapeTown_Genomics_Tutorial_partIII.ipynb
index 0915b67c27576e6d8ff6d913efcf6f1a5fb66f3a..46840d9f0703ca26a210c72cb2238e2579e4f8c7 100644
--- a/notebooks/CapeTown_Genomics_Tutorial_partIII.ipynb
+++ b/notebooks/CapeTown_Genomics_Tutorial_partIII.ipynb
@@ -13,7 +13,7 @@
    "source": [
     "## 4. Variant Calling\n",
     "\n",
-    "In order to call SNPs, we need first to generate a pileup file. A pileup file summarizes, for each position in the reference genome, the number of reads covering it as well as the mapping quality of those reads. So, the most important difference with respect to the previous formats is that those were **read-centered outputs** and now we move to **reference-centered outputs**. We will use SAMtools again to produce the pileup:\n",
+    "To call SNPs, we first need to generate a pileup file. A pileup file summarizes, for each position in the reference genome, the number of reads covering it, the bases those reads carry at that position and their base qualities. We will use SAMtools again to produce the pileup:\n",
     "\n",
     "\n",
     "### 4.1 Mpileup \n",
@@ -31,7 +31,7 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
    "outputs": [],
    "source": [
@@ -42,11 +42,11 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
    "outputs": [],
    "source": [
-    "! head -n 20 ERR760779.pileup "
+    "! head -n 20 $HOME/ERR760779.pileup "
    ]
   },
   {
@@ -73,7 +73,7 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
    "outputs": [],
    "source": [
@@ -109,21 +109,13 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {
     "collapsed": false
    },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "cat: varscan.e: No such file or directory\r\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
-    "! cat varscan.e"
+    "! cat $HOME/varscan.e"
    ]
   },
   {
@@ -162,7 +154,7 @@
    },
    "outputs": [],
    "source": [
-    "! head -n 28 ERR760779.snps.vcf"
+    "! head -n 24 $HOME/ERR760779.all.pos.vcf"
    ]
   },
   {
@@ -199,7 +191,17 @@
    },
    "outputs": [],
    "source": [
-    "! head -n 29 ERR760779.snps.vcf | tail -n 1"
+    "! head -n 25 $HOME/ERR760779.all.pos.vcf | tail -n 1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's have a look at the positions that did not pass the quality filters (7th column, FILTER).\n",
+    "If a position passed all filters, its 7th column contains 'PASS'.\n",
+    "\n",
+    "So let us grep for every non-header row in the VCF that does not contain the string 'PASS':"
    ]
   },
   {
@@ -210,14 +212,22 @@
    },
    "outputs": [],
    "source": [
-    "! grep -v 'PASS' ERR760779.snps.vcf"
+    "! grep -v 'PASS' $HOME/ERR760779.all.pos.vcf | grep -v '^#'"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "We now have the genomic positions where the variants are but we don't know in which gene they fall, or what the effect of the variant is."
+    "The header section of the VCF explains that **str10** flags positions where fewer than 10% or more than 90% of the variant-supporting reads are on one strand.\n",
+    "**Strand bias** is a key factor influencing the rate of false positives, so make sure you exclude these positions from your analysis!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### We now have the genomic positions where the variants are but we don't know in which gene they fall, or what the effect of the variant is."
    ]
   },
   {
@@ -244,7 +254,7 @@
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
    "outputs": [],
    "source": [
@@ -271,6 +281,102 @@
     "! cat ~/Workshop_SA/notebooks/annotation/additionnal_annotations.bed "
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "! cat $HOME/ERR760779.snps.vcf"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Do you want to exclude the ~10% of the genome that is repetitive? We propose a Python parsing script that reads the annotated VCF of SNPs and creates two new files:\n",
+    "    - one file with only the fixed SNPs (variant frequency >= 90%)\n",
+    "    - one file with only the variable SNPs (10% <= variant frequency < 90%)\n",
+    "\n",
+    "The two files will no longer contain SNPs that fall in repetitive regions or in other regions that are challenging to map."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "%cd "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "pos_exclude = []\n",
+    "vcf_fixed_snp = open('ERR760779.fixed.annot.filtered.snps','w') # create a new file for the fixed snps\n",
+    "vcf_variable_snp = open('ERR760779.variable.annot.filtered.snps','w') # create a new file for the variable snps\n",
+    "snp_file = open('ERR760779.snps.ann.vcf','r') # read the annotated snps vcf\n",
+    "\n",
+    "with open('Workshop_SA/notebooks/Locus_to_exclude_Mtb.txt','r') as positions_to_exclude_file:\n",
+    "    table = [position.strip().split('\\t') for position in positions_to_exclude_file]\n",
+    "\n",
+    "    for i in range(1,len(table)): # start at 1 to skip the first line (header) of the exclusion file\n",
+    "        StartPosition = int(table[i][1])\n",
+    "        EndPosition = int(table[i][2])\n",
+    "        pos_exclude += [[StartPosition,EndPosition]]\n",
+    "\n",
+    "    positions_to_exclude_templist = []\n",
+    "    positions_to_exclude = {}\n",
+    "\n",
+    "    for coord in pos_exclude:\n",
+    "        positions_to_exclude_templist += range(coord[0], coord[1])\n",
+    "    for coordinates in positions_to_exclude_templist:\n",
+    "        positions_to_exclude[coordinates] = ''\n",
+    "\n",
+    "    vcf_table = [i.rstrip('\\n').split('\\t') for i in snp_file] # split on tabs only, so spaces inside header lines are kept\n",
+    "\n",
+    "    fixed_snps_count = 0\n",
+    "    variable_snps_count = 0\n",
+    "\n",
+    "    for line in vcf_table: # loop through the vcf table\n",
+    "        if line[0][0] == '#': # if it's the header lines\n",
+    "            vcf_fixed_snp.write(\"\\t\".join(line) + '\\n') # write them to the new file\n",
+    "            vcf_variable_snp.write(\"\\t\".join(line) + '\\n')\n",
+    "        else: # if it's another line than the header i.e the variants\n",
+    "            if int(line[1]) not in positions_to_exclude: # if column 2 (python is 0 based!) which is the position, is not present in the list of positions we exclude from the analysis\n",
+    "                if 'HOM=1' in line[7]: # HOM=1 means it's a fixed snp\n",
+    "                    fixed_snps_count += 1\n",
+    "                    vcf_fixed_snp.write(\"\\t\".join(line) + '\\n')\n",
+    "                elif 'HET=1' in line[7]: # HET=1 means it's a variable snp\n",
+    "                    variable_snps_count += 1\n",
+    "                    vcf_variable_snp.write(\"\\t\".join(line) + '\\n')\n",
+    "\n",
+    "snp_file.close()\n",
+    "vcf_fixed_snp.close()\n",
+    "vcf_variable_snp.close()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "! cat ERR760779.fixed.annot.filtered.snps"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -413,7 +519,26 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 9,
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "/scicore/home/gagneux/loiseau\n"
+     ]
+    }
+   ],
+   "source": [
+    "%cd"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
    "metadata": {
     "collapsed": true
    },
@@ -444,7 +569,7 @@
    "outputs": [],
    "source": [
     "VCF_dico = {}\n",
-    "with open('ERR760779.snp.ann.vcf','r') as VCF: # open the VCF\n",
+    "with open('ERR760779.snps.ann.vcf','r') as VCF: # open the VCF\n",
     "    for i in VCF: # Loop into the lines of the VCF\n",
     "        if i[0] != '#': # if the first character of the line in the VCF is # ( we want to ignore the header of the VCF)\n",
     "            i = i.strip()\n",
@@ -502,31 +627,23 @@
     "\n",
     "For the time of this workshop we will not have time to perform the different bioinformatics steps for all the genomes of the Elholm dataset.\n",
     "\n",
-    "We've prepared a merged VCF using GATK."
+    "We've prepared a merged VCF using GATK: `Workshop_SA/data_Eldholm/Eldholm2015_vcf_merged`."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": null,
    "metadata": {
     "collapsed": false
    },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "/scicore/home/gagneux/loiseau\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
-    "%cd loiseau/"
+    "! head -n 15 Workshop_SA/data_Eldholm/Eldholm2015_vcf_merged"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 7,
+   "execution_count": null,
    "metadata": {
     "collapsed": false
    },
@@ -535,7 +652,7 @@
     "list_of_sample_names = []\n",
     "VCF_merged_dico = {}          \n",
     "\n",
-    "with open('Workshop_SA/data_Eldholm/Eldholm2015_vcf_merged','r') as vcf_merged:\n",
+    "with open('Workshop_SA/data_Eldholm/Eldholm2015_vcf_merged','r') as vcf_merged: # read the merged vcf file\n",
     "    for lines in vcf_merged:\n",
     "        if lines[0:2] != '##':\n",
     "            lines = lines.strip().split()\n",
@@ -556,34 +673,11 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "metadata": {
     "collapsed": false
    },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['761110', 'A', 'T', 'Rv0667', 'D435V', 'RIF', 'Walker_resistant-resistant']\n",
-      "['4247431', 'G', 'C', 'Rv3795', 'M306I', 'EMB', 'Walker_resistant-resistant']\n",
-      "['4249583', 'G', 'A', 'Rv3795', 'D1024N', 'EMB', 'Boettger_DST']\n",
-      "['761100', 'C', 'A', 'Rv0667', 'Q432K', 'RIF', 'Walker_resistant-resistant']\n",
-      "['2155168', 'C', 'G', 'Rv1908c', 'S315T', 'INH', 'Walker_resistant-resistant']\n",
-      "['761155', 'C', 'T', 'Rv0667', 'S450L', 'RIF', 'Walker_resistant-resistant']\n",
-      "['761140', 'A', 'G', 'Rv0667', 'H445R', 'RIF', 'Walker_resistant-resistant']\n",
-      "['7581', 'G', 'A', 'Rv0006', 'D94N', 'FQ', 'Walker_resistant-resistant']\n",
-      "['7582', 'A', 'G', 'Rv0006', 'D94G', 'FQ', 'Walker_resistant-resistant']\n",
-      "['4247429', 'A', 'G', 'Rv3795', 'M306V', 'EMB', 'Walker_resistant-resistant']\n",
-      "['7570', 'C', 'T', 'Rv0006', 'A90V', 'FQ', 'Walker_resistant-resistant']\n",
-      "['1473246', 'A', 'G', 'rrs', 'A1401G', 'AK;CAP;KAN', 'Walker_resistant-resistant']\n",
-      "['1673432', 'T', 'C', 'Rv1483', 'T-8C', 'INH;ETH', 'Walker_resistant-resistant;Boettger_DST']\n",
-      "['761139', 'C', 'T', 'Rv0667', 'H445Y', 'RIF', 'Walker_resistant-resistant']\n",
-      "['1673425', 'C', 'T', 'Rv1483', 'C-15T', 'INH;ETH', 'Walker_resistant-resistant;Boettger_DST']\n",
-      "['4247730', 'G', 'C', 'Rv3795', 'G406A', 'EMB', 'Walker_resistant-resistant']\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "with open('all_DRM_in_Eldholm.txt','w') as DRM_Eldholm_output_file:\n",
     "    DRM_Eldholm_output_file.write('position\\tref\\talt\\tlocus\\tmutation\\t'+'\\t'.join(list_of_sample_names)+'\\n')\n",
@@ -603,34 +697,11 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": null,
    "metadata": {
     "collapsed": false
    },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "('D435V', 4, 0)\n",
-      "('M306I', 1, 0)\n",
-      "('D1024N', 1, 0)\n",
-      "('Q432K', 1, 0)\n",
-      "('S315T', 248, 0)\n",
-      "('S450L', 234, 1)\n",
-      "('H445R', 1, 0)\n",
-      "('D94N', 0, 1)\n",
-      "('D94G', 3, 0)\n",
-      "('M306V', 1, 0)\n",
-      "('A90V', 3, 0)\n",
-      "('A1401G', 230, 0)\n",
-      "('T-8C', 1, 0)\n",
-      "('H445Y', 3, 0)\n",
-      "('C-15T', 1, 0)\n",
-      "('G406A', 234, 0)\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "\n",
     "with open('all_DRM_in_Eldholm.txt','r') as ifile:\n",
@@ -659,6 +730,18 @@
    "display_name": "Python 2",
    "language": "python",
    "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.11"
   }
  },
  "nbformat": 4,
diff --git a/notebooks/CapeTown_Genomics_Tutorial_partIII.pdf b/notebooks/CapeTown_Genomics_Tutorial_partIII.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..677cb57dae6c35d754ec1d8f49d1f2d15465aee1
Binary files /dev/null and b/notebooks/CapeTown_Genomics_Tutorial_partIII.pdf differ
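The `grep -v 'PASS'` step in part III can also be expressed in Python by testing the FILTER column directly, which avoids hiding a failed record whose INFO field happens to contain the text PASS. A minimal sketch, assuming a tab-separated VCF; `non_pass_records` and `failed_filter_counts` are hypothetical helper names, and the example records (including the contig name `MTB_anc`) are made up for illustration.

```python
from collections import Counter


def non_pass_records(vcf_lines):
    """Yield (position, filter) for every data line whose FILTER column is not PASS."""
    for line in vcf_lines:
        if line.startswith('#'):  # skip '##' meta lines and the '#CHROM' header
            continue
        cols = line.rstrip('\n').split('\t')
        if cols[6] != 'PASS':  # FILTER is the 7th column (index 6)
            yield int(cols[1]), cols[6]


def failed_filter_counts(vcf_lines):
    """Count how often each filter name (e.g. str10) appears among non-PASS records."""
    counts = Counter()
    for _, filt in non_pass_records(vcf_lines):
        for name in filt.split(';'):  # several failed filters are separated by ';'
            counts[name] += 1
    return counts


# Made-up example records
example = [
    '##fileformat=VCFv4.1\n',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n',
    'MTB_anc\t100\t.\tA\tG\t.\tPASS\tADP=30\n',
    'MTB_anc\t200\t.\tC\tT\t.\tstr10\tADP=25\n',
]
print(list(non_pass_records(example)))  # [(200, 'str10')]
print(dict(failed_filter_counts(example)))  # {'str10': 1}
```

Counting by filter name gives a quick overview of how many positions each filter (such as str10) removes before deciding which ones to exclude.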
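The part III parsing script expands every excluded interval into a dictionary holding one key per position. A sketch of an alternative, assuming the same half-open convention as `range(start, end)` in that script: merge the sorted intervals once, then answer each lookup with a binary search from the standard-library `bisect` module. `build_index` and `is_excluded` are hypothetical helper names, and the intervals in the example are made up.

```python
import bisect


def build_index(intervals):
    """Sort and merge half-open [start, end) intervals into parallel start/end lists."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:  # overlapping or adjacent: extend previous
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    ends = [e for _, e in merged]
    return starts, ends


def is_excluded(pos, starts, ends):
    """True if start <= pos < end for some interval (the same rule as range(start, end))."""
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < ends[i]


starts, ends = build_index([(10, 20), (15, 30), (50, 60)])
print([is_excluded(p, starts, ends) for p in (9, 10, 29, 30, 55, 60)])
# [False, True, True, False, True, False]
```

In the script's loop, `not is_excluded(int(line[1]), starts, ends)` could stand in for the `int(line[1]) not in positions_to_exclude` test; memory then grows with the number of intervals rather than the number of excluded positions.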
diff --git a/notebooks/annotation/snpEff.config b/notebooks/annotation/snpEff.config
index f07483b3e222f1b631af1abdf17cf2da70e59b39..7fd2eb11955991356b0e023c1849114cf07b2bf3 100644
--- a/notebooks/annotation/snpEff.config
+++ b/notebooks/annotation/snpEff.config
@@ -14,7 +14,7 @@
 # Also, a non-absolute path will be relative to config's file dir
 # 
 #---
-data.dir = /Workshop_SA/notebooks/annotation/data/
+data.dir = data/
 
 #---
 # Database repository: A URL to the server where you can download databases (command: 'snpEff download dbName')
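The comment above `data.dir` states that a non-absolute path is taken relative to the directory of the config file, so `data/` should resolve next to `snpEff.config`. Because `launch_snpeff.slurm` passes the config with a relative `-c` path, the database location presumably also depends on the job's working directory. A small sketch of that documented rule with `os.path`, useful for checking where the database is expected; `resolve_data_dir` is a hypothetical helper, not snpEff's own code.

```python
import os


def resolve_data_dir(config_path, data_dir):
    """Apply the rule from the config comment: a relative data.dir is resolved
    against the directory that contains the config file."""
    if os.path.isabs(data_dir):
        return os.path.normpath(data_dir)
    config_dir = os.path.dirname(config_path)
    return os.path.normpath(os.path.join(config_dir, data_dir))


# Config path as passed with -c in launch_snpeff.slurm (relative to the working directory)
print(resolve_data_dir('Workshop_SA/notebooks/annotation/snpEff.config', 'data/'))
# on a POSIX system: Workshop_SA/notebooks/annotation/data
```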
diff --git a/notebooks/slurm_scripts/launch_BWA.slurm b/notebooks/slurm_scripts/launch_BWA.slurm
index 6fd419ba9bcfa41591ca5d8eb5a0705de9ab7b59..0484fca5cc3499afaaf34039feecf019d5ca33d9 100644
--- a/notebooks/slurm_scripts/launch_BWA.slurm
+++ b/notebooks/slurm_scripts/launch_BWA.slurm
@@ -1,7 +1,7 @@
 #!/bin/bash 
 
 #SBATCH --job-name=BWA
-#SBATCH --cpus-per-task=1
+#SBATCH --cpus-per-task=2
 #SBATCH --mem-per-cpu=4G
 #SBATCH --output=BWA.o
 #SBATCH --error=BWA.e
diff --git a/notebooks/slurm_scripts/launch_bamqc.slurm b/notebooks/slurm_scripts/launch_bamqc.slurm
index 42919a9555bcbb285326e96afd14f1849eb777a5..c100118c32bf41af82f64ed54d0d368891940eb9 100644
--- a/notebooks/slurm_scripts/launch_bamqc.slurm
+++ b/notebooks/slurm_scripts/launch_bamqc.slurm
@@ -4,4 +4,4 @@
 #SBATCH --cpus-per-task=1
 #SBATCH --mem-per-cpu=4G
 
-singularity exec /home/container.img qualimap bamqc -bam ERR760779.dedup.realigned.bam -sd -sdmode 1 -outdir . -outfile ERR760779_bamqc
+singularity exec /home/container.img qualimap bamqc -bam ERR760779.dedup.bam -sd -sdmode 1 -outdir . -outfile ERR760779_bamqc
diff --git a/notebooks/slurm_scripts/launch_mpileup.slurm b/notebooks/slurm_scripts/launch_mpileup.slurm
index aca1de106c3dfa65a4beb45dfcc0f0a65d481aa8..d737accc19399e483851c7fdad071fd3d8323e6a 100644
--- a/notebooks/slurm_scripts/launch_mpileup.slurm
+++ b/notebooks/slurm_scripts/launch_mpileup.slurm
@@ -4,5 +4,5 @@
 #SBATCH --cpus-per-task=1
 #SBATCH --mem-per-cpu=4G
 
-singularity exec /home/container.img samtools mpileup -ABQ0 -q 20 -f ~/Workshop_SA/notebooks/reference_genome/MTB_ancestor_reference.fasta ERR760779.dedup.realigned.bam > ERR760779.pileup
+singularity exec /home/container.img samtools mpileup -ABQ0 -q 20 -f ~/Workshop_SA/notebooks/reference_genome/MTB_ancestor_reference.fasta ERR760779.dedup.bam > ERR760779.pileup
 
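`samtools mpileup` writes one line per reference position; without `-s`/`--output-MQ` the columns are contig, 1-based position, reference base, depth, read bases and base qualities. In the read-bases column, `.` and `,` mark a match on the forward and reverse strand, letters mark mismatches (lowercase on the reverse strand), `^` starts a read and is followed by one mapping-quality character, `$` ends a read, `+N`/`-N` introduce an N-base indel and `*` marks a deleted base. A minimal sketch of a parser under those assumptions; `parse_pileup_line` is a hypothetical helper and the example line is made up.

```python
import re


def parse_pileup_line(line):
    """Parse one line of samtools mpileup text output and tally its read bases."""
    # Columns 1-5: contig, 1-based position, reference base, depth, read bases
    # (column 6, the base qualities, is not needed here)
    chrom, pos, ref, depth, bases = line.rstrip('\n').split('\t')[:5]
    counts = {'ref': 0, 'A': 0, 'C': 0, 'G': 0, 'T': 0, 'N': 0, 'del': 0}
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == '^':  # read start; the next character encodes its mapping quality
            i += 2
        elif c == '$':  # read end
            i += 1
        elif c in '+-':  # indel: sign, length digits, then that many inserted/deleted bases
            length = re.match(r'\d+', bases[i + 1:]).group()
            i += 1 + len(length) + int(length)
        else:
            if c in '.,':  # matches the reference on the forward (.) or reverse (,) strand
                counts['ref'] += 1
            elif c == '*':  # this read has a deletion covering the position
                counts['del'] += 1
            elif c.upper() in counts:  # mismatch; lowercase letters are reverse strand
                counts[c.upper()] += 1
            i += 1  # other symbols (e.g. reference skips) are not counted
    return chrom, int(pos), ref.upper(), int(depth), counts


# Made-up example line: 6 reads, four matching A, one T and one c (reverse-strand C)
chrom, pos, ref, depth, counts = parse_pileup_line('MTB_anc\t100\tA\t6\t.,^I.$T,+2agc\tIIIIII\n')
# counts['ref'] == 4, counts['T'] == 1, counts['C'] == 1
```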
diff --git a/notebooks/slurm_scripts/launch_snpeff.slurm b/notebooks/slurm_scripts/launch_snpeff.slurm
index a311d49aef51b530d486946cad4e1097fe4cd526..a062aaff5575e316f8e682cc37855b8b1eb1cb23 100644
--- a/notebooks/slurm_scripts/launch_snpeff.slurm
+++ b/notebooks/slurm_scripts/launch_snpeff.slurm
@@ -4,4 +4,4 @@
 #SBATCH --cpus-per-task=1
 #SBATCH --mem-per-cpu=4G
 
-singularity exec /home/container.img snpEff ann -c /home/Workshop_SA/notebooks/annotation/snpEff.config -noStats -no-downstream -no-upstream MTB_ANC -interval ~/Workshop_SA/notebooks/annotation/additionnal_annotations.bed ERR760779.snps.vcf > ERR760779.snps.ann.vcf
+singularity exec /home/container.img snpEff ann -c Workshop_SA/notebooks/annotation/snpEff.config -noStats -no-downstream -no-upstream MTB_ANC -interval ~/Workshop_SA/notebooks/annotation/additionnal_annotations.bed ERR760779.snps.vcf > ERR760779.snps.ann.vcf
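`snpEff ann` adds an `ANN` key to the INFO column of `ERR760779.snps.ann.vcf`. Each annotation is a `|`-separated record (allele, effect, putative impact, gene name, gene ID, feature type, feature ID, transcript biotype, rank, HGVS.c, HGVS.p, followed by position and warning fields), and multiple annotations for one variant are separated by commas. A minimal sketch that extracts the first fields from an INFO string; `parse_ann` is a hypothetical helper, and the INFO string in the example is made up for illustration.

```python
def parse_ann(info):
    """Return one dict per snpEff annotation found in a VCF INFO string."""
    ann_value = ''
    for item in info.split(';'):
        if item.startswith('ANN='):
            ann_value = item[len('ANN='):]
    annotations = []
    for entry in ann_value.split(','):  # one entry per annotated allele/feature
        if not entry:
            continue
        parts = entry.split('|')
        annotations.append({
            'allele': parts[0],
            'effect': parts[1],  # several effects may be joined with '&'
            'impact': parts[2],
            'gene_name': parts[3],
            'gene_id': parts[4],
            'hgvs_c': parts[9],
            'hgvs_p': parts[10],
        })
    return annotations


# Made-up INFO string
info = ('ADP=50;HOM=1;ANN=T|missense_variant|MODERATE|rpoB|Rv0667|transcript|'
        'Rv0667|protein_coding|1/1|c.1349C>T|p.Ser450Leu|1349/3519|1349/3519|450/1172||')
anns = parse_ann(info)
# anns[0]['gene_id'] == 'Rv0667', anns[0]['hgvs_p'] == 'p.Ser450Leu'
```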