Determine P-site offsets
Determine P-site offsets based for the different read lengths. Add it to the snakemake/process_data workflow.
Activity
-
Newest first Oldest first
-
Show all activity Show comments only Show history only
- BIOPZ-Gypas Foivos mentioned in issue #6 (closed)
mentioned in issue #6 (closed)
- BIOPZ-Gypas Foivos changed title from Determine P-site offsets to Determine A-site offsets
changed title from Determine P-site offsets to Determine A-site offsets
- BIOPZ-Gypas Foivos changed the description
changed the description
- BIOPZ-Gypas Foivos changed title from Determine A-site offsets to Determine P-site offsets
changed title from Determine A-site offsets to Determine P-site offsets
- BIOPZ-Gypas Foivos changed the description
changed the description
- Author Maintainer
One solution would be to use ribowaltz -> https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006169#pcbi-1006169-g001
- Author Maintainer
Got the following part from Joao by email on 21//11/2018:
(cds_start,cds_end) = self.genome.genome[gene_name].transcripts[transcript_name].transcript_coding_region ### Calculating P-site offset possible_offsets = [10,11,12,13,14,15,16] # Create a dictionary to score the different offsets for the distinct read lenghts if not readsDict.has_key(read_len): readsDict[read_len] = {} for offset in possible_offsets: readsDict[read_len][offset] = [0,0,0,0,0] #1st,2nd,3rd,UTR,CDS readsDict[read_len]["reads"] = [] for offset in possible_offsets: psite_pos = (read_start + offset) #P-site asite_pos = psite_pos + 3 #A-site rel_pos_psite = psite_pos - cds_start rel_pos_asite = asite_pos - cds_start - 3 if filterReadsForPeriodicity: #There's NO peak at start codon if rel_pos_asite < 0: readsDict[read_len][offset][3] += 1 elif rel_pos_asite > (cds_end - cds_start): readsDict[read_len][offset][3] += 1 else: if rel_pos_asite > 60: #ignore first 20 codons frame_rel_pos = rel_pos_asite % 3 readsDict[read_len][offset][frame_rel_pos] += 1 else: #There's a peak at start codon if rel_pos_psite < 0: readsDict[read_len][offset][3] += 1 elif rel_pos_psite > 2: readsDict[read_len][offset][4] += 1 elif rel_pos_psite in [0,1,2]: #anchor at start codon readsDict[read_len][offset][rel_pos_psite] += 1 readsDict[read_len]["reads"].append(alngt) for rl in readsDict.keys(): best_offset = 0 best_offset_score = 0 for offset in possible_offsets: raw_counts = readsDict[rl][offset][0:3] if sum(raw_counts) > 0: if (sum(raw_counts) > 200) and (raw_counts[0] > raw_counts[1]) and (raw_counts[0] > raw_counts[2]): if (raw_counts[0] > best_offset_score) and ((not filterReadsForPeriodicity) or (float(raw_counts[0])/sum(raw_counts) > 0.4)): best_offset = offset best_offset_score = raw_counts[0] if best_offset != 0: self.alignment_offset[rl] = best_offset self.filtered_alignments += (readsDict[rl]["reads"]) sys.stderr.write("PEAK CALL:"+str(rl)+": peak found at position "+str(best_offset)+"\n") else: sys.stderr.write("PEAK CALL:"+str(rl)+": peak not found\n") print "Total:",len(self.mappingsObject.alignments),"\tWith periodicity:",len(self.filtered_alignments)
Based on this I should make a proper script.
Edited by BIOPZ-Gypas Foivos - Author Maintainer
Done on fc115178
- BIOPZ-Gypas Foivos closed
closed
Please register or sign in to reply