Draft: remove all content for review (DO NOT MERGE!)
This branch and merge request are for reviewing your code and code structure. They exist only for the purpose of review and should never be merged! Note that any changes you make to the repo after creation of this merge request are not considered in this review. Please also note that we will not review code style here (we will use automatic tools for that in a future session); instead, we will keep feedback high level at this point. Please address any issues raised (if you haven't already addressed them on your own in the meantime).
In addition to the inline and general code comments we will give during the review sessions (done by me and Mihaela), we keep track of the status of the more formal requirements (i.e., not the actual code) with respect to repository setup, packaging and documentation in the following checklist. Please complete any items that remain open after we have set "List checked by reviewers" to "Yes" (indicating that we are done checking your repo against the checklist). To address these, please check the relevant notebooks and homework for more info.
List checked by reviewers
- Yes
- No
Version control
- Repo configured correctly: repo public; default branch protected against pushes; fast-forward merges; encourage squash commits; delete source branch by default
- License available in file `LICENSE` without any file extension
- `.gitignore` available and includes common Python artifacts; no such artifacts (e.g., from building/packaging) are under version control
- Files organized as expected, with at least one and not more than three directories, all in lower case: one directory containing all the tool/app code and named after the app (required); one directory `tests` containing all test-related content (required if tests or test files are available); and one directory named `img` or `images` containing the screenshots from exercise 1 (fine if omitted or deleted at this point). All other files (`LICENSE`, `README.md`, `.gitignore`, `setup.py`, etc.) in the repository root directory
Packaging
- `setup.py` available
- CLI executable available
- CLI arguments available
- Tool can be successfully installed with `pip install .`
- CLI executable can be successfully executed with `-h` option
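As a rough illustration of the CLI-related items, a minimal sketch of an entry point could look like the following. The program name is taken from the project's `cli.py`; the `build_parser` and `main` functions and the `--gtf` and `--output` arguments are hypothetical and only serve as an example. `argparse` adds the `-h`/`--help` option automatically.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line parser (argument names are illustrative)."""
    parser = argparse.ArgumentParser(
        prog="transcript_sequence_extractor",
        description="Extract transcript sequences and add a poly(A) tail.",
    )
    # Hypothetical arguments; replace with whatever the tool actually needs.
    parser.add_argument("--gtf", required=True, help="path to the input GTF file")
    parser.add_argument(
        "--output",
        default="transcripts.fa",
        help="path of the output FASTA file",
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse the arguments and report what would be done."""
    args = build_parser().parse_args(argv)
    print(f"Would read {args.gtf} and write {args.output}")
    return 0
```

With setuptools, a function like `main` is typically exposed as an executable through `entry_points={"console_scripts": [...]}` in `setup.py`, for example `"transcript_sequence_extractor=sequence_extractor.cli:main"` (the module path is assumed here from the repository layout).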
Documentation
- `README.md` has at least a synopsis, usage and installation instructions and contact information (can use zavolab-biozentrum@unibas.ch if you don't want to put your own); other sections, as outlined in the course materials, welcome
- Google-style docstrings available for all modules, classes, functions, methods
- Type hints provided at least for all functions & methods
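For orientation, a function with type hints and a Google-style docstring might look like the sketch below. The function `add_poly_a` and its default tail length are hypothetical and not taken from the repository; only the docstring layout (`Args`, `Returns`, `Raises`) is the point.

```python
def add_poly_a(sequence: str, tail_length: int = 100) -> str:
    """Append a poly(A) tail to a transcript sequence.

    Args:
        sequence: Nucleotide sequence of the transcript.
        tail_length: Number of adenines to append.

    Returns:
        The input sequence followed by `tail_length` adenines.

    Raises:
        ValueError: If `tail_length` is negative.
    """
    if tail_length < 0:
        raise ValueError("tail_length must not be negative")
    return sequence + "A" * tail_length
```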
- README.md deleted 100644 → 0
    ## Usage/Examples

    ```python script


    ```


    ## License

    [MIT](https://choosealicense.com/licenses/mit/) license, Copyright (c) 2022 Zavolan Lab, Biozentrum, University of Basel


    ## Contributers
    Samuel Mondal, Ahmed Hassan Hussein H.Mahmoud, Gina Boot

- sequence_extractor/cli.py deleted 100644 → 0
    import argparse
    import logging
    from pre_bedtools import exon_extraction_from_gtf
    from exon_concatenation import exon_concatenation
    from polyA import PolyA_generator
    from list_to_file import list_to_file

    parser = argparse.ArgumentParser(
        prog = 'transcript_sequence_extractor',
        description = 'extracts transcript sequences from genome sequence and ouputs transcripts with PolyA tail added to them')

- sequence_extractor/pre_bedtools.py deleted 100755 → 0
    exons = gtf[gtf[2]=="exon"]
    feat = list(exons[8])
    superlist = []
    idlist = []
    for x in range(len(feat)):
        newlist = feat[x].split(";")
        superlist.append(str(newlist[2])[16:-1])
        idlist.append(str(newlist[0])[9:-1])


    bed = {"chr":exons[0],"start":exons[3],"end":exons[4],"transcript_id":superlist,"score":exons[5],"strand":exons[6],"gene_id":idlist}
    class bed:
        def __init__(self, exons, chr, start, end, transcript_id, score, strand, gene_id):
            self.exons = exons

If you have already isolated the parts of the GTF line to pass them to the "bed" class constructor, why is the class needed? It does not define any additional methods. Please think about what conceptual tasks you need to solve, and start from there to decide whether, and if so where, you need to define classes.
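To make the point concrete, here is a minimal sketch of the same conversion done with two plain functions and no class. The names `parse_attributes`, `exon_to_bed` and `BedRow` are hypothetical. The sketch parses the attribute column by key instead of slicing at fixed character offsets, and it assumes the usual convention that GTF start coordinates are 1-based while BED start coordinates are 0-based, which the original code does not account for.

```python
from typing import Dict, Optional, Tuple

# (chrom, start, end, transcript_id, score, strand, gene_id)
BedRow = Tuple[str, int, int, str, str, str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse the GTF attribute column into a key-to-value mapping."""
    attributes: Dict[str, str] = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        # Each item looks like: key "value"
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    return attributes


def exon_to_bed(gtf_line: str) -> Optional[BedRow]:
    """Convert one GTF exon line to a BED-like row; return None otherwise."""
    columns = gtf_line.rstrip("\n").split("\t")
    if len(columns) != 9 or columns[2] != "exon":
        return None
    attributes = parse_attributes(columns[8])
    return (
        columns[0],
        int(columns[3]) - 1,  # assumption: convert 1-based GTF start to 0-based BED start
        int(columns[4]),
        attributes["transcript_id"],
        columns[5],
        columns[6],
        attributes["gene_id"],
    )
```

A class would only become useful here if the records needed behaviour of their own, such as validation or formatting methods.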
- sequence_extractor/pre_bedtools.py deleted 100755 → 0
    Returns
    -------
    Class
        A class which defines columns in standard BED format.



    Raises
    ------
    TypeError
    ValueError: Not all columns found in GTF.
    """
    bed = pd.DataFrame(bed)
    bed.to_csv("bed_file.bed",sep="\t",index=False)
    bed[(bed["gene_id"]=="ENSG00000160072")|(bed["gene_id"]== "ENSG00000142611")|(bed["gene_id"]=="ENSG00000232596")].to_csv("test.bed",sep="\t",index=False,header=None)

- sequence_extractor/list_to_file.py deleted 100644 → 0
    def list_to_file(
        to_write_to_file: list,
        filename: str,
    ) -> None:
        """Creates a file from a list that is input to the function.

        Args:
            to_write_to_file: The list that you want to write to a file.
            filename: The name you want the output fasta file to have (also include the extension of the file while calling the function).

        Returns:
            Nothing, since it outputs a file directly to the working directory
        """
        file = open(filename,'a')

        to_write_to_file = []
        for x in range(int(len(lines)/2)):
            if x == 0:
                annotation = lines[0]
                read = lines[1]
            if x >= 1:
                if lines[2*x] == lines[2*(x-1)]:
                    read+= lines[(2*x)+1]
                else:
                    to_write_to_file.append(annotation)
                    to_write_to_file.append(read)
                    annotation = lines[2*x]
                    read = lines[(2*x)+1]
        to_write_to_file.append(annotation)
        to_write_to_file.append(read)
        return to_write_to_file

First, there are a lot of assumptions here. For example, FASTA files usually have the sequence split across multiple lines, each with 60 or 80 characters (nucleotides). Also, it is not guaranteed that related entries (exons in this case) come on consecutive lines. Even if they do, it is not guaranteed that they occur in the correct order (in this case, the order of transcription). Finally, there is no need to return the values in a structure like a list, in which the order of the items is obscured to the user, as opposed to, say, a tuple.
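One possible way to drop these assumptions is sketched below. It is not the project's implementation: the function names are hypothetical, and the sketch assumes that an exon number reflecting the order of transcription is available for each exon (for instance from the GTF attributes). `read_fasta` joins sequences that are wrapped over several lines and returns explicit `(header, sequence)` tuples; `concatenate_exons` groups exons by transcript regardless of input order and sorts them before joining.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) tuples, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_exons(exons: Iterable[Tuple[str, int, str]]) -> Dict[str, str]:
    """Join exon sequences per transcript, ordered by exon number.

    Each item is (transcript_id, exon_number, sequence); items may arrive
    in any order and need not be grouped by transcript.
    """
    by_transcript: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for transcript_id, exon_number, sequence in exons:
        by_transcript[transcript_id].append((exon_number, sequence))
    return {
        transcript_id: "".join(seq for _, seq in sorted(parts))
        for transcript_id, parts in by_transcript.items()
    }
```

Returning a mapping from transcript ID to sequence (or a list of tuples) keeps each annotation paired with its sequence, so the caller does not have to know that even and odd list positions mean different things.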