Draft: remove all content for review (DO NOT MERGE!)
This branch and merge request is for reviewing your code and code structure. It is only for the purpose of reviewing and should never be merged! Note that any changes made to the repo by you after creation of this merge request are not considered for this review. Please also note that we will not review code style here (we will use automatic tools for that in a future session), but rather we will keep feedback high level at this point. Please address any issues raised (if you haven't already addressed them on your own in the meantime).
Next to the in-line and general code comments we will give during the review sessions (done by me and Mihaela), we keep track of the status of the more formal requirements (i.e., not the actual code) with respect to repository setup, packaging and documentation in the following checklist. Please complete any pieces that remain after the "List checked by reviewers" was set to "Yes" by us (indicating that we are done checking your repo against the checklist). To address these, please check the relevant notebooks and homeworks for more info.
List checked by reviewers
-
Yes -
No
Version control
-
Repo configured correctly: repo public; default branch protected against pushes; fast-forward merges; encourage squash commits; delete source branch by default -
License available in file LICENSE
without any file extension -
.gitignore
available and includes common Python artifacts; no such artifacts (e.g., from building/packaging) are under version control -
Files organized as expected, with at least one and not more than three directories, all in lower case; one directory containing all the tool/app code and named after the app (required), one directory tests
containing all test-related content (required if tests or test files are available), and one directory namedimg
orimages
, containing the screenshots from exercise 1 (fine if omitted or deleted at this point); all other files (LICENSE
,README.md
,.gitignore
,setup.py
, etc.) in repository root directory
Packaging
-
setup.py
available -
CLI executable available -
CLI arguments available -
Tool can be successfully installed with pip install .
-
CLI executable can be successfully executed with -h
option
Documentation
-
README.md
has at least a synopsis, usage and installation instructions and contact information (can use zavolab-biozentrum@unibas.ch if you don't want to put your own); other sections, as outlined in the course materials, welcome -
Google-style docstrings available for all modules, classes, functions, methods -
Type hints provided at least for all functions & methods
Merge request reports
Activity
- README.md deleted 100644 → 0
- README.md deleted 100644 → 0
6 7 **Input files** 8 9 Copy_number_file: 10 11 - csv-formatted file ("NewTranscriptID,ID,Count") 12 13 - id of generated transcript 14 15 - id of original transcript (without intron inclusions) 16 count 17 18 _Eample_ 19 20 `[id of generated transcript] [ID] [Count]` 21 The convention is to use dashes instead of underscores in commandline arguments, and the parts of the names should not be capitalized. Moreover, the names of the fields should be intuitive and meaningful for the user of the code. For e.g. for this module it is irrelevant that there is an original transcript without introns, and it is unclear what a "NewTranscript" is. So the fields could be simply called "TranscriptID" and "ParentID". Finally, the formatting of the info is not quite right. The fact that the file is csv-formatted is meta information about the file, while here this info is shown as part of a drop-down list showing the content of a line in a file. Better would be something like this: transcript_copies (csv-formatted) containing:
- ID of transcript
- ID of parent transcript
- copy number
The example does not provide any additional information, and in fact introduces more confusion because it shows a tab-separated, not comma-separated set of fields. Furthermore, the fact that these fields are shown in brackets suggests that they are optional, which they are not.
- README.md deleted 100644 → 0
31 `> [id of generated transcript] 32 AGUGACGUUAGACCAGAUAGAC....` 33 34 35 priming_site_file: 36 37 - gtf-formatted file 38 39 - id of generated transcript? 40 41 - position of priming site and binding likelihood 42 43 _Eample_ 44 45 `[id of generated transcript] ... [position of priming site]... [binding likelihood ]` 46 - README.md deleted 100644 → 0
48 **Output files** 49 50 cDNA_file: 51 52 - fasta-formatted file 53 54 - Includes all the uniquie "cDNA sequence" and "cDNA sequence ID" 55 56 57 58 cDNA_count_file: 59 60 - csv-formatted file 61 62 - Includes "cDNA sequence ID" and "cDNA count" 63 - cdna/cdna.py deleted 100644 → 0
- cdna/cdna.py deleted 100644 → 0
1 import sys 2 3 def translate(res): 4 translate_dict = {"A": "T", "U": "A", "G": "C", "C":"G"} 5 if res not in translate_dict.keys(): 6 print("cDNA residue not A,T,U or G ") 7 sys.exit(1) 8 return translate_dict[res] 9 This function seems to return the complement of a base? Please use meaningful names for functions and variables. "translate" is usually used in the sense of DNA/RNA to protein. Unclear what "res" is.
Since you are using the sys package and exit method anyway, you could put the message directly into the call to sys.exit, like this:
sys.exit("cDNA residue not A,T,U or G").
Moreover, since there is error checking of the argument, would be probably useful to notice other issues (e.g. upper/lower case letters).
Finally, this type of error would be better caught earlier, just for practical purposes, i.e. if the user works with longer sequences, better to check the whole sequence at once for non-A/C/G/T characters, so that the user fixes them in one go rather than one at a time.
- cdna/cdna.py deleted 100644 → 0
2 3 def translate(res): 4 translate_dict = {"A": "T", "U": "A", "G": "C", "C":"G"} 5 if res not in translate_dict.keys(): 6 print("cDNA residue not A,T,U or G ") 7 sys.exit(1) 8 return translate_dict[res] 9 10 class cDNA_Gen: 11 def __init__(self, 12 fasta, 13 gtf, 14 cpn, 15 output_fasta = "cDNA.fasta", 16 output_csv = "cDNA.csv" 17 ): As we discussed before, having default file names for output files could be problematic, as it can lead to inadvertent overwriting of files in the user's directory. That's why the user should be prompted to think about where they want to save the files.
Here the number of function arguments is still manageable, but just about. If the number of arguments is large, other options for initialization need to be considered, as the user will have a hard time remembering that all arguments are provided in the correct order. It would be better to use keyword arguments (as you do here already with output_fasta and output_csv. The you can create the dictionary of argument names/values before calling the function and pass this dictionary as argument.