Skip to content
Snippets Groups Projects

Draft: remove all content for review (DO NOT MERGE!)

Open Alex Kanitz requested to merge review_milestone_2 into main

This branch and merge request is for reviewing your code and code structure. It is only for the purpose of reviewing and should never be merged! Note that any changes made to the repo by you after creation of this merge request are not considered for this review. Please also note that we will not review code style here (we will use automatic tools for that in a future session), but rather we will keep feedback high level at this point. Please address any issues raised (if you haven't already addressed them on your own in the meantime).

Next to the in-line and general code comments we will give during the review sessions (done by me and Mihaela), we keep track of the status of the more formal requirements (i.e., not the actual code) with respect to repository setup, packaging and documentation in the following checklist. Please complete any pieces that remain after the "List checked by reviewers" was set to "Yes" by us (indicating that we are done checking your repo against the checklist). To address these, please check the relevant notebooks and homeworks for more info.

List checked by reviewers

  • Yes
  • No

Version control

  • Repo configured correctly: repo public; default branch protected against pushes; fast-forward merges; encourage squash commits; delete source branch by default
  • License available in file LICENSE without any file extension
  • .gitignore available and includes common Python artifacts; no such artifacts (e.g., from building/packaging) are under version control
  • Files organized as expected, with at least one and not more than three directories, all in lower case; one directory containing all the tool/app code and named after the app (required), one directory tests containing all test-related content (required if tests or test files are available), and one directory named img or images, containing the screenshots from exercise 1 (fine if omitted or deleted at this point); all other files (LICENSE, README.md, .gitignore, setup.py, etc.) in repository root directory

Packaging

  • setup.py available (used pyproject.toml as alternative)
  • CLI executable available
  • CLI arguments available
  • Tool can be successfully installed with pip install .
  • CLI executable can be successfully executed with -h option

Documentation

  • README.md has at least a synopsis, usage and installation instructions and contact information (can use zavolab-biozentrum@unibas.ch if you don't want to put your own); other sections, as outlined in the course materials, welcome
  • Google-style docstrings available for all modules, classes, functions, methods
  • Type hints provided at least for all functions & methods
Edited by Alex Kanitz

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Alex Kanitz marked the checklist item License available as completed

    marked the checklist item License available as completed

  • Alex Kanitz marked the checklist item .gitignore available as completed

    marked the checklist item .gitignore available as completed

  • Alex Kanitz changed the description

    changed the description

  • Alex Kanitz marked the checklist item License available in file LICENSE without any file extension as incomplete

    marked the checklist item License available in file LICENSE without any file extension as incomplete

  • Alex Kanitz changed the description

    changed the description

  • Alex Kanitz changed the description

    changed the description

  • Alex Kanitz changed the description

    changed the description

  • Alex Kanitz changed the description

    changed the description

  • Alex Kanitz marked the checklist item Files organized as expected, with at least one and not more than three directories, all in lower case; one directory containing all the tool/app code and named after the app (required), one directory tests containing all test-related content (required if tests or test files are available), and one directory named img or images, containing the screenshots from exercise 1 (fine if omitted or deleted at this point); all other files (LICENSE, README.md, .gitignore, setup.py, etc.) in repository root directory as completed

    marked the checklist item Files organized as expected, with at least one and not more than three directories, all in lower case; one directory containing all the tool/app code and named after the app (required), one directory tests containing all test-related content (required if tests or test files are available), and one directory named img or images, containing the screenshots from exercise 1 (fine if omitted or deleted at this point); all other files (LICENSE, README.md, .gitignore, setup.py, etc.) in repository root directory as completed

  • Alex Kanitz changed the description

    changed the description

  • Alex Kanitz marked the checklist item Yes as completed

    marked the checklist item Yes as completed

  • Alex Kanitz marked the checklist item No as incomplete

    marked the checklist item No as incomplete

  • Alex Kanitz
1 """Transcript structure generator package."""
2
3
4 __version__ = '0.0.0'
  • This is a great way towards keeping a definite version in a single place, and a good reminder to check whether we did cover this pattern in the materials (which, if we didn't, we probably should).

    Now, recently I ran into some vexing issues with keeping the __version__ string in the main package module (__init__.py of the root package). The reason is that it is typically a place where convencience imports are defined, where a convenience import is defined as a sort of import shortcut for entry points into a package. For example, say, a class TranscriptStructureGenerator is the main entry point into your package and it is defined in module transcript_structure_generator.py in the package tsg. Then, people wanting to make use of that entry point by importing the class, would need to do:

    from tsg.transcript_structure_generator import TranscriptStructureGenerator

    Worse even, if the class was actually defined in a module of some subpackage, say ... transcript_structure_generator. Then, the import would become:

    from tsg.transcript_structure_generator.transcript_structure_generator import TranscriptStructureGenerator

    I guess the pattern is clear...

    Now, what one can do is to import the class with the above call in an easily accessible location, e.g., in tsg/__init__.py. Then users could, in their turn, import the class with simply:

    from tsg import TranscriptStructureGenerator

    Much easier!

    Back to the problem with __version__ being defined in the same module as the convenience imports: If you want to make use of your single source of truth for your version and actually importing __version__ in another module, this will then import the entire entry point, and possibly the entire app! This can have unintended consequeces, especially when, e.g., importing the entry point actually creates some web service or daemon or something of that kind (which is probably a bad pattern anyway, but maybe sometimes desired). It can also potentially lead to circular imports, which are nasty to debug/fix.

    Running into one of the latter, I have recently moved to a pattern where I store __version__.py in a module version.py in the package root, i.e., here it would be in tsg/version.py. By having a dedicated module for the version that does not contain anything else, importing __version__ anywhere in the package becomes safe.

    On that topic though: It is actually not safe to import __version__ outside of the package, e.g., in your tests/ directory or in setup.py (another issue I ran into). This is because the actual package may not actually be available (yet) at the time the import is called. It makes sense: When you use setup.py to install a package for the first time and one of the first lines of that file is an instruction to import from a package that the file is designed to install further down, then this is bound to cause problems.

    A way to keep a single source of truth in package/version.py while not relying on the package to be available can be found here: https://github.com/elixir-cloud-aai/foca/blob/f476cf129b5acdfb91d0dada1cfe2371a6495e72/setup.py#L8

    In fact, this PR addresses both of the mentioned issues (detaching convenience imports from version and using __version__ without importing it) are addressed in this PR for the package mentioned above: https://github.com/elixir-cloud-aai/foca/pull/152/files

  • See also this for a more comprehensive list of ways how you can single source your version: https://packaging.python.org/en/latest/guides/single-sourcing-package-version/

    Not sure how to do it with pyproject.toml, but you could try if the attr: directive works (gray box in technique 1).

    By the way, I have just seen that your tool version in pyproject.toml and tsg/__init__.py already diverged. Guess it illustrates that single sourcing the version is rather important.

  • Please register or sign in to reply
  • Alex Kanitz
  • 1 from tsg.cli import app
    2
    3 if __name__ == "__main__":
    4 app()
  • Alex Kanitz
  • 1 import argparse
    2 import logging
    3 from pathlib import Path
    4
    5 from tsg.main import sample_transcripts
    6
    7
    8 def setup_logging(loglevel: str = None) -> None:
    9 """Set up logging. Loglevel can be one of ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"].
    10
    11 Args:
    12 loglevel (str, optional): Level of log output. Defaults to None.
    • Best not to provide type information in two places because of the risk of divergence in the future. Try to keep a single source of info in all places, most definitely in documentation (which anyway runs a risk of becoming outdated when people forget to update it with their code). Adding type hints is preferred over embedding type info in docstrings. Autodoc generation tools like Sphinx know how to parse the type info from type hints, so no info is lost.

    • Please register or sign in to reply
  • Alex Kanitz
  • 5 from tsg.main import sample_transcripts
    6
    7
    8 def setup_logging(loglevel: str = None) -> None:
    9 """Set up logging. Loglevel can be one of ["CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"].
    10
    11 Args:
    12 loglevel (str, optional): Level of log output. Defaults to None.
    13
    14 Raises:
    15 ValueError: If string that is not a log level is passed, raise error.
    16 """
    17 if loglevel:
    18 numeric_level = getattr(logging, loglevel.upper())
    19 if not isinstance(numeric_level, int):
    20 raise ValueError("Invalid log level: %s" % loglevel)
    • Comment on lines -18 to -20

      Actually, an error (AttributeError) would likely already be raised when calling getattr on an attribute that doesn't exist on the logging object. So likely your code to raise an error for invalid log levels would never be raised.

      This is also a good opportunity to follow Python's "It's better to ask forgiveness than permission" trope, i.e., catch the AttributeError rather than checking and raising an error manually.

      By the way, perhaps a better way to define the available options is via an enum. This gives one the option to type hint to the name of the defined enum, which then automatically lists or at least refers to the available options, dynamically, i.e., no need to update when more options are added or options are removed. You could also rewrite your check by checking enum membership, which - while not following the forgiveness/permission pattern - has the advantage of being very direct and explicit.

    • Please register or sign in to reply
  • Alex Kanitz
  • 20 raise ValueError("Invalid log level: %s" % loglevel)
    21 else:
    22 numeric_level = getattr(logging, "INFO")
    23
    24 logging.basicConfig(
    25 format='[%(asctime)s: %(levelname)s] %(message)s (module "%(module)s")',
    26 level=numeric_level,
    27 )
    28
    29
    30 def build_arg_parser() -> argparse.ArgumentParser:
    31 parser = argparse.ArgumentParser()
    32 parser.add_argument("--transcripts", type=str)
    33 parser.add_argument("--annotation", type=str)
    34 parser.add_argument("--prob_inclusion", type=float)
    35 parser.add_argument("--log", type=str)
    • Comment on lines -32 to -35

      Please consider the comments about flag/option names and command-line interface design in read-sequencer!24 and at the very least:

      • add informative help messages for each option
      • provide defaults where reasonable (--prob-inclusion and --log, I guess)
    • Please register or sign in to reply
  • Alex Kanitz
  • 30 def build_arg_parser() -> argparse.ArgumentParser:
    31 parser = argparse.ArgumentParser()
    32 parser.add_argument("--transcripts", type=str)
    33 parser.add_argument("--annotation", type=str)
    34 parser.add_argument("--prob_inclusion", type=float)
    35 parser.add_argument("--log", type=str)
    36
    37 return parser
    38
    39
    40 def get_args() -> argparse.Namespace:
    41 parser = build_arg_parser()
    42
    43 args = parser.parse_args()
    44
    45 return args
    • Comment on lines -40 to -45

      I'm all for encapsulating code, but here you basically define an extra function just for a single line of code. Given that there are hardly any use cases for calling build_arg_parser() outside of a situation where you also want to actually parse these arguments, I would probably merge the two. Otherwise you need to pay the maintenance overhead of having extra functions (documentation, type annotations, writing separate unit tests!).

      Of course up to you, it's a totally legit pattern as is, and it certainly nicely separates concerns (definition vs action).

    • Please register or sign in to reply
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Loading
  • Please register or sign in to reply
    Loading