
Issue13

14 unresolved threads
Closed Valentin Gallier requested to merge issue13 into main

Adding the code for issue 13

Activity

1 <component name="InspectionProjectProfileManager">
  • Please remove this file manually with rm <file> && git add <file>, then commit and push. You may also want to add a pattern to .gitignore so that these types of files aren't accidentally added in the future (note that this won't affect files that have already been added, hence the manual deletion is necessary).

    The same goes for all other files in the .idea/ directory so it's best to add that entire folder as an ignore pattern. Check Google on how to do that.

  • Alex Kanitz
  • 1 class AccuracyEstimate:
    • In the current implementation, there is no added benefit to having your code defined in a class, as the only defined method is basically a static function and there is no __init__() method or anything like that. I would recommend just getting rid of the class and keeping only the function. We can always refactor later in case new code is added and a class actually makes sense.

  • Alex Kanitz
  • 1 class AccuracyEstimate:
    2
    3 # Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.
    • Please use a proper (Google-style) docstring like you do below for the function. Or better (if you decide to get rid of the class), work any important info from here into the function's docstring.

  • Alex Kanitz
  • 1 class AccuracyEstimate:
    2
    3 # Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.
    4 # Input: 1. Csv-file with input values
    5 # 2. Csv-file with mean and variance obtained from the simulation
    6 # Output: scatter plot of initial vs inferred (mean and error bars) counts for all genes
    7
    8 def accuracy_estimate(input_file, simulation_file):
    • Please add type hints for all your args as well as for the return value, something like:

      def my_func(my_arg_1: str, my_arg_2: int = 8) -> str:

      ^ This function would take one required argument (no default!) of type str and one optional argument (defaults to 8 if not provided) of type int and return a value of type str.
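
Applied to the function under review, the signature could look roughly like the sketch below. This is only an illustration: the `output_file` argument, its default and the `Path` return type are assumptions that are not part of the merge request, and the body is a placeholder.

```python
from pathlib import Path
from typing import get_type_hints


def accuracy_estimate(
    input_file: Path,
    simulation_file: Path,
    output_file: Path = Path("accuracy_estimate.png"),
) -> Path:
    """Determine the accuracy of the simulated values regarding the input values.

    Args:
        input_file: CSV file with input values.
        simulation_file: CSV file with mean and variance obtained from the
            simulation.
        output_file: Where the plot is written (hypothetical argument).

    Returns:
        Path of the written plot.
    """
    # Placeholder body: only the annotated signature matters in this sketch.
    return output_file


# The annotations are available at runtime, e.g. for documentation tools.
print(get_type_hints(accuracy_estimate))
```

Type hints are not enforced when the code runs; a static checker such as mypy is what reports mismatches.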

  • Alex Kanitz
  • 12 Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.
    13
    14 Args:
    15 input_file: Csv-file with input values
    16 simulation_file: Csv-file with mean and variance obtained from the simulation
    17
    18 Returns:
    19 Scatter plot of initial vs inferred counts for all genes with error bars
    20
    21 """
    22
    23 import pandas as pd
    24 import numpy as np
    25 import matplotlib.pyplot as plt
    26
    27 # input_file = "input.csv"
    • Please keep your testing code separated from your actual implementation. Put all your test files in tests/test_files and write tests that cover all of the conditional branches of your code (you may need to create additional test files for some conditions).
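
A minimal sketch of what branch-covering tests could look like. The helper `read_counts()` is hypothetical and only stands in for a piece of the implementation with two branches (valid header, missing column); the `GeneID,Count` header is taken from the input file in this merge request. The test functions take a `tmp_path` argument so that pytest would inject its temporary-directory fixture, while the guarded block at the end runs them without pytest.

```python
import csv
import tempfile
from pathlib import Path


def read_counts(path: Path) -> dict[str, float]:
    """Hypothetical helper: map GeneID to Count, failing on a bad header."""
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        if not {"GeneID", "Count"} <= set(reader.fieldnames or []):
            raise ValueError(f"expected columns 'GeneID' and 'Count' in {path}")
        return {row["GeneID"]: float(row["Count"]) for row in reader}


def test_read_counts_valid_file(tmp_path: Path) -> None:
    """Branch 1: a well-formed file is parsed."""
    good = tmp_path / "input.csv"
    good.write_text("GeneID,Count\ngeneA,10\ngeneB,25\n")
    assert read_counts(good) == {"geneA": 10.0, "geneB": 25.0}


def test_read_counts_missing_column(tmp_path: Path) -> None:
    """Branch 2: a file without a 'Count' column is rejected."""
    bad = tmp_path / "no_count.csv"
    bad.write_text("GeneID,Value\ngeneA,10\n")
    try:
        read_counts(bad)
    except ValueError:
        return
    raise AssertionError("expected ValueError")


if __name__ == "__main__":
    # Stand-in for running pytest: call each test with a scratch directory.
    with tempfile.TemporaryDirectory() as tmp:
        test_read_counts_valid_file(Path(tmp))
        test_read_counts_missing_column(Path(tmp))
```

In the repository, the two CSV files would live in tests/test_files instead of being written by the tests themselves.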

  • Alex Kanitz
  • 1 if __name__ == '__main__':
    2 import accuracy_estimate
    • Again, a module-level docstring should be added, followed by all imports, grouped and sorted. Also, it's good practice to define a main() function with all application-level code (i.e., the CLI parsing, calling of your accuracy_estimate() method etc., basically everything between lines 5 and 14), then have the call of the main() function be the only thing you do in the `if __name__ == '__main__':` block. Imports should be defined at the top of the module.
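
The layout described in this comment could be sketched as follows. The `accuracy_estimate()` defined here is only a stand-in so that the example runs on its own; in the real script it would be imported. The helper name `parse_args()` and the metavar strings are assumptions for illustration.

```python
"""Command-line entry point for the accuracy estimate (illustrative sketch)."""

import argparse
from typing import Optional, Sequence


def accuracy_estimate(input_file: str, simulation_file: str) -> None:
    """Stand-in for the real function, which would be imported instead."""
    print(f"comparing {input_file} with {simulation_file}")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two positional file arguments."""
    parser = argparse.ArgumentParser(
        description="Compare simulated with input expression levels.",
    )
    parser.add_argument(
        "input_file", metavar="INPUT", type=str,
        help="path of the input file",
    )
    parser.add_argument(
        "simulation_file", metavar="SIMULATION", type=str,
        help="path of the simulation file",
    )
    return parser.parse_args(argv)


def main(argv: Optional[Sequence[str]] = None) -> None:
    """Run the application: parse the CLI, then call the function."""
    args = parse_args(argv)
    accuracy_estimate(
        input_file=args.input_file,
        simulation_file=args.simulation_file,
    )


# In the script itself, the guard would then contain only this call:
#
#     if __name__ == "__main__":
#         main()
#
# Here main() is called with an explicit argument list for illustration.
main(["input.csv", "simulation.csv"])
```

Passing `argv` explicitly also makes the CLI parsing easy to unit-test without touching `sys.argv`.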

  • Alex Kanitz
  • 1 if __name__ == '__main__':
    2 import accuracy_estimate
    • Instead of importing the whole module, it's usually better (more explicit) to import only what you really need. As long as the class exists, that means importing the class, because a method cannot be imported from a class directly:

      from accuracy_estimate import AccuracyEstimate

      Or, if you remove the class, this becomes a little simpler, as you can import the function itself:

      from accuracy_estimate import accuracy_estimate
  • Alex Kanitz
  • 1 GeneID,Count
  • Alex Kanitz
  • 1 if __name__ == '__main__':
    2 import accuracy_estimate
    3 import argparse
    4
    5 parser = argparse.ArgumentParser(description='Do something')
    6 parser.add_argument('input_file', metavar='FILE', type=str, help='Enter the path of the input file')
    7 parser.add_argument('simulation_file', metavar='FILE', type=str, help='Enter the path of the simulation file')
    8
    9 args = parser.parse_args()
    10
    11 input_file = args.input_file
    • You only need these once, in your call in line 14, so there's little use in assigning them here. Instead you can just call the function like this (if you import the function/method directly):

      accuracy_estimate(
          input_file=args.input_file,
          simulation_file=args.simulation_file,
      )

      This is easier to read and modify later on.

      Also, please note that input_file is not a very descriptive argument name. All arguments to a function are basically inputs, and whether something is a file can go in the description of the arguments. Better to find some more descriptive names for the CLI and function arguments.

  • Alex Kanitz
  • 30 df1 = pd.read_csv(input_file)
    31 df2 = pd.read_csv(simulation_file)
    32
    33 # Plot mean and error bars
    34
    35 x = df1['Count']
    36 y = df2['Mean']
    37 yerr = np.sqrt(df2['Variance'])
    38
    39 fig = plt.figure()
    40 ax = fig.add_subplot(1, 1, 1)
    41 ax.scatter(x, y, vmin=0, vmax=100)
    42 ax.errorbar(x, y, yerr=yerr, ecolor="r", fmt="bo", capsize=5)
    43 ax.set_title("Scatter plot of initial vs inferred counts for all genes")
    44
    45 plt.show()
    • The function doesn't seem to return what the docstring says it does. In fact there is no return statement here at all, so it will return None. Given that we want to use the script in a workflow, we cannot have interactive parts, like opening a plot, which would require the user to interact with it or close it. So this should be reimplemented such that the plot is flushed to a file and the file path returned instead.
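
One possible shape for such a reimplementation is sketched below. It is not the author's code: the `output_file` argument is an assumption, the redundant `ax.scatter()` call is dropped because `errorbar()` with `fmt="bo"` already draws the points, and, like the original, the sketch assumes both files list the genes in the same order.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: no window is opened

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def accuracy_estimate(
    input_file: Path,
    simulation_file: Path,
    output_file: Path,
) -> Path:
    """Plot initial vs inferred counts and write the plot to a file.

    Args:
        input_file: CSV file with a 'Count' column.
        simulation_file: CSV file with 'Mean' and 'Variance' columns.
        output_file: Where the plot is written (hypothetical argument).

    Returns:
        Path of the written plot.
    """
    input_counts = pd.read_csv(input_file)
    simulated = pd.read_csv(simulation_file)

    fig, ax = plt.subplots()
    # Points are the simulated means; error bars are standard deviations.
    ax.errorbar(
        input_counts["Count"],
        simulated["Mean"],
        yerr=np.sqrt(simulated["Variance"]),
        ecolor="r",
        fmt="bo",
        capsize=5,
    )
    ax.set_title("Scatter plot of initial vs inferred counts for all genes")

    fig.savefig(output_file)
    plt.close(fig)  # release the figure instead of showing it
    return output_file
```

Returning the path lets a workflow step pick up the written file directly, and the function can be tested by checking that the file exists.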

  • Alex Kanitz
  • 5 # 2. Csv-file with mean and variance obtained from the simulation
    6 # Output: scatter plot of initial vs inferred (mean and error bars) counts for all genes
    7
    8 def accuracy_estimate(input_file, simulation_file):
    9
    10 """Determine the accuracy of the simulated values regarding the input values.
    11
    12 Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.
    13
    14 Args:
    15 input_file: Csv-file with input values
    16 simulation_file: Csv-file with mean and variance obtained from the simulation
    17
    18 Returns:
    19 Scatter plot of initial vs inferred counts for all genes with error bars
    20
  • Apart from the minor comments above, the code should be unit-tested and a one-step Nextflow subworkflow written (see e.g., !10 (closed) and !16 (closed)) to show that this actually works as expected.

  • Alex Kanitz mentioned in merge request !23 (closed)


  • Alex Kanitz mentioned in merge request !21 (closed)


  • closed
