Issue13
Adding the code for the issue 13
    <component name="InspectionProjectProfileManager">
Please remove this file manually with `rm <file> && git add <file>`, then commit and push. You may also want to add a pattern to `.gitignore` so that these types of files aren't accidentally added in the future (note that this won't affect files that have already been added, hence the manual deletion is necessary). The same goes for all other files in the `.idea/` directory, so it's best to add that entire folder as an ignore pattern. Check Google on how to do that.
- accuracy_estimate/accuracy_estimate.py 0 → 100644
    def accuracy_estimate(input_file, simulation_file):

        """Determine the accuracy of the simulated values regarding the input values.

        Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.

        Args:
            input_file: Csv-file with input values
            simulation_file: Csv-file with mean and variance obtained from the simulation

        Returns:
            Scatter plot of initial vs inferred counts for all genes with error bars

        """

        import pandas as pd
Please place all imports at the top of the module, right after the module docstring, but outside of any class or function definitions. Please have a look at PEP 8. Also have a look at this comment: !13 (comment 24993)
Edited by Alex Kanitz
- accuracy_estimate/accuracy_estimate.py 0 → 100644
    class AccuracyEstimate:
In the current implementation, there is no added benefit of having your code defined in a class, as the only defined method is basically a static function and there is no `__init__()` method or anything like that. I would recommend to just get rid of the class and only keep the function. We can always refactor later in case new code is added and a class actually makes sense.
- accuracy_estimate/accuracy_estimate.py 0 → 100644
    class AccuracyEstimate:

        # Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.
        # Input: 1. Csv-file with input values
        #        2. Csv-file with mean and variance obtained from the simulation
        # Output: scatter plot of initial vs inferred (mean and error bars) counts for all genes

        def accuracy_estimate(input_file, simulation_file):
Please add type hints for all your args as well as for the return value, something like:
    def my_func(my_arg_1: str, my_arg_2: int = 8) -> str:
This function would take one required argument (no default!) of type `str` and one optional argument (defaults to 8 if not provided) of type `int`, and return a value of type `str`.
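Applied to the function under review, the hints could look like the following sketch. It assumes that both arguments are file paths passed as strings and that the function does not return anything yet; the body is left out on purpose, and the module docstring wording is only illustrative.

```python
"""Estimate the agreement between input and simulated expression values."""


def accuracy_estimate(input_file: str, simulation_file: str) -> None:
    """Determine the accuracy of the simulated values regarding the input values.

    Args:
        input_file: Path to the CSV file with input values.
        simulation_file: Path to the CSV file with mean and variance
            obtained from the simulation.
    """
    # Body omitted: only the annotated signature matters for this sketch.


# Type hints are stored on the function object and can be inspected at runtime.
print(accuracy_estimate.__annotations__)
```

Static checkers such as mypy can then flag calls that pass, for example, an integer where a path string is expected.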
- accuracy_estimate/accuracy_estimate.py 0 → 100644
        Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.

        Args:
            input_file: Csv-file with input values
            simulation_file: Csv-file with mean and variance obtained from the simulation

        Returns:
            Scatter plot of initial vs inferred counts for all genes with error bars

        """

        import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt

        # input_file = "input.csv"
- accuracy_estimate/cli.py 0 → 100644
    1 if __name__ == '__main__':
    2     import accuracy_estimate
Again, a module-level docstring should be added, followed by all imports, grouped and sorted. Also, it's good practice to define a `main()` function with all application-level code (i.e., the CLI parsing, the call of your `accuracy_estimate()` method, etc.; basically everything between lines 5 and 14), and then have the call of the `main()` function be the only thing you do in the `if __name__ == '__main__':` block. [...] should be defined at the top of the module, and [...]
- accuracy_estimate/cli.py 0 → 100644
    1 if __name__ == '__main__':
    2     import accuracy_estimate
Instead of importing the module, it's usually better (more explicit) to import only what you really need. A method cannot be imported directly from its class, so with the current code that means importing the class and calling `AccuracyEstimate.accuracy_estimate()`:
    from accuracy_estimate import AccuracyEstimate
Or, if you remove the class, this becomes a little simpler:
    from accuracy_estimate import accuracy_estimate
- accuracy_estimate/input.csv 0 → 100644
- accuracy_estimate/cli.py 0 → 100644
     1 if __name__ == '__main__':
     2     import accuracy_estimate
     3     import argparse
     4
     5     parser = argparse.ArgumentParser(description='Do something')
     6     parser.add_argument('input_file', metavar='FILE', type=str, help='Enter the path of the input file')
     7     parser.add_argument('simulation_file', metavar='FILE', type=str, help='Enter the path of the simulation file')
     8
     9     args = parser.parse_args()
    10
    11     input_file = args.input_file
You only need these once, in your call in line 14, so there's little use in assigning them here. Instead, you can just call the function like this (if you import the function/method directly):
    accuracy_estimate(
        input_file=args.input_file,
        simulation_file=args.simulation_file,
    )
This is easier to read and modify later on.
Also, please note that `input_file` is not a very descriptive argument name. All arguments to a function are basically inputs, and whether something is a file can go in the description of the arguments. Better to find some more descriptive names for the CLI and function arguments.
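Taken together, the cli.py comments point towards a layout like the sketch below. This is one possible arrangement, not the final implementation: `accuracy_estimate()` is a stand-in defined inline so that the example runs on its own (in the real module it would be imported), the `parse_args()` helper and the `argv` parameter are assumptions added to make the code easy to call from tests, the description text is illustrative, and the argument names are kept as they are in the merge request.

```python
"""Command-line interface for the accuracy estimate (illustrative sketch)."""

import argparse
from typing import List, Optional


def accuracy_estimate(input_file: str, simulation_file: str) -> None:
    """Stand-in for the real function, which would be imported instead."""
    print(f"Comparing {input_file} with {simulation_file}")


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse CLI arguments; with argv=None, argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Compare input values with simulated values.",
    )
    parser.add_argument(
        "input_file", metavar="FILE", type=str,
        help="path of the input file",
    )
    parser.add_argument(
        "simulation_file", metavar="FILE", type=str,
        help="path of the simulation file",
    )
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> None:
    """Run the application-level code: parse arguments, call the function."""
    args = parse_args(argv)
    accuracy_estimate(
        input_file=args.input_file,
        simulation_file=args.simulation_file,
    )


# Example call with explicit arguments; the file names are placeholders.
main(["input.csv", "simulation.csv"])
```

In cli.py itself, the last line would be replaced by a bare `main()` call inside the `if __name__ == '__main__':` block, so that nothing runs when the module is merely imported.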
- accuracy_estimate/accuracy_estimate.py 0 → 100644
    df1 = pd.read_csv(input_file)
    df2 = pd.read_csv(simulation_file)

    # Plot mean and error bars

    x = df1['Count']
    y = df2['Mean']
    yerr = np.sqrt(df2['Variance'])

    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.scatter(x, y, vmin=0, vmax=100)
    ax.errorbar(x, y, yerr=yerr, ecolor="r", fmt="bo", capsize=5)
    ax.set_title("Scatter plot of initial vs inferred counts for all genes")

    plt.show()
The function doesn't seem to return what the docstring says it does. In fact, there is no `return` statement here at all, so it will return `None`. Given that we want to use the script in a workflow, we cannot have interactive parts, like opening a plot, which would require the user to interact with it or close it. So this should be reimplemented such that the plot is written to a file and the file path returned instead.
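One way to meet this requirement is sketched below. The `out_file` parameter and its default value are assumptions, not part of the merge request; the column names `Count`, `Mean` and `Variance` are taken from the reviewed code. The sketch builds the figure with `matplotlib.figure.Figure` instead of `pyplot`, so no window is opened and no display is needed.

```python
"""Plot input counts against simulated means and save the plot to a file."""

from pathlib import Path

import pandas as pd
from matplotlib.figure import Figure


def accuracy_estimate(
    input_file: Path,
    simulation_file: Path,
    out_file: Path = Path("accuracy_estimate.png"),  # hypothetical default
) -> Path:
    """Save a scatter plot with error bars and return the path of the image."""
    df_input = pd.read_csv(input_file)
    df_simulation = pd.read_csv(simulation_file)

    x = df_input["Count"]
    y = df_simulation["Mean"]
    yerr = df_simulation["Variance"] ** 0.5  # standard deviation

    # A Figure created without pyplot never opens an interactive window.
    fig = Figure()
    ax = fig.subplots()
    ax.errorbar(x, y, yerr=yerr, ecolor="r", fmt="bo", capsize=5)
    ax.set_title("Scatter plot of initial vs inferred counts for all genes")

    # The image format is inferred from the file extension.
    fig.savefig(out_file)
    return Path(out_file)
```

Returning the path lets a workflow step pick up the image as an output file without any user interaction.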
- accuracy_estimate/accuracy_estimate.py 0 → 100644
        #        2. Csv-file with mean and variance obtained from the simulation
        # Output: scatter plot of initial vs inferred (mean and error bars) counts for all genes

        def accuracy_estimate(input_file, simulation_file):

            """Determine the accuracy of the simulated values regarding the input values.

            Knowing the relative expression levels of genes in the input and calculated from the simulation data, summarize the agreement.

            Args:
                input_file: Csv-file with input values
                simulation_file: Csv-file with mean and variance obtained from the simulation

            Returns:
                Scatter plot of initial vs inferred counts for all genes with error bars

Apart from the minor comments above, the code should be unit-tested and a one-step Nextflow subworkflow written (see e.g., !10 (closed) and !16 (closed)) to show that this actually works as expected.
mentioned in merge request !23 (closed)
mentioned in merge request !21 (closed)