refactor: remove everything labkey-related

Commit 32c933a3 authored 3 years ago by Maciej Bak
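
The README context lines kept by this diff still mention the LabKey instance (under "Additional information"), so a check for leftover references may be worthwhile after a removal like this one. Below is a minimal sketch, assuming a hypothetical helper `find_leftovers` that scans lines of text for the substring "labkey" regardless of case. The function name and the sample input are illustrative and not part of the repository.

```python
from typing import Iterable, List, Tuple


def find_leftovers(lines: Iterable[str], needle: str = "labkey") -> List[Tuple[int, str]]:
    """Return (1-based line number, line text) pairs that mention `needle`, ignoring case."""
    needle = needle.lower()
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]


# Illustrative input: two README lines that remain as context in this commit.
sample = [
    "#### Additional information\n",
    "The metadata field names in the LabKey instance and those in the parameters\n",
]

for number, text in find_leftovers(sample):
    print(f"{number}: {text}")
```

On an actual checkout, `git grep -i labkey` reports the same kind of matches across all tracked files.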
--- a/README.md
+++ b/README.md
@@ -218,56 +218,6 @@ your run.
    bash run.sh
    ```
-### Configuring workflow runs via LabKey tables
-Our lab stores metadata for sequencing samples in a locally deployed
-[LabKey][labkey] instance. This repository provides two scripts that give
-programmatic access to the LabKey data table and convert it to the
-corresponding workflow inputs (`samples.tsv` and `config.yaml`), respectively.
-As such, these scripts largely automate step 3. of the above instructions.
-However, as these scripts were written specifically for the needs of our lab, 
-they are likely not directly usable or, at least, will require considerable 
-modification for other setups (e.g., different LabKey table structure).
-Nevertheless, they can serve as an example for interfacing between LabKey and
-your workflow.
-> **NOTE:** All of the below steps assume that your current working directory
-> is the repository's root directory.
-1. The scripts have additional dependencies that can be installed with:
-    ```bash
-    pip install -r scripts/requirements.txt
-    ```
-2. In order to gain programmatic access to LabKey via its API, a credential
-file is required. Create it with the following command after replacing the
-placeholder values with your real credentials (talk to your LabKey manager if
-you do not have these):
-    ```bash
-    cat << EOF | ( umask 0377; cat >> ${HOME}/.netrc; )
-    machine <remote-instance-of-labkey-server>
-    login <user-email>
-    password <user-password>
-    EOF
-    ```
-3. Generate the workflow configuration with the following command, after
-replacing the placeholders with the appropriate values (check out the
-help screen with option '--help' for further options and information):
-    ```bash
-    python scripts/prepare_inputs.py \
-        --labkey-domain="my.labkey.service.io"
-        --labkey-domain="/my/project/path"
-        --input-to-output-mapping="scripts/prepare_inputs.dict.tsv" \
-        --resources-dir="/path/to/my/genome/resources" \
-        --output-table="config/my_run/samples.tsv" \
-        --config_file="config/my_run/config.yaml" \
-        <table_name>
-    ```
 #### Additional information
 The metadata field names in the LabKey instance and those in the parameters
@@ -328,7 +278,6 @@ Contaminant sequences | contaminant_seqs
 [conda]: <https://docs.conda.io/projects/conda/en/latest/index.html>
 [profiles]: <https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles>
-[labkey]: <https://www.labkey.com/>
 [miniconda-installation]: <https://docs.conda.io/en/latest/miniconda.html>
 [rule-graph]: images/rule_graph.svg
 [snakemake]: <https://snakemake.readthedocs.io/en/stable/>

--- a/install/environment.dev.yml
+++ b/install/environment.dev.yml
@@ -10,5 +10,4 @@ dependencies:
  - pip:
    - pandas==1.0.1
    - biopython==1.76
-    - labkey==1.2.0
--- a/scripts/labkey_api.py
+++ b/scripts/labkey_api.py
-# This script targets the client api version 0.4.0 and later
-#
-#  Check the page: https://github.com/LabKey/labkey-api-python/blob/master/samples/query_examples.py
-#  for example about filtering in queries.
-#  A starting point to investigate further is here:
-#  https://www.labkey.org/download/clientapi_docs/javascript-api/symbols/LABKEY.Query.Filter.html
-import labkey
-import pandas as pd
-import sys
-# for convenience, load QueryFilter explicitly (avoids long lines in filter definitions)
-from labkey.query import QueryFilter
-if __name__ == "__main__":
-  # These are values of variables for which the script works
-  # project_name = "TEST_ABOERSCH"
-  # query_name = "RNA_Seq_data_template"
-  project_name = sys.argv[1]
-  query_name = sys.argv[2]
-  server_context = labkey.utils.create_server_context('labkey.scicore.unibas.ch', '/Zavolan Group/'+project_name, 'labkey', use_ssl=True)
-  schema_name = "lists"
-  results = labkey.query.select_rows(server_context,schema_name,query_name)
-  table_of_data = pd.DataFrame(results["rows"])
-  print(table_of_data)
--- a/scripts/prepare_inputs.dict.tsv
+++ b/scripts/prepare_inputs.dict.tsv
-labkey	snakemake
-Entry_Date	entry_date
-Path_Fastq_Files	fastq_path
-Condition_Name	condition
-Sample_Name	sample_name
-Single_Paired	seqmode
-Mate1_File	fq1
-Mate2_File	fq2
-Mate1_Direction	mate1_direction
-Mate2_Direction	mate2_direction
-Mate1_5p_Adapter	fq1_5p
-Mate1_3p_Adapter	fq1_3p
-Mate2_5p_Adapter	fq2_5p
-Mate2_3p_Adapter	fq2_3p
-Fragment_Length_Mean	mean
-Fragment_Length_SD	sd
-Quality_Control_Flag	quality_control_flag
-Checksum_Raw_FASTQ_Mate1	mate1_checksum
-Checksum_Raw_FASTQ_Mate2	mate2_checksum
-File_Name_Metadata_File	metadata
-Name_Quality_Control_File_Mate1	mate1_quality
-Name_Quality_Control_File_Mate2	mate2_quality
-Organism	organism
-TaxonID	taxon_id
-Strain_Isolate_Breed_Ecotype	strain_name
-Strain_Isolate_Breed_Ecotype_ID	strain_id
-Biomaterial_Provider	biomaterial_provider
-Source_Tissue_Name	source_name
-Tissue_Code	tissue_code
-Additional_Tissue_Description	tissue_description
-Genotype_Short_Name	genotype_name
-Genotype_Description	genotype_description
-Disease_Short_Name	disease_name
-Disease_Description	disease_description
-Treatment_Short_Name	treatment
-Treatment_Description	treatment_description
-Gender	gender
-Age	age
-Developmental_Stage	development_stage
-Passage_Number	passage_number
-Sample_Preparation_Date	sample_prep_date
-Prepared_By	prepared_by
-Documentation	documentation
-Protocol_File	protocol_file
-Sequencing_Date	seq_date
-Sequencing_Instrument	seq_instrument
-Library_preparation_kit	library_kit
-Cycles	cycles
-Molecule	molecule
-Contaminant_Sequences	contaminant_seqs
-BioAnalyzer_File	bioanalyser_file
--- a/scripts/prepare_inputs.py
+++ b/scripts/prepare_inputs.py
--- a/scripts/requirements.txt
+++ b/scripts/requirements.txt
-biopython==1.76
-labkey==1.2.0
-pandas==0.25.3
--- a/tests/test_scripts_prepare_inputs_labkey/expected_output.md5
+++ b/tests/test_scripts_prepare_inputs_labkey/expected_output.md5
-aa583b9bad45eeb520d9d624cca0af78  samples.tsv
-c4cda83b069eb7ccb16547e1a9cdb34a  config.yaml
\ No newline at end of file
--- a/tests/test_scripts_prepare_inputs_labkey/test.sh
+++ b/tests/test_scripts_prepare_inputs_labkey/test.sh
-#!/bin/bash
-# Scripts requires environment variables 'LABKEY_HOST', 'LABKEY_USER' and
-# 'LABKEY_PASS' to be set with the appropriate values
-# Tear down test environment
-cleanup () {
-    rc=$?
-    rm -rf ${HOME}/.netrc
-    rm -rf .snakemake/
-    rm -rf config.yaml
-    rm -rf samples.tsv.labkey
-    rm -rf samples.tsv
-    cd $user_dir
-    echo "Exit status: $rc"
-}
-trap cleanup EXIT
-# Set up test environment
-set -eo pipefail  # ensures that script exits at first command that exits with non-zero status
-set -u  # ensures that script exits when unset variables are used
-set -x  # facilitates debugging by printing out executed commands
-user_dir=$PWD
-script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
-cd $script_dir
-cat << EOF | ( umask 0377; cat >> ${HOME}/.netrc; )
-machine ${LABKEY_HOST}
-login ${LABKEY_USER}
-password ${LABKEY_PASS}
-EOF
-# Run tests
-python "../../scripts/prepare_inputs.py" \
-    --labkey-domain="${LABKEY_HOST}" \
-    --labkey-path="/Zavolan Group/TEST_LABKEY" \
-    --input-to-output-mapping="../../scripts/prepare_inputs.dict.tsv" \
-    --resources-dir="../input_files" \
-    --output-table="samples.tsv" \
-    --config-file="config.yaml" \
-    --multimappers='10' \
-    --logo="../../images/logo.128px.png" \
-    --debug \
-    "RNA_Seq_data_template_raw"
-# Check if dry run completes
-snakemake \
-    --snakefile="../../Snakefile" \
-    --configfile="config.yaml" \
-    --dryrun \
-    --verbose
-#md5sum --check "expected_output.md5"
-# MD5 sums obtained with command:
-# md5sum config.yaml samples.tsv > expected_output.md5
-md5sum config.yaml samples.tsv
--- a/tests/test_scripts_prepare_inputs_table/expected_output.md5
+++ b/tests/test_scripts_prepare_inputs_table/expected_output.md5
-40bd0f0fcecdd0d9bc932f63c2811478  config.yaml
-d8fb1773e3b83b6fab0a0d44c9fa71e6  samples.tsv
\ No newline at end of file
--- a/tests/test_scripts_prepare_inputs_table/input_table.tsv
+++ b/tests/test_scripts_prepare_inputs_table/input_table.tsv
-Mate2_5p_Adapter	Condition_Name	Name_Quality_Control_File_Mate1	Disease_Short_Name	Single_Paired	Gender	Entry_Date	Disease_Description	Strain_Isolate_Breed_Ecotype	Genotype_Description	Mate1_File	Source_Tissue_Name	Developmental_Stage	Mate1_Direction	Quality_Control_Flag	Genotype_Short_Name	Strain_Isolate_Breed_Ecotype_ID	Fragment_Length_Mean	Organism	Contaminant_Sequences	TaxonID	Documentation	Prepared_By	_labkeyurl_Entry_Date	Molecule	Mate2_Direction	Library_preparation_kit	Checksum_Raw_FASTQ_Mate1	Cycles	Fragment_Length_SD	Sample_Name	Passage_Number	Mate1_5p_Adapter	Mate2_3p_Adapter	Path_Fastq_Files	Mate1_3p_Adapter	Treatment_Short_Name	Age	Sequencing_Date	Checksum_Raw_FASTQ_Mate2	Biomaterial_Provider	Treatment_Description	Sample_Preparation_Date	BioAnalyzer_File	Sequencing_Instrument	Additional_Tissue_Description	Protocol_File	Name_Quality_Control_File_Mate2	Tissue_Code	File_Name_Metadata_File	Mate2_File
-	synthetic_10_reads_paired	xxx	xxx	PAIRED	xxx	Fri Dec 20 00:00:00 CET 2019	xxx	xxx	xxx	synthetic.mate_1.fastq.gz	xxx	xxx	SENSE	xxx	xxx	xxx	250.0	Homo sapiens	xxx	9606	xxx	xxx	/labkey/Zavolan%20Group/Test_labkey/list-details.view?listId=9&pk=../input_files/project1	xxx	ANTISENSE	xxx	xxx	xxx	100.0	synthetic_10_reads_paired	xxx		AGATCGGAAGAGCGT	../input_files/project1	AGATCGGAAGAGCACA	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	synthetic.mate_2.fastq.gz
-	synthetic_10_reads_mate_1	xxx	xxx	SINGLE	xxx	Fri Dec 20 00:00:00 CET 2019	xxx	xxx	xxx	synthetic.mate_1.fastq.gz	xxx	xxx	SENSE	xxx	xxx	xxx	250.0	Homo sapiens	xxx	9606	xxx	xxx	/labkey/Zavolan%20Group/Test_labkey/list-details.view?listId=9&pk=../input_files/project2	xxx		xxx	xxx	xxx	100.0	synthetic_10_reads_mate_1	xxx			../input_files/project2	AGATCGGAAGAGCACA	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	xxx	
--- a/tests/test_scripts_prepare_inputs_table/test.sh
+++ b/tests/test_scripts_prepare_inputs_table/test.sh
-#!/bin/bash
-# Tear down test environment
-cleanup () {
-    rc=$?
-    rm -rf .snakemake/
-    rm -rf config.yaml
-    rm -rf samples.tsv
-    rm -rf logs
-    cd $user_dir
-    echo "Exit status: $rc"
-}
-trap cleanup EXIT
-# Set up test environment
-set -eo pipefail  # ensures that script exits at first command that exits with non-zero status
-set -u  # ensures that script exits when unset variables are used
-set -x  # facilitates debugging by printing out executed commands
-user_dir=$PWD
-script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
-cd $script_dir/
-# Run tests
-python "../../scripts/prepare_inputs.py" \
-    --input-to-output-mapping="../../scripts/prepare_inputs.dict.tsv" \
-    --resources-dir="../input_files" \
-    --output-table="samples.tsv" \
-    --config-file="config.yaml" \
-    --multimappers='10' \
-    --logo="../../images/logo.128px.png" \
-    --output-dir="" \
-    --no-process-paths \
-    "input_table.tsv"
-# Check if dry run completes
-snakemake \
-    --snakefile="../../workflow/Snakefile" \
-    --configfile="config.yaml" \
-    --dryrun \
-    --verbose
-md5sum --check "expected_output.md5"
-# MD5 sums obtained with command:
-# md5sum config.yaml samples.tsv > expected_output.md5