Init DRUGSENS

Commit 47b36959 authored 1 year ago by Flavio Lombardo
--- a/.Rbuildignore
+++ b/.Rbuildignore
+^renv$
+^renv\.lock$
+^.*\.Rproj$
+^\.Rproj\.user$
+^\.github$
+^LICENSE\.md$
--- a/.gitignore
+++ b/.gitignore
+.Rproj.user
+.Rhistory
+.RData
+.Ruserdata
--- a/DESCRIPTION
+++ b/DESCRIPTION
+Package: DRUGSENS
+Title: Get Data From QuPath For Data Analysis
+Version: 0.1.0
+BugReports: https://git.scicore.unibas.ch/ovca-research/DRUGSENS/issues
+Authors@R: c(
+    person("Flavio", "Lombardo", "C.", "flavio.lombardo@unibas.ch", role = c("aut", "cre", "cph")),
+    person("Ovarian Cancer Research", role = c("cph")),   
+    person("University of Basel and University Hospital Basel", role = c("cph"))   
+    )
+URL: https://git.scicore.unibas.ch/ovca-research/DRUGSENS/
+Maintainer: Flavio C. Lombardo <flavio.lombardo@unibas.ch>
+Description: This package simplifies the analysis of QuPath data and is complementary to the STAR Protocol:
+    "DRUG-SENS: Quantification of Drug Sensitivity in 3D Patient-derived Ovarian Cancer Models"
+License: MIT + file LICENSE
+Imports:
+    dplyr,
+    tidyr,
+    stringr,
+    knitr,
+    testthat,
+    ggplot2,
+    ggpubr,
+    roxygen2
+Depends: 
+    R (>= 4.2)
+VignetteBuilder: 
+    knitr
+Encoding: UTF-8
+LazyData: true
+Roxygen: list(markdown = TRUE)
+RoxygenNote: 7.3.1
+Suggests: 
+    testthat (>= 3.0.0)
+Config/testthat/edition: 3
--- a/DRUGSENS.Rproj
+++ b/DRUGSENS.Rproj
+Version: 1.0
+RestoreWorkspace: Default
+SaveWorkspace: Default
+AlwaysSaveHistory: Default
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+RnwWeave: Sweave
+LaTeX: pdfLaTeX
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
+BuildType: Package
+PackageUseDevtools: Yes
+PackageInstallArgs: --no-multiarch --with-keep.source
--- a/LICENSE
+++ b/LICENSE
+YEAR: 2024
+COPYRIGHT HOLDER: Flavio C. Lombardo, Ricardo Jorge Bouça-Nova Coelho, Ovarian cancer research, University of Basel and University Hospital Basel
--- a/LICENSE.md
+++ b/LICENSE.md
+# MIT License
+Copyright (c) 2024 Flavio C. Lombardo, Ricardo Jorge Bouça-Nova Coelho, Ovarian cancer research, University of Basel and University Hospital Basel
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/NAMESPACE
+++ b/NAMESPACE
+# Generated by roxygen2: do not edit by hand
+export(change_data_format_to_longer)
+export(data_binding)
+export(generate_qupath_script)
+export(get_QC_plots)
+export(make_count_dataframe)
+export(make_run_config)
+import(ggplot2)
+import(ggpubr)
+import(knitr)
+import(testthat)
+importFrom(dplyr,filter)
+importFrom(dplyr,select)
+importFrom(stringr,str_extract)
+importFrom(tidyr,pivot_longer)
--- a/NEWS.md
+++ b/NEWS.md
+# DRUGSENS Changelog
+All notable changes to the DRUGSENS project will be documented in this file.
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+## [Unreleased]
+### Added
+- Initial creation of the DRUGSENS tool for analyzing drug sensitivity in cancer cell lines and cancer patients.
+- Addition of `make_count_dataframe` function for counting cell marker expressions.
+- Implementation of `change_data_format_to_longer` function to reformat data into a longer format for easier analysis.
+### Changed
+- Improved algorithm for more accurate cell marker detection.
+### Deprecated
+- None
+### Removed
+- None
+### Fixed
+- Bug fixes in data preprocessing to handle edge cases in input data.
+### Security
+- Enhanced data encryption for patient data storage and processing.
+## [0.1.0] - 2024-01-01
+### Added
+- Launch of the first version of DRUGSENS, providing functionalities for drug sensitivity analysis in translational research.
+- Support for `.csv` files.
+- Comprehensive metadata extraction from microscopy images, including patient ID, tissue type, and treatment details.
+- Testing
+### Changed
+- Updated documentation to include detailed descriptions of metadata fields.
+### Fixed
+- Resolved issues with metadata extraction accuracy.
--- a/R/change_data_format_to_longer.R
+++ b/R/change_data_format_to_longer.R
+#' Reformat the counts data in longer format
+#' @description
+#' This function takes the wide-format count data.frame and returns it as a longer-formatted data.frame
+#' @importFrom tidyr pivot_longer
+#' @importFrom dplyr select
+#' @return A `dataframe`/`tibble`.
+#' @param .data The markers count dataframe that is coming from the processing of the microscopy data
+#' @param pattern_column_markers The markers' pattern name to obtain the column with ratios of the markers (it defaults to "_ratio_of_total_cells")
+#' @param additional_columns Logical. If TRUE (default), the columns "Treatment", "PID", "Image_number", "Tissue", "Concentration", "DOC", "Treatment_complete" and "ReplicaOrNot" are kept in the longer-formatted data.frame
+#' @param unique_name_row_identifier String that indicates the unique identifier for each image, defaults to "filter_image"
+#' @export
+#' @examples
+#' \dontrun{
+#' change_data_format_to_longer(.data, pattern_column_markers = "_ratio_of_total_cells", additional_columns = TRUE)
+#' }
+# adding the image number so to identify the distribution
+# pivot_longer
+change_data_format_to_longer <- function(.data,
+                                         pattern_column_markers = "_ratio_of_total_cells",
+                                         unique_name_row_identifier = "filter_image",
+                                         additional_columns = TRUE) {
+  # names of the columns
+  col_names_of_markers <- colnames(.data)[which(grepl(x = colnames(.data), pattern = pattern_column_markers))]
+  if (additional_columns){
+  additional_columns_to_use <- c("Treatment", "PID", "Image_number", "Tissue", "Concentration", "DOC", "Treatment_complete", "ReplicaOrNot")
+  } else {
+    additional_columns_to_use <- NULL
+  }
+  if (length(col_names_of_markers) < 1) stop(paste0("Failed to find pattern: ", pattern_column_markers, " in the column names"))
+  if (!all(additional_columns_to_use %in% colnames(.data))) {
+    stop(
+      "One or more of the following column names could not be found: ",
+      paste(additional_columns_to_use, collapse = ", "),
+      ". These are the column names found in the input data: ",
+      paste(colnames(.data), collapse = ", ")
+    )
+  }
+  # check the input data itself, so the checks also behave sensibly when additional_columns = FALSE
+  if (!"Image_number" %in% colnames(.data)) stop("Image_number has to be in the dataframe.")
+  if (!"Treatment_complete" %in% colnames(.data)) stop("Treatment_complete has to be in the dataframe.")
+  longer_format <- .data |>
+    select(dplyr::all_of(c(unique_name_row_identifier, col_names_of_markers, additional_columns_to_use))) |>
+    pivot_longer(cols = dplyr::all_of(col_names_of_markers),
+      names_to = "marker_positivity",
+      values_to = "marker_positivity_ratio"
+    )
+return(
+  longer_format
+)
+}
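The reshaping performed by `change_data_format_to_longer()` can be sketched on a toy data frame. This is a minimal illustration under stated assumptions: the image identifiers, marker columns, and ratio values below are invented, and the sketch calls `tidyr::pivot_longer()` directly rather than the package function.

``` r
library(dplyr)
library(tidyr)

# Toy wide-format counts: one row per image, one ratio column per marker (values invented)
wide_counts <- data.frame(
  filter_image = c("img_1", "img_2"),
  DAPI_ratio_of_total_cells = c(0.60, 0.70),
  cCASP3_ratio_of_total_cells = c(0.40, 0.30),
  Treatment_complete = c("DMSO", "drugA.10.uM")
)

# Marker columns are found by their shared suffix, as in the function above
marker_cols <- grep("_ratio_of_total_cells", colnames(wide_counts), value = TRUE)

long_counts <- wide_counts |>
  select(all_of(c("filter_image", marker_cols, "Treatment_complete"))) |>
  pivot_longer(
    cols = all_of(marker_cols),
    names_to = "marker_positivity",
    values_to = "marker_positivity_ratio"
  )

# Two images x two markers give four rows
nrow(long_counts)
```

Each marker column becomes a value of `marker_positivity`, so a plotting function can facet or color by marker without knowing the marker names in advance.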
--- a/R/data_binding.R
+++ b/R/data_binding.R
+# Get a list of all the files that are in a user-specified folder and get a list of full paths
+list_all_files <- function(define_path, extension, recursive_search) {
+  list_listed_files <- list.files(
+    path = define_path,
+    pattern = extension,
+    ignore.case = TRUE,
+    recursive = recursive_search,
+    full.names = TRUE
+  ) |>
+    Filter(
+      x = _,
+      f = function(z) grepl(x = z, pattern = extension)
+    )
+  return(
+    list_listed_files
+  )
+}
+# Helper function to read and process a single file
+process_file <- function(file_path,
+                         # relabeling_map,
+                         extension) {
+  message(paste0("Reading file: ", file_path))
+  # Read the CSV file into a data frame
+  data <- read.csv(file_path, stringsAsFactors = FALSE)
+  extension <- sub(x = extension, pattern = "\\.", "")
+  # add the image name
+  data$Image_number <- stringr::str_extract(
+    string = data$Image,
+    pattern = "series.\\d*"
+  )
+  # extract information from the data
+  data$PID <- str_extract(data$Image, "[A-Z0-9]+(?=_)")
+  data$Tissue <-  sapply(strsplit(data$Image, "_"), `[`, 2, simplify=FALSE) |> unlist()
+  data$Date1 <- str_extract(data$Image, "\\d{4}.\\d{2}.\\d{2}")
+  data$DOC <- str_extract(data$Image, "(?<=DOC)\\d{4}\\.\\d{2}\\.\\d{2}")
+  data$ReplicaOrNot <- ifelse(stringr::str_detect(data$Image, pattern = "Replica|Rep|rep|replica|REPLICA|REP"), "Replica", NA_character_)
+  data$Treatment <- str_extract(string = data$Image, pattern = "(?<=\\d{4}\\.\\d{2}\\.\\d{2}_)[A-Za-z0-9]+(?=_.+)")
+  data$Concentration <-  str_extract(data$Image, "\\d+(?=_[un][Mm])")
+  data$ConcentrationUnits <- str_extract(data$Image, "[un][Mm](?=_)")
+  # get the name, relabelling of the markers WIP
+  for(nam in names(list_of_relabeling)) {
+    data$Name <- gsub(
+      x = as.character(data$Name),
+      pattern = nam,
+      replacement = list_of_relabeling[[nam]],
+      ignore.case = FALSE
+    )
+  }
+  ## create unique_identifier
+  data$filter_image <- paste(
+    data$PID,
+    data$Date1,
+    data$DOC,
+    data$Tissue,
+    data$Image_number,
+    data$Treatment,
+    data$Concentration,
+    data$ConcentrationUnits,
+    data$ReplicaOrNot,
+    sep = "_"
+  )
+  return(data)
+}
+#' Merge all the dataframes coming out from the QuPath
+#' @description
+#' This function tries to guess the string patterns that are in the dataset and then fills the dataframe
+#' with that information. Finally, the data from all files are combined into one dataframe.
+#' @import knitr
+#' @import testthat
+#' @importFrom stringr str_extract
+#' @return A `dataframe`/`tibble`.
+#' @param path_to_the_projects_folder The path where the files coming out of QuPath are located
+#' @param files_extension_to_look_for The extension of the file outputted from QuPath
+#' @param recursive_search Boolean, it defines the behavior of the file search, recursive or not (default is FALSE)
+#'
+#' @export
+#' @examples
+#' \dontrun{
+#' dataframe_output <- data_binding(path_to_the_projects_folder = "<USER_DEFINED_PATH>",
+#'                                  files_extension_to_look_for = "csv")
+#' # This will return the dataframe of all the data in the folder
+#' }
+# Main function to bind data from multiple files
+data_binding <- function(path_to_the_projects_folder,
+                         files_extension_to_look_for,
+                         recursive_search = FALSE
+                         ) {
+  # run configuration file
+  make_run_config()
+  # Validate input parameters
+  if (!dir.exists(path_to_the_projects_folder)) {
+    stop("The specified path does not exist.")
+  }
+  if (is.null(files_extension_to_look_for)) {
+    stop("File extension to look for has to be provided.")
+  }
+  if (!is.list(list_of_relabeling) && !is.null(list_of_relabeling)) {
+    stop("The relabeling information should be provided as a list.")
+  }
+  # List all files with the specified extension in the given folder
+  list_csv_files <- list_all_files(path_to_the_projects_folder,
+                                   files_extension_to_look_for,
+                                   recursive_search)
+  # Process each file and combine the results
+  df_list <- lapply(list_csv_files,
+                    process_file,
+                    # relabeling_map = use_custom_column_names,
+                    files_extension_to_look_for)
+  combined_df <- do.call(rbind, df_list)
+  # # remove namings
+  # rm(list_csv_files, col_names_qupath_output_files)
+  # Return the combined dataframe
+  return(combined_df)
+}
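The metadata parsing in `process_file()` can be tried on a single image name. The sketch below reuses the regular expressions from the function above and applies them to the example name given in the comments of the generated QuPath script; the `parsed` list is only an illustrative container, not something the package creates.

``` r
library(stringr)

# Example image name taken from the comments in the generated QuPath script
image_name <- "B39_Ascites_2023.11.10_DOC2023.10.05_NIRAPARIB_1000_nM_Rep_EpCAM_Ecad_cCasp3_(series 01).tif"

# Same patterns as in process_file(), applied to one string
parsed <- list(
  PID = str_extract(image_name, "[A-Z0-9]+(?=_)"),
  Tissue = strsplit(image_name, "_")[[1]][2],
  Date1 = str_extract(image_name, "\\d{4}.\\d{2}.\\d{2}"),
  DOC = str_extract(image_name, "(?<=DOC)\\d{4}\\.\\d{2}\\.\\d{2}"),
  Treatment = str_extract(image_name, "(?<=\\d{4}\\.\\d{2}\\.\\d{2}_)[A-Za-z0-9]+(?=_.+)"),
  Concentration = str_extract(image_name, "\\d+(?=_[un][Mm])"),
  ConcentrationUnits = str_extract(image_name, "[un][Mm](?=_)"),
  Image_number = str_extract(image_name, "series.\\d*")
)

str(parsed)
```

Because every field is located by position or by a fixed marker such as `DOC`, a name that deviates from the convention tends to yield `NA` rather than an error, so it is worth inspecting the parsed columns after binding.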
--- a/R/generate_qu_path_script.R
+++ b/R/generate_qu_path_script.R
+#' Generate the groovy script used for the analysis
+#' @description
+#' Generate a useful script to consistently save the output data from QuPath in .csv format following the naming conventions
+#' followed during the package development.
+#'
+#' @return `script_for_qupath.txt in local dir`.
+#'
+#' @export
+#' @examples
+#' \dontrun{
+#' generate_qupath_script()
+#' # script_for_qupath.txt is written to the working directory
+#' }
+generate_qupath_script <- function() {
+  write(
+    x = paste0('
+//This code script was tested with QuPath 4
+import qupath.lib.gui.tools.MeasurementExporter
+import qupath.lib.objects.PathCellObject
+import qupath.lib.objects.PathDetectionObject
+// Get the list of all images in the current project
+def project = getProject()
+def imagesToExport = project.getImageList()
+// Separate each measurement value in the output file with a comma (",")
+def separator = ","
+// Choose the columns that will be included in the export
+// Note: if columnsToInclude is empty, all columns will be included
+def columnsToInclude = new String[]{"Image", "Name", "Class","Centroid X um","Centroid Y um","Nucleus: Area", "Nucleus: DAPI mean","Nucleus: E-Cadherin mean", "Nucleus: Cleaved caspase 3 mean", "Cell: Area","Cell: E-Cadherin mean", "Cell: Cleaved caspase 3 mean","Cytoplasm: E-Cadherin mean","Cytoplasm: Cleaved caspase 3 mean","Nucleus/Cell area ratio"}
+// Choose the type of objects that the export will process
+// Other possibilities include:
+//    1. PathAnnotationObject
+//    2. PathDetectionObject
+//    3. PathRootObject
+// Note: import statements should then be modified accordingly
+def exportType = PathCellObject.class
+// Choose your *full* output path
+def outputPath = "<USER_DEFINED_PATH>/<PID>_<TISSUE>_',Sys.Date(),'_<SAMPLE_DOC>_<TREATMENT_INITIALS>_<CONCENTRATION>_<CONCENTRATION_UNITS>_<REPLICA_OR_NOT>_<TUMOR_MARKER>_<APOPTOTIC_MARKER>.csv"
+def outputFile = new File(outputPath)
+// example <USER_DEFINED_PATH>/B39_Ascites_2023.11.10_DOC2023.10.05_NIRAPARIB_1000_nM_Rep_EpCAM_Ecad_cCasp3_ QuPath will add (series 1) at the end of this line
+// example <USER_DEFINED_PATH>/B39_Ascites_2023.11.10_DOC2023.10.05_NIRAPARIB_1000_nM_Rep_EpCAM_Ecad_cCasp3_(series 01).tif
+// the part EpCAM_Ecad_cCasp3_ is optional but recommended
+// Create the measurementExporter and start the export
+def exporter  = new MeasurementExporter()
+        .imageList(imagesToExport)            // Images from which measurements will be exported
+        .separator(separator)                 // Character that separates values
+        .includeOnlyColumns(columnsToInclude) // Columns are case-sensitive
+        .exportType(exportType)               // Type of objects to export
+        .exportMeasurements(outputFile)       // Start the export process
+print "Done!"
+      '),
+    file = paste0(path.expand(getwd()), "/script_for_qupath.txt")
+  )
+  message("You can now take the script and personalize it to your needs")
+  message(paste0(Sys.time(), " The script file was generated here: ", getwd(), "/"))
+  message(paste0(Sys.time(), " Please make sure to follow the name convention here proposed, or it might fail to get all the information"))
+}
--- a/R/get_QC_plots.R
+++ b/R/get_QC_plots.R
+#' Plot some QC plots to define that everything ran correctly
+#' @description
+#' Plot data to visualize immediate trends
+#' @param .data The preprocessed data (after running make_count_dataframe() and change_data_format_to_longer()) merged data.frame that should be visualized
+#' @param patient_column_name The PID's column name in the merged data.frame (defaults to "PID")
+#' @param colors  A vector of colors to personalize the plot, by default 4 colors: c("darkgreen", "red", "orange", "pink")
+#' @param save_plots  A Boolean value indicating if the plots should be saved or not, TRUE for saving in the current working directory, FALSE to not. Default is FALSE
+#' @param folder_name A string indicating the name of the folder where to save the plots in case that save_plots = TRUE
+#' @param isolate_a_specific_patient A string indicating the patient name to isolate for single plot case (default is NULL)
+#' @param x_plot_var A string indicating the treatment's full name for the QC plots (default is "Treatment_complete")
+#'
+#' @import ggplot2
+#' @import ggpubr
+#' @importFrom dplyr filter
+#' @return Called for its side effects (plots are saved when `save_plots = TRUE`); returns `NULL` invisibly.
+#' @examples
+#' \dontrun{
+#' get_QC_plots(longer_format_dataframe, patient_column_name = "PID", save_plots = TRUE, folder_name = "figures")
+#' }
+#' @export
+get_QC_plots <- function(.data,
+                         patient_column_name = "PID",
+                         colors = c("darkgreen", "red", "orange", "pink"),
+                         save_plots = FALSE,
+                         folder_name = "figures",
+                         x_plot_var = "Treatment_complete",
+                         isolate_a_specific_patient = NULL) {
+  if (!is.null(isolate_a_specific_patient)) .data <- .data[.data[[patient_column_name]] == isolate_a_specific_patient, ]
+  if (nrow(.data) < 1) stop("The data cannot be empty")
+  # run for every unique PID the QC plot
+  for (i in unique(.data[[patient_column_name]])) {
+    message(paste0("Running the QC plot function for PID: ", i))
+    QC_plot <- .data |>
+      dplyr::filter(.data[[patient_column_name]] == i) |>
+      ggplot(aes(x = !!as.name(x_plot_var), y = marker_positivity_ratio, col = marker_positivity)) +
+      geom_boxplot(
+        position = position_dodge(width = 1.0),
+      ) +
+      facet_wrap(~marker_positivity) +
+      geom_jitter(width = 0.15) +
+      theme_light() +
+      labs(title = paste0("Cell marker ratios for PID: ", i), color = "Cell marker") +
+      ylab("Percentage of expression marker (marker-positive-cells/total_cell_count)") +
+      xlab("Drugs") +
+      theme(axis.text.x = element_text(angle = 45, hjust = 1.0)) +
+      scale_color_manual(values = colors) +
+      stat_summary(
+        fun = "median", geom = "pointrange",
+        mapping = aes(xend = after_stat(x) - 0.25, yend = after_stat(y)),
+        size = 1.5, alpha = 1.0,
+        position = position_dodge(width = 1)
+      ) +
+      stat_summary(
+        geom = "line", fun = "median", position = position_dodge(width = 1),
+        size = 1, alpha = 0.3, aes(group = marker_positivity)
+      ) +
+      theme(
+        axis.title.x = element_blank(),
+        plot.title = element_text(hjust = 0.5),
+        axis.ticks.x = element_blank(),
+        panel.grid = element_blank(),
+        strip.background = element_rect(
+          colour = "black",
+          fill = "grey1"
+        )
+      )
+    if (save_plots) {
+      if (!dir.exists(paths = paste0(getwd(), "/", folder_name, "/"))) dir.create(path = paste0(getwd(), "/", folder_name, "/"), showWarnings = FALSE, recursive = TRUE)
+      ggsave(QC_plot,
+        filename = paste0(folder_name, "/", "patients_QC_box_plots_", i, "_", "median", Sys.Date(), ".pdf"),
+        device = "pdf",
+        height = 12,
+        width = 12
+      )
+    }
+  }
+  message(paste0("If save_plots = TRUE, the plots will be saved here:", paste0(folder_name, "/", "patients_QC_box_plots_", "median", Sys.Date(), ".pdf")))
+}
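The core of the QC plot is a boxplot of marker ratios per treatment, faceted by marker. A stripped-down sketch, assuming a toy longer-format data frame with invented values (the summary layers and file saving of `get_QC_plots()` are left out):

``` r
library(ggplot2)

# Toy longer-format data: invented ratios for two markers and two treatments
toy_long <- data.frame(
  Treatment_complete = rep(c("DMSO", "drugA.10.uM"), each = 6),
  marker_positivity = rep(c("cCASP3_ratio_of_total_cells", "DAPI_ratio_of_total_cells"), times = 6),
  marker_positivity_ratio = c(0.10, 0.90, 0.12, 0.88, 0.08, 0.92, 0.35, 0.65, 0.40, 0.60, 0.30, 0.70)
)

# One box per treatment, one panel per marker, raw points jittered on top
qc_plot <- ggplot(
  toy_long,
  aes(x = Treatment_complete, y = marker_positivity_ratio, colour = marker_positivity)
) +
  geom_boxplot() +
  geom_jitter(width = 0.15) +
  facet_wrap(~marker_positivity) +
  theme_light() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(qc_plot)
```

Note that a ggplot object built inside a loop or function is only drawn when it is printed or saved explicitly.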
--- a/R/make_count_dataframe.R
+++ b/R/make_count_dataframe.R
+#' Count the main marker expression
+#' @description
+#' This function counts every single marker present in the "Name" column of the data.frame and returns a dataframe of the counts per marker
+#' @importFrom tidyr pivot_longer
+#' @return A `dataframe`/`tibble`.
+#' @param .data The dataframe that is coming from the processing of the microscopy data
+#' @param unique_name_row_identifier The name of the column of the .data that holds the unique name used for counting (it defaults to "filter_image")
+#' @param name_of_the_markers_column The name of the column of the .data where the marker names are expressed (e.g. E-Cadherin, DAPI), defaults to "Name"
+#' @export
+#' @examples
+#' \dontrun{
+#' make_count_dataframe(data, name_of_the_markers_column = "Name", unique_name_row_identifier = "filter_image")
+#' }
+# adding the image number so to identify the distribution
+make_count_dataframe <- function(.data, unique_name_row_identifier = "filter_image",
+                                 name_of_the_markers_column = "Name"
+                                ) {
+  counts_total <- as.data.frame.matrix(
+    table(.data[[unique_name_row_identifier]], .data[[name_of_the_markers_column]])
+  )
+  # get a vector of all the markers in the dataset
+  markers_names <- .data[[name_of_the_markers_column]] |> unique()
+  # add sum of the markers
+  counts_total$sum_cells <- rowSums(counts_total[, markers_names, drop = FALSE])
+  # # calculate the ratios
+  # lapply(markers_names, \(marker) {
+  #   counts_total[[paste0(marker, "_ratio_of_total_cells2")]] <<- round(counts_total[[marker]]/counts_total[["sum_cells"]], 2)
+  # })
+  # Calculate the ratios
+  counts_total[paste0(markers_names, "_ratio_of_total_cells")] <-
+    round(counts_total[, markers_names] / counts_total[["sum_cells"]], 2)
+  # names of the columns
+  # col_names_of_markers <- colnames(counts_total)[which(grepl(x = colnames(counts_total), pattern = "_ratio_of_total_cells"))]
+  counts_total[[unique_name_row_identifier]] <- row.names(counts_total)
+  # get variables back; the positions mirror how `filter_image` is pasted together in process_file():
+  # PID, Date1, DOC, Tissue, Image_number, Treatment, Concentration, ConcentrationUnits, ReplicaOrNot
+  id_parts <- strsplit(counts_total[[unique_name_row_identifier]], "_")
+  counts_total$PID <- sapply(id_parts, '[', 1)
+  counts_total$Date <- sapply(id_parts, '[', 2)
+  counts_total$DOC <- sapply(id_parts, '[', 3)
+  counts_total$Tissue <- sapply(id_parts, '[', 4)
+  counts_total$Image_number <- sapply(id_parts, '[', 5)
+  counts_total$Treatment <- sapply(id_parts, '[', 6)
+  counts_total$Concentration <- sapply(id_parts, '[', 7)
+  counts_total$ConcentrationUnits <- sapply(id_parts, '[', 8)
+  counts_total$ReplicaOrNot <- sapply(id_parts, '[', 9)
+  # add drug plus concentration plus units
+  for (i in unique(tolower(counts_total$Treatment))) {
+    rows <- tolower(counts_total$Treatment) == i
+    # Check if the current treatment is not in the specified list
+    if (!i %in% c("dmso", "control", "ctrl")) {
+      counts_total$Treatment_complete[rows] <- paste(counts_total$Treatment[rows], counts_total$Concentration[rows], counts_total$ConcentrationUnits[rows], sep = ".")
+    } else {
+      counts_total$Treatment_complete[rows] <- counts_total$Treatment[rows]
+    }
+  }
+  # Return the data
+  return(
+    counts_total
+  )
+}
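The counting step boils down to a cross-tabulation of image identifiers against marker names, followed by a division by the row totals. A base-R sketch on an invented per-cell table (the identifiers and marker labels are assumptions for illustration):

``` r
# Toy per-cell table: one row per detected cell (values invented)
cells <- data.frame(
  filter_image = c("img_1", "img_1", "img_1", "img_1", "img_2", "img_2"),
  Name = c("DAPI", "DAPI", "cCASP3", "E-Cadherin", "DAPI", "cCASP3")
)

# Cross-tabulate images against markers, then turn the table into a data.frame
counts <- as.data.frame.matrix(table(cells$filter_image, cells$Name))
markers <- unique(cells$Name)

# Total cells per image and each marker's share of that total
counts$sum_cells <- rowSums(counts[, markers, drop = FALSE])
counts[paste0(markers, "_ratio_of_total_cells")] <-
  round(counts[, markers, drop = FALSE] / counts$sum_cells, 2)

counts
```

Using `drop = FALSE` keeps the selection a data.frame even when only one marker is present, so the row sums and ratios are computed the same way in that case.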
--- a/R/make_run_config.R
+++ b/R/make_run_config.R
+#' Generate and use a config txt file
+#' @description
+#' When this function runs for the first time, it generates a `config_DRUGSENS.txt` file in the user's working directory.
+#' On later runs it imports the config file into the user environment. These data are used to change the column names
+#' of the imported dataset and the names of the markers, which are often incorrectly exported.
+#' @export
+#' @return Called for its side effects: it writes or sources `config_DRUGSENS.txt`.
+make_run_config <- function() {
+  if (file.exists("config_DRUGSENS.txt")) {
+    tryCatch(
+      expr = {
+        source("config_DRUGSENS.txt", local = FALSE)
+      },
+      error = function(error) {
+        message("DRUGSENS could not load the 'config_DRUGSENS.txt' file.
+                Please generate a valid config file with the substitution names from the dataframe
+                and the name of the columns to use for your project.
+                Once 'config_DRUGSENS.txt' is available, re-run make_run_config() to verify that the data was correctly read")
+      }
+    )
+  } else {
+    write(
+      x =
+        (
+        '
+        # List of markers to relabel
+        list_of_relabeling =
+        list(
+            "PathCellObject" = "DAPI",
+            "cCasp3" = "cCASP3",
+            "E-Cadherin: cCASP3" = "E-Cadherin and cCASP3",
+            "EpCAM_E-Cadherin" = "E-Cadherin",
+            "EpCAM_E-Cadherin and cCASP3" = "E-Cadherin and cCASP3"
+          )'
+        ),
+      file = paste0(path.expand(getwd()), "/config_DRUGSENS.txt")
+    )
+    # load the freshly written defaults so that `list_of_relabeling` also exists on the first run
+    source(file.path(getwd(), "config_DRUGSENS.txt"), local = FALSE)
+  }
+}
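How the relabeling list is consumed can be sketched in a few lines. The list below is copied from the default config written above; the `marker_names` vector is invented to show the effect, and the loop mirrors the `gsub()` loop in `process_file()`.

``` r
# Default relabeling list, copied from the config written above
list_of_relabeling <- list(
  "PathCellObject" = "DAPI",
  "cCasp3" = "cCASP3",
  "E-Cadherin: cCASP3" = "E-Cadherin and cCASP3",
  "EpCAM_E-Cadherin" = "E-Cadherin",
  "EpCAM_E-Cadherin and cCASP3" = "E-Cadherin and cCASP3"
)

# Invented marker labels, standing in for the exported `Name` column
marker_names <- c("PathCellObject", "cCasp3", "E-Cadherin: cCasp3", "EpCAM_E-Cadherin")

# Apply the substitutions one after another, in list order
for (old_label in names(list_of_relabeling)) {
  marker_names <- gsub(old_label, list_of_relabeling[[old_label]], marker_names, ignore.case = FALSE)
}

marker_names
```

The substitutions are order dependent: the `"E-Cadherin: cCASP3"` rule only matches because the earlier `"cCasp3"` rule has already changed the capitalization, so entries added to the config should be ordered with that in mind.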
--- a/README.md
+++ b/README.md
+![](https://img.shields.io/badge/R-%3E%3D%204.2.0-blue)
+# Overview
+DRUGSENS is an R package that simplifies the analysis of QuPath data. Here we provide the code to run a QuPath script for a reproducible example. For more detailed examples, please read the [QuPath Documentation](https://qupath.readthedocs.io/en/stable/). The script should be placed into the scripts within QuPath. We tested this code with a previous version of QuPath.
+# Installation
+``` r
+devtools::install_gitlab("ovca-research/drugsens", host = "git.scicore.unibas.ch")
+# OR
+devtools::install_github("https://github.com/flalom/drugsens") # this is the mirroring repo of the gitlab
+```
+`devtools` is required to install DRUGSENS. If `devtools` is not installed yet you can install it with:
+``` r
+# Install devtools from CRAN
+install.packages("devtools")
+# Or the development version from GitHub:
+# install.packages("pak")
+pak::pak("r-lib/devtools")
+```
+You can have a look at [devtools](https://github.com/r-lib/devtools).
+# Usage
+## Example
+We recommend making a new project when working with `DRUGSENS`, to have clear and well-defined paths. This will make the data analysis much easier and more reproducible.
+You can also set your working directory with `setwd()`.
+### QuPath script used
+To make this code locally available:
+``` r
+library("DRUGSENS")
+generate_qupath_script()
+```
+This function will generate a `script_for_qupath.txt` file with the code that can be copied and pasted into QuPath's script manager. All the sections that contain \<\> should be replaced with the user's experimental information. The `columnsToInclude` in the script should also be user defined, depending on the markers used.
+It is very important that the file naming structure of QuPath's output is maintained for `DRUGSENS` to work correctly.
+``` groovy
+//This groovy snippet was tested with QuPath 4
+import qupath.lib.gui.tools.MeasurementExporter
+import qupath.lib.objects.PathCellObject
+import qupath.lib.objects.PathDetectionObject
+// Get the list of all images in the current project
+def project = getProject()
+def imagesToExport = project.getImageList()
+// Separate each measurement value in the output file with a comma (",")
+def separator = ","
+// Choose the columns that will be included in the export
+// Note: if columnsToInclude is empty, all columns will be included
+def columnsToInclude = new String[]{"Image", "Name", "Class","Centroid X µm","Centroid Y µm","Nucleus: Area", "Nucleus: DAPI mean","Nucleus: E-Cadherin mean", "Nucleus: Cleaved caspase 3 mean", "Cell: Area","Cell: E-Cadherin mean", "Cell: Cleaved caspase 3 mean","Cytoplasm: E-Cadherin mean","Cytoplasm: Cleaved caspase 3 mean","Nucleus/Cell area ratio"}
+// Choose the type of objects that the export will process
+// Other possibilities include:
+//    1. PathAnnotationObject
+//    2. PathDetectionObject
+//    3. PathRootObject
+// Note: import statements should then be modified accordingly
+def exportType = PathCellObject.class
+// Choose your *full* output path
+def outputPath = "<USER_DEFINED_PATH>/<PID>_<TISSUE>_<DATE>_<SAMPLE_DOC>_<TREATMENT_INITIALS>_<CONCENTRATION>_<CONCENTRATION_UNITS>_<REPLICA_OR_NOT>_<TUMOR_MARKER>_<APOPTOTIC_MARKER>.csv"
+def outputFile = new File(outputPath)
+// example <USER_DEFINED_PATH>/B39_Ascites_2023.11.10_DOC2023.10.05_NIRAPARIB_1000_nM_Rep_EpCAM_Ecad_cCasp3_ QuPath will add (series 1) at the end of this line
+// example <USER_DEFINED_PATH>/B39_Ascites_2023.11.10_DOC2023.10.05_NIRAPARIB_1000_nM_Rep_EpCAM_Ecad_cCasp3_(series 01).tif
+// Create the measurementExporter and start the export
+def exporter  = new MeasurementExporter()
+        .imageList(imagesToExport)            // Images from which measurements will be exported
+        .separator(separator)                 // Character that separates values
+        .includeOnlyColumns(columnsToInclude) // Columns are case-sensitive
+        .exportType(exportType)               // Type of objects to export
+        .exportMeasurements(outputFile)       // Start the export process
+print "Done!"
+```
+### Generate configuration file
+This command will generate a `config_DRUGSENS.txt` file that should be edited to include the names of the cell markers used by the experimenter.
+``` r
+make_run_config()
+```
+Once the file `config_DRUGSENS.txt` has been modified, you can feed it back to `R` by running the command again.
+``` r
+make_run_config()
+```
+Now the `list_of_relabeling` should be available in the R environment, where `DRUGSENS` can use it. `list_of_relabeling` is a named list that is required for relabeling the marker names, which are often not user friendly.
+If the marker names don't need corrections/relabeling, you can leave the `list_of_relabeling` unchanged.
+> 📝**NOTE** It is recommended to avoid spaces and to use camelCase style for the list of cell markers.
+>
+> - Start the name with a lowercase letter.
+> - Do not include spaces or underscores between words.
+> - Capitalize the first letter of each subsequent word.
+### Explore example datasets
+We present here a few mock datasets as an example. They can be explored from the folder:
+``` r
+system.file("extdata/to_merge/", package = "DRUGSENS")
+```
+### Bind QuPath files
+The example data can be bound together with this command:
+``` r
+bind_data <- data_binding(path_to_the_projects_folder = system.file("extdata/to_merge/", package = "DRUGSENS"), files_extension_to_look_for = "csv")
+```
+You will now be able to `View(bind_data)`. You should see all the images from QuPath in one dataframe. This dataframe will have all the metadata parsed from the `Image` column (this is the first column defined in `columnsToInclude` within the `script_for_qupath.txt`).
+### Counting the markers for every image
+This function takes the dataframe generated in the previous step and counts, image by image and for every sample, the number of marker occurrences. The metadata are kept.
+``` r
+counts_dataframe <- make_count_dataframe(bind_data)
+```
+### Making plotting-ready data
+This function changes the wider format into a longer format, keeping all the metadata.
+``` r
+plotting_ready_dataframe <- change_data_format_to_longer(counts_dataframe)
+```
+### Make a plot
+Visualizing the results of the previous steps is essential to assess your experiment.
+``` r
+get_QC_plots(plotting_ready_dataframe, save_plots = TRUE, isolate_a_specific_patient = "B39")
+```
+<img src="assets/QC_plot.png" alt="QC Plot example" title="QC Plot example" width="500" height="500"/>
+<br>
+## Run with user's data
+Let's run `DRUGSENS` with your data. `DRUGSENS` is not very strict about the capitalization of the file name but is very strict about the position of the parameters, to avoid potential parsing problems. Here is how the labeled data should look in your QuPath-generated file. Shown below is the first row from the file `A8759_drug1..conc2.csv`, included as an example in `system.file("extdata/to_merge/", package = "DRUGSENS")`:
+```         
+A8759_p.wash_2020.11.10_DOC2001.10.05_compoundX34542_10000_uM_EpCAM_Ecad_cCasp3_(series 01).tif
+```
+This follows the structure suggested in the QuPath script:
+```         
+"<USER_DEFINED_PATH>/<PID>_<TISSUE>_',Sys.Date(),'_<SAMPLE_DOC>_<TREATMENT_INITIALS>_<CONCENTRATION>_<CONCENTRATION_UNITS>_<REPLICA_OR_NOT>_<TUMOR_MARKER>_<APOPTOTIC_MARKER>.csv"
+```
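+As an illustration of the structure above, a file name can be assembled field by field. This is only a sketch: the values are hypothetical (loosely based on the example row), the replica label `rep` and the dotted date format are assumptions, and the QuPath script remains the reference for how names are actually generated.
+``` r
+# Hypothetical field values, in the order required by the naming structure
+fields <- c(
+  PID = "A8759",
+  TISSUE = "p.wash",
+  DATE = format(as.Date("2020-11-10"), "%Y.%m.%d"), # assumed date format
+  SAMPLE_DOC = "DOC2001.10.05",
+  TREATMENT_INITIALS = "compoundX34542",
+  CONCENTRATION = "10000",
+  CONCENTRATION_UNITS = "uM",
+  REPLICA_OR_NOT = "rep", # assumed label
+  TUMOR_MARKER = "EpCAM",
+  APOPTOTIC_MARKER = "cCasp3"
+)
+# Underscores separate the fields, so no single field may contain one
+stopifnot(!any(grepl("_", fields, fixed = TRUE)))
+file_name <- paste0(paste(fields, collapse = "_"), ".csv")
+file_name
+```
+Because the position of each field matters, checking for stray underscores before exporting helps keep the fields aligned.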
+> ⚠️ **WARNING**: Following the suggested naming structure is highly recommended to obtain the correct output.
+### Data Binding and Processing
+These lines set the stage for `DRUGSENS` to find the directory where the microscopy image data are located. `defined_path` is a user-defined variable that should contain the base path; this makes it easier to access and manage the files during processing. It is also convenient to define `desired_file_extensions`; `csv` is usually a good start.
+``` r
+defined_path <- "<USER_DEFINED_PATH>"
+desired_file_extensions <- "csv"
+```
+You can then bind the data:
+``` r
+bind_data <- data_binding(path_to_the_projects_folder = defined_path, 
+files_extension_to_look_for = desired_file_extensions, recursive_search = FALSE)
+```
+> 📝 **NOTE**: It is recommended to run `data_binding()` with `recursive_search = FALSE` if the target folder has subfolders that belong to other projects using other cell markers.
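+
+To make the effect of a non-recursive search concrete, here is a minimal base R sketch of "read every csv file in a folder and stack the rows". It is not the internal code of `data_binding()`; the temporary folder, file names, and toy columns are all hypothetical.
+``` r
+# Build a throwaway folder with two csv files and one subfolder from "another project"
+tmp_dir <- file.path(tempdir(), "drugsens_sketch")
+dir.create(file.path(tmp_dir, "other_project"), recursive = TRUE, showWarnings = FALSE)
+write.csv(data.frame(Image = "img_A", Name = "EpCAM"),
+          file.path(tmp_dir, "a.csv"), row.names = FALSE)
+write.csv(data.frame(Image = "img_B", Name = "cCasp3"),
+          file.path(tmp_dir, "b.csv"), row.names = FALSE)
+write.csv(data.frame(Image = "img_C", Name = "CD45"),
+          file.path(tmp_dir, "other_project", "c.csv"), row.names = FALSE)
+
+# recursive = FALSE ignores files inside subfolders
+csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE, recursive = FALSE)
+# Read each file and stack the rows into one dataframe
+bound <- do.call(rbind, lapply(csv_files, read.csv))
+bound
+```
+Only the two top-level files end up in `bound`; the file in `other_project` is left out.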
+Each file is read and merged into a single dataframe containing all the csv files within the folder. Additional metadata is parsed from the file name and appended to the data, such as:
+- **PID** = A unique identifier assigned to each sample. This ID helps in distinguishing and tracking individual samples' data throughout the experiment.
+- **Date1** = The date on which the experiment or analysis was conducted. This field records when the data was generated or processed.
+- **DOC** = The date when the biological sample was collected.
+- **Tissue** = Indicates the type of tissue from which the sample was derived. This could be a specific organ or cell type.
+- **Image_number** = Represents the order or sequence number of the image in a stack of images.
+- **Treatment** = The name or type of drug treatment applied to the sample.
+- **Concentration** = The amount of the drug treatment applied (concentration), quantitatively described.
+- **ConcentrationUnits** = The units in which the drug concentration is measured, such as micromolar (uM) or nanomolar (nM).
+- **ReplicaOrNot** = Indicates whether the sample is a replica or repeat of a previous experiment.
+- **Name** = The standardized name of the cell markers as defined in the `config_DRUGSENS.txt` file. This ensures consistency and accuracy in identifying and referring to specific cell markers. 
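+The position-based parsing described above can be sketched in base R. This is an illustration only, not the parser used by `data_binding()`: the field positions are read off the example row shown earlier, and treating the `(series 01)` suffix as the image number is an assumption.
+``` r
+# Example image name taken from the mock data
+image_name <- "A8759_p.wash_2020.11.10_DOC2001.10.05_compoundX34542_10000_uM_EpCAM_Ecad_cCasp3_(series 01).tif"
+# Split on underscores; each position carries one piece of metadata
+parts <- strsplit(image_name, "_", fixed = TRUE)[[1]]
+metadata <- list(
+  PID = parts[1],
+  Tissue = parts[2],
+  Date1 = parts[3],
+  DOC = parts[4],
+  Treatment = parts[5],
+  Concentration = as.numeric(parts[6]),
+  ConcentrationUnits = parts[7],
+  # Assumption: the series number in the last chunk is the image number
+  Image_number = sub(".*\\(series ([0-9]+)\\).*", "\\1", parts[length(parts)])
+)
+str(metadata)
+```
+A field that is missing or contains an extra underscore shifts every later position, which is why the naming structure needs to be followed closely.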
+### Cell markers counting
+`make_count_dataframe()` is designed for processing microscopy data stored in a dataframe. It counts occurrences of different markers present in the dataset and computes additional metadata based on unique identifiers within each row.
+``` r
+cell_markers_counts_data <- make_count_dataframe(bind_data)
+```
+- `.data`: The input dataframe containing microscopy data.
+- `unique_name_row_identifier`: The name of the column in `.data` that contains unique identifiers for each row (default is "filter_image").
+- `name_of_the_markers_column`: The name of the column in `.data` that contains the names of the markers (default is "Name").
+> 📝 **NOTE**: `make_count_dataframe()` directly accepts the `bind_data` generated in the previous step. If the fields were modified, the parameters `unique_name_row_identifier` and `name_of_the_markers_column` should be passed to the function.
+The output will be a dataframe with all the metadata coming from the previous preprocessing. At this point you can already use the data, but you can additionally change the format from wider to longer. This is especially useful for plotting and finer analyses.
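+The idea behind per-image marker counting can be shown on a toy dataframe with base R. This is a sketch, not the implementation of `make_count_dataframe()`: the column names `filter_image` and `Name` follow the documented defaults, while the toy rows and the `total_cells` column are hypothetical.
+``` r
+# Toy data: one row per detected cell, labeled with its image and marker
+toy <- data.frame(
+  filter_image = c("img1", "img1", "img1", "img2", "img2"),
+  Name = c("EpCAM", "EpCAM", "cCasp3", "EpCAM", "cCasp3")
+)
+# Cross-tabulate images against markers to get one row of counts per image
+counts <- as.data.frame.matrix(table(toy$filter_image, toy$Name))
+# Total cells per image (hypothetical column name)
+counts$total_cells <- rowSums(counts)
+# Ratio of one marker over the total, using the documented "_ratio_of_total_cells" suffix
+counts$cCasp3_ratio_of_total_cells <- counts$cCasp3 / counts$total_cells
+counts
+```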
+### Prepare the data for plotting
+`change_data_format_to_longer()` transforms count data from a wide format to a longer format, making it more suitable for certain types of analysis or visualization.
+- `.data`: The input dataframe containing count data in a wide format, typically generated from microscopy data processing.
+- `pattern_column_markers`: A pattern used to identify columns related to marker ratios (defaults to "_ratio_of_total_cells").
+- `unique_name_row_identifier`: The name of the column in `.data` that contains unique identifiers for each image (defaults to "filter_image").
+- `additional_columns`: A logical value indicating whether to include additional metadata columns in the longer format dataframe. It defaults to TRUE.
+``` r
+plotting_format <- change_data_format_to_longer(cell_markers_counts_data)
+```
+> 📝 **NOTE**: `change_data_format_to_longer()` directly accepts the `cell_markers_counts_data` generated in the previous step. If the fields were modified, the parameters `pattern_column_markers`, `unique_name_row_identifier`, and `additional_columns` should be passed to the function.
+This will return a dataframe that can be easily used for plotting and additional analyses.
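+For readers unfamiliar with the wide-to-long reshaping, here is a minimal sketch using `tidyr::pivot_longer()` on toy data. It is not the code of `change_data_format_to_longer()`: the suffix `_ratio_of_total_cells` follows the documented default pattern, while the toy values and the output column names `marker` and `ratio` are hypothetical.
+``` r
+library(tidyr)
+
+# Toy wide data: one row per image, one ratio column per marker
+wide <- data.frame(
+  filter_image = c("img1", "img2"),
+  PID = c("A8759", "A8759"),
+  EpCAM_ratio_of_total_cells = c(0.6, 0.5),
+  cCasp3_ratio_of_total_cells = c(0.4, 0.5)
+)
+# Gather every ratio column into a marker/value pair, keeping the metadata columns
+long <- pivot_longer(
+  wide,
+  cols = ends_with("_ratio_of_total_cells"),
+  names_to = "marker", # hypothetical column name
+  values_to = "ratio"  # hypothetical column name
+)
+long
+```
+Each image now contributes one row per marker, which is the layout `ggplot2` expects when mapping a marker to color or facet.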
+### QC plotting
+`get_QC_plots()` is designed for generating Quality Control (QC) plots from preprocessed microscopy data. It visualizes cell marker ratios across different treatments for each patient or a specific patient, aiding in the immediate assessment of data quality and trends.
+Example usage:
+``` r
+get_QC_plots(plotting_format, isolate_a_specific_patient = "A8759", save_plots = TRUE)
+```
+More parameters can be specified to personalize the plot(s).
+- `.data`: The preprocessed and merged dataframe, expected to be in a long format, typically obtained after processing through `make_count_dataframe()` and `change_data_format_to_longer()`.
+- `patient_column_name`: Specifies the column in `.data` that contains patient identifiers (defaults to "PID").
+- `colors`: A vector of colors for the plots. Defaults to c("darkgreen", "red", "orange", "pink").
+- `save_plots`: A Boolean flag indicating whether to save the generated plots. If `TRUE`, plots are saved in the specified directory.
+- `folder_name`: The name of the folder where plots will be saved if `save_plots` is `TRUE`. Defaults to "figures".
+- `isolate_a_specific_patient`: If specified, QC plots will be generated for this patient only. Defaults to NULL, meaning plots will be generated for all patients.
+- `x_plot_var`: The variable to be used on the x-axis, typically indicating different treatments. Defaults to "Treatment_complete".
+## Contributing
+We welcome contributions from the community! Here are some ways you can contribute:
+- Reporting bugs
+- Suggesting enhancements
+- Submitting pull requests for bug fixes or new features
+### Setting Up the Development Environment
+To get started with development, follow these setup instructions:
+<details>
+<summary>Development Environment Setup</summary>
+This project uses `renv` for R package management to ensure reproducibility. To set up your development environment:
+1. Clone the repository to your local machine.
+2. Open the project in RStudio or start an R session in the project directory.
+3. Run `renv::restore()` to install the required R packages.
+`renv` will automatically activate and install the necessary packages as specified in the `renv.lock` file.
+</details>
+### Reporting Issues
+If you encounter any bugs or have suggestions for improvements, please file an issue using our [GitLab issue tracker](https://git.scicore.unibas.ch/ovca-research/DRUGSENS/issues). Be sure to include as much information as possible to help us understand and address the issue.
+Please make sure to file the issue on GitLab, as the GitHub repository is only a mirror.
--- a/assets/QC_plot.png
+++ b/assets/QC_plot.png
--- a/inst/extdata/merged/output_example.RDS
+++ b/inst/extdata/merged/output_example.RDS
--- a/inst/extdata/to_merge/A8759_drug1..conc2.csv
+++ b/inst/extdata/to_merge/A8759_drug1..conc2.csv
--- a/inst/extdata/to_merge/A8759_drug1__conc1.csv
+++ b/inst/extdata/to_merge/A8759_drug1__conc1.csv
--- a/inst/extdata/to_merge/B36_Ascites_rep_cont.csv
+++ b/inst/extdata/to_merge/B36_Ascites_rep_cont.csv