ASCII-style alignment pileup

Description

Generates an ASCII-style pileup of read alignments in one or more BAM files against one or more regions specified in a BED file.

Usage

ascii_alignment_pileup.R [-hv] [OPTIONS] bed bam [bam2 ...]

Requirements

  • R 3.6.0 / 4.1.3 / 4.2.1
  • optparse 1.6.2 / 1.7.1
  • rtracklayer 1.44.0 / 1.54.0 / 1.56.1

The script was successfully tested with the indicated versions. Other versions may work as well, but have not been tested.

Installation

The easiest way to install the script is via Conda. If you have Conda installed, all you need to do is:

conda env create -f environment.yml
conda activate ascii-alignment-pileup

Alternatively, you can build a container image with all required software from the provided Dockerfile by running docker build . in the directory that contains the Dockerfile. You can also pull a prebuilt Docker image from https://hub.docker.com/repository/docker/zavolab/ascii-alignment-pileup.

Input files

  • BED file; the score column is ignored, so it can contain arbitrary values
  • BAM file(s)
  • Optional: FASTA file compressed with bgzip
  • Optional: GFF/GTF/GFF3 file

You can have a look at the test input files in tests/test_files to see examples for each file type. For reference, the uncompressed counterparts of the BAM and FASTA files (test.sam and test.fa, respectively) are also provided.
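Two input problems are easy to catch before a run: BED regions wider than --maximum-region-width (default 200), and annotation files that lack the attribute selected with --annotation-name-field (default "Name"). The Python sketch below checks both. It is not part of the script and rests on assumptions: BED lines are taken to have at least four tab-separated columns (chrom, 0-based start, end, name), region width is computed as end - start, and attribute keys are read from column 9 in either GFF3 (key=value) or GTF (key "value") style. The helper names are hypothetical.

```python
import re
from typing import Iterable, List, Set


def wide_bed_regions(lines: Iterable[str], max_width: int = 200) -> List[str]:
    """Return names of BED regions wider than max_width (assumed width: end - start)."""
    names = []
    for line in lines:
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        start, end, name = int(fields[1]), int(fields[2]), fields[3]
        if end - start > max_width:
            names.append(name)
    return names


def attribute_keys(lines: Iterable[str]) -> Set[str]:
    """Collect attribute keys from column 9 of GFF3 or GTF lines."""
    keys: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a feature line (e.g. sequence lines after ##FASTA)
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                # The key ends at the first "=" (GFF3) or space (GTF).
                keys.add(re.split(r"[= ]", item, maxsplit=1)[0])
    return keys
```

For example, pass an open handle to tests/test_files/test.bed to wide_bed_regions, or test "Name" in attribute_keys(handle) for tests/test_files/test.gff. GTF files typically use keys such as gene_name rather than Name, in which case --annotation-name-field has to be set accordingly.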

Output files

  • Custom plain-text format with one output file per region. The output of the command in section Example is shown below:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>	test-mir
....>>>>>>>>>>>>>>>>>>>>>>.....................................................	test-mir-5p
.......................................................>>>>>>>>>>>>>>>>>>>>>...	test-mir-3p
ATCTCAGCACTTTGAGAGGCCAAAGTGGATGGATCACTTGAGGCCAGGAGTTCAAGACCAGCCTGGCCAACAAGGTGAA	test_ref:3618-3696:+
ACCATGAGGTAGTAGGTTGTATAGTT.....................................................	1
..CATGAGGTAGTAGGTTGTATAGTT.....................................................	10
..GA-GAGGTAGTAGGTTGTATAGTT.....................................................	2
...A-GAGGTAGTAGGTTGTATAGTT.....................................................	19
....TGAGGTAGTAGGTTGTATAGTT.....................................................	17
.....GAGGTAGTAGGTTGTATAGTT.....................................................	33
......AGGTAGTAGGTTGTATAGTT.....................................................	9
......AGGTAGTAGGTTGTATAGTTT....................................................	2
.......GGTAGTAGGTTGTATAGTT.....................................................	7
..................................................GATAACTATACAATCTACTGTCTT.....	1
.....................................................AACTATACAATCTACT..........	1
.......................................................CTATACAATCTACTGTCTTTCT..	28
.......................................................CTATACAATCTACTGTCTTTC-T.	22
.......................................................CTATACAATCTACTGTCTTTCC..	19
.......................................................CTATACAATCTACTGTCTTTC...	12
.......................................................CTATACAATCTACTGTCTTTCTT.	2
.......................................................CTATACAATCTACTGTC.......	1
.......................................................CTATACAATCTACTGTCTT.....	1
.......................................................CTATACAATCTACTGTCTTTCG..	1
........................................................TATACAATCTACTGTCTTTCT..	4
........................................................TATACAATCTACTGTCTTTC-T.	4
........................................................TATACAATCTACTGTCTTTC...	2
........................................................TATACAATCTACTGTCTTTCC..	1
........................................................TATACAATCTACTGTCTTTCCT.	1
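Judging from the example above (this is not a documented specification), each line consists of a track as wide as the query region, a tab, and a label: annotation rows carry a feature name, the reference row carries a chrom:start-end:strand label, and alignment rows carry the number of reads with that alignment. Annotation and reference rows presumably appear only when --annotations and --reference are supplied. The Python sketch below splits an output file along these lines; parse_pileup and Pileup are hypothetical names, and treating all-digit labels as read counts is a heuristic.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Assumed shape of the reference-row label, e.g. "test_ref:3618-3696:+".
REFERENCE_LABEL = re.compile(r"^[^:\t]+:\d+-\d+:[-+.]$")


@dataclass
class Pileup:
    annotations: List[Tuple[str, str]] = field(default_factory=list)  # (track, feature name)
    reference: List[Tuple[str, str]] = field(default_factory=list)    # (sequence, region label)
    alignments: List[Tuple[str, int]] = field(default_factory=list)   # (alignment, read count)


def parse_pileup(lines: Iterable[str]) -> Pileup:
    """Sort the lines of a pileup file into annotation, reference and alignment rows."""
    pileup = Pileup()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        track, label = line.rsplit("\t", 1)  # the label is the last tab-separated field
        if label.isdigit():
            pileup.alignments.append((track, int(label)))
        elif REFERENCE_LABEL.match(label):
            pileup.reference.append((track, label))
        else:
            pileup.annotations.append((track, label))
    return pileup
```

For example, passing an open handle to test.test-mir.min.1.pileup.tab and summing the counts in pileup.alignments gives the number of reads shown in the pileup.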

Example

A set of test files is available in directory tests/. From the repository root directory, with all dependencies installed and available, you can run a test with the following command:

ascii_alignment_pileup.R \
  --verbose \
  --reference="tests/test_files/test.fa.gz" \
  --annotations="tests/test_files/test.gff" \
  --output-directory="$PWD" \
  "tests/test_files/test.bed" \
  "tests/test_files/test.bam"

Note that if you build a Docker image from the provided Dockerfile or pull one of the prebuilt images (see section Installation), the test files are included in the image, so you can also run the test in a container. To do so, start a container with:

docker run --rm -it <IMAGE_ID> /bin/bash

Then run the test command above.

In both cases, a successful test run with the above command will create file test.test-mir.min.1.pileup.tab, with MD5 sum 6b5a66981bd83329219002897be393a6, in the current working directory.
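To compare the result with the expected checksum without depending on a particular command-line tool (md5sum on most Linux systems, md5 on macOS), you could use a small standard-library Python sketch such as the following. The helper name file_md5 is hypothetical; the file name and checksum are the ones given above.

```python
import hashlib
import os

EXPECTED_MD5 = "6b5a66981bd83329219002897be393a6"  # expected checksum from this README
OUTPUT_FILE = "test.test-mir.min.1.pileup.tab"


def file_md5(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if not os.path.exists(OUTPUT_FILE):
        print(f"{OUTPUT_FILE} not found; run the test command first")
    elif file_md5(OUTPUT_FILE) == EXPECTED_MD5:
        print("Test output matches the expected MD5 sum")
    else:
        print("MD5 mismatch:", file_md5(OUTPUT_FILE))
```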

Options

--reference=FILE
        Reference genome sequence in FASTA format. The file *MUST* be compressed
    with BGZIP. If supplied, the reference sequence for the query region(s) will
    be added to the output. Note that on the first run with a specific reference
    genome file, an FAI index is generated, which will take some time.

--annotations=FILE
        Annotation file in GFF/GTF format used to annotate sequences. If
    supplied, features overlapping the query region(s) will be visualized in the
    output. Ensure that the argument to option `--annotation-name-field`
    corresponds to a field in the annotations; otherwise, the script will fail.

--output-directory=DIR
        Output directory. One output file will be created for each region in
    the BED file, and the filenames will be generated from the basenames of
    the supplied BAM file(s) and the name field (4th column) of the BED file.
    [default "."]

--maximum-region-width=INT
        Maximum input region width. Use with care, as wide regions will use
    excessive resources. [default 200]

--do-not-collapse-alignments
        Show alignments of reads with identical sequences individually.

--minimum-count=INT
        Alignments of reads with fewer copies than the specified number will
    not be printed. This option is ignored if `--do-not-collapse-alignments`
    is set. [default 1]

--annotation-name-field=STR
        Annotation field used to populate the `name` column in the output.
    [default "Name"]

--padding-character=CHAR
        Character used for padding alignments. [default "."]

--indel-character=CHAR
        Character to denote insertions and deletions in alignments.
    [default "-"]

-h, --help
        Show this information and die.

-v, --verbose
        Print log messages to STDOUT.
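How --do-not-collapse-alignments, --minimum-count, --padding-character and --indel-character shape the alignment rows can be illustrated with the following Python sketch. It is a simplified model, not the script's R implementation: it assumes each alignment is already reduced to a 0-based offset into the region plus an aligned string in which indels are written with the indel character, collapses identical (offset, string) pairs into one row with a copy count, drops rows with fewer copies than min_count, and pads each row to the region width. Ordering by offset and then by decreasing count is an assumption that appears consistent with the example in section Output files. All names and reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def render_row(offset: int, aligned: str, width: int, pad: str = ".") -> str:
    """Pad an aligned read on both sides so that the row spans the whole region."""
    right = width - offset - len(aligned)
    if offset < 0 or right < 0:
        raise ValueError("alignment does not fit into the region")
    return pad * offset + aligned + pad * right


def collapse_and_render(
    alignments: Iterable[Tuple[int, str]],
    width: int,
    min_count: int = 1,
    pad: str = ".",
) -> List[str]:
    """Collapse identical (offset, aligned string) pairs into 'row<TAB>count' lines."""
    counts = Counter(alignments)
    # Rows with fewer copies than min_count are dropped (cf. --minimum-count).
    kept = [(key, n) for key, n in counts.items() if n >= min_count]
    # Assumed ordering: by offset, then by decreasing copy number.
    kept.sort(key=lambda item: (item[0][0], -item[1]))
    return [f"{render_row(off, seq, width, pad)}\t{n}" for (off, seq), n in kept]


# Hypothetical reads in a 10-nt region; "-" marks a deletion (cf. --indel-character).
reads = [(2, "CA-G"), (2, "CA-G"), (0, "AC"), (5, "TT")]
for row in collapse_and_render(reads, width=10):
    print(row)
```

With these reads, the duplicated read is printed once with count 2, between the two single-copy reads; raising min_count to 2 leaves only that row.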

Creating a BGZIP-compressed reference

To create a BGZIP-compressed copy of your reference file in FASTA format, as required by option --reference, you will need the bgzip utility that comes with the HTSlib suite.

Supposing you have HTSlib installed and have a reference file test.fa in your current working directory, you can create a BGZIP-compressed copy of it with the following command:

bgzip < test.fa > test.fa.gz

To compress the file in place instead, replacing test.fa with test.fa.gz, run:

bgzip test.fa

Instead of installing HTSlib, you can also create your BGZIP-compressed copy with a prebuilt Docker image, e.g., one from BioContainers.

For example, when using container image quay.io/biocontainers/htslib:1.15.1--h9753748_0, and again assuming that you want to compress file test.fa in your current working directory, you can run the following command:

docker run \
  --rm \
  -it \
  -v "$PWD":/data \
  quay.io/biocontainers/htslib:1.15.1--h9753748_0 \
  bash -c 'bgzip < /data/test.fa > /data/test.fa.gz'

You can find other BioContainers-built HTSlib Docker images at: https://quay.io/repository/biocontainers/htslib?tab=tags
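A reference compressed with plain gzip instead of bgzip is an easy mistake to make, since both produce .gz files. BGZF, the format written by bgzip, is gzip-compatible, but each block header carries an extra subfield tagged BC (see the SAM/BAM format specification). The following Python sketch uses that to tell the two apart. It is a simplified heuristic: it inspects only the first block header and assumes the BC subfield is the first extra subfield, as HTSlib writes it. The function names and the script name check_bgzf.py are hypothetical.

```python
import sys


def is_bgzf_header(header: bytes) -> bool:
    """Return True if the bytes start like an HTSlib-written BGZF block header."""
    return (
        len(header) >= 18
        and header[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate method, FEXTRA flag
        and header[12:14] == b"BC"             # subfield identifiers SI1='B', SI2='C'
        and header[14:16] == b"\x02\x00"       # SLEN = 2 (little-endian)
    )


def looks_like_bgzf(path: str) -> bool:
    """Read the first 18 bytes of a file and test them with is_bgzf_header."""
    with open(path, "rb") as handle:
        return is_bgzf_header(handle.read(18))


if __name__ == "__main__":
    # Example (hypothetical script name): python check_bgzf.py test.fa.gz
    for path in sys.argv[1:]:
        if looks_like_bgzf(path):
            print(f"{path}: looks like BGZF")
        else:
            print(f"{path}: not BGZF; decompress and recompress with bgzip")
```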

Contact

Email: zavolab-biozentrum@unibas.ch

© 2019 Zavolab, Biozentrum, University of Basel