ASCII-style alignment pileup
Description
Generates an ASCII-style pileup of read alignments in one or more BAM files against one or more regions specified in a BED file.
Usage
ascii_alignment_pileup.R [-hv] [OPTIONS] bed bam [bam2 ...]
Requirements
- R 3.6.0 / 4.1.3 / 4.2.1
- optparse 1.6.2 / 1.7.1
- rtracklayer 1.44.0 / 1.54.0 / 1.56.1
The script was successfully tested with the indicated versions. Other versions may work as well, but have not been tested.
Installation
The easiest way to install the script is via Conda. If you have Conda installed, all you need to do is:
conda env create -f environment.yml
conda activate ascii-alignment-pileup
Alternatively, you can build a container image with all required software from the provided Dockerfile with `docker build .`. You can also pull a prebuilt Docker image from https://hub.docker.com/repository/docker/zavolab/ascii-alignment-pileup.
Input files
- BED file; the score column is ignored, so it can contain arbitrary values
- BAM file(s)
- Optional: FASTA file compressed with bgzip (see section Creating a BGZIP-compressed reference)
- Optional: GFF/GTF/GFF3 file
You can have a look at the test input files in tests/test_files to see examples of each file type. For reference, the uncompressed counterparts of the BAM and FASTA files (test.sam and test.fa, respectively) are also provided.
Output files
- Custom plain-text pileup format, one file per region. The following output was generated by the test run described in section Example:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> test-mir
....>>>>>>>>>>>>>>>>>>>>>>..................................................... test-mir-5p
.......................................................>>>>>>>>>>>>>>>>>>>>>... test-mir-3p
ATCTCAGCACTTTGAGAGGCCAAAGTGGATGGATCACTTGAGGCCAGGAGTTCAAGACCAGCCTGGCCAACAAGGTGAA test_ref:3618-3696:+
ACCATGAGGTAGTAGGTTGTATAGTT..................................................... 1
..CATGAGGTAGTAGGTTGTATAGTT..................................................... 10
..GA-GAGGTAGTAGGTTGTATAGTT..................................................... 2
...A-GAGGTAGTAGGTTGTATAGTT..................................................... 19
....TGAGGTAGTAGGTTGTATAGTT..................................................... 17
.....GAGGTAGTAGGTTGTATAGTT..................................................... 33
......AGGTAGTAGGTTGTATAGTT..................................................... 9
......AGGTAGTAGGTTGTATAGTTT.................................................... 2
.......GGTAGTAGGTTGTATAGTT..................................................... 7
..................................................GATAACTATACAATCTACTGTCTT..... 1
.....................................................AACTATACAATCTACT.......... 1
.......................................................CTATACAATCTACTGTCTTTCT.. 28
.......................................................CTATACAATCTACTGTCTTTC-T. 22
.......................................................CTATACAATCTACTGTCTTTCC.. 19
.......................................................CTATACAATCTACTGTCTTTC... 12
.......................................................CTATACAATCTACTGTCTTTCTT. 2
.......................................................CTATACAATCTACTGTC....... 1
.......................................................CTATACAATCTACTGTCTT..... 1
.......................................................CTATACAATCTACTGTCTTTCG.. 1
........................................................TATACAATCTACTGTCTTTCT.. 4
........................................................TATACAATCTACTGTCTTTC-T. 4
........................................................TATACAATCTACTGTCTTTC... 2
........................................................TATACAATCTACTGTCTTTCC.. 1
........................................................TATACAATCTACTGTCTTTCCT. 1
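Judging from the sample above, the output contains annotation rows (feature extents drawn with `>` characters), one reference row labelled with the region, and one row per collapsed alignment followed by its read count. As an illustration of how such a file could be post-processed, the following minimal Python sketch sums the counts of alignment rows that overlap each named annotation feature. It assumes that every row consists of a track and a label separated by whitespace (the rendering above shows a space; the .tab extension suggests the file may use a tab, which the sketch also accepts), that the default padding character `.` was used, and that alignment rows are exactly those with an integer label. The function names are hypothetical and not part of the script.

```python
PADDING = "."  # default value of --padding-character


def parse_row(line):
    """Split a pileup row into (track, label) at its last whitespace run."""
    track, label = line.rstrip("\n").rsplit(None, 1)
    return track, label


def span(track, padding=PADDING):
    """Return the half-open interval [start, end) of non-padding characters."""
    start = len(track) - len(track.lstrip(padding))
    end = len(track.rstrip(padding))
    return start, end


def counts_per_feature(lines, features, padding=PADDING):
    """Sum alignment counts overlapping each requested annotation feature."""
    rows = [parse_row(line) for line in lines if line.strip()]
    # Feature rows are identified by their label (e.g. "test-mir-5p").
    intervals = {
        label: span(track, padding) for track, label in rows if label in features
    }
    totals = {name: 0 for name in intervals}
    for track, label in rows:
        if not label.isdigit():
            continue  # annotation or reference row, not an alignment
        start, end = span(track, padding)
        for name, (feature_start, feature_end) in intervals.items():
            # Half-open intervals overlap if each starts before the other ends.
            if start < feature_end and feature_start < end:
                totals[name] += int(label)
    return totals
```

For instance, `counts_per_feature(open("test.test-mir.min.1.pileup.tab"), {"test-mir-5p", "test-mir-3p"})` would report per-arm totals for the test output, provided its layout matches these assumptions. A read overlapping both features is counted for each of them.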
Example
A set of test files is available in directory tests/. If you are in the repository root directory and all dependencies are installed and available, you can run a test with the following command:
ascii_alignment_pileup.R \
--verbose \
--reference="tests/test_files/test.fa.gz" \
--annotations="tests/test_files/test.gff" \
--output-directory="$PWD" \
"tests/test_files/test.bed" \
"tests/test_files/test.bam"
Note that if you build a Docker image from the provided Dockerfile or pull one of the prebuilt images (see section Installation), the test files are included in those images. Therefore, you can also run the test in a container. To do that, start a container with:
docker run --rm -it <IMAGE_ID> /bin/bash
Then run the test command above.
In both cases, a successful test run with the above command will create the file test.test-mir.min.1.pileup.tab in the current working directory, with MD5 sum 6b5a66981bd83329219002897be393a6.
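As a portable alternative to command-line checksum tools, the MD5 sum can also be verified with Python's standard `hashlib` module. The sketch below is only an illustration; the helper names `md5_hex` and `file_md5` are hypothetical.

```python
import hashlib

EXPECTED_MD5 = "6b5a66981bd83329219002897be393a6"  # checksum stated above


def md5_hex(chunks):
    """Return the hexadecimal MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_md5(path, chunk_size=1 << 16):
    """Return the MD5 digest of a file, reading it in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        return md5_hex(iter(lambda: handle.read(chunk_size), b""))
```

After a successful test run, `file_md5("test.test-mir.min.1.pileup.tab") == EXPECTED_MD5` should evaluate to `True`.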
Options
--reference=FILE
Reference genome sequence in FASTA format. The file *MUST* be compressed
with BGZIP. If supplied, the reference sequence for the query region(s) will
be added to the output. Note that on the first run with a specific reference
genome file, an FAI index is generated, which will take some time.
--annotations=FILE
Annotation file in GFF/GTF format used to annotate sequences. If
supplied, features overlapping the query region(s) will be visualized in the
output. Ensure that the argument to option `--annotation-name-field`
corresponds to a field in the annotations; otherwise, the script will fail.
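Because a missing name field makes the script fail, it can be worth checking the annotation file beforehand. The following Python sketch lists feature lines whose attribute column (column 9) lacks a given field, assuming tab-separated lines with GFF3-style (`key=value`) or GTF-style (`key "value"`) attributes. It checks every feature in the file, which is stricter than needed if the script only inspects features overlapping the query regions; the function names are hypothetical.

```python
import re


def attribute_keys(column9):
    """Return attribute names from a GFF3 (key=value) or GTF (key "value") column."""
    keys = set()
    for item in column9.split(";"):
        item = item.strip()
        if item:
            # The key ends at the first "=" (GFF3) or whitespace (GTF).
            keys.add(re.split(r"[=\s]", item, maxsplit=1)[0])
    return keys


def features_missing_field(lines, field="Name"):
    """Return feature lines whose attribute column lacks the given field."""
    missing = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.rstrip("\n").split("\t")
        # Lines without nine columns (e.g. an embedded FASTA section) are ignored.
        if len(columns) >= 9 and field not in attribute_keys(columns[8]):
            missing.append(line)
    return missing
```

For example, `features_missing_field(open("tests/test_files/test.gff"))` returns an empty list if every feature in the test annotations carries a `Name` attribute.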
--output-directory=DIR
Output directory. One output file will be created for each region in the
BED file, and the filenames will be generated from the basenames of the
supplied BAM file(s) and the name field (4th column) of the BED file.
[default "."]
--maximum-region-width=INT
Maximum input region width. Use with care, as wide regions will use
excessive resources. [default 200]
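Region widths follow from the BED coordinates, which are 0-based and half-open, so a region's width is `end - start`. For example, a BED interval with start 3617 and end 3696 covers 79 bases, matching the 79-character reference row labelled test_ref:3618-3696:+ in the sample output if that label uses 1-based, inclusive coordinates (an assumption). The Python sketch below flags regions wider than a given limit before a run, assuming a region exactly at the limit is accepted; how the script itself treats wider regions is not described here, and the helper names are hypothetical.

```python
def bed_regions(lines):
    """Yield (name, chrom, start, end, strand) from BED lines (0-based, half-open)."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else "."
        strand = fields[5] if len(fields) > 5 else "."
        yield name, chrom, start, end, strand


def region_label(chrom, start, end, strand):
    """Format a BED interval as chrom:start-end:strand, 1-based and inclusive."""
    return f"{chrom}:{start + 1}-{end}:{strand}"


def too_wide(lines, maximum_width=200):
    """Return (name, width) for regions wider than maximum_width (default 200)."""
    return [
        (name, end - start)
        for name, _chrom, start, end, _strand in bed_regions(lines)
        if end - start > maximum_width
    ]
```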
--do-not-collapse-alignments
Show alignments of reads with identical sequences individually.
--minimum-count=INT
Alignments of reads with fewer copies than the specified number will not
be printed. This option is ignored if `--do-not-collapse-alignments` is
set. [default 1]
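The collapsing and filtering controlled by `--do-not-collapse-alignments` and `--minimum-count` can be illustrated with a short Python sketch. It represents each alignment as a hypothetical (offset, aligned sequence) pair, counts identical pairs, and keeps those with at least `minimum_count` copies. Rows are ordered by offset and then by decreasing count, which is the order visible in the sample output; ties are broken alphabetically here only to make the result deterministic, as the script's own tie order is not documented. The function name is hypothetical.

```python
from collections import Counter


def collapse(alignments, minimum_count=1):
    """Collapse identical (offset, aligned sequence) pairs and drop rare ones.

    Returns (offset, sequence, count) tuples sorted by offset, then by
    decreasing count, then alphabetically by sequence.
    """
    counts = Counter(alignments)
    kept = [
        (offset, sequence, count)
        for (offset, sequence), count in counts.items()
        if count >= minimum_count  # "fewer copies" than the minimum are dropped
    ]
    return sorted(kept, key=lambda row: (row[0], -row[2], row[1]))
```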
--annotation-name-field=STR
Annotation field used to populate the `name` column in the output.
[default "Name"]
--padding-character=CHAR
Character used for padding alignments. [default "."]
--indel-character=CHAR
Character to denote insertions and deletions in alignments.
[default "-"]
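To illustrate how the padding and indel characters shape each output row, the following Python sketch places an aligned read on a row of fixed width. It assumes that the read is given as a 0-based offset into the region plus a gapped sequence in which deletions are marked with `-` (as in the sample output); reads extending beyond the region are simply clipped in this sketch, which may differ from the script's behavior. The function name and parameters are hypothetical.

```python
def render_row(width, offset, aligned, padding=".", indel="-", gap="-"):
    """Place an aligned read on a row of `width` characters.

    `offset` is the 0-based position of the first aligned base within the
    region, and `gap` marks deletions in `aligned`; those are redrawn with
    the chosen `indel` character. Bases outside the region are clipped.
    """
    row = aligned.replace(gap, indel)
    if offset < 0:
        # Drop bases that lie before the start of the region.
        row, offset = row[-offset:], 0
    row = (padding * offset + row)[:width]
    # Pad the remainder of the row up to the full region width.
    return row + padding * (width - len(row))
```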
-h, --help
Show this information and die.
-v, --verbose
Print log messages to STDOUT.
Creating a BGZIP-compressed reference
To create a BGZIP-compressed copy of your reference file in FASTA format, as required by option --reference, you will need the bgzip utility that comes with the HTSlib suite.
Assuming you have HTSlib installed and a reference file test.fa in your current working directory, you can create a BGZIP-compressed copy of it with the following command:
bgzip < test.fa > test.fa.gz
To compress the file in place instead, replacing test.fa with test.fa.gz so that only the compressed copy is kept, run:
bgzip test.fa
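A file compressed with plain gzip is not BGZF-compressed, even though both usually carry the .gz extension. BGZF blocks are gzip members with the FEXTRA flag set and a `BC` subfield in the extra field, as described in the SAM/BAM format specification. The following Python sketch checks the first block header of a file for these markers; it only inspects the first extra subfield, which is where bgzip writes it, so it is a quick heuristic rather than a full validator. The function names are hypothetical.

```python
import struct

BGZF_MAGIC = b"\x1f\x8b\x08\x04"  # gzip ID1, ID2, CM=deflate, FLG=FEXTRA


def is_bgzf_header(header):
    """Return True if `header` starts like a BGZF block (gzip with a BC subfield)."""
    if len(header) < 18 or not header.startswith(BGZF_MAGIC):
        return False
    (xlen,) = struct.unpack("<H", header[10:12])  # length of the extra field
    # The BC subfield: SI1='B', SI2='C', SLEN=2 (little-endian uint16).
    return xlen >= 6 and header[12:14] == b"BC" and header[14:16] == b"\x02\x00"


def is_bgzf(path):
    """Check whether the first block of a file looks BGZF-compressed."""
    with open(path, "rb") as handle:
        return is_bgzf_header(handle.read(18))
```

For example, `is_bgzf("test.fa.gz")` should return `True` after running either bgzip command above, whereas a file produced by gzip returns `False`.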
Instead of installing HTSlib, you can also use a prebuilt Docker image, e.g., one from BioContainers, to create your BGZIP-compressed copy. For example, with container image quay.io/biocontainers/htslib:1.15.1--h9753748_0, and again assuming that you want to compress file test.fa in your current working directory, you can run the following command:
docker run \
--rm \
-it \
-v "$PWD":/data \
quay.io/biocontainers/htslib:1.15.1--h9753748_0 \
bash -c 'bgzip < /data/test.fa > /data/test.fa.gz'
You can find other BioContainers-built HTSlib Docker images at: https://quay.io/repository/biocontainers/htslib?tab=tags
Contact
Email: zavolab-biozentrum@unibas.ch
© 2019 Zavolab, Biozentrum, University of Basel