Studer Gabriel authored
|project| Actions
A pure command line interface of |project| is provided by actions.
You can execute ``pm help`` for a list of possible actions and, for every
action, you can type ``pm <ACTION> -h`` to get a description of its usage.
Here we list the most prominent actions with simple examples.
Building models
You can run a full protein homology modelling pipeline from the command line with
$ pm build-model [-h] (-f <FILE> | -c <FILE> | -j <OBJECT>|<FILE>)
(-p <FILE> | -e <FILE>) [-s <FILE>] [-o <FILENAME>]
Example usage:
$ pm build-model -f aln.fasta -p tpl.pdb
This reads a target-template alignment from :file:`aln.fasta` and a matching
structure from :file:`tpl.pdb` and produces a gap-less model which is stored as
:file:`model.pdb`. The output filename can be controlled with the ``-o`` flag.
Target-template alignments can be provided in FASTA (``-f``), CLUSTAL (``-c``)
or as JSON files/objects (``-j``). Files can be plain or gzipped.
At least one alignment must be given and you cannot mix file formats.
Multiple alignment files can be given and target chains will be appended in the
given order. The chains of the target model are named with default chain names
(A, B, C, ..., see :meth:`~promod3.modelling.BuildRawModel`).
Notes on the input formats:
- Leading/trailing whitespace of sequence names will always be deleted
- FASTA input example:

    >target
    HGFHVHEFGDNTNGCMSSGPHFNPYGKEHGAPVDENRHLG
    >2jlp-1.A|55
    RAIHVHQFGDLSQGCESTGPHYNPLAVPH------PQHPG

  The target sequence is the one named "trg" or "target"; if no sequence has such a name, the first sequence is used. Template sequence names can encode an identifier for the chain to attach to it and optionally an offset (here: 55, see below for details). Leading whitespace of FASTA headers will be deleted
- CLUSTAL input follows the same logic as FASTA input
- JSON input: filenames are not allowed to start with '{'. JSON objects contain an entry with key 'alignmentlist'. That in turn is an array of objects with keys 'target' and 'template'. Those in turn are objects with keys 'name' (string id. for sequence), 'seqres' (string for aligned sequence) and optionally for templates 'offset' (number of residues to skip in structure file attached to it). Example:

    {"alignmentlist": [ {
      "target": {
        "name": "mytrg",
        "seqres": "HGFHVHEFGDNTNGCMSSGPHFNPYGKEHGAPVDENRHLG"
      },
      "template": {
        "name": "2jlp-1.A",
        "offset": 55,
        "seqres": "RAIHVHQFGDLSQGCESTGPHYNPLAVPH------PQHPG"
      }
    } ] }
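As a rough illustration of the JSON layout described above, the following Python sketch assembles such an object with the standard ``json`` module. The helper ``make_alignment_json`` is hypothetical and not part of |project|; only the key names ('alignmentlist', 'target', 'template', 'name', 'seqres', 'offset') are taken from the description.

```python
import json


def make_alignment_json(trg_name, trg_seqres, tpl_name, tpl_seqres,
                        tpl_offset=None):
    """Return a JSON string holding one target-template alignment.

    Hypothetical helper for illustration; only the key names follow the
    format described in the text.
    """
    # Sanity check: both rows of an alignment have the same length.
    if len(trg_seqres) != len(tpl_seqres):
        raise ValueError("aligned sequences must have the same length")
    template = {"name": tpl_name, "seqres": tpl_seqres}
    if tpl_offset is not None:
        # 'offset' is optional and only used for templates.
        template["offset"] = tpl_offset
    alignment = {
        "target": {"name": trg_name, "seqres": trg_seqres},
        "template": template,
    }
    return json.dumps({"alignmentlist": [alignment]})


aln_json = make_alignment_json(
    "mytrg", "HGFHVHEFGDNTNGCMSSGPHFNPYGKEHGAPVDENRHLG",
    "2jlp-1.A", "RAIHVHQFGDLSQGCESTGPHYNPLAVPH------PQHPG",
    tpl_offset=55,
)
```

The resulting string starts with '{', which is presumably how a JSON object given to ``-j`` is told apart from a filename.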
Structures can be provided in PDB (``-p``) or in any format readable by the
:func:`ost.io.LoadEntity` method (``-e``). In the latter case, the format is
chosen by file ending. Recognized file extensions: ``.ent``, ``.pdb``,
``.ent.gz``, ``.pdb.gz``, ``.cif``, ``.cif.gz``. At least one structure must be
given and you cannot mix file formats. Multiple structures can be given and each
structure may have multiple chains, but care must be taken to identify which
chain to attach to which template sequence. Chains for each sequence are
identified based on the sequence name of the templates in the alignments. Valid
sequence names are:
- anything, if only one structure with one chain
- "<FILE>.<CHAIN>", where <FILE> is the base file name of an imported structure with no extensions and <CHAIN> is the identifier of the chain in the imported structure.
- "<FILE>" if only one chain in file
- "<CHAIN>" if only one file imported
- "<CHAINID>|<OFFSET>", where <CHAINID> identifies the chain as above and <OFFSET> is the number of residues to skip for that chain to reach the first residue in the aligned sequence. Leading/trailing whitespace of <CHAINID> and <OFFSET> is ignored.
Example: ``... -p data/2jlp.pdb.gz``, where the PDB file has chains ``A``,
``B``, ``C`` and the template sequence is named ``2jlp.A|55``.
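The naming rules above can be sketched in a few lines of Python. This is not the parser used by the action; the function names are hypothetical, the default offset of 0 for names without ``|`` is an assumption, and splitting ``<FILE>.<CHAIN>`` at the last dot assumes that chain names contain no dot.

```python
def split_template_name(seq_name):
    """Split '<CHAINID>|<OFFSET>' into (chain_id, offset).

    Illustrative only. Whitespace around both parts is ignored and the
    offset defaults to 0 if no '|' is present (an assumption).
    """
    chain_id, sep, offset = seq_name.partition("|")
    chain_id = chain_id.strip()
    if not sep:
        return chain_id, 0
    return chain_id, int(offset.strip())


def split_chain_id(chain_id):
    """Split '<FILE>.<CHAIN>' at the last dot into (file, chain).

    Returns (chain_id, None) if there is no dot; the id is then a bare
    <FILE> or <CHAIN> and must be resolved against the loaded structures.
    """
    file_name, sep, chain = chain_id.rpartition(".")
    if not sep:
        return chain_id, None
    return file_name, chain


chain_id, offset = split_template_name("2jlp.A|55")
file_name, chain = split_chain_id(chain_id)
```

For the example name ``2jlp.A|55`` this separates the offset 55 from the chain identifier ``2jlp.A``, which in turn splits into the file base name ``2jlp`` and the chain ``A``.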
You can optionally specify sequence profiles to be added (``-s``) and linked
to the corresponding target sequences. This has an impact on loop scoring with
the database approach.
The profiles can be provided as plain files or gzipped. The following file
extensions are understood: .hhm, .hhm.gz, .pssm, .pssm.gz.
- The profiles are mapped based on exact matches towards the gapless target sequences from the provided alignment files, i.e. one profile is mapped to several chains in case of homo-oligomers
- Every profile must have a unique sequence to avoid ambiguities
- All or nothing - You cannot provide profiles for only a subset of target sequences
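The three rules above amount to a lookup by gapless sequence. The following Python sketch shows the idea with plain strings standing in for the alignment and profile objects; ``map_profiles`` is a hypothetical name and this is not the code used by the action.

```python
def map_profiles(target_seqres, profile_seqs):
    """Return, for each aligned target sequence, the index of its profile.

    target_seqres: aligned target sequences (may contain '-' gaps)
    profile_seqs:  sequences of the provided profiles
    """
    # Every profile must have a unique sequence to avoid ambiguities.
    if len(set(profile_seqs)) != len(profile_seqs):
        raise ValueError("profiles must have unique sequences")
    index_of = {seq: i for i, seq in enumerate(profile_seqs)}
    mapping = []
    for seqres in target_seqres:
        # Profiles are matched against the gapless target sequence.
        gapless = seqres.replace("-", "")
        if gapless not in index_of:
            # All or nothing: every target sequence needs a profile.
            raise ValueError("no profile for target sequence " + gapless)
        mapping.append(index_of[gapless])
    return mapping


# Homo-dimer: both target chains map to the same (single) profile.
homo_dimer = map_profiles(["HG-FH", "HGFH"], ["HGFH"])
```

Whether profiles that match no target sequence are rejected is not stated above, so the sketch does not check for them.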
Example usage:
$ pm build-model -f aln.fasta -p tpl.pdb -s prof.hhm
Possible exit codes of the action:
- 0: all went well
- 1: an unhandled exception was raised
- 2: arguments cannot be parsed or required arguments are missing
- 3: failed to perform modelling (internal error)
- 4: failed to write results to file
- other non-zero: failure in argument checking (see :class:`promod3.core.pm3argparse.PM3ArgumentParser`)
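When the action is called from a script, the exit codes listed above can be turned into readable messages. The sketch below uses Python's standard ``subprocess`` module; the helper names are hypothetical and ``run_build_model`` assumes that ``pm`` is available on the ``PATH``.

```python
import subprocess

# Meanings of the exit codes as listed above.
EXIT_CODES = {
    0: "all went well",
    1: "an unhandled exception was raised",
    2: "arguments cannot be parsed or required arguments are missing",
    3: "failed to perform modelling (internal error)",
    4: "failed to write results to file",
}


def describe_exit_code(code):
    """Return a short description for an exit code of the action."""
    if code < 0:
        # subprocess reports termination by a signal as a negative value.
        return "terminated by signal %d" % -code
    # Any other non-zero code stems from argument checking.
    return EXIT_CODES.get(code, "failure in argument checking")


def run_build_model(args):
    """Run 'pm build-model' with the given list of arguments.

    Returns the exit code together with its description.
    """
    result = subprocess.run(["pm", "build-model"] + list(args))
    return result.returncode, describe_exit_code(result.returncode)
```

A call such as ``run_build_model(["-f", "aln.fasta", "-p", "tpl.pdb"])`` would then return the exit code along with its description.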
Sidechain Modelling
You can (re-)construct the sidechains in a model from the command line.
$ pm build-sidechains [-h] (-p <FILE> | -e <FILE>) [-o <FILENAME>] [-k]
[-n] [-r]
Example usage:
$ pm build-sidechains -p input.pdb
This reads a structure stored in :file:`input.pdb`, strips all sidechains,
detects and models disulfide bonds and reconstructs all sidechains with the
flexible rotamer model. The result is stored as :file:`out.pdb`.
The output filename can be controlled with the ``-o`` flag.
A structure can be provided in PDB (``-p``) or in any format readable by the
:func:`ost.io.LoadEntity` method (``-e``). In the latter case, the format is
chosen by file ending. Recognized file extensions: ``.ent``, ``.pdb``,
``.ent.gz``, ``.pdb.gz``, ``.cif``, ``.cif.gz``.
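Before passing a file with ``-e``, a script may want to check that its name ends with one of the recognized extensions. The following Python sketch is a hypothetical pre-check rather than part of the action, and it assumes that the comparison is case-sensitive.

```python
# Extensions as listed above.
RECOGNIZED_EXTENSIONS = (".ent", ".pdb", ".ent.gz", ".pdb.gz",
                         ".cif", ".cif.gz")


def has_recognized_extension(filename):
    """Return True if filename ends with one of the listed extensions.

    The check is case-sensitive, which is an assumption.
    """
    return filename.endswith(RECOGNIZED_EXTENSIONS)


ok = has_recognized_extension("data/2jlp.pdb.gz")
```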
Several flags control the modelling behaviour: