
:mod:`~ost.seq` -- Sequences and Alignments

The :mod:`seq` module helps you work with sequence data of various kinds. It has classes for :class:`single sequences <SequenceHandle>`, :class:`lists of sequences <SequenceList>` and :class:`alignments <AlignmentHandle>` of two or more sequences.

Attaching Structures to Sequences

As OpenStructure is a computational structural biology framework, the sequence classes are designed to work together with structural data. Each sequence can have an attached :class:`~mol.EntityView`, which allows fast mapping between residues in the entity view and positions in the sequence.

Sequence Offset

When using sequences and structures together, the start of the structure and the beginning of the sequence often do not coincide. In the following case, the alignment of sequences B and C covers only a part of structure A:

A acefghiklmnpqrstuvwy
B     ghiklm
C     123-45

We would now like to know which residue in protein A is aligned to which residue in sequence C. This is achieved by setting the sequence offset of sequence C to 4, because the alignment starts at residue ``g``, which has index 4 in A when counting from 0. In essence, the sequence offset influences all mapping operations from a position in the sequence to a residue index and vice versa. By default, the sequence offset is 0.
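The offset arithmetic can be sketched in plain Python. This is only an illustration of the idea, not OpenStructure code: the helper ``residue_index`` is hypothetical, and returning ``None`` for a gap position is an assumption of this sketch. The same offset of 4 is applied to sequence B, which makes it easy to check the result against structure A.

```python
# Pure-Python illustration of the offset arithmetic; not OpenStructure code.
GAP = "-"

def residue_index(aligned_seq, pos, offset=0):
    """Map a position in a gapped sequence to a residue index in the structure.

    Returns None when the position holds a gap (an assumption of this sketch).
    """
    if aligned_seq[pos] == GAP:
        return None
    # Count the non-gap characters before pos, then shift by the offset.
    return offset + sum(1 for ch in aligned_seq[:pos] if ch != GAP)

structure_a = "acefghiklmnpqrstuvwy"
seq_b = "ghiklm"
seq_c = "123-45"

# With an offset of 4, every position of B lands on the matching residue of A.
for pos, ch in enumerate(seq_b):
    assert structure_a[residue_index(seq_b, pos, offset=4)] == ch

print(residue_index(seq_c, 0, offset=4))  # first residue of C -> 4
print(residue_index(seq_c, 3, offset=4))  # gap position -> None
```

With the default offset of 0, the same positions would map to residue indices 0 to 5, that is, to the start of A rather than to the aligned region.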

Loading and Saving Sequences and Alignments

The :mod:`io` module supports input and output of common sequence formats. Single sequences are loaded from disk with :func:`io.LoadSequence`, alignments with :func:`io.LoadAlignment`, and lists of sequences with :func:`io.LoadSequenceList`. In addition to the file-based input methods, sequences can also be loaded from a string:

from ost import io

seq_string='''>sequence
abcdefghiklmnop'''
s=io.LoadSequenceFromString(seq_string, 'fasta')
print(s.name, s) # will print "sequence abcdefghiklmnop"

Note that in this case, specifying the format is mandatory.
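To make clear what the ``'fasta'`` format argument refers to, the following pure-Python sketch splits a single FASTA record into its name and sequence. The helper ``parse_fasta_record`` is hypothetical and not part of OpenStructure. It only illustrates the layout of the format and makes no claim about how :func:`io.LoadSequenceFromString` is implemented.

```python
def parse_fasta_record(text):
    """Split a single-record FASTA string into (name, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with a '>' header line")
    # The header line carries the name, without the leading '>'.
    name = lines[0][1:].strip()
    # Sequence data may be wrapped over several lines; join them.
    sequence = "".join(lines[1:])
    return name, sequence

seq_string = """>sequence
abcdefghiklmnop"""
name, sequence = parse_fasta_record(seq_string)
print(name, sequence)  # sequence abcdefghiklmnop
```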

The SequenceHandle

Represents a sequence. New instances are created with :func:`CreateSequence`.

The SequenceList

Represents a list of sequences. The class provides a row-based interface. New instances are created with :func:`CreateSequenceList`.

The AlignmentHandle

The :class:`AlignmentHandle` represents a list of aligned sequences. In contrast to :class:`SequenceList`, an alignment requires all sequences to be of the same length. New instances of alignments are created with :func:`CreateAlignment` or :func:`AlignmentFromSequenceList`.

Typically, sequence alignments are used column-based, i.e. by looking at the aligned columns of the alignment. To get a row-based (sequence) view of the alignment, use :meth:`GetSequenceList()`.

All functions that operate on an alignment will again produce a valid alignment. This means that it is not possible to change the length of one sequence without adjusting the other sequences, too.
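The equal-length requirement can be pictured with a small pure-Python sketch. The helper ``insert_gap_column`` is hypothetical and does not correspond to an :class:`AlignmentHandle` method; it only shows that an edit is applied to every row at once, so all rows keep the same length.

```python
GAP = "-"

def insert_gap_column(rows, pos):
    """Return new rows with a gap inserted at column pos in every sequence."""
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all sequences of an alignment must have equal length")
    # The same edit is applied to each row, so the lengths stay in sync.
    return [row[:pos] + GAP + row[pos:] for row in rows]

rows = ["ghiklm", "123-45"]
longer = insert_gap_column(rows, 2)
print(longer)  # ['gh-iklm', '12-3-45']
```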

The following example shows how to iterate over the columns and sequences of an alignment:

from ost import io

aln=io.LoadAlignment('aln.fasta')
# iterate over the columns
for col in aln:
  print(col)

# iterate over the sequences
for s in aln.sequences:
  print(s)
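The relation between the column-based and the row-based view can be illustrated without OpenStructure. The sketch below stores the alignment as plain strings, which is an assumption made for illustration only, and uses the built-in ``zip`` to switch between the two views.

```python
rows = ["ghiklm", "123-45"]

# Column-based view: one tuple per alignment column.
columns = list(zip(*rows))
print(columns[0])  # ('g', '1')
print(columns[3])  # ('k', '-')

# Row-based view: transposing the columns recovers the sequences.
restored = ["".join(chars) for chars in zip(*columns)]
assert restored == rows
```

Transposing works only because all rows have the same length, which is exactly what an alignment guarantees.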

Note

Several of these methods just forward calls to the sequence. For more detailed information, have a look at the :class:`SequenceHandle` documentation.