Commit afb30763 authored by Gerardo Tauriello

Doc update for seq.ProfileHandle class and io.LoadSequenceProfile

parent 22f082dd
......@@ -96,7 +96,7 @@ Varia
**Datasets:** :doc:`tabular data <table>`
**Supported File Formats:** :doc:`structure formats<io/structure_formats>` | :doc:`sequence formats <io/sequence_formats>` | :doc:`image formats <io/image_formats>`
**Supported File Formats:** :doc:`structure formats<io/structure_formats>` | :doc:`sequence formats <io/sequence_formats>` | :doc:`sequence profile formats <io/sequence_profile_formats>` | :doc:`image formats <io/image_formats>`
**Users** :doc:`Reporting a problem <users>`
......
......@@ -157,16 +157,49 @@ Loading sequence or alignment files
.. function:: LoadSequenceList(filename, format='auto')
For a desription of how to use :func:`LoadSequenceList` please refer to
For a description of how to use :func:`LoadSequenceList` please refer to
:func:`LoadSequence`. For a list of file formats supported by
:func:`LoadSequenceList` see :doc:`sequence_formats`.
.. function:: LoadAlignment(filename, format='auto')
For a desription of how to use :func:`LoadAlignment` please refer to
For a description of how to use :func:`LoadAlignment` please refer to
:func:`LoadSequence`. For a list of file formats supported by
:func:`LoadAlignment` see :doc:`sequence_formats`.
.. function:: LoadSequenceProfile(filename, format='auto')
Load sequence profile data from disk. If `format` is set to 'auto', the function
guesses the file type from the file extension. Files ending in
'.hhm' (output of HHblits) and '.pssm' (ASCII Table (PSSM) output of PSI-BLAST
as generated with blastpgp and flag -Q) are recognized automatically.
For files with non-standard extensions, the format can be set explicitly
by specifying the `format` parameter.
.. code-block:: python
# recognizes hhm file by file extension
myprof = io.LoadSequenceProfile('myhmm.hhm')
# recognizes pssm file by file extension
myprof = io.LoadSequenceProfile('myprof.pssm')
# to override format
myprof = io.LoadSequenceProfile('myfile', format='hhm')
myprof = io.LoadSequenceProfile('myfile', format='pssm')
For a list of file formats supported by :func:`LoadSequenceProfile` see
:doc:`sequence_profile_formats`.
:rtype: :class:`~ost.seq.ProfileHandle`
:raises: :exc:`~ost.io.IOUnknownFormatException` if the format string supplied
is not recognized or the file format cannot be detected based on the
file extension.
:exc:`~ost.io.IOException` if the import fails due to an erroneous or
nonexistent file.
Saving Sequence Data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
......@@ -247,7 +280,7 @@ Loading Density Maps
# recognizes mrc file by file extension
ent = io.LoadImage('file.mrc')
# it is always possible to explicitely set the image format
# it is always possible to explicitly set the image format
# DAT file explicitly
ent = io.LoadImage('file', Dat())
......
Supported Sequence Profile File Formats
================================================================================
HHblits output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
*Recognized File Extensions*
.hhm, .hhm.gz
*Format Name*
hhm
PSI-BLAST output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ASCII Table (PSSM) output of PSI-BLAST as generated with blastpgp and flag -Q.
*Recognized File Extensions*
.pssm, .pssm.gz
*Format Name*
pssm
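The extension-to-format mapping listed above can be sketched in plain Python. This is only an illustration of the documented behaviour, not the library's implementation: the helper name ``guess_profile_format`` is hypothetical, and raising ``ValueError`` stands in for the exceptions that :func:`LoadSequenceProfile` actually raises.

```python
import os

# Extension -> format name, as listed in the sections above.
_PROFILE_FORMATS = {".hhm": "hhm", ".pssm": "pssm"}


def guess_profile_format(filename, format="auto"):
    """Hypothetical helper: return 'hhm' or 'pssm' for a profile file name."""
    if format != "auto":
        # An explicit format overrides the file extension.
        if format not in _PROFILE_FORMATS.values():
            raise ValueError("unknown format: %s" % format)
        return format
    base = filename
    # A trailing '.gz' is stripped before looking at the extension.
    if base.endswith(".gz"):
        base = base[:-3]
    ext = os.path.splitext(base)[1]
    if ext not in _PROFILE_FORMATS:
        raise ValueError("cannot detect format of %s" % filename)
    return _PROFILE_FORMATS[ext]
```

With this sketch, ``guess_profile_format('myprof.pssm.gz')`` yields ``'pssm'``, while a name without a recognized extension needs an explicit ``format`` argument.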
......@@ -438,17 +438,22 @@ an alignment:
Remove sequence at *index* from the alignment.
Handling Hidden Markov Models
Handling Sequence Profiles
--------------------------------------------------------------------------------
The HMM provides a simple container for hidden markov models in form of
single columns containing amino acid frequencies and transition probabilities.
The :class:`ProfileHandle` provides a simple container for profiles for each
residue. It mainly contains:
.. class:: HMMColumn
- *N* :class:`ProfileColumn` objects (*N* = number of residues in sequence)
which each contains 20 amino acid frequencies
- a :attr:`~ProfileHandle.sequence` (:class:`str`) of length *N*
- a :attr:`~ProfileHandle.null_model` to use for this profile
.. class:: ProfileColumn
.. method:: BLOSUMNullModel()
Static method, that returns a new :class:`HMMColumn` with amino acid
Static method that returns a new :class:`ProfileColumn` with amino acid
frequencies given from the BLOSUM62 substitution matrix.
.. method:: GetFreq(aa)
......@@ -465,75 +470,72 @@ single columns containing amino acid frequencies and transition probabilities.
:type aa: :class:`str`
:type freq: :class:`float`
.. method:: GetTransitionFreq(from,to)
:param from: Current state of HMM (HMM_MATCH, HMM_INSERT or HMM_DELETE)
:param to: Next state
:returns: Frequency of given state transition
.. method:: SetTransitionFreq(from,to,freq)
:param from: Current state of HMM (HMM_MATCH, HMM_INSERT or HMM_DELETE)
:param to: Next state
:param freq: Frequency of transition
.. attribute:: one_letter_code
One-letter code this column is associated with
.. attribute:: entropy
Shannon entropy based on the column's amino acid frequencies
.. class:: HMM
.. method:: Load(filename)
.. class:: ProfileHandle
Static method to load an hmm in the hhm format as it is in use in the HHSuite.
.. method:: __len__()
Returns the length of the sequence for which we have a profile.
:param filename: Name of file to load
:type filename: :class:`str`
:rtype: :class:`int`
.. method:: AddColumn(col)
Appends column in the internal column list.
:param col: Column to add
:type col: :class:`HMMColumn`
:type col: :class:`ProfileColumn`
.. method:: Extract(from,to)
:param from: Col Idx to start from
:param to: End Idx, not included in sub-HMM
:param to: End Idx, not included in sub-ProfileHandle
:type from: :class:`int`
:type to: :class:`int`
:returns: sub-HMM as defined by given indices
:returns: sub-profile as defined by given indices
(:attr:`null_model` is copied)
:rtype: :class:`ProfileHandle`
:raises: :exc:`~exceptions.Error` if *to* <= *from* or
*to* > :meth:`__len__`.
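The index convention of ``Extract`` (start included, end excluded, with the two error conditions listed above) can be illustrated on a plain Python list. This is a sketch under assumptions: ``extract_columns`` is a hypothetical stand-in, ``ValueError`` replaces the library's own exception type, and only the documented checks are implemented.

```python
def extract_columns(columns, start, end):
    """Return columns[start:end], mirroring the documented range checks."""
    # Documented failure cases: end <= start, or end beyond the profile length.
    if end <= start or end > len(columns):
        raise ValueError("invalid range [%d, %d)" % (start, end))
    # Half-open range: the column at index 'end' is not part of the result.
    return columns[start:end]
```

For a five-column profile, ``extract_columns(cols, 1, 3)`` returns the columns at indices 1 and 2, and ``extract_columns(cols, 0, 5)`` returns a full copy.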
.. method:: SetSequence(sequence)
Sets :attr:`sequence`.
.. method:: SetNullModel(null_model)
Sets :attr:`null_model`.
.. attribute:: sequence
Sequence of the columns
Sequence for which we have this profile.
Note: user must enforce consistency between sequence length and number of
profile columns.
.. attribute:: columns
Iterable columns of the HMM
Iterable columns of the profile
.. attribute:: null_model
Null model of the HMM
Null model of the profile
.. attribute:: avg_entropy
Average entropy of all the columns
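As a rough illustration of the ``entropy`` and ``avg_entropy`` attributes, the following sketch computes the Shannon entropy of one column's frequencies and the mean over several columns. The function names are hypothetical, and the natural logarithm is an assumption: the documentation does not state which base the library uses.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy of one column (natural log is an assumption)."""
    # Zero frequencies contribute nothing to the sum.
    return -sum(f * math.log(f) for f in freqs if f > 0.0)


def average_entropy(columns):
    """Mean of the per-column entropies; columns is a list of frequency lists."""
    return sum(shannon_entropy(c) for c in columns) / len(columns)
```

A uniform column over 20 amino acids has the maximum entropy under this definition, and a column with a single non-zero frequency has entropy zero.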
.. class:: HMMDB
.. class:: ProfileDB
A simple database to gather :class:`HMM` objects. It is possible
A simple database to gather :class:`ProfileHandle` objects. It is possible
to save them to disk in a compressed format with limited accuracy
(4 digits for freq values).
(4 digits for each frequency).
.. method:: Save(filename)
......@@ -548,32 +550,27 @@ single columns containing amino acid frequencies and transition probabilities.
:type filename: :class:`str`
:returns: The loaded database
.. method:: AddHMM(name, hmm)
.. method:: AddProfile(name, prof)
:param name: Name of HMM to be added
:param hmm: HMM to be added
:param name: Name of profile to be added
:param prof: Profile to be added
:type name: :class:`str`
:type hmm: :class:`HMM`
:type prof: :class:`ProfileHandle`
:raises: :class:`Exception` when name is longer than 255 characters.
.. method:: GetHMM(name)
.. method:: GetProfile(name)
:param name: Name of HMM to be returned
:param name: Name of profile to be returned
:type name: :class:`str`
:returns: The requested :class:`HMM`
:raises: :class:`Exception` when no :class:`HMM` for **name** exists.
:returns: The requested :class:`ProfileHandle`
:raises: :class:`Exception` when no :class:`ProfileHandle` for **name** exists.
.. method:: Size()
:returns: Number of :class:`HMM` objects in the database
:returns: Number of :class:`ProfileHandle` objects in the database
.. method:: GetNames()
:returns: A nonsorted list of the names of all :class:`HMM` objects in the database
:returns: A nonsorted list of the names of all :class:`ProfileHandle`
objects in the database
......@@ -60,6 +60,8 @@ void export_profile_handle()
.def("__len__",&ProfileHandle::size)
.def("AddColumn", &ProfileHandle::push_back)
.def("Extract", &ProfileHandle::Extract)
.def("SetNullModel", &ProfileHandle::SetNullModel)
.def("SetSequence", &ProfileHandle::SetSequence)
.add_property("null_model", make_function(&ProfileHandle::GetNullModel,
return_value_policy<copy_const_reference>()))
.add_property("columns",
......
......@@ -44,7 +44,7 @@ typedef boost::shared_ptr<ProfileHandle> ProfileHandlePtr;
typedef boost::shared_ptr<ProfileDB> ProfileDBPtr;
typedef std::vector<ProfileColumn> ProfileColumnList;
/// \brief Defines profile of 20 frequencies for one residue.
/// \brief Defines profile of 20 frequencies for one residue.
///
/// Frequencies are identified by the one-letter-code for that amino acid.
/// (possible codes: ACDEFGHIKLMNPQRSTVWY)
......