diff --git a/modules/index.rst b/modules/index.rst index 912fcdaed5a0f07b6843a114ed6da669f23c3207..f664be809700225a527c9c342483d6ed7c7b18e9 100644 --- a/modules/index.rst +++ b/modules/index.rst @@ -96,7 +96,7 @@ Varia **Datasets:** :doc:`tabular data <table>` -**Supported File Formats:** :doc:`structure formats<io/structure_formats>` | :doc:`sequence formats <io/sequence_formats>` | :doc:`image formats <io/image_formats>` +**Supported File Formats:** :doc:`structure formats<io/structure_formats>` | :doc:`sequence formats <io/sequence_formats>` | :doc:`sequence profile formats <io/sequence_profile_formats>` | :doc:`image formats <io/image_formats>` **Users** :doc:`Reporting a problem <users>` diff --git a/modules/io/doc/io.rst b/modules/io/doc/io.rst index 8d6cee55047692afa3ff20b6f1734fb5266c0e43..9b40234d5084fec0f9a8d58d8ab48bba5a0ef926 100644 --- a/modules/io/doc/io.rst +++ b/modules/io/doc/io.rst @@ -157,16 +157,49 @@ Loading sequence or alignment files .. function:: LoadSequenceList(filename, format='auto') - For a desription of how to use :func:`LoadSequenceList` please refer to + For a description of how to use :func:`LoadSequenceList` please refer to :func:`LoadSequence`. For a list of file formats supported by :func:`LoadSequenceList` see :doc:`sequence_formats`. .. function:: LoadAlignment(filename, format='auto') - For a desription of how to use :func:`LoadAlignment` please refer to + For a description of how to use :func:`LoadAlignment` please refer to :func:`LoadSequence`. For a list of file formats supported by :func:`LoadAlignment` see :doc:`sequence_formats`. + +.. function:: LoadSequenceProfile(filename, format='auto') + + Load sequence profile data from disk. If format is set to 'auto', the function + guesses the filetype based on the extension of the file. Files ending in + '.hhm' (output of HHblits) and '.pssm' (ASCII Table (PSSM) output of PSI-BLAST + as generated with blastpgp and flag -Q) will automatically be loaded. 
+ + For files with non-standard extensions, the format can be set explicitly by + specifying the `format` parameter. + + .. code-block:: python + + # recognizes hhm file by file extension + myprof = io.LoadSequenceProfile('myhmm.hhm') + # recognizes pssm file by file extension + myprof = io.LoadSequenceProfile('myprof.pssm') + + # to override format + myprof = io.LoadSequenceProfile('myfile', format='hhm') + myprof = io.LoadSequenceProfile('myfile', format='pssm') + + For a list of file formats supported by :func:`LoadSequenceProfile` see + :doc:`sequence_profile_formats`. + + :rtype: :class:`~ost.seq.ProfileHandle` + + :raises: :exc:`~ost.io.IOUnknownFormatException` if the format string supplied + is not recognized or the file format cannot be detected based on the + file extension. + :exc:`~ost.io.IOException` if the import fails due to an erroneous or + nonexistent file. + Saving Sequence Data ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -247,7 +280,7 @@ Loading Density Maps # recognizes mrc file by file extension ent = io.LoadImage('file.mrc') - # it is always possible to explicitely set the image format + # it is always possible to explicitly set the image format # DAT file explicitly ent = io.LoadImage('file', Dat()) diff --git a/modules/io/doc/sequence_profile_formats.rst b/modules/io/doc/sequence_profile_formats.rst new file mode 100644 index 0000000000000000000000000000000000000000..8c14c7d7f8ad7f5003d700d7edc060b48da119ec --- /dev/null +++ b/modules/io/doc/sequence_profile_formats.rst @@ -0,0 +1,23 @@ +Supported Sequence Profile File Formats +================================================================================ + +HHblits output +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +*Recognized File Extensions* + .hhm, .hhm.gz + +*Format Name* + hhm + + +PSI-BLAST output +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +ASCII Table (PSSM) 
output of PSI-BLAST as generated with blastpgp and flag -Q. + +*Recognized File Extensions* + .pssm, .pssm.gz + +*Format Name* + pssm diff --git a/modules/seq/base/doc/seq.rst b/modules/seq/base/doc/seq.rst index c370c1a4015b9759912b5fb47dd5d4f401c7bb65..5022c0aba584d651d8e0d0d056e89cbc635468ff 100644 --- a/modules/seq/base/doc/seq.rst +++ b/modules/seq/base/doc/seq.rst @@ -438,17 +438,22 @@ an alignment: Remove sequence at *index* from the alignment. -Handling Hidden Markov Models +Handling Sequence Profiles -------------------------------------------------------------------------------- -The HMM provides a simple container for hidden markov models in form of -single columns containing amino acid frequencies and transition probabilities. +The :class:`ProfileHandle` provides a simple container for profiles for each +residue. It mainly contains: -.. class:: HMMColumn +- *N* :class:`ProfileColumn` objects (*N* = number of residues in sequence) + which each contains 20 amino acid frequencies +- a :attr:`~ProfileHandle.sequence` (:class:`str`) of length *N* +- a :attr:`~ProfileHandle.null_model` to use for this profile + +.. class:: ProfileColumn .. method:: BLOSUMNullModel() - Static method, that returns a new :class:`HMMColumn` with amino acid + Static method, that returns a new :class:`ProfileColumn` with amino acid frequencies given from the BLOSUM62 substitution matrix. .. method:: GetFreq(aa) @@ -465,75 +470,72 @@ single columns containing amino acid frequencies and transition probabilities. :type aa: :class:`str` :type freq: :class:`float` - .. method:: GetTransitionFreq(from,to) - - :param from: Current state of HMM (HMM_MATCH, HMM_INSERT or HMM_DELETE) - :param to: Next state - - :returns: Frequency of given state transition - - .. method:: SetTransitionFreq(from,to,freq) - - :param from: Current state of HMM (HMM_MATCH, HMM_INSERT or HMM_DELETE) - :param to: Next state - :param freq: Frequency of transition - - .. 
attribute:: one_letter_code - - One letter code, this column is associated to - .. attribute:: entropy Shannon entropy based on the columns amino acid frequencies -.. class:: HMM - - .. method:: Load(filename) +.. class:: ProfileHandle - Static method to load an hmm in the hhm format as it is in use in the HHSuite. + .. method:: __len__() + + Returns the length of the sequence for which we have a profile. - :param filename: Name of file to load - :type filename: :class:`str` + :rtype: :class:`int` .. method:: AddColumn(col) Appends column in the internal column list. :param col: Column to add - :type col: :class:`HMMColumn` + :type col: :class:`ProfileColumn` .. method:: Extract(from,to) :param from: Col Idx to start from - :param to: End Idx, not included in sub-HMM + :param to: End Idx, not included in sub-ProfileHandle :type from: :class:`int` :type to: :class:`int` - :returns: sub-HMM as defined by given indices + :returns: sub-profile as defined by given indices + (:attr:`null_model` is copied) + :rtype: :class:`ProfileHandle` + + :raises: :exc:`~exceptions.Error` if *to* <= *from* or + *to* > :meth:`__len__`. + + .. method:: SetSequence(sequence) + + Sets :attr:`sequence`. + + .. method:: SetNullModel(null_model) + + Sets :attr:`null_model`. .. attribute:: sequence - Sequence of the columns + Sequence for which we have this profile. + Note: user must enforce consistency between sequence length and number of + profile columns. .. attribute:: columns - Iterable columns of the HMM + Iterable columns of the profile .. attribute:: null_model - Null model of the HMM + Null model of the profile .. attribute:: avg_entropy Average entropy of all the columns -.. class:: HMMDB +.. class:: ProfileDB - A simple database to gather :class:`HMM` objects. It is possible + A simple database to gather :class:`ProfileHandle` objects. It is possible to save them to disk in a compressed format with limited accuracy - (4 digits for freq values). + (4 digits for each frequency). .. 
method:: Save(filename) @@ -548,32 +550,27 @@ single columns containing amino acid frequencies and transition probabilities. :type filename: :class:`str` :returns: The loaded database - .. method:: AddHMM(name, hmm) + .. method:: AddProfile(name, prof) - :param name: Name of HMM to be added - :param hmm: HMM to be added + :param name: Name of profile to be added + :param prof: Profile to be added :type name: :class:`str` - :type hmm: :class:`HMM` + :type prof: :class:`ProfileHandle` :raises: :class:`Exception` when filename is longer than 255 characters. - .. method:: GetHMM(name) + .. method:: GetProfile(name) - :param name: Name of HMM to be returned + :param name: Name of profile to be returned :type name: :class:`str` - :returns: The requested :class:`HMM` - :raises: :class:`Exception` when no :class:`HMM` for **name** exists. + :returns: The requested :class:`ProfileHandle` + :raises: :class:`Exception` when no :class:`ProfileHandle` for **name** exists. .. method:: Size() - :returns: Number of :class:`HMM` objects in the database + :returns: Number of :class:`ProfileHandle` objects in the database .. 
method:: GetNames() - :returns: A nonsorted list of the names of all :class:`HMM` objects in the database - - - - - - + :returns: An unsorted list of the names of all :class:`ProfileHandle` + objects in the database diff --git a/modules/seq/base/pymod/export_profile_handle.cc b/modules/seq/base/pymod/export_profile_handle.cc index 77479ac31876728d6edd62fcaf3259acc7c58d00..e563960f424e448863bf17f7d52d10db8b2875e5 100644 --- a/modules/seq/base/pymod/export_profile_handle.cc +++ b/modules/seq/base/pymod/export_profile_handle.cc @@ -60,6 +60,8 @@ void export_profile_handle() .def("__len__",&ProfileHandle::size) .def("AddColumn", &ProfileHandle::push_back) .def("Extract", &ProfileHandle::Extract) + .def("SetNullModel", &ProfileHandle::SetNullModel) + .def("SetSequence", &ProfileHandle::SetSequence) .add_property("null_model", make_function(&ProfileHandle::GetNullModel, return_value_policy<copy_const_reference>())) .add_property("columns", diff --git a/modules/seq/base/src/profile_handle.hh b/modules/seq/base/src/profile_handle.hh index e3642b6e16c45f8a10e9bba39dbbd29b04d4a91d..0d67f8c696a0995133473c2da7a979d8b84bd6bd 100644 --- a/modules/seq/base/src/profile_handle.hh +++ b/modules/seq/base/src/profile_handle.hh @@ -44,7 +44,7 @@ typedef boost::shared_ptr<ProfileHandle> ProfileHandlePtr; typedef boost::shared_ptr<ProfileDB> ProfileDBPtr; typedef std::vector<ProfileColumn> ProfileColumnList; -/// \brief Defines profile of 20 frequencies for one residue. +/// \brief Defines profile of 20 frequencies for one residue. /// /// Frequencies are identified by the one-letter-code for that amino acid. /// (possible codes: ACDEFGHIKLMNPQRSTVWY)