diff --git a/modules/io/doc/io.rst b/modules/io/doc/io.rst index fad95f3c743a21290e3ccb4bd4642468ade510d3..70174b3753f094fb5654c9a89a650c04943cb422 100644 --- a/modules/io/doc/io.rst +++ b/modules/io/doc/io.rst @@ -67,9 +67,10 @@ and mmCIF files with :func:`LoadMMCIF` (this also gives you access to the :class:`MMCifInfo` class). It offers tighter control over the exact loading behaviour. - .. autofunction:: ost.io.LoadPDB +.. autofunction:: ost.io.LoadMMCIF + .. function:: PDBStrToEntity(pdb_string, profile=IOProfile(), process=False) Load entity from a string in PDB format. By default the entity is loaded with @@ -305,6 +306,8 @@ file: .. autofunction:: ost.io.SavePDB +.. autofunction:: ost.io.SaveMMCIF + .. function:: EntityToPDBStr(ent, profile=IOProfile()) Return entity as a string in PDB format. diff --git a/modules/io/doc/mmcif.rst b/modules/io/doc/mmcif.rst index 53880c2dddcbf3428b394345048af31ed3c81731..85a8b3337d91de71fcea8a98112e3b647dd0ebbe 100644 --- a/modules/io/doc/mmcif.rst +++ b/modules/io/doc/mmcif.rst @@ -1,22 +1,21 @@ mmCIF File Format --------------------------------------------------------------------------------- +================================================================================ .. currentmodule:: ost.io The mmCIF file format is a container for structural entities provided by the -PDB. Here we describe how to load those files and how to deal with information -provided above the legacy PDB format (:class:`MMCifInfo`, +PDB. Saving/loading happens through dedicated convenient functions +(:func:`ost.io.LoadMMCIF`/:func:`ost.io.SaveMMCIF`). Here provide more in-depth +information on mmCIF IO and describe how to deal with information provided above +the legacy PDB format (:class:`MMCifInfo`, :class:`MMCifInfoCitation`, :class:`MMCifInfoTransOp`, :class:`MMCifInfoBioUnit`, :class:`MMCifInfoStructDetails`, :class:`MMCifInfoObsolete`, :class:`MMCifInfoStructRef`, :class:`MMCifInfoStructRefSeq`, :class:`MMCifInfoStructRefSeqDif`, :class:`MMCifInfoRevisions`, :class:`MMCifInfoEntityBranchLink`). - -Loading mmCIF Files -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -.. autofunction:: ost.io.LoadMMCIF +Reading mmCIF files +-------------------------------------------------------------------------------- Categories Available @@ -1349,9 +1348,344 @@ of the annotation available. See :attr:`bond_order` -Biounits + +Writing mmCIF files +-------------------------------------------------------------------------------- + + +Star Writer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The syntax of mmCIF is a subset of the syntax of STAR files. OpenStructure +implements a simple :class:`StarWriter` that is able to write data in two ways: + +* **key-value**: A category name and an attribute name that is linked to a value. Example: + + .. code-block:: bash + + _citation.year 2024 + + ``_citation.year`` is called an mmCIF token. +* **tabular**: Represents several values for an mmCIF token. The tokens are written in a header which is followed by the respective values. Example: + + .. code-block:: bash + + loop_ + _atom_site.group_PDB + _atom_site.type_symbol + _atom_site.label_atom_id + _atom_site.label_comp_id + _atom_site.label_asym_id + _atom_site.label_entity_id + _atom_site.label_seq_id + _atom_site.label_alt_id + _atom_site.Cartn_x + _atom_site.Cartn_y + _atom_site.Cartn_z + _atom_site.occupancy + _atom_site.B_iso_or_equiv + _atom_site.auth_seq_id + _atom_site.auth_asym_id + _atom_site.id + _atom_site.pdbx_PDB_ins_code + ATOM N N SER A 0 1 . -47.333 0.941 8.834 1.00 52.56 71 P 0 ? + ATOM C CA SER A 0 1 . -45.849 0.731 8.796 1.00 53.56 71 P 1 ? + ATOM C C SER A 0 1 . -45.191 1.608 7.714 1.00 51.61 71 P 2 ? + ... + +What follows is an example of how to use the :class:`StarWriter` and its +associated objects. In principle thats enough to write a full mmCIF file +but you definitely want to check out the :class:`MMCifWriter` which extends +:class:`StarWriter` and extracts the relevant data from an OpenStructure +:class:`ost.mol.EntityHandle`. + +.. code-block:: python + + from ost import io + import math + + writer = io.StarWriter() + + # Add key value pair + value = io.StarWriterValue.FromInt(42) + data_item = io.StarWriterDataItem("_the", "answer", value) + writer.Push(data_item) + + # Add tabular data + loop_desc = io.StarWriterLoopDesc("_math_oper") + loop_desc.Add("num") + loop_desc.Add("sqrt") + loop_desc.Add("square") + loop = io.StarWriterLoop(loop_desc) + for i in range(10): + data = list() + data.append(io.StarWriterValue.FromInt(i)) + data.append(io.StarWriterValue.FromFloat(math.sqrt(i), 3)) + data.append(io.StarWriterValue.FromInt(i*i)) + loop.AddData(data) + writer.Push(loop) + + # Write this groundbreaking data into a file with name numbers.gz + # and yes, its directly gzipped + writer.Write("numbers", "numbers.gz") + + +The content of the written file: + +.. code-block:: bash + + data_numbers + _the.answer 42 + # + loop_ + _math_oper.num + _math_oper.sqrt + _math_oper.square + 0 0.000 0 + 1 1.000 1 + 2 1.414 4 + 3 1.732 9 + 4 2.000 16 + 5 2.236 25 + 6 2.449 36 + 7 2.646 49 + 8 2.828 64 + 9 3.000 81 + # + +.. class:: StarWriterValue + + A value which is stored as string - must be constructed from static + constructor functions + + .. method:: FromInt(int_val) + + Static constructor from an integer value + + :param int_val: The value + :type int_val: :class:`int` + :returns: :class:`StarWriterValue` + + .. method:: FromFloat(float_val, decimals) + + Static constructor from a float value + + :param float_val: The value + :type float_val: :class:`float` + :param decimals: Number decimals that get stored in internal value + :returns: :class:`StarWriterValue` + + .. method:: FromString(string_val) + + Static constructor from a string value, stores input as is + with the exception of the following processing: + + * encapsulate string in brackets if *string_val* contains space character + * set to "." if *string_val* is an empty string + + :param string_val: The value + :type string_val: :class:`str` + :returns: :class:`StarWriterValue` + + + .. method:: GetValue + + Returns the internal string representation + + +.. class:: StarWriterDataItem(category, attribute, value) + + key-value data representation + + :param category: The category name of the data item + :type category: :class:`str` + :param attribute: The attribute name of the data item + :type attribute: :class:`str` + :param value: The value of the data item + :type value: :class:`StarWriterValue` + + .. method:: GetCategory + + Returns *category* + + .. method:: GetAttribute + + Returns *attribute* + + .. method:: GetValue + + Returns *value* + + +.. class:: StarWriterLoopDesc(category) + + Defines header for tabular data representation for the specified *category* + + :param category: The category + :type category: :class:`str` + + .. method:: GetCategory + + Returns *category* + + .. method:: GetSize + + Returns number of added attributes + + .. method:: Add(attribute) + + Adds an attribute + + :param attribute: The attribute + :type attribute: :class:`str` + + .. method:: GetIndex(attribute) + + Returns index for specified *attribute*, -1 if not found + + :param attribute: The attribute for which the index should be returned + :type attribute: :class:`str` + + +.. class:: StarWriterLoop(desc) + + Allows to populate :class:`StarWriterLoopDesc` with data to get a full tabular + data representation + + :param desc: The header + :type desc: :class:`StarWriterLoopDesc` + + .. method:: GetDesc + + Returns *desc* + + .. method:: GetN + + Returns number of added data lists + + .. method:: AddData(data_list) + + Add data for each attribute in *desc*. + + :param data_list: Data to be added, length must match attributes in *desc* + :type data_list: :class:`list` of :class:`StarWriterValue` + + +.. class:: StarWriter + + Can be populated with data which can then be written to a file. + + .. method:: Push(star_writer_object) + + Push data to be written + + :param star_writer_object: Data + :type star_writer_object: :class:`StarWriterDataItem`/:class:`StarWriterLoop` + + .. method:: Write(data_name, filename) + + Writes pushed data in specified file. + + :param data_name: Name of data block, i.e. the written file starts with + data\_<data_name>. + :type data_name: :class:`str` + :param filename: Name of generated file - applies gzip compression in case + of .gz suffix. + :type filename: :class:`str` + + +mmCIF Writer ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The OpenStructure mmCIF writer considers the following data categories. The +listed attributes are written which fulfills all dependencies in an mmCIF +file according to `mmcif_pdbx_v50 <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Index/>`_. + +* `_atom_site <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/atom_site.html>`_ + + * group_PDB + * type_symbol + * label_atom_id + * label_asym_id + * label_entity_id + * label_seq_id + * label_alt_id + * Cartn_x + * Cartn_y + * Cartn_z + * occupancy + * B_iso_or_equiv + * auth_seq_id + * auth_asym_id + * id + * pdbx_PDB_ins_code + + +* `_entity <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/entity.html>`_ + + * id + * type + +* `_struct_asym <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/struct_asym.html>`_ + + * id + * entity_id + +* `_entity_poly <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/entity_poly.html>`_ + + * entity_id + * type + * pdbx_seq_one_letter_code + * pdbx_seq_one_letter_code_can + +* `_pdbx_poly_seq_scheme <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/pdbx_poly_seq_scheme.html>`_ + + * asym_id + * entity_id + * mon_id + * seq_id + * pdb_strand_id + * pdb_seq_num + * pdb_ins_code + + +* `_entity_poly_seq <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/pdbx_reference_entity_poly_seq.html>`_ + + * entity_id + * mon_id + * num + + +* `_chem_comp <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/chem_comp.html>`_ + + * id + * type + +* `_atom_type <https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Categories/atom_type.html>`_ + + * symbol + +The writer is designed to only require an OpenStructure +:class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView` as input but +optionally performs preprocessing in order to separate residues of chains into +valid mmCIF entities. This is controlled by the *mmcif_conform* flag which has +significant impact on how chains are assigned to mmCIF entities, chain names and +residue numbers. Ideally, the input is *mmcif_conform*. That means each chain +has a :class:`ost.mol.ChainType` attached and all of its residues are consistent +with that :class:`ost.mol.ChainType`. E.g. +:func:`ost.mol.ResidueHandle.GetChemClass` evaluates to +:class:`PEPTIDE_LINKING` or :class:`L_PEPTIDE_LINKING` for a +chain of type :class:`CHAINTYPE_POLY_PEPTIDE_L`. These requirements are +fulfilled for structures loaded from mmCIF files. +Structures loaded from PDB files may have peptide residues, water and +non-polymers in the same chain. They are not *mmcif_conform* and thus +require preprocessing. + +* Chains represent valid mmCIF entities, e.g. + +Biounits +-------------------------------------------------------------------------------- + .. _Biounits: Biological assemblies, i.e. biounits, are an integral part of mmCIF files and diff --git a/modules/io/pymod/__init__.py b/modules/io/pymod/__init__.py index ca619a1561732da5bc65912e1216d339351271f9..73ac9e65634d0c7e5d0028875dd52dc7cbb69293 100644 --- a/modules/io/pymod/__init__.py +++ b/modules/io/pymod/__init__.py @@ -20,7 +20,6 @@ import os, tempfile, ftplib, http.client from ._ost_io import * from ost import mol, geom, conop, seq -from ost import LogWarning class IOProfiles: def __init__(self): @@ -447,11 +446,6 @@ def LoadMMCIF(filename, fault_tolerant=None, calpha_only=None, reader.info.ConnectBranchLinks() #else: # raise IOError("File doesn't contain any entities") - - # Warn about info dependency on seqres - if info and not reader.seqres: - LogWarning("MMCifInfo is incomplete when seqres=False") - if seqres and info: return ent, reader.seqres, reader.info if seqres: @@ -462,6 +456,37 @@ def LoadMMCIF(filename, fault_tolerant=None, calpha_only=None, except: raise + +def SaveMMCIF(ent, filename, data_name="OST_structure", mmcif_conform = True): + """ + Save OpenStructure entity in mmCIF format + + :param ent: OpenStructure Entity to be saved + :param filename: Filename - .gz suffix triggers gzip compression + :param data_name: Name of data block that will be written to + mmCIF file. Typically, thats the PDB ID or some + identifier. + :param mmcif_conform: Controls processing of structure, i.e. identification + of mmCIF entities etc. before writing. Detailed + description is in the documentation below. In short: + If *mmcif_conform* is set to True, Chains in *ent* are + expected to be valid mmCIF entities with residue numbers + set according to underlying SEQRES. That should be the + case when *ent* has been loaded with :func:`LoadMMCIF`. + If *mmcif_conform* is set to False, heuristics kick in + to identify and separate mmCIF entities based on + :class:`ost.mol.ChemClass` of the residues in a chain. + :type ent: :class:`ost.mol.EntityHandle`/:class:`ost.mol.EntityView` + :type filename: :class:`str` + :type data_name: :class:`str` + :type mmcif_conform: :class:`bool` + """ + writer = MMCifWriter() + writer.SetStructure(ent, mmcif_conform = mmcif_conform) + writer.Write(data_name, filename) + + + # this function uses a dirty trick: should be a member of MMCifInfoBioUnit # which is totally C++, but we want the method in Python... so we define it # here (__init__) and add it as a member to the class. With this, the first diff --git a/modules/io/pymod/export_mmcif_io.cc b/modules/io/pymod/export_mmcif_io.cc index ae6e956fa612f8b9c83575696c48abdea4b0fcc5..3665c351bb0d22730ad187ffd501c37132e71116 100644 --- a/modules/io/pymod/export_mmcif_io.cc +++ b/modules/io/pymod/export_mmcif_io.cc @@ -26,6 +26,8 @@ using namespace boost::python; #include <ost/io/mol/io_profile.hh> #include <ost/io/mol/mmcif_reader.hh> #include <ost/io/mol/mmcif_info.hh> +#include <ost/io/mol/star_writer.hh> +#include <ost/io/mol/mmcif_writer.hh> #include <ost/io/mmcif_str.hh> using namespace ost; using namespace ost::io; @@ -55,6 +57,43 @@ boost::python::tuple WrapMMCifStringToEntity(const String& mmcif, std::get<2>(res)); } +String WrapEntityToMMCifStringEnt(const ost::mol::EntityHandle& ent, + const String& data_name, + bool mmcif_conform) { + return EntityToMMCifString(ent, data_name, mmcif_conform); +} + +String WrapEntityToMMCifStringView(const ost::mol::EntityView& ent, + const String& data_name, + bool mmcif_conform) { + return EntityToMMCifString(ent, data_name, mmcif_conform); +} + +void WrapStarLoopAddData(StarWriterLoop& sl, const boost::python::list& l) { + std::vector<StarWriterValue> v; + for (int i = 0; i < boost::python::len(l); ++i){ + v.push_back(boost::python::extract<StarWriterValue>(l[i])); + } + sl.AddData(v); +} + +void WrapStarWriterWrite(StarWriter& writer, const String& data_name, + const String& filename) { + writer.Write(data_name, filename); +} + +void WrapSetStructureHandle(MMCifWriter& writer, + const ost::mol::EntityHandle& ent, + bool mmcif_conform) { + writer.SetStructure(ent, mmcif_conform); +} + +void WrapSetStructureView(MMCifWriter& writer, + const ost::mol::EntityView& ent, + bool mmcif_conform) { + writer.SetStructure(ent, mmcif_conform); +} + void export_mmcif_io() { class_<MMCifReader, boost::noncopyable>("MMCifReader", init<const String&, EntityHandle&, const IOProfile&>()) @@ -75,6 +114,44 @@ void export_mmcif_io() return_value_policy<copy_const_reference>())) ; + class_<StarWriterObject, boost::noncopyable>("StarWriterObject", no_init); + + class_<StarWriterValue>("StarWriterValue", no_init) + .def("FromInt", &StarWriterValue::FromInt, (arg("int_val"))).staticmethod("FromInt") + .def("FromFloat", &StarWriterValue::FromFloat, (arg("float_val"), arg("decimals"))).staticmethod("FromFloat") + .def("FromString", &StarWriterValue::FromString, (arg("string_val"))).staticmethod("FromString") + .def("GetValue", &StarWriterValue::GetValue, return_value_policy<copy_const_reference>()) + ; + + class_<StarWriterDataItem, bases<StarWriterObject> >("StarWriterDataItem", init<const String&, const String&, const StarWriterValue&>()) + .def("GetCategory", &StarWriterDataItem::GetCategory, return_value_policy<copy_const_reference>()) + .def("GetAttribute", &StarWriterDataItem::GetAttribute, return_value_policy<copy_const_reference>()) + .def("GetValue", &StarWriterDataItem::GetValue, return_value_policy<copy_const_reference>()) + ; + + class_<StarWriterLoopDesc, bases<StarWriterObject> >("StarWriterLoopDesc", init<const String&>()) + .def("GetCategory", &StarWriterLoopDesc::GetCategory, return_value_policy<copy_const_reference>()) + .def("GetSize", &StarWriterLoopDesc::GetSize) + .def("Add", &StarWriterLoopDesc::Add, (arg("attribute"))) + .def("GetIndex", &StarWriterLoopDesc::GetIndex, (arg("attribute"))) + ; + + class_<StarWriterLoop, bases<StarWriterObject> >("StarWriterLoop", init<const StarWriterLoopDesc&>()) + .def("GetDesc", &StarWriterLoop::GetDesc, return_value_policy<reference_existing_object>()) + .def("GetN", &StarWriterLoop::GetN) + .def("AddData", &WrapStarLoopAddData, (arg("data_list"))) + ; + + class_<StarWriter>("StarWriter", init<>()) + .def("Push", &StarWriter::Push, arg("star_writer_object")) + .def("Write", &WrapStarWriterWrite, (arg("data_name"), arg("filename"))) + ; + + class_<MMCifWriter, bases<StarWriter> >("MMCifWriter", init<>()) + .def("SetStructure", &WrapSetStructureHandle, (arg("ent"), arg("mmcif_conform")=true)) + .def("SetStructure", &WrapSetStructureView, (arg("ent"), arg("mmcif_conform")=true)) + ; + enum_<MMCifInfoCitation::MMCifInfoCType>("MMCifInfoCType") .value("Journal", MMCifInfoCitation::JOURNAL) .value("Book", MMCifInfoCitation::BOOK) @@ -451,4 +528,12 @@ void export_mmcif_io() def("MMCifStrToEntity", &WrapMMCifStringToEntity, (arg("pdb_string"), arg("profile")=IOProfile(), arg("process")=false)); + + def("EntityToMMCifString", &WrapEntityToMMCifStringEnt, (arg("ent"), + arg("data_name")="OST_structure", + arg("mmcif_conform")=true)); + + def("EntityToMMCifString", &WrapEntityToMMCifStringView, (arg("ent"), + arg("data_name")="OST_structure", + arg("mmcif_conform")=true)); } diff --git a/modules/io/src/mol/CMakeLists.txt b/modules/io/src/mol/CMakeLists.txt index 5f17d12bc4ba6d5c991275b859ce1330253f5551..f281e45a35fef883c3720eb30cf6adf7d2f07523 100644 --- a/modules/io/src/mol/CMakeLists.txt +++ b/modules/io/src/mol/CMakeLists.txt @@ -17,8 +17,10 @@ chemdict_parser.cc io_profile.cc dcd_io.cc star_parser.cc +star_writer.cc mmcif_reader.cc mmcif_info.cc +mmcif_writer.cc pdb_str.cc sdf_str.cc mmcif_str.cc @@ -30,8 +32,10 @@ PARENT_SCOPE set(OST_IO_MOL_HEADERS chemdict_parser.hh star_parser.hh +star_writer.hh mmcif_reader.hh mmcif_info.hh +mmcif_writer.hh io_profile.hh dcd_io.hh entity_io_crd_handler.hh diff --git a/modules/io/src/mol/mmcif_str.cc b/modules/io/src/mol/mmcif_str.cc index e24f9d79113f3e85513d3a9f08d35cd088162d2b..a127ab7fabe8f775a3f8658a1887ffa7d1daff0e 100644 --- a/modules/io/src/mol/mmcif_str.cc +++ b/modules/io/src/mol/mmcif_str.cc @@ -19,9 +19,28 @@ #include <sstream> #include <ost/io/mol/mmcif_str.hh> #include <ost/io/mol/mmcif_reader.hh> +#include <ost/io/mmcif_writer.hh> namespace ost { namespace io { +String EntityToMMCifString(const ost::mol::EntityHandle& ent, + const String& data_name, bool mmcif_conform) { + std::stringstream ss; + MMCifWriter writer; + writer.SetStructure(ent, mmcif_conform); + writer.Write(data_name, ss); + return ss.str(); +} + +String EntityToMMCifString(const ost::mol::EntityView& ent, + const String& data_name, bool mmcif_conform) { + std::stringstream ss; + MMCifWriter writer; + writer.SetStructure(ent, mmcif_conform); + writer.Write(data_name, ss); + return ss.str(); +} + std::tuple<mol::EntityHandle, MMCifInfo, ost::seq::SequenceList> MMCifStringToEntity(const String& mmcif, const IOProfile& profile, bool process) { std::stringstream stream(mmcif); diff --git a/modules/io/src/mol/mmcif_str.hh b/modules/io/src/mol/mmcif_str.hh index 7e0ec2a113f77d92d0f8e94dbecab32c2a62d080..ec0c92df40d53c6deb55ab6cf53b557862e3b0d4 100644 --- a/modules/io/src/mol/mmcif_str.hh +++ b/modules/io/src/mol/mmcif_str.hh @@ -27,6 +27,14 @@ namespace ost { namespace io { +String DLLEXPORT_OST_IO +EntityToMMCifString(const ost::mol::EntityHandle& ent, const String& data_name, + bool mmcif_conform); + +String DLLEXPORT_OST_IO +EntityToMMCifString(const ost::mol::EntityView& ent, const String& data_name, + bool mmcif_conform); + std::tuple<mol::EntityHandle, MMCifInfo, ost::seq::SequenceList> DLLEXPORT_OST_IO MMCifStringToEntity(const String& mmcif, const IOProfile& profile, bool process); diff --git a/modules/io/src/mol/mmcif_writer.cc b/modules/io/src/mol/mmcif_writer.cc new file mode 100644 index 0000000000000000000000000000000000000000..18831c0465b34eab6a6450bed00018e95dfe79cd --- /dev/null +++ b/modules/io/src/mol/mmcif_writer.cc @@ -0,0 +1,1480 @@ +//------------------------------------------------------------------------------ +// This file is part of the OpenStructure project <www.openstructure.org> +// +// Copyright (C) 2008-2023 by the OpenStructure authors +// +// This library is free software; you can redistribute it and/or modify it under +// the terms of the GNU Lesser General Public License as published by the Free +// Software Foundation; either version 3.0 of the License, or (at your option) +// any later version. +// This library is distributed in the hope that it will be useful, but WITHOUT +// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS +// FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more +// details. +// +// You should have received a copy of the GNU Lesser General Public License +// along with this library; if not, write to the Free Software Foundation, Inc., +// 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +//------------------------------------------------------------------------------ + +#include <ost/mol/chem_class.hh> +#include <ost/io/mol/mmcif_writer.hh> + +namespace { + + // generates as many chain names as you want (potentially multiple characters) + struct ChainNameGenerator{ + ChainNameGenerator() { + chain_names = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz"; + n_chain_names = chain_names.size(); + indices.push_back(-1); + } + + String Get() { + int idx = indices.size() - 1; + indices[idx] += 1; + bool more_digits = false; + while(idx >= 0) { + if(indices[idx] >= n_chain_names) { + indices[idx] = 0; + if(idx>0) { + indices[idx-1] += 1; + --idx; + } else { + more_digits = true; + break; + } + } else { + break; + } + } + if(more_digits) { + indices.insert(indices.begin(), 0); + } + String ch_name(indices.size(), 'X'); + for(uint i = 0; i < indices.size(); ++i) { + ch_name[i] = chain_names[indices[i]]; + } + return ch_name; + } + + void Reset() { + indices.clear(); + indices.push_back(-1); + } + + String chain_names; + int n_chain_names; + std::vector<int> indices; + }; + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + String GuessEntityPolyType(const T& res_list) { + + // guesses _entity_poly.type based on residue chem classes + + // allowed values according to mmcif_pdbx_v50.dic: + // - cyclic-pseudo-peptide + // - other + // - peptide nucleic acid + // - polydeoxyribonucleotide + // - polydeoxyribonucleotide/polyribonucleotide hybrid + // - polypeptide(D) + // - polypeptide(L) + // - polyribonucleotide + + // this function won't identify cyclic-pseudo-peptides + + std::set<char> chem_classes; + for(auto res: res_list) { + chem_classes.insert(res.GetChemClass()); + } + + // check for polypeptide(L) + if(chem_classes.size() == 1 && + chem_classes.find(ost::mol::ChemClass::L_PEPTIDE_LINKING) != chem_classes.end()) { + return "polypeptide(L)"; + } + + if(chem_classes.size() == 2 && + chem_classes.find(ost::mol::ChemClass::L_PEPTIDE_LINKING) != chem_classes.end() && + chem_classes.find(ost::mol::ChemClass::PEPTIDE_LINKING) != chem_classes.end()) { + return "polypeptide(L)"; + } + + // check for polypeptide(D) + if(chem_classes.size() == 1 && + chem_classes.find(ost::mol::ChemClass::D_PEPTIDE_LINKING) != chem_classes.end()) { + return "polypeptide(D)"; + } + + if(chem_classes.size() == 2 && + chem_classes.find(ost::mol::ChemClass::D_PEPTIDE_LINKING) != chem_classes.end() && + chem_classes.find(ost::mol::ChemClass::PEPTIDE_LINKING) != chem_classes.end()) { + return "polypeptide(D)"; + } + + // check for polydeoxyribonucleotide + if(chem_classes.size() == 1 && + chem_classes.find(ost::mol::ChemClass::DNA_LINKING) != chem_classes.end()) { + return "polydeoxyribonucleotide"; + } + + // check for polyribonucleotide + if(chem_classes.size() == 1 && + chem_classes.find(ost::mol::ChemClass::RNA_LINKING) != chem_classes.end()) { + return "polyribonucleotide"; + } + + // check for polydeoxyribonucleotide/polyribonucleotide hybrid + if(chem_classes.size() == 2 && + chem_classes.find(ost::mol::ChemClass::RNA_LINKING) != chem_classes.end() && + chem_classes.find(ost::mol::ChemClass::DNA_LINKING) != chem_classes.end()) { + return "polydeoxyribonucleotide/polyribonucleotide hybrid"; + } + + // check for peptide nucleic acid + bool peptide_linking = chem_classes.find(ost::mol::ChemClass::L_PEPTIDE_LINKING) != chem_classes.end() || + chem_classes.find(ost::mol::ChemClass::D_PEPTIDE_LINKING) != chem_classes.end() || + chem_classes.find(ost::mol::ChemClass::PEPTIDE_LINKING) != chem_classes.end(); + bool nucleotide_linking = chem_classes.find(ost::mol::ChemClass::DNA_LINKING) != chem_classes.end() || + chem_classes.find(ost::mol::ChemClass::RNA_LINKING) != chem_classes.end(); + std::set<char> pepnuc_set; + pepnuc_set.insert(ost::mol::ChemClass::L_PEPTIDE_LINKING); + pepnuc_set.insert(ost::mol::ChemClass::D_PEPTIDE_LINKING); + pepnuc_set.insert(ost::mol::ChemClass::PEPTIDE_LINKING); + pepnuc_set.insert(ost::mol::ChemClass::DNA_LINKING); + pepnuc_set.insert(ost::mol::ChemClass::RNA_LINKING); + pepnuc_set.insert(chem_classes.begin(), chem_classes.end()); + if(peptide_linking && nucleotide_linking && pepnuc_set.size() == 5) { + return "peptide nucleic acid"; + } + + return "other"; + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + String GuessEntityBranchType(const T& res_list) { + // guesses _pdbx_entity_branch.type based on residue chem classes + // + // Thats a hard one... only allowed value according to mmcif_pdbx_v50.dic: + // oligosaccharide + return "oligosaccharide"; + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + String GuessEntityType(const T& res_list) { + + // guesses _entity.type based on residue chem classes + + // allowed values according to mmcif_pdbx_v50.dic: + // - branched + // - macrolide + // - non-polymer + // - polymer + // - water + + // this function is overly simplistic and won't identify macrolid or + // branched => explicitely checks for water, everything else is either + // non-polymer or polymer depending on number of residues + + std::set<char> chem_classes; + for(auto res: res_list) { + chem_classes.insert(res.GetChemClass()); + } + + // check for water + if(chem_classes.size() == 1 && + chem_classes.find(ost::mol::ChemClass::WATER) != chem_classes.end()) { + return "water"; + } + + // DISCUSS THIS OVER A BEER... + // Entities must have at least 3 residues to be considered polymers + // for now, we just set entities with 1 or 2 residues as non-polymer + if(res_list.size() == 1 || res_list.size() == 2) { + return "non-polymer"; + } + + return "polymer"; + } + + inline String ChemClassToChemCompType(char chem_class) { + String type = ""; + switch(chem_class) { + case 'P': { + type = "PEPTIDE LINKING"; + break; + } + case 'D': { + type = "D-PEPTIDE LINKING"; + break; + } + case 'L': { + type = "L-PEPTIDE LINKING"; + break; + } + case 'R': { + type = "RNA LINKING"; + break; + } + case 'S': { + type = "DNA LINKING"; + break; + } + case 'N': { + type = "NON-POLYMER"; + break; + } + case 'X': { + type = "L-SACCHARIDE"; + break; + } + case 'Y': { + type = "D-SACCHARIDE"; + break; + } + case 'Z': { + type = "SACCHARIDE"; + break; + } + case 'W': { + type = "NON-POLYMER"; // yes, water is a non-polymer + // https://www.rcsb.org/ligand/HOH + break; + } + case 'U': { + type = "OTHER"; + break; + } + default: { + std::stringstream err; + err << "Invalid chem class: "<<chem_class; + throw ost::io::IOException(err.str()); + } + } + return type; + } + + inline String MonIDToOLC(char chem_class, + const String& mon_id) { + + // hardcoded table according + // https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_entity_poly.pdbx_seq_one_letter_code.html + + if(ost::mol::ChemClass(chem_class).IsPeptideLinking()) { + switch(mon_id[0]) { + case 'A': { + if(mon_id == "ALA") { + return "A"; + } + if(mon_id == "ACE") { + return "(ACE)"; + } + if(mon_id == "ASP") { + return "D"; + } + if(mon_id == "ASN") { + return "N"; + } + if(mon_id == "ARG") { + return "R"; + } + break; + } + case 'C': { + if(mon_id == "CYS") { + return "C"; + } + break; + } + case 'G': { + if(mon_id == "GLU") { + return "E"; + } + if(mon_id == "GLY") { + return "G"; + } + if(mon_id == "GLN") { + return "Q"; + } + break; + } + case 'H': { + if(mon_id == "HIS") { + return "H"; + } + break; + } + case 'I': { + if(mon_id == "ILE") { + return "I"; + } + break; + } + case 'L': { + if(mon_id == "LEU") { + return "L"; + } + if(mon_id == "LYS") { + return "K"; + } + break; + } + case 'M': { + if(mon_id == "MET") { + return "M"; + } + if(mon_id == "MSE") { + return "(MSE)"; + } + break; + } + case 'N': { + if(mon_id == "NH2") { + return "(NH2)"; + } + break; + } + case 'P': { + if(mon_id == "PHE") { + return "F"; + } + if(mon_id == "PYL") { + return "O"; + } + if(mon_id == "PRO") { + return "P"; + } + if(mon_id == "PTR") { + return "(PTR)"; + } + if(mon_id == "PCA") { + return "(PCA)"; + } + break; + } + case 'S': { + if(mon_id == "SER") { + return "S"; + } + if(mon_id == "SEC") { + return "U"; + } + if(mon_id == "SEP") { + return "(SEP)"; + } + break; + } + case 'T': { + if(mon_id == "THR") { + return "T"; + } + if(mon_id == "TRP") { + return "W"; + } + if(mon_id == "TYR") { + return "Y"; + } + if(mon_id == "TPO") { + return "(TPO)"; + } + break; + } + case 'V': { + if(mon_id == "VAL") { + return "V"; + } + break; + } + } + } else if(ost::mol::ChemClass(chem_class).IsNucleotideLinking()) { + switch(mon_id[0]) { + case 'A': { + if(mon_id == "A") { + return "A"; + } + break; + } + case 'C': { + if(mon_id == "C") { + return "C"; + } + break; + } + case 'D': { + if(mon_id == "DA") { + return "(DA)"; + } + if(mon_id == "DC") { + return "(DC)"; + } + if(mon_id == "DG") { + return "(DG)"; + } + if(mon_id == "DT") { + return "(DT)"; + } + break; + } + case 'G': { + if(mon_id == "G") { + return "G"; + } + break; + } + case 'I': { + if(mon_id == "I") { + return "I"; + } + break; + } + case 'U': { + if(mon_id == "U") { + return "U"; + } + break; + } + } + } + + return "(" + mon_id + ")"; + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + void SetupChemComp(const T& res_list, + std::map<String, ost::io::MMCifWriterComp>& comp_infos) { + for(auto res: res_list) { + String res_name = res.GetName(); + String type = ChemClassToChemCompType(res.GetChemClass()); + auto it = comp_infos.find(res_name); + if(it != comp_infos.end()) { + if(it->second.type != type) { + // If the already stored type or the incoming type are OTHER, + // we keep the one that is NOT OTHER => everything has preference over + // OTHER. However, if both types are NOT OTHER and they do not match, + // we throw an error. + if(type == "OTHER") { + continue; + } else if (it->second.type == "OTHER") { + ost::io::MMCifWriterComp info; + info.type = type; + comp_infos[res_name] = info; + } else { + std::stringstream ss; + ss << "Residue " << res << "has _chem_comp.type \"" << type; + ss << "\" which is derived from its chem class: " << res.GetChemClass(); + ss << ". Observed already another _chem_comp.type for a residue of "; + ss << "the same name: " << it->second.type; + throw ost::io::IOException(ss.str()); + } + } + } else { + ost::io::MMCifWriterComp info; + info.type = type; + comp_infos[res_name] = info; + } + } + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + bool MatchEntity(const T& res_list, + const ost::io::MMCifWriterEntity& info) { + // checks if the residue names in res_list are an exact match + // with mon_ids in info + std::vector<String> mon_ids; + for(auto res : res_list) { + mon_ids.push_back(res.GetName()); + } + return mon_ids == info.mon_ids; + } + + void AddAsym(const String& asym_chain_name, + ost::io::MMCifWriterEntity& info) { + // adds asym_chain_name to info under the assumption that mon_ids + // exactly match => just add a copy of mon_ids to asym_alns + info.asym_ids.push_back(asym_chain_name); + info.asym_alns.push_back(info.mon_ids); + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + bool MatchEntityResnum(const T& res_list, + const ost::io::MMCifWriterEntity& info, + Real beyond_frac = 0.05) { + // Checks if res_list matches SEQRES given in info.mon_ids + // It may be that res_list defines residues beyond this SEQRES or + // has residues that are not yet defined, i.e. the respective mon_id is "-". + // This function returns True if the number of such occurences is below + // the specified fraction of actual matches (default: 5%). + int n_beyond = 0; + int n_matches = 0; + int num_mon_ids = info.mon_ids.size(); + for(auto res: res_list) { + int num = res.GetNumber().GetNum(); + char ins_code = res.GetNumber().GetInsCode(); + if(num < 1) { + std::stringstream ss; + ss << "Try to construct resnum based alignments. Negative residue "; + ss << "numbers are not allowed in this case. Got: "; + ss << num << " in residue " << res; + ss << ". You may set mmcif_conform flag to False to write something "; + ss << "but be aware of the consequences..."; + throw ost::io::IOException(ss.str()); + } + if(ins_code != '\0') { + std::stringstream ss; + ss << "Try to construct resnum based alignments. Insertion codes "; + ss << "are not allowed in this case. Got: "; + ss << ins_code << " in residue " << res; + ss << ". You may set mmcif_conform flag to False to write something "; + ss << "but be aware of the consequences..."; + throw ost::io::IOException(ss.str()); + } + if (num > num_mon_ids) { + ++n_beyond; + } else { + if(info.mon_ids[num-1] == "-") { + ++n_beyond; // we're basically filling an unknown gap... + } else if(info.mon_ids[num-1] == res.GetName()) { + ++n_matches; + } else { + return false; + } + } + } + if(n_matches == 0) { + return false; + } else { + return n_beyond / n_matches <= beyond_frac; + } + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + void AddAsymResnum(const String& asym_chain_name, + const T& res_list, + ost::io::MMCifWriterEntity& info) { + + if(!info.is_poly) { + // no need for SEQRES alignment vodoo + AddAsym(asym_chain_name, info); + return; + } + + int max_resnum = info.mon_ids.size(); + std::vector<String> mon_ids; + std::vector<int> resnums; + + for(auto res: res_list) { + int num = res.GetNumber().GetNum(); + // assumes that MatchEntityResnum has already been run to check for + // resnum < 1 + mon_ids.push_back(res.GetName()); + resnums.push_back(num); + max_resnum = std::max(max_resnum, num); + } + + std::vector<String> aln_mon_ids(max_resnum, "-"); + for(size_t i = 0; i < mon_ids.size(); ++i) { + aln_mon_ids[resnums[i]-1] = mon_ids[i]; + } + + if(max_resnum > static_cast<int>(info.mon_ids.size())) { + // This chain covers more residues towards C-terminus than any chain that + // is associated with this entity - expand to enforce equal size + int N = max_resnum - info.mon_ids.size(); + info.mon_ids.insert(info.mon_ids.end(), N, "-"); + info.seq_olcs.insert(info.seq_olcs.end(), N, "-"); + info.seq_can_olcs.insert(info.seq_can_olcs.end(), N, "-"); + for(size_t asym_idx = 0; asym_idx < info.asym_alns.size(); ++asym_idx) { + info.asym_alns[asym_idx].insert(info.asym_alns[asym_idx].end(), N, "-"); + } + } + + // Fill SEQRES infos newly covered by this asym chain + for(size_t i = 0; i < resnums.size(); ++i) { + if(info.mon_ids[resnums[i]-1] == "-") { + info.mon_ids[resnums[i]-1] = mon_ids[i]; + info.seq_olcs[resnums[i]-1] = MonIDToOLC(res_list[i].GetChemClass(), + mon_ids[i]); + char olc = res_list[i].GetOneLetterCode(); + if(olc < 'A' || olc > 'Z') { + info.seq_can_olcs[resnums[i]-1] = "X"; + } else { + info.seq_can_olcs[resnums[i]-1] = String(1, olc); + } + } + } + + // finalize + info.asym_ids.push_back(asym_chain_name); + info.asym_alns.push_back(aln_mon_ids); + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + int SetupEntity(const String& asym_chain_name, + const String& type, + const String& poly_type, + const String& branch_type, + const T& res_list, + bool resnum_alignment, + std::vector<ost::io::MMCifWriterEntity>& entity_infos) { + + bool is_poly = type == "polymer"; + + if(!is_poly && res_list.size() != 1 && type != "water" && type != "branched") { + std::stringstream ss; + ss << "Cannot setup entity with " << res_list.size() << " residues "; + ss << "but is of type: " << type; + throw ost::io::IOException(ss.str()); + } + + // check if entity is already there + for(size_t i = 0; i < entity_infos.size(); ++i) { + if(entity_infos[i].type == "water" && type == "water") { + AddAsym(asym_chain_name, entity_infos[i]); + return i; + } + if(entity_infos[i].type == type && + entity_infos[i].poly_type == poly_type && + entity_infos[i].branch_type == branch_type) { + if(is_poly && resnum_alignment) { + if(MatchEntityResnum(res_list, entity_infos[i])) { + AddAsymResnum(asym_chain_name, res_list, entity_infos[i]); + return i; + } + } else { + if(MatchEntity(res_list, entity_infos[i])) { + AddAsym(asym_chain_name, entity_infos[i]); + return i; + } + } + } + } + + // need to create new entity + std::vector<String> mon_ids; + std::vector<String> seq; + std::vector<String> seq_can; + + if(is_poly && resnum_alignment) { + int max_resnum = res_list.size(); + std::vector<String> res_mon_ids; + std::vector<int> resnums; + for(auto res: res_list) { + int num = res.GetNumber().GetNum(); + char ins_code = res.GetNumber().GetInsCode(); + if(num < 1) { + std::stringstream ss; + ss << "Try to construct mmCIF entity from residues using resnum "; + ss << "based alignments. Negative residue numbers are not allowed "; + ss << "in this case. Got: " << num << " in residue " << res; + ss << ". You may set mmcif_conform flag to False to write something "; + ss << "but be aware of the consequences..."; + throw ost::io::IOException(ss.str()); + } + if(ins_code != '\0') { + std::stringstream ss; + ss << "Try to construct mmCIF entity from residues using resnum "; + ss << "based alignments. Insertion codes are not allowed "; + ss << "in this case. Got: " << ins_code << " in residue " << res; + ss << ". You may set mmcif_conform flag to False to write something "; + ss << "but be aware of the consequences..."; + throw ost::io::IOException(ss.str()); + } + res_mon_ids.push_back(res.GetName()); + resnums.push_back(num); + max_resnum = std::max(max_resnum, num); + } + mon_ids.assign(max_resnum, "-"); + seq.assign(max_resnum, "-"); + seq_can.assign(max_resnum, "-"); + for(size_t i = 0; i < res_mon_ids.size(); ++i) { + mon_ids[resnums[i]-1] = res_mon_ids[i]; + seq[resnums[i]-1] = MonIDToOLC(res_list[i].GetChemClass(), + mon_ids[resnums[i]-1]); + char olc = res_list[i].GetOneLetterCode(); + if(olc < 'A' || olc > 'Z') { + seq_can[resnums[i]-1] = "X"; + } else { + seq_can[resnums[i]-1] = String(1, olc); + } + } + } else { + if(type == "water") { + mon_ids.push_back("HOH"); + seq.push_back("(HOH)"); + seq_can.push_back("?"); + } else { + for(auto res: res_list) { + mon_ids.push_back(res.GetName()); + seq.push_back(MonIDToOLC(res.GetChemClass(), res.GetName())); + char olc = res.GetOneLetterCode(); + if(olc < 'A' || olc > 'Z') { + seq_can.push_back("X"); + } else { + seq_can.push_back(String(1, olc)); + } + } + } + } + + int entity_idx = entity_infos.size(); + entity_infos.push_back(ost::io::MMCifWriterEntity()); + entity_infos.back().type = type; + entity_infos.back().poly_type = poly_type; + entity_infos.back().branch_type = branch_type; + entity_infos.back().mon_ids = mon_ids; + entity_infos.back().seq_olcs = seq; + entity_infos.back().seq_can_olcs = seq_can; + entity_infos.back().is_poly = is_poly; + + if(is_poly && resnum_alignment) { + AddAsymResnum(asym_chain_name, res_list, entity_infos.back()); + } else { + AddAsym(asym_chain_name, entity_infos.back()); + } + + return entity_idx; + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + int SetupEntity(const String& asym_chain_name, + const T& res_list, + bool resnum_alignment, + std::vector<ost::io::MMCifWriterEntity>& entity_infos) { + // use chem types in res_list to determine _entity.type and + // _entity_poly.type + String type = GuessEntityType(res_list); + bool is_poly = type == "polymer"; + String poly_type = ""; + if(is_poly) { + poly_type = GuessEntityPolyType(res_list); + } + bool is_branched = type == "branched"; + String branch_type = ""; + if(is_branched) { + branch_type = GuessEntityBranchType(res_list); + } + return SetupEntity(asym_chain_name, type, poly_type, branch_type, res_list, + resnum_alignment, entity_infos); + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + int SetupEntity(const String& asym_chain_name, + ost::mol::ChainType chain_type, + const T& res_list, + bool resnum_alignment, + std::vector<ost::io::MMCifWriterEntity>& entity_infos) { + // use chain_type info attached to chain to determine + // _entity.type and _entity_poly.type + String type = ost::mol::EntityTypeFromChainType(chain_type); + bool is_poly = type == "polymer"; + String poly_type = ""; + if(is_poly) { + poly_type = ost::mol::EntityPolyTypeFromChainType(chain_type); + } + bool is_branched = type == "branched"; + String branch_type = ""; + if(is_branched) { + branch_type = ost::mol::BranchedTypeFromChainType(chain_type); + } + return SetupEntity(asym_chain_name, type, poly_type, branch_type, res_list, + resnum_alignment, entity_infos); + } + + ost::io::StarWriterLoopPtr Setup_atom_type_ptr() { + ost::io::StarWriterLoopDesc desc("_atom_type"); + desc.Add("symbol"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_atom_site_ptr() { + ost::io::StarWriterLoopDesc desc("_atom_site"); + desc.Add("group_PDB"); + desc.Add("type_symbol"); + desc.Add("label_atom_id"); + desc.Add("label_comp_id"); + desc.Add("label_asym_id"); + desc.Add("label_entity_id"); + desc.Add("label_seq_id"); + desc.Add("label_alt_id"); + desc.Add("Cartn_x"); + desc.Add("Cartn_y"); + desc.Add("Cartn_z"); + desc.Add("occupancy"); + desc.Add("B_iso_or_equiv"); + desc.Add("auth_seq_id"); + desc.Add("auth_asym_id"); + desc.Add("id"); + desc.Add("pdbx_PDB_ins_code"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_pdbx_poly_seq_scheme_ptr() { + ost::io::StarWriterLoopDesc desc("_pdbx_poly_seq_scheme"); + desc.Add("asym_id"); + desc.Add("entity_id"); + desc.Add("mon_id"); + desc.Add("seq_id"); + desc.Add("pdb_strand_id"); + desc.Add("pdb_seq_num"); + desc.Add("pdb_ins_code"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_entity_ptr() { + ost::io::StarWriterLoopDesc desc("_entity"); + desc.Add("id"); + desc.Add("type"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_struct_asym_ptr() { + ost::io::StarWriterLoopDesc desc("_struct_asym"); + desc.Add("id"); + desc.Add("entity_id"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_entity_poly_ptr() { + ost::io::StarWriterLoopDesc desc("_entity_poly"); + desc.Add("entity_id"); + desc.Add("type"); + desc.Add("pdbx_seq_one_letter_code"); + desc.Add("pdbx_seq_one_letter_code_can"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_entity_poly_seq_ptr() { + ost::io::StarWriterLoopDesc desc("_entity_poly_seq"); + desc.Add("entity_id"); + desc.Add("mon_id"); + desc.Add("num"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_chem_comp_ptr() { + ost::io::StarWriterLoopDesc desc("_chem_comp"); + desc.Add("id"); + desc.Add("type"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + ost::io::StarWriterLoopPtr Setup_pdbx_entity_branch_ptr() { + ost::io::StarWriterLoopDesc desc("_pdbx_entity_branch"); + desc.Add("entity_id"); + desc.Add("type"); + ost::io::StarWriterLoopPtr sl(new ost::io::StarWriterLoop(desc)); + return sl; + } + + void Feed_atom_type(ost::io::StarWriterLoopPtr atom_type_ptr, + ost::io::StarWriterLoopPtr atom_site_ptr) { + // we're just extracting every type_symbol that we observed + // in atom_site (this is a bit of circular stupidity...) + std::set<String> symbols; + int desc_size = atom_site_ptr->GetDesc().GetSize(); + int type_symbol_idx = atom_site_ptr->GetDesc().GetIndex("type_symbol"); + int N = atom_site_ptr->GetN(); + const std::vector<ost::io::StarWriterValue>& data = atom_site_ptr->GetData(); + for(int i = 0; i < N; ++i) { + symbols.insert(data[i*desc_size + type_symbol_idx].GetValue()); + } + std::vector<ost::io::StarWriterValue> atom_type_data(1); + for(auto symbol: symbols) { + atom_type_data[0] = ost::io::StarWriterValue::FromString(symbol); + atom_type_ptr->AddData(atom_type_data); + } + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + void Feed_pdbx_poly_seq_scheme(ost::io::StarWriterLoopPtr pdbx_poly_seq_scheme_ptr, + const String& label_asym_id, + int label_entity_id, + ost::io::MMCifWriterEntity& entity_info, + const T& res_list) { + + std::vector<ost::io::StarWriterValue> data(7); + // processing chain by chain, label_asym_id and label_entity_id are constant + data[0] = ost::io::StarWriterValue::FromString(label_asym_id); + data[1] = ost::io::StarWriterValue::FromInt(label_entity_id); + + int asym_idx = entity_info.GetAsymIdx(label_asym_id); + const std::vector<String>& aln = entity_info.asym_alns[asym_idx]; + int label_seq_id = 0; // 0-based index + + for(auto res: res_list) { + String res_name = res.GetName(); + while(aln[label_seq_id] == "-") { + ++label_seq_id; + } + + data[2] = ost::io::StarWriterValue::FromString(res_name); + data[3] = ost::io::StarWriterValue::FromInt(label_seq_id + 1); + + // the remaining data items honor String properties if set: + // pdb_auth_chain_name, pdb_auth_resnum and pdb_auth_ins_code + + if(res.GetChain().HasProp("pdb_auth_chain_name")) { + data[4] = + ost::io::StarWriterValue::FromString(res.GetChain().GetStringProp("pdb_auth_chain_name")); + } else { + data[4] = ost::io::StarWriterValue::FromString(res.GetChain().GetName()); + } + + if(res.HasProp("pdb_auth_resnum")) { + // this feels so wrong that this is stored as a string property... + data[5] = ost::io::StarWriterValue::FromString(res.GetStringProp("pdb_auth_resnum")); + } else { + data[5] = ost::io::StarWriterValue::FromInt(res.GetNumber().GetNum()); + } + + if(res.HasProp("pdb_auth_ins_code")) { + data[6] = ost::io::StarWriterValue::FromString(res.GetStringProp("pdb_auth_ins_code")); + } else { + char ins_code = res.GetNumber().GetInsCode(); + if(ins_code == '\0') { + data[6] = ost::io::StarWriterValue::FromString(""); + } else { + data[6] = ost::io::StarWriterValue::FromString(String(1, ins_code)); + } + } + pdbx_poly_seq_scheme_ptr->AddData(data); + label_seq_id += 1; + } + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + void Feed_atom_site(ost::io::StarWriterLoopPtr atom_site_ptr, + const String& label_asym_id, + int label_entity_id, + const ost::io::MMCifWriterEntity& entity_info, + const T& res_list) { + + int asym_idx = entity_info.GetAsymIdx(label_asym_id); + const std::vector<String>& aln = entity_info.asym_alns[asym_idx]; + int label_seq_id = 0; // 0-based index + std::vector<ost::io::StarWriterValue> at_data(17); + + for(auto res: res_list) { + String comp_id = res.GetName(); + + auto at_list = res.GetAtomList(); + String auth_asym_id = res.GetChain().GetName(); + if(res.HasProp("pdb_auth_chain_name")) { + auth_asym_id = res.GetStringProp("pdb_auth_chain_name"); + } + String auth_seq_id = res.GetNumber().AsString(); + if(res.HasProp("pdb_auth_resnum")) { + auth_seq_id = res.GetStringProp("pdb_auth_resnum"); + } + String ins_code = ""; + if(res.HasProp("pdb_auth_ins_code")) { + ins_code = res.GetStringProp("pdb_auth_ins_code"); + } + + if(entity_info.is_poly) { + while(aln[label_seq_id] == "-") { + ++label_seq_id; + } + } + + for(auto at: at_list) { + // group_PDB + if(at.IsHetAtom()) { + at_data[0] = ost::io::StarWriterValue::FromString("HETATM"); + } else { + at_data[0] = ost::io::StarWriterValue::FromString("ATOM"); + } + // type_symbol + at_data[1] = ost::io::StarWriterValue::FromString(at.GetElement()); + // label_atom_id + at_data[2] = ost::io::StarWriterValue::FromString(at.GetName()); + // label_comp_id + at_data[3] = ost::io::StarWriterValue::FromString(comp_id); + // label_asym_id + at_data[4] = ost::io::StarWriterValue::FromString(label_asym_id); + // label_entity_id + at_data[5] = ost::io::StarWriterValue::FromInt(label_entity_id); + // label_seq_id + if(entity_info.is_poly) { + at_data[6] = ost::io::StarWriterValue::FromInt(label_seq_id+1); + } else { + at_data[6] = ost::io::StarWriterValue::FromString("."); + } + // label_alt_id + at_data[7] = ost::io::StarWriterValue::FromString("."); + // Cartn_x + at_data[8] = ost::io::StarWriterValue::FromFloat(at.GetPos().GetX(), 3); + // Cartn_y + at_data[9] = ost::io::StarWriterValue::FromFloat(at.GetPos().GetY(), 3); + // Cartn_z + at_data[10] = ost::io::StarWriterValue::FromFloat(at.GetPos().GetZ(), 3); + // occupancy + at_data[11] = ost::io::StarWriterValue::FromFloat(at.GetOccupancy(), 2); + // B_iso_or_equiv + at_data[12] = ost::io::StarWriterValue::FromFloat(at.GetBFactor(), 2); + // auth_seq_id + at_data[13] = ost::io::StarWriterValue::FromString(auth_seq_id); + // auth_asym_id + at_data[14] = ost::io::StarWriterValue::FromString(auth_asym_id); + // id + at_data[15] = ost::io::StarWriterValue::FromInt(atom_site_ptr->GetN()); + // pdbx_PDB_ins_code + at_data[16] = ost::io::StarWriterValue::FromString(ins_code); + atom_site_ptr->AddData(at_data); + } + ++label_seq_id; + } + } + + void Feed_entity(ost::io::StarWriterLoopPtr entity_ptr, + const std::vector<ost::io::MMCifWriterEntity>& entity_info) { + std::vector<ost::io::StarWriterValue> ent_data(2); + for(size_t entity_idx = 0; entity_idx < entity_info.size(); ++entity_idx) { + ent_data[0] = ost::io::StarWriterValue::FromInt(entity_idx); + ent_data[1] = ost::io::StarWriterValue::FromString(entity_info[entity_idx].type); + entity_ptr->AddData(ent_data); + } + } + + void Feed_struct_asym(ost::io::StarWriterLoopPtr struct_asym_ptr, + const std::vector<ost::io::MMCifWriterEntity>& entity_info) { + std::vector<ost::io::StarWriterValue> asym_data(2); + for(size_t entity_idx = 0; entity_idx < entity_info.size(); ++entity_idx) { + for(auto asym_id : entity_info[entity_idx].asym_ids) { + asym_data[0] = ost::io::StarWriterValue::FromString(asym_id); + asym_data[1] = ost::io::StarWriterValue::FromInt(entity_idx); + struct_asym_ptr->AddData(asym_data); + } + } + } + + void Feed_entity_poly_seq(ost::io::StarWriterLoopPtr entity_poly_seq_ptr, + const std::vector<ost::io::MMCifWriterEntity>& entity_info) { + std::vector<ost::io::StarWriterValue> entity_poly_seq_data(3); + for(size_t entity_idx = 0; entity_idx < entity_info.size(); ++entity_idx) { + if(entity_info[entity_idx].is_poly) { + const std::vector<String>& mon_ids = entity_info[entity_idx].mon_ids; + for(size_t mon_idx = 0; mon_idx < mon_ids.size(); ++mon_idx) { + entity_poly_seq_data[0] = ost::io::StarWriterValue::FromInt(entity_idx); + entity_poly_seq_data[1] = ost::io::StarWriterValue::FromString(mon_ids[mon_idx]); + entity_poly_seq_data[2] = ost::io::StarWriterValue::FromInt(mon_idx+1); + entity_poly_seq_ptr->AddData(entity_poly_seq_data); + } + } + } + } + + void Feed_entity_poly(ost::io::StarWriterLoopPtr entity_poly_ptr, + const std::vector<ost::io::MMCifWriterEntity>& entity_info) { + std::vector<ost::io::StarWriterValue> entity_poly_data(4); + for(size_t entity_idx = 0; entity_idx < entity_info.size(); ++entity_idx) { + if(entity_info[entity_idx].is_poly) { + entity_poly_data[0] = ost::io::StarWriterValue::FromInt(entity_idx); + entity_poly_data[1] = ost::io::StarWriterValue::FromString(entity_info[entity_idx].poly_type); + std::stringstream seq; + std::stringstream seq_can; + for(size_t idx = 0; idx < entity_info[entity_idx].mon_ids.size(); ++idx) { + if(entity_info[entity_idx].seq_olcs[idx] == "-") { + // X IS PROBABLY NOT THE RIGHT THING FOR + // pdbx_seq_one_letter_code + seq << "X"; + } else { + seq << entity_info[entity_idx].seq_olcs[idx]; + } + if(entity_info[entity_idx].seq_can_olcs[idx] == "-") { + seq_can << "X"; + } else { + seq_can << entity_info[entity_idx].seq_can_olcs[idx]; + } + } + entity_poly_data[2] = ost::io::StarWriterValue::FromString(seq.str()); + entity_poly_data[3] = ost::io::StarWriterValue::FromString(seq_can.str()); + entity_poly_ptr->AddData(entity_poly_data); + } + } + } + + void Feed_chem_comp(ost::io::StarWriterLoopPtr chem_comp_ptr, + const std::map<String, ost::io::MMCifWriterComp>& comp_infos) { + std::vector<ost::io::StarWriterValue> comp_data(2); + for(auto it = comp_infos.begin(); it != comp_infos.end(); ++it) { + comp_data[0] = ost::io::StarWriterValue::FromString(it->first); + comp_data[1] = ost::io::StarWriterValue::FromString(it->second.type); + chem_comp_ptr->AddData(comp_data); + } + } + + void Feed_pdbx_entity_branch(ost::io::StarWriterLoopPtr pdbx_entity_branch_ptr, + const std::vector<ost::io::MMCifWriterEntity>& entity_infos) { + std::vector<ost::io::StarWriterValue> branch_data(2); + for(size_t i = 0; i < entity_infos.size(); ++i) { + if(entity_infos[i].type == "branched") { + branch_data[0] = ost::io::StarWriterValue::FromInt(i); + branch_data[1] = ost::io::StarWriterValue::FromString(entity_infos[i].branch_type); + pdbx_entity_branch_ptr->AddData(branch_data); + } + } + } + + // template to allow ost::mol::ResidueHandleList and ost::mol::ResidueViewList + template<class T> + void ProcessEntmmCIFify(const std::vector<T>& res_lists, + std::map<String, ost::io::MMCifWriterComp>& comp_infos, + std::vector<ost::io::MMCifWriterEntity>& entity_info, + ost::io::StarWriterLoopPtr atom_site, + ost::io::StarWriterLoopPtr pdbx_poly_seq_scheme) { + + ChainNameGenerator chain_name_gen; + + for(auto res_list: res_lists) { + + SetupChemComp(res_list, comp_infos); + + T L_chain; + T D_chain; + T P_chain; + T R_chain; + T S_chain; + T Z_chain; + T W_chain; + + // first scan only concerning peptides... + // Avoid mix of both in same chain: L-peptide linking, D-peptide linking + // But we still want to know that in advance as we assign non chiral + // peptides to either of those + bool has_l_peptide_linking = false; + bool has_d_peptide_linking = false; + for(auto res: res_list) { + if(res.GetChemClass() == ost::mol::ChemClass::D_PEPTIDE_LINKING) { + if(has_l_peptide_linking) { + throw ost::io::IOException("Cannot write mmCIF when same chain " + "contains D- and L-peptides"); + } + has_d_peptide_linking = true; + } + if(res.GetChemClass() == ost::mol::ChemClass::L_PEPTIDE_LINKING) { + if(has_d_peptide_linking) { + throw ost::io::IOException("Cannot write mmCIF when same chain " + "contains D- and L-peptides"); + } + has_l_peptide_linking = true; + } + } + + for(auto res: res_list) { + if(res.GetChemClass().IsPeptideLinking()) { + if(has_l_peptide_linking) { + L_chain.push_back(res); + } else if(has_d_peptide_linking) { + D_chain.push_back(res); + } else { + P_chain.push_back(res); + } + } else if(res.GetChemClass() == ost::mol::ChemClass::RNA_LINKING) { + R_chain.push_back(res); + } else if(res.GetChemClass() == ost::mol::ChemClass::DNA_LINKING) { + S_chain.push_back(res); + } else if(res.GetChemClass() == ost::mol::ChemClass::L_SACCHARIDE) { + Z_chain.push_back(res); + } else if(res.GetChemClass() == ost::mol::ChemClass::D_SACCHARIDE) { + Z_chain.push_back(res); + } else if(res.GetChemClass() == ost::mol::ChemClass::SACCHARIDE) { + Z_chain.push_back(res); + } else if(res.GetChemClass() == ost::mol::ChemClass::WATER) { + W_chain.push_back(res); + } else if(res.GetChemClass() == ost::mol::ChemClass::NON_POLYMER || + res.GetChemClass() == ost::mol::ChemClass::UNKNOWN) { + // already process non-poly and unknown + T tmp; + tmp.push_back(res); + String chain_name = chain_name_gen.Get(); + int entity_id = SetupEntity(chain_name, + tmp, + false, + entity_info); + Feed_atom_site(atom_site, chain_name, entity_id, entity_info[entity_id], + tmp); + } else { + // this should not happen... + std::stringstream ss; + ss << "Unsupported chem class (" << res.GetChemClass(); + ss << ") for residue "<< res; + throw ost::io::IOException(ss.str()); + } + } + + // process poly chains + T* poly_chains[5] = {&L_chain, &D_chain, &P_chain, &R_chain, &S_chain}; + for(int i = 0; i < 5; ++i) { + if(!poly_chains[i]->empty()) { + String chain_name = chain_name_gen.Get(); + int entity_id = SetupEntity(chain_name, + *poly_chains[i], + false, + entity_info); + Feed_atom_site(atom_site, chain_name, entity_id, entity_info[entity_id], + *poly_chains[i]); + // still check whether we're dealing with poly here, we could have a + // lonely amino acid that ends up as non-poly and doesn't need + // pdbx_poly_seq_scheme + if(entity_info[entity_id].is_poly) { + Feed_pdbx_poly_seq_scheme(pdbx_poly_seq_scheme, chain_name, + entity_id, entity_info[entity_id], *poly_chains[i]); + } + } + } + + // process water chain + if(!W_chain.empty()) { + String chain_name = chain_name_gen.Get(); + int entity_id = SetupEntity(chain_name, + W_chain, + false, + entity_info); + Feed_atom_site(atom_site, chain_name, entity_id, entity_info[entity_id], + W_chain); + } + // process sugar chain + if(!Z_chain.empty()) { + String chain_name = chain_name_gen.Get(); + int entity_id = SetupEntity(chain_name, + Z_chain, + false, + entity_info); + Feed_atom_site(atom_site, chain_name, entity_id, entity_info[entity_id], + Z_chain); + } + } + } + + void ProcessEntmmCIFify(const ost::mol::EntityHandle& ent, + std::map<String, ost::io::MMCifWriterComp>& comp_infos, + std::vector<ost::io::MMCifWriterEntity>& entity_info, + ost::io::StarWriterLoopPtr atom_site, + ost::io::StarWriterLoopPtr pdbx_poly_seq_scheme) { + std::vector<ost::mol::ResidueHandleList> res_lists; + ost::mol::ChainHandleList chain_list = ent.GetChainList(); + for(auto ch: chain_list) { + res_lists.push_back(ch.GetResidueList()); + } + ProcessEntmmCIFify(res_lists, comp_infos, entity_info, + atom_site, pdbx_poly_seq_scheme); + } + + void ProcessEntmmCIFify(const ost::mol::EntityView& ent, + std::map<String, ost::io::MMCifWriterComp>& comp_infos, + std::vector<ost::io::MMCifWriterEntity>& entity_info, + ost::io::StarWriterLoopPtr atom_site, + ost::io::StarWriterLoopPtr pdbx_poly_seq_scheme) { + std::vector<ost::mol::ResidueViewList> res_lists; + ost::mol::ChainViewList chain_list = ent.GetChainList(); + for(auto ch: chain_list) { + res_lists.push_back(ch.GetResidueList()); + } + ProcessEntmmCIFify(res_lists, comp_infos, entity_info, + atom_site, pdbx_poly_seq_scheme); + } + + // template to allow ost::mol::EntityHandle and ost::mol::EntityView + template<class T> + void ProcessEnt(const T& ent, + bool mmcif_conform, + std::map<String, ost::io::MMCifWriterComp>& comp_infos, + std::vector<ost::io::MMCifWriterEntity>& entity_info, + ost::io::StarWriterLoopPtr atom_site, + ost::io::StarWriterLoopPtr pdbx_poly_seq_scheme) { + + if(mmcif_conform) { + auto chain_list = ent.GetChainList(); + for(auto ch: chain_list) { + auto res_list = ch.GetResidueList(); + SetupChemComp(res_list, comp_infos); + String chain_name = ch.GetName(); + int entity_id = SetupEntity(chain_name, + ch.GetType(), + res_list, + true, + entity_info); + Feed_atom_site(atom_site, chain_name, entity_id, entity_info[entity_id], + res_list); + if(entity_info[entity_id].is_poly) { + Feed_pdbx_poly_seq_scheme(pdbx_poly_seq_scheme, chain_name, + entity_id, entity_info[entity_id], res_list); + } + } + } else { + // delegate to more complex ProcessEntmmCIFify + ProcessEntmmCIFify(ent, comp_infos, entity_info, atom_site, + pdbx_poly_seq_scheme); + } + } + + void ProcessUnknowns(std::vector<ost::io::MMCifWriterEntity>& entity_infos, + std::map<String, ost::io::MMCifWriterComp>& comp_infos) { + + for(size_t entity_idx = 0; entity_idx < entity_infos.size(); ++entity_idx) { + if(entity_infos[entity_idx].is_poly) { + // scan for "-" in mon_ids + for(size_t mon_id_idx = 0; + mon_id_idx < entity_infos[entity_idx].mon_ids.size(); ++mon_id_idx) { + + if(entity_infos[entity_idx].mon_ids[mon_id_idx] == "-") { + + if(entity_infos[entity_idx].poly_type == "polypeptide(D)" || + entity_infos[entity_idx].poly_type == "polypeptide(L)" || + entity_infos[entity_idx].poly_type == "cyclic-pseudo-peptide" || + entity_infos[entity_idx].poly_type == "peptide nucleic acid") { + entity_infos[entity_idx].mon_ids[mon_id_idx] = "UNK"; + entity_infos[entity_idx].seq_olcs[mon_id_idx] = "(UNK)"; + entity_infos[entity_idx].seq_can_olcs[mon_id_idx] = "X"; + if(comp_infos.find("UNK") == comp_infos.end()) { + ost::io::MMCifWriterComp info; + info.type = "L-PEPTIDE LINKING"; + comp_infos["UNK"] = info; + } + } + + else if(entity_infos[entity_idx].poly_type == "polydeoxyribonucleotide") { + entity_infos[entity_idx].mon_ids[mon_id_idx] = "DN"; + entity_infos[entity_idx].seq_olcs[mon_id_idx] = "(DN)"; + entity_infos[entity_idx].seq_can_olcs[mon_id_idx] = "N"; + if(comp_infos.find("DN") == comp_infos.end()) { + ost::io::MMCifWriterComp info; + info.type = "DNA LINKING"; + comp_infos["DN"] = info; + } + } + + else if(entity_infos[entity_idx].poly_type == "polyribonucleotide" || + entity_infos[entity_idx].poly_type == "polydeoxyribonucleotide/polyribonucleotide hybrid") { + entity_infos[entity_idx].mon_ids[mon_id_idx] = "N"; + entity_infos[entity_idx].seq_olcs[mon_id_idx] = "N"; + entity_infos[entity_idx].seq_can_olcs[mon_id_idx] = "N"; + if(comp_infos.find("N") == comp_infos.end()) { + ost::io::MMCifWriterComp info; + info.type = "RNA LINKING"; + comp_infos["N"] = info; + } + } + + else { + std::stringstream ss; + ss << "Gaps are not supported for polymer chains of type "; + ss << entity_infos[entity_idx].poly_type; + throw ost::io::IOException(ss.str()); + } + } + } + } + } + } + +} // ns + +namespace ost { namespace io { + +int MMCifWriterEntity::GetAsymIdx(const String& asym_id) const { + for(size_t i = 0; i < asym_ids.size(); ++i) { + if(asym_ids[i] == asym_id) { + return i; + } + } + std::stringstream ss; + ss << "Tried to find asym id " << asym_id << "in entity that has only "; + ss << "the following asym ids: "; + for(auto i: asym_ids) { + ss << i << ", "; + } + String err = ss.str(); + err = err.substr(0, err.size()-2); // remove last ", " + throw ost::io::IOException(err); +} + +void MMCifWriter::SetStructure(const ost::mol::EntityHandle& ent, + bool mmcif_conform) { + + this->Setup(); + ProcessEnt(ent, mmcif_conform, comp_info_, entity_info_, atom_site_, + pdbx_poly_seq_scheme_); + this->Finalize(); +} + +void MMCifWriter::SetStructure(const ost::mol::EntityView& ent, + bool mmcif_conform) { + + this->Setup(); + ProcessEnt(ent, mmcif_conform, comp_info_, entity_info_, atom_site_, + pdbx_poly_seq_scheme_); + this->Finalize(); +} + +void MMCifWriter::Setup() { + if(structure_set_) { + throw ost::io::IOException("SetStructure can be called only once on a " + "given MMCifWriter instance"); + } + + atom_type_ = Setup_atom_type_ptr(); + atom_site_ = Setup_atom_site_ptr(); + pdbx_poly_seq_scheme_ = Setup_pdbx_poly_seq_scheme_ptr(); + entity_ = Setup_entity_ptr(); + struct_asym_ = Setup_struct_asym_ptr(); + entity_poly_ = Setup_entity_poly_ptr(); + entity_poly_seq_ = Setup_entity_poly_seq_ptr(); + chem_comp_ = Setup_chem_comp_ptr(); + pdbx_entity_branch_ = Setup_pdbx_entity_branch_ptr(); +} + +void MMCifWriter::Finalize() { + // depending on the strategy (mmcif_conform), there might be gaps in the + // entities mon_ids/ seq_olcs/ seq_can_olcs + // The following function adds valid stuff depending on chain type + // (e.g. UNK if we're having a peptide linking chain and then adds + // that UNK directly to comp_info) + ProcessUnknowns(entity_info_, comp_info_); + + Feed_entity(entity_, entity_info_); + Feed_struct_asym(struct_asym_, entity_info_); + Feed_entity_poly(entity_poly_, entity_info_); + Feed_entity_poly_seq(entity_poly_seq_, entity_info_); + Feed_chem_comp(chem_comp_, comp_info_); + Feed_atom_type(atom_type_, atom_site_); + Feed_pdbx_entity_branch(pdbx_entity_branch_, entity_info_); + + // finalize + this->Push(chem_comp_); + this->Push(entity_); + this->Push(struct_asym_); + this->Push(entity_poly_); + this->Push(entity_poly_seq_); + this->Push(pdbx_poly_seq_scheme_); + this->Push(atom_type_); + this->Push(pdbx_entity_branch_); + this->Push(atom_site_); + + structure_set_ = true; +} + +}} // ns diff --git a/modules/io/src/mol/mmcif_writer.hh b/modules/io/src/mol/mmcif_writer.hh new file mode 100644 index 0000000000000000000000000000000000000000..e0ce06eaa0defa8412e7839e15727c7a5e876c99 --- /dev/null +++ b/modules/io/src/mol/mmcif_writer.hh @@ -0,0 +1,110 @@ +//------------------------------------------------------------------------------ +// This file is part of the OpenStructure project <www.openstructure.org> +// +// Copyright (C) 2008-2023 by the OpenStructure authors +// +// This library is free software; you can redistribute it and/or modify it under +// the terms of the GNU Lesser General Public License as published by the Free +// Software Foundation; either version 3.0 of the License, or (at your option) +// any later version. +// This library is distributed in the hope that it will be useful, but WITHOUT +// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS +// FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more +// details. +// +// You should have received a copy of the GNU Lesser General Public License +// along with this library; if not, write to the Free Software Foundation, Inc., +// 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +//------------------------------------------------------------------------------ +#ifndef OST_IO_MMCIF_WRITER_HH +#define OST_IO_MMCIF_WRITER_HH + +#include <fstream> + +#include <ost/mol/entity_handle.hh> + +#include <ost/io/mol/mmcif_info.hh> +#include <ost/io/mol/io_profile.hh> +#include <ost/io/mol/star_writer.hh> + +namespace ost { namespace io { + + +struct MMCifWriterEntity { + + int GetAsymIdx(const String& asym_id) const; + + // _entity.type + String type; + + // _entity_poly.type + String poly_type; + + // __pdbx_entity_branch.type + String branch_type; + + // Names of chains in AU that are assigned to this entity + std::vector<String> asym_ids; + + // in principle type == "polymer" + bool is_poly; + + // SEQRES... kind of... internally we're not working on one letter codes + // etc. but on full compound names. Only one element if is_poly is false. + std::vector<String> mon_ids; + + // The respective strings for pdbx_seq_one_letter_code + // Irrelevant if is_poly is false. + std::vector<String> seq_olcs; + + // same for pdbx_seq_one_letter_code_can + // Irrelevant if is_poly is false. + std::vector<String> seq_can_olcs; + + // One alignment to mon_ids for each element in asym_ids, i.e. SEQRES-ATOMSEQ + // alignment. Contains "-" for residues that are missing in ATOMSEQ. + // irrelevant if is_poly is false. The assumption is that aligned residues + // exactly match with the respective position in mon_ids. + std::vector<std::vector<String> > asym_alns; +}; + + +struct MMCifWriterComp { + String type; +}; + + +class DLLEXPORT_OST_IO MMCifWriter : public StarWriter { +public: + + MMCifWriter(): structure_set_(false) { } + + virtual ~MMCifWriter() { } + + void SetStructure(const ost::mol::EntityHandle& ent, bool mmcif_conform=true); + + void SetStructure(const ost::mol::EntityView& ent, bool mmcif_conform=true); + +private: + + void Setup(); + + void Finalize(); + + std::vector<MMCifWriterEntity> entity_info_; + std::map<String, MMCifWriterComp> comp_info_; + StarWriterLoopPtr atom_type_; + StarWriterLoopPtr atom_site_; + StarWriterLoopPtr pdbx_poly_seq_scheme_; + StarWriterLoopPtr entity_; + StarWriterLoopPtr struct_asym_; + StarWriterLoopPtr entity_poly_; + StarWriterLoopPtr entity_poly_seq_; + StarWriterLoopPtr chem_comp_; + StarWriterLoopPtr pdbx_entity_branch_; + bool structure_set_; +}; + +}} // ns + +#endif diff --git a/modules/io/src/mol/star_writer.cc b/modules/io/src/mol/star_writer.cc new file mode 100644 index 0000000000000000000000000000000000000000..4b64bd370f4deb824bd72b7101a250472ca9c5b2 --- /dev/null +++ b/modules/io/src/mol/star_writer.cc @@ -0,0 +1,61 @@ +//------------------------------------------------------------------------------ +// This file is part of the OpenStructure project <www.openstructure.org> +// +// Copyright (C) 2008-2023 by the OpenStructure authors +// +// This library is free software; you can redistribute it and/or modify it under +// the terms of the GNU Lesser General Public License as published by the Free +// Software Foundation; either version 3.0 of the License, or (at your option) +// any later version. +// This library is distributed in the hope that it will be useful, but WITHOUT +// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS +// FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more +// details. +// +// You should have received a copy of the GNU Lesser General Public License +// along with this library; if not, write to the Free Software Foundation, Inc., +// 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +//------------------------------------------------------------------------------ + +#include <boost/iostreams/filter/gzip.hpp> +#include <boost/algorithm/string/predicate.hpp> +#include <boost/filesystem.hpp> + +#include <ost/io/mol/star_writer.hh> + +namespace ost{ namespace io{ + +void StarWriter::Write(const String& data_name, std::ostream& stream) { + if(!stream) { + std::stringstream ss; + ss << "Cannot open stream: [Errno " << errno << "] " + << strerror(errno) << std::endl; + throw IOException(ss.str()); + } + // write data header + stream << "data_" << data_name << std::endl; + // write StarWriterObjects + for(auto star_obj : categories_to_write_) { + star_obj->ToStream(stream); + stream << String("#") << std::endl; + } +} + + +void StarWriter::Write(const String& data_name, const String& filename) { + std::ofstream fstream(filename.c_str()); + if (!fstream) { + std::stringstream ss; + ss << "Cannot open " << filename << ": [Errno " << errno << "] " + << strerror(errno) << std::endl; + throw IOException(ss.str()); + } + boost::iostreams::filtering_stream<boost::iostreams::output> stream; + if (boost::iequals(".gz", boost::filesystem::extension(filename))) { + stream.push(boost::iostreams::gzip_compressor()); + } + stream.push(fstream); + this->Write(data_name, stream); +} + +}} // ns diff --git a/modules/io/src/mol/star_writer.hh b/modules/io/src/mol/star_writer.hh new file mode 100644 index 0000000000000000000000000000000000000000..49728fcb77d93a2e183ba0667556ceca33ff1470 --- /dev/null +++ b/modules/io/src/mol/star_writer.hh @@ -0,0 +1,246 @@ +//------------------------------------------------------------------------------ +// This file is part of the OpenStructure project <www.openstructure.org> +// +// Copyright (C) 2008-2023 by the OpenStructure authors +// +// This library is free software; you can redistribute it and/or modify it under +// the terms of the GNU Lesser General Public License as published by the Free +// Software Foundation; either version 3.0 of the License, or (at your option) +// any later version. +// This library is distributed in the hope that it will be useful, but WITHOUT +// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS +// FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more +// details. +// +// You should have received a copy of the GNU Lesser General Public License +// along with this library; if not, write to the Free Software Foundation, Inc., +// 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +//------------------------------------------------------------------------------ +#ifndef OST_IO_STAR_WRITER_HH +#define OST_IO_STAR_WRITER_HH + +#include <map> +#include <fstream> +#include <boost/iostreams/filtering_stream.hpp> +#include <boost/iostreams/filter/gzip.hpp> +#include <boost/shared_ptr.hpp> +#include <ost/string_ref.hh> +#include <ost/io/io_exception.hh> + +namespace{ + // float to string with specified number of decimals + void fts(Real f, int decimals, String& s) { + char data[20]; + size_t len; + switch(decimals){ + case 0: + len = std::snprintf(data, sizeof(data), "%.0f", f); + break; + case 1: + len = std::snprintf(data, sizeof(data), "%.1f", f); + break; + case 2: + len = std::snprintf(data, sizeof(data), "%.2f", f); + break; + case 3: + len = std::snprintf(data, sizeof(data), "%.3f", f); + break; + case 4: + len = std::snprintf(data, sizeof(data), "%.4f", f); + break; + case 5: + len = std::snprintf(data, sizeof(data), "%.5f", f); + break; + case 6: + len = std::snprintf(data, sizeof(data), "%.6f", f); + break; + default: + throw ost::io::IOException("Max decimals in float conversion: 6"); + } + + if(len < 0 || len > 20) { + throw ost::io::IOException("float conversion failed"); + } + s.assign(data, len); + } +} + +namespace ost { namespace io { + +class StarWriterObject; +class StarWriterValue; +class StarWriterDataItem; +class StarWriterLoopDesc; +class StarWriterLoop; +typedef boost::shared_ptr<StarWriterObject> StarWriterObjectPtr; +typedef boost::shared_ptr<StarWriterValue> StarWriterValuePtr; +typedef boost::shared_ptr<StarWriterDataItem> StarWriterDataItemPtr; +typedef boost::shared_ptr<StarWriterLoopDesc> StarWriterLoopDescPtr; +typedef boost::shared_ptr<StarWriterLoop> StarWriterLoopPtr; + + +class DLLEXPORT_OST_IO StarWriterObject { +public: + virtual ~StarWriterObject() { } + virtual void ToStream(std::ostream& s) = 0; +}; + + +class DLLEXPORT_OST_IO StarWriterValue{ +public: + + StarWriterValue() { } + + static StarWriterValue FromInt(int int_value) { + StarWriterValue value; + value.value_ = std::to_string(int_value); + return value; + } + static StarWriterValue FromFloat(Real float_value, int decimals) { + StarWriterValue value; + fts(float_value, decimals, value.value_); + return value; + } + static StarWriterValue FromString(const String& string_value) { + StarWriterValue value; + // cases we still need to deal with: + // - special characters in strings (put in quotation marks) + // - long strings (semicolon based syntax) + // see https://mmcif.wwpdb.org/docs/tutorials/mechanics/pdbx-mmcif-syntax.html + bool has_space = false; + for(char c: string_value) { + if(isspace(c)) { + has_space = true; + break; + } + } + if(string_value == "") { + value.value_ = "."; + } else if(has_space) { + value.value_ = "'" + string_value + "'"; + } + else { + value.value_ = string_value; + } + return value; + } + const String& GetValue() const { return value_; } +private: +String value_; +}; + + +class DLLEXPORT_OST_IO StarWriterDataItem : public StarWriterObject { +public: + StarWriterDataItem(const String& category, const String& attribute, + const StarWriterValue& value): category_(category), + attribute_(attribute), + value_(value) { } + virtual void ToStream(std::ostream& s) { + s << category_ << '.' << attribute_ << ' ' << value_.GetValue() << std::endl; + } + const String& GetCategory() const { return category_; } + const String& GetAttribute() const { return attribute_; } + const StarWriterValue& GetValue() const { return value_; } +private: + String category_; + String attribute_; + StarWriterValue value_; +}; + + +class DLLEXPORT_OST_IO StarWriterLoopDesc : public StarWriterObject { +public: + StarWriterLoopDesc(const String& category): category_(category) { } + + int GetIndex(const String& attribute) const { + std::map<String, int>::const_iterator i=index_map_.find(attribute); + return i==index_map_.end() ? -1 : i->second; + } + + void Add(const String& attribute) { + index_map_.insert(std::make_pair(attribute, index_map_.size())); + } + + size_t GetSize() const { + return index_map_.size(); + } + + virtual void ToStream(std::ostream& s) { + std::vector<std::pair<int, String> > tmp; + for(auto it = index_map_.begin(); it != index_map_.end(); ++it) { + tmp.push_back(std::make_pair(it->second, it->first)); + } + std::sort(tmp.begin(), tmp.end()); + for(auto it = tmp.begin(); it != tmp.end(); ++it) { + s << category_ << "." << it->second << std::endl; + } + } + + const String& GetCategory() const { return category_; } +private: + String category_; + std::map<String, int> index_map_; +}; + + +class DLLEXPORT_OST_IO StarWriterLoop: public StarWriterObject { +public: + + StarWriterLoop(const StarWriterLoopDesc& desc): desc_(desc) { } + + const StarWriterLoopDesc& GetDesc() { return desc_; } + + void AddData(const std::vector<StarWriterValue>& data) { + if(data.size() != desc_.GetSize()) { + throw ost::io::IOException("Invalid data size when adding to StarLoop"); + } + data_.insert(data_.end(), data.begin(), data.end()); + } + + const std::vector<StarWriterValue>& GetData() { return data_; } + + int GetN() { + return data_.size() / desc_.GetSize(); + } + + virtual void ToStream(std::ostream& s) { + if(data_.empty()) { + return; // skip loop, including header + } + s << "loop_" << std::endl; + desc_.ToStream(s); + int desc_size = desc_.GetSize(); + for(size_t i = 0; i < data_.size(); ++i) { + s << data_[i].GetValue(); + if((i+1) % desc_size == 0) { + s << std::endl; + } else { + s << ' '; + } + } + } + +private: + StarWriterLoopDesc desc_; + std::vector<StarWriterValue> data_; +}; + + +class DLLEXPORT_OST_IO StarWriter { +public: + StarWriter() { } + virtual ~StarWriter() { } + + void Push(StarWriterObjectPtr obj) { categories_to_write_.push_back(obj); } + + void Write(const String& data_name, const String& filename); + void Write(const String& data_name, std::ostream& stream); + +private: + std::vector<StarWriterObjectPtr> categories_to_write_; +}; + +}} // ns + +#endif diff --git a/modules/mol/alg/doc/molalg.rst b/modules/mol/alg/doc/molalg.rst index 366b1cb83e8a861740075e6fec0e2e24479bc6b4..6cb57ea40ab42d659b60bf0c95f7cd9b108c24a4 100644 --- a/modules/mol/alg/doc/molalg.rst +++ b/modules/mol/alg/doc/molalg.rst @@ -1474,6 +1474,16 @@ from a :class:`ost.io.MMCifInfoBioUnit` or the derived is reserved for the original AU chain with identity transform (read: no transform) applied. If a certain AU chain only occurs with an actual transform applied, numbering starts at 2. + + .. warning:: + There is the (rare) possibility that a AU chain that has only identity + transform applied is not named 1.<au_cname>. + As of january 2024, there are 3 pdb entries (8qn6, 8x1h, 2c0x) where + the same AU chain with identity transform occurs several times in the same + biounit. This is likely an error in the respective mmCIF files as the + resulting chains sit on top of each other. OST just names the FIRST + occurence as 1.<au_cname>. + :param asu: The assymetric unit :type asu: :class:`ost.mol.EntityHandle` diff --git a/modules/mol/alg/src/biounit.cc b/modules/mol/alg/src/biounit.cc index 74efb52994ba41c3246641859b00128b85c79ba3..e5fd1644f680b884c84cfb582a19ff05f52b576d 100644 --- a/modules/mol/alg/src/biounit.cc +++ b/modules/mol/alg/src/biounit.cc @@ -265,7 +265,6 @@ ost::mol::EntityHandle CreateBU(const ost::mol::EntityHandle& asu, // process all transformations for(uint t_idx = 0; t_idx < transforms[chain_intvl].size(); ++t_idx) { const geom::Mat4& m = transforms[chain_intvl][t_idx]; - // check if m is identity matrix => no transformation applied bool is_identity = true; geom::Mat4 identity_matrix = geom::Mat4::Identity(); @@ -285,16 +284,18 @@ ost::mol::EntityHandle CreateBU(const ost::mol::EntityHandle& asu, String au_cname = au_chains[chain_intvl][c_idx]; std::stringstream bu_cname_ss; - if(is_identity) { - if(au_chain_copies.find(au_cname) != au_chain_copies.end()) { - std::stringstream err; - err<<"Try to insert copy of AU chain "<<au_cname<<" with identity "; - err<<"transform, i.e. copy the raw coordinates. This has already "; - err<<"been done for this AU chain and there can only be one."; - throw ost::Error(err.str()); - } + if(is_identity && au_chain_copies.find(au_cname) == au_chain_copies.end()) { bu_cname_ss << "1." << au_cname; // 1.<au_cname> reserved for AU chain // without transformation + // at least the first of it... + // as of January 2024, there were 3 + // entries (8qn6, 8x1h, 2c0x) where + // the identity transform is applied + // more than once on the same AU + // chain, effectively leading to + // chains sitting on top of each + // other... But hey, bullshit in, + // bullshit out au_chain_copies.insert(au_cname); } else { if(chain_counter.find(au_cname) == chain_counter.end()) { diff --git a/modules/mol/base/src/chain_type.cc b/modules/mol/base/src/chain_type.cc index 6694aa017ab0c7cf89ba6e8d5cce6d2b495940f2..aa0715fe58d3003755b42d6f5b9c9081943b601d 100644 --- a/modules/mol/base/src/chain_type.cc +++ b/modules/mol/base/src/chain_type.cc @@ -59,7 +59,9 @@ ChainType ChainTypeFromString(StringRef identifier) } else if (StringRef("oligosaccharide", 15) == identifier) { return CHAINTYPE_OLIGOSACCHARIDE; } else if (StringRef("other", 5) == identifier) { - return CHAINTYPE_UNKNOWN; + // According to the mmCIF dictionary, "other" only exists in + // _entity_poly.type. Therefore, "other" can only be a generic polymer. + return CHAINTYPE_POLY; } throw Error("Unrecognised chain type descriptor found: '" + @@ -76,6 +78,8 @@ String StringFromChainType(ChainType type) { // chain types as found in the entity category of a mmcif file if (CHAINTYPE_POLY == type) { + // "polymer" in _entity.type + // "other" in _entity_poly.type return "polymer"; } else if (CHAINTYPE_NON_POLY == type) { return "non-polymer"; @@ -104,8 +108,10 @@ String StringFromChainType(ChainType type) return "cyclic-pseudo-peptide"; } else if (CHAINTYPE_POLY_PEPTIDE_DN_RN == type) { return "peptide nucleic acid"; + // chain types as found in the pdbx_entity_branch category of a mmcif file } else if (CHAINTYPE_OLIGOSACCHARIDE == type) { return "oligosaccharide"; + // other... } else if (CHAINTYPE_UNKNOWN == type) { return "other"; } @@ -115,4 +121,63 @@ String StringFromChainType(ChainType type) throw Error(ss.str()); } +String EntityTypeFromChainType(ChainType type) { + switch(type) { + case ost::mol::CHAINTYPE_POLY: return "polymer"; + case ost::mol::CHAINTYPE_NON_POLY: return "non-polymer"; + case ost::mol::CHAINTYPE_WATER: return "water"; + case ost::mol::CHAINTYPE_POLY_PEPTIDE_D: return "polymer"; + case ost::mol::CHAINTYPE_POLY_PEPTIDE_L: return "polymer"; + case ost::mol::CHAINTYPE_POLY_DN: return "polymer"; + case ost::mol::CHAINTYPE_POLY_RN: return "polymer"; + case ost::mol::CHAINTYPE_POLY_SAC_D: return "polymer"; + case ost::mol::CHAINTYPE_POLY_SAC_L: return "polymer"; + case ost::mol::CHAINTYPE_POLY_DN_RN: return "polymer"; + case ost::mol::CHAINTYPE_MACROLIDE: return "macrolide"; + case ost::mol::CHAINTYPE_CYCLIC_PSEUDO_PEPTIDE: return "polymer"; + case ost::mol::CHAINTYPE_POLY_PEPTIDE_DN_RN: return "polymer"; + case ost::mol::CHAINTYPE_BRANCHED: return "branched"; + case ost::mol::CHAINTYPE_OLIGOSACCHARIDE: return "branched"; + default: { + std::stringstream ss; + ss <<"Unknown ChainType item found: '" << type << "'!"; + throw Error(ss.str()); + } + } + +} + +String EntityPolyTypeFromChainType(ChainType type) { + switch(type) { + case ost::mol::CHAINTYPE_POLY: return "other"; + case ost::mol::CHAINTYPE_POLY_PEPTIDE_D: return "polypeptide(D)"; + case ost::mol::CHAINTYPE_POLY_PEPTIDE_L: return "polypeptide(L)"; + case ost::mol::CHAINTYPE_POLY_DN: return "polydeoxyribonucleotide"; + case ost::mol::CHAINTYPE_POLY_RN: return "polyribonucleotide"; + case ost::mol::CHAINTYPE_POLY_SAC_D: return "other"; // older dictionaries have "polysaccharide(D)" + case ost::mol::CHAINTYPE_POLY_SAC_L: return "other"; // older dictionaries have "polysaccharide(L)" + case ost::mol::CHAINTYPE_POLY_DN_RN: return "polydeoxyribonucleotide/polyribonucleotide hybrid"; + case ost::mol::CHAINTYPE_CYCLIC_PSEUDO_PEPTIDE: return "cyclic-pseudo-peptide"; + case ost::mol::CHAINTYPE_POLY_PEPTIDE_DN_RN: return "peptide nucleic acid"; + default: { + std::stringstream ss; + ss << "Cannot return entity poly type from chain of type: '" << type << "'!"; + throw Error(ss.str()); + } + } +} + +String BranchedTypeFromChainType(ChainType type) { + switch(type) { + case ost::mol::CHAINTYPE_BRANCHED: return "oligosaccharide"; // the only one + case ost::mol::CHAINTYPE_OLIGOSACCHARIDE: return "oligosaccharide"; + default: { + std::stringstream ss; + ss << "Cannot return branched type from chain of type: '" << type << "'!"; + throw Error(ss.str()); + } + } + +} + }} //ns diff --git a/modules/mol/base/src/chain_type.hh b/modules/mol/base/src/chain_type.hh index 08ce4dcc6462a726a367eaf34708b789d684dcc0..6420f01434224e724c229e51e1841a54744f6282 100644 --- a/modules/mol/base/src/chain_type.hh +++ b/modules/mol/base/src/chain_type.hh @@ -28,6 +28,11 @@ namespace ost { namespace mol { /// \enum different kinds of chains +/// +/// Warning: this class mixes vocabulary from _entity.type and +// _entity_poly.type, which is more detailed. As a result it is not a 1:1 +// mapping and cannot be used to to read/write mmCIF entity types accurately. + typedef enum { CHAINTYPE_POLY, ///< polymer CHAINTYPE_NON_POLY, ///< non-polymer @@ -45,7 +50,8 @@ typedef enum { CHAINTYPE_CYCLIC_PSEUDO_PEPTIDE, ///< cyclic-pseudo-peptide CHAINTYPE_POLY_PEPTIDE_DN_RN, ///< peptide nucleic acid CHAINTYPE_BRANCHED, ///< carbohydrate - CHAINTYPE_OLIGOSACCHARIDE, ///< oligosaccharide (branched carbohydrate) + CHAINTYPE_OLIGOSACCHARIDE, ///< oligosaccharide (branched carbohydrate, + ///< i.e. _entity.type is strictly 'branched') CHAINTYPE_N_CHAINTYPES ///< no. of chain types } ChainType; @@ -73,6 +79,55 @@ ChainType DLLEXPORT_OST_MOL ChainTypeFromString(const String& identifier); /// unknown type String DLLEXPORT_OST_MOL StringFromChainType(ChainType type); +/// \brief Return _entity.type consistent with respective mmCIF vocabulary +/// (mmcif_pdbx_v50): +/// - branched +/// - macrolide +/// - non-polymer +/// - polymer +/// - water +/// +/// For consistency with older vocabularies, CHAINTYPE_POLY_SAC_D +/// and CHAINTYPE_POLY_SAC_L return "polymer" +/// +/// \param type ChainType to be translated +/// +/// \return String corresponding to the input, throws a ost::Error on +/// unknown type +String DLLEXPORT_OST_MOL EntityTypeFromChainType(ChainType type); + +/// \brief Return _entity_poly.type consistent with mmCIF dictionary +/// (mmcif_pdbx_v50): +/// - cyclic-pseudo-peptide +/// - other +/// - peptide nucleic acid +/// - polydeoxyribonucleotide +/// - polydeoxyribonucleotide/polyribonucleotide hybrid +/// - polypeptide(D) +/// - polypeptide(L) +/// - polyribonucleotide +/// +/// For consistency with older dictionaries, CHAINTYPE_POLY_SAC_D +/// and CHAINTYPE_POLY_SAC_L are still accepted but return "other". +/// Older dictionaries still had "polysaccharide(D)" and +/// "polysaccharide(L)"" +/// +/// \param type ChainType to be translated +/// +/// \return String corresponding to the input, throws a ost::Error on +/// unknown type or if it's not of _entity.type polymer +String DLLEXPORT_OST_MOL EntityPolyTypeFromChainType(ChainType type); + +/// \brief Return pdbx_entity_branch.type consistent with mmCIF dictionary +/// (mmcif_pdbx_v50): +/// - oligosaccharide +/// +/// \param type ChainType to be translated +/// +/// \return String corresponding to the input, throws a ost::Error on +/// unknown type or if it's not of _entity.type branched +String DLLEXPORT_OST_MOL BranchedTypeFromChainType(ChainType type); + }} //ns #endif diff --git a/modules/mol/base/src/chem_class.hh b/modules/mol/base/src/chem_class.hh index efd838679a63a145164805143c3b84bc3ef1bf0f..2146b300df44586b13dbe0d479ed4922b48ba9db 100644 --- a/modules/mol/base/src/chem_class.hh +++ b/modules/mol/base/src/chem_class.hh @@ -26,61 +26,60 @@ namespace ost { namespace mol { struct DLLEXPORT ChemClass { - const static char PEPTIDE_LINKING ='P'; - const static char D_PEPTIDE_LINKING ='D'; - const static char L_PEPTIDE_LINKING ='L'; - const static char RNA_LINKING ='R'; - const static char DNA_LINKING ='S'; - const static char NON_POLYMER ='N'; - const static char L_SACCHARIDE ='X'; - const static char D_SACCHARIDE ='Y'; - const static char SACCHARIDE ='Z'; - const static char WATER ='W'; - const static char UNKNOWN ='U'; + typedef enum { + PEPTIDE_LINKING ='P', + D_PEPTIDE_LINKING ='D', + L_PEPTIDE_LINKING ='L', + RNA_LINKING ='R', + DNA_LINKING ='S', + NON_POLYMER ='N', + L_SACCHARIDE ='X', + D_SACCHARIDE ='Y', + SACCHARIDE ='Z', + WATER ='W', + UNKNOWN ='U' + } Type; - // for backward compatibility to 1.1 and earlier - const static char PeptideLinking =PEPTIDE_LINKING; - const static char DPeptideLinking =D_PEPTIDE_LINKING; - const static char LPeptideLinking =L_PEPTIDE_LINKING; - const static char RNALinking =RNA_LINKING; - const static char DNALinking =DNA_LINKING; - const static char NonPolymer =NON_POLYMER; - const static char LSaccharide =L_SACCHARIDE; - const static char DSaccharide =D_SACCHARIDE; - const static char Saccharide =SACCHARIDE; - const static char Water =WATER; - const static char Unknown =UNKNOWN; - explicit ChemClass(char chem_class) - : chem_class_(chem_class) { - } + explicit ChemClass(Type chem_class): chem_class_(chem_class) { } + + explicit ChemClass(char type): chem_class_(Type(type)) { } + + ChemClass(): chem_class_(UNKNOWN) { } - ChemClass() - : chem_class_(UNKNOWN) { - } bool operator==(const ChemClass& cc) const { - return cc.chem_class_==chem_class_; + return cc.chem_class_ == chem_class_; } bool operator!=(const ChemClass& cc) const { - return !this->operator==(cc); + return !this->operator == (cc); } bool IsPeptideLinking() const { - return (chem_class_==ChemClass::PEPTIDE_LINKING || - chem_class_==ChemClass::D_PEPTIDE_LINKING || - chem_class_==ChemClass::L_PEPTIDE_LINKING); + return (chem_class_ == PEPTIDE_LINKING || + chem_class_ == D_PEPTIDE_LINKING || + chem_class_ == L_PEPTIDE_LINKING); } bool IsNucleotideLinking() const { - return (chem_class_==ChemClass::DNA_LINKING || - chem_class_==ChemClass::RNA_LINKING); + return (chem_class_ == DNA_LINKING || + chem_class_ == RNA_LINKING); } - bool IsWater() const { return chem_class_==ChemClass::WATER; } + bool IsWater() const { + return chem_class_ == WATER; + } + operator char() const { return chem_class_; } + + bool IsSaccharide() const { + return (chem_class_ == SACCHARIDE || + chem_class_ == L_SACCHARIDE || + chem_class_ == D_SACCHARIDE); + } + private: - char chem_class_; + Type chem_class_; }; }} // ns diff --git a/modules/mol/base/tests/test_chain.cc b/modules/mol/base/tests/test_chain.cc index f965fced69eb87d610baec27f4830e8d82202369..bbc563a8b9bc8601b657b8064852ebbfd18c9871 100644 --- a/modules/mol/base/tests/test_chain.cc +++ b/modules/mol/base/tests/test_chain.cc @@ -367,7 +367,7 @@ BOOST_AUTO_TEST_CASE(chain_type) BOOST_CHECK(ChainTypeFromString( "polydeoxyribonucleotide/polyribonucleotide hybrid") == CHAINTYPE_POLY_DN_RN); - BOOST_CHECK(ChainTypeFromString("other") == CHAINTYPE_UNKNOWN); + BOOST_CHECK(ChainTypeFromString("other") == CHAINTYPE_POLY); BOOST_CHECK(ChainTypeFromString("macrolide") == CHAINTYPE_MACROLIDE); BOOST_CHECK(ChainTypeFromString("cyclic-pseudo-peptide") == CHAINTYPE_CYCLIC_PSEUDO_PEPTIDE);