The idea of using molecular similarity to search molecular databases has a long history [1], and has been recently reviewed [2].
Combining the best of the results from a wide variety of these
approaches (superpositional, non-superpositional, electrostatic and
chemical feature-based) gives a diverse set of potentially similar
ligands for subsequent analysis or assay.
A suitable searching method needs to possess
certain characteristics:
- It should be able to identify
compounds which are “biologically similar”; in other words compounds that have
an increased likelihood of sharing biological activity with the query molecule.
- It should identify compounds
that are chemically distinct from the query; compounds which are close analogues
of the query are much less interesting than compounds which possess a very
different chemical scaffold. This
suggests that the comparison should be based not on the similarity of the
molecule’s skeleton of atoms and bonds but on
its external appearance: its shape, electrostatic potential and other
relevant fields.
- It must be very fast, to allow
the calculation to complete in a reasonable time even when many millions of
compounds are being considered.
The suite of Affinity methods collectively known as RAMS (Rapid Assessment of Molecular Similarity) enables lightning-fast identification of compounds which are similar to a query molecule in terms of molecular shape or a combination of shape and other molecular properties. It is well known that these properties, in particular shape, electrostatics and lipophilicity, are key determinants of molecular recognition, and it is similarity in these terms, rather than the connection of atoms or functional groups, that determines bioisosterism. Whilst this has been appreciated for a long time, these molecular features are complex objects to represent and compare computationally. Historically, shape matching has been performed by molecular superposition algorithms that are computationally intensive, which limits their application to large databases. The development of alignment-free molecular similarity measures that can describe shape and properties effectively has removed this bottleneck.
RAMS
RAMS is a family of alignment-free 3D molecular similarity methods that execute near real-time ligand searching
based on combinations of shape, charge and various physico-chemical
properties.
The key technology that enables the exceptionally fast searching possible with the RAMS methodologies is the calculation of a compact descriptor that captures the complexities of molecular shape. A key enhancement of CSR (Chiral Shape Recognition) over previously
described non-superpositional methods is that it takes into account the
chirality of the molecules being compared, while retaining the speed
and efficiency of these methods. This is important because
interactions between proteins and small molecules are often chiral in
nature. Using CSR, similarly shaped compounds can be quickly identified
from within even the largest molecular databases. In addition, the
problematic requirement of aligning molecules for comparison is
circumvented, as the distributions on which the descriptor is based are independent of molecular
orientation. CSR has been demonstrated [3] to provide superior enrichment to previously described methods.
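The published details of CSR are in [3]. As a rough illustration of how an alignment-free, chirality-aware descriptor can work in principle, the sketch below summarises the distances from every atom to four reference points by three moments each. It is a minimal sketch under stated assumptions, not Affinity's implementation: the choice of reference points, the scaling of the fourth point, the similarity score and all function names are illustrative. The fourth reference point is placed along a cross product, so it moves to the opposite side of the molecule for a mirror image, while a plain rotation leaves every distance unchanged.

```python
import math

def _moments(values):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(m2), math.copysign(abs(m3) ** (1.0 / 3.0), m3)]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def chiral_shape_descriptor(coords):
    """12-number descriptor from atom coordinates (illustrative assumption)."""
    n = len(coords)
    c1 = tuple(sum(p[i] for p in coords) / n for i in range(3))  # centroid
    c2 = max(coords, key=lambda p: math.dist(p, c1))  # farthest atom from c1
    c3 = max(coords, key=lambda p: math.dist(p, c2))  # farthest atom from c2
    a = tuple(c2[i] - c1[i] for i in range(3))
    b = tuple(c3[i] - c1[i] for i in range(3))
    normal = _cross(a, b)  # flips sign under reflection: this carries chirality
    length = math.hypot(*normal)
    if length == 0.0:
        c4 = c1  # degenerate case: reference points are collinear
    else:
        scale = 0.5 * math.hypot(*a) / length  # arbitrary scaling (assumption)
        c4 = tuple(c1[i] + scale * normal[i] for i in range(3))
    descriptor = []
    for ref in (c1, c2, c3, c4):
        descriptor += _moments([math.dist(p, ref) for p in coords])
    return descriptor

def similarity(d1, d2):
    """Score in (0, 1]; 1 means identical descriptors (assumed scoring scheme)."""
    diff = sum(abs(x - y) for x, y in zip(d1, d2)) / len(d1)
    return 1.0 / (1.0 + diff)

# Toy "molecule": five points with no symmetry (made-up coordinates).
points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.4, 0.0),
          (3.4, 1.6, 0.9), (-0.7, 1.1, -0.8)]
t = 0.7  # rotation angle about the z axis, in radians
rotated = [(x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t), z) for x, y, z in points]
mirrored = [(-x, y, z) for x, y, z in points]

d = chiral_shape_descriptor(points)
d_rot = chiral_shape_descriptor(rotated)
d_mir = chiral_shape_descriptor(mirrored)
print("rotated copy:", similarity(d, d_rot))
print("mirror image:", similarity(d, d_mir))
```

In this toy case the rotated copy scores as identical to the original up to floating-point error, whereas the mirror image scores lower, which is the behaviour the text describes: no alignment step is needed, yet enantiomers are distinguished.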
Building on this approach to molecular shape comparison, ElectroShape [4] incorporates electrostatic and
other physico-chemical properties of the atoms, so that these properties are included in the comparison, in addition to the
molecular shape and stereochemistry. Combining these properties
maximizes the discovery of relevant lead molecules within the top few
percent of structures screened, nearly doubling the enrichment ratio at
1% over previously published shape-based methods.
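For readers unfamiliar with the metric, the enrichment at 1% quoted above is commonly computed as the hit rate among the top-ranked 1% of a screened library divided by the hit rate of the library as a whole. The sketch below uses that common definition; the function name and the toy ranking are illustrative assumptions and do not reproduce the benchmark in [4].

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """Enrichment factor for a ranked list of booleans (best-scored first).

    ranked_is_active[i] is True if the compound at rank i is a known active.
    """
    n_total = len(ranked_is_active)
    n_actives = sum(ranked_is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(n_total * fraction))  # size of the top slice
    hits_top = sum(ranked_is_active[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)

# Toy ranking: 1000 compounds, 20 actives, 4 of them ranked in the top 10.
ranked = [True] * 4 + [False] * 6 + [True] * 16 + [False] * 974
ef_1_percent = enrichment_factor(ranked, fraction=0.01)
print(ef_1_percent)
```

In the toy ranking the top 1% has a hit rate of 4/10 against a background rate of 20/1000, a twenty-fold enrichment. A value of 1 would mean the ranking is no better than random selection.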
RAMS methods can search databases of millions of compounds (hundreds of millions of conformations) in seconds using commodity hardware.
Additionally, as-yet-unpublished methods are capable of even better performance in benchmarking studies.
Taken together, the RAMS approach is a powerful tool for lead identification.
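One way to picture why descriptor-based searching scales well is that, once a short fixed-length descriptor has been precomputed for every conformation, a query reduces to a single pass of cheap vector comparisons. The sketch below is a hypothetical illustration of such a scan: the `screen` function, the scoring formula and the toy three-number descriptors are assumptions made for the example and are not the RAMS code.

```python
import heapq

def similarity(d1, d2):
    """Score in (0, 1] from the mean absolute difference (assumed scheme)."""
    diff = sum(abs(x - y) for x, y in zip(d1, d2)) / len(d1)
    return 1.0 / (1.0 + diff)

def screen(query, database, k=10):
    """Return the k best (score, compound_id) pairs, best first.

    database is any iterable of (compound_id, descriptor) pairs, so it can be
    streamed from disk without holding every descriptor in memory.
    """
    scored = ((similarity(query, desc), cid) for cid, desc in database)
    return heapq.nlargest(k, scored)

# Toy database with made-up identifiers and descriptors.
db = [("cmpd-a", [1.0, 2.0, 3.0]),
      ("cmpd-b", [1.1, 2.0, 2.9]),
      ("cmpd-c", [4.0, 0.5, 7.0])]
hits = screen([1.0, 2.0, 3.0], db, k=2)
for score, cid in hits:
    print(cid, round(score, 3))
```

Because each comparison touches only a handful of numbers and no alignment is performed, the cost per compound is small and constant, and the scan parallelises trivially across chunks of the database.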
COver
COver is Affinity’s proprietary superpositional search method, based on
original research in Professor Graham Richards’ group in Oxford. A typical use case is to run a first pass with a fast search method
such as ElectroShape, and then to use COver in its superpositional mode to align the top hits on
the query molecule. This is especially useful where the query is a co-crystallized ligand structure. COver can also be used to quickly assess the steric
overlap between molecules, which is particularly useful for filtering
results from our de novo fragment-based ligand design software, LOx.
References
[1] A.C. Good, E.E. Hodgkin, and W.G. Richards (1992). "Similarity screening of molecular data sets." Journal of Computer-Aided Molecular Design, 6(5): 513–520.
[2] P.W. Finn and G.M. Morris (2013). "Shape-based similarity searching in chemical databases." Wiley Interdisciplinary Reviews: Computational Molecular Science, 3(3): 226–241.
[3] M.S. Armstrong, G.M. Morris, P.W. Finn, R. Sharma, and W.G. Richards (2009). "Molecular similarity including chirality." Journal of Molecular Graphics and Modelling, 28: 368–370.
[4] M.S. Armstrong, G.M. Morris, P.W. Finn, R. Sharma, L. Moretti, R.I. Cooper, and W.G. Richards (2010). "ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics." Journal of Computer-Aided Molecular Design, 24(9): 789–801. Epub 2010 Jul 8.
[5] N. Stiefl and K. Baumann (2003). "Mapping property distributions of molecular surfaces: Algorithm and evaluation of a novel 3D quantitative structure-activity relationship technique." Journal of Medicinal Chemistry, 46(8): 1390–1407.
[6] P. Willett, J.M. Barnard, and G.M. Downs (1998). "Chemical similarity searching." Journal of Chemical Information and Computer Sciences, 38(6): 983–996.