Custom databases to unlock the potential of your chemistry investment

Scopius

The technology used to build large proprietary databases was originally developed to create InhibOx’s Scopius database. The CSpace component of Scopius contains commercially available molecules, while the VSpace component has been generated by applying reliable combinatorial chemistry reaction schemes. Mining Scopius therefore identifies new ligands that are either immediately available commercially or can easily be prepared from commercial molecules.

In addition to the chemical structure, molecular properties and descriptors are stored for each compound, including shape, charge and various drug-likeness descriptors. Filters can be applied to these properties, for example to include only molecules within a given LogP range, or to exclude molecular families already covered by published patents.
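
A minimal sketch of the kind of property filter described above. The `Compound` record, its fields and the family tags are hypothetical (the Scopius schema is not described here), and the LogP values are illustrative placeholders, not calculated values.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Compound:
    # Hypothetical record: the actual Scopius schema is not described in this text.
    compound_id: str
    logp: float
    families: frozenset = field(default_factory=frozenset)  # e.g. scaffold tags from a patent search

def filter_compounds(compounds, logp_range, excluded_families):
    """Keep compounds with LogP inside [lo, hi] that carry none of the excluded family tags."""
    lo, hi = logp_range
    excluded = frozenset(excluded_families)
    return [c for c in compounds if lo <= c.logp <= hi and not (c.families & excluded)]

# Illustrative values only.
library = [
    Compound("C1", 1.4),
    Compound("C2", 4.9),
    Compound("C3", 0.6, frozenset({"patented_series_1"})),
]
kept = filter_compounds(library, (0.0, 3.0), {"patented_series_1"})
print([c.compound_id for c in kept])  # ['C1']
```

C2 fails the LogP window and C3 is removed by the family exclusion, even though its LogP is in range.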

Building a proprietary Scopius-VSpace

The first step in the process is to construct proprietary libraries that correspond to in-house chemistry protocols and to preferred protocols from the literature. A typical protocol generates a family of molecules that share a common core, with a number of “positions of variation” arising from the diversity of the available building blocks. A literature example of this type of library is the construction of triazoles from acetylenes and azides.

Scopius library: 1,2,3-triazole

Library synthesis: [scheme: azide + acetylene → 1,2,3-triazole]

Reference: F. Himo, T. Lovell, R. Hilgraf, V. V. Rostovtsev, L. Noodleman, K. B. Sharpless, V. V. Fokin, J. Am. Chem. Soc., 2005, 127, 210-216.

This library has two positions of variation, R and R’. Because these can be combined combinatorially, 100 reagents for each type of starting material (azides and acetylenes) would give a library of 100 × 100 = 10,000 products.
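
The two-component enumeration can be sketched with `itertools.product`, which yields every azide/acetylene pairing exactly once. The reagent identifiers are hypothetical placeholders; a real workflow would apply the reaction to actual structures rather than pair up names.

```python
from itertools import product

# Hypothetical reagent identifiers: 100 azides and 100 acetylenes.
azides = [f"azide_{i:03d}" for i in range(100)]
acetylenes = [f"acetylene_{j:03d}" for j in range(100)]

# Each (azide, acetylene) pair corresponds to one triazole product.
pairs = list(product(azides, acetylenes))
print(len(pairs))  # 10000
```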

Because the number of products multiplies with each component, protocols with more positions of variation produce much larger libraries. For example, the boronic acid Mannich reaction (also known as the Petasis reaction) uses three components and would therefore produce 100 × 100 × 100 = 1,000,000 products, assuming 100 reactants for each component. In practice, many more than 100 suitable reagents are commercially available, so the potential library size runs to many millions.
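
More generally, the library size is the product of the reagent counts for each component, so every additional position of variation multiplies the total. A small sketch; the 500-reagent case is a hypothetical illustration of the “many millions” figure, not a count of reagents actually available.

```python
from math import prod

def library_size(reagent_counts):
    """Combinatorial library size: the product of the reagent counts per component."""
    return prod(reagent_counts)

print(library_size([100, 100]))       # 10000: azides x acetylenes
print(library_size([100, 100, 100]))  # 1000000: Petasis, three components
print(library_size([500, 500, 500]))  # 125000000: hypothetical 500 reagents per component
```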

Scopius library: Petasis 3-component reaction

Library synthesis: [scheme: amine + aldehyde + boronic acid → substituted amine]

Reference: N. A. Petasis, I. Akritopoulou, “The boronic acid Mannich reaction: a new method for the synthesis of geometrically pure allylamines”, Tetrahedron Lett., 1993, 34, 583-586.

Many synthesis protocols will produce a “core plus R-groups” library, as for the triazole library described above. In other cases there will be no common core, for example when the products span a range of ring sizes, or the common portion will be trivial, for example the amine in the Petasis reaction. InhibOx library building handles each of these cases flexibly and without added complexity.

Library building with Affinity: three simple steps

1. Defining the reaction

The first step in building a compound library is to describe the synthetic transformation in a form that can be used computationally. This is a straightforward task: the key elements of the reaction are specified, and corresponding atoms in the reactants and products are identified. For the example triazole library, the correspondences are as shown below. Each reactant atom that is carried forward into the product is given the same label in both. In this example, the azide nitrogens are labeled starting at 101 and the acetylene carbons starting at 201. These numbers are chosen for convenience; any unique numbering scheme is allowed.


The R-groups are shown for clarity and for comparison with the literature reference, but they are not included in the structure transformation file. This file is stored as readable text, which can easily be modified to define related reactions. Over time, these transformations build into a repository of one's in-house chemistry.
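
To make the atom-labelling idea concrete, here is a hypothetical Python representation of the triazole transformation; it is not the Affinity file format, which is not shown here. It assumes the 1,4-disubstituted triazole regioisomer, tracks only connectivity (bond orders are ignored, so no bonds count as broken) and assigns labels 101-103 and 201-202 to specific atoms as an illustrative choice. Comparing the labelled bonds recovers the two new carbon-nitrogen bonds formed in the cycloaddition.

```python
# Hypothetical, connectivity-only model of the azide + acetylene -> 1,2,3-triazole
# transformation. Labels follow the text (azide N: 101-103, acetylene C: 201-202);
# which label sits on which atom is an illustrative assumption. R-groups are omitted.

def bond(a, b):
    """An undirected bond between two labelled atoms."""
    return frozenset((a, b))

REACTANT_BONDS = {
    bond(101, 102),  # azide N1-N2 (N1 carries R)
    bond(102, 103),  # azide N2-N3
    bond(201, 202),  # acetylene C-C (202 carries R', 201 is the terminal C-H)
}

PRODUCT_BONDS = {
    bond(101, 102),  # triazole N1-N2
    bond(102, 103),  # N2-N3
    bond(103, 202),  # N3-C4
    bond(202, 201),  # C4-C5
    bond(201, 101),  # C5-N1, closing the five-membered ring
}

def labelled_atoms(bonds):
    return {atom for b in bonds for atom in b}

def bond_changes(reactant_bonds, product_bonds):
    """Return (formed, broken) bonds by comparing labelled connectivity."""
    if labelled_atoms(reactant_bonds) != labelled_atoms(product_bonds):
        raise ValueError("every labelled reactant atom must be carried into the product")
    return product_bonds - reactant_bonds, reactant_bonds - product_bonds

formed, broken = bond_changes(REACTANT_BONDS, PRODUCT_BONDS)
print(sorted(sorted(b) for b in formed))  # [[101, 201], [103, 202]]
print(broken)                             # set()
```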

2. Selecting the reagents

The second step is to define the reagents. Typically, suitable in-house lists of reagent types will already exist (aldehydes, secondary aliphatic amines and so on). Using the in-house reagent collections maintains the closest link between the library and practical synthetic accessibility. However, it is also possible to supplement these reagents with additional commercially available molecules. Because the library is combinatorial, this will in many cases vastly increase the size of the chemistry space. The ability to update the libraries easily (see below) keeps them closely coupled to real-world availability. In this way, an effective balance is achieved between maximizing chemical diversity and maintaining synthetic accessibility.
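
Supplementing in-house reagents with commercial ones can be sketched as an order-preserving union followed by a size calculation. Here reagents are matched by identifier string, which is an assumption: in practice duplicates would more likely be detected by comparing canonical structures. The counts (100 in-house aldehydes, 400 commercial ones, 50 of them shared, and 100 each of the other two Petasis components) are hypothetical.

```python
from math import prod

def merge_reagents(in_house, commercial):
    """Order-preserving union: in-house reagents first, then commercial ones not already present.
    Matching is by identifier here (an assumption); real de-duplication would compare structures."""
    seen = set(in_house)
    merged = list(in_house)
    for reagent in commercial:
        if reagent not in seen:
            seen.add(reagent)
            merged.append(reagent)
    return merged

# Hypothetical identifiers: 100 in-house aldehydes, 400 commercial ones, 50 shared.
in_house_aldehydes = [f"ald_{i:03d}" for i in range(100)]
commercial_aldehydes = [f"ald_{i:03d}" for i in range(50, 450)]
aldehydes = merge_reagents(in_house_aldehydes, commercial_aldehydes)

before = prod([100, len(in_house_aldehydes), 100])  # amines x aldehydes x boronic acids
after = prod([100, len(aldehydes), 100])
print(len(aldehydes), before, after)  # 450 1000000 4500000
```

Enlarging a single component by 4.5 times enlarges the whole library by the same factor, which is why supplementing even one reagent list has such a large effect.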

An advantage of the Affinity approach is that it handles cases where the resulting library has no common framework. There is also a clear, explicit link to the chemical transformation and to the available starting materials, and the reagent files do not require special processing into R-groups. This further simplifies dynamic maintenance of the library, for example as new reagents become available.
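
Because products are generated directly from reagent lists, adding reagents to one component only requires enumerating the combinations that involve the new reagents; the existing products are untouched. A minimal sketch of that bookkeeping, with a hypothetical function name and hypothetical reagent identifiers:

```python
from itertools import product

def add_reagents(components, index, candidates):
    """Add candidate reagents to component `index`.

    Returns (updated_components, new_combinations), where new_combinations
    holds only the combinations involving a genuinely new reagent, so the
    existing products never need to be re-enumerated.
    """
    existing = set(components[index])
    added = []
    for reagent in candidates:
        if reagent not in existing:
            existing.add(reagent)
            added.append(reagent)
    pools = [list(pool) for pool in components]
    pools[index] = added
    new_combinations = list(product(*pools))
    updated = [list(pool) for pool in components]
    updated[index].extend(added)
    return updated, new_combinations

components = [["azide_A", "azide_B", "azide_C"], ["acetylene_X", "acetylene_Y"]]
updated, new = add_reagents(components, 1, ["acetylene_Y", "acetylene_Z"])
print(len(new))    # 3: acetylene_Z paired with each azide
print(updated[1])  # ['acetylene_X', 'acetylene_Y', 'acetylene_Z']
```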

3. Running ReactiOx

The ReactiOx method takes the reaction transformation and reagent files as input and produces a file of products as output. The output can be filtered, for example by molecular weight, to ensure that products conform to drug-like (or lead-like) criteria. These products are then passed on for property calculation. The key properties are the conformational models of the compounds and their RAMS descriptors, which are the primary properties used for the database mining described in the next section. Conformer generation is decoupled from product generation, allowing full flexibility in the choice of conformer generation method.
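
The flow described above (reagents in, products out, a molecular-weight filter, then a separate conformer stage) can be sketched as a chain of Python generators. ReactiOx's actual interface is not described here, so every name below is hypothetical, the reagent molecular weights are made up, and `cycloadd` is a toy stand-in that only sums molecular weights (valid for this cycloaddition because no atoms are lost). The 500 g/mol cut-off is the molecular-weight limit of Lipinski's rule of five; a lead-like filter would use a lower value.

```python
from itertools import product

# Hypothetical reagents as (identifier, molecular weight in g/mol); values are made up.
AZIDES = [("azide_A", 133.2), ("azide_B", 301.4)]
ACETYLENES = [("acetylene_X", 102.1), ("acetylene_Y", 250.3)]

def cycloadd(azide, acetylene):
    """Toy stand-in for the reaction engine. The cycloaddition keeps every atom,
    so the product's molecular weight is the sum of the reagents' weights."""
    (a_id, a_mw), (b_id, b_mw) = azide, acetylene
    return {"id": f"{a_id}+{b_id}", "mw": a_mw + b_mw}

def generate_products(azides, acetylenes):
    """Stage 1: apply the transformation to every reagent combination."""
    for azide, acetylene in product(azides, acetylenes):
        yield cycloadd(azide, acetylene)

def mw_filter(mols, max_mw=500.0):
    """Stage 2: keep products at or below the molecular-weight cut-off."""
    return (m for m in mols if m["mw"] <= max_mw)

def with_conformers(mols, conformer_fn):
    """Stage 3: conformer generation is pluggable; any callable can be passed in."""
    for m in mols:
        yield {**m, "conformers": conformer_fn(m)}

def no_conformers(mol):
    # Placeholder: a real method would return 3D coordinates for each conformer.
    return []

kept = list(with_conformers(mw_filter(generate_products(AZIDES, ACETYLENES)), no_conformers))
print([m["id"] for m in kept])
# ['azide_A+acetylene_X', 'azide_A+acetylene_Y', 'azide_B+acetylene_X']
```

Passing a different function as `conformer_fn` swaps the conformer method without touching product generation or filtering, in the spirit of the decoupling described in the text.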

Having created this enormous library of compounds that are readily accessible to your medicinal chemistry teams, the next step is to search it for interesting hits and leads. This can be done at unprecedented speed using the Affinity RAMS similarity methods.