AMBINTER
Supplier of Chemicals Worldwide


The distributed Ambinter compound collection contains about 3 million molecules. In house, we hold
about 6 million compounds in total (the 3 million distributed freely plus 3 million additional ones).
These collections are currently available electronically as 2D SDF files,
but we would like to offer other formats that could be valuable for some research projects.

The main pre-processed versions that we aim to provide are:


a) The collection (3 million compounds) in 3D (single conformer)
b) A filtered collection, derived from the initial one, from which potentially undesirable compounds
are removed via several ADME/Tox filtering steps.
c) A diversity set
d) A random collection


a) Collection in 3D
It will be possible to download the 3 million-compound collection in 3D (single conformation),
in Mol2 format, free of charge at: http://bioserv.rpbs.jussieu.fr/

For several projects involving virtual screening or docking, it is beneficial to have
the molecules in 3D. To this end, we will use our optimized version of Frog (manuscript in preparation
and Leite et al., NAR 2007, 35:W568-72), a program that takes a 2D SDF file as input and generates
the 3D structure of each compound. Counter ions and salts are removed and a standard protonation
state is assigned. Gasteiger partial charges are added in the final 3D Mol2 file.
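
One simple way to picture the removal of counter ions and salts is to keep only the largest connected
fragment of each record. The sketch below illustrates that idea and is not Frog's actual procedure;
the function name largest_fragment and the atom/bond list representation are assumptions made for
this example.

```python
from collections import defaultdict


def largest_fragment(atoms, bonds):
    """Return the sorted atom indices of the largest connected fragment.

    atoms: list of element symbols, e.g. ["C", "O", "Na"]
    bonds: list of (i, j) index pairs into `atoms`
    """
    neighbours = defaultdict(set)
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    seen = set()
    best = []
    for start in range(len(atoms)):
        if start in seen:
            continue
        # Depth-first traversal collects one connected fragment.
        stack, fragment = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            fragment.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        # Keep the fragment with the most atoms seen so far.
        if len(fragment) > len(best):
            best = fragment
    return sorted(best)


# Sodium acetate, heavy atoms only: the acetate skeleton plus a detached Na.
atoms = ["C", "C", "O", "O", "Na"]
bonds = [(0, 1), (1, 2), (1, 3)]
kept = largest_fragment(atoms, bonds)
```

Here kept holds the indices of the acetate atoms and the isolated sodium is dropped. A production
tool would also need rules for cases where the largest fragment is not the compound of interest.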

b) ADME/Tox filtering


In several situations, including drug design projects, it is important to filter out compounds that
do not possess a “drug-like” profile.
Druglikeness is a qualitative concept used in drug design to describe how “drug-like” a substance is.
With in silico methods, it is estimated from the molecular structure in many different ways.
A druglike molecule has properties such as: (a) optimal solubility in both water and fat, because an
orally administered drug has to go through the intestinal mucosa…One model compound for the cellular
membrane is octanol, so the logarithm of the octanol/water partition coefficient can be used to estimate
solubility. Solubility in water can be estimated from the number of hydrogen bond donors vs alkyl side
chains in the molecule…Too many hydrogen bond donors, on the other hand, lead to low fat solubility…
Other properties include molecular weight (about 80% of the traded drugs have a MW under 450 Da)
and the presence of some reactive substructures…

Several well-known filtering rules have been reported in the past. One of the traditional rules of thumb
for estimating bioavailability is Lipinski’s rule of five. Other important criteria can also be
considered, such as removing the so-called frequent hitters, reactive groups, toxic groups, etc.
We will use an updated version of FAF-Drugs (Miteva et al., NAR 2006, 34:W738-44) to perform this step.
In this new version, about 200 rules have been implemented, covering molecular weight, topological
polar surface area (TPSA) and logP, absolute and relative content of heteroatoms, limits
on the number of a very wide variety of functional groups, the number and size of ring systems,
and the flexibility of the molecule.
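
To make the idea of rule-based filtering concrete, here is a minimal sketch that applies the four
classical Lipinski rule-of-five thresholds to precomputed descriptors. It is not the FAF-Drugs
rule set; the descriptor keys (mw, logp, hbd, hba), the compound names and values, and the
tolerance of one violation are assumptions chosen for illustration.

```python
# Each rule: (name, descriptor key, maximum allowed value).
# Thresholds are the classical Lipinski rule-of-five limits.
RULE_OF_FIVE = [
    ("molecular weight", "mw", 500.0),
    ("logP", "logp", 5.0),
    ("H-bond donors", "hbd", 5),
    ("H-bond acceptors", "hba", 10),
]


def violations(descriptors, rules=RULE_OF_FIVE):
    """Return the names of the rules that a compound violates."""
    return [name for name, key, limit in rules if descriptors[key] > limit]


def passes(descriptors, rules=RULE_OF_FIVE, max_violations=1):
    """True if the compound violates at most `max_violations` rules (assumed tolerance)."""
    return len(violations(descriptors, rules)) <= max_violations


# Hypothetical compounds with made-up descriptor values.
compounds = {
    "cmpd_A": {"mw": 320.4, "logp": 2.1, "hbd": 2, "hba": 5},
    "cmpd_B": {"mw": 712.9, "logp": 6.3, "hbd": 4, "hba": 12},
}
kept = [name for name, desc in compounds.items() if passes(desc)]
```

Keeping the rules in a table rather than in code makes it easy to add further limits (TPSA, ring
counts, rotatable bonds) or to switch between soft and stringent settings.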

We will provide the ADME/Tox filtered collection in 2D SDF and in 3D Mol2 format. For each molecule,
information about the filtering will be stored in a tabulated file. In addition, some of these data
will be attached as fields in the SDF format for convenience.
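
As an illustration of how filtering results could be attached as SDF fields, the sketch below
inserts data items before the $$$$ terminator of a single record. The function name
append_sd_fields and the field names FILTER_STATUS and RO5_VIOLATIONS are hypothetical; the actual
field names used in the distributed files may differ.

```python
def append_sd_fields(record, fields):
    """Insert data items before the '$$$$' terminator of one SDF record."""
    lines = record.rstrip("\n").split("\n")
    if lines[-1].strip() != "$$$$":
        raise ValueError("record does not end with $$$$")
    body = lines[:-1]
    for name, value in fields.items():
        body.append(f"> <{name}>")  # data header line
        body.append(str(value))     # data value line
        body.append("")             # blank line terminates the data item
    body.append("$$$$")
    return "\n".join(body) + "\n"


# An empty molblock stands in for a real structure in this example.
record = (
    "mol_1\n"
    "  sketch\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "$$$$\n"
)
annotated = append_sd_fields(record, {"FILTER_STATUS": "accepted", "RO5_VIOLATIONS": 0})
```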


c) Diversity set


For some projects, it is beneficial to have a diversity subset representing the entire collection.
We will first compute fingerprints (over 400) for each compound; then, to estimate the diversity,
a clustering approach will be applied. A dissimilarity selection method will then be used
to extract a diversity set. The size of the extracted diversity subset depends on the cutoff
chosen to define similarity. We will thus provide several diverse collections of
50,000, 100,000 and 1,000,000 compounds, in 2D SDF and in 3D Mol2.
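
Dissimilarity selection can be sketched with the greedy MaxMin scheme, one common approach that is
used here purely as an example and may differ from the method finally applied. Fingerprints are
represented as Python sets of "on" bit positions and compared with the Tanimoto coefficient; the
function names and the toy fingerprints are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, n_pick, first=0):
    """Greedy MaxMin: repeatedly add the compound whose nearest
    already-picked neighbour is the most dissimilar."""
    n = len(fingerprints)
    n_pick = min(n_pick, n)
    if n_pick == 0:
        return []
    picked = [first]
    picked_set = {first}
    # Distance from every compound to its closest picked compound.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < n_pick:
        candidate = max(
            (i for i in range(n) if i not in picked_set),
            key=lambda i: min_dist[i],
        )
        picked.append(candidate)
        picked_set.add(candidate)
        # Update nearest-picked distances with the new pick.
        for i in range(n):
            d = 1.0 - tanimoto(fingerprints[i], fingerprints[candidate])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


# Toy fingerprints: three related compounds and one outlier.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
subset = maxmin_pick(fps, 2)
```

This version compares every compound with each new pick, which is fine for a sketch but would need
a more efficient implementation for millions of compounds.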


d) Random collections
Random collections can be useful, for instance to generate statistics about a collection, for some
molecular modeling projects, or to compare databases. We will provide several random collections of
different sizes, in 2D and in 3D.
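
For readers who want to draw their own random subsets from a large SDF file, the sketch below uses
reservoir sampling (Algorithm R), which needs only one pass and keeps only the sample in memory.
This is an illustrative approach, not necessarily how the distributed random collections are
built; the function names and the in-memory demo file are assumptions.

```python
import io
import random


def iter_sdf_records(lines):
    """Yield each SDF record (text up to and including '$$$$') from an iterable of lines."""
    buffer = []
    for line in lines:
        buffer.append(line)
        if line.strip() == "$$$$":
            yield "".join(buffer)
            buffer = []


def reservoir_sample(items, k, seed=None):
    """Uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(items):
        if index < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, index)  # bounds are inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Demo on an in-memory stand-in for an SDF file with ten placeholder records.
demo = io.StringIO("".join(f"mol_{i}\nplaceholder\n$$$$\n" for i in range(10)))
sample = reservoir_sample(iter_sdf_records(demo), k=3, seed=42)
```

With a real file, the StringIO object would be replaced by an open file handle; fixing the seed
makes the subset reproducible.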


Alternative processing is possible upon request. For instance, each compound can be generated in 3D
multiconformer states, or the ADME/Tox filtering can be made very soft or very stringent.