scopy.ScoRepresent package


Created on Tue Jun 25 21:59:42 2019

A package for calculating molecular scaffolds and fingerprints.

You can freely use and distribute it. If you run into any problems, please contact us.

@Author: Zhi-Jiang Yang, Dong-Sheng Cao.

@Institution: CBDD Group, Xiangya School of Pharmaceutical Science, CSU, China.

@Homepage: http://www.scbdd.com

@Mail: kotori@cbdd.me and oriental-cds@163.com


Submodules

scopy.ScoRepresent.fingerprints module

scopy.ScoRepresent.fingerprints.CalculateEFG(mols, useCount=True, n_jobs=1)[source]

A classification system termed “extended functional groups” (EFG), an extension of the set previously used by the CheckMol software that additionally covers heterocyclic compound classes and periodic table groups. The fingerprint has 583 bits.

Reference:
  1. Salmina, Elena, Norbert Haider and Igor Tetko (2016).

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • useCount – If True, the fingerprint stores match counts (not only 1 and 0); otherwise it is binary. Defaults to True

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray
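
As a minimal sketch of what useCount controls, assume a count-style fingerprint is simply a list of per-pattern match counts. The binary form then follows by thresholding each count at zero. The function name to_binary and the sample counts are hypothetical and not part of scopy.

```python
def to_binary(counts):
    """Collapse a count fingerprint into a presence/absence fingerprint."""
    return [1 if c > 0 else 0 for c in counts]


# Hypothetical count fingerprint: pattern 1 matched three times, pattern 3 once.
count_fp = [0, 3, 0, 1]
binary_fp = to_binary(count_fp)
print(binary_fp)
```

The count form keeps more information, so it can always be reduced to the binary form afterwards, but not the other way round.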

scopy.ScoRepresent.fingerprints.CalculateGhoseCrippen(mols, useCount=True, n_jobs=1)[source]

Atom-based calculation of LogP and MR using Crippen’s approach. The fingerprint has 110 bits.

Reference:
  1. Wildman, Scott A., and Gordon M. Crippen (1999).

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • useCount – If True, the fingerprint stores match counts (not only 1 and 0); otherwise it is binary. Defaults to True

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray

scopy.ScoRepresent.fingerprints.CalculateEState(mols, val=True, n_jobs=1)[source]

EState fingerprint. The fingerprint has 79 bits.

Reference:
  1. L.B. Kier and L.H. Hall, Molecular Structure Description: The Electrotopological State, Academic Press (1999).

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • val (bool, optional) – If True, the fingerprint holds EState values; otherwise it holds integers. Defaults to True

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray

scopy.ScoRepresent.fingerprints.CalculateMACCS(mols, n_jobs=1)[source]

A SMARTS-based implementation of the 166 public MACCS keys. The fingerprint has 167 bits.

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray

scopy.ScoRepresent.fingerprints.CalculateECFP(mols, radius=2, nBits=1024, n_jobs=1)[source]

This family of fingerprints, better known as circular fingerprints, is built by applying the Morgan algorithm to a set of user-supplied atom invariants. The fingerprint length is 2^n bits.

Reference:
  1. Rogers (2010).

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • radius – the radius of the circular atom environment, defaults to 2

  • nBits (int, optional) – number of bits in the fingerprint, defaults to 1024

  • useFeatures (bool, optional) – if True, generate FCFP; otherwise generate ECFP. Defaults to False

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray
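
The roles of radius and nBits can be illustrated with a toy Morgan-style iteration over a plain adjacency list. This is only a sketch under simplifying assumptions: the hashing scheme (zlib.crc32), the function name morgan_like_bits, and the use of atomic numbers as invariants are all illustrative choices, and the result will not match the bits produced by RDKit or scopy.

```python
import zlib


def morgan_like_bits(neighbors, invariants, radius=2, n_bits=1024):
    """Return the sorted bit positions set by a Morgan-style neighbourhood hash.

    neighbors:  dict mapping atom index -> list of neighbouring atom indices
    invariants: dict mapping atom index -> initial integer invariant
    """
    ids = dict(invariants)
    seen = set(ids.values())  # radius-0 identifiers
    for _ in range(radius):
        new_ids = {}
        for atom, nbrs in neighbors.items():
            # Combine the atom's identifier with its sorted neighbour identifiers.
            env = (ids[atom], tuple(sorted(ids[n] for n in nbrs)))
            new_ids[atom] = zlib.crc32(repr(env).encode())
        ids = new_ids
        seen.update(ids.values())
    # Fold every identifier into a fixed-length bit space.
    return sorted({i % n_bits for i in seen})


# Toy graph for the heavy atoms of ethanol (C-C-O), atomic numbers as invariants.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
invariants = {0: 6, 1: 6, 2: 8}
bits = morgan_like_bits(neighbors, invariants, radius=2, n_bits=1024)
print(len(bits), "bits set")
```

Each extra unit of radius lets every atom "see" one bond further, and a smaller n_bits makes it more likely that two different environments fold onto the same position.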

scopy.ScoRepresent.fingerprints.CalculateFCFP(mols, radius=2, nBits=1024, n_jobs=1)[source]

This family of fingerprints, better known as circular fingerprints, is built by applying the Morgan algorithm to a set of user-supplied atom invariants. The fingerprint length is 2^n bits.

Reference:
  1. Rogers (2010).

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • radius – the radius of the circular atom environment, defaults to 2

  • nBits (int, optional) – number of bits in the fingerprint, defaults to 1024

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray

scopy.ScoRepresent.fingerprints.CalculateDaylight(mols, minPath=1, maxPath=7, nBits=2048, n_jobs=1)[source]

A Daylight-like fingerprint based on hashing molecular subgraphs. The fingerprint length is 2^n bits.

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • minPath (int, optional) – minimum number of bonds to include in the subgraphs, defaults to 1

  • maxPath (int, optional) – maximum number of bonds to include in the subgraphs, defaults to 7

  • nBits (int, optional) – number of bits in the fingerprint, defaults to 2048

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray
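
To show how minPath, maxPath and nBits interact, the sketch below enumerates linear paths in a toy molecular graph and hashes each one into a fixed number of positions. It is a simplification made for illustration: only unbranched paths are considered, bond orders are ignored, and the names path_bits, bonds and labels as well as the zlib.crc32 hash are assumptions, not the scheme used by scopy or RDKit.

```python
import zlib


def path_bits(bonds, labels, min_path=1, max_path=7, n_bits=2048):
    """Hash every simple path of min_path..max_path bonds into n_bits positions."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    on_bits = set()

    def walk(path):
        n_bonds = len(path) - 1
        if n_bonds >= min_path:
            forward = "-".join(labels[i] for i in path)
            backward = "-".join(labels[i] for i in reversed(path))
            # A path and its reverse are the same subgraph: hash one canonical form.
            canonical = min(forward, backward)
            on_bits.add(zlib.crc32(canonical.encode()) % n_bits)
        if n_bonds >= max_path:
            return
        for nxt in adj.get(path[-1], []):
            if nxt not in path:  # keep paths simple (no atom revisited)
                walk(path + [nxt])

    for start in labels:
        walk([start])
    return sorted(on_bits)


# Toy graph for the heavy atoms of ethanol: C-C-O.
bonds = [(0, 1), (1, 2)]
labels = {0: "C", 1: "C", 2: "O"}
bits = path_bits(bonds, labels, min_path=1, max_path=7, n_bits=2048)
print(bits)
```

Raising maxPath adds longer paths and therefore more set bits, while raising minPath drops the shortest ones.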

scopy.ScoRepresent.fingerprints.CalculatePubChem(mols, n_jobs=1)[source]

Calculate PubChem Fingerprints

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray

scopy.ScoRepresent.fingerprints.CalculateIFG(mols, useCount=True, n_jobs=1)[source]

An algorithm to identify functional groups in organic molecules

Reference:
  1. Peter Ertl (2017).

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • useCount – If True, the fingerprint stores match counts (not only 1 and 0); otherwise it is binary. Defaults to True

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

Returns

fingerprints

Return type

numpy.ndarray

scopy.ScoRepresent.daylight module

class scopy.ScoRepresent.daylight.Daylight(minPath, maxPath, nBits)[source]

Bases: object

A Daylight-like fingerprint based on hashing molecular subgraphs. The fingerprint length is 2^n bits.

Parameters
  • minPath (int, optional) – minimum number of bonds to include in the subgraphs, defaults to 1

  • maxPath (int, optional) – maximum number of bonds to include in the subgraphs, defaults to 7

  • nBits (int, optional) – number of bits in the fingerprint, defaults to 2048

CalculateDaylight(mol)[source]
Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list

scopy.ScoRepresent.efg module

class scopy.ScoRepresent.efg.EFG(useCount)[source]

Bases: object

A classification system termed “extended functional groups” (EFG), an extension of the set previously used by the CheckMol software that additionally covers heterocyclic compound classes and periodic table groups. The fingerprint has 583 bits.

Reference:
  1. Salmina, Elena, Norbert Haider and Igor Tetko (2016).

Parameters

useCount – If True, the fingerprint stores match counts (not only 1 and 0); otherwise it is binary. Defaults to True

CalculateEFG(mol)[source]

Calculate the EFG fingerprint.

Note:

The following bits are always set to 1 or 0: bit0, bit1, bit2, bit59, bit60, bit61, bit62, bit209, bit218, bit544, bit545. They are difficult to count like the other bits because their definitions contain “not”.

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list
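
The note on always-binary bits can be pictured as a post-processing step on a count fingerprint. The sketch below is an assumption about how such clamping could look, not the scopy implementation. Only the bit indices come from the note, and the helper name clamp_negated_bits is hypothetical.

```python
# Bit indices taken from the note above; everything else here is hypothetical.
ALWAYS_BINARY = {0, 1, 2, 59, 60, 61, 62, 209, 218, 544, 545}


def clamp_negated_bits(fp, always_binary=ALWAYS_BINARY):
    """Return a copy of a count fingerprint with the listed bits clamped to 0 or 1."""
    return [
        (1 if value else 0) if index in always_binary else value
        for index, value in enumerate(fp)
    ]


fp = [0] * 583
fp[1] = 4    # a "not"-style bit: a count has no clear meaning here
fp[100] = 4  # an ordinary bit keeps its count
clamped = clamp_negated_bits(fp)
print(clamped[1], clamped[100])  # prints: 1 4
```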

scopy.ScoRepresent.estate module

class scopy.ScoRepresent.estate.EStateFP(val=True)[source]

Bases: object

EState fingerprint. The fingerprint has 79 bits.

Reference:
  1. L.B. Kier and L.H. Hall, Molecular Structure Description: The Electrotopological State, Academic Press (1999).

Parameters
  • mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

  • val (bool, optional) – If True, the fingerprint holds EState values; otherwise it is binary. Defaults to True

CalculateEState(mol)[source]

Calculate EState fingerprint

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list

scopy.ScoRepresent.ghosecrippen module

class scopy.ScoRepresent.ghosecrippen.GCfp(useCount=True)[source]

Bases: object

Atom-based calculation of LogP and MR using Crippen’s approach. The fingerprint has 110 bits.

Reference:
  1. Wildman, Scott A., and Gordon M. Crippen (1999).

Parameters

useCount – If True, the fingerprint stores match counts (not only 1 and 0); otherwise it is binary. Defaults to True

CalculateGCfp(mol)[source]

Calculate GC fingerprint

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list

scopy.ScoRepresent.ifg module

scopy.ScoRepresent.ifg.GetIFG(mol)[source]

Identify the functional groups (IFG) in an organic molecule.

Reference:
  1. Peter Ertl (2017).

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

a list of namedtuples, each of the form namedtuple('IFG', ['atomIds', 'atoms', 'type'])

Return type

list
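
The documented return shape can be explored without running the detection itself. The sketch below rebuilds a namedtuple with the same field names and tallies group types. The three records are hypothetical stand-ins for real GetIFG output, so their values should not be read as what the function returns for any particular molecule.

```python
from collections import Counter, namedtuple

# Same field layout as the documented return value.
IFG = namedtuple("IFG", ["atomIds", "atoms", "type"])

# Hypothetical records standing in for the output of GetIFG(mol).
groups = [
    IFG(atomIds=(1, 2), atoms="C=O", type="cC(C)=O"),
    IFG(atomIds=(7,), atoms="O", type="cO"),
    IFG(atomIds=(9,), atoms="O", type="cO"),
]

# Fields are accessed by name; here the group types are tallied.
type_counts = Counter(g.type for g in groups)
print(type_counts)
```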

class scopy.ScoRepresent.ifg.IFG(useCount=False, n_jobs=1)[source]

Bases: object

An algorithm to identify functional groups in organic molecules. The number of bits is not fixed; it depends on the input molecules.

Parameters
  • useCount – If True, the fingerprint stores match counts (not only 1 and 0); otherwise it is binary. Defaults to False

  • n_jobs (int, optional) – The number of CPUs to use to do the computation, defaults to 1

CalculateIFG(mols)[source]

Calculate the IFG fingerprint.

Parameters

mols (iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

Returns

the IFG fingerprints

Return type

numpy.ndarray
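
Because the bits depend on the samples, the columns of the result can be thought of as the set of group types found across all input molecules. The following sketch shows that idea with plain lists. The function ifg_matrix, its use_count argument and the group-type labels are hypothetical, and the actual column order used by scopy may differ.

```python
from collections import Counter


def ifg_matrix(group_types_per_mol, use_count=False):
    """Build a fingerprint matrix whose columns are the group types seen in the input."""
    columns = sorted({t for types in group_types_per_mol for t in types})
    rows = []
    for types in group_types_per_mol:
        counts = Counter(types)
        if use_count:
            rows.append([counts[c] for c in columns])
        else:
            rows.append([1 if counts[c] else 0 for c in columns])
    return columns, rows


# Hypothetical group-type labels for three molecules.
mols_groups = [["cO", "cO"], ["cO", "CN"], []]
columns, rows = ifg_matrix(mols_groups, use_count=True)
print(columns)
print(rows)
```

A consequence of this design is that fingerprints computed on two different sample sets are not directly comparable unless the column lists are aligned first.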

scopy.ScoRepresent.maccs module

class scopy.ScoRepresent.maccs.MACCS[source]

Bases: object

A SMARTS-based implementation of the 166 public MACCS keys. The fingerprint has 167 bits.

Note:
  1. Most of the differences have to do with aromaticity

  2. There’s a discrepancy sometimes because the current RDKit definitions do not require multiple matches to be distinct. e.g. the SMILES C(=O)CC(=O) can match the (hypothetical) key O=CC twice in my definition. It’s not clear to me what the correct behavior is.

  3. Some keys are not fully defined in the MDL documentation.

  4. Two keys, 125 and 166, have to be done outside of SMARTS.

CalculateMACCS(mol)[source]

Calculate the MACCS fingerprint, a SMARTS-based implementation of the 166 public MACCS keys. The fingerprint has 167 bits.

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list
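
One way to read "166 keys, 167 bits" is that key k is stored at index k and index 0 is left unused, which is the convention RDKit follows for its MACCS keys. The sketch below assumes scopy keeps that convention. The helper keys_to_bits is hypothetical.

```python
N_BITS = 167  # 166 keys plus an unused position 0 (assumed convention)


def keys_to_bits(on_keys):
    """Place MACCS key k at list index k, leaving index 0 unused."""
    bits = [0] * N_BITS
    for key in on_keys:
        if not 1 <= key <= 166:
            raise ValueError(f"MACCS keys run from 1 to 166, got {key}")
        bits[key] = 1
    return bits


# Keys 125 and 166 are the two the notes say are handled outside SMARTS.
fp = keys_to_bits([125, 166])
print(len(fp), fp[0], fp[125], fp[166])  # prints: 167 0 1 1
```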

scopy.ScoRepresent.morgan module

class scopy.ScoRepresent.morgan.Morgan(radius, nBits)[source]

Bases: object

This family of fingerprints, better known as circular fingerprints, is built by applying the Morgan algorithm to a set of user-supplied atom invariants. The fingerprint length is 2^n bits.

Reference:
  1. Rogers, David and Mathew Hahn (2010).

Parameters
  • radius – the radius of the circular atom environment, defaults to 2

  • nBits (int, optional) – number of bits in the fingerprint, defaults to 1024

  • useFeatures (bool, optional) – if True, generate FCFP; otherwise generate ECFP. Defaults to False

CalculateECFP(mol)[source]

Compute the ECFP fingerprint (useFeatures is False).

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list

CalculateFCFP(mol)[source]

Compute the FCFP fingerprint (useFeatures is True).

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list

scopy.ScoRepresent.pubchem module

class scopy.ScoRepresent.pubchem.PubChem[source]

Bases: object

This module is derived from our previous work.

These are SMARTS patterns corresponding to the PubChem fingerprints:
  1. https://astro.temple.edu/~tua87106/list_fingerprints.pdf

  2. ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt

CalculatePubChem(mol)[source]

Calculate PubChem Fingerprints

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

list

InitKeys(keyList, keyDict)[source]

Internal Use Only

Generates SMARTS patterns for the keys. Run once.

calcPubChemFingerPart1(mol, **kwargs)[source]

Calculate PubChem Fingerprints (1-115; 263-881)

Parameters

mol (rdkit.Chem.rdchem.Mol) – molecule

Returns

fingerprint

Return type

rdkit.DataStructs.cDataStructs.SparseBitVect

calcPubChemFingerPart2(mol)[source]

Internal Use Only

Calculate PubChem Fingerprints (116-263)

func_1(mol, bits)[source]

Internal Use Only

Calculate PubChem Fingerprints (116-263)

func_2(mol, bits)[source]

Internal Use Only

saturated or aromatic carbon-only ring

func_3(mol, bits)[source]

Internal Use Only

saturated or aromatic nitrogen-containing

func_4(mol, bits)[source]

Internal Use Only

saturated or aromatic heteroatom-containing

func_5(mol, bits)[source]

Internal Use Only

unsaturated non-aromatic carbon-only

func_6(mol, bits)[source]

Internal Use Only

unsaturated non-aromatic nitrogen-containing

func_7(mol, bits)[source]

Internal Use Only

unsaturated non-aromatic heteroatom-containing

func_8(mol, bits)[source]

Internal Use Only

aromatic rings or hetero-aromatic rings

scopy.ScoRepresent.scaffolds module

scopy.ScoRepresent.scaffolds.CountCarbonScaffold(mols)[source]

Count the frequency of each carbon scaffold.

Parameters

mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

Returns

the SMILES of each scaffold and its frequency

Return type

dict

scopy.ScoRepresent.scaffolds.CountMurckoFramework(mols)[source]

Count the frequency of each Murcko framework.

Parameters

mols (Iterable object, each element is rdkit.Chem.rdchem.Mol) – the molecules to be scanned.

Returns

the SMILES of each framework and its frequency

Return type

dict
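
The returned dict maps a framework SMILES to the number of molecules that share it. As a minimal sketch of that counting step, assume the framework SMILES have already been extracted and canonicalised, so that identical frameworks give identical strings. The function count_frameworks and the sample SMILES list are hypothetical.

```python
from collections import Counter


def count_frameworks(framework_smiles):
    """Map each framework SMILES to the number of molecules that share it."""
    return dict(Counter(framework_smiles))


# Hypothetical, already-computed framework SMILES for four molecules.
frameworks = ["c1ccccc1", "c1ccccc1", "c1ccncc1", "c1ccccc1"]
freq = count_frameworks(frameworks)
print(freq)
```

Sorting the items by count, for example with sorted(freq.items(), key=lambda kv: kv[1], reverse=True), gives the most common frameworks first.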

Module contents

Created on Tue Jun 25 21:59:42 2019

@Author: Zhi-Jiang Yang, Dong-Sheng Cao.

@Institution: CBDD Group, Xiangya School of Pharmaceutical Science, CSU, China.

@Homepage: http://www.scbdd.com

@Mail: yzjkid9@gmail.com; oriental-cds@163.com

@Blog: https://blog.moyule.me