scopy.ScoPretreat package


Created on Mon Sep 9 11:23:53 2019

A module for pretreating molecules.

You can freely use and distribute it. If you have any problems, please contact us.

@Author: Zhi-Jiang Yang, Dong-Sheng Cao.

@Institution: CBDD Group, Xiangya School of Pharmaceutical Science, CSU, China.

@Homepage: http://www.scbdd.com

@Mail: kotori@cbdd.me and oriental-cds@163.com


Submodules

scopy.ScoPretreat.pretreat module

Created on Mon Sep 9 11:23:53 2019

@Author: Zhi-Jiang Yang, Dong-Sheng Cao.

@Institution: CBDD Group, Xiangya School of Pharmaceutical Science, CSU, China.

@Homepage: http://www.scbdd.com

@Mail: yzjkid9@gmail.com; oriental-cds@163.com

@Blog: https://blog.moyule.me

scopy.ScoPretreat.pretreat.StandardMol(mol)[source]

Standardize a molecule and derive its parent molecule. This covers the fragment, charge, tautomer, isotope and stereo parents. The primary usage is:

mol1 = Chem.MolFromSmiles('C1=CC=CC=C1')
mol2 = StandardMol(mol1)
scopy.ScoPretreat.pretreat.StandardSmi(smi)[source]

Standardize a molecule given as a SMILES string and derive its parent molecule. This covers the fragment, charge, tautomer, isotope and stereo parents. The primary usage is:

smi = StandardSmi('C[n+]1c([N-](C))cccc1')
class scopy.ScoPretreat.pretreat.StandardizeMol(normalizations=(Normalization('Nitro to N+(O-)=O', '[N,P,As,Sb;X3:1](=[O,S,Se,Te:2])=[O,S,Se,Te:3]>>[*+1:1]([*-1:2])=[*:3]'), Normalization('Sulfone to S(=O)(=O)', '[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])'), Normalization('Pyridine oxide to n+O-', '[n:1]=[O:2]>>[n+:1][O-:2]'), Normalization('Azide to N=N+=N-', '[*,H:1][N:2]=[N:3]#[N:4]>>[*,H:1][N:2]=[N+:3]=[N-:4]'), Normalization('Diazo/azo to =N+=N-', '[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]'), Normalization('Sulfoxide to -S+(O-)-', '[!O:1][S+0;X3:2](=[O:3])[!O:4]>>[*:1][S+1:2]([O-:3])[*:4]'), Normalization('Phosphate to P(O-)=O', '[O,S,Se,Te;-1:1][P+;D4:2][O,S,Se,Te;-1:3]>>[*+0:1]=[P+0;D5:2][*-1:3]'), Normalization('C/S+N to C/S=N+', '[C,S;X3+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('P+N to P=N+', '[P;X4+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('Normalize hydrazine-diazonium', '[CX4:1][NX3H:2]-[NX3H:3][CX4:4][NX2+:5]#[NX1:6]>>[CX4:1][NH0:2]=[NH+:3][C:4][N+0:5]=[NH:6]'), Normalization('Recombine 1,3-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[N,P,As,Sb,O,S,Se,Te;+1:3]>>[*-0:1]=[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[n,o,p,s;-1:1]:[a:2]=[N,O,P,S;+1:3]>>[*-0:1]:[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[N,O,P,S;-1:1]-[a:2]:[n,o,p,s;+1:3]>>[*-0:1]=[*:2]:[*+0:3]'), Normalization('Recombine 1,5-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[A:3]-[A:4]=[N,P,As,Sb,O,S,Se,Te;+1:5]>>[*-0:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[n,o,p,s;-1:1]:[a:2]:[a:3]:[c:4]=[N,O,P,S;+1:5]>>[*-0:1]:[*:2]:[*:3]:[c:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[N,O,P,S;-1:1]-[c:2]:[a:3]:[a:4]:[n,o,p,s;+1:5]>>[*-0:1]=[c:2]:[*:3]:[*:4]:[*+0:5]'), Normalization('Normalize 1,3 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]=[*:2]-[*+0:3]'), Normalization('Normalize 1,3 conjugated 
cation', '[n;+0!H0:1]:[c:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]:[*:2]-[*+0:3]'), Normalization('Normalize 1,5 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[A:3]-[A:4]=[N!$(*[O-]),O;+1H0:5]>>[*+1:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Normalize 1,5 conjugated cation', '[n;+0!H0:1]:[a:2]:[a:3]:[c:4]=[N!$(*[O-]),O;+1H0:5]>>[n+1:1]:[*:2]:[*:3]:[*:4]-[*+0:5]'), Normalization('Charge normalization', '[F,Cl,Br,I,At;-1:1]=[O:2]>>[*-0:1][O-:2]'), Normalization('Charge recombination', '[N,P,As,Sb;-1:1]=[C+;v3:2]>>[*+0:1]#[C+0:2]')), acid_base_pairs=(AcidBasePair('-OSO3H', 'OS(=O)(=O)[OH]', 'OS(=O)(=O)[O-]'), AcidBasePair('–SO3H', '[!O]S(=O)(=O)[OH]', '[!O]S(=O)(=O)[O-]'), AcidBasePair('-OSO2H', 'O[SD3](=O)[OH]', 'O[SD3](=O)[O-]'), AcidBasePair('-SO2H', '[!O][SD3](=O)[OH]', '[!O][SD3](=O)[O-]'), AcidBasePair('-OPO3H2', 'OP(=O)([OH])[OH]', 'OP(=O)([OH])[O-]'), AcidBasePair('-PO3H2', '[!O]P(=O)([OH])[OH]', '[!O]P(=O)([OH])[O-]'), AcidBasePair('-CO2H', 'C(=O)[OH]', 'C(=O)[O-]'), AcidBasePair('thiophenol', 'c[SH]', 'c[S-]'), AcidBasePair('(-OPO3H)-', 'OP(=O)([O-])[OH]', 'OP(=O)([O-])[O-]'), AcidBasePair('(-PO3H)-', '[!O]P(=O)([O-])[OH]', '[!O]P(=O)([O-])[O-]'), AcidBasePair('phthalimide', 'O=C2c1ccccc1C(=O)[NH]2', 'O=C2c1ccccc1C(=O)[N-]2'), AcidBasePair('CO3H (peracetyl)', 'C(=O)O[OH]', 'C(=O)O[O-]'), AcidBasePair('alpha-carbon-hydrogen-nitro group', 'O=N(O)[CH]', 'O=N(O)[C-]'), AcidBasePair('-SO2NH2', 'S(=O)(=O)[NH2]', 'S(=O)(=O)[NH-]'), AcidBasePair('-OBO2H2', 'OB([OH])[OH]', 'OB([OH])[O-]'), AcidBasePair('-BO2H2', '[!O]B([OH])[OH]', '[!O]B([OH])[O-]'), AcidBasePair('phenol', 'c[OH]', 'c[O-]'), AcidBasePair('SH (aliphatic)', 'C[SH]', 'C[S-]'), AcidBasePair('(-OBO2H)-', 'OB([O-])[OH]', 'OB([O-])[O-]'), AcidBasePair('(-BO2H)-', '[!O]B([O-])[OH]', '[!O]B([O-])[O-]'), AcidBasePair('cyclopentadiene', 'C1=CC=C[CH2]1', 'c1ccc[cH-]1'), AcidBasePair('-CONH2', 'C(=O)[NH2]', 'C(=O)[NH-]'), AcidBasePair('imidazole', 'c1cnc[nH]1', 'c1cnc[n-]1'), AcidBasePair('-OH (aliphatic alcohol)', 
'[CX4][OH]', '[CX4][O-]'), AcidBasePair('alpha-carbon-hydrogen-keto group', 'O=C([!O])[C!H0+0]', 'O=C([!O])[C-]'), AcidBasePair('alpha-carbon-hydrogen-acetyl ester group', 'OC(=O)[C!H0+0]', 'OC(=O)[C-]'), AcidBasePair('sp carbon hydrogen', 'C#[CH]', 'C#[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfone group', 'CS(=O)(=O)[C!H0+0]', 'CS(=O)(=O)[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfoxide group', 'C[SD3](=O)[C!H0+0]', 'C[SD3](=O)[C-]'), AcidBasePair('-NH2', '[CX4][NH2]', '[CX4][NH-]'), AcidBasePair('benzyl hydrogen', 'c[CX4H2]', 'c[CX3H-]'), AcidBasePair('sp2-carbon hydrogen', '[CX3]=[CX3!H0+0]', '[CX3]=[CX2-]'), AcidBasePair('sp3-carbon hydrogen', '[CX4!H0+0]', '[CX3-]')), tautomer_transforms=(TautomerTransform('1,3 (thio)keto/enol f', '[CX4!H0]-[C]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,3 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[C]=[C]', [], []), TautomerTransform('1,5 (thio)keto/enol f', '[CX4,NX3;!H0]-[C]=[C][CH0]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,5 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[CH0]=[C]-[C]=[C,N]', [], []), TautomerTransform('aliphatic imine f', '[CX4!H0]-[C]=[NX2]', [], []), TautomerTransform('aliphatic imine r', '[NX3!H0]-[C]=[CX3]', [], []), TautomerTransform('special imine f', '[N!H0]-[C]=[CX3R0]', [], []), TautomerTransform('special imine r', '[CX4!H0]-[c]=[n]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift f', '[#7!H0]-[#6R1]=[O,#7X2]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift r', '[O,#7;!H0]-[#6R1]=[#7X2]', [], []), TautomerTransform('1,3 heteroatom H shift', '[#7,S,O,Se,Te;!H0]-[#7X2,#6,#15]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift', '[#7,#16,#8;!H0]-[#6,#7]=[#6]-[#6,#7]=[#7,#16,#8;H0]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift f', '[#7,#16,#8,Se,Te;!H0]-[#6,nX2]=[#6,nX2]-[#6,#7X2]=[#7X2,S,O,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift r', 
'[#7,S,O,Se,Te;!H0]-[#6,#7X2]=[#6,nX2]-[#6,nX2]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift f', '[#7,#8,#16,Se,Te;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6]-[#6,#7X2]=[#7X2,S,O,Se,Te,CX3]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift r', '[#7,S,O,Se,Te,CX4;!H0]-[#6,#7X2]=[#6]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[NX2,S,O,Se,Te]', [], []), TautomerTransform('1,9 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#7,O]', [], []), TautomerTransform('1,11 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#7X2,O]', [], []), TautomerTransform('furanone f', '[O,S,N;!H0]-[#6r5]=[#6X3r5;$([#6]([#6r5])=[#6r5])]', [], []), TautomerTransform('furanone r', '[#6r5!H0;$([#6]([#6r5])[#6r5])]-[#6r5]=[O,S,N]', [], []), TautomerTransform('keten/ynol f', '[C!H0]=[C]=[O,S,Se,Te;X1]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('keten/ynol r', '[O,S,Se,Te;!H0X2]-[C]#[C]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('ionic nitro/aci-nitro f', '[C!H0]-[N+;$([N][O-])]=[O]', [], []), TautomerTransform('ionic nitro/aci-nitro r', '[O!H0]-[N+;$([N][O-])]=[C]', [], []), TautomerTransform('oxim/nitroso f', '[O!H0]-[N]=[C]', [], []), TautomerTransform('oxim/nitroso r', '[C!H0]-[N]=[O]', [], []), TautomerTransform('oxim/nitroso via phenol f', '[O!H0]-[N]=[C]-[C]=[C]-[C]=[OH0]', [], []), TautomerTransform('oxim/nitroso via phenol r', '[O!H0]-[c]=[c]-[c]=[c]-[N]=[OH0]', [], []), TautomerTransform('cyano/iso-cyanic acid f', '[O!H0]-[C]#[N]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('cyano/iso-cyanic acid r', '[N!H0]=[C]=[O]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('isocyanide f', '[C-0!H0]#[N+0]', 
[rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('isocyanide r', '[N+!H0]#[C-]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('phosphonic acid f', '[OH]-[PH0]', [rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('phosphonic acid r', '[PH]=[O]', [rdkit.Chem.rdchem.BondType.SINGLE], [])), tautomer_scores=(TautomerScore('benzoquinone', '[#6]1([#6]=[#6][#6]([#6]=[#6]1)=,:[N,S,O])=,:[N,S,O]', 25), TautomerScore('oxim', '[#6]=[N][OH]', 4), TautomerScore('C=O', '[#6]=,:[#8]', 2), TautomerScore('N=O', '[#7]=,:[#8]', 2), TautomerScore('P=O', '[#15]=,:[#8]', 2), TautomerScore('C=hetero', '[#6]=[!#1;!#6]', 1), TautomerScore('methyl', '[CX4H3]', 1), TautomerScore('guanidine terminal=N', '[#7][#6](=[NR0])[#7H0]', 1), TautomerScore('guanidine endocyclic=N', '[#7;R][#6;R]([N])=[#7;R]', 2), TautomerScore('aci-nitro', '[#6]=[N+]([O-])[OH]', -4)), max_restarts=200, max_tautomers=1000, prefer_organic=False)[source]

Bases: object

The main class for performing standardization of molecules and deriving parent molecules.

The primary usage is via the standardize() method:

s = StandardizeMol()
mol1 = Chem.MolFromSmiles('C1=CC=CC=C1')
mol2 = s.standardize(mol1)

There are separate methods to derive fragment, charge, tautomer, isotope and stereo parent molecules.

addhs(mol)[source]
property canonicalize_tautomer
Returns

A callable TautomerCanonicalizer instance.

property disconnect_metals
Returns

A callable MetalDisconnector instance.

property largest_fragment
Returns

A callable LargestFragmentChooser instance.

property normalize
Returns

A callable Normalizer instance.

property reionize
Returns

A callable Reionizer instance.

rmhs(mol)[source]
property uncharge
Returns

A callable Uncharger instance.

scopy.ScoPretreat.pretreat.ValidatorMol(mol)[source]

Return log messages for a given molecule using the default validations.

Note: This is a convenience function for quickly validating a single molecule.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The molecule to validate.

Returns

A list of log messages.

Return type

list of strings.

scopy.ScoPretreat.pretreat.ValidatorSmi(smi)[source]

Return log messages for a given SMILES string using the default validations.

Note: This is a convenience function for quickly validating a single SMILES string.

Parameters

smi (string) – The SMILES for the molecule.

Returns

A list of log messages.

Return type

list of strings.

scopy.ScoPretreat.pretreat_Lib module

scopy.ScoPretreat.pretreat_Lib.StandardMols(mols, n_jobs=1)[source]

scopy.ScoPretreat.pretreatutil module

Created on Mon Sep 9 10:41:15 2019

This module is derived from MolVS (https://github.com/mcs07/MolVS.git).

@Author: Zhi-Jiang Yang, Dong-Sheng Cao.

@Institution: CBDD Group, Xiangya School of Pharmaceutical Science, CSU, China.

@Homepage: http://www.scbdd.com

@Mail: yzjkid9@gmail.com; oriental-cds@163.com

@Blog: https://blog.moyule.me

scopy.ScoPretreat.pretreatutil.ACID_BASE_PAIRS = (AcidBasePair('-OSO3H', 'OS(=O)(=O)[OH]', 'OS(=O)(=O)[O-]'), AcidBasePair('–SO3H', '[!O]S(=O)(=O)[OH]', '[!O]S(=O)(=O)[O-]'), AcidBasePair('-OSO2H', 'O[SD3](=O)[OH]', 'O[SD3](=O)[O-]'), AcidBasePair('-SO2H', '[!O][SD3](=O)[OH]', '[!O][SD3](=O)[O-]'), AcidBasePair('-OPO3H2', 'OP(=O)([OH])[OH]', 'OP(=O)([OH])[O-]'), AcidBasePair('-PO3H2', '[!O]P(=O)([OH])[OH]', '[!O]P(=O)([OH])[O-]'), AcidBasePair('-CO2H', 'C(=O)[OH]', 'C(=O)[O-]'), AcidBasePair('thiophenol', 'c[SH]', 'c[S-]'), AcidBasePair('(-OPO3H)-', 'OP(=O)([O-])[OH]', 'OP(=O)([O-])[O-]'), AcidBasePair('(-PO3H)-', '[!O]P(=O)([O-])[OH]', '[!O]P(=O)([O-])[O-]'), AcidBasePair('phthalimide', 'O=C2c1ccccc1C(=O)[NH]2', 'O=C2c1ccccc1C(=O)[N-]2'), AcidBasePair('CO3H (peracetyl)', 'C(=O)O[OH]', 'C(=O)O[O-]'), AcidBasePair('alpha-carbon-hydrogen-nitro group', 'O=N(O)[CH]', 'O=N(O)[C-]'), AcidBasePair('-SO2NH2', 'S(=O)(=O)[NH2]', 'S(=O)(=O)[NH-]'), AcidBasePair('-OBO2H2', 'OB([OH])[OH]', 'OB([OH])[O-]'), AcidBasePair('-BO2H2', '[!O]B([OH])[OH]', '[!O]B([OH])[O-]'), AcidBasePair('phenol', 'c[OH]', 'c[O-]'), AcidBasePair('SH (aliphatic)', 'C[SH]', 'C[S-]'), AcidBasePair('(-OBO2H)-', 'OB([O-])[OH]', 'OB([O-])[O-]'), AcidBasePair('(-BO2H)-', '[!O]B([O-])[OH]', '[!O]B([O-])[O-]'), AcidBasePair('cyclopentadiene', 'C1=CC=C[CH2]1', 'c1ccc[cH-]1'), AcidBasePair('-CONH2', 'C(=O)[NH2]', 'C(=O)[NH-]'), AcidBasePair('imidazole', 'c1cnc[nH]1', 'c1cnc[n-]1'), AcidBasePair('-OH (aliphatic alcohol)', '[CX4][OH]', '[CX4][O-]'), AcidBasePair('alpha-carbon-hydrogen-keto group', 'O=C([!O])[C!H0+0]', 'O=C([!O])[C-]'), AcidBasePair('alpha-carbon-hydrogen-acetyl ester group', 'OC(=O)[C!H0+0]', 'OC(=O)[C-]'), AcidBasePair('sp carbon hydrogen', 'C#[CH]', 'C#[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfone group', 'CS(=O)(=O)[C!H0+0]', 'CS(=O)(=O)[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfoxide group', 'C[SD3](=O)[C!H0+0]', 'C[SD3](=O)[C-]'), AcidBasePair('-NH2', '[CX4][NH2]', '[CX4][NH-]'), 
AcidBasePair('benzyl hydrogen', 'c[CX4H2]', 'c[CX3H-]'), AcidBasePair('sp2-carbon hydrogen', '[CX3]=[CX3!H0+0]', '[CX3]=[CX2-]'), AcidBasePair('sp3-carbon hydrogen', '[CX4!H0+0]', '[CX3-]'))

The default list of AcidBasePairs, sorted from strongest to weakest. This list is derived from the Food and Drug Administration Substance Registration System Standard Operating Procedure guide.

class scopy.ScoPretreat.pretreatutil.AcidBasePair(name, acid, base)[source]

Bases: object

An acid and its conjugate base, defined by SMARTS.

A strength-ordered list of AcidBasePairs can be used to ensure the strongest acids in a molecule ionize first.

property acid
property base
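
The strength ordering can be illustrated with a minimal sketch in plain Python, without RDKit. The group names are a subset of the default ACID_BASE_PAIRS names in their listed order; the function `strongest_acid` and the set-of-names input are hypothetical stand-ins for real SMARTS matching, not part of the package.

```python
# Subset of the default pair names, kept in the order they are listed
# (strongest first). Real code would match the SMARTS of each pair instead.
ORDERED_ACIDS = ["-OSO3H", "-CO2H", "thiophenol", "phenol"]


def strongest_acid(groups_present):
    """Return the first (strongest) listed acid found in the molecule, or None."""
    for name in ORDERED_ACIDS:
        if name in groups_present:
            return name
    return None


# A hypothetical molecule carrying both a phenol and a carboxylic acid:
first_to_ionize = strongest_acid({"phenol", "-CO2H"})
print(first_to_ionize)
```

Because the list is walked from the top, the carboxylic acid is picked before the phenol regardless of the order in which the groups were detected.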
scopy.ScoPretreat.pretreatutil.CHARGE_CORRECTIONS = (ChargeCorrection('[Li,Na,K]', '[Li,Na,K;X0+0]', 1), ChargeCorrection('[Mg,Ca]', '[Mg,Ca;X0+0]', 2), ChargeCorrection('[Cl]', '[Cl;X0+0]', -1))

The default list of ChargeCorrections.

class scopy.ScoPretreat.pretreatutil.ChargeCorrection(name, smarts, charge)[source]

Bases: object

An atom that should have a certain charge applied, defined by a SMARTS pattern.

property smarts
class scopy.ScoPretreat.pretreatutil.DichloroethaneValidation(log)[source]

Bases: scopy.ScoPretreat.pretreatutil.SmartsValidation

Logs if 1,2-dichloroethane is present.

This is provided as an example of how to subclass SmartsValidation to check for the presence of a substructure.

entire_fragment = True
level = 20
message = '1,2-Dichloroethane is present'
smarts = '[Cl]-[#6]-[#6]-[Cl]'
class scopy.ScoPretreat.pretreatutil.FragmentPattern(name, smarts)[source]

Bases: object

A fragment defined by a SMARTS pattern.

property smarts
class scopy.ScoPretreat.pretreatutil.FragmentRemover(fragments=(FragmentPattern('hydrogen', '[H]'), FragmentPattern('fluorine', '[F]'), FragmentPattern('chlorine', '[Cl]'), FragmentPattern('bromine', '[Br]'), FragmentPattern('iodine', '[I]'), FragmentPattern('lithium', '[Li]'), FragmentPattern('sodium', '[Na]'), FragmentPattern('potassium', '[K]'), FragmentPattern('calcium', '[Ca]'), FragmentPattern('magnesium', '[Mg]'), FragmentPattern('aluminium', '[Al]'), FragmentPattern('barium', '[Ba]'), FragmentPattern('bismuth', '[Bi]'), FragmentPattern('silver', '[Ag]'), FragmentPattern('strontium', '[Sr]'), FragmentPattern('zinc', '[Zn]'), FragmentPattern('ammonia/ammonium', '[#7]'), FragmentPattern('water/hydroxide', '[#8]'), FragmentPattern('methyl amine', '[#6]-[#7]'), FragmentPattern('sulfide', 'S'), FragmentPattern('nitrate', '[#7](=[#8])(-[#8])-[#8]'), FragmentPattern('phosphate', '[P](=[#8])(-[#8])(-[#8])-[#8]'), FragmentPattern('hexafluorophosphate', '[P](-[#9])(-[#9])(-[#9])(-[#9])(-[#9])-[#9]'), FragmentPattern('sulfate', '[S](=[#8])(=[#8])(-[#8])-[#8]'), FragmentPattern('methyl sulfonate', '[#6]-[S](=[#8])(=[#8])(-[#8])'), FragmentPattern('trifluoromethanesulfonic acid', '[#8]-[S](=[#8])(=[#8])-[#6](-[#9])(-[#9])-[#9]'), FragmentPattern('trifluoroacetic acid', '[#9]-[#6](-[#9])(-[#9])-[#6](=[#8])-[#8]'), FragmentPattern('1, 2-dichloroethane', '[Cl]-[#6]-[#6]-[Cl]'), FragmentPattern('1, 2-dimethoxyethane', '[#6]-[#8]-[#6]-[#6]-[#8]-[#6]'), FragmentPattern('1, 4-dioxane', '[#6]-1-[#6]-[#8]-[#6]-[#6]-[#8]-1'), FragmentPattern('1-methyl-2-pyrrolidinone', '[#6]-[#7]-1-[#6]-[#6]-[#6]-[#6]-1=[#8]'), FragmentPattern('2-butanone', '[#6]-[#6]-[#6](-[#6])=[#8]'), FragmentPattern('acetate/acetic acid', '[#8]-[#6](-[#6])=[#8]'), FragmentPattern('acetone', '[#6]-[#6](-[#6])=[#8]'), FragmentPattern('acetonitrile', '[#6]-[#6]#[N]'), FragmentPattern('benzene', '[#6]1[#6][#6][#6][#6][#6]1'), FragmentPattern('butanol', '[#8]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('t-butanol', 
'[#8]-[#6](-[#6])(-[#6])-[#6]'), FragmentPattern('chloroform', '[Cl]-[#6](-[Cl])-[Cl]'), FragmentPattern('cycloheptane', '[#6]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-1'), FragmentPattern('cyclohexane', '[#6]-1-[#6]-[#6]-[#6]-[#6]-[#6]-1'), FragmentPattern('dichloromethane', '[Cl]-[#6]-[Cl]'), FragmentPattern('diethyl ether', '[#6]-[#6]-[#8]-[#6]-[#6]'), FragmentPattern('diisopropyl ether', '[#6]-[#6](-[#6])-[#8]-[#6](-[#6])-[#6]'), FragmentPattern('dimethyl formamide', '[#6]-[#7](-[#6])-[#6]=[#8]'), FragmentPattern('dimethyl sulfoxide', '[#6]-[S](-[#6])=[#8]'), FragmentPattern('ethanol', '[#8]-[#6]-[#6]'), FragmentPattern('ethyl acetate', '[#6]-[#6]-[#8]-[#6](-[#6])=[#8]'), FragmentPattern('formic acid', '[#8]-[#6]=[#8]'), FragmentPattern('heptane', '[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('hexane', '[#6]-[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('isopropanol', '[#8]-[#6](-[#6])-[#6]'), FragmentPattern('methanol', '[#8]-[#6]'), FragmentPattern('N, N-dimethylacetamide', '[#6]-[#7](-[#6])-[#6](-[#6])=[#8]'), FragmentPattern('pentane', '[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('propanol', '[#8]-[#6]-[#6]-[#6]'), FragmentPattern('pyridine', '[#6]-1=[#6]-[#6]=[#7]-[#6]=[#6]-1'), FragmentPattern('t-butyl methyl ether', '[#6]-[#8]-[#6](-[#6])(-[#6])-[#6]'), FragmentPattern('tetrahydrofurane', '[#6]-1-[#6]-[#6]-[#8]-[#6]-1'), FragmentPattern('toluene', '[#6]-[#6]~1~[#6]~[#6]~[#6]~[#6]~[#6]~1'), FragmentPattern('xylene', '[#6]-[#6]~1~[#6](-[#6])~[#6]~[#6]~[#6]~[#6]~1')), leave_last=True)[source]

Bases: object

A class for filtering out fragments using SMARTS patterns.

remove(mol)[source]

Return the molecule with specified fragments removed.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The molecule to remove fragments from.

Returns

The molecule with fragments removed.

Return type

rdkit.Chem.rdchem.Mol
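
The effect of the leave_last option can be sketched without RDKit. In this simplified stand-in, fragments are plain strings and a pattern removes a fragment only on exact string equality; the real class matches SMARTS patterns instead. The function `remove_fragments` is hypothetical and only illustrates the assumed control flow.

```python
def remove_fragments(fragments, patterns, leave_last=True):
    """Drop fragments matching each pattern in turn.

    With leave_last=True, a pattern is skipped (and processing stops)
    if applying it would leave no fragments at all.
    """
    remaining = list(fragments)
    for pattern in patterns:
        kept = [frag for frag in remaining if frag != pattern]
        if leave_last and not kept:
            break  # refuse to delete the final fragment
        remaining = kept
    return remaining


salt = ["CCO", "[Na+]", "[Cl-]"]
patterns = ["[Cl-]", "[Na+]", "CCO"]

kept_with_guard = remove_fragments(salt, patterns, leave_last=True)
kept_without_guard = remove_fragments(salt, patterns, leave_last=False)
print(kept_with_guard, kept_without_guard)
```

With the guard on, the last remaining fragment survives even though a pattern matches it; with the guard off, the result is empty.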

class scopy.ScoPretreat.pretreatutil.FragmentValidation(log)[source]

Bases: scopy.ScoPretreat.pretreatutil.Validation

Logs if certain fragments are present.

Subclass and override the fragments class attribute to customize the list of FragmentPatterns.

fragments = (FragmentPattern('hydrogen', '[H]'), FragmentPattern('fluorine', '[F]'), FragmentPattern('chlorine', '[Cl]'), FragmentPattern('bromine', '[Br]'), FragmentPattern('iodine', '[I]'), FragmentPattern('lithium', '[Li]'), FragmentPattern('sodium', '[Na]'), FragmentPattern('potassium', '[K]'), FragmentPattern('calcium', '[Ca]'), FragmentPattern('magnesium', '[Mg]'), FragmentPattern('aluminium', '[Al]'), FragmentPattern('barium', '[Ba]'), FragmentPattern('bismuth', '[Bi]'), FragmentPattern('silver', '[Ag]'), FragmentPattern('strontium', '[Sr]'), FragmentPattern('zinc', '[Zn]'), FragmentPattern('ammonia/ammonium', '[#7]'), FragmentPattern('water/hydroxide', '[#8]'), FragmentPattern('methyl amine', '[#6]-[#7]'), FragmentPattern('sulfide', 'S'), FragmentPattern('nitrate', '[#7](=[#8])(-[#8])-[#8]'), FragmentPattern('phosphate', '[P](=[#8])(-[#8])(-[#8])-[#8]'), FragmentPattern('hexafluorophosphate', '[P](-[#9])(-[#9])(-[#9])(-[#9])(-[#9])-[#9]'), FragmentPattern('sulfate', '[S](=[#8])(=[#8])(-[#8])-[#8]'), FragmentPattern('methyl sulfonate', '[#6]-[S](=[#8])(=[#8])(-[#8])'), FragmentPattern('trifluoromethanesulfonic acid', '[#8]-[S](=[#8])(=[#8])-[#6](-[#9])(-[#9])-[#9]'), FragmentPattern('trifluoroacetic acid', '[#9]-[#6](-[#9])(-[#9])-[#6](=[#8])-[#8]'), FragmentPattern('1,2-dichloroethane', '[Cl]-[#6]-[#6]-[Cl]'), FragmentPattern('1,2-dimethoxyethane', '[#6]-[#8]-[#6]-[#6]-[#8]-[#6]'), FragmentPattern('1,4-dioxane', '[#6]-1-[#6]-[#8]-[#6]-[#6]-[#8]-1'), FragmentPattern('1-methyl-2-pyrrolidinone', '[#6]-[#7]-1-[#6]-[#6]-[#6]-[#6]-1=[#8]'), FragmentPattern('2-butanone', '[#6]-[#6]-[#6](-[#6])=[#8]'), FragmentPattern('acetate/acetic acid', '[#8]-[#6](-[#6])=[#8]'), FragmentPattern('acetone', '[#6]-[#6](-[#6])=[#8]'), FragmentPattern('acetonitrile', '[#6]-[#6]#[N]'), FragmentPattern('benzene', '[#6]1[#6][#6][#6][#6][#6]1'), FragmentPattern('butanol', '[#8]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('t-butanol', '[#8]-[#6](-[#6])(-[#6])-[#6]'), 
FragmentPattern('chloroform', '[Cl]-[#6](-[Cl])-[Cl]'), FragmentPattern('cycloheptane', '[#6]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-1'), FragmentPattern('cyclohexane', '[#6]-1-[#6]-[#6]-[#6]-[#6]-[#6]-1'), FragmentPattern('dichloromethane', '[Cl]-[#6]-[Cl]'), FragmentPattern('diethyl ether', '[#6]-[#6]-[#8]-[#6]-[#6]'), FragmentPattern('diisopropyl ether', '[#6]-[#6](-[#6])-[#8]-[#6](-[#6])-[#6]'), FragmentPattern('dimethyl formamide', '[#6]-[#7](-[#6])-[#6]=[#8]'), FragmentPattern('dimethyl sulfoxide', '[#6]-[S](-[#6])=[#8]'), FragmentPattern('ethanol', '[#8]-[#6]-[#6]'), FragmentPattern('ethyl acetate', '[#6]-[#6]-[#8]-[#6](-[#6])=[#8]'), FragmentPattern('formic acid', '[#8]-[#6]=[#8]'), FragmentPattern('heptane', '[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('hexane', '[#6]-[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('isopropanol', '[#8]-[#6](-[#6])-[#6]'), FragmentPattern('methanol', '[#8]-[#6]'), FragmentPattern('N,N-dimethylacetamide', '[#6]-[#7](-[#6])-[#6](-[#6])=[#8]'), FragmentPattern('pentane', '[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('propanol', '[#8]-[#6]-[#6]-[#6]'), FragmentPattern('pyridine', '[#6]-1=[#6]-[#6]=[#7]-[#6]=[#6]-1'), FragmentPattern('t-butyl methyl ether', '[#6]-[#8]-[#6](-[#6])(-[#6])-[#6]'), FragmentPattern('tetrahydrofurane', '[#6]-1-[#6]-[#6]-[#8]-[#6]-1'), FragmentPattern('toluene', '[#6]-[#6]~1~[#6]~[#6]~[#6]~[#6]~[#6]~1'), FragmentPattern('xylene', '[#6]-[#6]~1~[#6](-[#6])~[#6]~[#6]~[#6]~[#6]~1'))

A list of FragmentPatterns to check for.

run(mol)[source]
class scopy.ScoPretreat.pretreatutil.IsNoneValidation(log)[source]

Bases: scopy.ScoPretreat.pretreatutil.Validation

Logs an error if None is passed to the Validator.

This can happen if RDKit failed to parse an input format. If the molecule is None, no subsequent validations will run.

run(mol)[source]
class scopy.ScoPretreat.pretreatutil.IsotopeValidation(log)[source]

Bases: scopy.ScoPretreat.pretreatutil.Validation

Logs if molecule contains isotopes.

run(mol)[source]
scopy.ScoPretreat.pretreatutil.LEAVE_LAST = True

The default value for whether to ensure at least one fragment is left after FragmentRemover is applied.

scopy.ScoPretreat.pretreatutil.LONG_FORMAT = '%(asctime)s - %(levelname)s - %(validation)s - %(message)s'

A more detailed format for log messages. Specify when initializing a Validator.
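
The format string relies on a non-standard `validation` field, so each log record must carry that attribute. The sketch below uses only the standard logging module to show how such a record renders; the record name, the message text and the validation name are made up for illustration and are not taken from the package.

```python
import logging

LONG_FORMAT = '%(asctime)s - %(levelname)s - %(validation)s - %(message)s'

# Build a record by hand and attach the extra 'validation' attribute that
# the format string expects.
record = logging.LogRecord(
    name="sketch",
    level=logging.INFO,
    pathname="",
    lineno=0,
    msg="Molecule contains isotope %s",
    args=("2H",),
    exc_info=None,
)
record.validation = "IsotopeValidation"

line = logging.Formatter(LONG_FORMAT).format(record)
print(line)
```

The printed line starts with a timestamp, followed by the level name, the validation name and the message.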

class scopy.ScoPretreat.pretreatutil.LargestFragmentChooser(prefer_organic=False)[source]

Bases: object

A class for selecting the largest covalent unit in a molecule with multiple fragments.

choose(mol)[source]

Return the largest covalent unit.

The largest fragment is determined by number of atoms (including hydrogens). Ties are broken by taking the fragment with the higher molecular weight, and then by taking the first alphabetically by SMILES if needed.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The molecule to choose the largest fragment from.

Returns

The largest fragment.

Return type

rdkit.Chem.rdchem.Mol
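
The tie-breaking order described for choose() can be expressed as a single sort key. This is a sketch in plain Python, not the library's code: `Fragment` and `choose_largest` are hypothetical, and atom counts (including hydrogens) and molecular weights are supplied by hand rather than computed by RDKit.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    smiles: str
    atoms: int      # atom count including hydrogens
    weight: float   # molecular weight


def choose_largest(fragments):
    """Most atoms first, then higher weight, then alphabetical SMILES."""
    return min(fragments, key=lambda f: (-f.atoms, -f.weight, f.smiles))


# Ethanol and dimethyl ether tie on atoms and weight, so SMILES order decides.
isomers = [Fragment("COC", 9, 46.07), Fragment("CCO", 9, 46.07), Fragment("[Na+]", 1, 22.99)]
# Methane and chloromethane tie on atom count, so weight decides.
same_size = [Fragment("C", 5, 16.04), Fragment("CCl", 5, 50.49)]

print(choose_largest(isomers).smiles, choose_largest(same_size).smiles)
```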

class scopy.ScoPretreat.pretreatutil.LogHandler[source]

Bases: logging.Handler

A simple logging Handler that just stores logs in an array until flushed.

close()[source]

Close the handler.

emit(record)[source]

Append the record.

flush()[source]

Clear the log records.

property logmessages
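
A handler of this kind can be sketched with the standard logging module alone. The class below, `ListLogHandler`, is a hypothetical re-creation that mirrors the documented method names (emit, flush, close, logmessages); it is not the package's implementation, and the logger name and message are illustrative.

```python
import logging


class ListLogHandler(logging.Handler):
    """Keep log records in a list until flush() is called."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        # Append the record instead of writing it anywhere.
        self.records.append(record)

    def flush(self):
        # Clear the stored records.
        self.records = []

    def close(self):
        self.flush()
        super().close()

    @property
    def logmessages(self):
        # Render every stored record with the handler's formatter.
        return [self.format(record) for record in self.records]


logger = logging.getLogger("pretreat_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListLogHandler()
handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
logger.addHandler(handler)

logger.info("1,2-Dichloroethane is present")
messages = handler.logmessages
handler.flush()
print(messages)
```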
scopy.ScoPretreat.pretreatutil.MAX_RESTARTS = 200

The default value for the maximum number of times to attempt to apply the series of normalizations.

scopy.ScoPretreat.pretreatutil.MAX_TAUTOMERS = 1000

The default value for the maximum number of tautomers to enumerate, a limit to prevent combinatorial explosion.
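
Why a cap is needed can be seen in a toy enumeration. The sketch below is not tautomer enumeration: states are 12-bit integers and a "transform" flips one bit, which yields 2**12 = 4096 reachable states. The function `enumerate_capped` is hypothetical and only shows how a limit bounds a breadth-first search of this kind.

```python
from collections import deque


def enumerate_capped(start, neighbours, max_items=1000):
    """Breadth-first enumeration that stops once max_items states are known."""
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_items:
        current = queue.popleft()
        for candidate in neighbours(current):
            if candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
                if len(seen) >= max_items:
                    break
    return seen


def flip_one_bit(state):
    """Toy neighbour function: every state reachable by flipping one of 12 bits."""
    return [state ^ (1 << i) for i in range(12)]


capped = enumerate_capped(0, flip_one_bit, max_items=1000)
full = enumerate_capped(0, flip_one_bit, max_items=10_000)
print(len(capped), len(full))
```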

class scopy.ScoPretreat.pretreatutil.MetalDisconnector[source]

Bases: object

Class for breaking covalent bonds between metals and organic atoms under certain conditions.

disconnect(mol)[source]

Break covalent bonds between metals and organic atoms under certain conditions.

The algorithm works as follows:

  • Disconnect N, O, F from any metal.

  • Disconnect other non-metals from transition metals + Al (but not Hg, Ga, Ge, In, Sn, As, Tl, Pb, Bi, Po).

  • For every bond broken, adjust the charges of the begin and end atoms accordingly.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The input molecule.

Returns

The molecule with metals disconnected.

Return type

rdkit.Chem.rdchem.Mol
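
The three rules of the algorithm can be sketched on a simplified bond list. Everything here is an assumption made for illustration: bonds are (metal, partner, order) tuples of element symbols, `TRANSITION_METALS_AND_AL` is a small subset rather than the library's metal list, and the charge adjustment is assumed to add the bond order to the metal and subtract it from the partner.

```python
HARD_PARTNERS = {"N", "O", "F"}
EXCLUDED_METALS = {"Hg", "Ga", "Ge", "In", "Sn", "As", "Tl", "Pb", "Bi", "Po"}
# Illustrative subset only; not the library's actual metal list.
TRANSITION_METALS_AND_AL = {"Al", "Fe", "Cu", "Zn", "Pt"}


def should_disconnect(metal, partner):
    """Apply the two disconnection rules to one metal-partner bond."""
    if partner in HARD_PARTNERS:
        return True  # rule 1: N, O, F leave any metal
    # rule 2: other non-metals leave transition metals and Al only
    return metal in TRANSITION_METALS_AND_AL and metal not in EXCLUDED_METALS


def disconnect(bonds):
    """Return the bonds that survive and the charge change for each broken bond."""
    kept, charge_changes = [], []
    for metal, partner, order in bonds:
        if should_disconnect(metal, partner):
            charge_changes.append((metal, +order))    # rule 3 (assumed direction)
            charge_changes.append((partner, -order))
        else:
            kept.append((metal, partner, order))
    return kept, charge_changes


kept, charge_changes = disconnect([("Na", "O", 1), ("Hg", "C", 1), ("Fe", "S", 1)])
print(kept, charge_changes)
```

In this toy input the Na-O and Fe-S bonds are broken, while the Hg-C bond is kept because Hg is on the exclusion list.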

exception scopy.ScoPretreat.pretreatutil.MolVSError[source]

Bases: Exception

scopy.ScoPretreat.pretreatutil.NORMALIZATIONS = (Normalization('Nitro to N+(O-)=O', '[N,P,As,Sb;X3:1](=[O,S,Se,Te:2])=[O,S,Se,Te:3]>>[*+1:1]([*-1:2])=[*:3]'), Normalization('Sulfone to S(=O)(=O)', '[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])'), Normalization('Pyridine oxide to n+O-', '[n:1]=[O:2]>>[n+:1][O-:2]'), Normalization('Azide to N=N+=N-', '[*,H:1][N:2]=[N:3]#[N:4]>>[*,H:1][N:2]=[N+:3]=[N-:4]'), Normalization('Diazo/azo to =N+=N-', '[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]'), Normalization('Sulfoxide to -S+(O-)-', '[!O:1][S+0;X3:2](=[O:3])[!O:4]>>[*:1][S+1:2]([O-:3])[*:4]'), Normalization('Phosphate to P(O-)=O', '[O,S,Se,Te;-1:1][P+;D4:2][O,S,Se,Te;-1:3]>>[*+0:1]=[P+0;D5:2][*-1:3]'), Normalization('C/S+N to C/S=N+', '[C,S;X3+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('P+N to P=N+', '[P;X4+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('Normalize hydrazine-diazonium', '[CX4:1][NX3H:2]-[NX3H:3][CX4:4][NX2+:5]#[NX1:6]>>[CX4:1][NH0:2]=[NH+:3][C:4][N+0:5]=[NH:6]'), Normalization('Recombine 1,3-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[N,P,As,Sb,O,S,Se,Te;+1:3]>>[*-0:1]=[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[n,o,p,s;-1:1]:[a:2]=[N,O,P,S;+1:3]>>[*-0:1]:[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[N,O,P,S;-1:1]-[a:2]:[n,o,p,s;+1:3]>>[*-0:1]=[*:2]:[*+0:3]'), Normalization('Recombine 1,5-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[A:3]-[A:4]=[N,P,As,Sb,O,S,Se,Te;+1:5]>>[*-0:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[n,o,p,s;-1:1]:[a:2]:[a:3]:[c:4]=[N,O,P,S;+1:5]>>[*-0:1]:[*:2]:[*:3]:[c:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[N,O,P,S;-1:1]-[c:2]:[a:3]:[a:4]:[n,o,p,s;+1:5]>>[*-0:1]=[c:2]:[*:3]:[*:4]:[*+0:5]'), Normalization('Normalize 1,3 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]=[*:2]-[*+0:3]'), Normalization('Normalize 1,3 conjugated cation', 
'[n;+0!H0:1]:[c:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]:[*:2]-[*+0:3]'), Normalization('Normalize 1,5 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[A:3]-[A:4]=[N!$(*[O-]),O;+1H0:5]>>[*+1:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Normalize 1,5 conjugated cation', '[n;+0!H0:1]:[a:2]:[a:3]:[c:4]=[N!$(*[O-]),O;+1H0:5]>>[n+1:1]:[*:2]:[*:3]:[*:4]-[*+0:5]'), Normalization('Charge normalization', '[F,Cl,Br,I,At;-1:1]=[O:2]>>[*-0:1][O-:2]'), Normalization('Charge recombination', '[N,P,As,Sb;-1:1]=[C+;v3:2]>>[*+0:1]#[C+0:2]'))

The default list of Normalization transforms.

class scopy.ScoPretreat.pretreatutil.NeutralValidation(log)[source]

Bases: scopy.ScoPretreat.pretreatutil.Validation

Logs if not an overall neutral system.

run(mol)[source]
class scopy.ScoPretreat.pretreatutil.NoAtomValidation(log)[source]

Bases: scopy.ScoPretreat.pretreatutil.Validation

Logs an error if the molecule has zero atoms.

If the molecule has no atoms, no subsequent validations will run.

run(mol)[source]
class scopy.ScoPretreat.pretreatutil.Normalization(name, transform)[source]

Bases: object

A normalization transform defined by reaction SMARTS.

property transform
class scopy.ScoPretreat.pretreatutil.Normalizer(normalizations=(Normalization('Nitro to N+(O-)=O', '[N,P,As,Sb;X3:1](=[O,S,Se,Te:2])=[O,S,Se,Te:3]>>[*+1:1]([*-1:2])=[*:3]'), Normalization('Sulfone to S(=O)(=O)', '[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])'), Normalization('Pyridine oxide to n+O-', '[n:1]=[O:2]>>[n+:1][O-:2]'), Normalization('Azide to N=N+=N-', '[*,H:1][N:2]=[N:3]#[N:4]>>[*,H:1][N:2]=[N+:3]=[N-:4]'), Normalization('Diazo/azo to =N+=N-', '[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]'), Normalization('Sulfoxide to -S+(O-)-', '[!O:1][S+0;X3:2](=[O:3])[!O:4]>>[*:1][S+1:2]([O-:3])[*:4]'), Normalization('Phosphate to P(O-)=O', '[O,S,Se,Te;-1:1][P+;D4:2][O,S,Se,Te;-1:3]>>[*+0:1]=[P+0;D5:2][*-1:3]'), Normalization('C/S+N to C/S=N+', '[C,S;X3+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('P+N to P=N+', '[P;X4+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('Normalize hydrazine-diazonium', '[CX4:1][NX3H:2]-[NX3H:3][CX4:4][NX2+:5]#[NX1:6]>>[CX4:1][NH0:2]=[NH+:3][C:4][N+0:5]=[NH:6]'), Normalization('Recombine 1,3-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[N,P,As,Sb,O,S,Se,Te;+1:3]>>[*-0:1]=[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[n,o,p,s;-1:1]:[a:2]=[N,O,P,S;+1:3]>>[*-0:1]:[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[N,O,P,S;-1:1]-[a:2]:[n,o,p,s;+1:3]>>[*-0:1]=[*:2]:[*+0:3]'), Normalization('Recombine 1,5-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[A:3]-[A:4]=[N,P,As,Sb,O,S,Se,Te;+1:5]>>[*-0:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[n,o,p,s;-1:1]:[a:2]:[a:3]:[c:4]=[N,O,P,S;+1:5]>>[*-0:1]:[*:2]:[*:3]:[c:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[N,O,P,S;-1:1]-[c:2]:[a:3]:[a:4]:[n,o,p,s;+1:5]>>[*-0:1]=[c:2]:[*:3]:[*:4]:[*+0:5]'), Normalization('Normalize 1,3 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]=[*:2]-[*+0:3]'), Normalization('Normalize 1,3 conjugated cation', '[n;+0!H0:1]:[c:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]:[*:2]-[*+0:3]'), Normalization('Normalize 1,5 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[A:3]-[A:4]=[N!$(*[O-]),O;+1H0:5]>>[*+1:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Normalize 1,5 conjugated cation', '[n;+0!H0:1]:[a:2]:[a:3]:[c:4]=[N!$(*[O-]),O;+1H0:5]>>[n+1:1]:[*:2]:[*:3]:[*:4]-[*+0:5]'), Normalization('Charge normalization', '[F,Cl,Br,I,At;-1:1]=[O:2]>>[*-0:1][O-:2]'), Normalization('Charge recombination', '[N,P,As,Sb;-1:1]=[C+;v3:2]>>[*+0:1]#[C+0:2]')), max_restarts=200)[source]

Bases: object

A class for applying Normalization transforms.

This class is typically used to apply a series of Normalization transforms to correct functional groups and recombine charges. Each transform is repeatedly applied until no further changes occur.

normalize(mol)[source]

Apply a series of Normalization transforms to correct functional groups and recombine charges.

A series of transforms are applied to the molecule. For each Normalization, the transform is applied repeatedly until no further changes occur. If any changes occurred, we go back and start from the first Normalization again, in case the changes mean an earlier transform is now applicable. The molecule is returned once the entire series of Normalizations cause no further changes or if max_restarts (default 200) is reached.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The molecule to normalize.

Returns

The normalized molecule.

Return type

rdkit.Chem.rdchem.Mol
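
The restart behaviour described for normalize() can be illustrated with a minimal sketch. It is not the module's implementation: the function name normalize_with_restarts and the string-based toy rules are assumptions made for illustration, whereas the real transforms operate on RDKit molecules.

```python
def normalize_with_restarts(value, rules, max_restarts=200):
    """Apply each rule until it stops changing the value; restart on any change."""
    for _ in range(max_restarts):
        changed = False
        for rule in rules:
            # Apply the current rule repeatedly until it has no further effect.
            new_value = rule(value)
            while new_value != value:
                value = new_value
                changed = True
                new_value = rule(value)
            if changed:
                # Go back to the first rule: an earlier rule may now apply.
                break
        if not changed:
            # A full pass over every rule caused no change, so we are done.
            return value
    # The restart budget is exhausted; return the current state.
    return value


# Hypothetical toy rules on strings (assumption: stand-ins for Normalization transforms).
toy_rules = [
    lambda s: s.replace("aa", "b", 1),
    lambda s: s.replace("bb", "a", 1),
]
result = normalize_with_restarts("aaaa", toy_rules)
```

In this toy run the first rule fires twice, the second rule then becomes applicable, and a final pass confirms that nothing else changes. A rule that never stabilises would loop forever in this sketch, so a real implementation would likely also cap the inner loop.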

scopy.ScoPretreat.pretreatutil.PREFER_ORGANIC = False

The default value for whether LargestFragmentChooser sees organic fragments as “larger” than inorganic fragments.
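
A rough sketch of what such a preference could mean in practice is given below. It assumes fragments are represented as plain element-count dictionaries and that "organic" simply means "contains carbon"; the function choose_largest_fragment is hypothetical and does not reproduce the actual LargestFragmentChooser logic.

```python
def choose_largest_fragment(fragments, prefer_organic=False):
    """Pick the fragment with the most atoms, optionally ranking organic ones first."""

    def is_organic(fragment):
        # Assumption for this sketch: a fragment is organic if it contains carbon.
        return fragment.get("C", 0) > 0

    def sort_key(fragment):
        organic_first = is_organic(fragment) if prefer_organic else False
        atom_count = sum(fragment.values())
        # Tuples compare left to right, so the organic flag outranks size.
        return (organic_first, atom_count)

    return max(fragments, key=sort_key)


# Toy fragments as element counts (hypothetical data).
sulfuric_acid = {"H": 2, "S": 1, "O": 4}  # 7 atoms, inorganic
methanol = {"C": 1, "H": 4, "O": 1}       # 6 atoms, organic

by_size = choose_largest_fragment([sulfuric_acid, methanol])
by_organic = choose_largest_fragment([sulfuric_acid, methanol], prefer_organic=True)
```

With the default of False, the larger inorganic fragment wins; with True, the smaller organic fragment is chosen.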

scopy.ScoPretreat.pretreatutil.REMOVE_FRAGMENTS = (FragmentPattern('hydrogen', '[H]'), FragmentPattern('fluorine', '[F]'), FragmentPattern('chlorine', '[Cl]'), FragmentPattern('bromine', '[Br]'), FragmentPattern('iodine', '[I]'), FragmentPattern('lithium', '[Li]'), FragmentPattern('sodium', '[Na]'), FragmentPattern('potassium', '[K]'), FragmentPattern('calcium', '[Ca]'), FragmentPattern('magnesium', '[Mg]'), FragmentPattern('aluminium', '[Al]'), FragmentPattern('barium', '[Ba]'), FragmentPattern('bismuth', '[Bi]'), FragmentPattern('silver', '[Ag]'), FragmentPattern('strontium', '[Sr]'), FragmentPattern('zinc', '[Zn]'), FragmentPattern('ammonia/ammonium', '[#7]'), FragmentPattern('water/hydroxide', '[#8]'), FragmentPattern('methyl amine', '[#6]-[#7]'), FragmentPattern('sulfide', 'S'), FragmentPattern('nitrate', '[#7](=[#8])(-[#8])-[#8]'), FragmentPattern('phosphate', '[P](=[#8])(-[#8])(-[#8])-[#8]'), FragmentPattern('hexafluorophosphate', '[P](-[#9])(-[#9])(-[#9])(-[#9])(-[#9])-[#9]'), FragmentPattern('sulfate', '[S](=[#8])(=[#8])(-[#8])-[#8]'), FragmentPattern('methyl sulfonate', '[#6]-[S](=[#8])(=[#8])(-[#8])'), FragmentPattern('trifluoromethanesulfonic acid', '[#8]-[S](=[#8])(=[#8])-[#6](-[#9])(-[#9])-[#9]'), FragmentPattern('trifluoroacetic acid', '[#9]-[#6](-[#9])(-[#9])-[#6](=[#8])-[#8]'), FragmentPattern('1,2-dichloroethane', '[Cl]-[#6]-[#6]-[Cl]'), FragmentPattern('1,2-dimethoxyethane', '[#6]-[#8]-[#6]-[#6]-[#8]-[#6]'), FragmentPattern('1,4-dioxane', '[#6]-1-[#6]-[#8]-[#6]-[#6]-[#8]-1'), FragmentPattern('1-methyl-2-pyrrolidinone', '[#6]-[#7]-1-[#6]-[#6]-[#6]-[#6]-1=[#8]'), FragmentPattern('2-butanone', '[#6]-[#6]-[#6](-[#6])=[#8]'), FragmentPattern('acetate/acetic acid', '[#8]-[#6](-[#6])=[#8]'), FragmentPattern('acetone', '[#6]-[#6](-[#6])=[#8]'), FragmentPattern('acetonitrile', '[#6]-[#6]#[N]'), FragmentPattern('benzene', '[#6]1[#6][#6][#6][#6][#6]1'), FragmentPattern('butanol', '[#8]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('t-butanol', 
'[#8]-[#6](-[#6])(-[#6])-[#6]'), FragmentPattern('chloroform', '[Cl]-[#6](-[Cl])-[Cl]'), FragmentPattern('cycloheptane', '[#6]-1-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-1'), FragmentPattern('cyclohexane', '[#6]-1-[#6]-[#6]-[#6]-[#6]-[#6]-1'), FragmentPattern('dichloromethane', '[Cl]-[#6]-[Cl]'), FragmentPattern('diethyl ether', '[#6]-[#6]-[#8]-[#6]-[#6]'), FragmentPattern('diisopropyl ether', '[#6]-[#6](-[#6])-[#8]-[#6](-[#6])-[#6]'), FragmentPattern('dimethyl formamide', '[#6]-[#7](-[#6])-[#6]=[#8]'), FragmentPattern('dimethyl sulfoxide', '[#6]-[S](-[#6])=[#8]'), FragmentPattern('ethanol', '[#8]-[#6]-[#6]'), FragmentPattern('ethyl acetate', '[#6]-[#6]-[#8]-[#6](-[#6])=[#8]'), FragmentPattern('formic acid', '[#8]-[#6]=[#8]'), FragmentPattern('heptane', '[#6]-[#6]-[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('hexane', '[#6]-[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('isopropanol', '[#8]-[#6](-[#6])-[#6]'), FragmentPattern('methanol', '[#8]-[#6]'), FragmentPattern('N,N-dimethylacetamide', '[#6]-[#7](-[#6])-[#6](-[#6])=[#8]'), FragmentPattern('pentane', '[#6]-[#6]-[#6]-[#6]-[#6]'), FragmentPattern('propanol', '[#8]-[#6]-[#6]-[#6]'), FragmentPattern('pyridine', '[#6]-1=[#6]-[#6]=[#7]-[#6]=[#6]-1'), FragmentPattern('t-butyl methyl ether', '[#6]-[#8]-[#6](-[#6])(-[#6])-[#6]'), FragmentPattern('tetrahydrofurane', '[#6]-1-[#6]-[#6]-[#8]-[#6]-1'), FragmentPattern('toluene', '[#6]-[#6]~1~[#6]~[#6]~[#6]~[#6]~[#6]~1'), FragmentPattern('xylene', '[#6]-[#6]~1~[#6](-[#6])~[#6]~[#6]~[#6]~[#6]~1'))

The default list of FragmentPatterns to be used by FragmentRemover.

class scopy.ScoPretreat.pretreatutil.Reionizer(acid_base_pairs=(AcidBasePair('-OSO3H', 'OS(=O)(=O)[OH]', 'OS(=O)(=O)[O-]'), AcidBasePair('–SO3H', '[!O]S(=O)(=O)[OH]', '[!O]S(=O)(=O)[O-]'), AcidBasePair('-OSO2H', 'O[SD3](=O)[OH]', 'O[SD3](=O)[O-]'), AcidBasePair('-SO2H', '[!O][SD3](=O)[OH]', '[!O][SD3](=O)[O-]'), AcidBasePair('-OPO3H2', 'OP(=O)([OH])[OH]', 'OP(=O)([OH])[O-]'), AcidBasePair('-PO3H2', '[!O]P(=O)([OH])[OH]', '[!O]P(=O)([OH])[O-]'), AcidBasePair('-CO2H', 'C(=O)[OH]', 'C(=O)[O-]'), AcidBasePair('thiophenol', 'c[SH]', 'c[S-]'), AcidBasePair('(-OPO3H)-', 'OP(=O)([O-])[OH]', 'OP(=O)([O-])[O-]'), AcidBasePair('(-PO3H)-', '[!O]P(=O)([O-])[OH]', '[!O]P(=O)([O-])[O-]'), AcidBasePair('phthalimide', 'O=C2c1ccccc1C(=O)[NH]2', 'O=C2c1ccccc1C(=O)[N-]2'), AcidBasePair('CO3H (peracetyl)', 'C(=O)O[OH]', 'C(=O)O[O-]'), AcidBasePair('alpha-carbon-hydrogen-nitro group', 'O=N(O)[CH]', 'O=N(O)[C-]'), AcidBasePair('-SO2NH2', 'S(=O)(=O)[NH2]', 'S(=O)(=O)[NH-]'), AcidBasePair('-OBO2H2', 'OB([OH])[OH]', 'OB([OH])[O-]'), AcidBasePair('-BO2H2', '[!O]B([OH])[OH]', '[!O]B([OH])[O-]'), AcidBasePair('phenol', 'c[OH]', 'c[O-]'), AcidBasePair('SH (aliphatic)', 'C[SH]', 'C[S-]'), AcidBasePair('(-OBO2H)-', 'OB([O-])[OH]', 'OB([O-])[O-]'), AcidBasePair('(-BO2H)-', '[!O]B([O-])[OH]', '[!O]B([O-])[O-]'), AcidBasePair('cyclopentadiene', 'C1=CC=C[CH2]1', 'c1ccc[cH-]1'), AcidBasePair('-CONH2', 'C(=O)[NH2]', 'C(=O)[NH-]'), AcidBasePair('imidazole', 'c1cnc[nH]1', 'c1cnc[n-]1'), AcidBasePair('-OH (aliphatic alcohol)', '[CX4][OH]', '[CX4][O-]'), AcidBasePair('alpha-carbon-hydrogen-keto group', 'O=C([!O])[C!H0+0]', 'O=C([!O])[C-]'), AcidBasePair('alpha-carbon-hydrogen-acetyl ester group', 'OC(=O)[C!H0+0]', 'OC(=O)[C-]'), AcidBasePair('sp carbon hydrogen', 'C#[CH]', 'C#[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfone group', 'CS(=O)(=O)[C!H0+0]', 'CS(=O)(=O)[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfoxide group', 'C[SD3](=O)[C!H0+0]', 'C[SD3](=O)[C-]'), AcidBasePair('-NH2', '[CX4][NH2]', 
'[CX4][NH-]'), AcidBasePair('benzyl hydrogen', 'c[CX4H2]', 'c[CX3H-]'), AcidBasePair('sp2-carbon hydrogen', '[CX3]=[CX3!H0+0]', '[CX3]=[CX2-]'), AcidBasePair('sp3-carbon hydrogen', '[CX4!H0+0]', '[CX3-]')), charge_corrections=(ChargeCorrection('[Li,Na,K]', '[Li,Na,K;X0+0]', 1), ChargeCorrection('[Mg,Ca]', '[Mg,Ca;X0+0]', 2), ChargeCorrection('[Cl]', '[Cl;X0+0]', -1)))[source]

Bases: object

A class to fix charges and reionize a molecule such that the strongest acids ionize first.

reionize(mol)[source]

Enforce charges on certain atoms, then perform competitive reionization.

First, charge corrections are applied to ensure, for example, that free metals are correctly ionized. Then, if a molecule with multiple acid groups is partially ionized, ensure the strongest acids ionize first.

The algorithm works as follows:

  • Use SMARTS to find the strongest protonated acid and the weakest ionized acid.

  • If the ionized acid is weaker than the protonated acid, swap the proton between them and repeat.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The molecule to reionize.

Returns

The reionized molecule.

Return type

rdkit.Chem.rdchem.Mol
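
The competitive reionization loop can be sketched without RDKit as follows. This is an illustration under stated assumptions, not the module's code: AcidSite, strength_rank and reionize_sites are hypothetical names, and the rank stands in for an acid's position in acid_base_pairs (lower rank meaning stronger acid).

```python
from dataclasses import dataclass


@dataclass
class AcidSite:
    name: str
    strength_rank: int  # assumption: 0 is the strongest acid
    ionized: bool


def reionize_sites(sites):
    """Move charges so that stronger acids are ionized before weaker ones."""
    while True:
        protonated = [s for s in sites if not s.ionized]
        ionized = [s for s in sites if s.ionized]
        if not protonated or not ionized:
            return sites
        strongest_protonated = min(protonated, key=lambda s: s.strength_rank)
        weakest_ionized = max(ionized, key=lambda s: s.strength_rank)
        # Stop once no ionized acid is weaker than a protonated one.
        if weakest_ionized.strength_rank <= strongest_protonated.strength_rank:
            return sites
        # Swap the proton: the weaker acid regains it, the stronger acid loses it.
        weakest_ionized.ionized = False
        strongest_protonated.ionized = True


# Hypothetical partially ionized molecule with three acid groups.
sites = [
    AcidSite("sulfonic acid", 0, ionized=False),
    AcidSite("carboxylic acid", 1, ionized=False),
    AcidSite("phenol", 2, ionized=True),
]
reionize_sites(sites)
states = {s.name: s.ionized for s in sites}
```

Each swap moves a charge to a strictly stronger acid, so the loop terminates after a finite number of swaps.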

scopy.ScoPretreat.pretreatutil.SIMPLE_FORMAT = '%(levelname)s: [%(validation)s] %(message)s'

The default format for log messages.
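
The format string contains a non-standard validation field, so a log record must carry that attribute before it can be formatted. The sketch below uses only the standard logging module and assumes the field is supplied through the extra argument; the logger name pretreat_demo and the validation name DemoValidation are made up for illustration.

```python
import io
import logging

SIMPLE_FORMAT = '%(levelname)s: [%(validation)s] %(message)s'

# Capture the output in memory so it can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(SIMPLE_FORMAT))

logger = logging.getLogger('pretreat_demo')  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

# The custom 'validation' field is attached to the record via 'extra'.
message = 'Molecule matched %(smarts)s' % {'smarts': '[Na+]'}
logger.info(message, extra={'validation': 'DemoValidation'})

output = stream.getvalue()
```

Logging a record without the validation attribute through this formatter would fail to format, which is why the field is always passed here.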

class scopy.ScoPretreat.pretreatutil.SmartsValidation(log)[source]

Bases: scopy.ScoPretreat.pretreatutil.Validation

Abstract superclass for Validations that log a message if a SMARTS pattern matches the molecule.

Subclasses can override the following attributes:

entire_fragment = False

Whether the SMARTS pattern should match an entire covalent unit.

level = 20

The logging level of the message.

message = 'Molecule matched %(smarts)s'

The message to log if the SMARTS pattern matches the molecule.

run(mol)[source]

property smarts

The SMARTS pattern as a string. Subclasses must implement this.

exception scopy.ScoPretreat.pretreatutil.StandardizeError[source]

Bases: scopy.ScoPretreat.pretreatutil.MolVSError

class scopy.ScoPretreat.pretreatutil.Standardizer(normalizations=(Normalization('Nitro to N+(O-)=O', '[N,P,As,Sb;X3:1](=[O,S,Se,Te:2])=[O,S,Se,Te:3]>>[*+1:1]([*-1:2])=[*:3]'), Normalization('Sulfone to S(=O)(=O)', '[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])'), Normalization('Pyridine oxide to n+O-', '[n:1]=[O:2]>>[n+:1][O-:2]'), Normalization('Azide to N=N+=N-', '[*,H:1][N:2]=[N:3]#[N:4]>>[*,H:1][N:2]=[N+:3]=[N-:4]'), Normalization('Diazo/azo to =N+=N-', '[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]'), Normalization('Sulfoxide to -S+(O-)-', '[!O:1][S+0;X3:2](=[O:3])[!O:4]>>[*:1][S+1:2]([O-:3])[*:4]'), Normalization('Phosphate to P(O-)=O', '[O,S,Se,Te;-1:1][P+;D4:2][O,S,Se,Te;-1:3]>>[*+0:1]=[P+0;D5:2][*-1:3]'), Normalization('C/S+N to C/S=N+', '[C,S;X3+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('P+N to P=N+', '[P;X4+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]'), Normalization('Normalize hydrazine-diazonium', '[CX4:1][NX3H:2]-[NX3H:3][CX4:4][NX2+:5]#[NX1:6]>>[CX4:1][NH0:2]=[NH+:3][C:4][N+0:5]=[NH:6]'), Normalization('Recombine 1,3-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[N,P,As,Sb,O,S,Se,Te;+1:3]>>[*-0:1]=[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[n,o,p,s;-1:1]:[a:2]=[N,O,P,S;+1:3]>>[*-0:1]:[*:2]-[*+0:3]'), Normalization('Recombine 1,3-separated charges', '[N,O,P,S;-1:1]-[a:2]:[n,o,p,s;+1:3]>>[*-0:1]=[*:2]:[*+0:3]'), Normalization('Recombine 1,5-separated charges', '[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[A:3]-[A:4]=[N,P,As,Sb,O,S,Se,Te;+1:5]>>[*-0:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[n,o,p,s;-1:1]:[a:2]:[a:3]:[c:4]=[N,O,P,S;+1:5]>>[*-0:1]:[*:2]:[*:3]:[c:4]-[*+0:5]'), Normalization('Recombine 1,5-separated charges', '[N,O,P,S;-1:1]-[c:2]:[a:3]:[a:4]:[n,o,p,s;+1:5]>>[*-0:1]=[c:2]:[*:3]:[*:4]:[*+0:5]'), Normalization('Normalize 1,3 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]=[*:2]-[*+0:3]'), Normalization('Normalize 1,3 conjugated 
cation', '[n;+0!H0:1]:[c:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]:[*:2]-[*+0:3]'), Normalization('Normalize 1,5 conjugated cation', '[N,O;+0!H0:1]-[A:2]=[A:3]-[A:4]=[N!$(*[O-]),O;+1H0:5]>>[*+1:1]=[*:2]-[*:3]=[*:4]-[*+0:5]'), Normalization('Normalize 1,5 conjugated cation', '[n;+0!H0:1]:[a:2]:[a:3]:[c:4]=[N!$(*[O-]),O;+1H0:5]>>[n+1:1]:[*:2]:[*:3]:[*:4]-[*+0:5]'), Normalization('Charge normalization', '[F,Cl,Br,I,At;-1:1]=[O:2]>>[*-0:1][O-:2]'), Normalization('Charge recombination', '[N,P,As,Sb;-1:1]=[C+;v3:2]>>[*+0:1]#[C+0:2]')), acid_base_pairs=(AcidBasePair('-OSO3H', 'OS(=O)(=O)[OH]', 'OS(=O)(=O)[O-]'), AcidBasePair('–SO3H', '[!O]S(=O)(=O)[OH]', '[!O]S(=O)(=O)[O-]'), AcidBasePair('-OSO2H', 'O[SD3](=O)[OH]', 'O[SD3](=O)[O-]'), AcidBasePair('-SO2H', '[!O][SD3](=O)[OH]', '[!O][SD3](=O)[O-]'), AcidBasePair('-OPO3H2', 'OP(=O)([OH])[OH]', 'OP(=O)([OH])[O-]'), AcidBasePair('-PO3H2', '[!O]P(=O)([OH])[OH]', '[!O]P(=O)([OH])[O-]'), AcidBasePair('-CO2H', 'C(=O)[OH]', 'C(=O)[O-]'), AcidBasePair('thiophenol', 'c[SH]', 'c[S-]'), AcidBasePair('(-OPO3H)-', 'OP(=O)([O-])[OH]', 'OP(=O)([O-])[O-]'), AcidBasePair('(-PO3H)-', '[!O]P(=O)([O-])[OH]', '[!O]P(=O)([O-])[O-]'), AcidBasePair('phthalimide', 'O=C2c1ccccc1C(=O)[NH]2', 'O=C2c1ccccc1C(=O)[N-]2'), AcidBasePair('CO3H (peracetyl)', 'C(=O)O[OH]', 'C(=O)O[O-]'), AcidBasePair('alpha-carbon-hydrogen-nitro group', 'O=N(O)[CH]', 'O=N(O)[C-]'), AcidBasePair('-SO2NH2', 'S(=O)(=O)[NH2]', 'S(=O)(=O)[NH-]'), AcidBasePair('-OBO2H2', 'OB([OH])[OH]', 'OB([OH])[O-]'), AcidBasePair('-BO2H2', '[!O]B([OH])[OH]', '[!O]B([OH])[O-]'), AcidBasePair('phenol', 'c[OH]', 'c[O-]'), AcidBasePair('SH (aliphatic)', 'C[SH]', 'C[S-]'), AcidBasePair('(-OBO2H)-', 'OB([O-])[OH]', 'OB([O-])[O-]'), AcidBasePair('(-BO2H)-', '[!O]B([O-])[OH]', '[!O]B([O-])[O-]'), AcidBasePair('cyclopentadiene', 'C1=CC=C[CH2]1', 'c1ccc[cH-]1'), AcidBasePair('-CONH2', 'C(=O)[NH2]', 'C(=O)[NH-]'), AcidBasePair('imidazole', 'c1cnc[nH]1', 'c1cnc[n-]1'), AcidBasePair('-OH (aliphatic alcohol)', 
'[CX4][OH]', '[CX4][O-]'), AcidBasePair('alpha-carbon-hydrogen-keto group', 'O=C([!O])[C!H0+0]', 'O=C([!O])[C-]'), AcidBasePair('alpha-carbon-hydrogen-acetyl ester group', 'OC(=O)[C!H0+0]', 'OC(=O)[C-]'), AcidBasePair('sp carbon hydrogen', 'C#[CH]', 'C#[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfone group', 'CS(=O)(=O)[C!H0+0]', 'CS(=O)(=O)[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfoxide group', 'C[SD3](=O)[C!H0+0]', 'C[SD3](=O)[C-]'), AcidBasePair('-NH2', '[CX4][NH2]', '[CX4][NH-]'), AcidBasePair('benzyl hydrogen', 'c[CX4H2]', 'c[CX3H-]'), AcidBasePair('sp2-carbon hydrogen', '[CX3]=[CX3!H0+0]', '[CX3]=[CX2-]'), AcidBasePair('sp3-carbon hydrogen', '[CX4!H0+0]', '[CX3-]')), charge_corrections=(ChargeCorrection('[Li,Na,K]', '[Li,Na,K;X0+0]', 1), ChargeCorrection('[Mg,Ca]', '[Mg,Ca;X0+0]', 2), ChargeCorrection('[Cl]', '[Cl;X0+0]', -1)), tautomer_transforms=(TautomerTransform('1,3 (thio)keto/enol f', '[CX4!H0]-[C]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,3 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[C]=[C]', [], []), TautomerTransform('1,5 (thio)keto/enol f', '[CX4,NX3;!H0]-[C]=[C][CH0]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,5 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[CH0]=[C]-[C]=[C,N]', [], []), TautomerTransform('aliphatic imine f', '[CX4!H0]-[C]=[NX2]', [], []), TautomerTransform('aliphatic imine r', '[NX3!H0]-[C]=[CX3]', [], []), TautomerTransform('special imine f', '[N!H0]-[C]=[CX3R0]', [], []), TautomerTransform('special imine r', '[CX4!H0]-[c]=[n]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift f', '[#7!H0]-[#6R1]=[O,#7X2]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift r', '[O,#7;!H0]-[#6R1]=[#7X2]', [], []), TautomerTransform('1,3 heteroatom H shift', '[#7,S,O,Se,Te;!H0]-[#7X2,#6,#15]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift', '[#7,#16,#8;!H0]-[#6,#7]=[#6]-[#6,#7]=[#7,#16,#8;H0]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift f', 
'[#7,#16,#8,Se,Te;!H0]-[#6,nX2]=[#6,nX2]-[#6,#7X2]=[#7X2,S,O,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift r', '[#7,S,O,Se,Te;!H0]-[#6,#7X2]=[#6,nX2]-[#6,nX2]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift f', '[#7,#8,#16,Se,Te;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6]-[#6,#7X2]=[#7X2,S,O,Se,Te,CX3]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift r', '[#7,S,O,Se,Te,CX4;!H0]-[#6,#7X2]=[#6]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[NX2,S,O,Se,Te]', [], []), TautomerTransform('1,9 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#7,O]', [], []), TautomerTransform('1,11 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#7X2,O]', [], []), TautomerTransform('furanone f', '[O,S,N;!H0]-[#6r5]=[#6X3r5;$([#6]([#6r5])=[#6r5])]', [], []), TautomerTransform('furanone r', '[#6r5!H0;$([#6]([#6r5])[#6r5])]-[#6r5]=[O,S,N]', [], []), TautomerTransform('keten/ynol f', '[C!H0]=[C]=[O,S,Se,Te;X1]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('keten/ynol r', '[O,S,Se,Te;!H0X2]-[C]#[C]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('ionic nitro/aci-nitro f', '[C!H0]-[N+;$([N][O-])]=[O]', [], []), TautomerTransform('ionic nitro/aci-nitro r', '[O!H0]-[N+;$([N][O-])]=[C]', [], []), TautomerTransform('oxim/nitroso f', '[O!H0]-[N]=[C]', [], []), TautomerTransform('oxim/nitroso r', '[C!H0]-[N]=[O]', [], []), TautomerTransform('oxim/nitroso via phenol f', '[O!H0]-[N]=[C]-[C]=[C]-[C]=[OH0]', [], []), TautomerTransform('oxim/nitroso via phenol r', '[O!H0]-[c]=[c]-[c]=[c]-[N]=[OH0]', [], []), TautomerTransform('cyano/iso-cyanic acid f', '[O!H0]-[C]#[N]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('cyano/iso-cyanic acid r', '[N!H0]=[C]=[O]', 
[rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('isocyanide f', '[C-0!H0]#[N+0]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('isocyanide r', '[N+!H0]#[C-]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('phosphonic acid f', '[OH]-[PH0]', [rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('phosphonic acid r', '[PH]=[O]', [rdkit.Chem.rdchem.BondType.SINGLE], [])), tautomer_scores=(TautomerScore('benzoquinone', '[#6]1([#6]=[#6][#6]([#6]=[#6]1)=,:[N,S,O])=,:[N,S,O]', 25), TautomerScore('oxim', '[#6]=[N][OH]', 4), TautomerScore('C=O', '[#6]=,:[#8]', 2), TautomerScore('N=O', '[#7]=,:[#8]', 2), TautomerScore('P=O', '[#15]=,:[#8]', 2), TautomerScore('C=hetero', '[#6]=[!#1;!#6]', 1), TautomerScore('methyl', '[CX4H3]', 1), TautomerScore('guanidine terminal=N', '[#7][#6](=[NR0])[#7H0]', 1), TautomerScore('guanidine endocyclic=N', '[#7;R][#6;R]([N])=[#7;R]', 2), TautomerScore('aci-nitro', '[#6]=[N+]([O-])[OH]', -4)), max_restarts=200, max_tautomers=1000, prefer_organic=False)[source]

Bases: object

The main class for performing standardization of molecules and deriving parent molecules.

The primary usage is via the standardize() method:

s = Standardizer()
mol1 = Chem.MolFromSmiles('C1=CC=CC=C1')
mol2 = s.standardize(mol1)

There are separate methods to derive fragment, charge, tautomer, isotope and stereo parent molecules.

addhs(mol)[source]

property canonicalize_tautomer
Returns

A callable TautomerCanonicalizer instance.

charge_parent(mol, skip_standardize=False)[source]

Return the charge parent of a given molecule.

The charge parent is the uncharged version of the fragment parent.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • skip_standardize (bool) – Set to True if mol has already been standardized.

Returns

The charge parent molecule.

Return type

rdkit.Chem.rdchem.Mol

property disconnect_metals
Returns

A callable MetalDisconnector instance.

property enumerate_tautomers
Returns

A callable TautomerEnumerator instance.

fragment_parent(mol, skip_standardize=False)[source]

Return the fragment parent of a given molecule.

The fragment parent is the largest organic covalent unit in the molecule.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • skip_standardize (bool) – Set to True if mol has already been standardized.

Returns

The fragment parent molecule.

Return type

rdkit.Chem.rdchem.Mol

isotope_parent(mol, skip_standardize=False)[source]

Return the isotope parent of a given molecule.

The isotope parent has all atoms replaced with the most abundant isotope for that element.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • skip_standardize (bool) – Set to True if mol has already been standardized.

Returns

The isotope parent molecule.

Return type

rdkit.Chem.rdchem.Mol

property largest_fragment
Returns

A callable LargestFragmentChooser instance.

property normalize
Returns

A callable Normalizer instance.

property reionize
Returns

A callable Reionizer instance.

property remove_fragments
Returns

A callable FragmentRemover instance.

rmhs(mol)[source]

standardize(mol)[source]

Return a standardized version of the given molecule.

The standardization process consists of the following stages: RDKit RemoveHs(), RDKit SanitizeMol(), MetalDisconnector, Normalizer, Reionizer, RDKit AssignStereochemistry().

Parameters

mol (rdkit.Chem.rdchem.Mol) – The molecule to standardize.

Returns

The standardized molecule.

Return type

rdkit.Chem.rdchem.Mol

standardize_with_parents(mol)[source]

stereo_parent(mol, skip_standardize=False)[source]

Return the stereo parent of a given molecule.

The stereo parent has all stereochemistry information removed from tetrahedral centers and double bonds.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • skip_standardize (bool) – Set to True if mol has already been standardized.

Returns

The stereo parent molecule.

Return type

rdkit.Chem.rdchem.Mol

super_parent(mol, skip_standardize=False)[source]

Return the super parent of a given molecule.

The super parent is insensitive to fragment, charge, isotope, stereochemistry and tautomer differences. From the input molecule, the largest fragment is taken. This fragment is uncharged, and then isotope and stereochemistry information is discarded. Finally, the canonical tautomer is determined and returned.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • skip_standardize (bool) – Set to True if mol has already been standardized.

Returns

The super parent molecule.

Return type

rdkit.Chem.rdchem.Mol

tautomer_parent(mol, skip_standardize=False)[source]

Return the tautomer parent of a given molecule.

Parameters
  • mol (rdkit.Chem.rdchem.Mol) – The input molecule.

  • skip_standardize (bool) – Set to True if mol has already been standardized.

Returns

The tautomer parent molecule.

Return type

rdkit.Chem.rdchem.Mol

property uncharge
Returns

A callable Uncharger instance.

exception scopy.ScoPretreat.pretreatutil.StopValidateError[source]

Bases: scopy.ScoPretreat.pretreatutil.ValidateError

Called by Validations to stop any further validations from being performed.

scopy.ScoPretreat.pretreatutil.TAUTOMER_SCORES = (TautomerScore('benzoquinone', '[#6]1([#6]=[#6][#6]([#6]=[#6]1)=,:[N,S,O])=,:[N,S,O]', 25), TautomerScore('oxim', '[#6]=[N][OH]', 4), TautomerScore('C=O', '[#6]=,:[#8]', 2), TautomerScore('N=O', '[#7]=,:[#8]', 2), TautomerScore('P=O', '[#15]=,:[#8]', 2), TautomerScore('C=hetero', '[#6]=[!#1;!#6]', 1), TautomerScore('methyl', '[CX4H3]', 1), TautomerScore('guanidine terminal=N', '[#7][#6](=[NR0])[#7H0]', 1), TautomerScore('guanidine endocyclic=N', '[#7;R][#6;R]([N])=[#7;R]', 2), TautomerScore('aci-nitro', '[#6]=[N+]([O-])[OH]', -4))

The default list of TautomerScores.

scopy.ScoPretreat.pretreatutil.TAUTOMER_TRANSFORMS = (TautomerTransform('1,3 (thio)keto/enol f', '[CX4!H0]-[C]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,3 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[C]=[C]', [], []), TautomerTransform('1,5 (thio)keto/enol f', '[CX4,NX3;!H0]-[C]=[C][CH0]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,5 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[CH0]=[C]-[C]=[C,N]', [], []), TautomerTransform('aliphatic imine f', '[CX4!H0]-[C]=[NX2]', [], []), TautomerTransform('aliphatic imine r', '[NX3!H0]-[C]=[CX3]', [], []), TautomerTransform('special imine f', '[N!H0]-[C]=[CX3R0]', [], []), TautomerTransform('special imine r', '[CX4!H0]-[c]=[n]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift f', '[#7!H0]-[#6R1]=[O,#7X2]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift r', '[O,#7;!H0]-[#6R1]=[#7X2]', [], []), TautomerTransform('1,3 heteroatom H shift', '[#7,S,O,Se,Te;!H0]-[#7X2,#6,#15]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift', '[#7,#16,#8;!H0]-[#6,#7]=[#6]-[#6,#7]=[#7,#16,#8;H0]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift f', '[#7,#16,#8,Se,Te;!H0]-[#6,nX2]=[#6,nX2]-[#6,#7X2]=[#7X2,S,O,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift r', '[#7,S,O,Se,Te;!H0]-[#6,#7X2]=[#6,nX2]-[#6,nX2]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift f', '[#7,#8,#16,Se,Te;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6]-[#6,#7X2]=[#7X2,S,O,Se,Te,CX3]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift r', '[#7,S,O,Se,Te,CX4;!H0]-[#6,#7X2]=[#6]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[NX2,S,O,Se,Te]', [], []), TautomerTransform('1,9 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#7,O]', [], []), TautomerTransform('1,11 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#7X2,O]', [], []), 
TautomerTransform('furanone f', '[O,S,N;!H0]-[#6r5]=[#6X3r5;$([#6]([#6r5])=[#6r5])]', [], []), TautomerTransform('furanone r', '[#6r5!H0;$([#6]([#6r5])[#6r5])]-[#6r5]=[O,S,N]', [], []), TautomerTransform('keten/ynol f', '[C!H0]=[C]=[O,S,Se,Te;X1]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('keten/ynol r', '[O,S,Se,Te;!H0X2]-[C]#[C]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('ionic nitro/aci-nitro f', '[C!H0]-[N+;$([N][O-])]=[O]', [], []), TautomerTransform('ionic nitro/aci-nitro r', '[O!H0]-[N+;$([N][O-])]=[C]', [], []), TautomerTransform('oxim/nitroso f', '[O!H0]-[N]=[C]', [], []), TautomerTransform('oxim/nitroso r', '[C!H0]-[N]=[O]', [], []), TautomerTransform('oxim/nitroso via phenol f', '[O!H0]-[N]=[C]-[C]=[C]-[C]=[OH0]', [], []), TautomerTransform('oxim/nitroso via phenol r', '[O!H0]-[c]=[c]-[c]=[c]-[N]=[OH0]', [], []), TautomerTransform('cyano/iso-cyanic acid f', '[O!H0]-[C]#[N]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('cyano/iso-cyanic acid r', '[N!H0]=[C]=[O]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('isocyanide f', '[C-0!H0]#[N+0]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('isocyanide r', '[N+!H0]#[C-]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('phosphonic acid f', '[OH]-[PH0]', [rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('phosphonic acid r', '[PH]=[O]', [rdkit.Chem.rdchem.BondType.SINGLE], []))

The default list of TautomerTransforms.

class scopy.ScoPretreat.pretreatutil.TautomerCanonicalizer(transforms=(TautomerTransform('1,3 (thio)keto/enol f', '[CX4!H0]-[C]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,3 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[C]=[C]', [], []), TautomerTransform('1,5 (thio)keto/enol f', '[CX4,NX3;!H0]-[C]=[C][CH0]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,5 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[CH0]=[C]-[C]=[C,N]', [], []), TautomerTransform('aliphatic imine f', '[CX4!H0]-[C]=[NX2]', [], []), TautomerTransform('aliphatic imine r', '[NX3!H0]-[C]=[CX3]', [], []), TautomerTransform('special imine f', '[N!H0]-[C]=[CX3R0]', [], []), TautomerTransform('special imine r', '[CX4!H0]-[c]=[n]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift f', '[#7!H0]-[#6R1]=[O,#7X2]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift r', '[O,#7;!H0]-[#6R1]=[#7X2]', [], []), TautomerTransform('1,3 heteroatom H shift', '[#7,S,O,Se,Te;!H0]-[#7X2,#6,#15]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift', '[#7,#16,#8;!H0]-[#6,#7]=[#6]-[#6,#7]=[#7,#16,#8;H0]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift f', '[#7,#16,#8,Se,Te;!H0]-[#6,nX2]=[#6,nX2]-[#6,#7X2]=[#7X2,S,O,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift r', '[#7,S,O,Se,Te;!H0]-[#6,#7X2]=[#6,nX2]-[#6,nX2]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift f', '[#7,#8,#16,Se,Te;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6]-[#6,#7X2]=[#7X2,S,O,Se,Te,CX3]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift r', '[#7,S,O,Se,Te,CX4;!H0]-[#6,#7X2]=[#6]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[NX2,S,O,Se,Te]', [], []), TautomerTransform('1,9 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#7,O]', [], []), TautomerTransform('1,11 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#7X2,O]', [], 
[]), TautomerTransform('furanone f', '[O,S,N;!H0]-[#6r5]=[#6X3r5;$([#6]([#6r5])=[#6r5])]', [], []), TautomerTransform('furanone r', '[#6r5!H0;$([#6]([#6r5])[#6r5])]-[#6r5]=[O,S,N]', [], []), TautomerTransform('keten/ynol f', '[C!H0]=[C]=[O,S,Se,Te;X1]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('keten/ynol r', '[O,S,Se,Te;!H0X2]-[C]#[C]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('ionic nitro/aci-nitro f', '[C!H0]-[N+;$([N][O-])]=[O]', [], []), TautomerTransform('ionic nitro/aci-nitro r', '[O!H0]-[N+;$([N][O-])]=[C]', [], []), TautomerTransform('oxim/nitroso f', '[O!H0]-[N]=[C]', [], []), TautomerTransform('oxim/nitroso r', '[C!H0]-[N]=[O]', [], []), TautomerTransform('oxim/nitroso via phenol f', '[O!H0]-[N]=[C]-[C]=[C]-[C]=[OH0]', [], []), TautomerTransform('oxim/nitroso via phenol r', '[O!H0]-[c]=[c]-[c]=[c]-[N]=[OH0]', [], []), TautomerTransform('cyano/iso-cyanic acid f', '[O!H0]-[C]#[N]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('cyano/iso-cyanic acid r', '[N!H0]=[C]=[O]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('isocyanide f', '[C-0!H0]#[N+0]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('isocyanide r', '[N+!H0]#[C-]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('phosphonic acid f', '[OH]-[PH0]', [rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('phosphonic acid r', '[PH]=[O]', [rdkit.Chem.rdchem.BondType.SINGLE], [])), scores=(TautomerScore('benzoquinone', '[#6]1([#6]=[#6][#6]([#6]=[#6]1)=,:[N,S,O])=,:[N,S,O]', 25), TautomerScore('oxim', '[#6]=[N][OH]', 4), TautomerScore('C=O', '[#6]=,:[#8]', 2), TautomerScore('N=O', '[#7]=,:[#8]', 2), TautomerScore('P=O', '[#15]=,:[#8]', 2), TautomerScore('C=hetero', '[#6]=[!#1;!#6]', 1), TautomerScore('methyl', '[CX4H3]', 1), TautomerScore('guanidine 
terminal=N', '[#7][#6](=[NR0])[#7H0]', 1), TautomerScore('guanidine endocyclic=N', '[#7;R][#6;R]([N])=[#7;R]', 2), TautomerScore('aci-nitro', '[#6]=[N+]([O-])[OH]', -4)), max_tautomers=1000)[source]

Bases: object

canonicalize(mol)[source]

Return a canonical tautomer by enumerating and scoring all possible tautomers.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The input molecule.

Returns

The canonical tautomer.

Return type

rdkit.Chem.rdchem.Mol
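
The idea of scoring candidates and keeping the best one can be sketched as below. This is a simplification under stated assumptions: candidates are plain SMILES strings, patterns are matched by substring counting rather than SMARTS, and the names TOY_SCORES, score_candidate and pick_canonical are hypothetical. The alphabetical tie-break is likewise an assumption of the sketch.

```python
# Assumption: (substring, points) pairs standing in for TautomerScore entries.
TOY_SCORES = [
    ("=O", 2),  # loosely mirrors the 'C=O' score of 2
]


def score_candidate(smiles):
    """Sum the points of every toy pattern occurrence in the string."""
    return sum(smiles.count(pattern) * points for pattern, points in TOY_SCORES)


def pick_canonical(candidates):
    """Return the highest-scoring candidate; ties go to the alphabetically first."""
    # max() keeps the first maximal element, so sorting first fixes the tie-break.
    return max(sorted(candidates), key=score_candidate)


# Keto and enol forms of acetone as candidate tautomers.
candidates = ["C=C(C)O", "CC(C)=O"]
canonical = pick_canonical(candidates)
```

Here the keto form scores 2 and the enol form scores 0, so the keto form is selected.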

class scopy.ScoPretreat.pretreatutil.TautomerEnumerator(transforms=(TautomerTransform('1,3 (thio)keto/enol f', '[CX4!H0]-[C]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,3 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[C]=[C]', [], []), TautomerTransform('1,5 (thio)keto/enol f', '[CX4,NX3;!H0]-[C]=[C][CH0]=[O,S,Se,Te;X1]', [], []), TautomerTransform('1,5 (thio)keto/enol r', '[O,S,Se,Te;X2!H0]-[CH0]=[C]-[C]=[C,N]', [], []), TautomerTransform('aliphatic imine f', '[CX4!H0]-[C]=[NX2]', [], []), TautomerTransform('aliphatic imine r', '[NX3!H0]-[C]=[CX3]', [], []), TautomerTransform('special imine f', '[N!H0]-[C]=[CX3R0]', [], []), TautomerTransform('special imine r', '[CX4!H0]-[c]=[n]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift f', '[#7!H0]-[#6R1]=[O,#7X2]', [], []), TautomerTransform('1,3 aromatic heteroatom H shift r', '[O,#7;!H0]-[#6R1]=[#7X2]', [], []), TautomerTransform('1,3 heteroatom H shift', '[#7,S,O,Se,Te;!H0]-[#7X2,#6,#15]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift', '[#7,#16,#8;!H0]-[#6,#7]=[#6]-[#6,#7]=[#7,#16,#8;H0]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift f', '[#7,#16,#8,Se,Te;!H0]-[#6,nX2]=[#6,nX2]-[#6,#7X2]=[#7X2,S,O,Se,Te]', [], []), TautomerTransform('1,5 aromatic heteroatom H shift r', '[#7,S,O,Se,Te;!H0]-[#6,#7X2]=[#6,nX2]-[#6,nX2]=[#7,#16,#8,Se,Te]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift f', '[#7,#8,#16,Se,Te;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6]-[#6,#7X2]=[#7X2,S,O,Se,Te,CX3]', [], []), TautomerTransform('1,7 aromatic heteroatom H shift r', '[#7,S,O,Se,Te,CX4;!H0]-[#6,#7X2]=[#6]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[NX2,S,O,Se,Te]', [], []), TautomerTransform('1,9 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#6,#7X2]-[#6,#7X2]=[#7,O]', [], []), TautomerTransform('1,11 aromatic heteroatom H shift f', '[#7,O;!H0]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#6,nX2]-[#6,nX2]=[#7X2,O]', [], []), 
TautomerTransform('furanone f', '[O,S,N;!H0]-[#6r5]=[#6X3r5;$([#6]([#6r5])=[#6r5])]', [], []), TautomerTransform('furanone r', '[#6r5!H0;$([#6]([#6r5])[#6r5])]-[#6r5]=[O,S,N]', [], []), TautomerTransform('keten/ynol f', '[C!H0]=[C]=[O,S,Se,Te;X1]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('keten/ynol r', '[O,S,Se,Te;!H0X2]-[C]#[C]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('ionic nitro/aci-nitro f', '[C!H0]-[N+;$([N][O-])]=[O]', [], []), TautomerTransform('ionic nitro/aci-nitro r', '[O!H0]-[N+;$([N][O-])]=[C]', [], []), TautomerTransform('oxim/nitroso f', '[O!H0]-[N]=[C]', [], []), TautomerTransform('oxim/nitroso r', '[C!H0]-[N]=[O]', [], []), TautomerTransform('oxim/nitroso via phenol f', '[O!H0]-[N]=[C]-[C]=[C]-[C]=[OH0]', [], []), TautomerTransform('oxim/nitroso via phenol r', '[O!H0]-[c]=[c]-[c]=[c]-[N]=[OH0]', [], []), TautomerTransform('cyano/iso-cyanic acid f', '[O!H0]-[C]#[N]', [rdkit.Chem.rdchem.BondType.DOUBLE, rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('cyano/iso-cyanic acid r', '[N!H0]=[C]=[O]', [rdkit.Chem.rdchem.BondType.TRIPLE, rdkit.Chem.rdchem.BondType.SINGLE], []), TautomerTransform('isocyanide f', '[C-0!H0]#[N+0]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('isocyanide r', '[N+!H0]#[C-]', [rdkit.Chem.rdchem.BondType.TRIPLE], [-1, 1]), TautomerTransform('phosphonic acid f', '[OH]-[PH0]', [rdkit.Chem.rdchem.BondType.DOUBLE], []), TautomerTransform('phosphonic acid r', '[PH]=[O]', [rdkit.Chem.rdchem.BondType.SINGLE], [])), max_tautomers=1000)[source]

Bases: object

enumerate(mol)[source]

Enumerate all possible tautomers and return them as a list.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The input molecule.

Returns

A list of all possible tautomers of the molecule.

Return type

list of rdkit.Chem.rdchem.Mol

class scopy.ScoPretreat.pretreatutil.TautomerScore(name, smarts, score)[source]

Bases: object

A substructure defined by SMARTS and its score contribution to determine the canonical tautomer.

property smarts
class scopy.ScoPretreat.pretreatutil.TautomerTransform(name, smarts, bonds=(), charges=(), radicals=())[source]

Bases: object

Rules to transform one tautomer to another.

Each TautomerTransform is defined by a SMARTS pattern where the transform involves moving a hydrogen from the first atom in the pattern to the last atom in the pattern. By default, alternating single and double bonds along the pattern are swapped accordingly to account for the hydrogen movement. If necessary, the transform can instead define custom resulting bond orders and also resulting atom charges.
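The first-atom/last-atom convention can be illustrated directly with RDKit. The sketch below is not the library's own enumeration code; it only assumes RDKit is installed and reuses the SMARTS of the '1,3 (thio)keto/enol f' transform from the default list, matched against acetone as an arbitrary example molecule.

```python
from rdkit import Chem

# SMARTS copied from the '1,3 (thio)keto/enol f' transform in the default list.
pattern = Chem.MolFromSmarts('[CX4!H0]-[C]=[O,S,Se,Te;X1]')

# Acetone: atom 0 is CH3, atom 1 the carbonyl carbon, atom 2 the oxygen, atom 3 CH3.
mol = Chem.MolFromSmiles('CC(=O)C')

# Each match lists molecule atom indices in the order of the pattern atoms.
matches = mol.GetSubstructMatches(pattern)

for match in matches:
    donor = mol.GetAtomWithIdx(match[0])      # first pattern atom: gives up a hydrogen
    acceptor = mol.GetAtomWithIdx(match[-1])  # last pattern atom: receives the hydrogen
    print(match, donor.GetSymbol(), '->', acceptor.GetSymbol())
```

Both methyl groups match, so the pattern yields two candidate hydrogen shifts, each from a carbon to the carbonyl oxygen.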

BONDMAP = {'#': rdkit.Chem.rdchem.BondType.TRIPLE, '-': rdkit.Chem.rdchem.BondType.SINGLE, ':': rdkit.Chem.rdchem.BondType.AROMATIC, '=': rdkit.Chem.rdchem.BondType.DOUBLE}
CHARGEMAP = {'+': 1, '-': -1, '0': 0}
property tautomer
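A minimal sketch of how CHARGEMAP could translate a compact charge specification into the integer list seen in the default transforms. The helper name parse_charges and the idea that charges are written as a string of symbols are assumptions made for illustration; they are not confirmed by this page.

```python
# CHARGEMAP as documented for TautomerTransform.
CHARGEMAP = {'+': 1, '-': -1, '0': 0}


def parse_charges(symbols):
    """Translate a string of charge symbols into a list of formal charges.

    Hypothetical helper: the real class may accept charges in another form.
    """
    return [CHARGEMAP[symbol] for symbol in symbols]


print(parse_charges('-+'))  # [-1, 1]
```

The result [-1, 1] is the same charge list shown for the 'isocyanide f' transform in the default list. BONDMAP could be applied the same way to bond symbols, yielding RDKit BondType values.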
class scopy.ScoPretreat.pretreatutil.Uncharger(acid_base_pairs=(AcidBasePair('-OSO3H', 'OS(=O)(=O)[OH]', 'OS(=O)(=O)[O-]'), AcidBasePair('–SO3H', '[!O]S(=O)(=O)[OH]', '[!O]S(=O)(=O)[O-]'), AcidBasePair('-OSO2H', 'O[SD3](=O)[OH]', 'O[SD3](=O)[O-]'), AcidBasePair('-SO2H', '[!O][SD3](=O)[OH]', '[!O][SD3](=O)[O-]'), AcidBasePair('-OPO3H2', 'OP(=O)([OH])[OH]', 'OP(=O)([OH])[O-]'), AcidBasePair('-PO3H2', '[!O]P(=O)([OH])[OH]', '[!O]P(=O)([OH])[O-]'), AcidBasePair('-CO2H', 'C(=O)[OH]', 'C(=O)[O-]'), AcidBasePair('thiophenol', 'c[SH]', 'c[S-]'), AcidBasePair('(-OPO3H)-', 'OP(=O)([O-])[OH]', 'OP(=O)([O-])[O-]'), AcidBasePair('(-PO3H)-', '[!O]P(=O)([O-])[OH]', '[!O]P(=O)([O-])[O-]'), AcidBasePair('phthalimide', 'O=C2c1ccccc1C(=O)[NH]2', 'O=C2c1ccccc1C(=O)[N-]2'), AcidBasePair('CO3H (peracetyl)', 'C(=O)O[OH]', 'C(=O)O[O-]'), AcidBasePair('alpha-carbon-hydrogen-nitro group', 'O=N(O)[CH]', 'O=N(O)[C-]'), AcidBasePair('-SO2NH2', 'S(=O)(=O)[NH2]', 'S(=O)(=O)[NH-]'), AcidBasePair('-OBO2H2', 'OB([OH])[OH]', 'OB([OH])[O-]'), AcidBasePair('-BO2H2', '[!O]B([OH])[OH]', '[!O]B([OH])[O-]'), AcidBasePair('phenol', 'c[OH]', 'c[O-]'), AcidBasePair('SH (aliphatic)', 'C[SH]', 'C[S-]'), AcidBasePair('(-OBO2H)-', 'OB([O-])[OH]', 'OB([O-])[O-]'), AcidBasePair('(-BO2H)-', '[!O]B([O-])[OH]', '[!O]B([O-])[O-]'), AcidBasePair('cyclopentadiene', 'C1=CC=C[CH2]1', 'c1ccc[cH-]1'), AcidBasePair('-CONH2', 'C(=O)[NH2]', 'C(=O)[NH-]'), AcidBasePair('imidazole', 'c1cnc[nH]1', 'c1cnc[n-]1'), AcidBasePair('-OH (aliphatic alcohol)', '[CX4][OH]', '[CX4][O-]'), AcidBasePair('alpha-carbon-hydrogen-keto group', 'O=C([!O])[C!H0+0]', 'O=C([!O])[C-]'), AcidBasePair('alpha-carbon-hydrogen-acetyl ester group', 'OC(=O)[C!H0+0]', 'OC(=O)[C-]'), AcidBasePair('sp carbon hydrogen', 'C#[CH]', 'C#[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfone group', 'CS(=O)(=O)[C!H0+0]', 'CS(=O)(=O)[C-]'), AcidBasePair('alpha-carbon-hydrogen-sulfoxide group', 'C[SD3](=O)[C!H0+0]', 'C[SD3](=O)[C-]'), AcidBasePair('-NH2', '[CX4][NH2]', 
'[CX4][NH-]'), AcidBasePair('benzyl hydrogen', 'c[CX4H2]', 'c[CX3H-]'), AcidBasePair('sp2-carbon hydrogen', '[CX3]=[CX3!H0+0]', '[CX3]=[CX2-]'), AcidBasePair('sp3-carbon hydrogen', '[CX4!H0+0]', '[CX3-]')))[source]

Bases: object

Class for neutralizing charges in a molecule.

This class uncharges molecules by adding and/or removing hydrogens. In cases where there is a positive charge that is not neutralizable, any corresponding negative charge is also preserved.

uncharge(mol)[source]

Neutralize molecule by adding/removing hydrogens.

Parameters

mol (rdkit.Chem.rdchem.Mol) – The molecule to uncharge.

Returns

The uncharged molecule.

Return type

rdkit.Chem.rdchem.Mol

scopy.ScoPretreat.pretreatutil.VALIDATIONS = (<class 'scopy.ScoPretreat.pretreatutil.IsNoneValidation'>, <class 'scopy.ScoPretreat.pretreatutil.NoAtomValidation'>, <class 'scopy.ScoPretreat.pretreatutil.FragmentValidation'>, <class 'scopy.ScoPretreat.pretreatutil.NeutralValidation'>, <class 'scopy.ScoPretreat.pretreatutil.IsotopeValidation'>)

The default list of Validations used by Validator.

exception scopy.ScoPretreat.pretreatutil.ValidateError[source]

Bases: scopy.ScoPretreat.pretreatutil.MolVSError

class scopy.ScoPretreat.pretreatutil.Validation(log)[source]

Bases: object

The base class that all Validation subclasses must inherit from.

run(mol)[source]
class scopy.ScoPretreat.pretreatutil.Validator(validations=(<class 'scopy.ScoPretreat.pretreatutil.IsNoneValidation'>, <class 'scopy.ScoPretreat.pretreatutil.NoAtomValidation'>, <class 'scopy.ScoPretreat.pretreatutil.FragmentValidation'>, <class 'scopy.ScoPretreat.pretreatutil.NeutralValidation'>, <class 'scopy.ScoPretreat.pretreatutil.IsotopeValidation'>), log_format='%(levelname)s: [%(validation)s] %(message)s', level=20, stdout=False, raw=False)[source]

Bases: object

The main class for running Validations on molecules.

validate(mol)[source]
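The log_format and level defaults follow the conventions of Python's standard logging module. The sketch below uses only the standard library to show what that format string produces and what level=20 filters out; the logger name and the validation name are hypothetical, and this is not necessarily how Validator wires up its own logger.

```python
import io
import logging

# Default format string from the Validator signature.
LOG_FORMAT = '%(levelname)s: [%(validation)s] %(message)s'

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger('validation_format_demo')  # hypothetical logger name
logger.setLevel(20)  # 20 is logging.INFO, the default level in the signature
logger.propagate = False
logger.addHandler(handler)

# The custom %(validation)s field has to be supplied with every record.
adapter = logging.LoggerAdapter(logger, {'validation': 'ExampleValidation'})
adapter.info('example message')  # emitted: INFO (20) meets the threshold
adapter.debug('hidden message')  # dropped: DEBUG (10) is below level 20

output = stream.getvalue()
print(output, end='')  # INFO: [ExampleValidation] example message
```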
scopy.ScoPretreat.pretreatutil.canonicalize_tautomer_smiles(smiles)[source]

Return a standardized canonical tautomer SMILES string given a SMILES string.

Note: This is a convenience function for quickly standardizing and finding the canonical tautomer for a single SMILES string. It is more efficient to use the Standardizer class directly when working with many molecules or when custom options are needed.

Parameters

smiles (string) – The SMILES for the molecule.

Returns

The SMILES for the standardized canonical tautomer.

Return type

string.

scopy.ScoPretreat.pretreatutil.enumerate_tautomers_smiles(smiles)[source]

Return a set of tautomers as SMILES strings, given a SMILES string.

Parameters

smiles – A SMILES string.

Returns

A set containing SMILES strings for every possible tautomer.

Return type

set of strings.

scopy.ScoPretreat.pretreatutil.is_organic(fragment)[source]

Return True if the fragment contains at least one carbon atom.

Parameters

fragment – The fragment as an RDKit Mol object.
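The check described here can be approximated in a few lines of RDKit. This is a sketch under the assumption that "organic" means only "has at least one atom with atomic number 6"; the function name contains_carbon is hypothetical and the library's own implementation may differ.

```python
from rdkit import Chem


def contains_carbon(fragment):
    """Return True if any atom in the fragment is carbon (atomic number 6)."""
    return any(atom.GetAtomicNum() == 6 for atom in fragment.GetAtoms())


acetate = Chem.MolFromSmiles('CC(=O)[O-]')
sodium = Chem.MolFromSmiles('[Na+]')
print(contains_carbon(acetate), contains_carbon(sodium))  # True False
```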

scopy.ScoPretreat.pretreatutil.memoized_property(fget)[source]

Decorator to create memoized properties.
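A memoized property computes its value on first access and returns the cached result afterwards. The following is one common way to write such a decorator, given as an illustration only; the names memoized_property_sketch and Pattern are hypothetical, and the caching strategy is an assumption rather than the library's confirmed implementation.

```python
import functools


def memoized_property_sketch(fget):
    """Build a property that computes its value once per instance and caches it."""
    attr_name = '_cached_' + fget.__name__

    @functools.wraps(fget)
    def getter(self):
        if not hasattr(self, attr_name):
            setattr(self, attr_name, fget(self))
        return getattr(self, attr_name)

    return property(getter)


class Pattern:  # hypothetical class used only for this demonstration
    def __init__(self, smarts):
        self.smarts = smarts
        self.compile_count = 0

    @memoized_property_sketch
    def compiled(self):
        self.compile_count += 1
        return ('compiled', self.smarts)  # stand-in for an expensive computation


p = Pattern('[OH]')
first = p.compiled
second = p.compiled
print(p.compile_count)  # 1
```

This pattern suits attributes that are expensive to build, such as a compiled SMARTS query, because the cost is paid only on first access.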

scopy.ScoPretreat.pretreatutil.pairwise(iterable)[source]

Utility function to iterate in a pairwise fashion.
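The description does not say whether the pairs overlap. The sketch below assumes the overlapping form from the itertools recipes, where each element is paired with its successor; pairwise_sketch is a hypothetical name.

```python
import itertools


def pairwise_sketch(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance the second iterator by one item
    return zip(first, second)


pairs = list(pairwise_sketch([1, 2, 3, 4]))
print(pairs)  # [(1, 2), (2, 3), (3, 4)]
```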

scopy.ScoPretreat.pretreatutil.standardize_smiles(smiles)[source]

Return a standardized canonical SMILES string given a SMILES string.

Note: This is a convenience function for quickly standardizing a single SMILES string. It is more efficient to use the Standardizer class directly when working with many molecules or when custom options are needed.

Parameters

smiles (string) – The SMILES for the molecule.

Returns

The SMILES for the standardized molecule.

Return type

string.

scopy.ScoPretreat.pretreatutil.validate_smiles(smiles)[source]

Return log messages for a given SMILES string using the default validations.

Note: This is a convenience function for quickly validating a single SMILES string. It is more efficient to use the Validator class directly when working with many molecules or when custom options are needed.

Parameters

smiles (string) – The SMILES for the molecule.

Returns

A list of log messages.

Return type

list of strings.

Module contents

MolVS - Molecule Validation and Standardization

MolVS is a Python tool built on top of RDKit that performs validation and standardization of chemical structures.