Install RDKit
2020-09-15 20:18:59

1. Build and install Boost against Python 3

ref: https://github.com/pupil-labs/pupil/issues/874, huangjiancong1

tar -xzvf boost_1_65_1.tar.gz

cd boost_1_65_1

echo "using mpi ;

using gcc : : g++ ;

using python : 3.6 : /usr/bin/python3 : /usr/include/python3.6m : /usr/local/lib ;" > ~/user-config.jam

./bootstrap.sh --with-python=/usr/bin/python3 --with-python-version=3.6 --with-python-root=/usr/local/lib/python3.6 --prefix=/usr/local

sudo ./b2 install -a --with=all
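
The paths in user-config.jam have to match the Python 3 that Boost is built against. A minimal sketch, assuming a Linux CPython, that prints a candidate "using python" line for the running interpreter. python_jam_line is a hypothetical helper name (not part of Boost), and the reported directories should be checked by hand before use:

```python
import sys
import sysconfig


def python_jam_line():
    """Return a 'using python' line for user-config.jam describing this interpreter."""
    # Fields are: version, interpreter command, include directory, library directory.
    version = "{}.{}".format(*sys.version_info[:2])
    include_dir = sysconfig.get_paths()["include"]
    # LIBDIR can be unset on some platforms; fall back to an empty field.
    lib_dir = sysconfig.get_config_var("LIBDIR") or ""
    return "using python : {} : {} : {} : {} ;".format(
        version, sys.executable, include_dir, lib_dir
    )


if __name__ == "__main__":
    print(python_jam_line())
```

Run it with the same python3 that is passed to bootstrap.sh, then compare the output with the line written to ~/user-config.jam.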

2. install rdkit

Modify the Boost version in CMakeLists.txt from 1.5.1 to the installed version of Boost (1.65.1 in step 1).

Change all the paths below to match your system!

The CMake version needs to be ~= 3.1.

cmake -DPYTHON_LIBRARY=/usr/lib/python3.6/config/libpython3.6.a \

-DPYTHON_INCLUDE_DIR=/usr/include/python3.6/ \

-DPYTHON_EXECUTABLE=/usr/bin/python3 \

-DBOOST_LIBRARIES=libboost_python3.so.1.65.1 \

-DBoost_INCLUDE_DIR=include_folder ..

3. Add the RDKit path to PYTHONPATH and the RDKit lib path to LD_LIBRARY_PATH
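
A quick way to check this step, as a sketch: importing rdkit.Chem raises ImportError both when the package is not found on PYTHONPATH and when its shared libraries cannot be loaded through LD_LIBRARY_PATH. check_rdkit is a hypothetical helper name, not something RDKit provides:

```python
import os


def check_rdkit():
    """Try to import RDKit; return (ok, message) instead of raising."""
    try:
        # Importing Chem loads the compiled extension modules,
        # so missing shared libraries show up here as ImportError.
        from rdkit import Chem  # noqa: F401
    except ImportError as exc:
        return False, str(exc)
    return True, "rdkit.Chem imported"


if __name__ == "__main__":
    ok, message = check_rdkit()
    print("RDKit usable:", ok, "|", message)
    # Show what the two variables currently hold, to help with debugging.
    for name in ("PYTHONPATH", "LD_LIBRARY_PATH"):
        print(name, "=", os.environ.get(name, "(not set)"))
```

An error message naming a .so file usually points at LD_LIBRARY_PATH, while "No module named 'rdkit'" points at PYTHONPATH.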

Turning off TF2 auto-sharding warning
2020-09-14 19:15:49

https://github.com/tensorflow/tensorflow/issues/42146#issuecomment-671484239

Message: "Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset."

If your TensorFlow script logs this message, auto-sharding has already fallen back to DATA sharding. So, to turn off the log message, set auto_shard_policy to DATA explicitly using tf.data.Options() as follows:

options = tf.data.Options()

options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA

dataset = dataset.with_options(options)
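
The message also offers turning auto-sharding off. A small sketch that wraps both choices in one function. set_auto_shard_policy is a hypothetical helper, and it assumes the TF 2.x enum tf.data.experimental.AutoShardPolicy with members such as DATA and OFF:

```python
def set_auto_shard_policy(dataset, policy_name="DATA"):
    """Return `dataset` with the named auto-shard policy applied (e.g. "DATA" or "OFF")."""
    # Imported inside the function so this helper can be defined without loading TensorFlow.
    import tensorflow as tf

    # Look up the enum member by name; an unknown name raises AttributeError.
    policy = getattr(tf.data.experimental.AutoShardPolicy, policy_name)
    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = policy
    return dataset.with_options(options)
```

With "OFF" no sharding is applied, so every worker sees the whole dataset. With "DATA" each worker reads everything but keeps only its own share of the elements.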

TensorFlow 2.4.0
2020-09-10 15:31:35

Check whether MKL is enabled:

python -c "from tensorflow.python.framework import test_util;print(test_util.IsMklEnabled())"
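
The same check as a function, as a sketch. test_util is a private TensorFlow module, so this assumes IsMklEnabled may be missing in other versions and returns None in that case. mkl_enabled is a hypothetical name:

```python
def mkl_enabled():
    """Return True/False for MKL support, or None if it cannot be determined."""
    try:
        from tensorflow.python.framework import test_util
    except ImportError:
        # TensorFlow is not installed or not importable.
        return None
    check = getattr(test_util, "IsMklEnabled", None)
    if not callable(check):
        # This TensorFlow version does not expose the check at this location.
        return None
    return bool(check())


if __name__ == "__main__":
    print("MKL enabled:", mkl_enabled())
```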

Molecular embedding from SDF (3D)
2020-09-02 11:16:46

ref: https://www.rdkit.org/docs/GettingStartedInPython.html

ref: https://github.com/keiserlab/e3fp

ref: https://github.com/CanisW/TF3P/blob/master/data/utils.py

. prerequisites

rdkit : https://www.rdkit.org/docs/index.html

 > import rdkit.Chem as Chem

 > from rdkit.Chem import AllChem

e3fp : pip install e3fp

 > from e3fp.fingerprint.generate import fprints_dict_from_mol

. input: sdf (3D) to mol

 - text sdf

 > suppl = Chem.SDMolSupplier('/path_to_sdf')

 > mol = suppl[0]

 - sdf opened as a binary file object

 > inf = open('/path_to_sdf', 'rb')

 > fsuppl = Chem.ForwardSDMolSupplier(inf)

 - gzipped sdf (import gzip)

 > inf = gzip.open('/path_to_sdf.gz')

 > fsuppl = Chem.ForwardSDMolSupplier(inf)

 * fsuppl does not support random access (fsuppl[0] fails); iterate over it instead

 example)

 for mol in fsuppl: 

   ... use "mol" ...
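
The plain and gzip cases can share one opener. A sketch using only the standard library. open_sdf is a hypothetical helper, and the ".gz" suffix test is an assumption about how the files are named:

```python
import gzip


def open_sdf(path):
    """Open an SDF file in binary mode, transparently handling gzip-compressed files."""
    if str(path).endswith(".gz"):
        # gzip.open in "rb" mode returns a file-like object yielding decompressed bytes.
        return gzip.open(path, "rb")
    return open(path, "rb")
```

The returned handle can be passed to Chem.ForwardSDMolSupplier in place of inf. The supplier yields None for records it cannot parse, so check for None inside the loop before using mol.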

. output (fingerprints from previous research)

 - ECFP: ecfp_obj = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=nBits)

 - MACCS keys: macc_obj = AllChem.GetMACCSKeysFingerprint(mol)

 - E3FP: e3fp_obj = fprints_dict_from_mol(mol, bits=nBits)[5][0].to_rdkit()
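
For use as model input, RDKit bit vectors are usually turned into arrays. A sketch, assuming NumPy is installed and that the object is an RDKit ExplicitBitVect such as ecfp_obj or macc_obj above. fingerprint_to_array is a hypothetical helper; DataStructs.ConvertToNumpyArray is the RDKit call doing the work:

```python
def fingerprint_to_array(fp):
    """Convert an RDKit ExplicitBitVect into a 1-D NumPy array of 0/1 values."""
    # Imported inside the function so the helper can be defined before RDKit is set up.
    import numpy as np
    from rdkit import DataStructs

    # ConvertToNumpyArray resizes the target array to the fingerprint length and fills it.
    arr = np.zeros((0,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)
    return arr
```

For example, fingerprint_to_array(ecfp_obj) gives an array of length nBits, and rows from several molecules can be stacked with np.vstack.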
