在部分应用场景下,我们需要对两个分子片段进行拼接,例如锂电电解液数据库 LiBE
然而,当前并没有合适的拼接方法。下面是一些已有方法的调研结果:
- 在 LiBE 论文的附录里,作者使用 pymatgen 进行分子拼接。
- 其思路是:将两分子整体平移到靠近的位置,互不成键,不考虑成键原子在分子中的位置。作者提到,后续用 薛定谔 套件 (Schrödinger Suite) + 力场 采样出一个初猜。
- 简单说就是 pymatgen 只是让两分子靠近,真正拼接还得用力场。不行。
- 一番搜索后,在 这个案例 里,作者提到,使用 官能团取代 的思路可以实现 3D 层面的对接
- 需要将需要对接的原子调整到分子中第二的位置,在分子中排序第一的位置添加一个虚原子,虚原子的位置要跟需要对接的原子成键
- 使用官能团取代方法,消掉两个原子中新添加的虚原子,实现对接
- 评估:3D 整体平移思想,仅保证需要对接位置的正确成键,可能会造成两分子重叠。需要添加虚原子,需要正确排序等,代码实现量中等
- 基于 rdkit 的方法
- 基于官能团取代的方法:
- 评估:两分子均需要用 smiles 表示,可能无法应用于部分自由基体系;源码是针对已知 dummy atom 的情况,而我们没有 dummy atom,面临与 pymatgen 相当的工程量
- 基于 reactant 的方法:
- 评估:使用 SMARTS 正确表示化学反应,需要用 SMILES 表示两分子,化学反应本身可能是遵循模版法
- 基于蛋白质拼接(dock)思路的方法:
- 如果能顺利调用相关程序,工程量应该不大,但可能学习相关软件成本较高
- 蛋白拼接精度要求很低
考虑以上种种问题,本人决定独自写一个拼接脚本,如下:
import ase
import numpy as np
from ase import Atoms, Atom
from ase.data import covalent_radii, atomic_numbers
from ase.io import read, write
from ase.neighborlist import build_neighbor_list, natural_cutoffs
from scipy.spatial import distance_matrix
def get_bond_length(atom_1_symbol: str, atom_2_symbol: str, skin: float = 0):
return covalent_radii[atomic_numbers[atom_1_symbol]] + covalent_radii[atomic_numbers[atom_2_symbol]] + skin
def get_nearest_neighbor(a_molecule: ase.Atoms, atom_index: int, cutoff_mult: float = None, sort_flag: bool = True):
if cutoff_mult:
cutoffs = natural_cutoffs(atoms=a_molecule, mult=cutoff_mult)
neighborList = build_neighbor_list(a_molecule, cutoffs=cutoffs, bothways=True, self_interaction=False)
else:
neighborList = build_neighbor_list(a_molecule, bothways=True, self_interaction=False)
dok_matrix = neighborList.get_connectivity_matrix()
dok_adj = dok_matrix[atom_index]
adj_array = dok_adj.tocoo().col
if sort_flag:
dist = []
for ii in adj_array:
dist.append(np.linalg.norm(a_molecule.positions[atom_index] - a_molecule.positions[ii]))
map_dict = dict(zip(dist, adj_array))
sorted_dist = sorted(dist)
sorted_neighbor_indexes = []
for a_dist in sorted_dist:
sorted_neighbor_indexes.append(map_dict[a_dist])
return sorted_neighbor_indexes
else:
return adj_array
def swap_atoms(a_molecule: ase.Atoms, atom_index_1: int, atom_index_2: int):
a_molecule_copy = a_molecule.copy()
a_molecule[atom_index_1].position = a_molecule_copy[atom_index_2].position
a_molecule[atom_index_1].symbol = a_molecule_copy[atom_index_2].symbol
a_molecule[atom_index_2].position = a_molecule_copy[atom_index_1].position
a_molecule[atom_index_2].symbol = a_molecule_copy[atom_index_1].symbol
return a_molecule
def rm_and_sort_atoms(a_molecule: ase.Atoms, atom_index_to_rm: int):
nearest_neighbor_index = get_nearest_neighbor(a_molecule=a_molecule, atom_index=atom_index_to_rm)[0]
del a_molecule[atom_index_to_rm]
if nearest_neighbor_index > atom_index_to_rm:
nearest_neighbor_index = nearest_neighbor_index - 1
a_molecule = swap_atoms(a_molecule=a_molecule, atom_index_1=0, atom_index_2=nearest_neighbor_index)
return a_molecule
def get_addon_pos_by_sample(addon_symbol: str, tgt_atom_index: int, tgt_molecule: Atoms, sample_times: int = 1000,
skin: float = 0, cutoff_mult: float = 1.5):
tgt_atom = tgt_molecule[tgt_atom_index]
tgt_bond_length = get_bond_length(atom_1_symbol=addon_symbol, atom_2_symbol=tgt_atom.symbol, skin=skin)
neighbor_indexes = get_nearest_neighbor(a_molecule=tgt_molecule, atom_index=tgt_atom_index, cutoff_mult=cutoff_mult)
neighbor_positions = tgt_molecule.get_positions()[neighbor_indexes]
min_rev_distance = 100000000
for i in range(sample_times):
theta = np.random.uniform(0, 2 * np.pi)
phi = np.random.uniform(0, np.pi)
x = np.sin(phi) * np.cos(theta)
y = np.sin(phi) * np.sin(theta)
z = np.cos(phi)
new_atom_position = tgt_atom.position + tgt_bond_length * np.array([x, y, z])
distances = distance_matrix(new_atom_position.reshape((1, -1)), neighbor_positions).reshape(-1)
a_distance = (1 / distances).sum()
if a_distance < min_rev_distance:
min_rev_d_point = new_atom_position
min_rev_distance = a_distance
return min_rev_d_point, neighbor_indexes
def combine_2_mols(molecule_1: Atoms, molecule_2: Atoms, tgt_atom_1_index: int, tgt_atom_2_index: int,
sample_times: int = 1000, skin: float = 0, cutoff_mult: float = 1.5,
rotation_times: int = 3):
# translation
if len(molecule_1) > len(molecule_2):
main_mol, sub_mol = molecule_1, molecule_2
tgt_atom_main_index, tgt_atom_addon_index = tgt_atom_1_index, tgt_atom_2_index
else:
main_mol, sub_mol = molecule_2, molecule_1
tgt_atom_main_index, tgt_atom_addon_index = tgt_atom_2_index, tgt_atom_1_index
addon_atom = sub_mol[tgt_atom_addon_index]
addon_position, neighbor_indexes = get_addon_pos_by_sample(addon_symbol=addon_atom.symbol,
tgt_atom_index=tgt_atom_main_index,
tgt_molecule=main_mol,
sample_times=sample_times, skin=skin,
cutoff_mult=cutoff_mult)
translation_vec = addon_position - addon_atom.position
sub_mol.translate(translation_vec)
# rotate along with the nearest neighbors to minimize the fragments repulsion
main_mol_positions = main_mol.get_positions()
min_rev_distance = 100000000
sub_mol_nb_indexes = get_nearest_neighbor(a_molecule=sub_mol, atom_index=tgt_atom_addon_index, cutoff_mult=cutoff_mult)
for i in range(sample_times):
sub_mol_copy = sub_mol.copy()
for a_neighbor_index in sub_mol_nb_indexes[:rotation_times]:
rotate_angle = np.random.uniform(0, 360)
rotate_vec = addon_position - sub_mol_copy[a_neighbor_index].position
sub_mol_copy.rotate(a=rotate_angle, v=rotate_vec, center=addon_position)
sub_mol_copy_positions = sub_mol_copy.get_positions()
frag_distances = distance_matrix(sub_mol_copy_positions, main_mol_positions)
a_frag_distance = (1 / frag_distances).sum()
if a_frag_distance < min_rev_distance:
min_rev_distance = a_frag_distance
final_sub_mol = sub_mol_copy.copy()
# merge
for an_atom in final_sub_mol:
main_mol.append(an_atom)
return main_mol
def combine_2_mols_with_dummy(molecule_1: Atoms, molecule_2: Atoms, dummy_atom_index_1: int, dummy_atom_index_2: int,
sample_times: int = 1000, skin: float = 0, cutoff_mult: float = 1.5,
rotation_times: int = 3):
# remove two dummy atoms and sort the to-be-bonded atom to index 0
mol_1_without_dummy = rm_and_sort_atoms(a_molecule=molecule_1, atom_index_to_rm=dummy_atom_index_1)
mol_2_without_dummy = rm_and_sort_atoms(a_molecule=molecule_2, atom_index_to_rm=dummy_atom_index_2)
combinend_mol = combine_2_mols(molecule_1=mol_1_without_dummy, molecule_2=mol_2_without_dummy,
tgt_atom_1_index=0, tgt_atom_2_index=0, rotation_times=rotation_times,
sample_times=sample_times, skin=skin, cutoff_mult=cutoff_mult)
return combinend_mol
if __name__ == '__main__':
from ase.visualize import view
from ase.build.molecule import molecule
main_mol = molecule('C7NH5')
sub_mol = molecule('BDA')
view(main_mol)
view(sub_mol)
final_mol = combine_2_mols_with_dummy(molecule_1=main_mol, molecule_2=sub_mol,
dummy_atom_index_1=11, dummy_atom_index_2=6)
view(final_mol)
效果如下:
其核心思路 & 安装、使用方法如下:
How it works?
In short, it could be summarized into 2 steps:
- Translation:
- The new bond length is determined by the summation of the covalent radii of the two atoms.
- Set one atom as the central of a sphere, sample positions on this sphere and calculate the repulsion between the new addon to the whole molecule. (The repulsion is measured by the summation of the inverse distance.)
- Choose the minimum repulsion position and translate the whole addon molecule as a rigid body.
- Rotation:
- Set the bonded site as center and the nearest neighbor as rotation axes. Rotate the addon molecule as a rigid body
- Sample rotation angles to minimize the repulsion between the addon molecule to the main molecule. (Again, the repulsion is measured by the summation of the inverse distance.)
- Choose the minimum repulsion image and this is the final combination state.
Install
pip install combinemols3d
Usage
- Prepare your to-be-bonded molecules as
ase.Atoms
object. (Readase
documentation for format manipulation.) - Determine which sites to be bonded together. (dummy sites)
- Combine them by eliminate the dummy sites.
from ase.visualize import view
from ase.build.molecule import molecule
from combinemols3d.CombineMols3D import combine_2_mols_with_dummy, combine_2_mols
main_mol = molecule('C7NH5')
sub_mol = molecule('BDA')
view(main_mol)
view(sub_mol)
final_mol = combine_2_mols_with_dummy(molecule_1=main_mol.copy(), molecule_2=sub_mol.copy(),
dummy_atom_index_1=11, dummy_atom_index_2=6)
view(final_mol)
If you don’t want to eliminate the dummy sites, use the combine_2_mols
instead:
final_mol = combine_2_mols(molecule_1=main_mol, molecule_2=sub_mol,
tgt_atom_1_index=11, tgt_atom_2_index=6)
view(final_mol)
Other parameters are same for the two functions:
-
Conformation related:
-
skin
: to get a proper bond length for the new bond. Here we adopt the covalent radii method. A skin parameter is added to finetune the bond length, though it is 0 by default.R ( A B ) = r ( A ) + r ( B ) + s k i n R(\mathrm{AB})=r(\mathrm{A})+r(\mathrm{B})+skin R(AB)=r(A)+r(B)+skin
-
cutoff_mult
: there are several places to detect neighbors for a specific atom. By default, it isnatural cutoff
as follow equation:D e t e c t R a n g e ( A B ) = r ( A ) + r ( B ) + 0.3 DetectRange(\mathrm{AB})=r(\mathrm{A})+r(\mathrm{B})+0.3 DetectRange(AB)=r(A)+r(B)+0.3
We can enlarge our search scope to multiply the
natural cutoff
as follow equation:D e t e c t R a n g e ( A B ) = c u t o f f m u l t ∗ ( r ( A ) + r ( B ) + 0.3 ) DetectRange(\mathrm{AB})=cutoff mult*(r(\mathrm{A})+r(\mathrm{B})+0.3) DetectRange(AB)=cutoffmult∗(r(A)+r(B)+0.3)
-
-
Performance related:
sample_times
: how many times to sample addon positions in translation and angles in rotation.rotation_times
: In the rotation stage, we will rotate the whole addon molecule as a rigid body. The orientations of the addon site to its neighbors are taken as fixed rotation axes. How many neighbors to be considered could be performance critical.
如果你觉得这段代码有用,请到 github 仓库上给我一个星星:地址