Tanimoto Similarity Explained

Tanimoto similarity, also known as the Jaccard index or Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. In cheminformatics, Tanimoto similarity is often used to measure the similarity between two molecules based on their structural features.

When comparing two molecules, their structures can be represented as bit vectors, where each bit represents the presence or absence of a particular structural feature (e.g., a certain chemical bond or functional group). These bit vectors are also known as fingerprints.
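To make the idea of a fingerprint concrete, here is a toy sketch in plain Python. The four-entry feature vocabulary `FEATURES` and the helper `to_fingerprint` are hypothetical and exist only for illustration; real fingerprints typically hash hundreds or thousands of substructures into the bit positions.

```python
# Hypothetical feature vocabulary, for illustration only.
FEATURES = ["carboxylic acid", "ester", "aromatic ring", "amine"]

def to_fingerprint(present):
    """Return a list of 0/1 bits, one per entry in FEATURES."""
    return [1 if feature in present else 0 for feature in FEATURES]

# Two made-up molecules described by the features they contain.
mol_a = to_fingerprint({"carboxylic acid", "ester", "aromatic ring"})
mol_b = to_fingerprint({"ester", "aromatic ring"})

print(mol_a)  # [1, 1, 1, 0]
print(mol_b)  # [0, 1, 1, 0]
```

Each position in the list answers one yes/no question about the molecule, which is what makes set-style comparisons between fingerprints possible.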

The Tanimoto similarity coefficient (T) between two molecules A and B is calculated using the following formula:

\[T(A, B) = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}\]

Where:

- \(|A \cap B|\) is the number of features common to both A and B (i.e., the number of bits that are set to 1 in both fingerprints).
- \(|A|\) is the number of features in A (i.e., the number of bits set to 1 in the fingerprint of A).
- \(|B|\) is the number of features in B (i.e., the number of bits set to 1 in the fingerprint of B).
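As a quick check of the formula, the following is a minimal sketch in plain Python, assuming fingerprints are given as equal-length lists of 0/1 values. The helper name `tanimoto` and the two example bit patterns are made up for illustration, and returning 0.0 when both fingerprints are empty is a convention chosen here, not something the formula itself defines.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length 0/1 sequences."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    common = sum(x & y for x, y in zip(a, b))  # |A ∩ B|
    on_a, on_b = sum(a), sum(b)                # |A| and |B|
    denom = on_a + on_b - common               # |A| + |B| - |A ∩ B|
    # Assumed convention: two all-zero fingerprints score 0.0.
    return common / denom if denom else 0.0

fp_a = [1, 1, 1, 0, 0, 1]  # 4 bits set
fp_b = [0, 1, 1, 0, 1, 1]  # 4 bits set, 3 shared with fp_a

print(tanimoto(fp_a, fp_b))  # 3 / (4 + 4 - 3) = 0.6
```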

The Tanimoto similarity ranges from 0 to 1: 0 means the two fingerprints share no set bits, and 1 means the fingerprints are identical, i.e., the molecules are indistinguishable in terms of the features considered. In drug discovery and cheminformatics, Tanimoto similarity is widely used to compare the structural similarity of chemical compounds and to search chemical databases for compounds that are structurally similar to a query molecule.
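The database search use case can be sketched in plain Python as a rank-and-filter loop. Everything here is hypothetical: the compound names, the sets of on-bit indices, and the 0.5 cutoff are invented for illustration. Fingerprints are stored as sets of on-bit indices, so the denominator \(|A| + |B| - |A \cap B|\) is simply the size of the set union.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets of on-bit indices."""
    union = len(a | b)  # equals |A| + |B| - |A ∩ B|
    return len(a & b) / union if union else 0.0

# Hypothetical library: compound name -> set of on-bit indices (made-up values).
library = {
    "cmpd_1": {1, 4, 7, 9},
    "cmpd_2": {1, 4, 8},
    "cmpd_3": {2, 3, 5},
}
query = {1, 4, 7}

def search(query, library, threshold=0.5):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

results = search(query, library)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

With these made-up fingerprints, `cmpd_1` scores 0.75, `cmpd_2` scores 0.50, and `cmpd_3` shares no bits with the query and is filtered out. The appropriate threshold depends on the fingerprint type and the application.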

[1]:
from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem import AllChem

# Example SMILES strings for two molecules
mol1_smiles = 'CC(=O)OC1=CC=CC=C1C(=O)O'  # aspirin
mol2_smiles = 'CC(=O)OC1=CC=CC=C1'  # phenyl acetate

# Convert SMILES to RDKit molecule objects
mol1 = Chem.MolFromSmiles(mol1_smiles)
mol2 = Chem.MolFromSmiles(mol2_smiles)

# Calculate Morgan fingerprints (radius 2) folded into 1024-bit vectors
fp1 = AllChem.GetMorganFingerprintAsBitVect(mol1, 2, nBits=1024)
fp2 = AllChem.GetMorganFingerprintAsBitVect(mol2, 2, nBits=1024)

# Calculate Tanimoto similarity
tanimoto_similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
tanimoto_similarity
[1]:
0.5357142857142857