
A friend of mine got a tattoo of a chemical structural formula on his chest.

He challenged us to identify the substance. I thought that perhaps a combination of:

  • Mathematica image processing capabilities
  • Wolfram|Alpha chemical data
  • Manual intervention

could do the trick. Here is the picture:

[Image: photo of the tattooed structural formula]

I am looking for pointers on how to attack the problem (e.g., which Mathematica functions to look at).

EDIT 1: I suppose the following needs to be done:

  • Extract the formula from the picture as a basic (2D) structure diagram
  • Define a distance metric for images, or use an existing one (e.g., ImageDistance[])
  • Calculate the distance between our diagram and the structure diagram of every compound in ChemicalData[]
  • Take the closest few and verify them visually

At the moment, the first step seems the most challenging.
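Steps 2 to 4 of that plan could look roughly like the sketch below. It assumes the photo has already been cropped to the diagram and saved as a hypothetical file tattoo.png, so the hard step 1 is not attempted. The helper diagramImage and the 200-entry candidate subset are also illustrative assumptions. Plain pixel distance is very sensitive to rotation, scale, line thickness and skin texture, so treat this only as an outline of the idea, not a working identifier.

```mathematica
(* Hypothetical input: a photo already cropped to the diagram (step 1 not attempted) *)
tattoo = Binarize[Import["tattoo.png"]];

(* Hypothetical helper: render a compound's structure diagram as a binary image
   with the same pixel dimensions as the photo *)
diagramImage[name_] := Binarize[
   ImageResize[Rasterize[ChemicalData[name, "StructureDiagram"]], ImageDimensions[tattoo]]];

(* Arbitrary 200-entry subset; scoring all ~40k compounds this way would be very slow *)
candidates = Take[ChemicalData[], 200];

(* Pair each compound with its image distance to the photo, closest first *)
ranked = SortBy[{#, ImageDistance[tattoo, diagramImage[#]]} & /@ candidates, Last];

(* Step 4: inspect the five best matches by eye *)
Take[ranked, 5]
```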

EDIT 2: As shown by Sjoerd, it is much easier to search for a formula as a string than as an image!

Sjoerd C. de Vries
stathisk
  • Does anyone know a quick way to import all chemical data? I ran ChemicalData /@ ChemicalData[], but it connects to the Wolfram servers once per entry, some 40k times. I had to quit the kernel to stop it. (v8) – ssch Sep 04 '13 at 18:38
  • @ssch Perhaps this is useful? http://mathematica.stackexchange.com/questions/3549/how-to-save-chemicaldata-queries-so-that-they-are-available-immediately-on-noteb – stathisk Sep 04 '13 at 18:41
  • I don't want to be a party pooper, so here's the spoiler alert. – István Zachar Sep 04 '13 at 19:29
  • All I can say is drat: why was I giving a lecture when this question showed up?!?!? – bobthechemist Sep 05 '13 at 01:00

1 Answer


Preload all chemical data:

ChemicalData[All, "Preload"];
RebuildPacletData[]; (* the latter should not really be necessary *)

Get all names:

cd = ChemicalData[];

Get their molecular formulae:

l = ChemicalData[#, "MolecularFormulaString"] & /@ cd;

By counting the Cs, Os and Hs in the tattooed diagram, we know we have to find $\rm{C_{19}H_{28}O_{2}}$. To look up all molecules that have that molecular formula:

p = Position[l, "C19H28O2"];

Column[Labeled[ChemicalData[#, "StructureDiagram"], #] & /@ Extract[cd, p]]

[Image: labeled structure diagrams of the ChemicalData compounds with formula C19H28O2]

Looks like it is Testosterone.
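Everything above hinges on the hand count, and a miscounted hydrogen silently produces the wrong search string. A small consistency check is sketched below (the variable names are illustrative). It rebuilds the search string from the counts and computes the double-bond equivalents, $C - H/2 + 1$ for a $\rm{C_xH_yO_z}$ compound, where oxygen does not enter. The result must be a nonnegative integer, so an odd H count would show up as a half-integer. For testosterone, 6 corresponds to four rings plus one C=C and one C=O, which can be compared against the drawing.

```mathematica
(* Atom counts read off the tattoo by hand *)
counts = {"C" -> 19, "H" -> 28, "O" -> 2};

(* Rebuild the formula string in the "C19H28O2" form used in the search above *)
formula = StringJoin[#1 <> ToString[#2] & @@@ counts];

(* Double-bond equivalents (rings + pi bonds) for a CxHyOz compound: x - y/2 + 1 *)
dbe = ("C" /. counts) - ("H" /. counts)/2 + 1
```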


UPDATE


In version 10.4, one can call various external services dealing with chemical and/or pharmaceutical data, such as PubChem, ChemSpider and OpenPHACTS. If one knows how to encode the chemical diagram as a SMILES string, this gives an alternative approach to identifying the compound:

pubchem = ServiceConnect["PubChem"];

pubchem[
  "CompoundSynonyms", 
  {"SMILES" -> "CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C"}
][1, "Synonym"] // Normal // Column

[Image: column of PubChem synonyms returned for the SMILES string]

Among all the synonyms we see testosterone pop up.
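Instead of scanning the column by eye, the same request can be tested directly. The sketch below reuses the call above and assumes, as the displayed column suggests, that Normal turns the "Synonym" column into a plain list of strings. It needs version 10.4 or later and a working PubChem connection.

```mathematica
(* Requires Mathematica 10.4+ and network access to the PubChem service *)
pubchem = ServiceConnect["PubChem"];
smiles = "CC12CCC3C(C1CCC2O)CCC4=CC(=O)CCC34C";

(* Same request as above; Normal turns the Dataset column into a list of strings *)
synonyms = Normal[pubchem["CompoundSynonyms", {"SMILES" -> smiles}][1, "Synonym"]];

(* Case-insensitive membership test for the expected name *)
MemberQ[ToLowerCase /@ synonyms, "testosterone"]
```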

Sjoerd C. de Vries
  • Nicely done Sjoerd! Searching for a formula as string is much easier than as image! – stathisk Sep 04 '13 at 20:38
  • @zet My pleasure. It took me a bit longer than necessary because I initially miscounted the number of Hs. – Sjoerd C. de Vries Sep 04 '13 at 20:48
  • Nice solution, +1! Some additional chemistry knowledge also thins down the results: the molecule has a steroid structure (the name should include "ster"), it is an alcohol (with a hydroxyl group, -OH), and it has a carbonyl group (C=O) at the other end, i.e., it is also a ketone (the name should end in "-one"). The friend's gender helps to figure out the "testo-" part... – István Zachar Sep 05 '13 at 09:51
  • I will leave the question open for a couple of days and then I'll accept your answer @SjoerdC.deVries. Thanks again. – stathisk Sep 05 '13 at 16:06
  • @zet no problem. – Sjoerd C. de Vries Sep 05 '13 at 20:47
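István's naming heuristic can be applied mechanically to the formula hits. The sketch below assumes cd and l from the answer, with compound names as plain strings as in v8. In later versions ChemicalData[] returns Entity objects, so the names would need converting first. Pick is used here as a swapped-in alternative to Position/Extract, and steroidKetoneNameQ is a hypothetical helper. Name matching is crude: it would also keep an isomer such as dehydroepiandrosterone, so the final visual check is still needed.

```mathematica
(* Assumes cd and l from the answer above, with names as strings (v8) *)
candidates = Pick[cd, l, "C19H28O2"];  (* same list as Extract[cd, p] *)

(* Hypothetical helper for the heuristic: name contains "ster" and ends in "one" *)
steroidKetoneNameQ[name_String] :=
  StringMatchQ[name, ___ ~~ "ster" ~~ ___ ~~ "one", IgnoreCase -> True];

(* Keep only the candidates whose names pass the heuristic *)
Select[candidates, steroidKetoneNameQ]
```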