I couldn’t find free software that would easily do exactly what I want: extract ChemDraw schemes as .cdx files from MS Word/Excel/PowerPoint documents. So here’s my quick-and-dirty solution, which can be easily automated.
When I was writing my diploma thesis in 2009, I wanted to make it somewhat ‘special’, so I tried to write it in LaTeX (more precisely, in LyX). At that time, so many geeks around the Internet praised LaTeX as superior to any other typesetting method that I was eventually persuaded to give it a try. That time I failed miserably, and I ended up writing the diploma thesis in OpenOffice (currently LibreOffice).
Four years later I faced the choice again: either take the easy way and write the PhD thesis in MS Word, or take on the steep learning curve of LaTeX.
During my job-searching campaign I was once asked to show all the structures that I had synthesized. Drawing 200+ molecules seemed like no fun to me. Even opening all the .cdx files generated in 3.5 years, just to copy-paste them into a single one, was too boring. So I used openbabel for this job.
Once I had all the .cdx files in one folder, I ran
babel *.cdx allStruc.svg -xe -xl -xC
rsvg-convert -f pdf -o allStruc.pdf allStruc.svg
But the output was weird. All the charged molecules were assigned unrealistic charges over +2000, so all my potassium trifluoroborate and ammonium salts were crap.
Then I turned to the molconvert tool from ChemAxon, which is free for academic non-commercial use. To convert all the .cdx files to correct SMILES I used a simple script:
#!/bin/bash
# Convert every .cdx file in the current folder to SMILES
# and append the results to smiles.smi
for i in *.cdx
do
    ~/marvin/bin/molconvert smiles "$i" -o tmp.smi
    cat tmp.smi >> smiles.smi
done
This was followed by openbabel (I decided to sort the molecules by molecular weight, so that the complexity increases more or less steadily down the list):
babel smiles.smi allStruc.svg -xe -xl -xC --sort MW
rsvg-convert -f pdf -o allStruc.pdf allStruc.svg
Still, the conversion wasn’t ideal. In particular, BF3¯ groups were represented as BF2·F¯. Fortunately, simply replacing the SMILES fragment ‘B(F)F’ with ‘[B-](F)(F)F’ and removing the extra fluoride (‘[F-].’ in SMILES) solved the problem.
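The replacement described above can be sketched with sed. This is only a minimal sketch, not necessarily the way it was originally done: the helper name fix_bf3 and the output file smiles_fixed.smi are made up for illustration, and it assumes the extra fluoride always shows up literally as ‘[F-].’ (fluoride first, then a dot), which may not hold for every molconvert output.

```bash
#!/bin/bash
# fix_bf3 (hypothetical helper name): reads SMILES on stdin, writes fixed SMILES to stdout.
#  1) turn the neutral difluoroborane fragment B(F)F into the trifluoroborate anion [B-](F)(F)F
#  2) drop the now redundant free fluoride, assumed to appear as "[F-]."
fix_bf3() {
    sed -e 's/B(F)F/[B-](F)(F)F/g' -e 's/\[F-]\.//g'
}

# Apply it to the collected SMILES file, if present.
# smiles_fixed.smi is an illustrative output name, not one from the original workflow.
if [ -f smiles.smi ]; then
    fix_bf3 < smiles.smi > smiles_fixed.smi
fi
```

The fixed file can then be passed to babel in place of smiles.smi. It is worth checking a few of the salts by eye afterwards, since a plain text substitution knows nothing about chemistry.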
So, here we go, the work of 3.5 years as an almost square 15×14 matrix: