Retrosynthesis ex machina

A Christmas present from one of the bastions of modern organic synthesis appeared in Nature. The idea of automated retrosynthetic analysis has bugged the greatest minds in organic chemistry ever since E. J. Corey came up with the very concept of retrosynthesis. Back in 1969, Corey and Wipke set the stage by publishing a paper in Science.

[Figure: screenshot of the title and the whole abstract of the 1969 Science paper.]

Just as Asimov wrote down the Three Laws of Robotics, Corey & Wipke formulated six requirements for retrosynthesis-designing software, which can be boiled down to the following four:

  1. Interactivity
  2. Finite computing time
  3. Chemist should be able to set all the parameters and mess with the analysis at any stage along the way [I think that’s the modern definition of interactivity, but back in 1969 these were apparently different things]
  4. Chemist should analyze the final results [(s)he’d better do that before going to the hood!]

As the authors stated, “These requirements limit the task to be performed by computer to the ‘logic-centered’ part of synthesis and leave to the chemist the complex, ill-defined, and ‘information-centered’ part, which is at present beyond the scope of computation”.

Since the hardest part of the task was still a human’s job, it’s not surprising that Corey’s software didn’t get very popular among chemists. Who would buy a small-truck-sized iPad for problems any first-year PhD student should be able to solve? The whole field of computer-assisted organic synthesis (CAOS) stagnated for a while, although papers and new programs popped up from time to time. More details can be found in a wonderful presentation from a Baran group meeting.

So what about the network analysis from the Sarpong lab? Let’s start by noting that the authors didn’t set themselves the global goal of automatically designing a full retrosynthesis from scratch. The network analysis just helped to find the starting point of the retrosynthetic analysis. The idea was to identify the maximally bridged ring in the complex natural product scaffold and build the key disconnection around it. As a validation, the group performed a 30-step synthesis of weisaconitine D and a 29-step synthesis of liljestrandinine.
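To make the "most bridged ring" idea a bit more tangible, here is a minimal Python sketch. It is not the authors' code and not their full network analysis, just my simplified reading of the concept. The assumptions are mine: rings are already perceived and passed in as ordered lists of atom indices (as a cheminformatics toolkit would give you), two rings count as bridged when they share three or more atoms along one contiguous path, and a bridgehead is an atom at the end of such a shared path. The function names and the toy tricyclic skeleton are hypothetical.

```python
from itertools import combinations


def ring_neighbors(ring):
    """Map each atom of an ordered ring to its two neighbors within that ring."""
    n = len(ring)
    return {ring[i]: {ring[i - 1], ring[(i + 1) % n]} for i in range(n)}


def bridgehead_atoms(rings):
    """Collect bridgehead atoms under the simplifying assumptions stated above."""
    bridgeheads = set()
    for a, b in combinations(rings, 2):
        shared = set(a) & set(b)
        # Assumption: 2 shared atoms means a fused bond, fewer means spiro or
        # unrelated rings; only 3+ shared atoms are treated as a bridged pair.
        if len(shared) < 3:
            continue
        na, nb = ring_neighbors(a), ring_neighbors(b)
        for atom in shared:
            # Atoms at the ends of the shared path have three distinct ring
            # neighbors across the two rings; interior atoms have only two.
            if len(na[atom] | nb[atom]) >= 3:
                bridgeheads.add(atom)
    return bridgeheads


def maximally_bridged_ring(rings):
    """Return the ring containing the most bridgehead atoms (first one on ties)."""
    bridgeheads = bridgehead_atoms(rings)
    return max(rings, key=lambda ring: len(bridgeheads & set(ring)))


# Hypothetical toy skeleton: a six-membered ring (atoms 1-6) with a two-atom
# bridge between atoms 1 and 3 and another two-atom bridge between atoms 4 and 6.
toy_rings = [
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 8, 7],
    [4, 5, 6, 10, 9],
]

print(sorted(bridgehead_atoms(toy_rings)))
print(maximally_bridged_ring(toy_rings))
```

In this toy case the central six-membered ring carries all four bridgehead atoms, so it is the one the sketch picks. A real implementation would need proper ring perception and a more careful bridgehead definition for messy polycycles.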

In the age of mobile apps, the authors could not resist making an online tool for anyone to play with. Although the source code itself was not provided, the described algorithm appears simple enough for a skilled CDK developer to rebuild the application from scratch. So now anyone can contribute to further improvement of the program! But honestly, I can’t say what has conceptually improved in the field of CAOS since 1969…


Author: Slava Bernat

I did my PhD in medicinal chemistry/chemical biology of G protein-coupled receptors and then explored some chemical biology of non-coding RNA as a postdoc. Currently I'm working at a small biotech company in the San Francisco Bay Area as a research chemist. I write about science that catches my attention in my RSS feed reader, plus some random thoughts and tutorials.
