AI can predict the structure of chemical compounds thousands of times faster than quantum chemistry


AI can help chemists crack the molecular structure of crystals much faster than traditional modelling methods, according to research published in Nature Communications on Monday.

Scientists from the Ecole Polytechnique Fédérale de Lausanne (EPFL), a research institute in Switzerland, have built a machine learning programme called ShiftML to predict the chemical shifts of atoms in molecules, the small changes in resonance frequency that nuclei show in a magnetic field depending on their chemical surroundings.

Nuclear magnetic resonance (NMR) is commonly used to work out the structure of compounds. Groups of atoms oscillate at specific frequencies, providing a tell-tale sign of the number and location of electrons each contains. But the technique alone is not good enough to reveal the full chemical structure of molecules, especially complex ones that can contain thousands of different atoms.

Another technique known as density functional theory (DFT) is needed. It uses complex quantum chemistry calculations to map the density of electrons in a given area, and requires heavy computation. ShiftML, however, can do the job much more quickly and can be as accurate as DFT programmes in some cases.

“Even for relatively simple molecules, this model is almost 10,000 times faster than existing methods, and the advantage grows tremendously when considering more complex compounds,” said Michele Ceriotti, co-author of the paper and an assistant professor at the EPFL.

“To predict the NMR signature of a crystal with nearly 1,600 atoms, our technique – ShiftML – requires about six minutes; the same feat would have taken 16 years with conventional techniques.”

The researchers trained the system on DFT-calculated chemical shifts for thousands of compounds whose structures were taken from the Cambridge Structural Database. Each one is made up of fewer than 200 atoms, including carbon and hydrogen paired with oxygen or nitrogen. Some 2,000 structures were used for training and validation, and 500 were held back for testing.

ShiftML managed to calculate the chemical shifts for a molecule that had 86 atoms and the same chemical elements as cocaine, but arranged in a different crystal structure. The process took less than a minute of CPU time, compared with the 62 to 150 CPU hours typically needed to calculate the chemical shifts of a molecule containing 86 atoms using DFT.
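For a rough sense of scale, the timings quoted in this article can be turned into speed-up ratios. The short Python sketch below is only back-of-the-envelope arithmetic on those quoted figures, not anything taken from the paper: it assumes "less than a minute" means a full minute (so the ratios for the 86-atom case are lower bounds), and it assumes the six-minute and 16-year figures are directly comparable. The function and variable names are illustrative.

```python
# Back-of-the-envelope speed-ups from the timings quoted in the article.
# Assumption: "less than a minute" is treated as one full minute, so the
# 86-atom ratios below are lower bounds.
MINUTES_PER_HOUR = 60
MINUTES_PER_YEAR = 365.25 * 24 * MINUTES_PER_HOUR


def speedup(baseline_minutes, shiftml_minutes):
    """Return how many times faster ShiftML is than the baseline."""
    return baseline_minutes / shiftml_minutes


# 86-atom molecule: 62 to 150 CPU hours with DFT vs about 1 CPU minute.
small_low = speedup(62 * MINUTES_PER_HOUR, 1)
small_high = speedup(150 * MINUTES_PER_HOUR, 1)

# Crystal with nearly 1,600 atoms: 16 years vs about six minutes.
large = speedup(16 * MINUTES_PER_YEAR, 6)

print(f"86-atom molecule: {small_low:,.0f}x to {small_high:,.0f}x")
print(f"1,600-atom crystal: {large:,.0f}x")
```

Under those assumptions the 86-atom case works out to between 3,720 and 9,000 times faster, in line with Ceriotti's "almost 10,000 times" figure, while the 1,600-atom crystal comes out at roughly 1.4 million times faster.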

The team hopes that ShiftML can be used alongside NMR experiments to help design new drugs. “This is really exciting because the massive acceleration in computation times will allow us to cover much larger conformational spaces and correctly determine structures where it was just not previously possible. This puts most of the complex contemporary drug molecules within reach,” said Lyndon Emsley, co-author of the study and a chemistry professor at EPFL.

ShiftML is open source. “Anyone can upload a molecule and get its NMR signature in just a few minutes,” said Ceriotti. ®
