Structural biology pushes the limits of productivity

23-05-2011

Automated processing in the ESRF’s macromolecular beamlines is fuelling an explosion in activity among structural biologists.

Producing more than a quarter of the ESRF’s publications, the scientific output of the macromolecular crystallography (MX) beamlines is unrivalled. The technique, which uses fixed- and tunable-wavelength X-rays to study crystals with unit cells comprising tens of thousands of atoms, is the predominant tool used by a huge community of biologists vying to determine the structure of proteins. “The first structure that I worked on took more than 20 years to solve,” says structural biology group head Sean McSweeney. “Now, thanks largely to synchrotrons like the ESRF, the job can be done in a PhD project.”

More than 100,000 sample screenings take place each year in the structural biology group’s seven beamlines at ID14, ID23 and ID29, leading to 20% of the world’s (half of Europe’s) solved protein structures, while the pharmaceutical companies that use the MX beamlines for drug design constitute a significant part of the ESRF’s industrial activity. It’s therefore little surprise that the structural biology group is the second biggest consumer of the ESRF’s computing resources after the imaging group.

Automation is key

When it comes to evaluating the large unit cells of biological molecules, crystals are often not of sufficiently high quality for diffraction. MX studies therefore require a lot of computational muscle to evaluate large numbers of screenings quickly so as to allow users to plan their experiments. “Preprocessing requires both good CPU and fast input/output operations to disk, while post analysis requires more CPU and good graphics,” says McSweeney.

Once full data are collected, experiments generate large numbers of big images, which must be compressed to allow for easier storage and archiving. While MX shares its data-reduction requirements with all high-throughput image-based techniques, such as tomography, the structural biology community has common standards on data formats to aid reproducibility and validation. Unlike in tomography, however, the programs currently used by the MX group are unlikely to profit much from the faster speeds of graphics processing units.

Preprocessing will become more crucial as biologists tackle ever more ambitious structures, such as complex membrane proteins, in which there is considerable variation in diffraction quality both within and between crystals. Data-collection facilities will also have to be optimised to manage low-resolution diffraction data from very small crystals. The group’s priority this year is to optimise the system so that users can sort bad crystals from the few good ones as fast as possible.

Since an automatic sample changer was installed at MX beamlines in 2005, the number of structures elucidated has risen three-fold. To meet the growing demands of structural biologists, further automation will be a major feature of the ESRF upgrade beamline UPBL10, centred around a new sample evaluation and sorting facility called MASSIF. A third of users already operate and monitor their experiment in real time remotely from their home institute via the MXCube interface.

Techniques employed on the other beamlines may vary significantly between users, in contrast to the MX beamlines. But the analysis shares the same basic features: large quantities of data, established algorithms for data reduction, and systems that provide rapid feedback to the user, so other beamlines also stand to capitalise on the high levels of automation developed in the MX area.

The success of the MX automation software developments was partly due to strong collaboration between teams both within the ESRF and at other European synchrotrons. “We have therefore now started to work jointly in the ISDD software group for implementing automation based on developments done for MX on all of the ESRF beamlines, starting with ID22 and ID21,” says ESRF’s Olof Svensson, who shared the 2008 Bessy Innovation Award on Synchrotron Radiation for his role in developing the customised software used on the MX beamlines.

Merging science

Interdisciplinary subjects such as nanoscience are blurring the boundaries between beamlines as different groups employ similar techniques. A biologically oriented SAXS beamline called BioSAXS has recently been installed at ID14-3, for instance, and will allow users to determine low-resolution structures of proteins that cannot yet be crystallised. The framework for automation is the same as that used by structural biologists – and developed by Dimitri Svergun at EMBL Hamburg over the past 10–15 years – while the data-reduction software was inspired by algorithms used on the materials-science beamlines. “It’s mandatory to be able to share a common framework for data analysis, otherwise it is much more difficult to make such re-use of software developments,” says Svensson.

In summer 2010 a new Dectris Pilatus 6M pixel detector, capable of running with quasi-continuous read-out, was introduced on the MX beamline ID29. This enables new ways to collect data but puts severe strain on data transfer and processing. Given that this detector could be upgraded to even higher operational speeds, and that fast detectors are being added on many non-MX beamlines, the ESRF faces the urgent task of providing faster data analysis in general, so that scientists can leave the ESRF without having to transfer huge volumes of raw data. To cope with the MX data deluge, the structural biology group’s dedicated computing cluster will be significantly enhanced with more processors, faster file access and major database developments.

Matthew Chalmers

This article appeared in ESRFnews, March 2011. 

To register for a free subscription and to rapidly receive the current issue, please go to:

http://www.esrf.fr/UsersAndScience/Publications/Newsletter/esrfnewsdigital

Top image: “It’s mandatory to share a common framework for data analysis,” says ESRF’s Olof Svensson (Image credit: A. Molyneux).