The size, complexity and heterogeneity of the data generated in labs across the world can only increase, and the introduction of cloud computing will encourage the same mistakes. Just a stone's throw from where I work, at least three computer companies are already touting cloud-based data-management systems for the life sciences. We need to find ways to manage and integrate data to make discoveries in fields such as genomics, and we need to do this quickly.

At their most basic, data-management systems allow people to organize and share information. In the case of small amounts of uniform data from a single experiment, this can be done with a spreadsheet. But with multiple experiments that produce diverse data — on gene expression, metabolites and protein abundance, for example — we need something more sophisticated.

An ideal data-management system would store data, provide common and secure access methods, and support linking, annotation, querying and retrieval of information. It would be able to cope with data in different locations (on remote servers, on desktops, in a database or spread across different machines) and in different formats, including spreadsheets, badly named files, blogs or even scanned-in notebooks.

Original Source: Nature 499, 7 (04 July 2013) doi:10.1038/499007a