From NewScientist:

A MATHEMATICAL technique for studying disorder in quantum systems could improve internet keyword searches. It is able to spot significant patterns in large data sets such as web pages and text documents, and may even be adaptable to genome analysis.

Standard keyword searching compares word frequencies in one document with the frequencies in a standard corpus of text from many sources. If a word in the document occurs more frequently than average, it is considered important.
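To make the standard approach concrete, here is a minimal sketch of frequency-based scoring. The function name, the toy corpus counts, and the Laplace smoothing are my own illustrative choices, not anything from the article:

```python
from collections import Counter

def frequency_scores(document_words, corpus_freqs, corpus_size):
    """Score each word by how much more often it appears in the
    document than in a reference corpus of text from many sources."""
    doc_counts = Counter(document_words)
    doc_size = len(document_words)
    scores = {}
    for word, count in doc_counts.items():
        doc_rate = count / doc_size
        # Laplace smoothing so words absent from the corpus don't divide by zero
        corpus_rate = (corpus_freqs.get(word, 0) + 1) / (corpus_size + 1)
        scores[word] = doc_rate / corpus_rate
    return scores

# Toy data: made-up corpus counts out of a million words
doc = "quantum systems and quantum disorder and quantum search".split()
corpus = {"and": 50000, "quantum": 10, "systems": 300, "search": 400}
scores = frequency_scores(doc, corpus, corpus_size=1_000_000)
# "quantum" scores far above "and": it is frequent here but rare in the corpus
```

A word like "and" is common everywhere, so its ratio stays near 1 no matter how often it appears in the document; a rare word used repeatedly gets a large score.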

The new method gauges the importance of words in a document based on where they appear, rather than simply on how often they occur. “You should be able to detect an intrinsic property of a book without the need to compare it with different books,” says Pedro Carpena, a physicist at the University of Malaga in Spain.


Important words tend to be clustered together, Carpena says, while less important words appear more randomly distributed. This makes intuitive sense, he adds: as authors develop important ideas, they are likely to use relevant words many times in the same paragraph or page before moving on to other ideas. Less important words such as “and” and “but” tend to occur more evenly through the text.
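The clustering idea can be sketched with a simple statistic: the coefficient of variation of the gaps between successive occurrences of a word. A word sprinkled evenly through the text has regular gaps (low variation), while a word concentrated in one passage has a burst of tiny gaps plus a few huge ones (high variation). This is only a simplified proxy for the level-statistics measure Carpena's group actually uses, and the toy text below is my own construction:

```python
from statistics import mean, pstdev

def clustering_score(words, target):
    """Coefficient of variation of the gaps between successive
    occurrences of `target`. Clustered words score high; words
    spread evenly through the text score low."""
    positions = [i for i, w in enumerate(words) if w == target]
    if len(positions) < 3:
        return 0.0  # too few occurrences to measure spacing
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return pstdev(gaps) / mean(gaps)

# Toy document: 400 filler tokens with two words planted in it
words = ["filler"] * 400
for p in (100, 102, 104, 106, 108, 350):
    words[p] = "quantum"   # bursty: a tight cluster plus one stray use
for p in (10, 60, 115, 170, 230, 290, 345):
    words[p] = "and"       # roughly even spacing throughout

score_q = clustering_score(words, "quantum")   # well above 1: clustered
score_and = clustering_score(words, "and")     # near 0: evenly spread
```

Note that this score never looks at a reference corpus, which is the point Carpena makes: clustering is an intrinsic property of the one document.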

It’s not clear whether the search method is actually superior to existing ones, says Oren Etzioni, a computer scientist at the University of Washington in Seattle. He points out that Carpena has yet to compare his results with existing methods.

It isn’t in the bits I pulled out, but the mathematical technique the article is referring to is random matrix theory.