Researchers at Lawrence Berkeley National Laboratory have developed an artificial intelligence (AI) that, with very little training, has made discoveries in materials science. To spot what scientists had missed, all the AI had to do was read the abstracts of millions of previously published scientific papers.
The approach is a form of machine learning, in which an algorithm is trained on a particular task until, after many iterations, it produces useful results. Machine-learning approaches are being used to solve many problems, and this team used one to look for latent knowledge in the world of materials science.
Latent knowledge is a connection that has gone unnoticed in a particular field even though the data supporting it is already there. As reported in Nature, the scientists fed the algorithm, known as Word2vec, 3.3 million materials science abstracts published between 1922 and 2018 in 1,000 different journals. It took 500,000 distinct words from those abstracts and built mathematical connections between them, which gave it intriguing powers of prediction.
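To make "mathematical connections" more concrete: Word2vec represents each word as a list of numbers (a vector), and words used in similar contexts end up with similar vectors. The sketch below is not the team's code. It uses three made-up vectors and cosine similarity, a common way to compare word vectors, to show how such a comparison might work.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical three-dimensional vectors, invented for illustration.
# Real Word2vec vectors are learned from text and have many more dimensions.
toy_vectors = {
    "thermoelectric": [0.9, 0.1, 0.3],
    "heat": [0.8, 0.2, 0.4],
    "magnet": [0.1, 0.9, 0.2],
}

sim_heat = cosine_similarity(toy_vectors["thermoelectric"], toy_vectors["heat"])
sim_magnet = cosine_similarity(toy_vectors["thermoelectric"], toy_vectors["magnet"])

# With these invented numbers, "heat" scores closer to "thermoelectric"
# than "magnet" does.
print(f"thermoelectric vs heat:   {sim_heat:.3f}")
print(f"thermoelectric vs magnet: {sim_magnet:.3f}")
```

A score near 1 means two vectors point in nearly the same direction, while a score near 0 means they are largely unrelated.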
“In every research field there’s 100 years of past research literature, and every week dozens more studies come out,” lead author Dr Vahe Tshitoyan, a Berkeley Lab postdoctoral fellow now working at Google, said in a statement. “A researcher can access only a fraction of that. We thought, can machine learning do something to make use of all this collective knowledge in an unsupervised manner – without needing guidance from human researchers?”
With a little training, the program learned to associate words with their meanings and to extrapolate connections to other concepts. For example, it was able to group the elements of the periodic table without ever being shown what the table looks like.
The team's main focus was on thermoelectric materials, an area that materials scientists have studied for decades. Thermoelectric materials can convert heat into electricity, which makes them valuable. To be practical, however, they also need to be efficient, safe, common, and easy to produce.
Based on the literature it analyzed, the AI was able to determine which materials have the best thermoelectric properties. But it did something even more extraordinary. When fed only abstracts published up to the year 2008, Word2vec was able to predict materials that appeared in later studies.
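One plausible way to turn word vectors into a list of candidate materials is to rank material names by how close their vectors sit to the vector for a property word such as "thermoelectric". The article does not spell out the team's exact procedure, so the sketch below is an assumption: the function `rank_materials`, the names `material_A` to `material_C`, and all the numbers are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_materials(vectors, property_word, materials, top_k=3):
    """Rank materials by similarity to a property word (hypothetical helper)."""
    target = vectors[property_word]
    scored = [(name, cosine_similarity(vectors[name], target)) for name in materials]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Invented two-dimensional vectors. In a test of predictive power, the real
# vectors would be learned only from abstracts published before a cutoff year.
toy_vectors = {
    "thermoelectric": [1.0, 0.0],
    "material_A": [0.9, 0.1],
    "material_B": [0.5, 0.5],
    "material_C": [0.1, 0.9],
}

ranking = rank_materials(
    toy_vectors,
    "thermoelectric",
    ["material_A", "material_B", "material_C"],
    top_k=2,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

To check a ranking like this against history, one could train on abstracts up to a given year and then see whether the top-ranked materials show up in studies published afterwards.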
“I honestly didn’t expect the algorithm to be so predictive of future results,” added Anubhav Jain, the team leader who works in Berkeley Lab’s Energy Storage & Distributed Resources Division. “I had thought maybe the algorithm could be descriptive of what people had done before but not come up with these different connections. I was pretty surprised when I saw not only the predictions but also the reasoning behind the predictions. This study shows that if this algorithm were in place earlier, some materials could have conceivably been discovered years in advance.”
The team has released Word2vec's list of the top 50 thermoelectric materials and plans to release the algorithm so that other scientists can use the AI to study different materials. The team is also working on a new type of scholarly search engine that can search papers’ abstracts in a more useful way.