In February this year, XPRIZE, the global leader in incentivized prize competitions, announced the new Artificial Intelligence (AI) XPRIZE to promote collaboration and exploration into ways that AI and human-machine interaction can help to solve the world’s greatest challenges. The creation of this separate XPRIZE category clearly identifies AI’s uniquely important place in the world.
AI is everywhere, and the reach and speed of its integration into all aspects of our lives are accelerating. There really is no end in sight. Some of the greatest minds of our time are focused on developing AI to its ultimate state.
Much is made of the nearly magical applications of AI, whether that magic be in autonomous vehicles or in Amazon’s seemingly uncanny ability to know which book you might enjoy reading next. But have you ever thought about the foundation of all this AI-driven functionality? Today we have the acronym AoE, or Artificial Intelligence of Everything. Much of that AoE is based on the work of Professor Jürgen Schmidhuber.
In an earlier article I encouraged all business leaders to learn about AI and its increasingly critical importance to their businesses and competitive positions. To help with that, please allow me to introduce you to Professor Schmidhuber.
Professor Jürgen Schmidhuber is a computer scientist acclaimed for his work on machine learning, Artificial Intelligence (AI), artificial neural networks, digital physics, and low-complexity art. Since 1995 he has been co-director of the Swiss AI Lab IDSIA in Lugano, and since 2009 he has also been a professor of Artificial Intelligence at the University of Lugano. Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in his research group won eight international competitions in pattern recognition and machine learning. The Deep Learning Neural Networks developed since the early 1990s in Professor Schmidhuber’s group at TU Munich and the Swiss AI Lab IDSIA (USI and SUPSI) have revolutionized machine learning and AI, and are now available to billions of users through Google, Apple, Microsoft, IBM, Baidu, and many other companies.
His research group also established the field of mathematically rigorous universal AI and optimal universal problem solvers. His formal theory of creativity, curiosity, and fun explains art, science, music, and humor. In honor of his achievements he was elected to the European Academy of Sciences and Arts in 2008. He is the recipient of numerous awards, including the 2013 Helmholtz Award of the International Neural Networks Society and the 2016 IEEE Neural Networks Pioneer Award. He is president of NNAISENSE, which aims to build the first practical general-purpose AI.
Recently, fellow writer Stephen Ibaraki, the most recent recipient (selected unanimously) of the IFIP Silver Core Award for outstanding service, sat down with Professor Schmidhuber to discuss the present and future of artificial intelligence and deep learning.
Q: Before we talk about the 21st century: What was the most influential innovation of the previous one?
A: “In 1999, the journal Nature made a list of the most influential inventions of the 20th century. Number 1 was the invention that made the century stand out among all centuries, by “detonating the population explosion” (V. Smil), from 1.6 billion people in 1900 to soon 10 billion. This invention was the Haber process, which extracts nitrogen from thin air, to make artificial fertilizer. Without it, 1 in 2 persons would not even exist. Soon this ratio will be 2 in 3. Billions and billions would never have lived without it. Nothing else has had so much existential impact. (And nothing in the past 2 billion years had such an effect on the global nitrogen cycle.)”
Q: How about the 21st century?
A: “The Grand Theme of the 21st century is even grander: True Artificial Intelligence (AI). AIs will learn to do almost everything that humans can do, and more. There will be an AI explosion, and the human population explosion will pale in comparison.”
Q: Early on, how was AI obvious to you as this great new development for the world?
A: “This seemed obvious to me as a teenager in the 1970s, when my goal became to build a self-improving AI much smarter than myself, then retire and watch AIs start to colonize and transform the solar system and the galaxy and the rest of the universe in a way infeasible for humans. So I studied maths and computer science. For the cover of my 1987 diploma thesis, I drew a robot that bootstraps itself in seemingly impossible fashion.”
Q: Can you share more about this thesis that foreshadows where AI is going?
A: “The thesis was very ambitious and described the first concrete research on a self-rewriting “meta-program” which not only learns to improve its performance in some limited domain but also learns to improve the learning algorithm itself, and the way it meta-learns the way it learns, etc. This was the first in a decades-spanning series of papers on concrete algorithms for recursive self-improvement, with the goal of laying the foundations for super-intelligences.”
Q: And you thought that the basic ultimate AI algorithm would be very elegant and simple?
A: “I predicted that in hindsight the ultimate self-improver will seem so simple that high school students will be able to understand and implement it. I said it’s the last significant thing a man can create, because all else follows from that. I am still saying the same thing. The only difference is that more people are listening. Why? Because methods we have developed on the way to this goal are now massively used by the world’s most valuable public companies.”
Q: What kind of computational device should we use to build practical AIs?
A: “Physics dictates that future efficient computational hardware will look a lot like a brain-like recurrent neural network (RNN), a general purpose computer with many processors packed in a compact 3-dimensional volume, connected by many short and few long wires, to minimize communication costs. Your cortex has over 10 billion neurons, each connected to 10,000 other neurons on average. Some are input neurons that feed the rest with data (sound, vision, tactile, pain, hunger). Others are output neurons that move muscles. Most are hidden in between, where thinking takes place. All learn by changing the connection strengths, which determine how strongly neurons influence each other, and which seem to encode all your lifelong experience. Same for our artificial RNNs, which learn better than previous methods to recognize speech or handwriting or video, minimize pain, maximize pleasure, drive simulated cars, etc.”
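The recurrent computation Professor Schmidhuber describes can be sketched in a few lines. This is a minimal illustrative toy, not his group’s architecture: the tiny layer sizes, tanh nonlinearity, and random weights are all assumptions chosen only to show how a hidden state feeds back into itself at each time step.

```python
import math
import random

random.seed(0)

# Toy sizes; a cortex-scale network would have ~10^10 neurons.
N_IN, N_HID, N_OUT = 3, 5, 2

def rand_matrix(rows, cols):
    """Random connection strengths; a real RNN learns these from data."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

W_in  = rand_matrix(N_HID, N_IN)   # input  -> hidden
W_rec = rand_matrix(N_HID, N_HID)  # hidden -> hidden (the feedback connections)
W_out = rand_matrix(N_OUT, N_HID)  # hidden -> output

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x, h):
    """One time step: the new hidden state depends on the current input
    AND on the network's own previous state -- that feedback is what lets
    an RNN run algorithms over time rather than map single inputs."""
    pre = [a + b for a, b in zip(matvec(W_in, x), matvec(W_rec, h))]
    h_new = [math.tanh(p) for p in pre]
    y = matvec(W_out, h_new)       # output neurons ("move muscles")
    return y, h_new

h = [0.0] * N_HID                  # hidden state starts empty
for t in range(4):
    x = [random.uniform(-1, 1) for _ in range(N_IN)]  # stand-in sensory input
    y, h = rnn_step(x, h)

print(len(y), len(h))  # 2 5
```

Learning would then consist of adjusting the three weight matrices, the artificial analogue of changing connection strengths between neurons.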
Q: How did your early work on neural networks differ from that of others?
A: “The difference between our neural networks (NNs) and others is that we figured out ways of making NNs deeper and more powerful, especially recurrent NNs (RNNs), the most general and deepest NNs, which have feedback connections and can, in principle, run arbitrary algorithms or programs interacting with the environment. In 1991, I published the first “Very Deep Learners”, systems much deeper than the 8-layer nets of the Ukrainian mathematician Ivakhnenko, the father of “Deep Learning” in the 1960s. By the early 1990s, our systems could learn to solve many previously unlearnable problems. But this was just the beginning.”
Q: From the 1990s, how was this just the beginning? How does this relate to Moore’s Law and future developments in compute power?
A: “Back then it was already clear that every 5 years computers are getting roughly 10 times faster per dollar. Unlike Moore’s Law (which recently broke), this trend has held since Konrad Zuse built the first working program-controlled computer in 1935-1941. Today, 75 years later, hardware is roughly a million billion times faster per unit price. We have greatly profited from this acceleration. Soon we’ll have cheap devices with the raw computational power of a human brain, a few decades later, of all 10 billion human brains together, which collectively probably cannot execute more than 10^30 meaningful elementary operations per second. And it won’t stop there: Bremermann’s physical limit (1982) for 1 kg of computational substrate is still over 10^20 times bigger than that. Even if the trend holds up, this limit won’t be approached before the next century, which is still “soon” though – a century is just 1 percent of 10,000 years of human civilization.”
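The arithmetic behind those figures is easy to verify. The constants below all come from the answer above (a 10x speedup every 5 years since 1941, 10^30 operations per second for all human brains, and a Bremermann limit “over 10^20 times bigger”); the end year 2016 is inferred from “75 years later”.

```python
# Sanity-check the growth figures quoted in the interview.
years = 2016 - 1941              # "Today, 75 years later"
tenfold_steps = years // 5       # one 10x speedup per 5 years
speedup = 10 ** tenfold_steps
print(f"{speedup:.0e}")          # 1e+15 -- "a million billion times faster"

all_brains_ops = 10 ** 30        # quoted ceiling for all 10 billion brains, ops/sec
bremermann_limit = all_brains_ops * 10 ** 20   # "over 10^20 times bigger"
print(f"{bremermann_limit:.0e}") # 1e+50 ops/sec for 1 kg of matter
```

At the historical pace of one factor of 10 per 5 years, closing the remaining 10^20 gap would take roughly 100 more years, which matches the claim that the limit won’t be approached before the next century.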
Q: You talked earlier about Neural Networks (NNs). Can you say more about developments in NNs such as LSTM, which your team pioneered?
A: “Most current commercial NNs need teachers. They rely on a method called backpropagation, whose present elegant and efficient form was first formulated by Seppo Linnainmaa in 1970 (extending earlier work in control theory [5a-c]), and applied to teacher-based supervised learning NNs in 1982 by Paul Werbos – see survey. However, backpropagation did not work well for deep NNs and RNNs. In 1991, Sepp Hochreiter, my first student ever (now professor) working on my first deep learning project, identified the reason for this failure, namely, the problem of vanishing or exploding gradients. This was then overcome by the now widely used Deep Learning RNN called “Long Short-Term Memory (LSTM)” (first peer-reviewed publication in 1997), which was further developed with the help of my outstanding students and postdocs including Felix Gers, Alex Graves, Santi Fernandez, Faustino Gomez, Daan Wierstra, Justin Bayer, Marijn Stollenga, Wonmin Byeon, Rupesh Srivastava, Klaus Greff and others. The LSTM principle has become a basis of much of what’s now called deep learning, especially for sequential data. (BTW, today’s largest LSTMs have a billion connections or so. That is, in 25 years, 100 years after Zuse, for the same price we should have human brain-sized LSTMs with 100,000 billion connections, extrapolating the trend mentioned above.)”
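The gating idea behind LSTM can be shown with a single scalar cell. This is a minimal sketch of the standard gated update (forget, input, and output gates around a memory cell), not the production formulation used by Professor Schmidhuber’s group; the weight values are arbitrary illustrative assumptions, where a real LSTM would learn them by backpropagation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM step for a scalar input and state.  The cell state c is the
    'long short-term memory': the gates decide what to forget, what to write,
    and what to expose, so error signals can flow backward through c largely
    unchanged -- which is how LSTM sidesteps vanishing gradients."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate write
    c = f * c_prev + i * g       # keep some old memory, blend in some new
    h = o * math.tanh(c)         # expose a gated view of the memory
    return h, c

# Arbitrary illustrative weights (a trained LSTM learns these).
w = dict(wf=0.5, uf=0.4, bf=1.0,   # bf > 0 biases the cell toward remembering
         wi=0.3, ui=0.2, bi=0.0,
         wo=0.6, uo=0.1, bo=0.0,
         wg=0.8, ug=0.5, bg=0.0)

h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0, 0.0]:   # one spike of input, then silence
    h, c = lstm_cell_step(x, h, c, w)
print(round(c, 3))  # the cell still carries a trace of the early input
```

The key line is `c = f * c_prev + i * g`: because the old memory is passed through multiplied only by the forget gate (here biased toward 1), information, and gradients, can persist across many time steps instead of shrinking at every layer as in a plain RNN.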
Q: Do you have an LSTM demo or examples that we can relate to?
A: “Do you have a smartphone? Because since mid-2015, Google’s speech recognition is based on LSTM with forget gates for recurrent units, trained by our “Connectionist Temporal Classification (CTC)” (2006). This approach dramatically improved Google Voice not only by 5% or 10% (which already would have been great) but by almost 50% – now available to billions of smartphone users.”