Advances in Machine Intelligence

In summary, the individual claims to be an AI researcher and states that recent advancements in AI, such as self-driving cars and Go AIs, are actually based on old techniques from the 1980s. They believe these advancements have long been expected and that progress in the field has been disappointingly slow. They also mention that big companies with significant budgets, like Google, have enabled rapid progress in the public eye, but that from a theoretical perspective this is not new. The individual also expresses doubt about the "old" techniques being used and asks for opinions on the future of machine intelligence.
  • #36
Crass_Oscillator said:
Sure, I'll just write some short points and you can ask questions about them. I'll stick to AlphaGo because less is known about what industry experts are doing with self-driving cars, so for all I know they may possess some magic I'm not aware of. The laziest thing I can do with SDC's is make an argument from authority, since a lot of academic experts have condemned the idea that we are anywhere near fully autonomous SDC's.

Regarding AlphaGo, the issues are:

DNN's are very sloppy models, in the technical sense coined by Sethna (I can provide citations for the interested). In particular, it was found by Zhang et al (https://arxiv.org/pdf/1611.03530.pdf?from=timeline&isappinstalled=0) that DNN's, among other things, can achieve zero training error on randomly labeled or randomly generated data, pushing their generalization error arbitrarily high. To me this implies that DNN's have such enormous expressiveness that they can effectively memorize the dataset. With enough throughput and GPU toasters, you can span such an enormous portion of the Go gamespace that you can outmuscle a human. Essentially it doesn't win via intelligence but via brutish input/output superiority that a human brain does not have access to. Consider learning efficiency as a better measure (how many games must I win per rank gained?). DeepMind is now moving on to the real-time strategy computer game StarCraft, which I think will illustrate this point very clearly, since the data is much harder to handle. Moreover, they are much more carefully forcing I/O limitations on their "AI" algorithms so that I/O is properly normalized out.
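For the curious, here is a toy version of the Zhang et al. random-label experiment (a minimal sketch of my own in PyTorch, not the paper's setup; the network size, optimizer, and dataset are arbitrary choices for illustration):

Code:
import torch
from torch import nn

# Random inputs and labels that carry no signal whatsoever.
torch.manual_seed(0)
n, d, classes = 512, 32, 10
X = torch.randn(n, d)
y = torch.randint(0, classes, (n,))

# A small MLP with far more parameters than data points.
model = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_acc = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy on random labels: {train_acc:.3f}")  # typically close to 1.0

Since the labels are pure noise, the fit is pure memorization; generalization is impossible by construction, which is the point about expressiveness.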

All this said, clearly DNN's will have niche applications, it's just that they have been portrayed (largely by the media) in a highly misleading manner.

As far as I know, the Go gamespace is practically infinite, so sheer brute force and memory aren't enough.
 
  • #37
GTOM said:
As far as I know, the Go gamespace is practically infinite, so sheer brute force and memory aren't enough.
DNN's are kind of like "brute force learning", which is distinct from a brute force search. The SAT student I used as an analogy, who has studied 100 million SAT tests, has a very high-dimensional sloppy model of how to answer SAT questions, not a brute-force search algorithm. To illustrate this, a common SAT problem might use some simple triangle geometry. If I know the basic theorems of triangle similarity, I can answer many questions with a low-dimensional model. However, there's no reason why I couldn't come up with some convoluted heuristic to answer such questions without knowing those theorems.

StatGuy2000 said:
In general, in the case of machine intelligence, how important do you think it is for the AI field to have a revolutionary progress, as opposed to incremental progress of the sort you are speaking of?
I don't think AI is anywhere near achieving an acceptable solution to one of its outstanding problems, such as NLP. For that, we need to take more steps beyond, say, DNN's. But this will require something "paradigm shifting", to use a well-abused phrase. For instance, if you examine the training process for any popular ML technique (or up-and-coming technique, for that matter), the primary measure of success does not, to my knowledge, generally include parsimony of the model.

If I have a student whom I am tutoring, "understanding" is defined in part by the fact that if I "rotate" a question into a new question that is effectively or approximately the same, the student can still answer it. In particular, there is some notion of explaining more with less. If I have two parameters in my model that answer 100 questions, and you have 100 parameters to answer 100 questions, we might suppose that you don't really understand anything about the subject; if they are 100 yes-no questions, and your parameters are all just the answers, then we would argue that you know basically nothing.
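Here is a toy rendering of that parsimony point (purely illustrative; the hidden "3q + 7" rule and the numbers are made up):

Code:
# 100 questions whose answers follow a simple hidden rule.
questions = list(range(100))
answers = {q: 3 * q + 7 for q in questions}

# Model A: two parameters that capture the rule itself.
slope, intercept = 3, 7
model_a = lambda q: slope * q + intercept

# Model B: one parameter per question, i.e. a lookup table of the answers.
model_b = lambda q: answers[q]

print(all(model_a(q) == answers[q] for q in questions))   # True
print(all(model_b(q) == answers[q] for q in questions))   # True

# Now "rotate" to a question neither model has seen before.
print(model_a(250))                          # 757: the 2-parameter rule transfers
print(answers.get(250, "no stored answer"))  # the lookup table has nothing to say

Both models are perfect on the original 100 questions; only the parsimonious one survives the rotated question.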

Something something symmetry/information theory something something.
 
  • #38
Here's a good overview paper (in pdf format) for the most interesting stuff (in my opinion) being worked on at present if anyone is interested: Deep Reinforcement Learning: An Overview

Here's the paper's abstract:

"We give an overview of recent exciting achievements of deep reinforcement learning (RL). We start with background of deep learning and reinforcement learning, as well as introduction of testbeds. Next we discuss Deep Q-Network (DQN) and its extensions, asynchronous methods, policy optimization, reward, and planning. After that, we talk about attention and memory, unsupervised learning, and learning to learn. Then we discuss various applications of RL, including games, in particular, AlphaGo, robotics, spoken dialogue systems (a.k.a. chatbot), machine translation, text sequence prediction, neural architecture design, personalized web services, healthcare, finance, and music generation. We mention topics/papers not reviewed yet. After listing a collection of RL resources, we close with discussions."

Also: AlphaGo in particular is discussed in section 12 of the paper--for anyone who just wants to read about that.
 
  • Like
Likes stoomart
  • #39
Another great (hardware) advancement for deep learning that I've just learned about: a team of researchers at MIT has apparently managed to develop what they call a "programmable nanophotonic processor" as a more capable replacement for GPUs. Obviously, the dense matrix multiplications necessary for AI learning tasks are time-consuming and use a lot of power. Using NPUs(?), the MIT team expects a computational speed enhancement of at least two orders of magnitude over the currently most powerful equivalent GPUs and an incredible three orders of magnitude in power efficiency. As a demonstration, they implemented a basic neural network and, using just their prototype system, were able to achieve a 77% accuracy level, compared to approx. 90% for conventional systems--and it seems they are confident it won't be very technically difficult to scale for greater accuracy.

I can't wait until this is more accurate and can be made commercially available at the price of traditionally powerful GPUs. Something like this could make it a lot easier to do much more computationally heavy training and experiments with neural networks for individual researchers/machine learning enthusiasts (also, I don't want to pay so much money for an equivalent GPU setup that will only allow me to do so much).

Here is the abstract: Artificial Neural Networks are computational network models inspired by signal processing in the brain. These models have dramatically improved the performance of many learning tasks, including speech and object recognition. However, today's computing hardware is inefficient at implementing neural networks, in large part because much of it was designed for von Neumann computing schemes. Significant effort has been made to develop electronic architectures tuned to implement artificial neural networks that improve upon both computational speed and energy efficiency. Here, we propose a new architecture for a fully-optical neural network that, using unique advantages of optics, promises a computational speed enhancement of at least two orders of magnitude over the state-of-the-art and three orders of magnitude in power efficiency for conventional learning tasks. We experimentally demonstrate essential parts of our architecture using a programmable nanophotonic processor.

Link to the arXiv page where you can get access to the paper if you'd like: Deep Learning with Coherent Nanophotonic Circuits
 
  • Like
Likes QuantumQuest and jerromyjon
  • #40
From my perspective from the 60s and 70s, progress seemed rapid then due to the quantity of low-hanging fruit. Then for several decades the problems got harder, and the limitations of computer speed and capacity were handicaps. Also, from that early era there were attitudes that certain approaches were headed in the wrong direction. For example, at MIT the heuristic programming approach was the way to go, and pattern recognition was looked at as not really AI.

A common metaphor at that time was that heuristic programming imitated the "left brain" intelligence where conscious decisions were made, while pattern recognition was trying to imitate "right brain" gestalt processing that was not performed consciously. I think this attitude lasted for quite a while, discouraging researchers from combining different techniques. AlphaGo is a good example of doing that - combining the "left brain" look-ahead techniques with "right brain" pattern recognition techniques.

In a conversation I had recently with an AI professor at Johns Hopkins, he explained how combining techniques has become a lot more common in the last decade or so.
 
  • Like
Likes stoomart
  • #41
Buzz Bloom said:
A common metaphor at that time was that heuristic programming imitated the "left brain" intelligence where conscious decisions were made, while pattern recognition was trying to imitate "right brain" gestalt processing that was not performed consciously. I think this attitude lasted for quite a while, discouraging researchers from combining different techniques.

That sounds about right.

Buzz Bloom said:
AlphaGo is a good example of doing that - combining the "left brain" look-ahead techniques with "right brain" pattern recognition techniques.

And how does it do that?

Buzz Bloom said:
In a conversation I had recently with an AI professor at Johns Hopkins, he explained how combining techniques has become a lot more common in the last decade or so.

Can you elaborate a little on that conversation?
 
  • Like
Likes Greg Bernhardt
  • #42
DiracPool said:
And how does it do that?
Hi Dirac:

I may be mistaken about this, but my understanding is as follows.

The look-ahead component uses (1) a standard min-max tree-searching method with (2) position evaluation at nodes with no deeper moves to evaluate.

The pattern recognition component uses (1) a very large database of positions, each associated with good moves, and (2) a method for finding positions in the database that are similar to each position for which plausible good moves are needed for deeper exploration.
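For anyone who wants to see the look-ahead half written out, here is a bare-bones sketch of depth-limited minimax with leaf evaluation, i.e. the textbook scheme I have in mind above (not AlphaGo's actual algorithm; legal_moves, apply, and evaluate are hypothetical game-specific functions supplied by the caller):

Code:
def minimax(position, depth, maximizing, legal_moves, apply, evaluate):
    """Return (value, best_move) for the given position."""
    moves = legal_moves(position)
    if depth == 0 or not moves:            # node with no deeper moves to evaluate
        return evaluate(position), None    # heuristic position evaluation

    best_move = None
    if maximizing:
        best_value = float("-inf")
        for move in moves:
            value, _ = minimax(apply(position, move), depth - 1, False,
                               legal_moves, apply, evaluate)
            if value > best_value:
                best_value, best_move = value, move
    else:
        best_value = float("inf")
        for move in moves:
            value, _ = minimax(apply(position, move), depth - 1, True,
                               legal_moves, apply, evaluate)
            if value < best_value:
                best_value, best_move = value, move
    return best_value, best_move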

DiracPool said:
Can you elaborate a little on that conversation?
I actually have some notes from the conversation with some suggestions for further reading, but I haven't yet used them, and I can't look for them now. I will see if I can find them tomorrow, and if I find them I will post them.

Regards,
Buzz
 
  • #43
DiracPool said:
Can you elaborate a little on that conversation?
Hi Dirac:

I found the notes. Unfortunately, the notes are about math rather than AI, but they did refresh my memory a bit about the conversation.

I had asked him if he knew about any research related to an idea I have had for a long time which involves a specific combination of AI methods. I do not have an opportunity very often to discuss AI with someone with expertise, and when I do, I ask this same question. Up until this conversation the answer had always been "no". On this occasion, however, the answer was "yes", and he told me he had recently read about such a research project, but he did not remember the details. I had planned to later email him to see if he could locate a source about this project, but it had slipped my mind completely until now.

If I can get this source information I will post it.

Regards,
Buzz
 
  • #44
This "multi-joint dynamics with contact" (www.mujoco.org) looks rather interesting! (from AaronK's post #38 link) Virtual reality for testing AI's... pretty cool stuff.
 
  • Like
Likes DiracPool and AaronK
  • #45
Hi @Dirac:

My Hopkins friend had been away at a conference, but he has recently returned and responded to my email. The following is from his email message.

The general setting that you are describing is called "feature selection" or sometimes (when the set of potential features is very large or infinite) "feature induction." Another term is "structure learning," e.g., learning the structure of a graphical model.

Standard methods for doing this include
  • selection of independent variables from a large but fixed feature space
    • stepwise selection (forward selection or backward selection), as in statistics
    • simultaneous feature selection and weight optimization
      • via sparsity-inducing regularization (possibly structured sparsity)
      • via an explicit prior distribution over the set of features (may require reversible-jump MCMC or other randomized search algorithms to find the feature set)
  • decision trees, which gradually refine features -- they build up more and more complex conjunctive features as the depth of the tree increases
    • decision forests (a much more effective version of decision trees, which can be regarded as involving weights)
  • split-merge EM, another way of gradually refining features by refining the domains of the latent variables that are used to generate feature templates
  • inductive logic programming
    • more recent methods of learning programs, e.g., the neural Turing machine, or some work in probabilistic programming
  • active set methods, which obtain new candidate features at each step by conjoining existing features that have nonzero weight. The seminal paper may have been https://arxiv.org/pdf/1212.2504.pdf .
  • neural networks: recall that the hidden layer of a neural net can be regarded as a set of induced features, also known as "adaptive basis functions"
You may notice that the next to last bullet includes a link to a reference. My friend said he would provide references for any of these topics in which you have an interest.
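As a concrete illustration of the "sparsity-inducing regularization" bullet above, here is a minimal sketch using an L1-penalized (lasso) regression on synthetic data. The data and the alpha value are arbitrary choices of mine for illustration, not anything from my friend's email:

Code:
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features, n_informative = 200, 50, 5

# Only the first 5 of the 50 features actually influence the response.
X = rng.standard_normal((n_samples, n_features))
true_coef = np.zeros(n_features)
true_coef[:n_informative] = rng.uniform(1.0, 3.0, size=n_informative)
y = X @ true_coef + 0.1 * rng.standard_normal(n_samples)

# The L1 penalty drives most coefficients exactly to zero, so fitting the
# weights and selecting the features happen in the same optimization.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print("features with nonzero weight:", selected)   # ideally indices 0..4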

Regards,
Buzz
 
  • #46
Buzz Bloom said:
Regards,
Buzz

Hi Buzz, thanks for that. However, I'm not sure if or how informative that post is going to be for people who want to learn artificial intelligence. It looks to me like a bunch of jargon highlighted by bullet points. I think that this is a problem, not a solution. I'm not saying I have a solution right now either--I can just say that your post doesn't educate or inspire me. Just saying.

I'm working under a Department of Defense contract to develop autonomous rovers. We don't know whether they'll be used for roving Mars or for some other task, but one of the things we demand is that we keep jargon to a minimum. While the machine learning lingo sounds cool in the public market, we don't use it in lab settings, at least not in my lab. So, if you want to make a point, make it in a more colloquial manner.
 
  • Like
Likes jerromyjon
  • #47
DiracPool said:
So, if you want to make a point, make it in a more colloquial manner.
Hi Dirac:

I am sorry my post was not helpful to you. I was not really trying to make any point, but just trying to share something I thought might be of interest. I am guessing that the language used in the bullets about the various technological approaches is common in academia even if it is avoided in non-academic labs.

I scanned the linked McCallum article, and it also has a lot of technical jargon unfamiliar to me, but I think it can be deciphered with some patience. It may well be easier for you to read, since you work in the AI field; I have not worked in it since graduate school half a century ago.

Regards,
Buzz
 
  • #48
Buzz Bloom said:
I am guessing that the language used in the bullets about the various technological approaches is common in academia even if it is avoided in non-academic labs.

Well, I'm definitely in an academic lab at a major university. It's a computer science lab and we are under a DARPA grant for 4 years, although we get evaluated quarterly and, I'm guessing, could have the funding pulled at any point. Or at least this is what the lab director likes us to think.

By virtue of the fact that I work in the CS lab I am constantly bombarded with invitations to watch PhD defenses in the main lecture hall, as well as lectures from visiting scholars. This happens once or twice a week and there's always a big food smorgasbord in the lobby for these events. So I typically partake in these events to get a free lunch. The dissertation defenses are typically boring, but the invited lectures are often interesting.
 
  • #49
DiracPool said:
And how does it do that?
Hi Dirac:

I found a January 2016 paper describing the techniques used in a Go-playing bot, similar to those used by AlphaGo in its March 2016 match with Lee Se-dol, and I found that I was partially mistaken in my description in post #42. The tree-search method is much more complicated than what I had in mind.

The combination of the two techniques as described in this paper is too complex for me to summarize adequately here. However, I will make an effort anyway by paraphrasing a few quotes.

The two techniques combined as described in this paper:
1. Deep Convolutional Neural Network (DCNN)
2. Monte Carlo Tree Search (MCTS)

(1) uses a pure pattern-matching approach that predicts the next move.

(2) does a tree search, but in a different way than the more primitive ones with which I am familiar. Each round of Monte Carlo tree search consists of four steps.
( From https://en.wikipedia.org/wiki/Monte_Carlo_tree_search )
  • Selection: start from root R and select successive child nodes down to a leaf node L. The section below says more about a way of choosing child nodes that lets the game tree expand toward the most promising moves, which is the essence of Monte Carlo tree search.
  • Expansion: unless L ends the game with a win/loss for either player, create one (or more) child nodes and choose node C from one of them.
  • Simulation: play a random playout from node C. This step is sometimes also called playout or rollout.
  • Backpropagation: use the result of the playout to update information in the nodes on the path from C to R.
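For concreteness, here is a compact sketch of those four steps using the standard UCT selection rule. This is the generic textbook algorithm, not AlphaGo's network-guided version; the game interface (legal_moves, apply, winner, player_to_move) is a hypothetical stand-in.

Code:
import math, random

class Node:
    def __init__(self, state, parent=None, move=None, untried=()):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.untried = [], list(untried)
        self.visits, self.wins = 0, 0.0   # wins for the player who moved into this node

def uct_child(node, c=1.4):
    # Exploit (win rate) plus explore (visit-count bonus).
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
                              + c * math.sqrt(math.log(node.visits) / ch.visits))

def mcts(game, root_state, iterations=1000):
    root = Node(root_state, untried=game.legal_moves(root_state))
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one child for a not-yet-tried move.
        if node.untried:
            move = node.untried.pop(random.randrange(len(node.untried)))
            state = game.apply(node.state, move)
            node.children.append(Node(state, parent=node, move=move,
                                      untried=game.legal_moves(state)))
            node = node.children[-1]
        # 3. Simulation: random playout (rollout) to the end of the game.
        state = node.state
        while game.legal_moves(state):
            state = game.apply(state, random.choice(game.legal_moves(state)))
        result = game.winner(state)          # a player id, or None for a draw
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and result == game.player_to_move(node.parent.state):
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move   # most-visited move

In AlphaGo, as I understand the paper, the DCNN is used to bias which children get explored and to help evaluate positions, rather than relying on uniform random choices as above.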
I hope this is helpful.

Regards,
Buzz
 
  • #50
Buzz Bloom said:
pro.mising moves,
Should say "all possible" moves...
 
  • #51
jerromyjon said:
Should say "all possible" moves...
Hi jon:
Do you have a source for this quote saying "all possible"? I corrected the typo to "promising".

Regards,
Buzz
 
  • #52
Buzz Bloom said:
Do you have a source for this quote saying "all possible"? I corrected the typo to "promising".
No, I just meant that would be the ideal; I don't think it's really possible, as the number of possible moves could be astronomical...
 
  • #53
Interesting thread!

It is true that researchers had figured out 90% of the pieces of a modern neural net in the 1980's. It is true that most modern deep learning is just the same-ol' backpropagation. In some sense, it is true that modern AI is not all that modern.

It is also true that computer hardware has enabled today's modern AI renaissance, because deep learning requires immense amounts of data. No one in the 1980's had any idea of just how much data would be required. Many researchers in the 1980's gave up on their ideas because they couldn't make them work, even though they were right! They just needed 1,000x or 1,000,000x more data, which wasn't even conceivable in the 1980's.

Big data isn't enough, though. The remaining 10% of the pieces were not at all obvious, but they were utterly necessary for good performance. Some problems, like the exploding / vanishing gradient problem, vexed researchers for a decade. It turns out that it pretty much goes away if you just use ReLU instead of sigmoid activation... and ReLU is actually simpler and much faster to compute!
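To make the ReLU-vs-sigmoid point concrete, here is a tiny numerical sketch of my own (a scalar chain with no weights, just repeated activations): the backpropagated gradient is the product of per-layer derivatives, which for sigmoid is at most 0.25 per layer, while ReLU contributes exactly 1 wherever the unit is active.

Code:
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
d_sigmoid = lambda z: sigmoid(z) * (1.0 - sigmoid(z))   # never exceeds 0.25
relu = lambda z: max(z, 0.0)
d_relu = lambda z: 1.0 if z > 0 else 0.0                # exactly 1 when active

def chained_gradient(x, f, df, depth):
    """Gradient of f(f(...f(x))) w.r.t. x: the product of per-layer derivatives."""
    grad = 1.0
    for _ in range(depth):
        grad *= df(x)
        x = f(x)
    return grad

x0 = 0.5
for depth in (5, 20, 50):
    print(depth,
          "sigmoid:", chained_gradient(x0, sigmoid, d_sigmoid, depth),
          "relu:", chained_gradient(x0, relu, d_relu, depth))

The sigmoid gradient shrinks geometrically with depth (the vanishing-gradient problem), while the ReLU gradient stays at 1 along the active path.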

The landscape of AI research today feels a lot like life hunting for better enzymes by evolution. Life struggles along for millennia with a poor enzyme, until some random mutation changes one amino acid and -- BOOM! -- the enzyme is suddenly 10,000x better. Everyone in the AI community feels like just about any little tweak here or there could actually win the lottery and prove to be an incredible breakthrough. So, AI researchers are exploring every little idea like it might be the next big thing. It is very common today to see a 10x or 100x improvement in the speed or quality of an AI algorithm in a single year, solely because of some neato trick that no one expected would work. The field is fast-paced, intellectually curious, and a lot of fun.
 
  • Like
Likes AaronK, Asymptotic, stoomart and 2 others
  • #55
Fantastic work done by IBM scientists that I just heard about: Abu Sebastian et al. at IBM have managed to reliably collocate computation and memory at the nanometer scale and thereby compute tasks in a different way from the usual von Neumann architecture. They do this by exploiting the crystallization dynamics of phase change memory devices to perform in-memory computation (reminded me of memristor-based architectures, but this is a different approach). Also, this tech will allow for massively parallel computing systems--super useful for machine learning!

Here's the paper's abstract: "Conventional computers based on the von Neumann architecture perform computation by repeatedly transferring data between their physically separated processing and memory units. As computation becomes increasingly data centric and the scalability limits in terms of performance and power are being reached, alternative computing paradigms with collocated computation and storage are actively being sought. A fascinating such approach is that of computational memory where the physics of nanoscale memory devices are used to perform certain computational tasks within the memory unit in a non-von Neumann manner. We present an experimental demonstration using one million phase change memory devices organized to perform a high-level computational primitive by exploiting the crystallization dynamics. Its result is imprinted in the conductance states of the memory devices. The results of using such a computational memory for processing real-world data sets show that this co-existence of computation and storage at the nanometer scale could enable ultra-dense, low-power, and massively-parallel computing systems."

Here's the link to the Nature Communications article where you can get the pdf if you'd like: https://www.nature.com/articles/s41467-017-01481-9
 
  • Like
Likes QuantumQuest
