Can a computer be as smart as a human? Today we're sitting down with my friend and Initialized portfolio founder D. Scott Phoenix. Vicarious has come up with a new type of machine learning based on the computational principles of the human brain. AGI, artificial general intelligence, is coming. Let's go meet Scott.
— — —
Garry: Scott, thanks for hanging out, man.
Scott: It's great to see you, Garry, it's been a while.
Well, we were YC batch mates back in the day, so I guess it's been 12 years. Is that right?
Long time. The worst batch in YC history.
That's right. Well, we've gone on to bigger and better things, and we survived. The last time something this scary happened, at least financially, was 2008, 2009.
Yeah, I remember being out in San Francisco around the financial crisis after YC, racing around in my blazer, going to these VC meetings, and just collapsing at the end of the day like, "Oh my God, no one is making any investments right now, the market is in freefall." And being so relieved when they finally closed the round.
Yeah, likewise. We closed our seed round for Posterous the day Lehman died.
Oh God, yeah.
The money's at the bank and we were like PHEW.
Well, since then you started Vicarious. Walk us through what Vicarious is.
Yeah, I guess the story really starts back before YC, even, when I was in college. I was trying to figure out what I wanted to do with my life, so I made a big list and thought about how I could have the most impact. For me, the things that gave me the most joy were when I was being broadly of service to other people.
I thought about spaces like education or health or policy, and when I hit on AI as a possibility, it totally wiped out all the others, because the impact could be so big if this was the right time in history to work on it.
So, then it was a question of, okay, am I 300 years too early? Or is now roughly the time when it could be possible to build an AI that's like your brain and mine? When you do the math on how fast computers are, how much we now know about the brain, and the progress we've made on algorithms, this is probably the right time in history to be working on building the first true artificial general intelligence.
When I think about why you'd want an AGI, the reason I want one is that every problem that has been solved so far in human existence, and every problem that will be solved, is solved using the same hardware in our heads, using the same set of core capabilities just stacked taller and taller. Think about how we program, which is something none of our simian ancestors had to do at all. The things we use to write a computer program are the same object-manipulation metaphors that get us through everyday life.
I have a three-year-old daughter and she has one of those iPad games where she's learning how to program, which consists of her giving little instructions to a tiny car: turn right, go straight, turn right, go straight, and so on. Following a list of steps starts way, way down at the bottom of the things we learn as very small children. And if you can acquire those skills and then build on top of them, you can get to something that can solve the kinds of problems we need to solve today, like how to build a vaccine or a therapy that binds to the receptors on the spike protein of COVID.
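That list-of-steps idea is easy to make concrete. Here's a minimal sketch, my toy version rather than the actual game, of an interpreter for the toddler's car program, where a car on a grid follows "forward" and "turn_right" instructions:

```python
# Toy interpreter for a "program the car" game (illustrative only).
# The car starts facing north; a program is just a list of steps.
def run_car(program, start=(0, 0), heading=(0, 1)):
    x, y = start
    dx, dy = heading  # (0, 1) means facing "north"
    for step in program:
        if step == "forward":
            x, y = x + dx, y + dy
        elif step == "turn_right":
            dx, dy = dy, -dx  # rotate heading 90 degrees clockwise
    return (x, y)

print(run_car(["forward", "turn_right", "forward", "turn_right", "forward"]))
# (1, 0): one step north, one east, one south
```

The whole "program" is data the child composes, which is exactly the point: following a list of steps is a primitive that more complex skills stack on top of.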
So, these are all things you can do with a general purpose AI, but there are all these steps you need to take in order to get there. How do you guys approach that?
I think a lot of people, even in the industry, have vastly different and contradictory definitions of what it means to have general purpose AI. My view of what AGI is: given the same sensory experiences that a human has from birth to late childhood or young adulthood, you should write a program that can acquire the same concepts and the same capabilities and be able to do the same stuff. That's not a truly general AI; if you put that AI inside of bar code world, it wouldn't work. Just like if you put a human baby inside of bar code world, the baby would not learn how to read bar codes, and probably wouldn't learn anything at all.
Our concepts are built around the environment that we live in and the constraints of what it's like to be an animal inside the world. And so, that's the tack Vicarious has taken towards building AGI. We start with something that's embodied inside of a robot that's subject to the constraints of the real world, like physics and friction and unreliable sensors and change. The test data the robot gets is oftentimes very different from what it was trained on, just like the circumstances we run into in everyday life are not identical to the ones we experienced as children. Those serve as really helpful guiding constraints that encourage us to build an AGI that matches what humans can do, rather than building one that's, say, very good at playing Go or Dota.
Walk me through how your approach is really different than what a lot of people think of AI, which is sort of the latest in deep learning techniques.
AI, or AGI especially, has started to suffer from this conflation between behavior that seems intelligent and what's going on inside the agent's mind. You and I are smart because when we close our eyes, inside our heads we have an entire version of reality.
We have a simulator in our heads where we can imagine what it might be like to climb Mount Everest on a unicycle. And we can add details to that, like maybe the wheels are really slippery and we're listening to music as we do it. We can add an infinite number of details and those details can then change how we imagine something might play out.
We have access in our own heads to a learned representation of the entire universe. That's how we can solve problems involving things we can't see or touch directly, like working on an antibody or a vaccine. It's also how we can solve more abstract problems like programming. Now, virtually all the focus of AI research today is not on that at all. It's on taking something where you already have the simulator: you already have the Dota game, or you already have a trillion hours of YouTube videos, or you already have chess. Then you spend a lot of money on computers running on AWS to expose it to 14,000 years of exactly the simulator that you've already written or that already exists, whether it's a Go game or a Dota game or whatever the video is. And so, you've built something that can respond in a way that seems intelligent without actually having anything going on inside its head. In my mind anyway, it's kind of the old animal-brain approach to building AI.
It sounds reptilian, I guess.
Yeah, it's reptilian. Insectoid, even, like a lot of our insect friends. I saw a video of a wasp that had been decapitated in combat. It went through a very elaborate wound-cleaning routine, cleaning its limbs and its body to make sure no wounds from the combat went unclean, all while holding its own head, and then it flew away still holding its own head.
Those were all very complex routines that are sort of hard-coded. There's literally nothing going on in its head. Contrast that with humans and mammals, where everything happens because we have this model of the world, not because we're following some rote reflexes learned over an evolutionary process. Today's AI is largely about creating an evolutionary process, through millions of years' worth of training data, that generates a system that behaves somewhat intelligently when exposed to the same stimuli. It's like the wasp with no head.
Yeah, I remember seeing a video of putting a frog in front of an iPad.
There's so many videos of animals revealing that there's nothing going on inside the animal's head, it's just following a script that gives it the illusion of intentionality of a mental model. When in reality, it's just following a list of hard-coded commands and there's no deeper intelligence going on.
And there are really well-meaning teams out there that are basically dumping a lot of money into exactly those deep learning models, but they're not necessarily pushing forward the state of the art. The methods are not new, the methods are well known.
I would say the teams putting a lot of money into this are doing something economically useful. I think you can build very complex, heuristic-like systems using large data sets to solve problems that are important; DeepMind built a system that manages the temperature of their data centers and saves hundreds of millions of dollars doing it.
And I think the applications in society for those kinds of systems are huge and plentiful and varied, and you can almost swim forever building them. The problem with those kinds of systems is that they're always limited by the data you feed them. As a species, the most interesting things we can do exist at the edges, where there's very little data. If you wanted to build an AI system to invent fusion power, it's unclear what you'd even train it on, because most of the work is in doing the discovery, where there is no data yet. And so, it's not well suited to creating the kind of human progress that I want to see in my lifetime. To do that, we need something that's much more human-like than the systems other people are building.
Some of the most famous projects out there are sort of subject to this, right? Even AlphaGo, if you add another square or add another row or column, it actually breaks the model because there's not a deeper meaning--
It does, and you can see this too. OpenAI made a lot of noise from a publicity perspective about their "too dangerous to release" language model that would synthesize paragraphs of text.
And when you read them, they were locally coherent, but globally incoherent because of this very phenomenon where they don't have a model of the world.
They're incredibly complicated matrices, but there's no deeper meaning.
What I would love to see is more effort spent on systems that are human-like in this way, that actually learn that model of the world, learn high-level concepts, and can reason, so that we can build something closer to a human brain and less like the amazing Cambrian explosion of different, very narrow intelligences that are each trained to synthesize new songs or play a new video game or do image labeling. I think that's why we started Vicarious, and it's what I'd love for more companies to be working on, too.
The origins of AI kind of started this way as well, didn't they? The first text-to-speech models, for instance, were trying to use neural networks. I guess link Vicarious to this lineage of AI and how this stuff started in that Cambrian era of the '60s and '70s.
So, I think the debate about how to build intelligence actually goes way further back than the current arguments, or even the ones that happened in the early days of artificial intelligence in the '60s and '70s.
You look back at the philosophers and you see thinkers like Plato or Kant, who argued for a human mind that uses symbolic primitives or platonic solids or pure reason to understand the world. Then there are others, more in the lineage of people like Skinner, who believed that everything is behavior, that the human mind is effectively blank and nothing is innate. Those two extremes are paradigms of thinking that moved from philosophy into psychology and psychophysics and then into artificial intelligence. That pendulum swing, is everything learned or is everything hard-coded, nature or nurture, has been with us for a really long time. And because a lot of the recent commercial successes have come from using very large amounts of compute with very simple algorithms to create systems that are economically valuable, or at least show flashy demos, the pendulum has swung in favor of viewing the mind, or the artificial mind anyway, as something that sits entirely in the nurture column.
If you take a really close look at the literature in the neuroscience and cognitive science communities, there's so much that humans are able to do from incredibly young ages, and there's no way to bridge the gap between our current artificial intelligence systems, which take 14,000 years to learn how to manipulate just one Rubik's cube, and a human child, who can do it in two years. That's a very significant gap that I think needs to be bridged by bringing more innateness, more nature, into the way we think about how to build artificial intelligence. And that's been one of the north stars for Vicarious: for the things that need to be learned, we should learn them.
But for the things that are likely to be innate, or we can find strong biological evidence for being innate, then we should take advantage of that and use more structure in the models that we create.
For the people who are deep learning purists, I would observe that a lot of what's taken for granted in modern deep learning systems, things like local receptive fields, convolution, reinforcement learning, or batch normalization, are all either biologically inspired directly, or after the fact you can look back at the biology and realize, hey, there's actually a really strong neural correlate for this. I think that provides some encouragement that we're looking in the right direction: what's gotten us to where we are now in our quest for intelligent machines is the search for more innate structure.
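One way to see what innate structure buys you: a prior like a local receptive field with shared weights (convolution) drastically shrinks the number of parameters that must be learned from data, compared to a fully-connected layer over the same input. A back-of-the-envelope sketch with illustrative numbers, nothing here is from Vicarious:

```python
# Parameter counts: fully-connected vs convolutional layer.
# The structural assumption (locality + weight sharing) is "innate";
# only the remaining parameters have to be learned from data.
def fully_connected_params(n_in, n_out):
    return n_in * n_out + n_out  # one weight per connection, plus biases

def conv1d_params(kernel_size, n_filters):
    return kernel_size * n_filters + n_filters  # shared kernel weights, plus biases

# e.g. a 1,000-unit input mapped to a 1,000-unit output
print(fully_connected_params(1000, 1000))  # 1001000
print(conv1d_params(3, 1))                 # 4
```

Same input, same output shape, roughly six orders of magnitude less to learn, which is the spirit of the argument: structure that matches the world reduces the burden on learning.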
So, in terms of innate structures, we both have young children who are growing very quickly and going from basically Tamagotchi mode into real human mode, which is amazing to see. Are there parallels in types of learning systems that sort of layer on top of one another or is that a salient feature of the systems you're focused on and building?
Yeah, I think so. There's a hierarchy of skills, a hierarchy of representations, that we form as humans as we develop, and the learning systems we train at Vicarious inside the robots exhibit many of the same properties: by having access to one set of concepts, you can construct higher-level concepts. That was something that, if it wasn't published in the Science Robotics paper we released last year, will be in a follow-up to that paper.
I think drawing from our experience as humans and from the cognitive science and neuroscience communities is something that can be a really powerful accelerant for figuring out what are the right directions to point the next iterations of our AI architectures beyond well, let's just get a bigger computer.
How do you break it down into smaller pieces? Or is that not possible for something of this magnitude? You know, the classic SaaS view of software is: ship something to a small number of users, make them very happy, and then iterate from there. I don't even know how you apply that to something like AGI.
So, I think all of the big technologies or virtually all of the big technologies that have come to reshape society have been created using a really similar blueprint or recipe.
Like I think about Amazon. To create an everything store where you can order anything and it gets there the next day, is a crazy idea, circa pre-internet or even circa early internet. And to build it, Jeff had to make a core technology and apply it to one very narrow thing, which he started with books. And so, inventing e-commerce and using a distribution center model for doing the shipments was enough to kind of get the flywheel spinning. And then the more you turn it, you go from books to books and CDs and to CDs and games and you expand outward and eventually you sell everything to anyone really fast.
When I think about Elon, he started SpaceX to build a Mars colony. And you can't build a Mars colony in one go; if you want a Mars colony, what you really want is a space logistics company. The long-term goal was large, reusable rockets, and the short-term one was to build small, disposable rockets as a stepping stone to the large, reusable ones that make a Mars colony possible.
And you can look at any of the very large, successful companies that have come to shape an industry or an aspect of society, and they all follow the same footprint. So, for us as a robotics and AGI company, we wanted this: any robot, any task, no programming, just language.
We're going to get there one step at a time. We start off with a small number of different robots doing a small number of different tasks, and every time we add a new task, the robots get more valuable, we get more customers, we get more money, which lets us make the robots do more tasks, and so on. And we spin that flywheel until the robots really can do anything, including solving very challenging problems like the ones facing society today.
Robotics is sort of the way to have a very direct economic impact very quickly.
And I'd also say that while it's good to have a very long-term ambition for society, like "in 30 years, I want to cure all disease," I especially love businesses that can create value all along the way. You don't have to wait 30 years to find out if your joke was funny; every small success gets you closer to that long-term goal and also helps people in the immediate term. That's what we're trying to do at Vicarious.
There are so many problems in robotics that require basically a lot of if statements and one-off work and a lot of calibration. Have you been able to apply more general techniques to make a machine that teaches itself to learn?
I think we've made a lot of good progress so far on Vicarious' mission, and our customers certainly appreciate it, in the sense that most of the customers we talk to don't own any robots. They've never been able to own any robots, even though robots have been around for 30 years, 50 years even, because getting a robot to do something requires a lot of very brittle programming and mechanical engineering and fixtures and hard, up-front costs that then make the system inflexible.
We're able to throw all of that away and instead provide them with robotic labor as a service, the same way we can get computing as a service these days. It's much more flexible and dynamic than would be possible using any system other than what Vicarious provides. We're actually serving multiple customers, and our systems are running almost 24/7 doing real work in factories in America. We can do a bunch of different tasks.
You can see it on the website if you want to dig a little deeper, but it's everything from packaging, machine tending, kitting, palletizing, depalletizing, sorting.
There are many different tasks we support and many more coming soon.
And these are classically things that would be single purpose. There are definitely robotics companies that have done single things really, really well, but none that are general purpose across so many different tasks.
For us, the difference between us and many of the robotics companies out there, which are much smaller than we are, is that the average robotics company builds one product and has to charge a lot of money for it because they just have one product. Even building one robotics product is very difficult, so you have to build a whole bunch of custom stuff to get it to work, and you're not able to amortize the cost of your product over a very large customer base or over a large set of applications.
Whereas for us, we just have one AI layer that we've spent the last 10 years building. Because it's so comprehensive, we can create new applications relatively quickly, and we can amortize all of the costs of doing a deployment over a customer who needs 50 or a hundred or 500 robots. That makes the economics much better for us and much better for them.
You know, recently I was on an episode of Netflix's Explained series by Vox, and the topic was coding. One interesting direction they took that I didn't expect, but that was definitely right, was identifying that when you talk about traditional robotics approaches, you're talking about rooms full of coders writing code for one specific purpose within a factory: just picking this up and putting it in this other thing, loading and unloading. It's just one vertical, and you can have a team of engineers working on just that one thing. There's this coming revolution where the machines can program themselves. That sounds like what you've already started working on: instead of programmers doing these one-off deep learning projects that only do that one thing, they're really learning systems that learn how to learn.
And you can see some of that in the Science Robotics paper. In that one, the system really does program itself. You show it a pair of diagrams: in one diagram the apples and oranges are all mixed together, and in the second the apples are on the right and the oranges are on the left. You show it maybe two or three examples of that, and it figures out, okay, the program you're trying to communicate to me is: sort the apples and oranges onto the right and the left in this way. And then it literally writes its own code and executes that code on the robot. So, that's exactly the direction we've gone with it and continue to go.
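The induction step Scott describes can be sketched in miniature: given before/after example scenes, search a tiny instruction set for the shortest program that reproduces every example. This is a hypothetical toy DSL for illustration, not the cognitive-programs representation from the Science Robotics paper:

```python
# Toy program induction from before/after examples (illustrative only).
# A scene is a list of (object_type, side) pairs; each instruction
# moves every object of one type to one side.
from itertools import product

def move_type(scene, obj_type, side):
    """Move all objects of obj_type to side ('left' or 'right')."""
    return sorted((t, side if t == obj_type else p) for t, p in scene)

INSTRUCTIONS = [(t, s) for t in ("apple", "orange") for s in ("left", "right")]

def run(program, scene):
    for obj_type, side in program:
        scene = move_type(scene, obj_type, side)
    return scene

def induce(examples, max_len=2):
    """Return the shortest program consistent with every example."""
    for length in range(1, max_len + 1):
        for program in product(INSTRUCTIONS, repeat=length):
            if all(run(program, before) == sorted(after)
                   for before, after in examples):
                return list(program)
    return None

# One demonstration: a mixed scene, and the same scene sorted by type.
examples = [
    ([("apple", "left"), ("apple", "right"), ("orange", "left"), ("orange", "right")],
     [("apple", "right"), ("apple", "right"), ("orange", "left"), ("orange", "left")]),
]
print(induce(examples))  # [('apple', 'right'), ('orange', 'left')]
```

The induced program, not any single demonstration, is what then gets executed on new scenes, which is the "writes its own code" step in the description above.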
What advice do you have for people who are sort of us, but when we're 18 or 22, just starting out, maybe just learning to code or reading the right things, following the right people, but what does society want from us builders and creators, right? What advice would you share to the 18 or 22-year-old version of yourself based on what you know now?
How much time do you have? I would emphasize how hard it is. I think there's a lot of emphasis in the startup world on building short-term, get-it-to-market, flip-the-company-and-move-on-to-the-next-thing kinds of businesses and products, for markets that are already over-served. I think about things that are primarily consumer apps or in heavily saturated markets. And then there are very few people who take an interest in more esoteric, but important, building blocks of society.
So, I think about businesses like SpaceX or like Flexport or like Coinbase, who are now very large and successful. But to build them, you had to be kind of weird and very interested in something that most people were too busy building social media apps or building something that was much more short-term or narrowly focused to want to invest time in.
I guess the advice or encouragement I could give to young people who are thinking about this as a line of work is just to reflect very hard on what is the biggest gift you feel like you can give to your fellow humans and decide with full intention to make that your life's purpose. And know that it's going to be incredibly hard and painful and you'll be met with resistance, but ultimately by being able to give your largest gift to society, you're also giving it to yourself.
You'll feel the most fulfillment by knowing that you've become who you were supposed to be. I guess that's the advice that I would offer to young people who are thinking about having a career in startups or core technologies.
Technology is the ultimate lever humanity has ever found for creating a better life for everyone here. And you can be part of that, but I would encourage you to be part of it in a way that leverages your unique gifts and your unique, well, weirdness is the wrong word, but interests and backgrounds and passions. There might be something that you're very, very interested in that's really unusual.
I remember meeting Ben Silbermann, who started Pinterest, and he showed me the demo and it was him and his co-founder or something at the time and he was like, you can collect things, like make websites where you have pictures of all your stuff. And I'm like, why would anyone use this? Or even you, Garry, when you showed me Posterous at the first meeting of our YC class. I'm like, what are you building? And you're like, it's email for blogs. You're like, email can make a blog. How did you get in? [laughs]
And so, I think if you can tap into what you really feel in your bones is the gift that you have to give to all the other humans and then build, build, build to try and make it so. And be prepared for setbacks, but stay true to that gift that you want to give. That's where I would apply my energy.
I feel like even in this interview you've been able to share both of those things. We talked about AGI; it's probably the biggest thing humanity will see in the next hundred years, and you're trying to bring it about much sooner than that. But at the same time, you're able to bring it down to the level of: how do I make money? How do we actually, economically, make a difference in the lives of your customers, the people who are making widgets, making X? There's a top line and a bottom line, and the technology can already be applied in a way that puts dollars and cents in people's pockets and is economically viable, and that's what can really grow, like you said earlier. That's really powerful. Thanks for modeling it. It's really, really cool and also rare. What you've been able to do is very, very rare.
You know, as we were talking about it, it just reminded me of how hard it is.
I think if you go to Vicarious' website, you see a list of all of the most famous people as our investors and all these publications and the highest profile possible academic venues and our incredible team and really polished videos of the robots doing work and it's been 10 years. It's been an incredible and at times wonderful and at times heartbreaking journey to build the company.
And I think for anyone who's trying to do something that is their calling, their destiny, it's going to have that character to it. From the outside, you look at what Elon's been able to do and you're like, wow, electric cars and tunnels and spaceships, that's just super cool. But then you hear him talk about the moments where he had to risk it all, or where he was sleeping on the factory floor while the short sellers were squeezing the life out of the company and it was on the verge of bankruptcy so many times.
I want people who choose to walk this path to choose it with eyes open about that it's going to be hard. And there's going to be moments that you doubt that you should be doing it or that it's going to work out and that's part of the choice to do it.
I knew when I started Vicarious that I wanted to build AGI. We pitched it to virtually every investor, and yet the only ones who said yes were the ones who had backed me in my previous company, basically, with a couple of exceptions like Dustin Moskovitz. And I knew that, and I committed myself with eyes open.
This may not go anywhere, this may be incredibly hard and I'm comfortable spending the next decade working kind of by myself on this thing that maybe no one else winds up caring about or doesn't go anywhere because it was that important to me. And I would find something that you feel that way about and commit to that because in those darkest moments, that's what will keep you going.
There's no pot of gold or exit or shiny article in Forbes that makes it all worth it. The thing that makes it all worth it is being true to yourself and knowing that this is what I'm supposed to be doing, even if it's impossibly difficult from time to time.