Zhang Hongjiang is a computer scientist and senior executive who has become one of China’s most outspoken voices on the need to develop artificial intelligence safely. 

After earning his PhD in Denmark, he worked in Singapore and Palo Alto, California, for several years. He then returned to China in the early 2000s to help set up Microsoft Research Asia, before going on to build Kingsoft into one of China’s leading software companies.

He stepped away from that business in 2016, only to return to the frontline of Chinese tech two years later by establishing the Beijing Academy of Artificial Intelligence (BAAI), a non-profit that brings together industry and academia.

In recent years, Zhang has become China’s leading advocate for the regulation of AI, to ensure it is not a threat to humanity. Here, he talks to the FT’s Ryan McMorrow and Nian Liu about the importance of international collaboration on AI safeguards, as well as the opportunities and challenges facing China.

Zhang Hongjiang on AI governance: “I have spent a lot of time trying to raise awareness in the research community, industry, and government”. © China News Service via Getty Images

Nian Liu: It seems like you’re paying a lot of attention to AI governance?

Zhang Hongjiang: I have spent a lot of time trying to raise awareness in the research community, industry, and government that our attention should not only be directed at the potential risks of AI that we are already aware of, such as fake news, bias, and misinformation. These are examples of AI misuse.

The bigger potential risk is existential risk. How do we design and control the more powerful AI systems of the future so that they do not escape human control?

We developed the definition of existential risk at a conference in Beijing in March. The most meaningful part is the red lines that we have defined.

For instance: an AI system [should] never replicate and improve itself. This red line is super important. When the system has the capability to reproduce itself, to improve itself, it gets out of control.

Second is deception. AI systems should never have the capability to deceive humans.

Another obvious one is that AI systems should not have the capability to produce weapons of mass destruction, such as chemical weapons. Also, AI systems should never have persuasion power . . . stronger than humans.

The global research community has to work together, and then call on global governments to work together, because this is not a risk for your country alone. It’s a huge risk for all mankind.

I learned so much at the International Dialogue on AI Safety in the UK last October. It’s actually a system of work built from the bottom up. It’s technical work, not just policy work.

I realised that, in Europe and the US — especially in Europe — there are technical people who have been working in that field for many years and who have developed quite a few systems to measure and define the risk of AI systems.

The British took great initiative, as they did with last year’s first international government summit.

Ryan McMorrow: When you’re in these discussions, are the viewpoints of leading Chinese scientists and policymakers similar to Western ones?

ZH: Very much. The argument centres on whether current AI systems actually possess artificial general intelligence (AGI) capabilities, or whether they will lead to AGI, and how far away that is. But, if you agree the risk is there, then there’s really not much difference in viewpoints.

Zhang Hongjiang (front row, sixth from right) at the summit he helped organise in Beijing in March with other top Chinese and western scientists

[Ex-Google AI pioneer] Geoffrey Hinton’s work has shown that digital systems learn faster than biological systems, which means that AI learns faster than human beings — which means that AI will, one day, surpass human intelligence. If you believe that, then it’s a matter of time. You better start doing something. If you think about the potential risk, about how many species have disappeared, you better prepare for it, and hopefully prevent it from ever happening.

Scientific collaboration should be a common practice. But, unfortunately, it’s not a common practice now. Of course, AI is the most advanced technology so it has become more sensitive. Especially between China and the US, geopolitics does affect these collaborations. I hope, at least on the science level, this collaboration can continue.

NL: Speaking of China and the US, how do you think the US export controls on processors will affect the long-term development of Chinese AI?

ZH: I think it will have a huge impact. I’ve always thought of AI as a system composed of three things: algorithms; computing power; and data. Without the computing power, today’s technology becomes more limited. The essence of GPT models is their scalability. That is, if you increase the size of a model, the number of parameters, its performance will improve. If you scale up the volume of data you’re feeding into the model, its performance will also improve. This is what we call the scaling law for models.

And, as you increase both the parameters and data, you have to also scale up computing power. So, if you’re limiting computing power, you’ll of course hit a roadblock. There’s no doubt about it.
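The scaling law Zhang describes can be sketched numerically. The sketch below uses the power-law fit published in DeepMind’s “Chinchilla” paper (Hoffmann et al., 2022) purely as an illustration; the constants are that paper’s fitted values, not a universal rule.

```python
def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted training loss for a model with n_params parameters
    trained on n_tokens tokens, under the Chinchilla power-law fit.
    Constants are the published fitted values; illustrative only."""
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Scaling up either parameters or data lowers the predicted loss,
# which is why limits on compute (needed to do both) bite so hard.
small = chinchilla_loss(1e9, 20e9)     # ~1bn params, 20bn tokens
large = chinchilla_loss(70e9, 1.4e12)  # ~70bn params, 1.4tn tokens
print(small > large)  # True: the bigger, longer-trained model does better
```

The point of the sketch is simply that performance improves predictably along both axes, and that moving along either axis requires proportionally more computing power.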

RM: To get around these obstacles, China is pushing the development of homemade processors. But, at the same time, most of the existing models here are built on top of the Nvidia chip ecosystem. Is it possible to just port a model between different chip ecosystems?

ZH: Their software has got to be compatible, which is tough. People have been building a lot of models, and the most efficient models have been built within the Nvidia ecosystem. So, if you want to build your own ecosystem, it will take time and effort. It’s better to be compatible with the Nvidia ecosystem.

It’s very much like software compatibility issues between Windows and Mac. For example, if you build something on Android, you essentially have to adjust it to work on iOS. Today, for companies developing software apps, they have to develop on both platforms, which means they must have dedicated teams. The same principle applies if you are working on models: you might need to build for two systems, which will be hard and also costly.

RM: How hard is it to build two different systems?

ZH: It is pretty hard because you’re tuning software stacks, which are sets of basic function modules needed for the training systems to run on. If you have to build the complete platform all together, that’s just a lot of effort. It’s as hard as building another Android.

RM: It seems like Huawei is in the lead to be China’s Nvidia: it has the Ascend chips made here [in China] that the US can’t block. Is that what companies should be building on now?

Nvidia’s H100 artificial intelligence supercomputing graphics processing unit © I-Hwa Cheng/Bloomberg

ZH: My understanding is that it will level off. Just like when you build software, you don’t want to build on too many operating systems. Think about it, in the PC age, in the mobile phone age, there are two systems. That’s it. Think about the chip architecture, how many architectures are there? Not that many.

RM: Will computing power be even more critical in the multi-modal future, with humanoids and visual models? Do those require even more computing power than plain language?

ZH: Yes. Vision data, which includes image and video data, is much larger in volume compared to language data.

RM: So computing power is a hurdle, but where do China’s advantages lie? Policymakers have been very active here, is that an advantage? Or the talent?

ZH: Policy is the last thing that comes to mind when we talk about China’s advantages in AI. I think China’s advantages in AI mostly lie among young entrepreneurs who come through disappointment after disappointment but still continue to do start-ups and pursue their dreams. I think the only place that we can compare is Silicon Valley.

Someone gave me a number, though I’m sure you can find more accurate ones: 30 per cent of the worldwide top AI talent was originally born in China but a large portion of them work in the US. If 10 per cent of the top talent remains in China, that still represents a significant number of people.

On top of that, there’s the vast market. There are so many scenarios [in which] to apply AI — which, in turn, presents good research topics and provides good research data. This enables research institutions and universities to work on good problems.

So I think talent, application scenarios, and entrepreneurship are China’s advantages. But I don’t think government policies are necessarily an advantage.

NL: What about data? Your old colleague Kaifu Lee made the point in his book that China has all this data, which will be a huge advantage.

ZH: China is a huge internet market, so China has had tons of data as an advantage in the past. But then we realise, when we look at the GPT models, at the data fed into them and how that data is distributed, that it comes from the web. And, if you look at the web, China’s corpus is not that much. It’s in the low single digits: I think less than 5 per cent. A lot of languages have less than 10 per cent.

I think English accounts for about 60 per cent or 70 per cent, so it’s predominantly English. So, if you train your model with data from the web, then Chinese data is not that much. [And] whichever language has more data, it will be better in that particular language. If you look at Wikipedia, all the web data, Chinese data are not dominant. So I won’t say, in terms of language data, there’s an advantage.

But, when we come to embodied AI, when we come to robotics, when we come to manufacturing, China has tons of data. Way more than other countries. 

For example, a smart city model. China definitely has more data than any other country. Just look at the number of cameras in China. Look at the number of electric cars that have basic autonomous driving capability. They have so many cameras. So it depends on which area you’re talking about.

RM: Is this next generation of vision models in humanoid robots just getting started?

ZH: It has become a hot topic in the past 18 months, especially with developments like GPT-4. It has impressive capabilities in recognising images and objects in images. And, then, if you look at Sora, and Gemini 1.5, as well as Claude from Anthropic, and the new Llama 3, they all exhibit this multi-modality — which, basically, is image capability.

If you equip a robot with a large multi-modality model, it can perform tasks way beyond what it was trained for. It will also understand commands that it was not originally trained on. Suddenly, you realise a robot can understand way more than you thought.

A good example is RT-2, released by Google about a year ago. For instance, when you ask the robot to pick up a toy on the table, a toy of an animal that is already extinct, it picks a dinosaur. That’s a very complicated reasoning process, because it wasn’t directly told that a dinosaur is an extinct animal, but the language model knows that. So, among the various animal toys, it picks up the dinosaur.

Another example is: ‘Give a can of Coke to Taylor Swift.’ On the table, there were four picture frames. The robot picked up a Coke and placed it on the picture of Taylor Swift. So, think about that process. The robot can recognise who is in that picture. It has a notion of who Taylor Swift is. That wasn’t trained into the language model.

That’s why you see Figure, a new robot start-up that teamed up with OpenAI, which OpenAI invested in. You see another start-up from Berkeley: Pi. There are many of them.

RM: What about the ones in China?

ZH: Galaxy Robot, which was incubated at BAAI, is an example. It was originally founded by a professor from Peking University who worked on embodied AI at BAAI.

In the latest demo I saw, when you say “Oh, I feel thirsty”, the robot arm will pick a bottle of water among five different things and deliver it to you. That’s what you want and expect a good nanny to do.

The instructions to the robot are no longer very explicit. It’s the robot that understands. There are quite a few companies now moving in that direction.

Zhang Hongjiang: “Llama models are the most powerful and therefore popular large-language models in the open source world.” © Alamy

RM: A lot of Chinese companies are using Llama as their models. Is that what companies all over the world are doing?

ZH: Llama models are the most powerful — and, therefore, popular — large-language models in the open source world. So I’m sure a lot of people and companies will use them. And, for someone who mostly focuses on the academic world, it’s super helpful to have an open source model that you can analyse, tune, and do research on, given how costly it is to train a large-language model from scratch.

Again, you can draw an analogy to software, where Linux and open source became super popular. And there are many open source databases that are super popular. I would say that internet companies are entirely dependent on those open source databases and systems. They helped speed up the development of the internet and the cloud. And China definitely benefits a lot from that.

NL: That leads to this open-source versus closed-source debate. Baidu’s Robin Li recently said that open-source models “make little sense” and keeping the models closed was crucial for having a viable business model. What do you think of his comments?

ZH: I must say I don’t entirely agree with him. But that argument has always been there, going back 30 years. It was the closed source Windows and Mac versus Linux, and then, later, the closed source iOS versus the open source Android. Although Android is open source, it’s heavily controlled by one company. So, this debate has always been present.

If you look at the commercial world, the leaders tend not to favour open source because they are leaders in their field. Meanwhile, followers and others who are trying to change things usually adopt the open source approach. Linux has done it, Android has done it, and both have been very successful. So I won’t say which one has an absolute advantage. It will take a long time before we can tell which one will win. But, more likely, they will coexist.

RM: China is always very good at applying stuff, commercialising. Can you talk about some interesting applications of AI that you’ve seen here?

ZH: I think there are two companies that have done a great job. One is Beijing-based Moonshot, which actually has some association with BAAI. Their product Kimi, very much like ChatGPT, is great and very popular.

Another one is Minimax, a Shanghai-based company. They started their effort [to build] large models at least a year before [the] ChatGPT launch, so they are not a copycat. They are focusing on applications like digital avatars.

I would say, if you see any good applications in other markets, China will soon have it, if not already. I’m not saying that China only follows. Actually, from time to time, in certain areas, China has taken the lead.

NL: In the west, all these AI start-ups are raising crazy amounts of money at very high valuations. Is it the same in China?

ZH: Yes, it’s the same in China. The only difference is that China has more companies in one area — which I think could be too many. The US has maybe three or four start-ups focusing on foundation models. China has how many?

NL: Hundreds.

ZH: The Chinese market is very “juan” [over-competitive]. This isn’t new; it’s been like this for the last 20 years. As for the bubbles, I wouldn’t say China has more than the US. At this moment, I think these are still good bubbles. There will be winners. I think it won’t be very long before we see consolidation.

RM: What will the business model be for all of these foundational models? Will Big Tech’s models win? Or those from China’s more nimble start-ups?

ZH: In the US, it’s very clear: enterprise productivity tools. We already see the effect with tools like Copilot in Office. However, for consumer applications, people are still exploring. We haven’t seen a big success yet. China definitely has a lot more people exploring this area.

The big tech giants have to engage with AI models because, otherwise, they are no longer a platform company. They either have to develop large models or they have to acquire them. So, without a doubt, they will continue investing.

Moonshot’s Kimi ‘is great and very popular’ © Future Publishing via Getty Images

For the small start-ups, their challenge isn’t just to raise enough money and develop good models, but also to define their business models and find their users. The two companies I mentioned, Minimax and Moonshot, were focused on consumers from the very beginning. In China, whichever start-up succeeds in the consumer space . . . will ultimately succeed, despite the immense pressure from the big giants.

RM: So you’ve stepped back from leading BAAI. What are you spending your time on now?

ZH: I’m excited, and believe that large AI models will change the way we do robotics and finally give robotics its big break. It’s just so thrilling. Also, I feel that China has some advantages here: it has the largest manufacturing base and is more sophisticated than many other countries in hardware. That’s what interests me.

I would say a large part of my motivation is driven by curiosity. It’s something I feel excited about and something I feel I can learn from. I also spend more time overseas now: I spend time in Singapore and participate in some government-organised meetings and activities there. And I would like to spend more time in Silicon Valley, just to be more in touch.

I’m still involved in organising BAAI’s technical conference on AI. It’s purely technical. The entire programme is organised by technical people actively working in the field, focusing on various aspects of AI. People come here to learn and exchange ideas, just like at any academic conference. It’s not commercial. 

There’s value in having people come together to learn and exchange ideas. Last year we had Hinton, [Yann] LeCun and Sam Altman, and we had Chinese scientists exchange and argue with them.

This transcript has been edited for brevity and clarity

Copyright The Financial Times Limited 2024. All rights reserved.
