XPENG Offers More Human-Like Autonomous Driving
XPENG P7. Photo by Larry Evans
May 24, 2026
Larry Evans
In recent articles on XPENG, I have focused on the development of the human employees who make the technology possible and on the technology tools they use. However, what matters most to customers is the output of those people using automation and AI tools, and it is especially noticeable in autonomous driving systems. When I test drove the P7 with VLA 2.0 last month, what impressed me most was how human-like it was. Honestly, it drove a bit more smoothly and could see better than I could, but the way it dealt with the road felt more like an experienced driver than a machine. Its judgment calls and the way it anticipated the road ahead seemed thoughtful and intuitive. Digging into the details, this is not just a matter of mimicking human driving behavior; the system's artificial intelligence more closely reflects how human intelligence works.
Built on human-like first principles, the system operates on a “what you see is what you get” basis. This results in stronger generalization capabilities, allowing the software to be utilized across all scenarios on a global scale.
Beyond the technology details, according to XPENG, the core advantages of VLA 2.0 are: Reduced Loss, Faster Response, Human-like Performance, and Intelligence Emergence.

A Big Brain
With up to 3000 TOPS on the new GX, XPENG’s in-house developed Turing AI chips provide more computing power than competing systems. Beyond the nominal figure, XPENG says the effective computing power is higher still because of how efficiently the chips are used. This computing power lets cars adapt better to local conditions and drivers. For example, on a test drive I pushed a P7 to get a sense of its acceleration, braking and cornering capabilities before handing off control to the car. Once it took over, it was noticeably more aggressive at first before settling into a smoother driving style. As XPENG describes their Turing AI chip:
Tailored specifically for large AI models, it integrates dual proprietary NPUs and domain-specific architectures (DSA) to achieve integrated hardware-software R&D, boosting model execution efficiency by 12 times. Through the joint optimization of the chip, compiler, and model, on-vehicle chip utilization is approximately 4 times higher than that of “general-purpose chips + open-source models.” This architecture achieves a 51% increase in neural network computing speed, a 300% surge in information throughput per second, a 19% improvement in perception module computing speed, and a 145% increase in information processing capacity.
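Percent increases like these are easy to misread, so here is a minimal sketch that converts each quoted figure into a multiple of its baseline. It assumes every figure is an increase relative to an unstated baseline (XPENG does not say what the comparison point is), and the function name is hypothetical, chosen only for illustration.

```python
def percent_increase_to_multiplier(percent: float) -> float:
    """Convert 'an X% increase' into a multiple of the baseline."""
    return 1.0 + percent / 100.0


# Percentages as quoted by XPENG; the baseline they are measured
# against is not stated in the source, so these are relative only.
quoted_increases = {
    "neural network computing speed": 51,
    "information throughput per second": 300,
    "perception module computing speed": 19,
    "information processing capacity": 145,
}

for metric, percent in quoted_increases.items():
    multiplier = percent_increase_to_multiplier(percent)
    print(f"{metric}: +{percent}% = {multiplier:.2f}x baseline")
```

Read this way, a 300% surge in throughput means four times the baseline rather than three, and a 145% increase in processing capacity is a little under two and a half times.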
Having that added capacity means that more information can be processed onboard, without having to consult with an external source. That lets VLA 2.0 have a more human-like interaction with the physical world.

Doesn’t Write a Book to Take Each Step
Someone learning to complete a simple physical task rarely puts it into words. If you were to describe every audible and visible piece of information, language processing, tactile sensation, balance adjustment, muscle contraction, joint bending, rotation, etc. involved in responding to the command “throw me the ball,” it would add up to a lot of text. If you had to do that for every action, it would consume a massive amount of time and brain power. In human beings, this kind of overthinking can lead to “Paralysis by Analysis in Athletes,” where performance suffers from overanalyzing every move. But this is how traditional large language models tend to process the unstructured data of the physical world.
However, a child learning to throw a ball will watch, try, adapt and sometimes take coaching. They will develop what is often called “muscle memory.” Once someone learns the task, they will not have to analyze the action, but will act, tweaking performance for circumstances along the way. That lets a baseball player process information around them quickly and improve their performance. VLA 2.0 works in a similar fashion:
VLA 2.0 restructures the traditional paradigm by innovatively eliminating the “language translation” stage. It achieves direct end-to-end generation from visual signals to action commands, aiming directly for the L4 autonomy endgame. Supported by a 32x ultra-dense computing chain, the system’s prediction accuracy has been significantly enhanced, with prediction error reduced by 33%. When handling complex “long-tail” scenarios, the system can preemptively predict risks and respond calmly to changes—much like an experienced driver—moving beyond mechanical and rigid maneuvers.
More streamlined processing for “Physical AI” means that more information can be processed, which matters for the unstructured data of the real world. XPENG estimates that on-vehicle inference token consumption for VLA 2.0’s Physical AI is roughly 80 times the daily Digital AI volume nationwide in China.

Learning New Roads
When a person goes from driving in one country to driving in another, they don’t relearn to drive from scratch. The VLA 2.0 system takes what it learned on the challenging roads of China, takes in information from the driver and the drivers around it, and adapts. As such, on-road driving needs no rewriting of rules for local regulations, no large-scale local data collection and no dependence on HD maps. This not only means that the system can adapt quickly to new roads, but it also avoids data collection concerns that could create a regulatory hurdle.
The second generation VLA is a humanoid product. When you learn driving in China, when you go global, you do not have to learn it again, because your driving capacity, your sensing of the road conditions, they are common.

However, it doesn’t just learn in the physical world. Through simulation via “X World,” VLA 2.0 can accelerate the learning process for local rules and conditions in different countries.
X World can generate in the virtual world. So, this picture is not extensive. When it comes to inputting the actual picture in the front, it has mimicked the environment in Germany for the second-generation VLA 2.0 to perform simulation, to have virtual testing in the virtual environment. So in this way we can realize test driving under different conditions, in different nationalities and climates, because of our technological methodology, which does not have to collect data massively locally, and we do not have to rely on high-precision maps to accomplish the preliminary experience like this.

Learning Fast & Learning Better
When children go to school, they are not just learning new information. They are also learning how to learn new information. Learning how to prioritize. Learning how to avoid noise and distractions. While VLA 2.0 is learning to drive better, as I noticed comparing my test drive in November to what I experienced in April, it is also getting better at learning.
The most recent example is X-Cache, “a training-free control logic with cache contents refreshed in real time during generation.” XPENG claims it achieves “a 71% block skip rate and delivers 2.6–2.7× measured inference speedup, with virtually no loss in visual quality.” As such, more processing power is dedicated to perception and decision-making.
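The two quoted X-Cache figures can be checked against each other with some quick arithmetic. The sketch below is not XPENG's method. It assumes, purely for illustration, that every block costs the same and that a skipped block costs nothing, which gives an upper bound on the speedup a 71% skip rate could deliver. It then estimates how much of the original runtime must go to non-skippable work such as refreshing the cache in order to land at the measured 2.6–2.7×. Both function names are hypothetical.

```python
def ideal_speedup(skip_rate: float) -> float:
    """Upper bound on speedup if skipped blocks are free and all
    blocks cost the same (an illustrative assumption)."""
    if not 0.0 <= skip_rate < 1.0:
        raise ValueError("skip_rate must be in [0, 1)")
    return 1.0 / (1.0 - skip_rate)


def implied_overhead(skip_rate: float, measured_speedup: float) -> float:
    """Fraction of the original runtime left unexplained by the
    blocks that still run, under the same assumption."""
    remaining_runtime = 1.0 / measured_speedup
    return remaining_runtime - (1.0 - skip_rate)


skip_rate = 0.71  # block skip rate quoted by XPENG
print(f"ideal speedup: {ideal_speedup(skip_rate):.2f}x")

for measured in (2.6, 2.7):  # measured range quoted by XPENG
    overhead = implied_overhead(skip_rate, measured)
    print(f"measured {measured}x -> implied overhead: {overhead:.1%} of original runtime")
```

Under those assumptions the ceiling is about 3.45×, so the quoted 2.6–2.7× is plausible and leaves roughly 8% to 9.5% of the original runtime for work that cannot be skipped.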
And this is not the only new skill being developed. “XPENG will continue to explore more technological breakthroughs in the field of autonomous driving, enabling XPENG smart driving to train harder in the digital world and drive more steadily in the real world.”

A More Human Technology Approach
It seems fitting that a company that focuses on developing its people and takes a more human approach to AI and automation tools will have an L4 system that is more human-like in its operation and function. It is a system built upon the uniquely human understanding of customer needs but enabled by technology. There is a clear focus on pleasing customers with a more human-like autonomous driving system that you can feel while using it. You can also see the more human-like implementation in how the IRON robot walks. I expect it will also feel more human-like in how it interacts with its users. I also expect that XPENG’s recently launched Robotaxi will do well in serving the needs of its human customers.
This isn’t top-down or rigid in execution or function, but rather more of an emergence from real-world use. By taking a more human-like approach to technology, the technology becomes a better fit for the humans who use it. There are an increasing number of competent intelligent driving systems. They may be safe and functional but may not have the human-like driving appeal of VLA 2.0. Likewise, there may be other functional Robotaxi designs that you can ride in, but the GX is the type of vehicle that people will want to ride in. Competition in autonomous driving will continue to intensify, and XPENG will continue to develop its technology. But the humanity in the customer-centric design and implementation of technology gives them a strong advantage moving forward.