How we test AI at ZDNET

AI is the hottest topic in tech with new models and products launching daily. Here's how we test the latest AI developments at ZDNET.

Written by David Gewirtz, Senior Contributing Editor. May 1, 2026 at 10:39 a.m. PT

Image: Elyse Betters Picaro / ZDNET

ZDNET's key takeaways

ZDNET tests AI with hands-on, real-world use.
No vendor influence, no pre-publication review access.
Standardized tests drive fair "best of" comparisons.

Here at ZDNET, we know we have an awesome responsibility. We know that you often make purchasing decisions in part based on our reviews. It's important that you get clear, unbiased, well-considered reviews so you have a reliable starting point for deciding where to spend your money and/or time.

And yes, we take that responsibility just as seriously for free products, because time these days is as scarce a resource as cash money. We don't want you to waste your time any more than we want you to waste your money.

We sometimes work with vendors to obtain access to their products and services in order to review them. But they never get to see reviews before we publish. They never get to influence what we say in our reviews. Our reviews are always fair and focused on assessing products for their usefulness to our readers.

How we test AI in 2026

So let's talk about how we test AI here at ZDNET. Keep in mind that AI is sneaking its way into just about everything, so it's a pretty big portfolio. We look at large language models, development tools, image generators, AI-enabled applications, and even the occasional AI device like vacuum cleaners (good use of AI) and AI pins (not so much).

We test products and services based on a wide range of factors. Our prime directive is that all reviews require hands-on experience and real-world tests. Practically, that means while we might report on a benchmark result from a press release, we don't factor vendor-supplied benchmarks into our reviews.

When we look at products and services, we tend to present two different types of reviews. When we're looking for the top performers in a category, we produce our "Best of" lists. When we do a deep dive into a product or service, we often tell personal stories about our long-term experiences using that product. These different approaches allow us to explore products and services from a number of different perspectives.

How we do comparative reviews

Producing our comparative reviews (also called "best lists") is really a three-stage process. The first stage is constructing evaluation criteria to help us objectively compare products. The second stage is choosing the products to compare. And the third stage is the actual test-by-test comparison of products.

When we get started, we always ask, "How are we going to evaluate this category?" I usually construct a series of tests, which I then document in the best list article. The tests help us evaluate performance, value, helpfulness, accuracy, safety, privacy, and more. We like to standardize on a test so that when it's time to compare products, we know we're being objective.

For example, in the best chatbots review, there's a full test methodology documented at the end of the article. Check it out. The same is true of the best AI image generators comparison.

When it comes to choosing candidate products, there are often some obvious products that get added to our selection candidate list. For example, when looking at chatbots, ChatGPT, Gemini, and Claude are obvious candidates.

Then we dig in deeper. We review products or services readers have asked us to evaluate. We add candidates based on the overall buzz around a category from places like forums, user groups, and social media. And sometimes (but not always), we'll add a product as a candidate when a vendor brings a relevant product to our attention, and it's a good fit for the category.

We usually wind up with a candidate list of five to ten products. Often, a quick look at the test methodology will eliminate some products. Some are too expensive compared to the others. Some just don't fit.

For example, I'm constantly pitched by vendors with fee-based classes who think their courseware is so good it should be included in our best free classes list. Despite their fervor, their fee-based courses will never be included in a list of free offerings.

The time it takes to choose the test candidates, arrange access to the products and services, and make sure everything is ready for the tests to run can vary widely. When I did my first look at AI website builders last year, it took 231 emails back and forth with vendors, and over six months to get everything in place so I could test their products. This year, updating the project took only two months, and fewer than 50 total emails.

That leads me to two other items: the actual testing and the re-testing. The actual testing is straightforward, if time-consuming. Because we already have a testing methodology and a standard set of tests by the time we have the products in hand or the service accounts set up, we can just run through the tests. We record the results test by test, screen by screen.

Later, we try to normalize the results, often doing a bit of math to give the products a comparative performance value and weighting. The criteria for those metrics are also documented.
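To make the "bit of math" concrete, here's a minimal sketch of one common way to do this kind of normalization and weighting. It is not ZDNET's actual scoring code. The product names, raw scores, criteria weights, and function names below are all hypothetical, and the choice of min-max scaling followed by a weighted sum is an assumption made for illustration.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Dict[str, float]]

# Hypothetical raw results per product and criterion (higher is better).
# These numbers are invented purely for illustration.
RAW_SCORES: Scores = {
    "Product A": {"accuracy": 8.0, "performance": 6.0, "value": 4.0},
    "Product B": {"accuracy": 6.0, "performance": 9.0, "value": 8.0},
    "Product C": {"accuracy": 4.0, "performance": 3.0, "value": 6.0},
}

# Hypothetical weights expressing how much each criterion matters (sum to 1.0).
WEIGHTS: Dict[str, float] = {"accuracy": 0.5, "performance": 0.3, "value": 0.2}


def normalize(raw: Scores) -> Scores:
    """Min-max scale each criterion to the 0..1 range across all products."""
    criteria = list(next(iter(raw.values())).keys())
    normalized: Scores = {product: {} for product in raw}
    for criterion in criteria:
        values = [scores[criterion] for scores in raw.values()]
        low, high = min(values), max(values)
        span = high - low
        for product, scores in raw.items():
            # If every product tied on this criterion, give them all full marks.
            normalized[product][criterion] = (
                (scores[criterion] - low) / span if span else 1.0
            )
    return normalized


def weighted_totals(normalized: Scores, weights: Dict[str, float]) -> Dict[str, float]:
    """Combine normalized criterion scores into one weighted value per product."""
    return {
        product: sum(weights[criterion] * score for criterion, score in scores.items())
        for product, scores in normalized.items()
    }


def rank(raw: Scores, weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (product, weighted score) pairs, best first."""
    totals = weighted_totals(normalize(raw), weights)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for product, score in rank(RAW_SCORES, WEIGHTS):
        print(f"{product}: {score:.2f}")
```

The point of scaling first is that a test scored out of 10 and a test measured in seconds can't be added together directly. Putting every criterion on the same 0-to-1 scale before applying weights keeps any one test from dominating just because its raw numbers happen to be bigger.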

And then, the list is published. But that's not the end of the story.

In a field changing as rapidly as AI, the products and services don't stand still. Some products will crash and burn, some vendors will run out of funding, or something else will go terribly wrong. Others will just get better and better. In any case, after six months to a year, the best lists are pretty much out of date. That was certainly the case with the AI website builder reviews. Last year, all of them were pretty terrible. This year, there are a few that are actually pretty great.

Some of my favorite comparative reviews for the AI category include:

Living with the products

Another way we review AI products is by living with them and doing projects with them. These go beyond traditional reviews because we put the products and services through days and weeks (sometimes months and years) of work.

The most prominent examples of this are my coding-related articles. It's very hard to objectively compare AI coding tools without actually building something. But coding a class assignment is far different from building a product or debugging an active customer issue.

Often these projects are ongoing. That ongoing work spawns a ton of great stuff to talk about. The impressions also change.

When I first looked at OpenAI's Codex coding AI, it was very early and I didn't like it at all. As Codex improved, I did another test with it, this time seeing if I could update my security product. I managed to get 24 days of coding in 12 hours, but also found some pitfalls. As the service improved further, I did another test, where I found myself producing 4 years of product development in 4 days.

The same sorts of experiential review articles have come out about Gemini, ChatGPT, Claude Code, the various image generators, and more. As the tools keep evolving, we keep finding new ways to use them and put them through more tests and deep dives.

It's an ongoing process and we get to take you along for the ride. Here are some of my favorites from the AI world:

You are a big part of the process

We get a lot of feedback from readers through email, social networking, and article comments. You help us understand what you want us to look at. We also appreciate that you hold us to a pretty high standard.

We also really appreciate it when you share your impressions of the products we review. Many of you are quite skilled and knowledgeable. So your perspectives really help keep us informed which, in turn, helps us grow in knowledge and keep you even more informed. Effectively, our work here on ZDNET is peer reviewed by millions of our fellow professionals, power users, and enthusiasts: you, the ZDNET readers.

We're diligent about our reviews because we know how important they are to you, how much you take them into account when making purchasing decisions, and that you're putting real money and time on the line, often based in part on what we share on ZDNET.

Always feel free to reach out if you want us to look at something new. What AI category, product, or service do you want us to dive into next? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
