
A recent experience with ChatGPT 5.5 Pro


We are all having to keep revising upwards our assessments of the mathematical capabilities of large language models. I have just made a fairly large revision as a result of ChatGPT 5.5 Pro, to which I am fortunate to have been given access, producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me.

The background is that, as has been widely reported, LLMs are now capable of solving research-level problems, and have managed to solve several of the Erdős problems listed on Thomas Bloom’s wonderful website. Initially it was possible to laugh this off: many of the “solutions” consisted in the LLM noticing that the problem had an answer sitting there in the literature already, or could be very easily deduced from known results. But little by little the laughter has become quieter. The message I am getting from what other mathematicians more involved in this enterprise have been saying is that LLMs have got to the point where if a problem has an easy argument that for one reason or another human mathematicians have missed (that reason sometimes, but not always, being that the problem has not received all that much attention), then there is a good chance that the LLMs will spot it. Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.

I decided to try something a little bit different. At least in combinatorics, there are quite a lot of papers that investigate some relatively new combinatorial parameter that leads naturally to several questions. Because of the sheer number of questions one can ask, the authors of such papers will not necessarily have the time to spend a week or two thinking about each one, so there is a decent probability that at least some of them will not be all that hard. This makes such papers very valuable as sources of problems for mathematicians who are doing research for the first time and who will be hugely encouraged by solving a problem that was officially open. Or rather, it used to make them valuable in that way, but it looks as though the bar has just been raised. It is no longer enough that somebody has posed a problem: the problem needs to be hard enough that an LLM cannot solve it.

In any case, a little over a week ago I decided to see how ChatGPT 5.5 Pro would fare with a selection of problems asked by Mel Nathanson in a paper entitled Diversity, Equity and Inclusion for Problems in Additive Number Theory. Nathanson has a remarkable record of being interested in problems and theorems that have later become extremely fashionable, which has led him to write a series of extremely well timed and therefore highly influential textbooks. In this paper, he argues for the interest of several other problems, some of which I will now briefly describe.

If A is a set of integers, then its sumset A+A is defined to be \{a+b:a,b\in A\}. For a positive integer h, the h-fold sumset, denoted hA, is defined to be \{a_1+\dots+a_h: a_1,\dots,a_h\in A\}. Nathanson is interested in the possible sizes of hA given the size of A. To that end one can define a set \mathcal R(h,k) to be the set of all t such that there exists a set A with |A|=k and |hA|=t.

An obvious first question to ask is simply “What is \mathcal R(h,k)?” When h=2, the answer is the set of all integers between 2k-1 and \binom{k+1}2. It is an easy exercise to show that if |A|=k, then 2k-1\leq|2A|\leq\binom{k+1}2, so this result is saying that all sizes in between can be realized. However, it is not true in general that hA can take every size between its minimum and maximum possibilities, and we do not currently have a complete description of \mathcal R(h,k).
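
This is easy to check by brute force for small k. The following sketch is my addition rather than anything from the post or from Nathanson’s paper, and the function names are invented; it searches over subsets of \{0,1,\dots,2^k-1\}, a universe that (as discussed below) is known to suffice.

# Brute-force sanity check (editorial addition, not from the post): for small
# k, the achievable sumset sizes |A+A| over k-subsets of {0,...,2^k - 1} form
# exactly the interval [2k-1, binom(k+1,2)].
from itertools import combinations
from math import comb
def sumset_size(A, h):
    sums = {0}
    for _ in range(h):  # build the h-fold sumset one summand at a time
        sums = {s + a for s in sums for a in A}
    return len(sums)
for k in range(2, 6):
    achieved = {sumset_size(A, 2) for A in combinations(range(2 ** k), k)}
    assert achieved == set(range(2 * k - 1, comb(k + 1, 2) + 1))
    print(f"R(2,{k}) = [{2 * k - 1}, {comb(k + 1, 2)}]")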

Another natural question one can ask, and this is where ChatGPT came in, is how large a diameter you need if you want a set A with A and hA having prescribed sizes. (Of course, the size of hA must belong to \mathcal R(h,k).) Nathanson showed that for every t\in[2k-1,\binom{k+1}2] there is a subset A of \{0,1,2,\dots,2^k-1\} with |A|=k and |A+A|=t, and asked whether the bound 2^k-1 could be improved. ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds before providing a construction that yielded a quadratic upper bound, which is clearly best possible: A+A is contained in an interval of twice the diameter of A, so achieving |A+A|=\binom{k+1}2 forces the diameter of A to be quadratic in k. It wrote up its argument in a slightly rambling LLM-ish style, so I asked if it could write the argument up as a LaTeX file in the style of a typical mathematical preprint. After two minutes and 23 seconds it gave me that, after which I spent some time convincing myself that the argument was correct.

The basic idea behind both Nathanson’s argument and ChatGPT’s was that in order to obtain a set of a given size with a sumset of a given size, it is useful to build it out of a Sidon set, which means a set with sumset of maximal size (that is not quite the usual definition but it is the simplest to use in this discussion), and an arithmetic progression. Also, for a bit of fine tuning one can take an additional point near the arithmetic progression. Then if one plays around with the various parameters, one finds that one can obtain sets of all the sizes one wants. Nathanson doesn’t express his argument this way (it is Theorem 5 of this paper), instead giving an inductive argument, but I think, without having checked too carefully, that if one unravels his argument, one finds that effectively that is what he ends up with, and the Sidon set in question consists of powers of 2. ChatGPT obtained its improvement by simply using a more efficient Sidon set — it is well known that one can find Sidon sets of quadratic diameter. (One might ask why Nathanson didn’t do that in the first place: I think it is because the obvious idea of using a more efficient Sidon set becomes obvious only after one has redescribed his inductive construction. Is that what ChatGPT did? It is very hard to say.)
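
To make the “more efficient Sidon set” concrete: the classical Erdős–Turán construction \{2pi + (i^2 \bmod p) : 0 \leq i < p\}, for a prime p, is a Sidon set of p elements with diameter less than 2p^2. The following check is mine (not from the post), with invented names, and verifies the Sidon property in the maximal-sumset sense used above.

# A Sidon set of quadratic diameter (editorial sketch; the classical
# Erdos-Turan construction), verified by checking |A+A| attains binom(k+1,2).
from math import comb
def erdos_turan_sidon(p):
    return [2 * p * i + (i * i) % p for i in range(p)]  # subset of [0, 2p^2)
for p in (5, 7, 11, 13):
    A = erdos_turan_sidon(p)
    assert len({a + b for a in A for b in A}) == comb(p + 1, 2)
    print(f"k={p}: Sidon set with diameter {max(A)} < 2k^2")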

Next, I asked ChatGPT to see whether it could do the same for a closely related question, where instead of looking at the size of the sumset, one looks at the size of the restricted sumset, which is defined to be \{a+b:a,b\in A, a\ne b\}. Unsurprisingly, it was able to do that with no trouble at all. I got it to write both results up in a single note, to avoid a certain amount of duplication. If you are curious, you can see the note here.
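
For readers who want to see the restricted sumset in action, here is a tiny illustration of the definition (mine, not from the note; the example set is a shifted initial segment of the Mian–Chowla Sidon sequence), showing the maximal restricted-sumset size \binom{k}{2}.

# Illustration (editorial addition) of the restricted sumset defined above.
from math import comb
restricted = lambda A: {a + b for a in A for b in A if a != b}
A = [0, 1, 3, 7, 12, 20]  # a Sidon set, so all binom(6,2) restricted sums differ
assert len(restricted(A)) == comb(len(A), 2)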

I then asked what it could do for general h. I was much less optimistic that it would manage to do anything interesting, because the proof for h=2 makes fundamental use of the fact (due to Erdős and Szemerédi) that we know exactly which sizes we need to create. If we don’t know what the set \mathcal R(h,k) is, then it seems that we are forced to start with a hypothetical set A with |A|=k and |hA|=t and build out of it a set of small diameter with the same property. As it happens, I still don’t know how to get round that difficulty (I’m mentioning that just to demonstrate that my mathematical input was zero, and I didn’t even do anything clever with the prompts), but Nathanson mentioned in his paper a remarkable paper of Isaac Rajagopal, a student at MIT, who must have got round the difficulty somehow, because he had managed to determine \mathcal R(h,k) for each fixed h, with an upper bound on the required diameter that was exponential in k.

I’ll leave the previous paragraph there, but Isaac has subsequently explained to me that that isn’t really the difficulty. His argument gives a complete description of \mathcal R(h,k) when k is sufficiently large, and if one wants to prove a polynomial dependence for fixed h, then assuming that k is sufficiently large is clearly permitted. The real difficulty is that constructing the sets with given sumset sizes was significantly more complicated, and necessarily so because the degree of the polynomial grows with h, and one therefore needs more and more parameters to define the sets.

In any case, the task faced by ChatGPT was not to solve the problem from scratch, but to see whether it was possible to tighten up Isaac Rajagopal’s argument. Here’s what happened.

Isaac made some very interesting remarks about the nature of what the additional ideas were that ChatGPT contributed. Since, as I have already said, my mathematical input was zero, I invited him to write a guest section to this post. Just before we get to that, I want to raise a question (that will undoubtedly have been raised by others as well), which is simple: what should we do with this kind of content? Had the result been produced by a human mathematician, it would definitely have been publishable, so I think it would be wrong to describe it as AI slop. On the other hand, it seems pointless even to think about putting it in a journal, since it can be made freely available, and nobody needs “credit” for it (except that Isaac deserves plenty of credit for creating the framework on which ChatGPT could build). I understand that arXiv has a policy against accepting AI-written content, which makes good sense to me. So maybe there should be a different repository where AI-produced results can live. But various decisions would need to be made about how it was organized. I myself think that one would probably want to have some kind of moderation process, so that results would be included only if a human mathematician was prepared to certify that they were correct — or, better still, that they had been formalized by a proof assistant — and perhaps also that they answered a question that had been asked in a human-written paper. On the other hand, I wouldn’t want a moderation process that created vast amounts of work (unless the work was itself done by AI, but there are obvious dangers in going down that route). Anyway, until these questions are answered, this result is available from the link above, and perhaps, now that LLMs are so good at literature search, that will be enough to make it findable by anyone who wants to know whether Nathanson’s problem has been solved.

With just a few prompts, ChatGPT was able to improve the upper bound on N(h,k) (which I will define very soon) from exponential in k to polynomial in k. While its first improvement of the bound, from exponential in k to exponential in k^{\frac{1}{2} + \varepsilon}, was a routine modification of my work, the improvement to polynomial in k is quite impressive. To do this, ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove, using similar methods to those in my own proof. My goal is to explain that idea, in a manner that will be digestible to my friends who are computer science majors as well as my math major friends.

The problem of bounding N(h,k) is closely related to a problem I worked on at the Duluth REU (Research Experiences for Undergraduates) program, of determining \mathcal{R}(h,k). In particular, \mathcal{R}(h,k) is the set of possible h-fold sumset sizes |hA|, where A can be chosen to be any set of k integers. N(h,k) is the minimal N such that we can achieve all of the values of \mathcal{R}(h,k) using k-element sets A \subset \{0,1,2,\ldots,N\}. I spent last summer explicitly characterizing the set \mathcal{R}(h,k) for large k, by constructing sets A such that |hA| achieves all sizes which I could not rule out as impossible. So, N(h,k) can be upper-bounded by optimizing my constructions.
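
Both \mathcal{R}(h,k) and N(h,k) can be computed by brute force for tiny parameters, which may help fix the definitions. The sketch below is my own (it appears in neither paper; the names are invented). It relies only on the observation that enlarging the universe \{0,\dots,n\} can only enlarge the set of achievable sizes, and it assumes the hard-coded cap is generous enough to realise all of \mathcal{R}(h,k).

# Brute force (editorial sketch) for R(h,k) and N(h,k) with tiny h and k.
# Since enlarging the universe only grows the set of achievable |hA|, N(h,k)
# is the first n at which the achievable set stops growing (cap assumed ample).
from itertools import combinations
def hfold_size(A, h):
    sums = {0}
    for _ in range(h):
        sums = {s + a for s in sums for a in A}
    return len(sums)
def achievable(h, k, n):
    return {hfold_size(A, h) for A in combinations(range(n + 1), k)}
def R_and_N(h, k, cap=40):
    R = achievable(h, k, cap)
    N = next(n for n in range(k - 1, cap + 1) if achievable(h, k, n) == R)
    return R, N
R34, N34 = R_and_N(3, 4)
print(sorted(R34), N34)  # R(3,4), and the least diameter realising all of it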

I constructed these sets A by combining smaller component sets which are simpler to analyze. Some of these components are the geometric series

S = \{0, 1, m, m^2, \ldots, m^{\ell-2}\} \quad \text{and} \quad T = \{1, m, m^2, \ldots, m^{\ell-1}\} \qquad (1)
for various values of 2 \leq m \leq h and 2 \leq \ell \leq k. Unfortunately, the elements of S and T are exponentially large in terms of k. So, I asked ChatGPT (through Tim) whether there exist sets of \ell elements which have similar sumset sizes to these geometric series, but contain only numbers of polynomial size in \ell: I had no idea if this was possible, or how to begin constructing such sets. ChatGPT came back with an answer, constructing sets G and H which behave like “half a geometric series squeezed into a polynomial interval,” which is counterintuitive. Before I discuss the construction of G and H, I will explain the important properties of the sumset sizes of S and T which they recreate.

For h > 0, a set A is called a B_h set if the only solutions to

x_1 + x_2 + \cdots + x_h = y_1 + y_2 + \cdots + y_h
with x_i,y_i in A are the “trivial” solutions, by which I mean that one side of the equation is a reordering of the other side. If A is a B_h set of size \ell, then elements of hA correspond exactly to choices of h elements of A, with repetition allowed. Using “stars and bars,” one can see that |hA| = \binom{h+\ell - 1}{h} and this is the maximum possible value of |hA| among sets of size \ell. So, another definition is that A is a B_h set if |hA| = \binom{h+|A| - 1}{h}. Sidon sets, which Tim discussed, are exactly B_2 sets.
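
As a quick check of the stars-and-bars count (my addition, not from the post): a geometric series whose ratio is much larger than h admits only trivial relations among h-fold sums, so it is a B_h set and attains the bound.

# Editorial check: a geometric series with a huge ratio is a B_h set, so
# |hA| equals the stars-and-bars count binom(h + l - 1, h).
from math import comb
def hfold_size(A, h):
    sums = {0}
    for _ in range(h):
        sums = {s + a for s in sums for a in A}
    return len(sums)
h, l = 4, 6
A = [(10 * h) ** i for i in range(l)]  # ratio 40 >> h, so digit sums never carry
assert hfold_size(A, h) == comb(h + l - 1, h)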

To make things more concrete, let us assume that m = 4 in (1). Then, S is a B_3 set, but it is not a B_4 set because of the relations

4^a + 4^a + 4^a + 4^a = 4^{a+1} + 0 + 0 + 0 \qquad (2)
for any choice of a in \{0,1,2,\ldots, \ell-3\}. In particular, \binom{\ell+3}{4} - |4S| = \ell-2, as these \ell-2 relations are the only ones preventing S from being a B_4 set. T lacks the relations in (2) because 0 is not in T. So, T is a B_4 set, but it is not a B_5 set because of the relations

4^a + 4^a + 4^a + 4^a + 4^{b+1} = 4^b + 4^b + 4^b + 4^b + 4^{a+1} \qquad (3)
for any choices of a \neq b in \{0,1,2,\ldots, \ell-2\}. This gives \binom{\ell-1}{2} relations, and one can check that \binom{\ell+4}{5} - |5T| = \binom{\ell-1}{2}. To summarize, we have seen that

(a) S is a B_{m-1} set;

(b) \binom{m+\ell-1}{m} - |mS| = \ell-2 is a linear function of \ell;

(c) T is a B_m set;

(d) \binom{m+\ell}{m+1} - |(m+1)T| = \binom{\ell-1}{2} is a quadratic function of \ell.
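
Taking S and T as in (1), properties (a)-(d) can be confirmed numerically for m = 4; the following check is my addition.

# Editorial check of (a)-(d) for S and T as in (1), with m = 4.
from math import comb
def hfold_size(A, h):
    sums = {0}
    for _ in range(h):
        sums = {s + a for s in sums for a in A}
    return len(sums)
m = 4
for l in range(4, 9):
    S = [0] + [m ** i for i in range(l - 1)]
    T = [m ** i for i in range(l)]
    assert hfold_size(S, m - 1) == comb(m + l - 2, m - 1)               # (a)
    assert comb(m + l - 1, m) - hfold_size(S, m) == l - 2               # (b)
    assert hfold_size(T, m) == comb(m + l - 1, m)                       # (c)
    assert comb(m + l, m + 1) - hfold_size(T, m + 1) == comb(l - 1, 2)  # (d)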

ChatGPT was able to find sets G and H of \ell elements which satisfy (a)-(d), but whose elements all have polynomial size in \ell. The construction of G and H uses h^2-dissociated sets, which are sets A where the only solutions to

x_1 + \cdots + x_s = y_1 + \cdots + y_{s'} \qquad (4)
with s,s' \leq h^2 and x_i,y_i in A are the “trivial” solutions, i.e. s = s' and one side of the equation is a reordering of the other side. For r > 0, it is possible to construct an h^2-dissociated set U = \{u_1,\ldots,u_r\} \subseteq \{0,1,2,\ldots,N\}, where N is approximately r^{h^2}, and in particular polynomial in r. Constructions of such a U using finite fields date back to Singer (1938) and Bose–Chowla (1963) and are described in Appendix 1. Define

G = \{0\} \cup \{u_1,\ldots,u_r\} \cup \{mu_1,\ldots,mu_r\} \quad \text{and} \quad H = \{u_1,\ldots,u_r\} \cup \{mu_1,\ldots,mu_r\}. \qquad (5)

In hindsight, I have good intuition for the construction of G and H. All of the relations in (2) and (3) are formed by combining one or two relations of the form 4x = y. There are approximately \ell relations of the form mx = y in S and T, and approximately \ell/2 such relations in G and H. There are few other low-order relations in S and T, and similarly in G and H because U is h^2-dissociated. So, G and H manage to contain half as many mx = y-relations as their geometric series counterparts, while also containing few low-order relations.

We now see why (a)-(d) hold with S and T replaced by G and H, respectively. For concreteness, we assume that m = 4 and h>4, so U contains no nontrivial relations as in (4) with s,s' \leq 25 \leq h^2. Then, G is a B_3 set, but it is not a B_4 set because of the relations

u_i + u_i + u_i + u_i = 4u_i + 0 + 0 + 0
for any choice of i in \{1,2,\ldots, r\}. If we let \ell = |G| = 2r+1, we can check that \binom{\ell + 3}{4} - |4G| = r = \frac{\ell-1}{2} is linear in \ell. In particular, (a) and (b) hold with S replaced by G, and the linear function \ell-2 replaced by \frac{\ell-1}{2}. We can also see that H is a B_4 set, but it is not a B_5 set because of the relations

u_i + u_i + u_i + u_i + 4u_j = u_j + u_j + u_j + u_j + 4u_i
for any i\neq j in \{1,2,\ldots, r\}. If we let \ell = |H| = 2r, we can check that \binom{\ell + 4}{5} - |5H| = \binom{r}{2} = \binom{\ell/2}{2} is quadratic in \ell. In a similar manner, (c) and (d) hold with T replaced by H, and the quadratic function \binom{\ell-1}{2} replaced by \binom{\ell/2}{2}.
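
These claims are also easy to confirm numerically; the check below is mine. Since only sumset sizes are being tested, it substitutes an exponentially large but easily verified 25-dissociated set (powers of 100) for the polynomial-size U of Appendix 1, and builds G and H as in (5).

# Editorial check of the claims about G and H with m = 4, r = 5. Powers of 100
# form a 25-dissociated set (digit sums of at most 25 terms never carry), which
# is enough for a size check, though of course not of polynomial size.
from math import comb
def hfold_size(A, h):
    sums = {0}
    for _ in range(h):
        sums = {s + a for s in sums for a in A}
    return len(sums)
m, r = 4, 5
U = [100 ** i for i in range(1, r + 1)]
G = [0] + U + [m * u for u in U]
H = U + [m * u for u in U]
lG, lH = len(G), len(H)  # lG = 2r + 1, lH = 2r
assert hfold_size(G, 3) == comb(lG + 2, 3)                     # G is a B_3 set
assert comb(lG + 3, 4) - hfold_size(G, 4) == (lG - 1) // 2     # deficit (l-1)/2
assert hfold_size(H, 4) == comb(lH + 3, 4)                     # H is a B_4 set
assert comb(lH + 4, 5) - hfold_size(H, 5) == comb(lH // 2, 2)  # deficit C(l/2,2)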

Even though I can motivate it in retrospect, ChatGPT’s idea to use h^2-dissociated sets to control relations of order at most h feels quite ingenious. As far as I can tell, this idea is completely original.

ChatGPT’s proof that its construction produces the desired values of |hA| is very similar to my proof that the sets A which I construct achieve all possible values of |hA|, after replacing S and T by G and H, respectively. Properties (a)-(d) capture many of the important properties of S and T (or G and H) which are used in this proof. The final constructions involve combining the sets G and H (or S and T in my paper) for each value of m between 2 and h with another set which is the union of an arithmetic progression and a point. Intuitively, G and H (or S and T) have large sumsets, while arithmetic progressions have small sumsets, so it is plausible that one could get sets which achieve all the medium-sized sumsets by combining them. However, the proof of this is quite involved, and it occupies Section 4 of my paper and the entirety of the ChatGPT preprint. In Appendix 2, I work out the details of the ChatGPT construction to show that for k sufficiently large,

N(h,k) \leq k^{8h^3}.

For comparison, it is easy to see that N(h,k) is at least on the order of k^{h}: to realize the maximal size \binom{h+k-1}{h} one needs hA \subseteq [0,hN] to have at least that many elements, so hN + 1 \geq \binom{h+k-1}{h}. It is unknown what the real value is. In Appendix 3, I give details of the correspondence between my paper and the ChatGPT preprint, which will be helpful for those who want to read either.

Finally, I want to express my deep gratitude to Tim for allowing me to contribute to this blog. I am still stunned by the coincidence that the problem he chose to put into ChatGPT 5.5 Pro led him to my paper on the arXiv.

I would judge the level of the result that ChatGPT found in under two hours to be that of a perfectly reasonable chapter in a combinatorics PhD. It wouldn’t be considered an amazing result, since it leant very heavily on Isaac’s ideas, but it was definitely a non-trivial extension of those ideas, and for a PhD student to find that extension it would be necessary to invest quite a bit of time digesting Isaac’s paper, looking for places where it might not be optimal, familiarizing oneself with various algebraic techniques that he used, and so on.

It seems to me that training beginning PhD students to do research, which has always been hard (unless one is lucky enough, as I have often been, to have a student who just seems to get it and therefore doesn’t need in any sense to be trained), has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

I would qualify that statement in two ways though. First, there is the obvious point that a beginning PhD student has the option of using LLMs. So the task is potentially easier than proving something that LLMs can’t prove: it is proving something in collaboration with LLMs that LLMs cannot manage on their own. I have done quite a lot of such collaboration recently and found that LLMs have made useful contributions without (yet) having game-changing ideas.

A second point is that I don’t know how much of what I have said generalizes to other areas of mathematics. Combinatorics tends to be quite focused on problems: you start with a question and you reason back from the question or if you reason forwards you do so very much with the question in mind. In other areas there can be much more of an emphasis on forwards reasoning: you start with a circle of ideas and see where it leads. To do it successfully, you need to have some way of discriminating between interesting observations and uninteresting ones, and it isn’t obvious to me what LLMs would be like at that.

Of course, everything I am saying concerns LLMs as they are right now. But they are developing so fast that it seems almost certain that my comments will go out of date in a matter of months. It is also almost certain that these developments will have a profoundly disruptive effect on how we go about mathematical research, and especially on how we introduce newcomers to it. Somebody starting a PhD next academic year will be finishing it in 2029 at the earliest, and my guess is that by then what it means to undertake research in mathematics will have changed out of all recognition.

I sometimes get emails from people who are interested in doing mathematical research but are not sure whether that makes sense any more as an aspiration. I have a view on that question, but it may very well change in response to further developments. That view is that there is still a great deal of value in struggling with a mathematics problem, but that the era where you could enjoy the thrill of having your name forever associated with a particular theorem or definition may well be close to its end. So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody. Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

So what is the point of struggling with a difficult mathematics problem? One answer is that it can be very satisfying to solve a problem even if the answer is already known, but I don’t think that is a sufficient reason to spend several years of your life on this peculiar activity. A better answer is that by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at solving problems with the help of AI, just as very good coders are better at vibe coding than less good coders, or people who have a solid grasp of how to do basic arithmetic are likely to be more skilled at using calculators (and especially at noticing when an answer feels off). Mathematics is a highly transferable skill, and that applies to research-level mathematics as well. By doing research in mathematics, you may not get the same rewards as your equivalents a generation ago, but there is a good chance that you will be equipping yourself very well for the world we are about to experience.

Appendix 1

We will construct an h-dissociated set U = \{u_1,\ldots,u_r\} \subseteq \{0,1,2,\ldots,N\}, where N is approximately r^{h}. This construction is a very minor modification of the Bose–Chowla (1963) construction of a B_h set, which I learned about from this paper. For whatever reason, the GPT preprint (Lemma 3.1) uses a different, less efficient construction using moment curves.

Let p > r be a prime, let N = p^{h+1}-2, let K be the finite field with p^{h+1} elements and fix a generator \theta of K^\times, so that K^\times is equal to \{\theta^0,\theta^1,\ldots, \theta^N\}. Define a set of p elements

U = \{a \in \{0,1,\ldots,N\} : \theta^a - \theta \in \mathbb{F}_p\}.

Then, each element a \in U corresponds to a unique value of \tilde{a} \in \mathbb{F}_p, by taking \tilde{a} = \theta^a - \theta. Now an additive relation of the form in (4) with s,s' \leq h can be reframed by taking powers of \theta as

(\theta + \tilde{x}_1)\cdots(\theta + \tilde{x}_s) = (\theta + \tilde{y}_1)\cdots(\theta + \tilde{y}_{s'}). \qquad (6)

As K is a degree-(h+1) extension of \mathbb{F}_p and \theta generates K over \mathbb{F}_p, \theta does not satisfy any nonzero polynomial in \mathbb{F}_p[x] of degree \leq h. So, both sides of (6) are identical as polynomials in \mathbb{F}_p[\theta], and thus the additive relation in (4) is trivial. So, U is h-dissociated, and of course one can prune a few elements to reduce U to size r.
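
The construction runs comfortably for small parameters. The sketch below is my addition (the helper names are invented, and it uses naive polynomial arithmetic rather than any library): it builds K = \mathbb{F}_{5^3} with p = 5 and h = 2, finds a primitive polynomial by brute force so that x generates K^\times, extracts U, and verifies dissociativity directly.

# Editorial sketch of the construction above for p = 5, h = 2: here
# K = F_{5^3}, N = 5^3 - 2 = 123, and U is a 2-dissociated set of p elements.
from itertools import combinations_with_replacement, product
p, h = 5, 2
d = h + 1  # K = F_{p^d}
def polymul(f, g, mod):
    # product of f and g in F_p[x] (coefficient lists, lowest degree first),
    # reduced modulo the monic degree-d polynomial mod
    res = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            res[i + j] = (res[i + j] + a * b) % p
    for k in range(len(res) - 1, d - 1, -1):  # x^k = -sum_i mod[i] x^(k-d+i)
        c, res[k] = res[k], 0
        for i in range(d):
            res[k - d + i] = (res[k - d + i] - c * mod[i]) % p
    return res[:d]
def x_order(mod):
    # multiplicative order of x in F_p[x]/(mod), or 0 if x never returns to 1
    one, theta = [1] + [0] * (d - 1), [0, 1] + [0] * (d - 2)
    cur = theta
    for k in range(1, p ** d):
        if cur == one:
            return k
        cur = polymul(cur, theta, mod)
    return 0
# brute-force search for a primitive polynomial, making x a generator of K^x
mod = next([*c, 1] for c in product(range(p), repeat=d)
           if x_order([*c, 1]) == p ** d - 1)
N = p ** d - 2
theta, cur, U = [0, 1] + [0] * (d - 2), [1] + [0] * (d - 1), []
for a in range(N + 1):  # cur = theta^a throughout
    if cur[1] == 1 and not any(cur[2:]):  # theta^a = theta + c, c in F_p
        U.append(a)
    cur = polymul(cur, theta, mod)
assert len(U) == p
print("U =", U, "inside {0,...,%d}" % N)
# dissociativity by brute force: sums of at most h elements of U (repetition
# allowed) determine the multiset, so only trivial relations (4) occur
seen = {}
for s in range(1, h + 1):
    for M in combinations_with_replacement(U, s):
        assert seen.setdefault(sum(M), M) == M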

Appendix 2

Fix constants \alpha,\beta,\gamma such that 0.5 < \beta\gamma < \beta < \alpha < 1 (in my paper I arbitrarily chose (\alpha,\beta,\gamma) = (0.9,0.8,0.7)). Let the two sets in (5) be called G_{m,r} and H_{m,r}. Let [a,b] denote the set of integers x satisfying a \leq x \leq b. Similarly to my paper, the constructions of A such that hA achieves the desired sizes will combine sets of the following four types:

• a set B_{j,b}, the union of an arithmetic progression (with parameter j) and one additional point b;
• the sets G_{m,r_m} for the values of m \in [3,h];
• the sets H_{m,u_m} for the values of m \in [2,h-1];
• a B_h set.

One reason that this construction needs to be complicated is that we need to create at least \Omega(k^h) many sets. To do this, we vary 2h-4 parameters r_m and u_m in the domain [0,k^\alpha] and 2 parameters b and j in the domain [1,hk]. We can choose \alpha to be slightly bigger than 1/2, and then the above construction gives us O(k^{\alpha(2h-4)+ 2})=O(k^{h + \delta}) different sets where \delta >0 can be made arbitrarily small. So, if we were to remove any of the above parameters from the construction, and not change the others, this construction would no longer create \Omega(k^h) many sets. In comparison, Nathanson’s construction when h=2 only needs to create \Omega(k^2) sets. He does this by combining a Sidon set, an arithmetic progression, and one extra value, and varying the size of the arithmetic progression and the extra value in ranges of size O(k).

We want to combine q = 2h-2 sets A_1,\ldots,A_q, which are given by B_{j,b}, G_{m,r_m} for the h-2 values of m \in [3,h], H_{m,u_m} for the h-2 values of m \in [2,h-1], and a B_h set. By Appendix 1, for all r \leq k, there exists an h^2-dissociated set \{u_{1},\ldots,u_{r}\} of diameter M \leq r^{2h^2} \leq k^{2h^2}. By the constructions of G_{m,r_m} and H_{m,u_m}, we can take each A_i \subseteq [0,M], where M \leq hk^{2h^2}. Let \mathbb{Z}^{2q} have basis vectors e_1,\ldots,e_{2q}. To combine A_1,\ldots,A_q, we can define A \subseteq \mathbb{Z}^{2q} as

A = \{e_{2i-1} + a e_{2i} : 1 \leq i \leq q,\ a \in A_i\}.

Similarly to my Lemma 4.9, this construction ensures that the generating-function identity \mathcal{F}_{A}(z) = \prod_{i=1}^q \mathcal{F}_{A_i}(z) holds, which is the identity that both my paper and the GPT preprint use (see either paper for a definition of these generating functions). By (the standard) Lemma 2.3 of the GPT preprint, A is Freiman-isomorphic of order h to a subset of [0,2qM(2hM)^{2q-1}]. Therefore, for k sufficiently large (the whole construction relies on this for the same reasons as in my paper),
N(h,k) \leq 2qM(2hM)^{2q-1} \leq k^{8h^3}.
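
Lemma 2.3 is the standard base-expansion trick: with a base B > hM, the digit map is additive and h-fold sums never produce carries. The demonstration below is my addition (names invented), not the preprint’s statement verbatim.

# Editorial demonstration of the standard lemma behind the Freiman embedding:
# with base B = 2hM + 1 > hM, the digit map phi is additive and injective on
# h-fold sums of [0,M]^n, hence a Freiman isomorphism of order h into Z.
from itertools import product
h, M, n = 3, 4, 2
B = 2 * h * M + 1
phi = lambda v: sum(c * B ** i for i, c in enumerate(v))
pts = list(product(range(M + 1), repeat=n))  # all of [0,M]^n
hsums = {tuple(map(sum, zip(*x))) for x in product(pts, repeat=h)}
assert len({phi(s) for s in hsums}) == len(hsums)  # injective on h-fold sums
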
Appendix 3

In Section 4.2 of my paper, I use a different, simpler construction to construct sets A achieving the values in \mathcal{R}(h,k) which have |hA| < \varepsilon k^h, for some small \varepsilon. These sets A are subsets of \{0,1,2,\ldots,k^h\}, meaning that all elements have polynomial size in k. This is observed in Section 5 of the GPT preprint.

Section 4.3 of my paper carries out the construction which combines many components including S and T. This corresponds to Sections 2, 3, 4, and 6 of the GPT preprint. This section has a lot of moving parts; I give an outline in Section 4.3.1.

In Section 4.3.2, I describe how the different components will be combined, using a construction which I call the disjoint union, and introduce generating functions \mathcal{F}_A(z) as a bookkeeping tool to keep track of the sumset sizes of a set A. This corresponds to Section 2 and Section 4 of the GPT preprint.

In Section 4.3.3, I compute the generating function of each of the component sets, including \mathcal{F}_S(z) (Lemma 4.15) and \mathcal{F}_T(z) (Lemma 4.17). This corresponds to Section 3 and Section 6.1 of the GPT preprint. In particular, \mathcal{F}_{G}(z) is computed in Lemma 3.3 and \mathcal{F}_{H}(z) is computed in Lemma 3.4. Once these generating functions have been computed, the remainder of the proof is almost identical in my paper and in the GPT preprint.

In Section 4.3.4, I put all the pieces together to show that as we range over the sets A which I have constructed, the values of |hA| will assume all of the elements of \{\lceil\varepsilon k^h\rceil, \lceil\varepsilon k^h\rceil+1,\ldots ,\binom{h+k-1}{h}\}. The key idea is to show that the set of all values of |hA| forms an interval, and contains numbers both smaller than \varepsilon k^h and equal to \binom{h+k-1}{h}.

Very interesting post, it will be fun to look back at it in 2029. To some of your points, there’s a famous Italian quote (but not so famous that ChatGPT knows who said it): “Chi meglio combina meglio crea.” The literal translation is “Who better combines better creates.” Personally I don’t think there is anything “special” in human intelligence or insight, and like you suggest I feel that a very vast amount of results in math (but also literature etc.) are “banal” in the sense that they are basically a combination of known ideas; they can be obtained by tediously trying one idea after the other in the “obvious way.” Papers (in math) are often written (and talks given) to give the opposite impression of phenomenal and inexplicable deus ex machina insight of the author, but in many (most) cases the ideas can be presented in a much more pedantic way (I think you expressed a similar view that ideas always come from somewhere, with the exception of Razborov’s ;-). LLMs obviously excel at this type of combination. Personally I don’t think anyone has a clear idea of the extent to which they will be able to produce without guidance research or art that we humans are interested in, and I am open to various scenarios. What seems clear is that being able to harness these tools is already a key factor. But so far, at the high level this is not very different from Google, or mathematical software. The ability to do quick searches online or use mathematical software has been a key advantage. I’ll add that I have often wondered how to define “banality.” In some sense Kolmogorov complexity seems relevant: if something has a short description given available data, it is banal. Time-bounded Kolmogorov complexity is a better idea. One issue is how to capture “available data.” Trained LLMs seem to give us just that.

The question of how best to introduce beginning PhD students to research in an LLM-era feels extremely important to think about. I want to highlight that, while it’s true in theory that such students have the option of using LLMs, the top models are currently quite expensive to get access to, and there are internal models at various companies to which only a select few have access. If one goes down the route of ‘PhD students are also allowed to use LLMs’, then it can quickly become a game of ‘which student has access to the best LLMs’, which seems to me extremely unfortunate. Is it an issue that can be gotten round on a global scale?

Sorry, I didn’t realise that was anonymous! Best wishes, Olof (Sisask)

This raises a very important issue that is relevant to all researchers, not just PhD students. Until now, unlike in most other sciences, to do research level math having access to expensive resources gave almost no advantage (except, of course, having prior access to a good education). That is gone now. I don’t know what will happen in the future, but at this moment the age of equality, in the communist sense, is sadly over in research math.

A quick comment to say that I’m having annoying compilation problems with LaTeX subscripts and superscripts, which have affected Isaac’s appendixes. I will try to sort them out soon, but if anyone has any idea what to do then that would be helpful.

My standard is to ask the AI (in particular ChatGPT): Write your finding/proof in a tex file, and also output it as a pdf. It works reliably.

one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option.

I don’t see how this follows. If the student wants to learn, and if you as their advisor suggest it, they will refrain from using the LLM for such an exercise. This won’t produce an “equally original/publishable result” as it would have before, but it should in principle be just as educational as if the LLM didn’t exist. It doesn’t seem too different from how the student in the past would have refrained from asking you for detailed help with the same problem.

Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

I think this depends on whether that guidance was also a significant contribution. This will sometimes be hard to judge. But if the problem had been open and interesting, and the LLMs had been generally available, for a while, that would be evidence in favor. It seems similar to one coauthor playing an important guiding role in a joint work with another one.

My view on this is really pessimistic. The way things progress, the value of thinking and having deep ideas seems to be lower and lower. Even before AI, institutions questioned if mathematics research was worth it. I wouldn’t recommend anyone to start a PhD now in pure maths.

You wrote “I understand that arXiv has a policy against accepting AI-written content, which makes good sense to me. So maybe there should be a different repository where AI-produced results can live.” You may find https://arxiv.org/abs/2604.16476 a step in this direction.

It’s sad, but really Mathematics is just at the leading edge of a wider phenomenon. We’re going to see similar questions raised for most intellectually fulfilling activities.

Dear Professor, Respectfully, it would be remarkable if this ChatGPT model were some evolutionary result of the (free) model I have consulted from time to time; it tells me today, when I submit a very brief source, rudimentary arithmetic, and ask for an evaluation of the conclusion, “The conclusion is unproven since, conditionally, the result of a CRT set may be smaller than one of the set of strictly positive base residues”, and sticks to its guns when challenged; the source a demonstration of an Archimedean obstruction which prevents the addition of some divisor th2 to the singleton CRT set under some divisor th1, a candidate “non Brauer Manin obstruction” (Katherine Stange); I have found a method which corrects in at least 6 cases the (free) LLMs’ mishandling of reductio arguments, but not ChatGPT’s; on the off chance you have the time / interest to put the source to this 5.5 Pro version, my email address is registered; Regards, Davide

Often, it helps to use different AIs in ping-pong mode: AI 1 thinks it has proved something. Ask it to give its output as a tex file. This becomes input for AI 2 with the prompt: “Check this proof carefully for correctness. List all errors, gaps, and weaknesses. Output as a tex file.” If this feedback claims to have found errors or gaps, ask AI 2 for a repair, or give its answer file back to AI 1, saying: “Here is feedback to your proof attempt…” This works very often in my research.


I think you must have a typo after “It is an easy exercise to show that”, because it reads “if |A| = k, then 2k-1 <= |A|…” which isn’t even a true statement, let alone an easy exercise…

