=== Assistant: Because the failure is pattern-specific, not a general problem with doubled letters. ===
Here is the precise distinction.

==== The model does not treat all repeated characters the same way. Some letter sequences are learned as structural units, others are not. ====
* “ll” and “oo” occur extremely frequently across many word positions and many word types.
* “-rry” occurs far less frequently and mostly at the end of words, often as part of the morpheme “berry”.

As a result:
* ll / oo → learned as two separate identical characters
* rry → learned as a compressed suffix pattern

So in heuristic mode:
* balloon is internally closer to b a l l o o n
* strawberry / cranberry are closer to straw + berry, where “rry” is chunked

This is not phonetic and not auditory. It is distributional compression from training data.
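
The chunking idea above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `TOY_CHUNKS` is a made-up vocabulary and `segment` is a hypothetical greedy longest-match splitter, not how any real model or tokenizer works internally. It only shows how a word with a familiar chunk can end up as a few large pieces, while a word without one stays as individual characters.

```python
# Toy illustration only: TOY_CHUNKS is a made-up vocabulary, not a real tokenizer's.
TOY_CHUNKS = {"straw", "cran", "blue", "berry"}


def segment(word, chunks=TOY_CHUNKS):
    """Greedy longest-match segmentation; falls back to single characters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest known multi-character chunk starting at position i.
        for j in range(len(word), i + 1, -1):
            if word[i:j] in chunks:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No chunk matched here, so emit a single character.
            pieces.append(word[i])
            i += 1
    return pieces


print(segment("balloon"))     # stays at character level
print(segment("strawberry"))  # collapses into two chunks
```

In this sketch the doubled “rr” is hidden inside one piece, while “ll” and “oo” remain visible as separate characters.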

==== The sequence -berry is extremely common: ====
* strawberry
* cranberry
* blueberry
* blackberry
* raspberry
* boysenberry

During training, the model repeatedly sees “berry” as a semantic and lexical unit, not as five independent characters.

When answering off-the-cuff:
* The model retrieves “berry” as a known suffix
* Inside that suffix, attention is biased toward semantic identity, not character count
* The internal representation does not insist on resolving rr → r + r

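This is why the error tends to be an undercount: the doubled “rr” inside “berry” is treated as a single unit rather than as two separate characters.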

==== balloon does not have: ====
* a productive suffix
* a semantic morpheme boundary
* a strongly chunked ending

“loon” does not function as a productive suffix in English the way “berry” does.

So the model does not compress:
* ll → still seen as two
* oo → still seen as two

The internal representation stays closer to raw character structure.

==== In heuristic mode: ====
* The model prioritizes lexical familiarity and meaning
* Character-level integrity is optional

In explicit enumeration mode:
* The model must maintain symbol-level consistency
* Chunking shortcuts are disabled
* Each repeated character must be counted individually

That is why the error disappears when forced to enumerate.
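
As a minimal sketch of what forced enumeration amounts to, the following visits every character in turn and records where the target letter appears. The function name `count_letter` is a hypothetical helper introduced here for illustration.

```python
def count_letter(word, letter):
    """Count occurrences of `letter` by visiting every character in turn."""
    # Positions are 1-based so they match how a person would spell the word out.
    positions = [i for i, ch in enumerate(word, start=1) if ch == letter]
    return len(positions), positions


for word, letter in (("strawberry", "r"), ("cranberry", "r"), ("balloon", "l")):
    count, positions = count_letter(word, letter)
    print(f"{word}: {count} x '{letter}' at positions {positions}")
```

Because each character is checked individually, there is no chunk in which a repeated letter can be hidden.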

==== This is not: ====
* randomness
* hearing sounds
* general difficulty with double letters

It is a chunking effect: “rry” trips the model because it is part of a highly overlearned morpheme, whereas “ll” and “oo” in balloon are not.

If you want to probe this further, try words ending in -berry, along with other words that contain a doubled letter inside a familiar chunk:
* letter
* mirror
* error
* occurred

On this account, these are more likely to expose the same heuristic failure unless enumeration is forced.
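
If you do run such a probe, it helps to have ground-truth counts to compare the model's answers against. The sketch below is one way to produce them. `PROBE_WORDS`, `doubled_letters`, and `ground_truth` are hypothetical names chosen for this example, and the word list is only a sample.

```python
from collections import Counter

# Sample probe words; extend this list as needed.
PROBE_WORDS = ["strawberry", "letter", "mirror", "error", "occurred", "balloon"]


def doubled_letters(word):
    """Return the letters that appear twice in a row, in order of appearance."""
    return [a for a, b in zip(word, word[1:]) if a == b]


def ground_truth(word):
    """Map each doubled letter to its total count across the whole word."""
    counts = Counter(word)
    return {letter: counts[letter] for letter in doubled_letters(word)}


for word in PROBE_WORDS:
    print(word, ground_truth(word))
```

Any model answer that differs from these totals is a candidate instance of the failure described above.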