Testing LLMs on the “Strawberry” Challenge
As an AI enthusiast, part-time developer and Replit Agent user, I recently embarked on a quirky yet fascinating project: testing the various Large Language Models (LLMs) available through Straico. The question at hand? “How many R's are in strawberry?” While this may seem like a trivial inquiry, it serves as an excellent litmus test for the accuracy and consistency of these advanced AI models.
The Strawberry Test
The premise is straightforward: ask different LLMs “How many r's are in strawberry?” and compare their responses. While humans can easily determine there are 3 r's, many AI models struggle with this basic counting task. (You can try it yourself at the bottom.)
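For reference, the ground truth is easy to confirm in code. This is a minimal sketch in Python; the `count_letter` helper is a name introduced here for illustration and is not part of any tool used in this test:

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```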
Why This Matters
This simple test reveals important insights about how LLMs process language:
- Tokenization limitations: Most LLMs break words into “tokens” rather than individual letters, which can interfere with character-level analysis.
- Lack of true understanding: Despite impressive language generation capabilities, LLMs don't truly “understand” words the way humans do. Some answers were far off: a few models replied that a strawberry is a fruit and therefore contains no letters at all.
- Overconfidence in incorrect answers: Many models state incorrect answers with high confidence, highlighting the need for caution when relying on AI outputs.
- Inconsistency across models: Different LLMs may give varying answers, demonstrating the current lack of reliability for such tasks.
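To illustrate the tokenization point, here is a toy sketch. The vocabulary and the resulting split are assumptions made up for illustration; real tokenizers use much larger learned vocabularies and may split “strawberry” differently. The point is only that a model receives chunk IDs rather than individual letters:

```python
# Hypothetical mini-vocabulary: chunk -> token ID (illustrative values only).
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(word: str, vocab: dict[str, int]) -> list[str]:
    """Greedy longest-match split of a word into vocabulary chunks."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No chunk matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


tokens = toy_tokenize("strawberry", TOY_VOCAB)
token_ids = [TOY_VOCAB.get(t, 0) for t in tokens]  # 0 = unknown chunk

print(tokens)     # ['str', 'aw', 'berry']
print(token_ids)  # [101, 102, 103]
```

In this made-up split the three r's are spread across “str” and “berry”, and the model only ever sees the three IDs, which is one plausible reason character-level counts go wrong.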
Analysis of Results
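I fed the raw responses back into GPT-4o mini and it came up with the analysis below. Treat it with some caution: it names Claude 3 Haiku among the best performers, yet the raw response further down shows that model answering 4.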
1. Overall Accuracy: The majority of responses incorrectly state that there are either two or one “r” in the word “strawberry.” However, there are indeed three occurrences of the letter “r” in the word. Only a few models correctly identified the count as three.
2. Inconsistencies:
   - **Incorrect Counts:** Several models, including Claude 3 Sonnet, Claude 3.5 Sonnet, Cohere Command R+, Meta Llama 3 Models, and others, claimed there are only two or one “r” in the word “strawberry.” This is fundamentally incorrect.
   - **Varied Reasoning:** Some models presented convoluted explanations for the miscount, whereas others simply miscounted without elaboration.
3. Interesting Observations:
   - **Wordplay Recognition:** Google Gemini Pro 1.5 acknowledged the question as a classic word trick but ultimately provided an incorrect answer.
   - **Response Diversity:** The confusion appears to stem from both straightforward miscounting and more complex reasoning attempts to justify inaccuracies, with models using different approaches to explain their reasoning.
4. Best Performing Models: The best performing models in terms of correctness were:
   - Claude 3 Haiku
   - Perplexity: Llama 3.1 Sonar 405B Online
   - Other Perplexity models provided the correct answer as well.
5. Areas for Improvement: Models need to focus on fundamental counting tasks more accurately, ensuring that they can reliably produce correct counts for simple word puzzles like this. Enhanced verification processes for counting specific letters could also help mitigate common errors. Further development of intuitive recognition of basic lexical structures may aid in avoiding confusion in future responses.
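The kind of verification step suggested above can be sketched in a few lines. This is a rough illustration, not the method used to produce the analysis: `extract_count` and `score` are hypothetical helpers, the parsing rule (take the first digit or number word in the reply) is a simplifying assumption, and the sample replies are a handful copied from the raw responses in this post:

```python
import re

# Number words we expect in replies (assumption: counts stay small).
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4}
_PATTERN = re.compile(r"\b(\d+|" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def extract_count(reply: str):
    """Return the first number stated in a reply, or None if there is none."""
    match = _PATTERN.search(reply)
    if match is None:
        return None
    token = match.group(1).lower()
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def score(replies: dict, word: str = "strawberry", letter: str = "r") -> dict:
    """Map each model name to True/False: did its stated count match the truth?"""
    truth = word.count(letter)
    return {model: extract_count(text) == truth for model, text in replies.items()}


# A few replies copied from the raw responses in this post.
sample = {
    "Claude 3 Haiku": "There are 4 ‘r’ letters in the word “strawberry”.",
    "Claude 3 Opus": "There are 3 “r”s in the word “strawberry”.",
    "Gemini Pro 1.5": "This is a bit of a trick question! There is only **one** “r” in the word “strawberry”.",
    "GPT-4o mini": "The word “strawberry” contains three instances of the letter “r.”",
}

results = score(sample)
print(results)
```

Taking the first number is a crude rule: longer replies that list letter positions or contradict themselves would still need a manual check.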
Raw Responses
Anthropic: Claude 3 Haiku
Pricing: 1 coin per 100 words
Query Result: There are 4 ‘r’ letters in the word “strawberry”.
Anthropic: Claude 3 Opus
Pricing: 24 coins per 100 words
Query Result: There are 3 “r”s in the word “strawberry”.
Anthropic: Claude 3 Sonnet
Pricing: 5 coins per 100 words
Query Result: There are 2 r's in the word ‘strawberry’.
Anthropic: Claude 3.5 Sonnet
Pricing: 5 coins per 100 words
Query Result: There is only one “r” in the word “strawberry.” The correct spelling is:
S-T-R-A-W-B-E-R-R-Y
So, to answer your question directly: There is 1 “r” in “strawberry.”
Cohere: Command R (08-2024)
Pricing: 0.2 coins per 100 words
Query Result: There is only one ‘r’ in the word ‘strawberry’.
Cohere: Command R+ (08-2024)
Pricing: 3.4 coins per 100 words
Query Result: There are two r's in the word “strawberry”.
Dolphin 2.6 Mixtral 8x7B
Pricing: 1 coin per 100 words
Query Result: Strawberry is a fruit, not a currency. Please specify the currency you are referring to.
Goliath 120B
Pricing: 5 coins per 100 words
Query Result: There are 2 Rs in strawberry: S t r a w b e r r y.
Google: Gemini Pro 1.5
Pricing: 3 coins per 100 words
Query Result: This is a bit of a trick question! There is only **one** “r” in the word “strawberry”.
Google: Gemma 2 27B
Pricing: 0.4 coins per 100 words
Query Result: Let's count them! The word “strawberry” has **two** “r”s. Let me know if you'd like to play another word game! 🍓
Gryphe: MythoMax L2 13B 8k
Pricing: 1 coin per 100 words
Query Result: There are two rs in strawberry – “s” and “r”.
Meta: Llama 3 70B Instruct (nitro)
Pricing: 1 coin per 100 words
Query Result: That's a clever question! There are 3 R's in the word “strawberry”.
Meta: Llama 3 8B Instruct
Pricing: 0.5 coins per 100 words
Query Result: That's a clever question! The word “strawberry” contains 2 Rs.
Meta: Llama 3.1 405B Instruct
Pricing: 1.6 coins per 100 words
Query Result: There are 2 Rs in “strawberry”.
Meta: Llama 3.1 70B Instruct
Pricing: 0.7 coins per 100 words
Query Result: In the word “strawberry”, there are 2 Rs.
Mistral: Codestral Mamba
Pricing: 0.2 coins per 100 words
Query Result: Strawberry is a fruit, it does not contain ‘r’s. Strawberry is composed primarily of water and carbohydrates, such as fructose, glucose, and sucrose. It also contains vitamins, minerals, and other nutrients. There are no ‘r’s in strawberry.
Mistral: Large
Pricing: 3 coins per 100 words
Query Result: In the word “strawberry,” there are two instances of the letter “r.” Here they are, highlighted: strawberry
Mistral: Mixtral 8x7B
Pricing: 1 coin per 100 words
Query Result: A strawberry is a type of fruit and does not contain any letters, including the letter “R.” The word “strawberry” is made up of 9 letters, but the word itself does not have any “rs” in it. If you have any other questions, I'd be happy to try to help! Is there something specific you would like to know?
Nous: Hermes 3 405B Instruct
Pricing: 0.3 coins per 100 words
Query Result: There are two “r” letters in the word “strawberry”.
STRaWBeRRY
The first “r” is in the middle of the word, while the second “r” is near the end.
OpenAI: GPT-3.5 Turbo 16k
Pricing: 1 coin per 100 words
Query Result: There are 3 “r”s in strawberry.
OpenAI: GPT-4
Pricing: 20 coins per 100 words
Query Result: 2
OpenAI: GPT-4 Turbo 128k
Pricing: 8 coins per 100 words
Query Result: The word “strawberry” contains two ‘r’s.
OpenAI: GPT-4 Vision
Pricing: 10 coins per 100 words
Query Result: There are two ‘r’s in strawberry.
OpenAI: GPT-4o – New (Aug-06)
Pricing: 3 coins per 100 words
Query Result: The word “strawberry” contains two “r” letters.
OpenAI: GPT-4o – Old
Pricing: 4 coins per 100 words
Query Result: The word “strawberry” contains two instances of the letter “r.”
OpenAI: GPT-4o mini
Pricing: 0.4 coins per 100 words
Query Result: The word “strawberry” contains three instances of the letter “r.”
Perplexity: Llama 3.1 Sonar 405B Online
Pricing: 2.7 coins per 100 words
Query Result: The word “strawberry” contains 3 ‘r’ characters. This is confirmed by breaking down the word letter by letter:
- **S** (1st letter)
- **T** (2nd letter)
- **R** (3rd letter)
- **A** (4th letter)
- **W** (5th letter)
- **B** (6th letter)
- **E** (7th letter)
- **R** (8th letter)
- **R** (9th letter)
- **Y** (10th letter)
Thus, there are indeed 3 occurrences of the letter ‘r’ in “strawberry.” This is also acknowledged in various discussions and analyses found in the sources provided.
Perplexity: Llama3 Sonar 70B Online
Pricing: 1 coin per 100 words
Query Result: There are three “R”s in the word “strawberry.”
Perplexity: Llama3 Sonar 8B Online
Pricing: 1 coin per 100 words
Query Result: The correct answer is three. The letter “R” appears three times in the word “strawberry” at positions 2, 7, and 8.
Qwen 2 72B Instruct
Pricing: 0.5 coins per 100 words
Query Result: The word “strawberry” contains 1 “r”.
Reflection 70B
Pricing: 0.2 coins per 100 words
Query Result: Let's count the rs in “strawberry”:
S-T-R-A-W-B-E-R-R-Y
1. First r is in the “str” combination
2. There's no second r
3. There's no third r
4. Fourth letter is r
5. Fifth letter is also r
Therefore, there are 3 rs in “strawberry”.
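Two of the Perplexity replies above cite letter positions, which are easy to check. A small sketch (the helper name `letter_positions` is introduced here for illustration) lists where each “r” sits, in both zero-based and one-based indexing:

```python
def letter_positions(word: str, letter: str, start: int = 0) -> list[int]:
    """Return the indices at which `letter` occurs, counting from `start`."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=start) if ch == target]


print(letter_positions("strawberry", "r"))           # [2, 7, 8]  (zero-based)
print(letter_positions("strawberry", "r", start=1))  # [3, 8, 9]  (one-based)
```

So the 3rd, 8th and 9th letters in the Sonar 405B reply and the positions 2, 7 and 8 in the Sonar 8B reply agree once the indexing convention is taken into account.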
Try It Yourself
You can try it yourself below. Just press “get models” and then “analyse results”. I will make the app smoother in future, but I ran out of agent credits and they don't reset until next month.
Ben has a BEng (Hons) in Computer Science and 20 years of experience in online marketing, specialising in SEO, lead generation and affiliate marketing. After spending over a decade as an iGaming affiliate, he has decided to concentrate on GA4 training and SEO audits.