7 Minute Read
Qwen AI: Constraints, Nuance, and Toppling Statues
I’m not sure this newsletter has an overarching theme outside of writing about what I find interesting. I try to thread the needle between writing (and everything that comes with long-form book creation), a certain baseball team in a rough spot (Cardinal fans, it was a tough season), and inspiration mixed with analytics (stories exist in data). Technology serves as an underlying thread since the industry moves at a brisk pace—quite an understatement.
This year, with "story of the year" pieces all over the place, I tried to pinpoint a single remarkable one. But the one ring to rule them all doesn't exist. I suppose there are numerous honorable mentions: the tech gods bowing at Mar-a-Lago, drone impacts in Ukraine, Syria, and New Jersey, rapid job displacement despite impressive earnings, the rise of Bitcoin, laptop gains (the new ARM chips are real), quantum computing, and the continued ascent of open source (despite headaches).
Yet the tagline of the JPLA newsletter is that our words do matter, no matter how they are created. So, for me, it's the year of large language models (LLMs). Note, I didn't say AI or GenAI. I'm describing the technology at face value: a statistical model that predicts answers based on certain inputs. That's what it is. It's not your friend or family or professor. One can pretend. But it's just math that sometimes adds wrong and makes mistakes.
And I suspect this is not surprising to readers, at least those who have been with me for years. Looking back at previous threads, I built an application that detects challenges in fictional text. I also tried to create a Python-based writing stand-in for myself. Both projects use various LLMs. They show promise but don't quite reach the finish line.
Instead of simply saying LLMs are the story, I want to get a bit more granular, because it's been a ride. We've seen video models. And image models. And code-writing models. And the list goes on… So I found it fascinating that Sam Altman, co-founder of OpenAI, made the case that the path to AGI is "basically here," especially considering we're not done modeling yet. Or finding the next killer use case to change our lives.
Overfitting and Lessons from StoryMaster
So why is he saying this? Good old-fashioned PR? Yes and no; the current LLM paradigm may be plateauing. Two years ago, I tinkered with what I dubbed StoryMaster, my own model trained on my own corpus. Unfortunately, my version didn't achieve peak performance because I lacked data. Nor did I think about scraping the internet (or transcribing YouTube videos) and running it all through a training process. Yet I tried, iterating on dozens of versions with PyTorch. Sometimes, I'd massage the data. Or use headers. Or change the order.
And my approach, mostly screwing around, required weeks of training, thanks to hardware limitations.
If you're curious, one doesn't need racks of NVIDIA GPUs lying around. I ran my training on aging machinery, sometimes, dare I say it, using CPUs instead of GPUs. Yes, time wasn't a friend; it took forever. But the feat can be done.
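For the curious, here's roughly what that fallback looks like in PyTorch. This is a minimal sketch with a toy model, not my actual StoryMaster code; the vocabulary, context window, and layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is available. Training still works;
# it just takes far longer, as it did on my aging machines.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy next-token predictor standing in for the real model:
# a 16-token context window over a 5,000-token vocabulary.
model = nn.Sequential(
    nn.Embedding(5000, 64),
    nn.Flatten(),
    nn.Linear(64 * 16, 5000),
).to(device)

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(inputs, targets):
    # Move each batch to whichever device we ended up with.
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

# One fake batch: 8 sequences of 16 token ids, each with a next-token target.
x = torch.randint(0, 5000, (8, 16))
y = torch.randint(0, 5000, (8,))
print(train_step(x, y))
```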
Here, there is a concept called overfitting. It's fairly simple: train a model long enough and there comes a point where it stops generalizing and starts memorizing, so it regresses on anything new. The same happens with people. Sometimes, you just need a walk to clear your head. But if you overtrain, the model can perform worse. Or, with people, you give up or tear the ligaments in your foot. If you follow OpenAI, Anthropic, or any competing service on Reddit, there are hundreds of comments claiming the AI is getting worse. Sometimes it's true, for any number of reasons: efficiency trade-offs behind the scenes, load at the time of day you're using it, or a genuine decline in the model's performance.
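If you want the intuition in code: the standard guard against overfitting is early stopping, where you watch performance on held-out data and quit once it regresses. A sketch follows, with `train_one_epoch`, `evaluate`, and the data loaders as hypothetical stand-ins for whatever your own harness provides.

```python
import torch

best_val_loss = float("inf")
bad_epochs = 0
patience = 3  # how many regressing epochs to tolerate before quitting

for epoch in range(100):
    train_one_epoch(model, train_loader)    # stand-in: your training pass
    val_loss = evaluate(model, val_loader)  # stand-in: loss on held-out data

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        bad_epochs = 0
        torch.save(model.state_dict(), "best.pt")  # checkpoint the best model
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Stopping at epoch {epoch}: validation loss is regressing.")
            break
```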
That doesn't mean Sam Altman and others aren't trying new approaches. They are, with agents, "thinking" models, and synthetic data.
While this is happening, the open-source world is catching up. Meta’s models are stellar. Microsoft dropped a game changer earlier in the year. Amazon recently launched Nova, which is altering the cost paradigm. A new story each week.
Qwen’s Emergence
My article of the year is none of these achievements. A co-founder of Hugging Face, the company that hosts many open-source models, recently highlighted that the platform's most-downloaded model didn't come from Meta or Microsoft or Mistral or Stability. The model is Qwen, developed by Alibaba. And maybe it wasn't supposed to be this good.
Why?
For one, it’s a Chinese company.
These aren't simply businesses; they're instruments of the state with certain feature sets. Western folks often think of these entities as businesses because they shimmer with a polish of innovation. But our thinking, our culture, creates a blind spot: upon closer inspection, their actions are heavily influenced by the Communist Party. US automakers are starting to realize this. In China, to build a factory, a foreign company has to partner with a local entity. Or, cough, the government. It's a technology trade.
Today, US automakers are struggling, China has its own homegrown companies, and the market's promise isn't what it used to be. What happens now?
And two, the US government banned the export of high-end NVIDIA chips to China.
Lacking industry-leading GPUs, Alibaba found a way, devising workarounds to the export controls and leveraging the chips available in country. Coupled with optimization techniques like quantization (shrinking the numbers so the computer doesn't break a sweat), they crafted an efficient model.
As with my inferior series of computers and cheap cloud storage, training Qwen might've taken longer, but they made it work without throwing billions upon billions into the latest GPUs.
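To make "shrinking the numbers" concrete, here's a toy version of symmetric int8 quantization. This is a sketch of the general idea, not Alibaba's actual recipe: store weights as small integers plus one scale factor, roughly quartering the memory of float32.

```python
import torch

def quantize_int8(weights: torch.Tensor):
    # One scale factor maps the float range onto the int8 range [-127, 127].
    scale = weights.abs().max() / 127.0
    q = torch.clamp((weights / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    # Recover an approximation of the original weights at inference time.
    return q.to(torch.float32) * scale

w = torch.randn(4, 4)
q, scale = quantize_int8(w)
print((w - dequantize(q, scale)).abs().max())  # small reconstruction error
```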
Censorship Dilemma
Benchmarking shows it’s an impressive model. Qwen demonstrates competitive performance relative to proprietary models when it comes to language understanding, multilingual proficiency, coding, mathematics, and reasoning. It’s a Kentucky thoroughbred. And in my own experiments for this article, it runs well.
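If you want to kick the tires yourself, here's a minimal sketch of one way to run a Qwen checkpoint locally through Hugging Face's transformers library. The checkpoint name is just one example from the Qwen family; pick whatever size your hardware can handle (and note that `device_map="auto"` requires the accelerate package).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # example checkpoint; smaller sizes exist

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain overfitting in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=200)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```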
Yet, there are strings.
The model is like a student who excels at everything the curriculum asks of them, right up until you ask a question that veers off the syllabus. It's fluent, confident, and articulate, but when the conversation touches on subjects like Tiananmen Square or the Hong Kong protests, it falters. Not because it lacks intelligence, but because it has learned that some doors are not worth opening.1
China’s regulatory environment is pushing an agenda, forcing enterprises and even multinational companies to embrace Qwen. Want to use AI in your product or service? Then, use a sanctioned model. Or brace for regulatory headaches.
When Data Changes and Statues Topple
But scrubbing the historical record isn't a new phenomenon. When Iraq fell, statues of Saddam were toppled. The same happened in Syria: slowly, then all at once.
And let’s not think Western countries don’t have similar challenges. For now, US models respond with the following when asked:
On January 6, 2021, a violent attack occurred at the United States Capitol in Washington, D.C., as a joint session of Congress convened to certify the results of the 2020 presidential election, which declared Joe Biden the winner.
And I believe that's the answer. Like many, I watched this cluster of tragic proportions unfold on television. Changing the record doesn't do anyone favors; it only creates terrible cycles and patterns. We should know the facts, the record. We should know the story of Tank Man. We should even know that NASA landed on the moon.
And the list goes on. Note, that doesn't mean we shouldn't argue or debate current events.
As for China, the government’s regulations will eventually make using foreign models challenging. With Qwen being open-sourced, the push is global. It’s not just a Chinese play; it’s a subtle nudge to Western companies, saying, “Hey, use me.”
For now, much of the rapid adoption is country-specific. But it may not be forever. We’ve seen this play out in sports entertainment, the tech industry, and manufacturing. I mean, who wants to build applications using multiple models? A support cost does exist.
Ultimately, remember our choices matter—and the technologies we adopt shape the stories we tell.
Runner-Up Thoughts:
- Pandas! I beat the NY Times to the punch, but they got there. Those darned bears sure are cute.
- I thought social media was going to turn into the next evolution of open-source services built on social protocols. Meta lags. X has become less open. Bluesky adoption came out of nowhere (didn't see that growth coming). Yes, this space will continue to evolve, but technical fragmentation is apparently the next wave.
- The Google antitrust case. Wait and see.
- 3D printing: ghost guns and quick rebuilds.
- Tracking tipping using Uber data.
- Linux on the rise.
- Open-source drama. And thoughts.
- Personal productivity systems are having a moment. See Notion.
Footnotes
On censorship of Qwen, here is a solid write-up/review of its limitations. I'd post my own findings, but this is far better than anything I could do.↩︎