Kamil Józwik

Qwen

Qwen represents Alibaba Cloud's foundational AI model initiative. Launched initially in April 2023, Qwen isn't a single model but a comprehensive suite of LLMs designed to tackle a diverse range of tasks, from natural language understanding and generation to code creation and even processing visual and auditory information.

Built upon the proven transformer architecture, the majority of Qwen models are released under the permissive Apache 2.0 license, readily available on platforms like Hugging Face. This open approach allows us to experiment, customize, and deploy these models locally or within our own infrastructure, complementing the API access provided via Alibaba Cloud.
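Since the open-weight checkpoints are published on Hugging Face, local experimentation is straightforward. Here is a minimal sketch using the transformers library; `Qwen/Qwen2.5-7B-Instruct` is a real repository name, but the prompts and generation settings are illustrative assumptions.

```python
# Sketch: running an open-weight Qwen2.5 model locally with Hugging Face
# transformers. Prompts and generation settings here are illustrative.

def build_chat(system_prompt: str, user_prompt: str) -> list[dict]:
    """Build the message list expected by tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

def main() -> None:
    # Heavyweight imports kept inside main() so the sketch can be read
    # without downloading any weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-7B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = build_chat("You are a helpful assistant.", "Explain MoE briefly.")
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256)
    print(tokenizer.decode(
        output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    ))

# main()  # uncomment to run; downloads several GB of weights
```

The same pattern works for the smaller checkpoints (0.5B-3B), which are practical even on modest hardware.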

We are all familiar with models like GPT, Claude, and Gemini, but Qwen looks like an underrated gem in the LLM landscape. So let's take a look at it.

Qwen2.5 - the foundation

At the heart of the family lies Qwen2.5, a series of powerful and efficient text-based LLMs. These are dense, decoder-only transformer models with improved capabilities over Qwen2. They serve as the foundation for many of the specialized models below and are excellent general-purpose tools.

Here are some key features of Qwen2.5:

- Model sizes ranging from 0.5B to 72B parameters (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B)
- Context window of up to 128K input tokens, with up to 8K tokens of generated output
- Multilingual support covering 29+ languages
- Improved instruction following, long-text generation, and structured output (e.g., JSON) compared to Qwen2
- Released under the Apache 2.0 license

Coder and Math - specialized experts

Building on the Qwen2.5 foundation, Alibaba has released specialized models fine-tuned for coding and mathematics.

Below you can find key details about these models.

Qwen2.5-Coder

Qwen2.5-Coder is available in 0.5B, 1.5B, 3B, 7B, 14B, and 32B sizes and is fine-tuned for code generation, completion, and repair across many programming languages, with instructions handled in English and Chinese. Like the base models, it is released under Apache 2.0.

Qwen2.5-Math

Qwen2.5-Math comes in 1.5B, 7B, and 72B sizes and targets mathematical reasoning in English and Chinese. Beyond step-by-step chain-of-thought solutions, it supports tool-integrated reasoning (TIR), in which the model writes and executes code to verify intermediate calculations.
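One practical feature of the Coder models is fill-in-the-middle (FIM) completion, where the model generates code between an existing prefix and suffix. The sketch below builds such a prompt; the special tokens follow the format documented for Qwen2.5-Coder, but verify them against the model card before relying on them.

```python
# Sketch: building a fill-in-the-middle (FIM) prompt for Qwen2.5-Coder.
# The special tokens below follow the format documented for Qwen2.5-Coder;
# treat them as an assumption and double-check the model card.

FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """The model is asked to generate the code that belongs between
    `prefix` and `suffix`; generation starts after <|fim_middle|>."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt(
    prefix="def mean(xs):\n    return ",
    suffix=" / len(xs)\n",
)
print(prompt)
```

The resulting string is passed to the model as a raw (non-chat) prompt; the completion is the code that belongs in the gap.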

VL and Omni - multimodalities

Qwen extends beyond text, offering models that can understand and interact with visual and auditory information.

Qwen2.5-VL: understanding images and videos

Qwen2.5-VL is a vision-language model series that extends Qwen2.5 with visual understanding, enabling image and video comprehension alongside text generation. This model excels at analyzing visual content and describing or reasoning about it in text form.

Qwen2.5-VL is released with open weights, but there are also two proprietary models available via the Alibaba API:

- qwen-vl-max - enhanced visual reasoning and instruction following compared with qwen-vl-plus; best suited for complex tasks.
- qwen-vl-plus - enhanced detail and text recognition, supporting images with over one million pixels of resolution and any aspect ratio; strong performance across a wide range of visual tasks.
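The proprietary VL models can be reached through Alibaba Cloud's OpenAI-compatible endpoint. A minimal sketch follows; the base URL and environment variable name are assumptions - check the Model Studio docs for your region's endpoint.

```python
# Sketch: calling the proprietary qwen-vl-plus model through Alibaba Cloud's
# OpenAI-compatible endpoint. Base URL and env var name are assumptions.

import os

def build_vision_message(image_url: str, question: str) -> list[dict]:
    """OpenAI-style multimodal message: one image plus a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }
    ]

def main() -> None:
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key=os.environ["DASHSCOPE_API_KEY"],
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    response = client.chat.completions.create(
        model="qwen-vl-plus",
        messages=build_vision_message(
            "https://example.com/chart.png", "What trend does this chart show?"
        ),
    )
    print(response.choices[0].message.content)

# main()  # requires a DASHSCOPE_API_KEY and network access
```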

Qwen2.5-Omni: real-time multimodal interaction

Qwen2.5-Omni is Qwen’s flagship end-to-end multimodal model, capable of perceiving text, images, audio, and video and responding with both text and natural speech in real time. It introduces a novel Thinker-Talker architecture, in which the Thinker handles understanding while the Talker generates the spoken response, enabling simultaneous comprehension and output.

You can check out this demo to see Qwen2.5-Omni in action.

QVQ and QwQ - advanced reasoning models

Beyond standard multimodal capabilities, Qwen offers models specifically enhanced for complex reasoning, both visually and textually.

QVQ series (QVQ-Preview and QVQ-Max): deep visual reasoning

QVQ is a vision-language model series focused on Visual Question Answering and reasoning with visual evidence. It builds upon Qwen2.5-VL but emphasizes reasoning steps (thinking) about images and videos. The initial release was QVQ-72B-Preview, demonstrating the concept of a model that can not only describe an image but also reason about it to solve complex tasks.

QVQ-Max is the successor to QVQ-72B-Preview and is accessible via API only.

Here is another demo prepared by the Qwen team, showing the capabilities of QVQ-Max.

QwQ: reinforced textual reasoning

QwQ (Qwen with Questions) is a specialized model in the Qwen family focusing on improving reasoning via reinforcement learning. Based on the Qwen2.5 32B model, QwQ underwent intensive training (including multi-stage RL) to enhance its performance on challenging reasoning tasks across domains like math and coding. The result is a model that can tackle complex questions with deeper thinking and better accuracy than the base model.
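When working with QwQ-style reasoning models, it is often useful to separate the model's reasoning trace from its final answer. The sketch below assumes the common convention of wrapping the chain of thought in `<think>...</think>` tags and takes a raw completion string as input.

```python
# Sketch: separating a QwQ-style reasoning trace from the final answer.
# Assumes the <think>...</think> convention used by reasoning models.

import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer). If no <think> block is found, the
    reasoning part is empty and the whole completion is the answer."""
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

raw = "<think>2 + 2 is 4, then times 3 is 12.</think>The answer is 12."
thoughts, answer = split_reasoning(raw)
print(answer)  # -> The answer is 12.
```

Keeping the trace around is handy for debugging, while only the answer part is shown to end users.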

Qwen2.5-Max - the flagship

Qwen2.5-Max is the large-scale Mixture-of-Experts (MoE) version of Qwen, representing Alibaba’s most advanced LLM in the Qwen 2.5 generation. It scales the model capacity dramatically (hundreds of billions of parameters) while using experts to keep inference efficient. Qwen2.5-Max is positioned to compete with top-tier models like GPT-4-class systems in capability.

Qwen-Plus and Qwen-Turbo

In Alibaba Cloud Model Studio we can find two more commercial models - Qwen-Plus and Qwen-Turbo. There is not much public information about them, but they are positioned as lighter and faster alternatives to Qwen2.5-Max, and both are available via API only. You can find them on OpenRouter as well: Qwen-Plus, Qwen-Turbo.
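Since OpenRouter exposes an OpenAI-compatible API, both models can be tried with a few lines of Python. The model slugs below are assumptions based on OpenRouter's usual naming - confirm them on the model pages.

```python
# Sketch: reaching Qwen-Plus / Qwen-Turbo through OpenRouter's
# OpenAI-compatible API. Model slugs are assumptions; verify on OpenRouter.

import os

QWEN_PLUS = "qwen/qwen-plus"
QWEN_TURBO = "qwen/qwen-turbo"

def main(model: str = QWEN_TURBO) -> None:
    from openai import OpenAI  # pip install openai

    client = OpenAI(
        api_key=os.environ["OPENROUTER_API_KEY"],
        base_url="https://openrouter.ai/api/v1",
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize the Qwen model family."}],
    )
    print(response.choices[0].message.content)

# main()  # requires an OPENROUTER_API_KEY and network access
```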

Here is some key available information about them:

Qwen-Plus

Qwen-Turbo

Qwen family at-a-glance

This table summarizes the key characteristics of the main Qwen models discussed (excluding Qwen-Plus and Qwen-Turbo):

| Model | Parameter sizes | Primary modality | Context window | Specialization | License | Multilingual |
|---|---|---|---|---|---|---|
| Qwen2.5 (Base) | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Text | 128k (input) / 8k (output) | Foundational, general purpose | Apache 2.0 | 29+ languages |
| Qwen2.5-Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Text (code) | ~Tens of thousands | Code generation & assistance | Apache 2.0 | Many prog. langs + Eng/Chi |
| Qwen2.5-Math | 1.5B, 7B, 72B | Text (math) | ~128k | Mathematical reasoning, tool use (TIR) | Apache 2.0 | English & Chinese |
| Qwen2.5-VL | 3B, 7B, 32B, 72B | Image/video -> text | Long video + large text | Vision-language understanding | Apache 2.0 | Yes (Eng/Chi focused) |
| Qwen2.5-Omni | 7B | Text/image/audio/video -> text/speech | Streaming / real-time | End-to-end multimodal interaction | Apache 2.0 | Yes (speech Eng/Chi) |
| QVQ-Max | ~72B | Image/video -> text (w/ reasoning) | Extended visual & text | Deep visual reasoning (CoT) | Proprietary API | Yes (Eng/Chi focused) |
| QwQ | 32B | Text (reasoning) | ~32k | Reinforced reasoning (math/logic/code) | Apache 2.0 | Yes (Eng/Chi) |
| Qwen2.5-Max | ~325B (MoE) | Text | 32k | Flagship scale & performance (MoE) | Proprietary API | Yes (broad) |

(Note: Context window sizes can sometimes vary based on specific implementation or fine-tuning. The table provides typical or maximum advertised values.)

Qwen chat

Qwen Chat is a web-based chat interface for the Qwen family of models. It allows users to interact with the models conversationally, making it easy to test and explore their capabilities for free.

If you want to try out the models locally, you can find them on Hugging Face and Ollama.
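For the Ollama route, once a model has been pulled (e.g., `ollama pull qwen2.5`), the local server can be queried over its REST API. The model tag `qwen2.5:7b` below is an assumption - use whichever tag you pulled.

```python
# Sketch: querying a locally pulled Qwen model through Ollama's REST API.
# The model tag is an assumption; use whichever tag you pulled.

import json
import urllib.request

def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def main() -> None:
    payload = json.dumps(build_request("qwen2.5:7b", "Why is the sky blue?"))
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

# main()  # requires a running local Ollama server
```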

Conclusion

The Qwen family from Alibaba Cloud presents a compelling and versatile suite of large language models. With a strong emphasis on open-weight releases for many core and specialized models, we get access to genuinely powerful AI capabilities. From the foundational Qwen2.5 suitable for general tasks, through the specialized Coder and Math variants, the multimodal Omni and VL models, and the reasoning-focused QVQ and QwQ, up to the enterprise-scale Qwen2.5-Max, there is likely a Qwen model well-suited to your application needs 🤞