Qwen represents Alibaba Cloud's foundational AI model initiative. Launched initially in April 2023, Qwen isn't a single model but a comprehensive suite of LLMs designed to tackle a diverse range of tasks, from natural language understanding and generation to code creation and even processing visual and auditory information.
Built upon the proven transformer architecture, the majority of Qwen models are released under the permissive Apache 2.0 license, readily available on platforms like Hugging Face. This open approach allows us to experiment, customize, and deploy these models locally or within our own infrastructure, complementing the API access provided via Alibaba Cloud.
Most of us are familiar with models like GPT, Claude, and Gemini, but Qwen is something of an underrated gem in the LLM landscape. So let's take a closer look at it.
## Qwen2.5 - the foundation

At the heart of the family lies Qwen2.5, a series of powerful and efficient text-based LLMs. It's a dense, decoder-only, transformer-based LLM with improved capabilities over Qwen2. These models serve as the foundation for many specialized variants and are excellent general-purpose tools.
Here are some key features of Qwen2.5:

- Qwen2.5 boasts strong capabilities in 29+ languages, including English, Chinese, French, Spanish, Arabic, and many others.
- Qwen2.5 can be used with agent frameworks, follow instructions, and generate structured outputs, particularly JSON (a minimal sketch follows below).
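To illustrate the structured-output point, here is a minimal sketch of prompting an open-weight Qwen2.5 instruct checkpoint for JSON via the Hugging Face transformers library. The model ID and prompt are just examples; any Qwen2.5-*-Instruct checkpoint should work the same way.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any Qwen2.5 instruct checkpoint should work; 7B is a reasonable default.
model_name = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Always answer with valid JSON."},
    {"role": "user", "content": "Extract the city and date from: 'The conference takes place in Warsaw on May 12th.'"},
]

# Qwen2.5 ships a chat template, so we can format the conversation with it.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(response)  # e.g. {"city": "Warsaw", "date": "May 12th"}
```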
## Coder and Math - specialized experts

Building on the Qwen2.5 foundation, Alibaba has released specialized models fine-tuned for coding and mathematics. Below you can find key details about these models:

- Qwen2.5-Coder: a code-specialized variant covering code generation, completion, and repair across many programming languages.
- Qwen2.5-Math: a math-specialized variant supporting Tool-Integrated Reasoning (TIR). The model can decide to invoke external tools (like a calculator or symbolic solver) during its reasoning process, incorporating the results to improve accuracy; a sketch of such a loop follows below.

Both retain the general capabilities of the base model (Qwen2.5), but they are optimized for the two domains above.
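The exact TIR prompt format is model-specific, so the following is only a rough sketch of the control flow under assumed conventions (```python fenced blocks as tool calls, ```output blocks as results): generate, execute any emitted code, feed the result back, and let the model continue. Treat the tags and the round cap as illustrative assumptions, not the official Qwen2.5-Math protocol.

```python
import re
import subprocess
import sys

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

def generate(messages: list[dict]) -> str:
    """One plain generation step using the model's chat template."""
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer([text], return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

messages = [
    {"role": "system", "content": "Solve the problem. You may emit Python code to compute intermediate results."},
    {"role": "user", "content": "What is the sum of all primes below 1000?"},
]

for _ in range(3):  # cap the number of tool rounds
    answer = generate(messages)
    messages.append({"role": "assistant", "content": answer})
    match = re.search(r"```python\n(.*?)```", answer, re.DOTALL)
    if match is None:
        break  # no tool call, treat this as the final answer
    # WARNING: executing model-generated code is unsafe outside a sandbox.
    result = subprocess.run(
        [sys.executable, "-c", match.group(1)],
        capture_output=True, text=True, timeout=10,
    )
    # Feed the tool output back so the model can incorporate it.
    messages.append({"role": "user", "content": f"```output\n{result.stdout}\n```"})

print(answer)
```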
## VL and Omni - multimodalities

Qwen extends beyond text, offering models that can understand and interact with visual and auditory information.
### Qwen2.5-VL: understanding images and videos

Qwen2.5-VL is a vision-language model series that extends Qwen2.5 with visual understanding, enabling image and video comprehension alongside text generation. This model excels at analyzing visual content and describing or reasoning about it in text form.
Key capabilities and use cases include:

- Visual grounding with structured outputs (e.g., bounding box coordinates as JSON) for objects.
- Visual Question Answering (VQA), Optical Character Recognition (OCR), document/chart analysis, video content analysis, multimedia chatbots.

Qwen2.5-VL is an open-weight model, but there are also two proprietary models available via the Alibaba API:

- qwen-vl-max: enhanced visual reasoning and instruction-following capabilities compared with qwen-vl-plus; best for complex tasks.
- qwen-vl-plus: enhanced detail and text recognition capabilities, supporting images with over one-million-pixel resolution and any aspect ratio; exceptional performance across various visual tasks.
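Here is a minimal sketch of image understanding with the open-weight model through transformers (assuming a recent transformers release with Qwen2.5-VL support and the `qwen-vl-utils` helper package; the image path is a placeholder):

```python
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from qwen_vl_utils import process_vision_info  # pip install qwen-vl-utils

model_name = "Qwen/Qwen2.5-VL-7B-Instruct"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_name)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/invoice.png"},  # placeholder path
            {"type": "text", "text": "Extract the invoice number and total amount."},
        ],
    }
]

# Build the text prompt and collect the image/video tensors separately.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0])
```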
### Qwen2.5-Omni: real-time multimodal interaction

Qwen2.5-Omni is Qwen's most advanced end-to-end multimodal model, capable of perceiving and generating across text, vision, and audio modalities in real time. It introduces a novel Thinker-Talker architecture for simultaneous understanding and response generation, built on a Qwen2.5-7B backbone. You can check out this demo to see Qwen2.5-Omni in action.
## QVQ and QwQ - advanced reasoning models

Beyond standard multimodal capabilities, Qwen offers models specifically enhanced for complex reasoning, both visually and textually.
### QVQ (QVQ-Preview and QVQ-Max): deep visual reasoning

QVQ is a vision-language model series focused on Visual Question Answering and reasoning with visual evidence. It builds upon Qwen2.5-VL but emphasizes reasoning steps ("thinking") about images and videos. The initial release was QVQ-72B-Preview, demonstrating the concept of a model that can not only describe an image but also reason about it to solve complex tasks.
QVQ-Max is the successor to QVQ-Preview and is accessible via API only. Key points:

- It is built on Qwen2.5-VL. QVQ-Max employs optimizations like MoE for enhanced scalability and efficiency.
- Since QVQ-Max is proprietary, you can fall back to the open-weight Qwen2.5-VL if needed.

Here is another demo prepared by the Qwen team, showing the capabilities of QVQ-Max.
### QwQ: reinforced textual reasoning

QwQ (Qwen with Questions) is a specialized model in the Qwen family focused on improving reasoning via reinforcement learning. Based on the Qwen2.5-32B model, QwQ underwent intensive training (including multi-stage RL) to enhance its performance on challenging reasoning tasks across domains like math and coding. The result is a model that can tackle complex questions with deeper thinking and better accuracy than the base model.
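Usage looks the same as for any other Qwen instruct model; the practical difference is that QwQ emits a long chain of thought before the final answer. A minimal sketch follows (the `</think>` separator is how current QwQ checkpoints appear to delimit reasoning, but treat that detail as an assumption worth verifying against the model card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "If 3 typists type 3 pages in 3 minutes, how many typists are needed to type 18 pages in 6 minutes?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Reasoning models need a generous token budget for the thinking phase.
output_ids = model.generate(**inputs, max_new_tokens=4096)
response = tokenizer.decode(output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)

# Assumption: reasoning and answer are separated by a closing </think> tag;
# if the tag is absent, rpartition just returns the whole response.
_, _, final_answer = response.rpartition("</think>")
print(final_answer.strip())
```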
## Qwen2.5-Max - the flagship

Qwen2.5-Max is the large-scale Mixture-of-Experts (MoE) version of Qwen, representing Alibaba's most advanced LLM in the Qwen2.5 generation. It scales the model capacity dramatically (hundreds of billions of parameters) while using experts to keep inference efficient. Qwen2.5-Max is positioned to compete with top-tier, GPT-4-class systems in capability.
## Qwen-Plus and Qwen-Turbo

On Alibaba Cloud Model Studio we can find two more flagship models: Qwen-Plus and Qwen-Turbo. There is not much information available about them, but they are positioned as lighter and faster variants of the Qwen2.5-Max model, and both are available via API only. You can find them on OpenRouter as well: Qwen-Plus, Qwen-Turbo.
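Since these models are API-only, access goes through Alibaba's OpenAI-compatible endpoint. A minimal sketch (the international DashScope base URL below is taken from Alibaba's docs, but double-check it and the exact model name for your region):

```python
import os

from openai import OpenAI  # pip install openai

# Alibaba Cloud Model Studio exposes an OpenAI-compatible API.
client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

response = client.chat.completions.create(
    model="qwen-plus",  # or "qwen-turbo" for the lighter variant
    messages=[{"role": "user", "content": "Summarize the Qwen model family in two sentences."}],
)
print(response.choices[0].message.content)
```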
Here is some key available information about them:
- Qwen-Plus: a balanced performance/cost option built on the Qwen2.5 base model.
- Qwen-Turbo: a faster, lower-cost option, also built on the Qwen2.5 base model.

This table summarizes the key characteristics of the main Qwen models discussed (excluding the Qwen-Plus and Qwen-Turbo models):
Model | Parameter sizes | Primary modality | Context window | Specialization | License | Multilingual |
---|---|---|---|---|---|---|
Qwen2.5 (Base) | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Text | 128k (Input) / 8k (Output) | Foundational, General Purpose | Apache 2.0 | 29+ Languages |
Qwen2.5-Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Text (Code) | ~Tens of Thousands | Code Generation & Assistance | Apache 2.0 | Many Prog. Langs + Eng/Chi |
Qwen2.5-Math | 1.5B, 7B, 72B | Text (Math) | ~128k | Mathematical Reasoning, Tool Use (TIR) | Apache 2.0 | English & Chinese |
Qwen2.5-VL | 3B, 7B, 32B, 72B | Image/Video -> Text | Long Video + Large Text | Vision-Language Understanding | Apache 2.0 | Yes (Eng/Chi focused) |
Qwen2.5-Omni | 7B | Text/Image/Audio/Video -> Text/Speech | Streaming / Real-time | End-to-End Multimodal Interaction | Apache 2.0 | Yes (Speech Eng/Chi) |
QVQ-Max | ~72B | Image/Video -> Text (w/ Reasoning) | Extended Visual & Text | Deep Visual Reasoning (CoT) | Proprietary API | Yes (Eng/Chi focused) |
QwQ | 32B | Text (Reasoning) | ~32k | Reinforced Reasoning (Math/Logic/Code) | Apache 2.0 | Yes (Eng/Chi) |
Qwen2.5-Max | ~325B (MoE) | Text | 32k | Flagship Scale & Performance (MoE) | Proprietary API | Yes (Broad) |
(Note: Context window sizes can sometimes vary based on specific implementation or fine-tuning. The table provides typical or maximum advertised values.)
Qwen Chat is a chat UI for the Qwen family of models. It allows users to interact with the models in a conversational manner, making it easy to test and explore their capabilities for free.
If you want to try out the models locally, you can find them on Hugging Face and Ollama.
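For a quick local test, Ollama is probably the lowest-friction route, and it also exposes an OpenAI-compatible endpoint, so the same client code as above works. A sketch, assuming the `qwen2.5:7b` tag has already been pulled:

```python
from openai import OpenAI  # pip install openai

# Ollama serves an OpenAI-compatible API on localhost by default;
# the api_key is required by the client but ignored by Ollama.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="qwen2.5:7b",  # run `ollama pull qwen2.5:7b` first
    messages=[{"role": "user", "content": "Give me three facts about the Qwen model family."}],
)
print(response.choices[0].message.content)
```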
The Qwen family from Alibaba Cloud presents a compelling and versatile suite of large language models. With a strong emphasis on open-source releases for many core and specialized models, we have access to genuinely powerful AI capabilities: from the foundational Qwen2.5 suitable for general tasks, to the specialized Coder and Math variants, the advanced multimodal Omni and VL models, the deep-reasoning QVQ and QwQ models, and the enterprise-scale Qwen2.5-Max. There is likely a Qwen model well-suited for your application needs 🤞