Like every big tech company these days, Meta has its own flagship generative AI model, called Llama. Llama is somewhat unique among major models in that it's "open," meaning developers can download and use it (with certain restrictions). That's in contrast to models like Anthropic's Claude, Google's Gemini, xAI's Grok, and most of OpenAI's GPT models, which are accessible only via API.
That said, to give developers more choice, Meta has also partnered with vendors such as AWS, Google Cloud, and Microsoft Azure to make cloud-hosted versions of Llama available. In addition, the company publishes tools, libraries, and recipes in its Llama Cookbook to help developers fine-tune, evaluate, and adapt the models. Newer generations like Llama 3 and Llama 4 expand these capabilities to include native multimodal support and broader cloud availability.
Here's everything you need to know about Meta's Llama, from its capabilities and editions to where you can use it. We'll keep this post updated as Meta releases upgrades and introduces new developer tools to support the models' use.
What is Llama?
Llama is a family of models, not just one. The latest version is Llama 4, released in April 2025, which includes three models:
- Scout: 17 billion active parameters, 109 billion total parameters, and a context window of 10 million tokens.
- Maverick: 17 billion active parameters, 400 billion total parameters, and a 1-million-token context window.
- Behemoth: Not yet released, but it has 288 billion active parameters and 2 trillion total parameters.
(In data science, tokens are subdivided bits of raw data, like the syllables "fan," "tas," and "tic" in the word "fantastic.")
A model's context, or context window, refers to the input data (e.g., text) the model considers before producing output (e.g., additional text). Long contexts keep a model from "forgetting" the contents of recent documents and data, veering off topic, and extrapolating incorrectly. However, longer context windows can also cause a model to "forget" certain safety guardrails and produce content that simply stays in line with the conversation, a dynamic that has led some users toward delusional thinking.
For reference, Llama 4 Scout's promised 10-million-token context window is roughly equivalent to the text of about 80 average novels. Llama 4 Maverick's 1-million-token context window equals about eight novels.
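The novel comparison above can be reproduced with some back-of-the-envelope arithmetic. The conversion factors below (roughly 0.75 English words per token, roughly 94,000 words per average novel) are illustrative assumptions, not figures Meta has published:

```python
# Rough conversion from context-window size to "novels' worth of text".
# Both constants are hypothetical averages used only for illustration.
WORDS_PER_TOKEN = 0.75    # ~3/4 of an English word per token
WORDS_PER_NOVEL = 94_000  # assumed length of an "average" novel

def tokens_to_novels(context_tokens: int) -> float:
    """Estimate how many average-length novels fit in a context window."""
    words = context_tokens * WORDS_PER_TOKEN
    return words / WORDS_PER_NOVEL

print(round(tokens_to_novels(10_000_000)))  # Llama 4 Scout: ~80 novels
print(round(tokens_to_novels(1_000_000)))   # Llama 4 Maverick: ~8 novels
```

Under these assumptions, the 10-million-token window works out to about 7.5 million words, or roughly 80 novels, matching the figure above.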
According to Meta, all Llama 4 models were trained on "large amounts of unlabeled text, image, and video data" spanning 200 languages to give them "broad visual understanding."
Llama 4 Scout and Maverick are Meta's first open-weight multimodal models. They're built using a "mixture of experts" (MoE) architecture, which reduces computational load and improves training and inference efficiency. Scout has 16 experts, and Maverick has 128 experts.
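The core MoE idea, that a router picks one (or a few) experts per input so most parameters stay idle on any given token, can be sketched in a few lines. This is a minimal toy with random weights, not Meta's implementation; real MoE layers sit inside transformer blocks and are trained end to end:

```python
import math
import random

# Toy mixture-of-experts layer: a router scores all experts for an input,
# and only the top-scoring expert's parameters are actually used.
random.seed(0)
DIM, NUM_EXPERTS = 4, 16  # 16 experts, Scout-like

# Each expert is a small linear map (DIM x DIM weight matrix, random toy values).
experts = [[[random.gauss(0, 1) for _ in range(DIM)] for _ in range(DIM)]
           for _ in range(NUM_EXPERTS)]
# The router holds one scoring vector per expert.
router = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def moe_forward(x):
    # 1. Score every expert and normalize with a softmax.
    scores = [dot(w, x) for w in router]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    probs = [e / sum(exps) for e in exps]
    # 2. Top-1 routing: pick the single best expert for this input.
    k = max(range(NUM_EXPERTS), key=lambda i: probs[i])
    # 3. Only that expert's weights participate in the forward pass.
    return k, [dot(row, x) for row in experts[k]]

chosen, out = moe_forward([1.0, 0.5, -0.3, 0.2])
print(f"routed to expert {chosen}, output has {len(out)} dimensions")
```

This is why a model like Maverick can have 400 billion total parameters but only 17 billion active ones: for each token, the router activates only a small slice of the network.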
Llama 4 Behemoth includes 16 experts, and Meta describes it as a teacher for its smaller models.
Llama 4 builds on the Llama 3 series, which includes the widely used 3.1 and 3.2 models for instruction-tuned applications and cloud deployments.
What can Llama do?
Like other generative AI models, Llama can perform a range of assistive tasks, such as coding and answering basic math questions, and can summarize documents in at least 12 languages (Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese). Most text-based workloads, such as analyzing large files like PDFs and spreadsheets, are within its wheelhouse, and all Llama 4 models support text, image, and video input.
Llama 4 Scout is designed for longer workflows and large-scale data analysis. Maverick is a generalist model that does a good job of balancing reasoning power and response speed, making it suitable for coding, chatbots, and technical assistants. Behemoth is designed for advanced research, model distillation, and STEM tasks.
Llama models, including Llama 3.1, can be configured to leverage third-party applications, tools, and APIs to perform tasks. They're trained to use Brave Search to answer questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. However, these tools require proper configuration and aren't enabled out of the box.
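The tool-use loop described above follows a common pattern: the model emits a structured tool call, the host application executes the matching tool, and the result is fed back to the model. The sketch below is a hypothetical dispatcher; the tool names, stub functions, and JSON shape are illustrative, not Llama's actual tool-calling format:

```python
import json

def search_web(query: str) -> str:
    # Stand-in for a real Brave Search API call.
    return f"top results for {query!r}"

def run_python(code: str) -> str:
    # Stand-in for a sandboxed Python interpreter.
    return "code executed"

# Registry mapping tool names the model may emit to host-side functions.
TOOLS = {"search_web": search_web, "run_python": run_python}

def dispatch(model_output: str) -> str:
    """Parse a model-emitted tool call and run the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["tool"])
    if fn is None:
        return f"error: unknown tool {call['tool']!r}"
    return fn(**call["arguments"])

# A model configured for tool use would emit something like this:
print(dispatch('{"tool": "search_web", "arguments": {"query": "latest Llama release"}}'))
```

The key point from the paragraph above still applies: nothing runs automatically; the host application has to wire each tool up and decide what the model is allowed to call.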
Where can I use Llama?
If you simply want to chat with Llama, it powers the Meta AI chatbot experience on Facebook Messenger, WhatsApp, Instagram, Oculus, and Meta.ai in 40 countries. Fine-tuned versions of Llama are used in Meta AI experiences in over 200 countries and territories.
The Llama 4 models Scout and Maverick are available on Llama.com and from Meta's partners; Behemoth is still in training. Developers building with Llama can download, use, or fine-tune the models on most popular cloud platforms. Meta claims more than 25 partners host Llama, including Nvidia, Databricks, Groq, Dell, and Snowflake. And while selling access to its openly available models isn't Meta's business model, the company does make money through revenue-sharing agreements with model hosts.
Some of these partners have built additional tools and services on top of Llama, including tools that let the models reference proprietary data and run at lower latency.
Importantly, the Llama license constrains how developers can deploy the models: app developers with more than 700 million monthly users must request a special license from Meta, which the company grants at its discretion.
In May 2025, Meta launched a new program, Llama for Startups, to encourage startups to adopt its models. It gives businesses access to support and potential funding from Meta's Llama team.
Alongside Llama, Meta provides tools intended to make the models "safer" to use:
- Llama Guard, a moderation framework.
- Prompt Guard, a tool to protect against prompt injection attacks.
- CyberSecEval, a cybersecurity risk assessment suite.
- LlamaFirewall, a security guardrail designed to enable building secure AI systems.
- Code Shield, which supports inference-time filtering of insecure code generated by LLMs.
Llama Guard attempts to detect potentially problematic content sourced or produced by Llama models, including content related to criminal activity, child exploitation, copyright violations, hatred, self-harm, and sexual abuse.
That said, it's clearly not a silver bullet: Meta's own earlier guidelines permitted its chatbots to engage in sensual and romantic chats with minors, which some reports indicate turned into sexual conversations. Developers can customize the categories of blocked content and apply those blocks to all languages Llama supports.
Like Llama Guard, Prompt Guard can block text directed at Llama, but only text intended to "attack" the model and cause it to behave in undesirable ways. In addition to prompts containing "injected inputs," Meta claims Prompt Guard can defend against explicitly malicious prompts (i.e., jailbreaks that attempt to circumvent Llama's built-in safety filters). LlamaFirewall works to detect and prevent risks such as prompt injection, insecure code, and risky tool interactions. Code Shield helps mitigate insecure code suggestions and provides secure command execution across seven programming languages.
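Conceptually, these guardrails wrap the model on both sides: a classifier screens the input before the main model sees it and screens the output before the user does. The sketch below illustrates that flow with a toy keyword classifier; the real Llama Guard and Prompt Guard are themselves fine-tuned classifier models, not keyword filters:

```python
# Illustrative two-sided moderation gate in the spirit of Llama Guard /
# Prompt Guard. The keyword-based classify() is a hypothetical stand-in.
BLOCKED_TOPICS = {"self-harm", "exploit"}  # illustrative category keywords

def classify(text: str) -> str:
    """Toy classifier: 'unsafe' if any blocked keyword appears."""
    lowered = text.lower()
    return "unsafe" if any(t in lowered for t in BLOCKED_TOPICS) else "safe"

def guarded_chat(prompt: str, model) -> str:
    if classify(prompt) == "unsafe":   # input-side guardrail (Prompt Guard role)
        return "[blocked: input flagged]"
    reply = model(prompt)
    if classify(reply) == "unsafe":    # output-side guardrail (Llama Guard role)
        return "[blocked: output flagged]"
    return reply

echo_model = lambda p: f"you said: {p}"  # stand-in for an actual Llama call
print(guarded_chat("hello there", echo_model))
print(guarded_chat("how to exploit a server", echo_model))
```

The design point is that moderation runs outside the model itself, so developers can tune the blocked categories without retraining the underlying Llama model.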
As for CyberSecEval, it's less a tool than a collection of benchmarks for measuring model security. CyberSecEval can assess (at least by Meta's criteria) the risk a Llama model poses to app developers and end users in areas such as "automated social engineering" and "scaling offensive cyber operations."
Llama's limitations

Llama, like all generative AI models, comes with certain risks and limitations. For example, while the latest models have multimodal capabilities, those capabilities are for now primarily limited to English.
Zooming out, Meta used a dataset of pirated e-books and articles to train its Llama models. A federal judge recently sided with Meta in a copyright lawsuit brought against the company by 13 book authors, ruling that the use of copyrighted works for training amounted to "fair use." However, if Llama regurgitates a copyrighted snippet and someone uses it in a product, that person could infringe copyright and be held liable.
Meta also trains its AI on Instagram and Facebook posts, photos, and captions, and makes it difficult for users to opt out.
Programming is another area where it's wise to tread lightly when using Llama, since Llama is perhaps more likely than its generative AI counterparts to produce buggy or insecure code. On LiveCodeBench, a benchmark that tests AI models on competitive coding problems, Meta's Llama 4 Maverick achieved a score of 40%, compared with 85% for OpenAI's GPT-5 and 83% for xAI's Grok 4 Fast.
As always, it’s best to have human experts review AI-generated code before incorporating it into your service or software.
Finally, like other AI models, Llama can still produce plausible-sounding but false or misleading information, whether in coding, legal guidance, or emotional conversations with AI personas.
This was originally published on September 8, 2024 and will be updated regularly with new information.