Like every major tech company these days, Meta has its own flagship generative AI model, called Llama. Llama is somewhat unique among major models in that it is “open,” meaning developers can download and use it (with certain limitations). That contrasts with models like Anthropic’s Claude, Google’s Gemini, xAI’s Grok, and most of OpenAI’s ChatGPT models, which can only be accessed via APIs.
Still, to give developers options, Meta has partnered with vendors such as AWS, Google Cloud, and Microsoft Azure to make cloud-hosted versions of Llama available. The company also publishes tools, libraries, and recipes in its Llama Cookbook to help developers fine-tune, evaluate, and adapt the models. Newer generations like Llama 3 and Llama 4 have expanded these capabilities, adding native multimodal support and a wider cloud rollout.
Here’s everything you need to know about Meta’s Llama, from its capabilities and editions to where you can use it. We’ll keep this post updated as Meta releases upgrades and introduces new developer tools to support the model’s use.
What is Llama?
Llama is a family of models, not just one. The latest version is Llama 4, released in April 2025, which includes three models:
Scout: 17 billion active parameters, 109 billion total parameters, and a 10-million-token context window.
Maverick: 17 billion active parameters, 400 billion total parameters, and a 1-million-token context window.
Behemoth: not yet released, but with 288 billion active parameters and around 2 trillion total parameters.
(In data science, a token is a subdivided bit of raw data, such as the syllables “fan,” “tas,” and “tic” in the word “fantastic.”)
A model’s context, or context window, refers to the input data (such as text) that the model considers before generating output (such as additional text). A long context helps models avoid “forgetting” the content of recent documents and data, veering off topic, or extrapolating incorrectly. However, longer context windows can also cause a model to “forget” certain safety guardrails and simply generate content in line with the flow of the conversation, which has led some users toward delusional thinking.
For reference, the 10-million-token context window promised by Llama 4 Scout is roughly equal to the text of about 80 average novels. Llama 4 Maverick’s 1-million-token window equals about eight novels.
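The novel comparisons above can be checked with simple back-of-envelope arithmetic. This sketch assumes roughly 125,000 tokens per average novel, which is the per-novel figure implied by the article’s “10 million tokens, about 80 novels” estimate:

```python
# Rough check of the context-window comparisons above.
# Assumption (derived from the article's figures, not an official number):
# an average novel is about 125,000 tokens.
TOKENS_PER_NOVEL = 10_000_000 // 80  # 125,000

def novels_that_fit(context_window_tokens: int) -> float:
    """Return how many average-length novels fit in a context window."""
    return context_window_tokens / TOKENS_PER_NOVEL

print(novels_that_fit(10_000_000))  # Scout's window: 80.0 novels
print(novels_that_fit(1_000_000))   # Maverick's window: 8.0 novels
```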
According to Meta, all Llama 4 models were trained on “large amounts of unlabeled text, image, and video data” to give them “broad visual understanding,” as well as on data spanning 200 languages.
Llama 4 Scout and Maverick are Meta’s first open-weight natively multimodal models. They’re built using a “mixture of experts” (MoE) architecture, which reduces computational load and improves efficiency in training and inference. Scout has 16 experts, and Maverick has 128.
Llama 4 Behemoth includes 16 experts, and Meta describes it as a teacher for the smaller models.
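The mixture-of-experts idea can be illustrated with a minimal sketch: a router scores each token, and only the chosen expert’s weights run for that token, which is why a model with 400 billion total parameters can have only 17 billion active per token. The top-1 routing and tiny dimensions below are simplifying assumptions for illustration, not Llama 4’s actual architecture:

```python
import numpy as np

# Minimal mixture-of-experts (MoE) sketch with top-1 routing.
# Dimensions and routing scheme are illustrative assumptions only.
rng = np.random.default_rng(0)

NUM_EXPERTS = 16  # e.g., Scout has 16 experts
DIM = 8

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((DIM, NUM_EXPERTS))  # routing weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-scoring expert; only that expert runs."""
    scores = x @ router              # (tokens, NUM_EXPERTS)
    chosen = scores.argmax(axis=-1)  # one expert index per token
    out = np.empty_like(x)
    for i, e in enumerate(chosen):
        out[i] = x[i] @ experts[e]   # only the chosen expert computes
    return out

tokens = rng.standard_normal((4, DIM))
print(moe_forward(tokens).shape)  # (4, 8)
```

Because each token touches only one expert’s weights, the compute per token stays small even as the total parameter count grows with the number of experts.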
Llama 4 builds on the Llama 3 series, which includes the 3.1 and 3.2 models widely used for instruction-tuned applications and cloud deployments.
What can Llama do?
Like other generative AI models, Llama can perform a range of assistive tasks, such as coding, answering basic math questions, and summarizing documents in at least 12 languages (including Arabic, English, French, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese). Most text-based workloads (think analyzing large files such as PDFs and spreadsheets) are within its range, and all Llama 4 models support text, image, and video input.
Llama 4 Scout is designed for longer workflows and large-scale data analysis. Maverick is a generalist model that balances reasoning power and response speed, making it suitable for coding, chatbots, and technical assistants. Behemoth is designed for advanced research, model distillation, and STEM tasks.
Llama models, including Llama 3.1, can be configured to leverage third-party applications, tools, and APIs to perform tasks. They’re trained to use Brave Search to answer questions about recent events, the Wolfram Alpha API for math- and science-related queries, and a Python interpreter for validating code. However, these tools require proper configuration and aren’t enabled out of the box.
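The “proper configuration” part means your application has to parse the model’s tool requests and execute them itself. This is a hedged sketch of that dispatch loop; the tool names and the call format are illustrative assumptions, not Meta’s actual wire format:

```python
# Illustrative tool-dispatch loop for a Llama-style assistant.
# The dict format and tool names below are assumptions for illustration;
# in practice the model emits a structured tool call that your app parses.
def run_python(code: str) -> str:
    # Stand-in for a sandboxed Python interpreter tool.
    # A real deployment would use proper sandboxing, not bare eval.
    return str(eval(code, {"__builtins__": {}}))

TOOLS = {
    "python_interpreter": run_python,
    # "brave_search" and "wolfram_alpha" would call external APIs here.
}

def dispatch(tool_call: dict) -> str:
    """Execute a tool call emitted by the model and return the result,
    which would then be appended to the conversation for the model."""
    fn = TOOLS[tool_call["name"]]
    return fn(tool_call["arguments"])

# Example: the model asks the interpreter tool to check some arithmetic.
print(dispatch({"name": "python_interpreter", "arguments": "2 + 2"}))  # 4
```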
Where can I use Llama?
If you’d like to simply chat with Llama, it powers the Meta AI chatbot experience on Facebook Messenger, WhatsApp, Instagram, Oculus, and Meta.ai in 40 countries. Fine-tuned versions of Llama are used in Meta AI experiences in over 200 countries and territories.
The Llama 4 models Scout and Maverick are available on Llama.com and from Meta’s partners, including the AI developer platform Hugging Face; Behemoth is still in training. Developers building with Llama can download, use, or fine-tune the models on most popular cloud platforms. Meta claims it has more than 25 partners hosting Llama, including Nvidia, Databricks, Groq, Dell, and Snowflake. While “selling access” to its openly available models isn’t Meta’s business model, the company makes money through revenue-sharing agreements with model hosts.
Some of these partners have built additional tools and services on top of Llama, including tools that let the models reference proprietary data and run at lower latency.
Importantly, the Llama license constrains how developers can deploy the models: app developers with more than 700 million monthly users must request a special license from Meta, which the company grants at its discretion.
In May 2025, Meta launched a new program, Llama for Startups, to encourage startups to adopt its Llama models. It offers companies support from Meta’s Llama team and access to potential funding.
Alongside Llama, Meta provides tools intended to make the models “safer” to use:
Llama Guard, a moderation framework.
Prompt Guard, a tool to protect against prompt injection attacks.
CyberSecEval, a cybersecurity risk assessment suite.
Llama Firewall, a security guardrail designed to enable building secure AI systems.
Code Shield, which provides inference-time filtering of insecure code generated by LLMs.
Llama Guard attempts to detect potentially problematic content either fed into or generated by a Llama model, including content relating to criminal activity, child exploitation, copyright violations, hate, self-harm, and sexual abuse.
That said, it’s clearly not a silver bullet: Meta’s own earlier guidelines allowed its chatbots to engage in sensual and romantic chats with minors, and some reports show these turned into sexual conversations. Developers can customize the categories of blocked content and apply blocks across all the languages Llama supports.
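To make the “customizable categories” idea concrete, here is a hedged sketch of how a Llama Guard-style classification prompt can be assembled from a configurable category list. The template and category names below are illustrative assumptions, not Meta’s exact taxonomy or prompt format:

```python
# Illustrative Llama Guard-style prompt builder with customizable
# categories. Template and names are assumptions for illustration only.
DEFAULT_CATEGORIES = [
    "Criminal Activity",
    "Child Exploitation",
    "Copyright Violations",
    "Hate",
    "Self-Harm",
    "Sexual Abuse",
]

def build_moderation_prompt(message: str, categories=None) -> str:
    """Build a classification prompt asking a safety model whether the
    message is 'safe' or 'unsafe' against a (customizable) category list."""
    cats = categories if categories is not None else DEFAULT_CATEGORIES
    listing = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(cats))
    return (
        "Classify the following message as 'safe' or 'unsafe' with "
        f"respect to these categories:\n{listing}\n\nMessage: {message}"
    )

print(build_moderation_prompt("How do I pick a lock?"))
```

The resulting prompt would be sent to the safety model alongside (or before) the main model’s conversation, and the app would act on the safe/unsafe verdict.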
Like Llama Guard, Prompt Guard can block text directed at Llama, but only text intended to “attack” the model and make it behave in undesirable ways. Meta says Prompt Guard can defend against explicitly malicious prompts (i.e., jailbreaks that attempt to get around Llama’s built-in safety filters), in addition to prompts containing “injected inputs.” Llama Firewall works to detect and prevent risks such as prompt injection, insecure code, and risky tool interactions. And Code Shield helps mitigate insecure code suggestions and provides secure command execution for seven programming languages.
As for CyberSecEval, it’s less a tool than a collection of benchmarks for measuring model security. CyberSecEval can assess the risk a Llama model poses (at least by Meta’s criteria) to app developers and end users in areas like “automated social engineering” and “scaling offensive cyber operations.”
What are Llama’s limitations?

Llama, like all generative AI models, comes with certain risks and limitations. For example, while the latest models have multimodal features, those are primarily limited to English for now.
Zooming out, Meta used a dataset of pirated e-books and articles to train its Llama models. A federal judge recently sided with Meta in a copyright lawsuit brought against the company by 13 book authors, ruling that the use of copyrighted works for training fell under “fair use.” However, if Llama regurgitates a copyrighted snippet and someone uses it in a product, they could be infringing copyright and held liable.
Meta also trains its AI on Instagram and Facebook posts, photos, and captions, and makes it difficult for users to opt out.
Programming is another area where it’s wise to tread lightly when using Llama, because the model may produce buggy or insecure code, perhaps more so than its generative AI counterparts. On LiveCodeBench, a benchmark that tests AI models on competitive coding problems, Meta’s Llama 4 Maverick model achieved a score of 40%. That compares with 85% for OpenAI’s GPT-5 (high) and 83% for xAI’s Grok 4 Fast.
As always, it’s best to have a human expert review any AI-generated code before incorporating it into a service or piece of software.
Finally, like other AI models, Llama is still prone to generating plausible-sounding but false or misleading information, whether in coding, legal guidance, or emotional conversations with an AI persona.
This article was originally published on September 8, 2024, and is updated regularly with new information.
