Llama 3.1 Megathread - Lemmy.World

Meta has released and open-sourced Llama 3.1 in three different sizes: 8B, 70B, and 405B. This new Llama iteration brings state-of-the-art performance to open-source ecosystems. If you’ve had a chance to use Llama 3.1 in any of its variants, let us know how you like it and what you’re using it for in the comments below!

## Llama 3.1 Megathread

> For this release, we evaluated performance on over 150 benchmark datasets that span a wide range of languages. In addition, we performed extensive human evaluations that compare Llama 3.1 with competing models in real-world scenarios. Our experimental evaluation suggests that our flagship model is competitive with leading foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet. Additionally, our smaller models are competitive with closed and open models that have a similar number of parameters.

[https://lemmy.world/pictrs/image/80f49656-cfa2-460e-83e5-ff141ff43a4f.png]
[https://lemmy.world/pictrs/image/975a792c-87eb-41a3-a1e4-7f944b27fc61.png]
[https://lemmy.world/pictrs/image/387def4b-af7b-4d82-bdec-e9bb44e016eb.png]

> As our largest model yet, training Llama 3.1 405B on over 15 trillion tokens was a major challenge. To enable training runs at this scale and achieve the results we have in a reasonable amount of time, we significantly optimized our full training stack and pushed our model training to over 16 thousand H100 GPUs, making the 405B the first Llama model trained at this scale.
[https://lemmy.world/pictrs/image/633f949a-4873-4ffe-ac87-acc0d870793a.png]

---

### Official Meta News & Documentation

- https://llama.meta.com/
- https://ai.meta.com/blog/meta-llama-3-1/
- https://llama.meta.com/docs/overview
- https://llama.meta.com/llama-downloads/
- https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

See also the Llama 3 Herd of Models paper:

- https://ai.meta.com/research/publications/the-llama-3-herd-of-models/

---

### HuggingFace Download Links

#### 8B

- Meta-Llama-3.1-8B: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B
- Meta-Llama-3.1-8B-Instruct: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct
- Llama-Guard-3-8B: https://huggingface.co/meta-llama/Llama-Guard-3-8B
- Llama-Guard-3-8B-INT8: https://huggingface.co/meta-llama/Llama-Guard-3-8B-INT8

---

#### 70B

- Meta-Llama-3.1-70B: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B
- Meta-Llama-3.1-70B-Instruct: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct

---

#### 405B

- Meta-Llama-3.1-405B-FP8: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-FP8
- Meta-Llama-3.1-405B-Instruct-FP8: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
- Meta-Llama-3.1-405B: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
- Meta-Llama-3.1-405B-Instruct: https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct

---

### Getting the models

You can download the models directly from Meta or one of their download partners: Hugging Face or Kaggle. Alternatively, you can work with ecosystem partners to access the models through the services they provide. This approach can be especially useful if you want to work with the Llama 3.1 405B model.

Note: Llama 3.1 405B requires significant storage and computational resources, occupying approximately 750GB of disk storage space and necessitating two nodes on MP16 for inferencing.

Learn more at:

- https://llama.meta.com/docs/getting_the_models

[https://lemmy.world/pictrs/image/3990c7b9-f4ad-4baf-8a33-46e18d70478b.png]

---

### Running the models

#### Linux
- https://llama.meta.com/docs/llama-everywhere/running-meta-llama-on-linux/

#### Windows
- https://llama.meta.com/docs/llama-everywhere/running-meta-llama-on-windows/

#### Mac
- https://llama.meta.com/docs/llama-everywhere/running-meta-llama-on-mac/

#### Cloud
- https://llama.meta.com/docs/llama-everywhere/running-meta-llama-in-the-cloud/

---

### More guides and resources

- How to fine-tune Llama 3.1 models: https://llama.meta.com/docs/how-to-guides/fine-tuning
- Quantizing Llama 3.1 models: https://llama.meta.com/docs/how-to-guides/quantization
- Prompting Llama 3.1 models: https://llama.meta.com/docs/how-to-guides/prompting
- Llama 3.1 recipes: https://github.com/meta-llama/llama-recipes

---

### YouTube media

- Rowan Cheung - Mark Zuckerberg on Llama 3.1, Open Source, AI Agents, Safety, and more: https://www.youtube.com/watch?v=Vy3OkbtUa5k
- Matthew Berman - BREAKING: LLaMA 405b is here! Open-source is now FRONTIER!: https://www.youtube.com/watch?v=JLEDwO7JEK4
- Wes Roth - Zuckerberg goes SCORCHED EARTH… Llama 3.1 BREAKS the “AGI Industry”*: https://www.youtube.com/watch?v=QyRWqJehK7I
- 1littlecoder - How to DOWNLOAD Llama 3.1 LLMs: https://www.youtube.com/watch?v=R_vrjOkGvZ8
- Bloomberg - Inside Mark Zuckerberg’s AI Era | The Circuit: https://www.youtube.com/watch?v=YuIc4mq7zMU
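The ~750GB storage note above is easier to reason about with a quick back-of-envelope: weight storage is roughly parameter count × bytes per parameter. A tiny sketch (rule of thumb only; real checkpoints may ship multiple precisions, and inference needs extra memory for KV cache and activations on top of the weights):

```python
def approx_weight_size_gb(n_params_b: float, bits_per_param: float) -> float:
    """Rough size of the model weights alone: params * bits / 8, in GB."""
    return n_params_b * 1e9 * bits_per_param / 8 / 1e9

# How the three Llama 3.1 sizes scale across common precisions:
for name, params in [("8B", 8), ("70B", 70), ("405B", 405)]:
    for fmt, bits in [("FP16", 16), ("FP8", 8), ("4-bit", 4)]:
        print(f"Llama 3.1 {name} @ {fmt}: ~{approx_weight_size_gb(params, bits):.0f} GB")
```

For example, 405B parameters at FP16 is roughly 810 GB of weights, which is why Meta ships FP8 variants of the 405B and why multi-node setups are required for inference.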

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

https://lemmy.world/post/11335173

Direct Preference Optimization: Your Language Model is Secretly a Reward Model - Lemmy.World

Hello everyone. Today I’d like to catch up on another paper, a popular one that has pushed a new fine-tuning trend called DPO (Direct Preference Optimization). Included with the paper are a few open-source projects and code repos that support DPO training. If you are fine-tuning models, this is worth looking into!

- DPO Arxiv Paper: https://arxiv.org/abs/2305.18290
- Try fine-tuning w/ DPO using Axolotl: https://github.com/OpenAccess-AI-Collective/axolotl
- Try fine-tuning w/ DPO using Llama Factory: https://github.com/hiyouga/LLaMA-Factory
- Try fine-tuning w/ DPO using Unsloth: https://github.com/unslothai/unsloth

Now… onto the paper!

### Direct Preference Optimization: Your Language Model is Secretly a Reward Model

> While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF).

> However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model.

> In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss.
> The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.

> Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.

[https://lemmy.world/pictrs/image/c7c2e6b6-f5b6-42f9-a7d6-d5ca74196c46.png]

> Figure 1: DPO optimizes for human preferences while avoiding reinforcement learning. Existing methods for fine-tuning language models with human feedback first fit a reward model to a dataset of prompts and human preferences over pairs of responses, and then use RL to find a policy that maximizes the learned reward. In contrast, DPO directly optimizes for the policy best satisfying the preferences with a simple classification objective, fitting an implicit reward model whose corresponding optimal policy can be extracted in closed form.

[https://lemmy.world/pictrs/image/558ddb7a-939a-49b7-bea3-1f277c793b0a.png]

> Figure 2: Left. The frontier of expected reward vs. KL to the reference policy. DPO provides the highest expected reward for all KL values, demonstrating the quality of the optimization.

> Right. TL;DR summarization win rates vs. human-written summaries, using GPT-4 as evaluator. DPO exceeds PPO’s best-case performance on summarization, while being more robust to changes in the sampling temperature.

> Learning from preferences is a powerful, scalable framework for training capable, aligned language models. We have introduced DPO, a simple training paradigm for training language models from preferences without reinforcement learning.
> Rather than coercing the preference learning problem into a standard RL setting in order to use off-the-shelf RL algorithms, DPO identifies a mapping between language model policies and reward functions that enables training a language model to satisfy human preferences directly, with a simple cross-entropy loss, without reinforcement learning or loss of generality.

> With virtually no tuning of hyperparameters, DPO performs similarly or better than existing RLHF algorithms, including those based on PPO; DPO thus meaningfully reduces the barrier to training more language models from human preferences.

> Our results raise several important questions for future work. How does the DPO policy generalize out of distribution, compared with learning from an explicit reward function?

> Our initial results suggest that DPO policies can generalize similarly to PPO-based models, but more comprehensive study is needed. For example, can training with self-labeling from the DPO policy similarly make effective use of unlabeled prompts? On another front, how does reward over-optimization manifest in the direct preference optimization setting, and is the slight decrease in performance in Figure 3-right an instance of it?

> Additionally, while we evaluate models up to 6B parameters, exploration of scaling DPO to state-of-the-art models orders of magnitude larger is an exciting direction for future work. Regarding evaluations, we find that the win rates computed by GPT-4 are impacted by the prompt; future work may study the best way to elicit high-quality judgments from automated systems. Finally, many possible applications of DPO exist beyond training language models from human preferences, including training generative models in other modalities.

Read More: https://arxiv.org/pdf/2305.18290.pdf
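For intuition, the per-pair DPO objective reduces to a logistic loss on an implicit reward margin: how much more the policy prefers the chosen response over the rejected one, relative to the frozen reference model. A minimal single-pair sketch in plain Python (in a real trainer the four log-probabilities would come from summing token log-probs under the policy and reference models):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin), where the margin is
    how much more the policy (vs. the reference) prefers the chosen response."""
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # = -log sigmoid(margin)

# If policy and reference agree exactly, the margin is 0 and the loss is log 2.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Once the policy favors the chosen response more than the reference does,
# the loss drops below log 2.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0)
```

Note there is no reward model and no sampling loop here: the preference data and the reference model do all the work, with `beta` controlling how strongly the policy is pushed away from the reference.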

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

https://lemmy.world/post/10849911

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts - Lemmy.World

Hello everyone, I have another exciting Mamba paper to share, this one an MoE implementation of the state space model. For those unacquainted with Mamba, let me hit you with a double feature (take a detour through these papers/code if you don’t know what Mamba is):

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces: https://arxiv.org/abs/2312.00752
- Official Mamba GitHub: https://github.com/state-spaces/mamba
- Example Implementation - Mamba-Chat: https://github.com/havenhq/mamba-chat

Now… onto the MoE paper!

### MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

https://arxiv.org/abs/2401.04081

> Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski, Sebastian Jaszczur

> State Space Models (SSMs) have become serious contenders in the field of sequential modeling, challenging the dominance of Transformers. At the same time, Mixture of Experts (MoE) has significantly improved Transformer-based LLMs, including recent state-of-the-art open-source models.

> We propose that to unlock the potential of SSMs for scaling, they should be combined with MoE. We showcase this on Mamba, a recent SSM-based model that achieves remarkable, Transformer-like performance.

[https://lemmy.world/pictrs/image/54e9b703-8c68-4df5-9fae-2627d4a27d47.png]

> Our model, MoE-Mamba, outperforms both Mamba and Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in 2.2x fewer training steps while preserving the inference performance gains of Mamba against the Transformer.
[https://lemmy.world/pictrs/image/1ad18393-3ed3-4ce4-8113-c74538d6940e.png]
[https://lemmy.world/pictrs/image/0dbb811a-94e1-4191-8488-778908ebeeb1.png]

| Category           | Hyperparameter    | Value                                      |
|--------------------|-------------------|--------------------------------------------|
| Model              | Total Blocks      | 8 (16 in Mamba)                            |
|                    | d_model           | 512                                        |
| Feed-Forward       | d_ff              | 2048 (with Attention) or 1536 (with Mamba) |
| Mixture of Experts | d_expert          | 2048 (with Attention) or 1536 (with Mamba) |
|                    | Experts           | 32                                         |
| Attention          | n_heads           | 8                                          |
| Training           | Training Steps    | 100k                                       |
|                    | Context Length    | 256                                        |
|                    | Batch Size        | 256                                        |
|                    | LR                | 1e-3                                       |
|                    | LR Warmup         | 1% steps                                   |
|                    | Gradient Clipping | 0.5                                        |

MoE seems like the logical way to move forward with Mamba. At this point, I’m wondering: could there be anything else holding it back? Curious to see more tools and implementations compared against some of the other trending transformer-based LLM stacks.
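To make the MoE half of the design concrete: the paper interleaves Mamba layers with sparse feed-forward layers, where a router sends each token to one of many experts so per-token compute stays roughly constant as the expert count grows. Here is a toy top-1 (switch-style) router in plain Python. It only illustrates the routing idea; the gate and the "experts" below are made up for the demo and have nothing to do with the paper's actual implementation:

```python
def top1_route(gate_scores):
    """Pick the index of the expert with the highest gate score."""
    return max(range(len(gate_scores)), key=lambda i: gate_scores[i])

def moe_ffn(x, experts, gate):
    """Sparse MoE feed-forward: score all experts, but run only the winner,
    so per-token compute is independent of how many experts exist."""
    scores = gate(x)
    chosen = top1_route(scores)
    return experts[chosen](x)

# Toy demo: 4 "experts" that just scale the input by different factors,
# and a deterministic, entirely hypothetical gate function.
experts = [lambda x, k=k: [k * v for v in x] for k in range(1, 5)]
gate = lambda x: [sum(x) * (i + 1) % 7 for i in range(4)]
y = moe_ffn([1.0, 2.0], experts, gate)
```

Real routers are learned linear layers with load-balancing losses on top of this skeleton; MoE-Mamba uses 32 experts per MoE layer (see the table above).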

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

https://lemmy.world/post/10849904

Mamba: Linear-Time Sequence Modeling with Selective State Spaces - Lemmy.World

Hello everyone, I have a very exciting paper to share with you today. This came out a little while ago (like many other papers since my hiatus), so allow me to catch you up if you haven’t read it already.

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces: https://arxiv.org/abs/2312.00752
- Official Mamba GitHub: https://github.com/state-spaces/mamba
- Example Implementation - Mamba-Chat: https://github.com/havenhq/mamba-chat

### Mamba: Linear-Time Sequence Modeling with Selective State Spaces

https://arxiv.org/abs/2312.00752

> Albert Gu, Tri Dao

[https://lemmy.world/pictrs/image/9e459e17-aebd-474a-84ca-26a3646fec8a.png]

> Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module.

> Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers’ computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language.

> We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements.

> First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token.

> Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba).
> Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences.

> As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.

> On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.

> (… https://arxiv.org/ftp/arxiv/papers/2312/2312.00752.pdf) Mamba achieves state-of-the-art results on a diverse set of domains, where it matches or exceeds the performance of strong Transformer models. We are excited about the broad applications of selective state space models to build foundation models for different domains, especially in emerging modalities requiring long context such as genomics, audio, and video. Our results suggest that Mamba is a strong candidate to be a general sequence model backbone.

What are your thoughts on Mamba?
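The "selective" idea (SSM parameters as functions of the input) can be illustrated with a toy one-dimensional recurrence, where the step size depends on the current token, letting the model decide per input how much to remember or forget. This is only a sketch of the recurrence in the abstract, not the paper's multi-channel, hardware-aware parallel scan:

```python
import math

def selective_ssm_1d(xs, a=-1.0):
    """Toy 1-D selective SSM: h_t = exp(dt_t * a) * h_{t-1} + dt_t * b * x_t,
    y_t = c * h_t, where the step size dt_t is a function of the input x_t.
    That input dependence is the 'selective' part Mamba adds over a plain SSM."""
    h, ys = 0.0, []
    for x in xs:
        dt = 1.0 / (1.0 + math.exp(-x))  # input-dependent step size (sigmoid)
        b = c = 1.0                      # kept constant in this toy version
        h = math.exp(dt * a) * h + dt * b * x   # discretized state update
        ys.append(c * h)
    return ys
```

Each step costs O(1) regardless of sequence length, which is where the linear scaling comes from; the real model parallelizes this scan on GPU and runs it over many channels at once.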

Develop Alongside Local LLMs w/ Open Interpreter

https://lemmy.world/post/10810649

Develop Alongside Local LLMs w/ Open Interpreter - Lemmy.World

I don’t think this has been shared here before. Figured now is as good a time as ever. I’d like to share Open Interpreter with everyone.

## Open Interpreter

Check it out here: https://github.com/KillianLucas/open-interpreter

> Open Interpreter lets LLMs run code (Python, JavaScript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running `$ interpreter` after installing.
>
> This provides a natural-language interface to your computer’s general-purpose capabilities:
>
> - Create and edit photos, videos, PDFs, etc.
> - Control a Chrome browser to perform research
> - Plot, clean, and analyze large datasets
> - …etc.
>
> ⚠️ Note: You’ll be asked to approve code before it’s run.

### Comparison to ChatGPT’s Code Interpreter

> OpenAI’s release of Code Interpreter with GPT-4 presents a fantastic opportunity to accomplish real-world tasks with ChatGPT.
>
> However, OpenAI’s service is hosted, closed-source, and heavily restricted:
>
> - No internet access.
> - Limited set of pre-installed packages.
> - 100 MB maximum upload, 120.0 second runtime limit.
> - State is cleared (along with any generated files or links) when the environment dies.
>
> Open Interpreter overcomes these limitations by running in your local environment. It has full access to the internet, isn’t restricted by time or file size, and can utilize any package or library.
>
> This combines the power of GPT-4’s Code Interpreter with the flexibility of your local development environment.

Open Interpreter Roadmap: https://github.com/KillianLucas/open-interpreter/blob/main/docs/ROADMAP.md
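The "approve code before it’s run" flow is the key safety idea when an LLM is driving your local machine. A toy sketch of that pattern in plain Python, to show the shape of it (this is a hypothetical helper for illustration, not Open Interpreter's actual internals, which stream output and handle multiple languages):

```python
def run_with_approval(code: str, approve) -> dict:
    """Show LLM-generated code to an approve callback, and only execute it
    locally if the callback says yes."""
    if not approve(code):
        return {"ran": False, "result": None}
    scope = {}
    exec(code, scope)  # the real tool manages execution per language; demo only
    return {"ran": True, "result": scope.get("result")}

# Auto-approve for the demo; a real session would prompt y/n in the terminal.
out = run_with_approval("result = 2 + 2", approve=lambda code: True)
vetoed = run_with_approval("result = 1", approve=lambda code: False)
```

The gate sits between generation and execution, so nothing runs that you did not see first, which is exactly the trade Open Interpreter makes for giving the model unrestricted local access.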

What open-source LLMs are you using in 2024?

https://lemmy.world/post/10810338

What open-source LLMs are you using in 2024? - Lemmy.World

There has been an overwhelming number of new models hitting HuggingFace. I wanted to kick off a thread and see what open-source LLM has been your new daily driver. Personally, I am using many Mistral/Mixtral models and a few random OpenHermes fine-tunes for flavor. I was also pleasantly surprised by some of the DeepSeek models. Those were fun to test.

I believe 2024 is the year open-source LLMs will catch up with GPT-3.5 and GPT-4. We’re already most of the way there. Curious to hear what new contenders are on the block and how others feel about their performance/precision compared to other state-of-the-art closed-source models.

FOSAI 2024 - Lemmy.World

Hello everyone. I’m back! To anyone still reading - I hope you have been enjoying the rapid progress we’ve seen in the space since my hiatus. You’ll be happy to hear I’m going to be periodically cleaning up some of the outdated resources in favor of new, updated documentation, both on our frontpage and on our sidebar.

I know I also promised you all official FOSAI models on HuggingFace. I did not forget. Those are still in the pipeline. More info on that and other updates coming soon.

In the meantime, is there anything in terms of guides, resources, or notes that you’d like to see in particular? Let me know in the comments and I’ll see where it might fit on the list.

Cheers!
Blaed

Blaed's Hiatus (Part I)

https://lemmy.world/post/7425237

Blaed's Hiatus (Part I) - Lemmy.World

Hello everyone,

After some time away, I have come to the realization that I have been neglecting a few personal projects and responsibilities by prioritizing staying in the know (over building / working towards other goals I set out to accomplish before 2024). That being said, I decided it would be in my best interest to take a brief hiatus throughout the remainder of the year to tackle these tasks before they get out of hand (and no longer become a reality).

I will be sharing notes here and there, but at much less frequency due to the work I’ll be doing. Some of these projects are resources for this community; others are totally different obligations I need to attend to. You will be informed of the important updates, but I will be working mostly in the shadows - waiting and watching for the right moments to emerge.

On my long list of tasks is still getting our own fosai model on HuggingFace, which was going well until I ran out of funds. As much as I’d love to, it is no longer sustainable for me to keep paying out-of-pocket for fosai fine-tuning expenses… lol. I had a Mistral-7B fine-tune that almost completed its training - but failed at the final 4%. I had the adapter and weights semi-published, but they were unusable from whatever caused that hiccup. That’s okay though, I will be applying for grants to help get this training workflow back off the ground (this time, with those pesky GPU costs covered). If all else fails, I will turn to other methods.

I want you to know that throughout this hiatus, I am leaving the community to you guys. I want to let [email protected] [/c/[email protected]] organically grow (or slow) without my intervention. At the end of the day, I probably shouldn’t be the only one sharing content here. I’m curious to see who sticks around and who does (or doesn’t) post in my absence. Shoutout to everyone who has been sharing content; it does not go unnoticed. At least by me.
Whether content creator or casual lurker - you should know the activity of this community is not something I put a ton of expectations on, so don’t pressure yourself to try and keep this community ‘alive’ with content or comments if it doesn’t feel natural or genuine. This community is not going anywhere; I’m just taking a break. We have already succeeded at the original fosai goal I set out to achieve. Now we must spend time building and developing our futures - collectively, and individually.

If you’ve been here since the beginning - thank you for reading. Perhaps this is a good time for you to take a break from the AI news cycle too. There was much innovation throughout the year and much more yet to come. If your FOMO is getting the best of you, consider subscribing to the YouTube content creators I’ve listed in this README: https://github.com/adynblaed/hypertech-workshop

We’ll be here for all of the future’s wildest creations in this space, but taking a moment to develop yourself, be with family, (or spend time on one of your projects) is something you should consider doing if you have the ability to do so - no matter the pace of innovation. This is something I have forgotten, and something I will be reminding myself of these coming weeks.

The future is now. The future is bright. The future is H.

Blaed

What kind of content do you want to see more of?

https://lemmy.world/post/6881852

What kind of content do you want to see more of? - Lemmy.World

I have temporarily paused my weekly news reports to take stock and better gauge the content you all care about and want to see more of in this community. What sort of topics or areas of content would you like me to cover every week or so? I won’t guarantee I’ll be the best journalist in this regard, but I’d be more than happy writing or R&D’ing about any concept that is useful or interesting for one of your ideas or workflows. I am still somewhat busy brainstorming standardized workflows to fine-tune and publish a fosai model to HuggingFace, but I’m all ears between now and then. Let me know if there is something you’d like to see more of here at [email protected] [/c/[email protected]]!

Llama 2 & WizardLM Megathread

https://lemmy.world/post/6881667

Llama 2 / WizardLM Megathread - Lemmy.World

## Llama 2 & WizardLM Megathread

Starting another model megathread to aggregate resources for any newcomers. It’s been a while since I’ve had a chance to chat with some of these models, so let me know some of your favorites in the comments below. There are many to choose from - sharing your experience could help someone else decide which to download for their use-case.

Thread Models:

- Llama 2 - MetaAI: https://ai.meta.com/llama/
- WizardLM - WizardLM: https://huggingface.co/WizardLM

---

### Quantized Base Llama-2 Chat Models

Unquantized Models: https://huggingface.co/meta-llama

#### Llama-2-7b-Chat

GPTQ

- Llama-2-7b-Chat-GPTQ: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ

GGUF

- Llama-2-7b-Chat-GGUF: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF

AWQ

- Llama-2-7b-Chat-AWQ: https://huggingface.co/TheBloke/Llama-2-7b-Chat-AWQ

---

#### Llama-2-13B-chat

GPTQ

- Llama-2-13B-chat-GPTQ: https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ

GGUF

- Llama-2-13B-chat-GGUF: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF

AWQ

- Llama-2-13B-chat-AWQ: https://huggingface.co/TheBloke/Llama-2-13B-chat-AWQ

---

#### Llama-2-70B-chat

GPTQ

- Llama-2-70B-chat-GPTQ: https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ

GGUF

- Llama-2-70B-chat-GGUF: https://huggingface.co/TheBloke/Llama-2-70B-chat-GGUF

AWQ

- Llama-2-70B-chat-AWQ: https://huggingface.co/TheBloke/Llama-2-70B-chat-AWQ

---

### Quantized WizardLM Models

Unquantized Models: https://huggingface.co/WizardLM

#### WizardLM-7B-V1.0+

GPTQ

- wizardLM-7B-GPTQ: https://huggingface.co/TheBloke/wizardLM-7B-GPTQ
- WizardLM-7B-V1.0-Uncensored-GPTQ: https://huggingface.co/TheBloke/WizardLM-7B-V1.0-Uncensored-GPTQ

GGUF

- wizardLM-7B-GGUF: https://huggingface.co/TheBloke/wizardLM-7B-GGUF
- WizardLM-7B-V1.0-Uncensored-GGUF: https://huggingface.co/TheBloke/WizardLM-7B-V1.0-Uncensored-GGUF

AWQ

- WizardLM-7B-V1.0-Uncensored-AWQ: https://huggingface.co/TheBloke/WizardLM-7B-V1.0-Uncensored-AWQ

---

#### WizardLM-13B-V1.0+
GPTQ

- WizardLM-13B-V1.1-GPTQ: https://huggingface.co/TheBloke/WizardLM-13B-V1.1-GPTQ
- WizardLM-13B-V1.2-GPTQ: https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GPTQ

GGUF

- WizardLM-13B-V1.0-Uncensored-GGUF: https://huggingface.co/TheBloke/WizardLM-13B-V1.0-Uncensored-GGUF
- WizardLM-13B-V1.1-GGUF: https://huggingface.co/TheBloke/WizardLM-13B-V1.1-GGUF
- WizardLM-13B-V1.2-GGUF: https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGUF

AWQ

- WizardLM-13B-V1.0-Uncensored-AWQ: https://huggingface.co/TheBloke/WizardLM-13B-V1.0-Uncensored-AWQ
- WizardLM-13B-V1.1-AWQ: https://huggingface.co/TheBloke/WizardLM-13B-V1.1-AWQ
- WizardLM-13B-V1.2-AWQ: https://huggingface.co/TheBloke/WizardLM-13B-V1.2-AWQ

---

#### WizardLM-30B-V1.0+

GPTQ

- WizardLM-30B-uncensored-GPTQ: https://huggingface.co/TheBloke/WizardLM-30B-uncensored-GPTQ
- WizardLM-Uncensored-SuperCOT-StoryTelling-30B-GPTQ: https://huggingface.co/TheBloke/WizardLM-Uncensored-SuperCOT-StoryTelling-30B-GPTQ
- WizardLM-33B-V1.0-Uncensored-GPTQ: https://huggingface.co/TheBloke/WizardLM-33B-V1.0-Uncensored-GPTQ

GGUF

- WizardLM-30B-GGUF: https://huggingface.co/TheBloke/WizardLM-30B-GGUF
- WizardLM-Uncensored-SuperCOT-StoryTelling-30B-GGUF: https://huggingface.co/TheBloke/WizardLM-Uncensored-SuperCOT-StoryTelling-30B-GGUF
- WizardLM-33B-V1.0-Uncensored-GGUF: https://huggingface.co/TheBloke/WizardLM-33B-V1.0-Uncensored-GGUF

AWQ

- WizardLM-Uncensored-SuperCOT-StoryTelling-30B-AWQ: https://huggingface.co/TheBloke/WizardLM-Uncensored-SuperCOT-StoryTelling-30B-AWQ
- WizardLM-33B-V1.0-Uncensored-AWQ: https://huggingface.co/TheBloke/WizardLM-33B-V1.0-Uncensored-AWQ

---

### Llama 2 Resources

https://www.philschmid.de/llama-2

> LLaMA 2 is a large language model developed by Meta and is the successor to LLaMA 1. LLaMA 2 is available for free for research and commercial use through providers like AWS, Hugging Face, and others.
> LLaMA 2 pretrained models are trained on 2 trillion tokens and have double the context length of LLaMA 1. Its fine-tuned models have been trained on over 1 million human annotations.

### Llama 2 Benchmarks

https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard

> Llama 2 shows strong improvements over prior LLMs across diverse NLP benchmarks, especially as model size increases: On well-rounded language tests like MMLU and AGIEval, Llama-2-70B scores 68.9% and 54.2% - far above MPT-7B, Falcon-7B, and even the 65B Llama 1 model.

### Llama 2 Tutorials

Tutorials by James Briggs (https://www.youtube.com/@jamesbriggs/videos) are quick, hands-on ways for you to experiment with Llama 2 workflows. See also a poor man’s guide to fine-tuning Llama 2: https://duarteocarmo.com/blog/fine-tune-llama-2-telegram

Check out Replicate if you want to host Llama 2 with an easy-to-use API: https://replicate.com/blog/run-llama-2-with-an-api

---

Did I miss any models? What are some of your favorites? Which family/foundation/fine-tuning should we cover next?
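Since most of the download links in this thread are GPTQ/GGUF/AWQ quantizations, here is the basic idea they all build on: round weights to a small integer grid and keep one floating-point scale per group. A toy sketch (real formats layer per-group scales, error correction, or activation-aware scaling on top of this):

```python
def quantize_symmetric(weights, bits=4):
    """Toy symmetric round-to-nearest quantization of one weight group:
    store small integer codes plus a single floating-point scale."""
    qmax = 2 ** (bits - 1) - 1                          # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid scale 0 for all-zero groups
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate weights: each code times the shared scale."""
    return [v * scale for v in q]

weights = [0.5, -1.0, 0.25]
q, scale = quantize_symmetric(weights)   # three 4-bit codes + one fp scale
restored = dequantize(q, scale)          # close to, but not equal to, weights
```

The storage win is that each weight shrinks to a few bits while the per-group error stays within about half a scale step, which is why 4-bit variants of 7B-70B models fit on consumer GPUs with only a modest quality hit.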