Decoding Strategies in Large Language Models

In the fascinating world of large language models (LLMs), much attention is given to model architectures, data processing, and optimization. However, decoding strategies like beam search, which play a crucial role in text generation, are often overlooked. In this article, we will explore how LLMs generate text by delving into the mechanics of greedy search and beam search, as well as sampling techniques with top-k and nucleus sampling. By the conclusion of this article, you’ll not only understand these decoding […]
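
As a quick preview, here is a minimal sketch of these four strategies using Hugging Face transformers; the prompt and the small "gpt2" model are placeholders, not taken from the article.

```python
# Contrasting greedy search, beam search, top-k, and nucleus sampling.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("I have a dream", return_tensors="pt")

# Greedy search: always pick the single most probable next token.
greedy = model.generate(**inputs, max_new_tokens=30)

# Beam search: keep the 5 most probable partial sequences at each step.
beam = model.generate(**inputs, max_new_tokens=30, num_beams=5)

# Top-k sampling: draw the next token from the 50 most probable candidates.
top_k = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_k=50)

# Nucleus (top-p) sampling: draw from the smallest token set whose
# cumulative probability exceeds 0.9 (top_k=0 disables the top-k filter).
nucleus = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.9, top_k=0)

for name, out in [("greedy", greedy), ("beam", beam), ("top-k", top_k), ("nucleus", nucleus)]:
    print(f"{name}: {tokenizer.decode(out[0], skip_special_tokens=True)}")
```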

Read more

Improve ChatGPT with Knowledge Graphs

ChatGPT has shown impressive capabilities in processing and generating human-like text. However, it is not without its imperfections. A primary concern is the model’s propensity to produce inaccurate or obsolete answers, often called “hallucinations.” The New York Times recently highlighted this issue in their article, “Here’s What Happens When Your Lawyer Uses ChatGPT.” The article describes a lawsuit in which a lawyer relied heavily on ChatGPT to prepare a court filing for a client suing an airline. The model […]

Read more

4-bit LLM Quantization with GPTQ

Recent advancements in weight quantization allow us to run massive large language models on consumer hardware, like a LLaMA-30B model on an RTX 3090 GPU. This is possible thanks to novel 4-bit quantization techniques with minimal performance degradation, like GPTQ, GGML, and NF4. In the previous article, we introduced naïve 8-bit quantization techniques and the excellent LLM.int8(). In this article, we will explore the popular GPTQ algorithm to understand how it works and implement it using the AutoGPTQ library. You […]
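
As a taste of the workflow, here is a rough sketch of quantizing a model with the AutoGPTQ library; the tiny "gpt2" stand-in and the one-sentence calibration set are illustrative only (real runs use a larger model and a few hundred calibration samples).

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "gpt2"  # stand-in; the article targets LLaMA-class models

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to 4 bits
    group_size=128,  # share quantization parameters per group of 128 weights
    desc_act=False,  # keep default column order for faster inference
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ is post-training quantization: it needs calibration samples
# to measure and minimize the error introduced by rounding weights.
examples = [tokenizer("GPTQ quantizes weights layer by layer.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized("gpt2-4bit-gptq")
```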

Read more

A Beginner’s Guide to LLM Fine-Tuning

The growing interest in Large Language Models (LLMs) has led to a surge in tools and wrappers designed to streamline their training process. Popular options include FastChat from LMSYS (used to train Vicuna) and Hugging Face’s transformers/trl libraries (used in my previous article). In addition, each big LLM project, like WizardLM, tends to have its own training script, inspired by the original Alpaca implementation. In this article, we will use Axolotl, a tool created by the OpenAccess AI Collective. We […]
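
Axolotl itself is driven by a YAML config rather than Python code; as a rough point of reference for the trl route mentioned above, here is a supervised fine-tuning sketch. The model and dataset names are placeholders, and SFTTrainer's exact arguments vary across trl versions.

```python
from datasets import load_dataset
from trl import SFTTrainer

# A small instruction dataset with a plain "text" column.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model="facebook/opt-350m",   # small stand-in base model
    train_dataset=dataset,
    dataset_text_field="text",   # column holding the training text
    max_seq_length=512,
)
trainer.train()
```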

Read more

ExLlamaV2: The Fastest Library to Run LLMs

Quantizing Large Language Models (LLMs) is the most popular approach to reduce the size of these models and speed up inference. Among these techniques, GPTQ delivers amazing performance on GPUs. Compared to unquantized models, this method uses roughly a third of the VRAM while providing a similar level of accuracy and faster generation. It became so popular that it has recently been integrated directly into the transformers library. ExLlamaV2 is a library designed to squeeze even more performance out of GPTQ. […]
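
For instance, with the optimum and auto-gptq packages installed, a GPTQ checkpoint now loads straight through transformers; the model name below is just an example of a pre-quantized 4-bit checkpoint on the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # example pre-quantized checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Quantization matters because", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```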

Read more

Using Python for Data Analysis

Data analysis is a broad term that covers a wide range of techniques for revealing the insights and relationships hidden in raw data. As you might expect, Python lends itself readily to data analysis. Once you’ve analyzed your data with Python, you can use your findings to make good business decisions, improve procedures, and even make informed predictions based on what you’ve discovered. Before you start, you should familiarize yourself with Jupyter Notebook, a popular […]
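
As a flavor of what that looks like in practice, here is a minimal pandas sketch; the sales.csv file and its region and revenue columns are hypothetical.

```python
import pandas as pd

df = pd.read_csv("sales.csv")  # load raw data into a DataFrame

print(df.head())       # inspect the first few rows
print(df.describe())   # summary statistics for numeric columns

# Relationships often surface through grouping and aggregation:
revenue_by_region = df.groupby("region")["revenue"].sum().sort_values(ascending=False)
print(revenue_by_region)
```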

Read more

Create a Tic-Tac-Toe Python Game Engine With an AI Player

A classic childhood game is tic-tac-toe, also known as noughts and crosses. It’s simple and enjoyable, and coding a version of it with Python is an exciting project for a budding programmer. Now, adding some artificial intelligence (AI) using Python can make an old favorite even more thrilling. In this comprehensive tutorial, you’ll construct a flexible game engine. This engine will include an unbeatable computer player that employs the minimax algorithm to play tic-tac-toe flawlessly. Throughout the tutorial, you’ll explore […]
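
To give a flavor of the core idea, here is a compact, self-contained minimax sketch; the board encoding and function names are illustrative, not the tutorial's actual engine.

```python
# Board: a list of 9 cells, each "X", "O", or " " (empty).
WINNING_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]

def winner(board):
    for a, b, c in WINNING_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player, maximizer):
    """Score the position from `maximizer`'s perspective, `player` to move."""
    win = winner(board)
    if win is not None:
        return 1 if win == maximizer else -1
    if " " not in board:
        return 0  # draw
    opponent = "O" if player == "X" else "X"
    scores = []
    for i, cell in enumerate(board):
        if cell == " ":
            board[i] = player                              # try the move
            scores.append(minimax(board, opponent, maximizer))
            board[i] = " "                                 # undo it
    return max(scores) if player == maximizer else min(scores)

def best_move(board, player):
    opponent = "O" if player == "X" else "X"
    best, best_score = None, -2
    for i, cell in enumerate(board):
        if cell == " ":
            board[i] = player
            score = minimax(board, opponent, player)
            board[i] = " "
            if score > best_score:
                best, best_score = i, score
    return best

board = ["X", "O", "X",
         " ", "O", " ",
         " ", " ", " "]
print(best_move(board, "X"))  # 7: blocks O's column before it completes
```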

Read more