Detailed Guide to LangChain Text Splitters with Examples
What are LangChain Text Splitters
LangChain has recently become a go-to framework for building complex pipelines around LLMs. One of its important utilities is the langchain_text_splitters package, which contains various modules for splitting large text into more manageable chunks. LangChain text splitters are typically used in a RAG architecture to chunk a large document and convert the chunks into embeddings that are stored in a vector database. Because LLMs have limited context windows, it is useful to retrieve only the relevant chunks of the document from the vector database and pass them as context at inference time.
The langchain_text_splitters package offers the following types of splitters, each suited to a different kind of text or splitting requirement.
- CharacterTextSplitter
- TokenTextSplitter
- RecursiveCharacterTextSplitter
- RecursiveJsonSplitter