Orca-AgentInstruct: Agentic flows can be effective synthetic-data generators
Our work on Orca and Orca 2 demonstrated the power of using synthetic data for the post-training of small language models and getting them to levels of performance previously found only in much larger language models. Orca-AgentInstruct is another step in this direction, where we explore using agentic flows to generate diverse and high-quality data at scale. Orca-AgentInstruct is an agentic solution for synthetic-data generation. By leveraging an agentic framework, AgentInstruct can generate tailored datasets, comprising both prompts and responses, from