Evol-Instruct

From Robowaifu Institute of Technology
Revision as of 14:38, 28 April 2023 by RobowaifuDev (talk | contribs)

Evol-Instruct: Mass-Producing Open-Domain Instruction Data with Varying Levels of Complexity using Large Language Models is a research paper published on the arXiv pre-print server in 2023. The paper proposes a method for creating large amounts of instruction data with varying levels of complexity using large language models (LLMs) instead of humans.[1] The proposed method, called Evol-Instruct, takes an initial set of instructions and rewrites them step by step into more complex ones. The generated data is then used to fine-tune an LLM, producing a model named WizardLM.

Background

Large language models (LLMs) have become the go-to approach for numerous natural language processing (NLP) tasks. These models are trained on large volumes of text to predict subsequent tokens, generating coherent and fluent text in response to various inputs. However, they often struggle to follow instructions or goals specified by users, which limits their usefulness in real-world applications.

Initial attempts at training instruction-following language models were based on a collection of various NLP tasks, with a small number of hand-written instructions accompanying each task. Such closed-domain instructions suffer from limited diversity and usually ask for only one task. Open-domain instruction data generated by real human users has been more successful at unlocking the potential of LLMs, enabling them to perform more complex and diverse tasks. However, creating such data manually is time-consuming and labor-intensive, and the results skew towards easy or moderate difficulty levels.

Method

Evol-Instruct proposes a method for automatically mass-producing open-domain instructions with various difficulty levels using large language models (LLMs). Starting from a simple initial instruction, the method randomly selects either In-depth Evolving or In-breadth Evolving: the former upgrades the instruction to a more complex one, while the latter creates a new instruction of similar difficulty. In-depth Evolving includes five operations: add constraints, deepening, concretizing, increase reasoning steps, and complicate input. In-breadth Evolving has a single operation, mutation, which generates a completely new instruction based on the given one. All six evolutionary operations are implemented by prompting an LLM with specific prompts. Because the evolved instructions are themselves generated by LLMs, they occasionally fail; a filtering step called Elimination Evolving screens out the failed instructions. The evolutionary process is repeated for several rounds to obtain a large pool of instruction data spanning various complexities.
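The overall loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm` stands in for any LLM completion function, the prompt wording is abbreviated, and the filter is a trivial placeholder for Elimination Evolving.

```python
import random

# Operation names from the paper; everything else here is an illustrative sketch.
IN_DEPTH_OPS = [
    "add constraints", "deepening", "concretizing",
    "increase reasoning steps", "complicate input",
]
IN_BREADTH_OPS = ["mutation"]

def passes_filter(seed, evolved):
    # Placeholder for Elimination Evolving: drop evolved instructions
    # that are empty or identical to the seed.
    return evolved.strip() != "" and evolved.strip() != seed.strip()

def evolve(instruction, llm, rounds=4):
    """Run several evolution rounds, keeping every surviving instruction."""
    pool = [instruction]
    for _ in range(rounds):
        seed = random.choice(pool)                       # pick an instruction to evolve
        op = random.choice(IN_DEPTH_OPS + IN_BREADTH_OPS)  # pick an evolving operation
        evolved = llm(f"Rewrite the instruction using '{op}':\n{seed}")
        if passes_filter(seed, evolved):                 # Elimination Evolving
            pool.append(evolved)
    return pool
```

In the real pipeline each operation has its own full prompt (shown below), and the pool of surviving instructions is what gets paired with LLM-generated responses for fine-tuning.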

Prompts of In-Depth Evolving

I want you act as a Prompt Rewriter.
Your objective is to rewrite a given prompt into a more complex version to make those famous AI systems (e.g., ChatGPT and GPT4) a bit harder to handle.
But the rewritten prompt must be reasonable and must be understood and responded by humans.
Your rewriting cannot omit the non-text parts such as the table and code in #Given Prompt#:. Also, please do not omit the input in #Given Prompt#.
You SHOULD complicate the given prompt using the following method:

Please add one more constraints/requirements into #Given Prompt# or
If #Given Prompt# contains inquiries about certain issues, the depth and breadth of the inquiry can be increased. or
Please replace general concepts with more specific concepts. or
If #Given Prompt# can be solved with just a few simple thinking processes, you can rewrite it to explicitly request multiple-step reasoning.

You should try your best not to make the #Rewritten Prompt# become verbose, #Rewritten Prompt# can only add 10 to 20 words into #Given Prompt#.
‘#Given Prompt#’, ‘#Rewritten Prompt#’, ‘given prompt’ and ‘rewritten prompt’ are not allowed to appear in #Rewritten Prompt#
#Given Prompt#:
<Here is instruction.>
#Rewritten Prompt#:

Prompts of In-Breadth Evolving

I want you act as a Prompt Creator.
Your goal is to draw inspiration from the #Given Prompt# to create a brand new prompt.
This new prompt should belong to the same domain as the #Given Prompt# but be even more rare.
The LENGTH and difficulty level of the #Created Prompt# should be similar to that of the #Given Prompt#.
The #Created Prompt# must be reasonable and must be understood and responded by humans.
‘#Given Prompt#’, ‘#Created Prompt#’, ‘given prompt’ and ‘created prompt’ are not allowed to appear in #Created Prompt#.

#Given Prompt#:
<Here is instruction.>
#Created Prompt#:
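Both templates leave a single slot for the instruction. A minimal sketch of assembling the In-depth Evolving prompt follows; the header text is abbreviated, and applying one randomly chosen method per call is an assumption about how the alternatives listed above are used (the method lines themselves quote the paper's wording):

```python
import random

# Abbreviated header; the full wording is quoted above.
HEADER = (
    "I want you act as a Prompt Rewriter.\n"
    "Your objective is to rewrite a given prompt into a more complex version...\n"
    "You SHOULD complicate the given prompt using the following method:\n"
)

# Method lines as given in the prompt above (wording preserved verbatim).
METHODS = {
    "add_constraints": "Please add one more constraints/requirements into #Given Prompt#",
    "deepening": ("If #Given Prompt# contains inquiries about certain issues, "
                  "the depth and breadth of the inquiry can be increased."),
    "concretizing": "Please replace general concepts with more specific concepts.",
    "reasoning": ("If #Given Prompt# can be solved with just a few simple thinking "
                  "processes, you can rewrite it to explicitly request "
                  "multiple-step reasoning."),
}

def in_depth_prompt(instruction, method=None):
    """Fill the In-depth Evolving template with one method and the instruction."""
    method = method or random.choice(list(METHODS))
    return (HEADER + METHODS[method] + "\n"
            + "#Given Prompt#:\n" + instruction + "\n#Rewritten Prompt#:\n")
```

The In-breadth template is filled the same way, with `#Created Prompt#:` in place of `#Rewritten Prompt#:`.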

Elimination Evolving

Evolved instructions that add no new information compared to the original instruction are eliminated. The authors use the following prompt to have ChatGPT judge whether two instructions are equivalent:

Here are two Instructions to ChatGPT AI, do you think they are equal to each other, which meet the following requirements:
1. They have same constraints and requirements.
2. They have same depth and breadth of the inquiry.
The First Prompt: <Here is first instruction.>
The Second Prompt: <Here is second instruction.>
Your Judgement (Just answer: Equal or Not Equal. No need to explain the reason.):
In addition to the equality check above, an evolved instruction is eliminated in three further situations:

  1. The generated response is under 80 words and contains "sorry", suggesting the model struggled to respond.
  2. The generated response contains only punctuation and stop words.
  3. The evolved instruction copies words from the evolving prompt, such as "given prompt," "rewritten prompt," or "#Rewritten Prompt#."
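The heuristic checks can be sketched as simple filter functions. The stop-word list and the exact matching logic below are illustrative assumptions, not the paper's implementation; only the 80-word/"sorry" threshold and the leaked-phrase list come from the criteria above.

```python
import string

# Illustrative stop-word list; the paper does not specify one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}
# Phrases that indicate the evolving prompt leaked into the instruction.
LEAKED_PHRASES = ("given prompt", "rewritten prompt", "#rewritten prompt#")

def response_is_failure(response):
    """Criteria 1 and 2: short apologetic replies, or punctuation/stop words only."""
    words = response.split()
    if len(words) < 80 and "sorry" in response.lower():
        return True
    stripped = [w.strip(string.punctuation).lower() for w in words]
    return all(w == "" or w in STOP_WORDS for w in stripped)

def instruction_copies_prompt(evolved):
    """Criterion 3: the evolved instruction copies words from the evolving prompt."""
    low = evolved.lower()
    return any(phrase in low for phrase in LEAKED_PHRASES)
```

An evolved instruction would be kept only if `instruction_copies_prompt` is false and `response_is_failure` is false for the response generated from it.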

Results

The proposed method was evaluated by fine-tuning the open-source LLaMA model with the generated instruction data and comparing its performance against existing state-of-the-art models, whose instruction datasets were generated using self-instruct or collected from real human users. The results showed that instruction data from Evol-Instruct is superior to human-created instruction data: when fine-tuned on the same amount of data, the resulting model, WizardLM, significantly outperforms existing models. Overall, WizardLM performs worse than OpenAI's ChatGPT on the test set, but in the high-difficulty portion of the test set WizardLM outperforms ChatGPT, indicating that the method significantly improves the ability of LLMs to handle complex instructions.

Open questions

  • How can Evol-Instruct be combined with generative curriculum learning? The instruction-generating model could be trained on an outer loop to improve the student language model that is fine-tuning on the instructions on an inner loop.

References

  1. Xu et al. "Evol-Instruct: Mass-Producing Open-Domain Instruction Data with Varying Levels of Complexity using Large Language Models." 2023. arXiv:2304.12244