Prompt Engineering — Data Augumentation

Stan Chen
6 min readMar 9, 2023

Data augmentation is a way to add more information to machine learning models by making new data points from existing data in different ways. With this method, you change existing data by making small changes or distortions to the original dataset. In natural language processing, data augumentation is a technique that is often used to improve the accuracy and variety of chatbot responses. With GPT chatbot prompt engineering, data augmentation can be used to improve the model’s performance by adding more training data. By giving the model new variations of phrases and sentences, it can learn more about the context and subtleties of language. This makes it better able to give responses that are both relevant and make sense. Also, data augmentation lets the model deal with differences in language and dialects, which can be very helpful when making chatbots for a wide range of users.

Providing external knowledge in prompt engineering is a crucial aspect of improving the performance of language models like ChatGPT. This can be achieved through methods such as adding relevant keywords and phrases, incorporating structured data, providing context, and fine-tuning with domain-specific data. By utilizing these methods, language models can generate more accurate and relevant responses to prompts.

here are some examples of how you can structure prompts to provide external knowledge:

  1. Adding keywords or phrases: If you want to provide external knowledge by adding relevant keywords or phrases to the prompt, you can structure your prompts like:
  • “What is the weather like in [city] today? Please include temperature and humidity.”
  • “Can you recommend a good [cuisine type] restaurant in [city]? Please provide ratings and reviews.”
<< 
[cuisine type]="Chinese"
[city] = "Amsterdam
Can you recommend a good [cuisine type] restaurant in [city]? Please provide ratings and reviews.>>
Sure! Here's a good Chinese restaurant in Amsterdam:
Restaurant: Oriental City
Cuisine type: Chinese
Location: Oudezijds Voorburgwal 177, 1012 EW Amsterdam, Netherlands
Ratings: 4.3 out of 5 stars on Google, 4.5 out of 5 stars on TripAdvisor
Review: "Oriental City is an amazing Chinese restaurant in the heart of Amsterdam. The food is delicious and authentic, and the service is excellent. The restaurant has a great atmosphere, and the prices are reasonable for the quality of the food. Highly recommend trying the Peking duck and the dim sum."

2. Incorporating structured data: If you have structured data that you want to incorporate in your prompts, you can structure your prompts like:

  • “Based on the [database name], what is the sales forecast for [product name] in the next quarter?”
  • “Using the [table name], can you provide the average salary for [job title] in [city]?”
<<
[salary database]="""Average Salary,Job Title,City
165000,Software Engineer,San Francisco
55000,Marketing Manager,New York City
45000,Graphic Designer,Los Angeles
70000,Data Scientist,Chicago
52000,Human Resources Manager,Houston
60000,Web Developer,Miami
48000,Accountant,Dallas
80000,Product Manager,Seattle
57000,Public Relations Specialist,Atlanta
42000,Administrative Assistant,Phoenix
"""
Using the [salary database], can you provide the average salary for IT job in San Francisco?
>>
Sure, based on the provided salary database, the average salary for a Software Engineer job in San Francisco is $165,000.

3. Providing context: If you want to provide external knowledge by providing context, you can structure your prompts like:

  • “I’m planning a trip to [destination]. Can you provide some information on the top tourist attractions and how to get there?”
  • “I’m interested in learning more about [topic]. Can you provide some background information and key facts?”
<<
[Top Attractions in Miami]=""" 1. Vizcaya Museum and Gardens · 2. Wynwood Walls · 3. Zoo Miami · 4. Bayside Marketplace · 5. Dolphin Mall · 6. Zoological Wildlife"""
I'm planning a trip to Miami. Can you provide some information on the top tourist attractions and how to get there?>>
Sure, here are some of the top tourist attractions in Miami along with information on how to get there:
Vizcaya Museum and Gardens - This beautiful Italian-style villa is located in Coconut Grove, Miami. To get there, you can take the Metrorail to Vizcaya Station, or take a taxi or ride-sharing service.Wynwood Walls - This popular street art attraction is located in the Wynwood Arts District in Miami. You can take the Metrorail to the Wynwood Walls Station, or take a taxi or ride-sharing service.Zoo Miami - This zoo is located in Southwest Miami-Dade and is home to over 3,000 animals. You can take the Metrorail to the Dadeland South Station and transfer to a bus, or take a taxi or ride-sharing service.Bayside Marketplace - This outdoor mall is located on the waterfront in downtown Miami. You can take the Metromover to the Bayfront Park Station or take a taxi or ride-sharing service.Dolphin Mall - This large shopping center is located in Sweetwater, Miami. You can take a taxi or ride-sharing service to get there.Zoological Wildlife Foundation - This is a wildlife park located in Miami, where you can see various animals including tigers, leopards, and monkeys. You can take a taxi or ride-sharing service to get there.I hope this information helps you plan your trip to Miami!

4. Fine-tuning with domain-specific data: If you want to fine-tune your language model with domain-specific data, you can structure your prompts like:

  • “Using the [dataset name], can you provide insights on [specific task]?”
  • “Based on the [domain-specific data], can you provide recommendations on [specific task]?”
<<
[dogs dataset]="""ID,Breed,Color,Weight,Age
1,Golden Retriever,Yellow,75,5
2,Poodle,White,15,2
3,German Shepherd,Black and Tan,80,4
4,Bulldog,Brown and White,50,7
5,Siberian Husky,Gray and White,60,3
6,Golden Retriever,Red,80,6
7,Chihuahua,Tan,5,1
8,Golden Retriever,Cream,70,4
9,Labrador Retriever,Black,65,2
10,Golden Retriever,Golden,80,8
"""
Using the [dogs dataset], can you provide insights on the average age of Golden Retriever? and what is the most common dog color?
>>
From the [dogs dataset], we can see that there are a total of 5 Golden Retrievers with ages ranging from 5 to 8. To determine the average age of Golden Retrievers, we can add up the ages of all Golden Retrievers and divide by the total number of Golden Retrievers.
(5+6+4+8)/4 = 23/4 = 5.75Therefore, the average age of Golden Retrievers in this dataset is 5.75 years old.In terms of the most common dog color, we can tally up the frequency of each color in the dataset. - Yellow: 1
- White: 1
- Black and Tan: 1
- Brown and White: 1
- Gray and White: 1
- Red: 1
- Tan: 1
- Cream: 1
- Black: 1
- Golden: 1
We can see that each color appears only once, therefore there is no clear most common dog color in this dataset.

5. Provide real-time knowledge memory prior to prompt completions, you can create your prompts like:

<< 
read
"""
city: Calgary
today: "2023-03-09"
weather: |
Cloudy with 60 percent chance of flurries. High minus 7. Night: Cloudy periods. Low minus 16.
A mix of sun and cloud. Sun , 12 Mar -7° C 19° F.
"""
<<
What's the weather today in Calgary?
>>
The weather today in Calgary on March 9th, 2023 is cloudy with a 60 percent chance of flurries. The high temperature is expected to be around minus 7 degrees Celsius (19 degrees Fahrenheit), and the low temperature tonight is expected to be minus 16 degrees Celsius.

Note: the above prompt example is a multi-shot prompting

Incorporating external knowledge in your prompts can help your language model generate more accurate and relevant responses. By structuring your prompts in a way that incorporates external knowledge, you can improve the performance of your language model and create more valuable outputs.

Want to read more with some hands on example? Checkout this repo: https://github.com/stancsz/prompt-engineering

Looking to learn data while on the go riding a bus or bike? Check out my no-fluff data podcast series, new episode every Friday!

datapodcast.github.io

--

--