How To Unleash The Power of Large Language Models For Few-Shot Relation Extraction
Xin Xu, Yuqi Zhu, Xiaohan Wang, Ningyu Zhang
Zhejiang University & AZFT Joint Lab for Knowledge Engine
{xxucs, wangxh07, zhangningyu}@zju.edu.cn
utilize relatively small language models (RoBERTa (Liu et al., 2019), GPT2 (Radford et al., 2019)), demonstrating empirical success regarding few-shot relation extraction performance. To date, large language models have demonstrated powerful abilities when prompted with a few instances and without tuning (Ding et al., 2022); however, the power of LLMs for few-shot relation extraction is little known.

2.2 Large Language Models

Large language models, trained on exceedingly large corpora and often with a great number of parameters (≥10B), have achieved excellent performance on numerous downstream NLP tasks (Taylor et al., 2022; Zhang et al., 2022; Zeng et al., 2022; Chowdhery et al., 2022; Ouyang et al., 2022). Compared to relatively small language models (SLMs), LLMs are usually not open-source and cannot be fine-tuned, which is challenging for downstream task adaptation. Therefore, in-context learning (Brown et al., 2020) has been proposed to utilize prompts with a few demonstrations for few-shot learning. Previous studies (Yoo et al., 2021; Wang et al., 2021) have investigated using LLMs for text classification and generation. In this work, we take the first step toward studying few-shot RE with large language models, which brings new challenges and insights.

3 LLMs for Few-shot Relation Extraction

In this section, we introduce two strategies for utilizing LLMs for relation extraction: 1) in-context learning (§3.1) and 2) data generation (§3.2) with LLMs, as shown in Figure 1.

3.1 In-Context Learning with LLMs

The first strategy applies in-context learning (ICL) by providing LLMs with demonstrations in the prompt to elicit comprehension of the relation extraction task. To this end, specific and compelling prompts for RE with demonstrations are manually constructed and designed to instruct LLMs to understand the relation extraction task and how to execute it. Considering the aspects and characteristics of the relation extraction task, including the task definition, candidate relation (label) words, entity types (schemas), and so on, we design prompts of different articulation and complexity to investigate how prompts help LLMs unleash their power for few-shot RE. First, TEXT PROMPT only contains essential elements for RE, including relation categories, contexts, and the corresponding head and tail entities. Inspired by the impressive performance of InstructGPT (Ouyang et al., 2022) and ChatGPT (OpenAI, 2022), we design a task-related instruction describing the relation extraction task and add it to the prompt, which is named INSTRUCT PROMPT. Meanwhile, according to previous few-shot RE work (Zhou and Chen, 2022), entity types (schemas) are helpful; therefore, we also explore the effectiveness of schemas in prompts.

3.2 Data Generation with LLMs

To compensate for the scarcity of labeled data, we introduce another strategy: data generation via LLMs. Specifically, we utilize specific prompts with descriptions of data forms to guide LLMs to generate more in-domain labeled data autonomously, which is subsequently combined with the existing few-shot labeled training data to fine-tune a relatively small language model. We design the prompt to state the essential components (x, h, t, t_h, t_t, and y) of one RE training instance and show few-shot instances as demonstrations to teach LLMs to comprehend the features of labeled RE data. Note that schemas, such as the types of relations and entities, are significant structural information in RE data. Therefore, we propose schema-constrained data generation, adding entity types as schema guidance to the prompt (Figure 1) to boost performance. The prompt is then used to guide LLMs to create augmented relation extraction data, which is converted into the expected format for later use.
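To make the two prompt formats of §3.1 concrete, the sketch below assembles a TEXT-style and an INSTRUCT-style prompt from demonstrations, optionally adding entity-type schema. The field layout, instruction wording, and truncated relation list are illustrative placeholders, not the paper's exact templates.

```python
# Minimal sketch of the TEXT / INSTRUCT prompt formats described in Section 3.1.
# The instruction wording, field layout, and truncated relation list are
# illustrative assumptions, not the exact templates used in the paper.

RELATIONS = ["per:title", "per:employee_of", "org:founded_by", "no_relation"]  # truncated

INSTRUCTION = (
    "Given a context with a head entity and a tail entity, "
    "choose the relation between them from: " + ", ".join(RELATIONS) + "."
)

def format_instance(inst, with_schema=False, with_label=True):
    """Render one RE instance; the gold relation is shown only for demonstrations."""
    lines = ["Context: " + inst["context"]]
    if with_schema:
        lines.append(f"Head Type: {inst['head_type']}. Head Entity: {inst['head']}.")
        lines.append(f"Tail Type: {inst['tail_type']}. Tail Entity: {inst['tail']}.")
    else:
        lines.append(f"Head Entity: {inst['head']}. Tail Entity: {inst['tail']}.")
    lines.append("Relation: " + (inst["relation"] if with_label else ""))
    return "\n".join(lines)

def build_prompt(demonstrations, query, use_instruction=False, with_schema=False):
    """TEXT PROMPT = demonstrations + query; INSTRUCT PROMPT prepends the task description."""
    blocks = [INSTRUCTION] if use_instruction else []
    blocks += [format_instance(d, with_schema, with_label=True) for d in demonstrations]
    blocks.append(format_instance(query, with_schema, with_label=False))
    return "\n\n".join(blocks)
```

The completion returned for the final "Relation:" slot would then be matched against the candidate relation words.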
                                          TACRED        TACREV       RE-TACRED       SciERC
Method                                   K=8   K=16    K=8   K=16    K=8   K=16    K=8   K=16
Baselines
  SpanBERT (Joshi et al., 2020)           8.4  17.5     5.2   5.7    14.2  29.3    29.0  38.7
  LUKE (Yamada et al., 2020)              9.5  21.5     9.8  22.0    14.1  37.5    33.2  48.9
  GDPNet (Xue et al., 2021)              11.8  22.5     8.3  20.8    18.8  48.0    33.5  42.3
  TANL (Paolini et al., 2021)            18.1  27.6    18.6  28.8    26.7  50.4    32.4  38.7
  TYP Marker (Zhou and Chen, 2022)       26.5  29.9    26.7  29.5    44.8  54.1    50.4  59.0
  KnowPrompt (Chen et al., 2022)         29.4  32.1    29.8  34.1    56.1  61.4    50.2  57.1
GPT-3.5
  In-context Learning†                      31.9          32.4          49.9          46.6
  In-context Learning† (w/ Instruction)     31.0          31.9          51.8          48.8
  Data Generation (TYP Marker)           35.8  36.6    36.7  36.5    58.4  60.6    63.2  64.3
  Data Generation (KnowPrompt)           37.9  37.4    42.6  41.0    62.7  66.2    58.6  67.8

Table 1: Micro F1 (%) of few-shot performance. † refers to the performance with one-shot demonstrations.
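For the data-generation strategy of §3.2, a prompt names the components of an RE instance (context x, head h, tail t, entity types t_h and t_t, relation y), shows a few seed instances, and asks the model for new ones that respect the schema. The template wording and the line-based parsing below are our own assumptions for illustration:

```python
import re

# Hedged sketch of schema-constrained data generation (Section 3.2).
# Template wording and the parsing convention are illustrative assumptions.

def build_generation_prompt(relation, head_type, tail_type, seeds, n_new=5):
    header = (
        f"Generate {n_new} new labeled examples for the relation '{relation}'. "
        f"The head entity must be a {head_type} and the tail entity a {tail_type}. "
        "Follow exactly the format of the examples below."
    )
    demos = [
        f"Context: {s['context']}\n"
        f"Head Type: {head_type}. Head Entity: {s['head']}.\n"
        f"Tail Type: {tail_type}. Tail Entity: {s['tail']}.\n"
        f"Relation: {relation}"
        for s in seeds  # only a few seeds fit within the request-token limit
    ]
    return header + "\n\n" + "\n\n".join(demos)

def parse_generated(completion, relation, head_type, tail_type):
    """Convert the model output back into labeled records in the expected format."""
    records = []
    for block in completion.strip().split("\n\n"):
        ctx = re.search(r"Context:\s*(.+)", block)
        head = re.search(r"Head Entity:\s*([^.\n]+)", block)
        tail = re.search(r"Tail Entity:\s*([^.\n]+)", block)
        if ctx and head and tail:
            records.append({
                "context": ctx.group(1).strip(),
                "head": head.group(1).strip(), "head_type": head_type,
                "tail": tail.group(1).strip(), "tail_type": tail_type,
                "relation": relation,
            })
    return records
```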
Prompts              TACRED   TACREV   RE-TACRED   SciERC
TEXT                  31.9     32.4      49.9       46.6
TEXT + Schema         36.9     37.7      54.3       45.9
INSTRUCT              31.0     31.9      51.8       48.8
INSTRUCT + Schema     38.3     36.7      58.5       50.2

Table 2: Micro F1 (%) of performance with different prompts: TEXT PROMPT and INSTRUCT PROMPT.

4 Experimental Setups

4.1 Methods and Datasets

GPT-3.5 is utilized via the OpenAI API2 as the large language model in our experiments. We conduct experiments on four relation extraction datasets: TACRED (Zhang et al., 2017), TACREV (Alt et al., 2020), RE-TACRED (Stoica et al., 2021), and SciERC (Luan et al., 2018). For comparison with the LLM, six baseline methods are run with relatively small models (details in Appendix A).

2 https://platform.openai.com/docs/models/gpt-3-5

4.2 Few-shot Settings

K instances per relation (K-shot) are sampled for training and validation. For all baselines, we use randomly sampled 8-shot and 16-shot datasets for training and validation. As for in-context learning, because GPT-3.5 has a maximum request length (4,097 tokens) and the TACRED series of datasets has more than 40 relations, only one-shot demonstrations can be used, and the one-shot performance is reported in Table 1. For the same reason, when generating more labeled data for each relation independently, only three demonstrations for that relation are added to the prompts. In-context learning is run on the four whole test sets. Demonstrations are randomly re-sampled from the shuffled training set every time to avoid effects from permutations of demonstrations (Lu et al., 2021). As for data generation, the data generated by GPT-3.5 and the original few-shot training data are combined to fine-tune two baselines, TYP Marker (Zhou and Chen, 2022) and KnowPrompt (Chen et al., 2022). Using different shots of generated data leads to different results; therefore, we incrementally add generated k-shot (k ∈ {8, 16, 32, 48}) data to the original 8-shot and 16-shot training data respectively and report the best performance over k in Table 1. More details are given in Appendix A.3.

5 Results and Discussion

5.1 Main Findings for Relation Extraction

In-context learning with LLMs can achieve performance for RE comparable to tuning relatively small PLMs. From Table 1, we notice that ICL with only one-shot demonstrations obtains performance competitive with full-parameter tuning-based prompt learning baselines. Using LLMs via ICL does not necessitate any parameter updates, which contains the potential value of mak-
Figure 2: Micro F1 (%) of k in-context demonstrations in SciERC.
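Figure 2 varies the number of in-context demonstrations k; the sampling procedure from §4.2 re-draws the demonstrations from a shuffled training pool for every query so that no fixed demonstration order dominates (Lu et al., 2021). A minimal sketch, with build_prompt and query_llm standing in for the pieces sketched earlier:

```python
import random

# Minimal sketch of per-query demonstration sampling (Section 4.2); the
# build_prompt and query_llm callables are placeholders, not the paper's code.

def sample_demonstrations(train_pool, k, rng=random):
    pool = list(train_pool)
    rng.shuffle(pool)  # reshuffle before every query to vary demonstration order
    return pool[:k]

def run_icl(test_set, train_pool, k, build_prompt, query_llm):
    predictions = []
    for query in test_set:
        demos = sample_demonstrations(train_pool, k)
        predictions.append(query_llm(build_prompt(demos, query)))
    return predictions
```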
Conference 2022, Virtual Event, Lyon, France, April 25-29, 2022, pages 2778–2788. ACM.

Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, Parker Schuh, Kensen Shi, Sasha Tsvyashchenko, Joshua Maynez, Abhishek Rao, Parker Barnes, Yi Tay, Noam Shazeer, Vinodkumar Prabhakaran, Emily Reif, Nan Du, Ben Hutchinson, Reiner Pope, James Bradbury, Jacob Austin, Michael Isard, Guy Gur-Ari, Pengcheng Yin, Toju Duke, Anselm Levskaya, Sanjay Ghemawat, Sunipa Dev, Henryk Michalewski, Xavier Garcia, Vedant Misra, Kevin Robinson, Liam Fedus, Denny Zhou, Daphne Ippolito, David Luan, Hyeontaek Lim, Barret Zoph, Alexander Spiridonov, Ryan Sepassi, David Dohan, Shivani Agrawal, Mark Omernick, Andrew M. Dai, Thanumalayan Sankaranarayana Pillai, Marie Pellat, Aitor Lewkowycz, Erica Moreira, Rewon Child, Oleksandr Polozov, Katherine Lee, Zongwei Zhou, Xuezhi Wang, Brennan Saeta, Mark Diaz, Orhan Firat, Michele Catasta, Jason Wei, Kathy Meier-Hellstern, Douglas Eck, Jeff Dean, Slav Petrov, and Noah Fiedel. 2022. PaLM: Scaling language modeling with pathways. arXiv preprint, abs/2204.02311.

Bosheng Ding, Chengwei Qin, Linlin Liu, Lidong Bing, Shafiq R. Joty, and Boyang Li. 2022. Is GPT-3 a good data annotator? arXiv preprint, abs/2212.10450.

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, Lei Li, and Zhifang Sui. 2023. A survey for in-context learning. arXiv preprint, abs/2301.00234.

Jiale Han, Bo Cheng, and Wei Lu. 2021a. Exploring task difficulty for few-shot relation extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 2605–2616. Association for Computational Linguistics.

Jiale Han, Shuai Zhao, Bo Cheng, Shengkun Ma, and Wei Lu. 2022. Generative prompt tuning for relation classification. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 3170–3185, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Xu Han, Tianyu Gao, Yuan Yao, Deming Ye, Zhiyuan Liu, and Maosong Sun. 2019. OpenNRE: An open and extensible toolkit for neural relation extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019 - System Demonstrations, pages 169–174. Association for Computational Linguistics.

Xu Han, Weilin Zhao, Ning Ding, Zhiyuan Liu, and Maosong Sun. 2021b. PTR: Prompt tuning with rules for text classification. arXiv preprint, abs/2105.11259.

Xu Han, Hao Zhu, Pengfei Yu, Ziyun Wang, Yuan Yao, Zhiyuan Liu, and Maosong Sun. 2018. FewRel: A large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4803–4809, Brussels, Belgium. Association for Computational Linguistics.

Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Trans. Assoc. Comput. Linguistics, 8:64–77.

Hunter Lang, Monica N. Agrawal, Yoon Kim, and David A. Sontag. 2022. Co-training improves prompt-based learning for large language models. In International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 11985–12003. PMLR.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint, abs/1907.11692.

Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. 2021. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. arXiv preprint, abs/2104.08786.

Yi Luan, Luheng He, Mari Ostendorf, and Hannaneh Hajishirzi. 2018. Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).

Shengfei Lyu and Huanhuan Chen. 2021. Relation classification with entity type restriction. In Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, volume ACL/IJCNLP 2021 of Findings of ACL, pages 390–395. Association for Computational Linguistics.

Yubo Ma, Yixin Cao, YongChing Hong, and Aixin Sun. 2023. Large language model is not a good few-shot information extractor, but a good reranker for hard samples! CoRR, abs/2303.08559.

OpenAI. 2022. ChatGPT: Optimizing language models for dialogue. https://openai.com/blog/chatgpt/.

OpenAI. 2023a. GPT-4 technical report. arXiv preprint, abs/2303.08774.

OpenAI. 2023b. Text-davinci-003. https://platform.openai.com/docs/models/text-davinci-003.
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. arXiv preprint, abs/2203.02155.

Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cícero Nogueira dos Santos, Bing Xiang, and Stefano Soatto. 2021. Structured prediction as translation between augmented natural languages. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

Livio Baldini Soares, Nicholas FitzGerald, Jeffrey Ling, and Tom Kwiatkowski. 2019. Matching the blanks: Distributional similarity for relation learning. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, pages 2895–2905. Association for Computational Linguistics.

Hwanjun Song, Minseok Kim, Dongmin Park, and Jae-Gil Lee. 2020. Learning from noisy labels with deep neural networks: A survey. arXiv preprint, abs/2007.08199.

George Stoica, Emmanouil Antonios Platanios, and Barnabás Póczos. 2021. Re-TACRED: Addressing shortcomings of the TACRED dataset. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pages 13843–13850. AAAI Press.

Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, and Xipeng Qiu. 2022. Black-box tuning for language-model-as-a-service. In International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of Machine Learning Research, pages 20841–20855. PMLR.

Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. 2022. Galactica: A large language model for science. arXiv preprint, abs/2211.09085.

Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, and Michael Zeng. 2021. Want to reduce labeling cost? GPT-3 can help. In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, pages 4195–4205. Association for Computational Linguistics.

Fuzhao Xue, Aixin Sun, Hao Zhang, and Eng Siong Chng. 2021. GDPNet: Refining latent multi-view graph for relation extraction. In Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021, pages 14194–14202. AAAI Press.

Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, and Yuji Matsumoto. 2020. LUKE: Deep contextualized entity representations with entity-aware self-attention. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, pages 6442–6454. Association for Computational Linguistics.

Shan Yang, Yongfei Zhang, Guanglin Niu, Qinghua Zhao, and Shiliang Pu. 2021. Entity concept-enhanced few-shot relation extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021 (Volume 2: Short Papers), Virtual Event, August 1-6, 2021, pages 987–991. Association for Computational Linguistics.

Hongbin Ye, Ningyu Zhang, Hui Chen, and Huajun Chen. 2022. Generative knowledge graph construction: A review. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1–17, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, and Woo-Myoung Park. 2021. GPT3Mix: Leveraging large-scale language models for text augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, pages 2225–2239. Association for Computational Linguistics.

Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, and Jie Tang. 2022. GLM-130B: An open bilingual pre-trained model. CoRR, abs/2210.02414.

Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona T. Diab, Xian Li, Xi Victoria Lin, Todor Mihaylov, Myle Ott, Sam Shleifer, Kurt Shuster, Daniel Simig, Punit Singh Koura, Anjali Sridhar, Tianlu Wang, and Luke Zettlemoyer. 2022. OPT: Open pre-trained transformer language models. CoRR, abs/2205.01068.
Yuhao Zhang, Victor Zhong, Danqi Chen, Gabor Angeli, and Christopher D. Manning. 2017. Position-aware attention and supervised data improve slot filling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP 2017), pages 35–45.

Wenxuan Zhou and Muhao Chen. 2022. An improved baseline for sentence-level relation extraction. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, AACL/IJCNLP 2022 - Volume 2: Short Papers, Online only, November 20-23, 2022, pages 161–168. Association for Computational Linguistics.

A Experimental Details

A.1 Datasets

TACRED3 is a widely used RE dataset. It has 42 relation labels, including no_relation, which indicates that no relation is found. TACREV4 keeps the same training set as TACRED but relabels the development and test sets. RE-TACRED5 is a re-annotated version of TACRED with 40 relations. SciERC6 has seven relation categories and is constructed in the scientific domain. All datasets are taken from their official websites without modification, including contents and train/test/dev splits.

A.2 Baselines

We compare LLMs with recent baseline methods using relatively small models. 1) Normal fine-tuning methods: SpanBERT (Joshi et al., 2020), a span-based PLM; LUKE (Yamada et al., 2020), pre-trained contextualized representations of words and entities based on the bidirectional transformer; GDPNet (Xue et al., 2021), a Gaussian dynamic time warping pooling net able to select important words for relation prediction; TYP Marker (Zhou and Chen, 2022), fine-tuning with typed entity markers. 2) Generative method: TANL (Paolini et al., 2021), framing structured prediction as a translation task between augmented natural languages. 3) Prompt-tuning method: KnowPrompt (Chen et al., 2022), knowledge-aware continuous prompt-based tuning with synergistic optimization.

A.3 Implementation Details

Generated data together with the existing training data is evaluated on KnowPrompt. The data augmentation methods with WordNet synonyms and contextual word embeddings are implemented with nlpaug7. The temperature parameter of the OpenAI API is set to 0 for precision in ICL and to 1 for generating diverse RE data. One NVIDIA GeForce RTX 3090 GPU with 24GB memory is used to run all experiments. We rerun the official code of the baselines with their original settings, except on the SciERC dataset: due to the vertical domain of SciERC, SciBERT (Beltagy et al., 2019) is used in TYP Marker and KnowPrompt for fairness, while for the other three datasets RoBERTa-large is used in TYP Marker and KnowPrompt.

3 https://nlp.stanford.edu/projects/tacred/
4 https://github.com/DFKI-NLP/tacrev
5 https://github.com/gstoica27/Re-TACRED
6 http://nlp.cs.washington.edu/sciIE/
7 https://github.com/makcedward/nlpaug
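The decoding settings in A.3 (temperature 0 for deterministic ICL predictions, temperature 1 for diverse generated data) map onto the legacy OpenAI completion endpoint roughly as follows; the model name, token limit, and wrapper are assumptions for illustration, not the paper's exact code:

```python
import openai  # legacy (pre-1.0) OpenAI Python client assumed here

# Illustrative wrapper reflecting the temperature settings from A.3:
# temperature 0 for ICL predictions, temperature 1 for data generation.

def query_llm(prompt, for_generation=False, model="text-davinci-003"):
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        temperature=1.0 if for_generation else 0.0,
        max_tokens=256,  # assumed limit; the full request must stay under 4,097 tokens
    )
    return response["choices"][0]["text"].strip()
```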
B Case Analysis

B.1 Wrong Cases from ICL

From Table 4, we notice that some RE instances are challenging for LLMs, and several limitations emerge: 1) LLMs are not good at clearly distinguishing the order of head and tail entities. 2) Identical mentions of the head and tail entities confuse LLMs. 3) If the distance between the head and tail entities in the context is long, it is difficult for LLMs to decide the relation correctly. 4) Semantically similar relation label words and entity mentions puzzle LLMs because their embeddings are similar. 5) LLMs cannot handle very long instances when there is a large label space for relation extraction. 6) LLMs mostly fail to extract ambiguous or wrongly labeled relations; those are also challenging for humans. More high-quality demonstrations may help mitigate these issues, and we think it is necessary to develop step-by-step (chat-style) approaches with LLMs that extract a limited set of relations in one stage.

B.2 Generated Data from LLMs

Table 5 shows some examples of data generated by GPT-3.5. Through human checks on 100 generated samples per dataset, about 78% of the generated data are correctly labeled and of high quality (85% for TACRED, 82.5% for TACREV, 72% for RE-TACRED, 75% for SciERC). Meanwhile, we add generated data and original gold training data, respectively, to the 8-shot datasets and fine-tune KnowPrompt to evaluate the quality of the generated data, as shown in Table 3. We observe that the labeled data generated by GPT-3.5 are mostly correct. As for TACRED and TACREV, generated data achieve more improvements than gold labeled data. Since
                    TACRED            TACREV           RE-TACRED           SciERC
8-shot Dataset   generated   gold   generated   gold   generated   gold   generated   gold
add 0-shot         29.35    29.35     29.77    29.77     56.05    56.05     45.80    45.80
add 8-shot         31.63    30.73     34.30    33.16     59.85    60.92     48.30    57.08
add 16-shot        34.78    31.88     36.33    33.49     59.59    61.30     58.62    65.15
add 32-shot        36.45    33.35     38.19    33.98     60.06    64.65     57.70    72.11
add 48-shot        37.89    33.97     38.80    35.06     62.67    65.56     51.64    74.29
add 64-shot        36.67    34.36     42.61    35.57     61.07    67.28     54.52    75.36
add 72-shot        35.69    34.58     41.72    35.96     59.09    67.43     49.59    75.87

Table 3: Micro F1 (%) of KnowPrompt after adding labeled data generated by GPT-3.5 or gold labeled data to 8-shot datasets.
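The sweep behind Table 3 can be read as a small loop: keep the 8-shot seed fixed, add k extra instances per relation (generated or gold), re-fine-tune KnowPrompt, and record Micro F1. A hedged sketch, with fine_tune_and_eval standing in for the actual training and evaluation code:

```python
# Hypothetical sketch of the augmentation sweep reported in Table 3;
# fine_tune_and_eval is a placeholder for fine-tuning KnowPrompt and
# returning the Micro F1 score on the test set.

def augmentation_sweep(seed_8shot, extra_per_relation, fine_tune_and_eval,
                       shots=(0, 8, 16, 32, 48, 64, 72)):
    results = {}
    for k in shots:
        extra = [inst for pool in extra_per_relation.values() for inst in pool[:k]]
        results[k] = fine_tune_and_eval(seed_8shot + extra)
    return results
```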
Dataset: TACRED
Context: And strangely enough, Cain's short, three-year tenure at the NRA is evidently the only period in his decades-long career during which he's alleged to have been a sexual predator.
Head Type: ORGANIZATION. Head Entity: NRA.
Tail Type: PERSON. Tail Entity: Cain.
Gold Relation: org:top_members/employees. In-context Learning: per:employee_of.

Dataset: TACRED
Context: "I learn from students and I challenge them," says Heloise, 58, who took over the family hints business when her mother, also named Heloise, died in 1977.
Head Type: PERSON. Head Entity: Heloise.
Tail Type: PERSON. Tail Entity: Heloise.
Gold Relation: per:parents. In-context Learning: per:alternate_names.

Dataset: TACREV
Context: Anna Mae Pictou Aquash, a Mi'kmaq Indian from Canada, was brutally murdered in 1975.
Head Type: PERSON. Head Entity: Anna Mae Pictou Aquash.
Tail Type: COUNTRY. Tail Entity: Canada.
Gold Relation: per:country_of_birth. In-context Learning: per:countries_of_residence.

Dataset: TACREV
Context: Messina Denaro has been trying to impose his power in Palermo, the Sicilian capital, and become the new head of the Sicilian Mafia, weakened by the arrest of Provenzano in April 2006.
Head Type: PERSON. Head Entity: his.
Tail Type: CITY. Tail Entity: Palermo.
Gold Relation: no_relation. In-context Learning: per:cities_of_residence.

Dataset: RE-TACRED
Context: They say Vladimir Ladyzhenskiy died late Saturday during the contest in southern Finland, while his Finnish rival Timo Kaukonen was rushed to a hospital.
Head Type: PERSON. Head Entity: Vladimir Ladyzhenskiy.
Tail Type: PERSON. Tail Entity: his.
Gold Relation: per:identity. In-context Learning: per:date_of_death.

Dataset: RE-TACRED
Context: President of the Central American Parliament (Parlacen) Jacinto Suarez said on Monday that the presidents of the Central American countries did not support Panama's request of withdrawal from the Parlacen.
Head Type: ORGANIZATION. Head Entity: Central American Parliament.
Tail Type: PERSON. Tail Entity: Jacinto Suarez.
Gold Relation: org:top_members/employees. In-context Learning: per:title.

Dataset: SciERC
Context: We evaluate across two corpora (conversational telephone speech and broadcast news speech) on both human transcriptions and speech recognition output.
Head Type: OtherScientificTerm. Head Entity: transcriptions.
Tail Type: OtherScientificTerm. Tail Entity: output.
Gold Relation: CONJUNCTION. In-context Learning: COMPARE.

Dataset: SciERC
Context: We validate this new method on nine standard person re-identification datasets including two large scale Market-1501 and CUHK03 datasets and show that we improve upon the current state-of-the-art methods on all of them.
Head Type: Material. Head Entity: CUHK03 datasets.
Tail Type: Material. Tail Entity: datasets.
Gold Relation: HYPONYM-OF. In-context Learning: PART-OF.

Table 4: Wrong cases predicted by GPT-3.5. The gold relation and the relation predicted via in-context learning are listed for each case.
Dataset: TACRED
Context: The American Cancer Society is headquartered in Atlanta and was founded in 1913 by 15 trained laywomen.
Head Type: ORGANIZATION. Head Entity: American Cancer Society.
Tail Type: ORGANIZATION. Tail Entity: 15 trained laywomen.
Relation: org:founded_by.
Correction: Tail Type: PERSON.

Dataset: TACRED
Context: Mary Brown, CEO of Brown Corp and renowned businesswoman, is a regular speaker at industry conferences and events.
Head Type: PERSON. Head Entity: Mary Brown.
Tail Type: PERSON. Tail Entity: CEO.
Relation: per:title.
Correction: Tail Type: TITLE.

Dataset: TACREV
Context: Gustav Mahler was born in Kalischt, Bohemia on July 7th, 1860.
Head Type: PERSON. Head Entity: Gustav Mahler.
Tail Type: PERSON. Tail Entity: 1860.
Relation: per:country_of_birth.
Correction: Tail Type: DATE. Relation: per:date_of_birth.

Dataset: TACREV
Context: MTN Nigeria, a subsidiary of South African-based MTN Group, has begun to list its shares on the Nigerian Stock Exchange.
Head Type: ORGANIZATION. Head Entity: MTN Group.
Tail Type: ORGANIZATION. Tail Entity: MTN Nigeria.
Relation: org:subsidiaries.
Correction: -

Dataset: RE-TACRED
Context: Pope John Paul II was a hugely popular Catholic leader who was based in the Vatican City for most of his papacy.
Head Type: PERSON. Head Entity: Pope John Paul II.
Tail Type: PERSON. Tail Entity: Vatican City.
Relation: per:countries_of_residence.
Correction: Tail Type: CITY. Relation: per:cities_of_residence.

Dataset: RE-TACRED
Context: French drug manufacturer Sanofi-Aventis dissolved its Chinese subsidiary Guangzhou Pharma following a bribery scandal.
Head Type: ORGANIZATION. Head Entity: Sanofi-Aventis.
Tail Type: ORGANIZATION. Tail Entity: Guangzhou Pharma.
Relation: org:dissolved.
Correction: -

Dataset: SciERC
Context: The comparison between the two approaches indicates that the neural method produces far better results than the rule-based system.
Head Type: Method. Head Entity: neural method.
Tail Type: Method. Tail Entity: rule-based system.
Relation: COMPARE.
Correction: -

Dataset: SciERC
Context: The combination of chromatography and mass spectrometry has enabled scientists to achieve unparalleled levels of proteome analysis.
Head Type: Method. Head Entity: mass spectrometry.
Tail Type: Method. Tail Entity: chromatography.
Relation: FEATURE-OF.
Correction: Relation: CONJUNCTION.

Table 5: Data generated by LLMs. For each case, erroneous fields are corrected on the Correction line; "-" marks cases that needed no correction.
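Most corrections in Table 5 are entity-type or label mismatches, so one could screen generated instances with a simple compatibility check between the relation and the tail entity type before reuse. This filter is not part of the paper's pipeline; the type map is a tiny illustrative subset:

```python
# Illustrative (not from the paper) filter: drop generated instances whose tail
# entity type conflicts with the relation's expected tail type (cf. Table 5).

EXPECTED_TAIL_TYPE = {          # tiny TACRED-style subset, for illustration only
    "per:title": "TITLE",
    "per:date_of_birth": "DATE",
    "org:founded_by": "PERSON",
    "per:cities_of_residence": "CITY",
}

def keep_instance(inst):
    expected = EXPECTED_TAIL_TYPE.get(inst["relation"])
    return expected is None or inst["tail_type"] == expected
```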