
IPBench

🌐 Homepage | 🤗 Dataset | 🤗 Paper | 📖 arXiv | GitHub

This repo contains the evaluation code for the paper "IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property".

🔔 News

Introduction

Intellectual property, especially patents, shares similarities with academic papers in that it encapsulates the essence of domain knowledge across various technical fields. However, it is also governed by the intellectual property legal frameworks of different countries and regions. As such, it carries technical, legal, and economic significance and is closely connected to real-world intellectual property services. In particular, intellectual property data is a rich, multi-modal data type with immense potential for content mining and analysis. Focusing on this field, we propose a comprehensive four-level IP task taxonomy based on the DOK model. Building on this taxonomy, we develop IPBench, a large language model benchmark consisting of 10,374 data instances across 20 tasks and covering 8 types of IP mechanisms. Compared to existing related benchmarks, IPBench offers the largest data scale and the most comprehensive task coverage, spanning both technical and legal content as well as understanding, reasoning, classification, and generation tasks.


Dataset Creation

To bridge the gap between real-world demands and the application of LLMs in the IP field, we introduce the first comprehensive IP task taxonomy. Our taxonomy is based on Webb's Depth of Knowledge (DOK) Theory and is extended to include four hierarchical levels: Information Processing, Logical Reasoning, Discriminant Evaluation, and Creative Generation. It includes an evaluation of models' intrinsic knowledge of IP, along with a detailed analysis of IP text from both point-wise and pairwise perspectives, covering technical and legal aspects.

Building on this taxonomy, we develop IPBench, the first comprehensive Intellectual Property Benchmark for LLMs, consisting of 10,374 data points across 20 tasks aimed at evaluating the knowledge and capabilities of LLMs in real-world IP applications.

This holistic evaluation gives us a hierarchical, in-depth view of LLMs, assessing their in-domain capabilities in memory, understanding, reasoning, discrimination, and creation across different IP mechanisms. Because IP law differs across jurisdictions, IPBench is constrained to the legal frameworks of the United States and mainland China, making it a bilingual benchmark.

For more detailed information, please refer to our paper and the Hugging Face dataset linked above.
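
For quick experimentation outside the provided scripts, the data can be loaded with the Hugging Face datasets library. The sketch below is illustrative only: the dataset identifier and the fields it prints are assumptions, so check the 🤗 Dataset link above for the actual repository ID and per-task schema.

```python
# Illustrative sketch only: the dataset ID below is an assumption; consult the
# 🤗 Dataset link above for the real identifier and the per-task schema.
from datasets import load_dataset

dataset = load_dataset("IPBench/IPBench")  # hypothetical Hugging Face dataset ID

# Print the available splits and inspect one instance to see its fields.
print(dataset)
first_split = next(iter(dataset))
print(dataset[first_split][0])
```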

Installation

pip install -r requirements.txt

Inference

We provide the inference code using either vLLM or the OpenAI API in the src folder, along with corresponding run scripts in the scripts folder.

sh inference.sh

sh inference-api.sh

sh inference-fewshot.sh

sh inference-cot.sh
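
If you want to prototype outside the provided scripts, the sketch below shows how a single multiple-choice question might be sent to an OpenAI-compatible endpoint. It is not the repository's inference pipeline: the model name, prompt template, and example item are placeholders chosen for illustration.

```python
# Minimal sketch, not the repository's inference code (see src/ and scripts/).
# The model name, prompt template, and example item below are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def answer_mcqa(question: str, options: list[str], model: str = "gpt-4o") -> str:
    """Ask the model to answer one MCQA item with a single option letter."""
    letters = [chr(ord("A") + i) for i in range(len(options))]
    prompt = (
        question
        + "\n"
        + "\n".join(f"{letter}. {option}" for letter, option in zip(letters, options))
        + "\nAnswer with a single option letter."
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    print(answer_mcqa(
        "Which part of a patent defines its legal scope of protection?",
        ["The abstract", "The claims", "The drawings", "The description"],
    ))
```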

Evaluation

We provide separate evaluation code for MCQA, Classification, and Generation tasks in the eval folder, along with corresponding run scripts in the scripts folder.

sh eval-mcqa.sh

sh eval-3-5.sh

sh eval-classification.sh

sh eval-generation.sh
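
As a rough picture of what MCQA scoring involves, the sketch below extracts an option letter from each model response and reports accuracy against the gold answers. It is a simplified stand-in for the code in the eval folder, and the JSONL layout with "prediction" and "answer" fields is an assumption.

```python
# Simplified illustration of MCQA scoring; the scripts in eval/ are the reference.
# The JSONL layout and the "prediction"/"answer" field names are assumptions.
import json
import re
import sys


def extract_choice(text: str) -> str | None:
    """Return the first standalone option letter (A-E) found in a response."""
    match = re.search(r"\b([A-E])\b", text.upper())
    return match.group(1) if match else None


def mcqa_accuracy(path: str) -> float:
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            total += 1
            if extract_choice(record["prediction"]) == record["answer"].strip().upper():
                correct += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    print(f"accuracy = {mcqa_accuracy(sys.argv[1]):.4f}")
```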

Disclaimers

In developing IPBench, all data were collected exclusively from open and publicly available sources, and we strictly adhered to all relevant copyright and licensing regulations. Any data originating from websites or platforms that prohibit copying, redistribution, or automated crawling were explicitly excluded. Furthermore, we confirm that all data are used solely for academic and research purposes, not for any commercial applications. We are committed to responsible data usage and transparency in our research practices. Future updates of IPBench will continue to follow the same principles and remain fully open to academic scrutiny and community feedback.

Contact

Qiyao Wang: wangqiyao@mail.dlut.edu.cn

Shiwen Ni: sw.ni@siat.ac.cn

If you have any questions, please feel free to contact us.

Citation

BibTeX:

@article{wang2025ipbench,
  title={IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property},
  author={Wang, Qiyao and Chen, Guhong and Wang, Hongbo and Liu, Huaren and Zhu, Minghui and Qin, Zhifei and Li, Linwei and Yue, Yilin and Wang, Shiqiang and Li, Jiayan and others},
  journal={arXiv preprint arXiv:2504.15524},
  year={2025}
}

Star History

Star History Chart
