Dataset Generation for API Function Calls: Leveraging Google Gemini for Accuracy

4 Apr 2025

Abstract and 1. Introduction

2 Related works

3 Methodology and 3.1 Causal language model as a classification model

3.2 Functional token

3.3 Dataset collection

3.4 Model development and training

4 Experiments and 4.1 Android function calls

4.2 Extension to Vehicle, Yelp, and DoorDash function sets

4.3 Full and partial training datasets and 4.4 Full training and LoRA training

4.5 Parallel and nested function call and 4.6 Weighted loss function for special tokens

5 Discussion and future works and References

Appendix

A.1 Android function examples

A.2 Vehicle function examples

3.3 Dataset collection

This section describes how we assemble high-quality datasets for training, validation, and testing, and how we organize the data for efficient training.

API Collection We start with Android APIs as an example. Our selection criteria are usability, usage frequency, and the complexity of technical implementation. We gather 20 Android APIs and organize them into three categories, ensuring that each function can realistically be executed on a device through Android app development, provided the developer has the necessary system permissions. We also compile APIs available in vehicles. More examples can be found in the Appendix.

  1. Android system API This category includes APIs for system-level functions essential for basic mobile operations, such as making calls, texting, setting alarms, modifying screen brightness, creating calendar entries, managing Bluetooth, enabling do-not-disturb mode, and taking photos. We exclude highly sensitive tasks like accessing system state information or changing accessibility settings.

  2. Android App API Our research examines APIs from pre-installed Google apps on Android devices, such as YouTube, Google Chrome, Gmail, and Google Maps. We explore functionalities like accessing trending news, retrieving weather updates, searching for YouTube content, and map navigation.

  3. Android smart device management API Our focus extends to the Google Home ecosystem, which comprises a wide range of smart home devices with significant market presence. Our aim is to improve smart device management via APIs, covering functions like adjusting a Nest Thermostat, managing media playback on a Google Nest device, and controlling door locks using the Google Home App.
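As an illustration of how the three categories above might be organized in code, the following is a minimal sketch. The `ApiSpec` class, the function names, and the parameter lists are all hypothetical and chosen only for illustration; they are not the actual API definitions used in this work (see the Appendix for those).

```python
from dataclasses import dataclass, field


@dataclass
class ApiSpec:
    """One entry in a hypothetical API catalog (illustrative only)."""
    name: str
    category: str          # "system", "app", or "smart_device"
    description: str
    parameters: dict = field(default_factory=dict)  # argument name -> Python type


# Hypothetical entries, one per category described above.
CATALOG = [
    ApiSpec("set_alarm", "system",
            "Set an alarm for a given time.",
            {"hour": int, "minute": int}),
    ApiSpec("search_youtube_videos", "app",
            "Search YouTube for videos matching a query.",
            {"query": str}),
    ApiSpec("set_nest_temperature", "smart_device",
            "Set the target temperature on a Nest Thermostat.",
            {"target_temperature": float}),
]


def group_by_category(catalog):
    """Return a mapping from category to the API names in that category."""
    groups = {}
    for spec in catalog:
        groups.setdefault(spec.category, []).append(spec.name)
    return groups


if __name__ == "__main__":
    print(group_by_category(CATALOG))
```

Keeping the parameter types alongside each description means the same catalog can later drive both prompt construction and argument checking.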

Dataset generation Figure 3 depicts the steps involved in assembling the dataset. Creating the dataset involves three key phases: (1) generating relevant queries and their associated function call arguments; (2) developing irrelevant queries accompanied by suitable function bodies; and (3) implementing binary verification support through Google Gemini.

  1. Google Gemini Generated Query and Function Call Creating a high-quality dataset hinges on formulating well-defined queries and accurate function call arguments. Our strategy emphasizes generating positive queries that a single API can resolve. Given a query and the predetermined API descriptions, we then make a second Google Gemini API call to produce the required function call arguments.
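The two-step generation described above can be sketched roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the model is represented by a generic `llm` callable that maps a prompt string to a response string (a stand-in for a Google Gemini API call), and the prompt wording, the `API` dictionary, and the JSON output format are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical API description; not taken from the paper's function set.
API = {
    "name": "set_alarm",
    "description": "Set an alarm for a given time.",
    "parameters": {"hour": "int", "minute": "int"},
}


def build_query_prompt(api: dict) -> str:
    """Prompt for step 1: a positive query that this single API can resolve."""
    return (
        "Write one user request that can be resolved by this function alone.\n"
        f"Function: {api['name']}\nDescription: {api['description']}"
    )


def build_arguments_prompt(query: str, api: dict) -> str:
    """Prompt for step 2: function call arguments for the generated query."""
    return (
        "Return the function call arguments as a JSON object.\n"
        f"Function: {api['name']}\nParameters: {json.dumps(api['parameters'])}\n"
        f"Query: {query}"
    )


def generate_sample(api: dict, llm: Callable[[str], str]) -> dict:
    """Run both steps and return one dataset record."""
    query = llm(build_query_prompt(api)).strip()
    arguments = json.loads(llm(build_arguments_prompt(query, api)))
    return {"query": query, "function": api["name"], "arguments": arguments}


def fake_llm(prompt: str) -> str:
    """Canned stand-in for the model so the sketch runs without network access."""
    if prompt.startswith("Write one user request"):
        return "Wake me up at 6:30 tomorrow"
    return '{"hour": 6, "minute": 30}'


if __name__ == "__main__":
    print(generate_sample(API, fake_llm))
```

Passing the model in as a callable keeps the sketch independent of any particular client library; a real call would replace `fake_llm`.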

Dataset Verification Despite the advanced capabilities of large language models such as OpenAI's GPT-4 and Google's Gemini, a noticeable error rate remains, particularly in the generation of function call arguments. These errors may appear as missing arguments, incorrect argument types, or misinterpretations of the intended query. To mitigate them, we introduce a verification mechanism: Google Gemini evaluates the completeness and accuracy of its generated function calls, and if the output is found lacking, the function call is regenerated.
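A verify-and-regenerate loop of this kind might look like the sketch below. Several parts are assumptions rather than details from this work: the `generate` and `verify` callables stand in for Google Gemini calls, the structural pre-check for missing arguments and incorrect types is an added illustration, and the retry limit `max_attempts` is hypothetical.

```python
from typing import Callable, Optional, Tuple


def find_argument_errors(arguments: dict, parameters: dict) -> list:
    """List missing, mistyped, or unexpected arguments against a parameter spec."""
    errors = []
    for name, expected_type in parameters.items():
        if name not in arguments:
            errors.append(f"missing argument: {name}")
        elif not isinstance(arguments[name], expected_type):
            errors.append(f"wrong type for {name}: expected {expected_type.__name__}")
    for name in arguments:
        if name not in parameters:
            errors.append(f"unexpected argument: {name}")
    return errors


def generate_verified(
    generate: Callable[[], dict],
    verify: Callable[[dict], bool],
    parameters: dict,
    max_attempts: int = 3,  # hypothetical limit, not stated in the source
) -> Tuple[Optional[dict], int]:
    """Regenerate until the arguments pass both checks or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        arguments = generate()
        # Cheap structural check first, then the binary model-based verdict.
        if not find_argument_errors(arguments, parameters) and verify(arguments):
            return arguments, attempt
    return None, max_attempts


if __name__ == "__main__":
    # The first candidate has a wrong type and a missing argument; the second is valid.
    candidates = iter([{"hour": "6"}, {"hour": 6, "minute": 30}])
    result, attempts_used = generate_verified(
        generate=lambda: next(candidates),
        verify=lambda args: True,  # stand-in for the model's pass/fail judgment
        parameters={"hour": int, "minute": int},
    )
    print(result, attempts_used)
```

Misinterpretations of the query cannot be caught by a structural check, which is why the binary `verify` step remains the final gate in this sketch.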

This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.

Authors:

(1) Wei Chen, Stanford University, with equal contribution and a corresponding author {weichen6}@stanford.edu;

(2) Zhiyuan Li, Stanford University and a corresponding author {zhiyuan8}@stanford.edu.