36 UEC Int’l Mini-Conference No.52
service, given that they currently support such
modifications. However, this dataset can also
be adapted for training local language models,
which can then be integrated into the system.
The fine-tuning process ensures that the model
is capable of interpreting nuanced textual cues
and accurately mapping them to specific odor
categories, enhancing the olfactory dimension of
multimedia consumption.
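The adaptation described above could, for instance, serialize each text-odor pair into the chat-style JSONL format commonly used for supervised fine-tuning. This is a minimal sketch; the example scenes, the odor labels, and the system prompt are illustrative assumptions, not the paper's actual dataset.

```python
import json

# Hypothetical text-odor pairs; the real dataset and label set may differ.
PAIRS = [
    ("Fresh bread cooled on the windowsill.", "baked"),
    ("Rain fell on the hot asphalt.", "earthy"),
]

def to_finetune_record(text: str, odor: str) -> dict:
    """Build one chat-style fine-tuning example (JSONL convention used by
    several providers for supervised fine-tuning)."""
    return {
        "messages": [
            {"role": "system",
             "content": "Classify the described scene into one odor category."},
            {"role": "user", "content": text},
            {"role": "assistant", "content": odor},
        ]
    }

def write_jsonl(pairs, path: str) -> None:
    """Write one JSON object per line, as fine-tuning endpoints expect."""
    with open(path, "w", encoding="utf-8") as f:
        for text, odor in pairs:
            f.write(json.dumps(to_finetune_record(text, odor)) + "\n")
```

The same records could also train a local classifier, since the (text, odor) supervision signal is model-agnostic.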
4 Model Comparison
Figure 4: Accuracy plot by models.

To determine the most suitable model for our system, we conducted an extensive evaluation by benchmarking several state-of-the-art language models. We aimed to compare their performance on a contextual-understanding classification task, focusing on accuracy, cost, and response time.
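The per-model bookkeeping behind such a comparison can be sketched as follows. The model names and per-1K-token prices here are placeholders, not the providers' actual pricing:

```python
from dataclasses import dataclass

# Illustrative per-1K-token prices (USD); real provider pricing varies
# and changes over time, so treat these numbers as placeholders.
PRICE_PER_1K = {
    "model-a": (0.0005, 0.0015),    # (input, output)
    "model-b": (0.00025, 0.00125),
}

@dataclass
class Trial:
    model: str
    predicted: str
    expected: str
    prompt_tokens: int
    completion_tokens: int
    elapsed_s: float

def summarize(trials):
    """Aggregate per-model accuracy, average request cost, and average latency."""
    acc = {}
    for t in trials:
        s = acc.setdefault(t.model,
                           {"n": 0, "correct": 0, "cost": 0.0, "time": 0.0})
        p_in, p_out = PRICE_PER_1K[t.model]
        s["n"] += 1
        s["correct"] += int(t.predicted == t.expected)
        s["cost"] += (t.prompt_tokens / 1000 * p_in
                      + t.completion_tokens / 1000 * p_out)
        s["time"] += t.elapsed_s
    return {m: {"accuracy": s["correct"] / s["n"],
                "avg_cost_usd": s["cost"] / s["n"],
                "avg_latency_s": s["time"] / s["n"]}
            for m, s in acc.items()}
```

Costs follow directly from recorded token usage, so the same trial log supports all three metrics at once.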
4.1 Experiment Description
We selected five models from three leading AI companies: OpenAI, Anthropic, and Google.
Specifically, we tested gpt-3.5-turbo and a fine-tuned gpt-3.5-turbo from OpenAI; claude-3-haiku from Anthropic; and fine-tuned gemini-1.0-pro and gemini-1.5-flash from Google. Our
evaluation dataset consisted of 700 text-odor pairs, with 50 samples for each odor class to ensure a balanced and fair assessment. For each model, we recorded predictions, contextual analysis, token usage, and elapsed time, enabling us to compare their accuracy, cost-efficiency, and speed.

4.2 Results

Figure 4 illustrates the accuracy results for each model. Fine-tuned models consistently outperformed their non-fine-tuned counterparts, demonstrating the significant benefits of domain-specific training. The fine-tuned gpt-3.5-turbo and gemini-1.0-pro models achieved the highest accuracy rates, both surpassing 90%. Notably, the fine-tuned gpt-3.5-turbo model exhibited a 34% improvement over its original version, underscoring the effectiveness of fine-tuning in enhancing contextual understanding.

Figure 5: Cost plot by models.

Figure 5 presents the average cost per API request, calculated based on token usage and pricing from each service provider. While the most advanced models offered superior accuracy, they were also more expensive. The fine-tuned gpt-3.5-turbo model emerged as a cost-effective choice, being 24% cheaper than the fine-tuned gemini-1.0-pro model at a comparable accuracy level. This cost advantage becomes more pronounced with higher usage, making gpt-3.5-turbo a more practical option for large-scale applications.

As shown in Figure 6, the fine-tuned gpt-3.5-