Efficient Prompt Design for Resource-Constrained Deployment of Local LLMs

dc.contributor.author: Adeseye, Aisvarya
dc.contributor.author: Isoaho, Jouni
dc.contributor.author: Virtanen, Seppo
dc.contributor.author: Tahir, Mohammad
dc.contributor.organization: fi=kyberturvallisuusteknologia|en=Cyber Security Engineering|
dc.contributor.organization: fi=tietotekniikan laitos|en=Department of Computing|
dc.contributor.organization-code: 1.2.246.10.2458963.20.28753843706
dc.contributor.organization-code: 1.2.246.10.2458963.20.85312822902
dc.converis.publication-id: 505411162
dc.converis.url: https://research.utu.fi/converis/portal/Publication/505411162
dc.date.accessioned: 2026-01-21T13:37:52Z
dc.date.available: 2026-01-21T13:37:52Z
dc.description.abstract: The local deployment of Large Language Models (LLMs) is essential for privacy and latency in several domains. However, it faces significant challenges in terms of memory, power, and inference speed, particularly in resource-constrained systems such as Internet of Things (IoT) and edge computing devices. Most existing studies emphasize compression and hardware tuning; holistic system-level optimization remains incomplete, and the role of prompt design is still underexplored. This study introduces a structured evaluation of prompt engineering strategies designed to enhance resource efficiency and accuracy in local LLMs, applied across three textual analysis tasks: theme extraction, frequency analysis, and impact analysis. Four experimental conditions were compared: Baseline, System Prompt Only (SP), User Prompt Only (UP), and System+User Prompt (SP+UP). Using multiple LLMs ranging from 1B to 70B parameters, we measured tokens generated, latency, VRAM usage, hallucination rates, and structural errors. The results show that System Prompts alone substantially reduced computational overhead, whereas User Prompts improved accuracy and task alignment. Their combination yielded comprehensive improvements, maximizing both efficiency and reliability. The proposed prompt design enabled smaller LLMs to rival larger ones in efficiency and accuracy, with LLaMA-3.2 3B under SP+UP reducing VRAM usage by 96%, latency by 85%, and hallucinations by 83% compared to the 70B model with Baseline. Even LLaMA-3.2 1B proved a viable option, especially when VRAM size is a critical factor.
dc.embargo.lift: 2027-11-17
dc.identifier.eisbn: 979-8-3315-1501-0
dc.identifier.isbn: 979-8-3315-1502-7
dc.identifier.olddbid: 213192
dc.identifier.oldhandle: 10024/196210
dc.identifier.uri: https://www.utupub.fi/handle/11111/54917
dc.identifier.url: https://ieeexplore.ieee.org/document/11231309
dc.identifier.urn: URN:NBN:fi-fe202601217335
dc.language.iso: en
dc.okm.affiliatedauthor: Adeseye, Aisvarya
dc.okm.affiliatedauthor: Isoaho, Jouni
dc.okm.affiliatedauthor: Virtanen, Seppo
dc.okm.affiliatedauthor: Tahir, Mohammad
dc.okm.discipline: 113 Computer and information sciences (en_GB)
dc.okm.discipline: 213 Electronic, automation and communications engineering, electronics (en_GB)
dc.okm.discipline: 113 Tietojenkäsittely ja informaatiotieteet (fi_FI)
dc.okm.discipline: 213 Sähkö-, automaatio- ja tietoliikennetekniikka, elektroniikka (fi_FI)
dc.okm.internationalcopublication: not an international co-publication
dc.okm.internationality: International publication
dc.okm.type: A4 Conference Article
dc.publisher.country: United States (en_GB)
dc.publisher.country: Yhdysvallat (USA) (fi_FI)
dc.publisher.country-code: US
dc.relation.conference: IEEE Nordic Circuits and Systems
dc.relation.doi: 10.1109/NorCAS66540.2025.11231309
dc.source.identifier: https://www.utupub.fi/handle/10024/196210
dc.title: Efficient Prompt Design for Resource-Constrained Deployment of Local LLMs
dc.title.book: 2025 IEEE Nordic Circuits and Systems Conference (NorCAS)
dc.year.issued: 2025