Skip to content

UpstageAILab/upstage-ai-final-ir2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

59 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Scientific Knowledge Question Answering

IR 2์กฐ

๊น€ํƒœํ•œ ๊น€์†Œํ˜„ ๊น€์ค€ํ˜ธ ์ตœ์žฅ์›
๊น€ํƒœํ•œ ๊น€์†Œํ˜„ ๊น€์ค€ํ˜ธ ์ตœ์žฅ์›
ํŒ€์žฅ, ๋ฆฌ์„œ์น˜, ๋ฐ์ดํ„ฐ์ƒ์„ฑ, ๋ชจ๋ธ๋ง, ํ›„์ฒ˜๋ฆฌ ๋ฆฌ์„œ์น˜, ๋ฐ์ดํ„ฐ์ƒ์„ฑ, ๋ชจ๋ธ๋ง, ํ›„์ฒ˜๋ฆฌ ๋ฆฌ์„œ์น˜, ๋ฐ์ดํ„ฐ์ƒ์„ฑ, ๋ชจ๋ธ๋ง, ํ›„์ฒ˜๋ฆฌ ๋ฆฌ์„œ์น˜, ๋ฐ์ดํ„ฐ์ƒ์„ฑ, ๋ชจ๋ธ๋ง, ํ›„์ฒ˜๋ฆฌ

0. Overview

Environment

  • AMD Ryzen Threadripper 3960X 24-Core Processor
  • NVIDIA GeForce RTX 3090
  • CUDA Version 12.2

Requirements

  • openai==1.7.2
  • elasticsearch==8.8.0
  • sentence_transformers==2.2.2
  • wandb
  • gemini

1. Competiton Info

Overview

  • ๊ณผํ•™ ์ง€์‹ ์งˆ์˜ ์‘๋‹ต ์‹œ์Šคํ…œ ๊ตฌ์ถ•

  • LLM์€ ์ตœ๊ทผ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ด์ง€๋งŒ Hallucination๊ณผ Knolwedge Cut-off ํ˜„์ƒ์ด ์žˆ๋‹ค.

  • ์œ„ ๋‹จ์ ์„ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด์„œ RAG (Retrieval Augmented Generation)๋ฅผ ๋„์ž…ํ•œ๋‹ค.

  • LLM์˜ ๋“ฑ์žฅ ์ดํ›„ ์—ฌ๋Ÿฌ ์‚ฐ์—… ๋ถ„์•ผ์—์„œ ์ง€์‹์„ ๋‹ค๋ฃจ๋Š” ์—…๋ฌด๋“ค์ด ์ ์  ๊ณ ๋„ํ™”๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ ์ •๋ณด๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด ๊ฒ€์ƒ‰์—”์ง„์˜ ์ž…๋ ฅ์ฐฝ์— ํ‚ค์›Œ๋“œ๋ฅผ ์ž…๋ ฅํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฅผ ํ™•์ธํ•˜๊ณ  ์›ํ•˜๋Š” ์ •๋ณด๊ฐ€ ์—†์œผ๋ฉด ๋‹ค๋ฅธ ํ‚ค์›Œ๋“œ๋กœ ๋‹ค์‹œ ๊ฒ€์ƒ‰ํ•˜๊ธฐ๋ฅผ ๋ฐ˜๋ณตํ•˜๋Š” ๋ฒˆ๊ฑฐ๋กœ์šด ๊ณผ์ •์„ ์ด์ œ ๋”์ด์ƒ ์ž์ฃผ ํ•  ํ•„์š”๊ฐ€ ์—†์–ด์กŒ์Šต๋‹ˆ๋‹ค. ์ด์ œ LLMํ•œํ…Œ ๋ฌผ์–ด๋ณด๋ฉด ์งˆ๋ฌธ์˜ ์˜๋„๊นŒ์ง€ ํŒŒ์•…ํ•ด์„œ ํ•„์š”ํ•œ ๋‚ด์šฉ๋งŒ ์ž˜ ์ •๋ฆฌํ•ด์„œ ์•Œ๋ ค ์ค๋‹ˆ๋‹ค.
    image

  • ๊ทธ๋ ‡์ง€๋งŒ LLM์ด ๊ฐ€์ง„ ๊ทผ๋ณธ์ ์ธ ํ•œ๊ณ„๋„ ์žˆ์Šต๋‹ˆ๋‹ค. ๋จผ์ €, ์ •๋ณด๋ผ๋Š” ๊ฒƒ์€ ์˜๋ฏธ๋‚˜ ๊ฐ€์น˜๊ฐ€ ์‹œ๊ฐ„์— ๋”ฐ๋ผ ๊ณ„์† ๋ณ€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ชจ๋ธ์ด ์ด๋ฅผ ์‹ค์‹œ๊ฐ„์œผ๋กœ ํ•™์Šตํ•˜๊ธฐ ํž˜๋“ค๊ณ  ์ด ๋•Œ๋ฌธ์— ์•„๋ž˜ ์˜ˆ์‹œ์ฒ˜๋Ÿผ knowledge cutoff ๊ฐ€ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.
    image

  • ๊ทธ๋ฆฌ๊ณ  LLM์ด ์•Œ๋ ค์ฃผ๋Š” ์ง€์‹์ด ํ•ญ์ƒ ์‚ฌ์‹ค์— ๊ธฐ๋ฐ˜ํ•œ ๊ฒƒ์ด ์•„๋‹Œ ๊ฒฝ์šฐ๊ฐ€ ์ข…์ข… ์žˆ์Šต๋‹ˆ๋‹ค. ํŠนํžˆ ํŠน์ • ๋„๋ฉ”์ธ์ด๋‚˜ ๋ฌธ์ œ ์˜์—ญ์€ ๋งค์šฐ ์‹ฌ๊ฐํ•œ ๊ฑฐ์ง“ ์ •๋ณด๋“ค์„ ์ƒ์„ฑํ•ด ๋‚ด๊ณค ํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ์˜ˆ์‹œ์—์„œ ์ถ”์ฒœํ•˜๋Š” ๋ง›์ง‘๋“ค์€ ๋ชจ๋‘ ์‹ค์žฌํ•˜์ง€ ์•Š๋Š” ์žฅ์†Œ๋“ค์ž…๋‹ˆ๋‹ค.
    image

  • ์ด๋Ÿฌํ•œ ํ™˜๊ฐ ํ˜„์ƒ์€ ๋ฉ”ํƒ€์ธ์ง€๋ฅผ ํ•™์Šตํ•˜์ง€ ์•Š์€ LLM์˜ ๊ทผ๋ณธ์ ์ธ ํ•œ๊ณ„๋ผ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋ชจ๋ธ์€ ํ•™์Šต ๊ณผ์ •์—์„œ ์ •๋ณด๋ฅผ ์••์ถ•ํ•ด์„œ ์ €์žฅํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ •๋ณด์˜ ์†์‹ค์ด ๋ฐœ์ƒํ•  ์ˆ˜๋ฐ–์— ์—†๊ณ , ์ด ๋•Œ๋ฌธ์— ํŠน์ • ์ž…๋ ฅ ์กฐ๊ฑด์— ๋Œ€ํ•ด์„œ๋Š” ์‚ฌ์‹ค ์—ฌ๋ถ€๋ณด๋‹ค๋Š” ์ง€์‹๋ฅผ ํ‘œํ˜„ํ•˜๋Š” ๊ตญ์†Œ์ ์ธ ํŒจํ„ด์ด ๋” ํฐ ์˜ํ–ฅ์„ ์ฃผ๋ฉด์„œ ๋‹ต๋ณ€์ด ์ƒ์„ฑ๋  ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” RAG(Retrieval Augmented Generation) ๊ธฐ์ˆ ์ด ํ•„์ˆ˜์ž…๋‹ˆ๋‹ค. RAG๋Š” ์งˆ๋ฌธ์— ์ ํ•ฉํ•œ ๋ ˆํผ๋Ÿฐ์Šค ์ถ”์ถœ์„ ์œ„ํ•ด ๊ฒ€์ƒ‰์—”์ง„์„ ํ™œ์šฉํ•˜๊ณ  ๋‹ต๋ณ€ ์ƒ์„ฑ์„ ์œ„ํ•ด LLM(Large Language Model)์„ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ LLM์€ ์Šค์Šค๋กœ ์•Œ๊ณ  ์žˆ๋Š” ์ง€์‹์„ ์ถœ๋ ฅํ•˜๊ธฐ๋ณด๋‹ค๋Š” ์–ธ์–ด ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ๊ทน๋Œ€ํ™”ํ•˜๋Š” ๊ฒƒ์— ๋ฐฉ์ ์„ ๋‘ก๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ์‚ฌ์‹ค์— ๊ธฐ๋ฐ˜ํ•œ ์ง€์‹ ์ •๋ณด๋ฅผ ํ† ๋Œ€๋กœ ์งˆ๋ฌธ์— ๋‹ต์„ ํ•˜๊ณ  ์ถœ์ฒ˜ ์ •๋ณด๋„ ๊ฐ™์ด ์ค„ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ์ž๋Š” ํ›จ์”ฌ ๋” ์•ˆ์‹ฌํ•˜๊ณ  ์ •๋ณด๋ฅผ ์†Œ๋น„ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค. ์•„๋ž˜์˜ ๊ทธ๋ฆผ์ด ๋ฐ”๋กœ RAG ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.

    image

  • RAG๋Š” ์งˆ๋ฌธ์— ์ ํ•ฉํ•œ ๋ ˆํผ๋Ÿฐ์Šค ์ถ”์ถœ์„ ์œ„ํ•ด ๊ฒ€์ƒ‰์—”์ง„์„ ํ™œ์šฉํ•˜๊ณ  ๋‹ต๋ณ€ ์ƒ์„ฑ์„ ์œ„ํ•ด LLM(Large Language Model)์„ ํ™œ์šฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ LLM์€ ์Šค์Šค๋กœ ์•Œ๊ณ  ์žˆ๋Š” ์ง€์‹์„ ์ถœ๋ ฅํ•˜๊ธฐ๋ณด๋‹ค๋Š” ์–ธ์–ด ์ถ”๋ก  ๋Šฅ๋ ฅ์„ ๊ทน๋Œ€ํ™”ํ•˜๋Š” ๊ฒƒ์— ๋ฐฉ์ ์„ ๋‘ก๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ์‚ฌ์‹ค์— ๊ธฐ๋ฐ˜ํ•œ ์ง€์‹ ์ •๋ณด๋ฅผ ํ† ๋Œ€๋กœ ์งˆ๋ฌธ์— ๋‹ต์„ ํ•˜๊ณ  ์ถœ์ฒ˜ ์ •๋ณด๋„ ๊ฐ™์ด ์ค„ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ์ž๋Š” ํ›จ์”ฌ ๋” ์•ˆ์‹ฌํ•˜๊ณ  ์ •๋ณด๋ฅผ ์†Œ๋น„ํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

  • ์ด๋ฒˆ ๋Œ€ํšŒ์—์„œ๋Š” ๊ณผํ•™ ์ƒ์‹์„ ์งˆ๋ฌธํ•˜๋Š” ์‹œ๋‚˜๋ฆฌ์˜ค๋ฅผ ๊ฐ€์ •ํ•˜๊ณ  ๊ณผํ•™ ์ƒ์‹ ๋ฌธ์„œ 4200์—ฌ๊ฐœ๋ฅผ ๋ฏธ๋ฆฌ ๊ฒ€์ƒ‰์—”์ง„์— ์ƒ‰์ธํ•ด ๋‘ก๋‹ˆ๋‹ค. ๋Œ€ํ™” ๋ฉ”์‹œ์ง€ ๋˜๋Š” ์งˆ๋ฌธ์ด ๋“ค์–ด์˜ค๋ฉด ๊ณผํ•™ ์ƒ์‹์— ๋Œ€ํ•œ ์งˆ๋ฌธ ์˜๋„์ธ์ง€ ๊ทธ๋ ‡์ง€ ์•Š์€ ์ง€ ํŒ๋‹จ ํ›„์— ๊ณผํ•™ ์ƒ์‹ ์งˆ๋ฌธ์ด๋ผ๋ฉด ๊ฒ€์ƒ‰์—”์ง„์œผ๋กœ๋ถ€ํ„ฐ ์ ํ•ฉํ•œ ๋ฌธ์„œ๋“ค์„ ์ถ”์ถœํ•˜๊ณ  ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.ย  ๋งŒ์ผ ๊ณผํ•™ ์ƒ์‹ ์ด์™ธ์˜ ์งˆ๋ฌธ์ด๋ผ๋ฉด ๊ฒ€์ƒ‰์—”์ง„์„ ํ™œ์šฉํ•  ํ•„์š” ์—†์ด ์ ์ ˆํ•œ ๋‹ต์„ ๋ฐ”๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

  • ๋งˆ์ง€๋ง‰์œผ๋กœ, ๋ณธ ํ”„๋กœ์ ํŠธ๋Š” ๋ชจ๋ธ๋ง์— ์ค‘์ ์„ ๋‘” ๋Œ€ํšŒ๊ฐ€ ์•„๋‹ˆ๋ผ RAG(Retrieval Augmented Generation) ์‹œ์Šคํ…œ์˜ ๊ฐœ๋ฐœ์— ์ง‘์ค‘ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ๋Œ€ํšŒ๋Š” ์—ฌ๋Ÿฌ ๋ชจ๋ธ๊ณผ ๋‹ค์–‘ํ•œ ๊ธฐ๋ฒ•, ๊ทธ๋ฆฌ๊ณ  ์•™์ƒ๋ธ”์„ ํ™œ์šฉํ•˜์—ฌ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ์ผ๋ฐ˜์ ์ธ ๋ชจ๋ธ๋ง ๋Œ€ํšŒ์™€๋Š” ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ๋Œ€์‹ ์— ๊ฒ€์ƒ‰ ์—”์ง„์ด ์˜ฌ๋ฐ”๋ฅธ ๋ฌธ์„œ๋ฅผ ์ƒ‰์ธํ–ˆ๋Š”์ง€, ๊ทธ๋ฆฌ๊ณ  ์ƒ์„ฑ๋œ ๋‹ต๋ณ€์ด ์ ์ ˆํ•œ์ง€ ์ง์ ‘ ํ™•์ธํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•œ ๋Œ€ํšŒ์ž…๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ฐธ๊ฐ€์ž๋“ค์€ ์ž‘์€ ๊ทœ๋ชจ์˜ ํ† ์ด ๋ฐ์ดํ„ฐ์…‹(10๊ฐœ ๋ฏธ๋งŒ)์„ ์‚ฌ์šฉํ•˜์—ฌ ์ดˆ๊ธฐ ์‹คํ—˜์„ ์ง„ํ–‰ํ•œ ํ›„์— ์ „์ฒด ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•œ ํ‰๊ฐ€๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. ์‹ค์ œ๋กœ RAG ์‹œ์Šคํ…œ์„ ๊ตฌ์ถ•ํ•  ๋•Œ์—๋„ ์ด๋Ÿฌํ•œ ๋ฐฉ์‹์ด ์ผ๋ฐ˜์ ์œผ๋กœ ์ ์šฉ๋˜๋ฉฐ, ์ด๋ฅผ ํ†ตํ•ด ์‹คํ—˜์„ ๋”์šฑ ํšจ์œจ์ ์œผ๋กœ ์ง„ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด๋ฒˆ ๋Œ€ํšŒ๋Š” 2์ฃผ๊ฐ„ ์ง„ํ–‰๋˜๋ฉฐ, ํ•˜๋ฃจ์— ์ œ์ถœํ•  ์ˆ˜ ์žˆ๋Š” ํšŸ์ˆ˜๊ฐ€ 5ํšŒ๋กœ ์ œํ•œ๋ฉ๋‹ˆ๋‹ค.

Timeline

  • April 8, 2024 - Start Date: Studying Lectures
  • April 11, 2024 - First Mentoring
  • April 15, 2024 - Starting Project Date
  • April 18, 2024 - Second Mentoring
  • April 23, 2024 - Third Mentoring
  • May 2, 2024 - Final submission deadline

2. Components

Directory

โ”œโ”€โ”€ code
โ”œโ”€โ”€ configs
โ”œโ”€โ”€ datasets
โ”œโ”€โ”€ logger
โ”œโ”€โ”€ models
โ”œโ”€โ”€ docs
โ”‚   โ”œโ”€โ”€ pdf
โ”‚   โ”‚   โ””โ”€โ”€ (Template) [ํŒจ์ŠคํŠธ์บ ํผ์Šค] Upstage AI Lab 1๊ธฐ_๊ทธ๋ฃน ์Šคํ„ฐ๋”” .pptx
โ”‚   โ””โ”€โ”€ paper
โ””โ”€โ”€ data
โ”‚   โ”œโ”€โ”€ documents.jsonl
โ”‚   โ”œโ”€โ”€ eval.jsonl
โ”‚   โ”œโ”€โ”€ generated_questions_single_train.csv
โ”‚   โ”œโ”€โ”€ generated_questions_single_val.csv
โ”‚   โ”œโ”€โ”€ generated_questions_single_test.csv
โ”‚   โ”œโ”€โ”€ generated_questions_multiple_train.csv
โ”‚   โ”œโ”€โ”€ generated_questions_multiple_val.csv
โ”‚   โ””โ”€โ”€ generated_questions_multiple_test.csv
โ”‚   โ””โ”€โ”€ gemini_q_generation_sample.ipynb
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ train.py

3. Data descrption

Dataset overview

  • ๊ณผํ•™ ์ƒ์‹ ๋ฌธ์„œ 4272๊ฐœ
  • ko_ai2_arc__ARC_Challenge์™€ ko_mmlu ๋ฐ์ดํ„ฐ
  • ์ด 63๊ฐœ์˜ ๋ฐ์ดํ„ฐ ์†Œ์Šค (ko_mmlu__human_sexuality__train, ko_mmlu__human_sexuality__test ๋“ฑ์„ ๋ณ„๊ฐœ๋กœ ์นด์šดํŠธ, ๋˜ํ•œ ko_mmlu__human_sexuality__train๊ณผ ko_mmlu__conceptual_physics__train ๋„ ๋ณ„๊ฐœ๋กœ ์นด์šดํŠธ)
  • ํŒŒ์ผ ํฌ๋งท์€ ๊ฐ line์ด json ๋ฐ์ดํ„ฐ์ธ jsonl ํŒŒ์ผ์ž…๋‹ˆ๋‹ค.

ํ•™์Šต ๋ฐ์ดํ„ฐ

{"docid": "42508ee0-c543-4338-878e-d98c6babee66", "src": "ko_mmlu__nutrition__test", "content": "๊ฑด๊ฐ•ํ•œ ์‚ฌ๋žŒ์ด ์—๋„ˆ์ง€ ๊ท ํ˜•์„ ํ‰ํ˜• ์ƒํƒœ๋กœ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์€ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค.
์—๋„ˆ์ง€ ๊ท ํ˜•์€ ์—๋„ˆ์ง€ ์„ญ์ทจ์™€ ์—๋„ˆ์ง€ ์†Œ๋น„์˜ ์ˆ˜ํ•™์  ๋™๋“ฑ์„ฑ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ๊ฑด๊ฐ•ํ•œ ์‚ฌ๋žŒ์€ 1-2์ฃผ์˜ ๊ธฐ๊ฐ„ ๋™์•ˆ ์—๋„ˆ์ง€ ๊ท ํ˜•์„ ๋‹ฌ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์ด ๊ธฐ๊ฐ„ ๋™์•ˆ์—๋Š” ์˜ฌ๋ฐ”๋ฅธ ์‹๋‹จ๊ณผ ์ ์ ˆํ•œ ์šด๋™์„ ํ†ตํ•ด ์—๋„ˆ์ง€ ์„ญ์ทจ์™€ ์—๋„ˆ์ง€ ์†Œ๋น„๋ฅผ ์กฐ์ ˆํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
์‹๋‹จ์€ ์˜์–‘๊ฐ€ ์žˆ๋Š” ์‹ํ’ˆ์„ ํฌํ•จํ•˜๊ณ , ์ ์ ˆํ•œ ์นผ๋กœ๋ฆฌ๋ฅผ ์„ญ์ทจํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ, ์šด๋™์€ ์—๋„ˆ์ง€ ์†Œ๋น„๋ฅผ ์ด‰์ง„์‹œํ‚ค๊ณ  ๊ทผ์œก์„ ๊ฐ•ํ™”์‹œํ‚ต๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ์—๋„ˆ์ง€ ๊ท ํ˜•์„ ์œ ์ง€ํ•˜๋ฉด ๊ฑด๊ฐ•์„ ์œ ์ง€ํ•˜๊ณ  ๋น„๋งŒ์ด๋‚˜ ์˜์–‘ ์‹ค์กฐ์™€ ๊ฐ™์€ ๋ฌธ์ œ๋ฅผ ์˜ˆ๋ฐฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋”ฐ๋ผ์„œ ๊ฑด๊ฐ•ํ•œ ์‚ฌ๋žŒ์€ ์—๋„ˆ์ง€ ๊ท ํ˜•์„ ํ‰ํ˜• ์ƒํƒœ๋กœ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•˜๋ฉฐ, ์ด๋ฅผ ์œ„ํ•ด 1-2์ฃผ์˜ ๊ธฐ๊ฐ„ ๋™์•ˆ ์‹๋‹จ๊ณผ ์šด๋™์„ ์กฐ์ ˆํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค."}
{"docid": "7a3e9dc2-2572-4954-82b4-1786e9e48f1f", "src": "ko_ai2_arc__ARC_Challenge__test", "content": "์‚ฐ๊ผญ๋Œ€๊ธฐ์—์„œ๋Š” ์ค‘๋ ฅ์ด ์•„์ฃผ ์•ฝ๊ฐ„ย ๋ณ€ํ•ฉ๋‹ˆ๋‹ค.
์ด๋Š” ๋ฌด๊ฒŒ์— ์˜ํ–ฅ์„ ๋ฏธ์นฉ๋‹ˆ๋‹ค. ์‚ฐ๊ผญ๋Œ€๊ธฐ์—์„œ๋Š” ๋ฌด๊ฒŒ๊ฐ€ ๊ฐ์†Œํ•  ๊ฐ€๋Šฅ์„ฑ์ด ๊ฐ€์žฅ ๋†’์Šต๋‹ˆ๋‹ค. ์ค‘๋ ฅ์€ ์ง€๊ตฌ์˜ ์งˆ๋Ÿ‰์— ์˜ํ•ด ๊ฒฐ์ •๋˜๋ฉฐ, ์‚ฐ๊ผญ๋Œ€๊ธฐ์—์„œ๋Š” ์ง€๊ตฌ์˜ ์งˆ๋Ÿ‰๊ณผ์˜ ๊ฑฐ๋ฆฌ๊ฐ€ ๋” ๋ฉ€์–ด์ง€๊ธฐ ๋•Œ๋ฌธ์— ์ค‘๋ ฅ์ด ์•ฝ๊ฐ„ ๊ฐ์†Œํ•ฉ๋‹ˆ๋‹ค.
๋”ฐ๋ผ์„œ, ์‚ฐ๊ผญ๋Œ€๊ธฐ์—์„œ๋Š” ๋ฌด๊ฒŒ๊ฐ€ ๋” ๊ฐ€๋ณ๊ฒŒ ๋А๊ปด์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค."}  

ํ‰๊ฐ€ ๋ฐ์ดํ„ฐ
20๊ฐœ์˜ ๋ฉ€ํ‹ฐํ„ด ๋Œ€ํ™”์™€ 20๊ฐœ์˜ ๊ณผํ•™ ์ƒ์‹ ์ด์™ธ์˜ ์ผ์ƒ ๋Œ€ํ™”

{"eval_id": 0, "msg": [{"role": "user", "content": "๋‚˜๋ฌด์˜ ๋ถ„๋ฅ˜์— ๋Œ€ํ•ด ์กฐ์‚ฌํ•ด ๋ณด๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์€?"}]}
{"eval_id": 1, "msg": [{"role": "user", "content": "๊ฐ ๋‚˜๋ผ์—์„œ์˜ ๊ณต๊ต์œก ์ง€์ถœ ํ˜„ํ™ฉ์— ๋Œ€ํ•ด ์•Œ๋ ค์ค˜."}]}
{"eval_id": 2, "msg": [{"role": "user", "content": "๊ธฐ์–ต ์ƒ์‹ค์ฆ ๊ฑธ๋ฆฌ๋ฉด ๋„ˆ๋ฌด ๋ฌด์„ญ๊ฒ ๋‹ค."}, {"role": "assistant", "content": "๋„ค ๋งž์Šต๋‹ˆ๋‹ค."}, {"role": "user", "content": "์–ด๋–ค ์›์ธ ๋•Œ๋ฌธ์— ๋ฐœ์ƒํ•˜๋Š”์ง€ ๊ถ๊ธˆํ•ด."}]}
{"eval_id": 3, "msg": [{"role": "user", "content": "ํ†ตํ•™ ๋ฒ„์Šค์˜ ๊ฐ€์น˜์— ๋Œ€ํ•ด ๋งํ•ด์ค˜."}]}
{"eval_id": 4, "msg": [{"role": "user", "content": "Dmitri Ivanovsky๊ฐ€ ๋ˆ„๊ตฌ์•ผ?"}]}
{"eval_id": 36, "msg": [{"role": "user", "content": "๋‹ˆ๊ฐ€ ๋Œ€๋‹ต์„ ์ž˜ํ•ด์ค˜์„œ ๋„ˆ๋ฌด ์‹ ๋‚˜!"}]}

ํ‰๊ฐ€ ๋ฐฉ๋ฒ•
mAP (mean Average Precision)
mAP๋Š” ์งˆ์˜ N๊ฐœ์— ๋Œ€ํ•œ Average Precision์˜ ํ‰๊ท  ๊ฐ’์„ ๊ตฌํ•˜๊ณ , Average Precision์€ Precision-recall curve์—์„œ ์•„๋ž˜์ชฝ ๋ฉด์ ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ๊ณ„์‚ฐ ๊ณผ์ •์€ ๋„์‹ํ™”ํ•˜๋ฉด ์•„๋ž˜ ๊ทธ๋ฆผ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

image

๊ทธ๋Ÿฐ๋ฐ ์ด๋ฒˆ ๋Œ€ํšŒ์—์„œ๋Š” MAP๋ฅผ ์•ฝ๊ฐ„ ๋ณ€ํ˜•ํ•˜์—ฌ RAG ํ‰๊ฐ€์— ์ ํ•ฉํ•˜๋„๋ก ์‚ด์ง ์ˆ˜์ •ํ•œ ํ˜•ํƒœ์˜ ๋กœ์ง์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋Œ€ํ™” ๋ฉ”์‹œ์ง€๊ฐ€ ๊ณผํ•™ ์ƒ์‹์— ๋Œ€ํ•œ ์งˆ๋ฌธ์ผ ์ˆ˜๋„ ์žˆ๊ณ  ์•„๋‹์ˆ˜๋„ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ๊ณผํ•™ ์ƒ์‹ ์งˆ๋ฌธ์ด ์•„๋‹Œ ๊ฒฝ์šฐ๋Š” ๋ฌธ์„œ๋ฅผ ์ถ”์ถœํ•  ํ•„์š”๊ฐ€ ์—†์Šต๋‹ˆ๋‹ค. ๊ทธ๋ž˜์„œ ๊ฒ€์ƒ‰์ด ํ•„์š”์—†๋Š” ground truth ํ•ญ๋ชฉ์— ๋Œ€ํ•ด์„œ๋Š” ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ๊ฐ€ ์—†๋Š” ๊ฒฝ์šฐ๋ฅผ 1์ ์œผ๋กœ ์ฃผ๊ณ  ๊ทธ๋ ‡์ง€ ์•Š๋Š” ๊ฒฝ์šฐ๋Š” 0์ ์œผ๋กœ ๊ณ„์‚ฐํ•˜๊ฒŒ ๋กœ์ง์„ ์ถ”๊ฐ€ํ–ˆ์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ ์ฝ”๋“œ์˜ else ๋ถ€๋ถ„์ด ์ด์— ํ•ด๋‹นํ•˜๊ณ  ๋‚˜๋จธ์ง€ ๋กœ์ง์€ ์›๋ž˜ MAP ๊ณ„์‚ฐ ๋กœ์ง์„ ๊ทธ๋Œ€๋กœ ๋”ฐ๋ฆ…๋‹ˆ๋‹ค.

def calc_map(gt, pred):ย  ย  
    sum_average_precision = 0ย  ย  
    for j in pred:ย  ย  ย  ย  
        if gt[j["eval_id"]]:ย  ย  ย  ย  ย  ย  
            hit_count = 0ย  ย  ย  ย  ย  ย  
            sum_precision = 0ย  ย  ย  ย  ย  ย  
            for i,docid in enumerate(j["topk"][:3]):ย  ย  ย  ย  ย  ย  ย  ย  
                if docid in gt[j["eval_id"]]:ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  
                    hit_count += 1ย  ย  ย  ย  ย  ย  ย  ย  ย  ย  
                    sum_precision += hit_count/(i+1)ย  ย  ย  ย  ย  ย  
            average_precision = sum_precision / hit_count if hit_count > 0 else 0ย  ย  ย  ย  
        else:ย  ย  ย  ย  ย  ย  
            average_precision = 0 if j["topk"] else 1ย  ย  ย  ย  
        sum_average_precision += average_precisionย  ย  
    return sum_average_precision/len(pred)

EDA

  • ๊ณผํ•™ ์งˆ๋ฌธ์˜ ์ข…๋ฅ˜ src์—์„œ ์ถœ์ฒ˜์ธ ko_mmlu์™€ ko_ai2๋ฅผ ์ œ์™ธํ•˜๊ณ , ๋’ท๋ถ€๋ถ„์˜ train, validation, test๋ฅผ ์ œ์™ธํ•œ ๊ฐ€์šด๋ฐ ๋ถ€๋ถ„์„ ๊ณผํ•™ ์ง€์‹ domain์œผ๋กœ ์„ค์ •ํ•œ๋‹ค. ko_mmlu__conceptual_physics__test ko_ai2_arc__ARC_Challenge__test ์œ„ ๋‘ ์˜ˆ์‹œ์ธ ๊ฒฝ์šฐ conceptual_physics์™€ ARC_Challenge๋กœ ๋งŒ๋“ค์–ด์„œ domain์— ํ• ๋‹นํ•œ๋‹ค. ๋ชจ๋“  domain์„ ๋ชจ์•„๋ณด๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.
'nutrition', 'conceptual_physics', 'ARC_Challenge',
'human_sexuality', 'virology', 'human_aging',
'high_school_biology', 'high_school_physics', 'college_biology',
'computer_security', 'anatomy', 'college_physics',
'medical_genetics', 'electrical_engineering', 'college_medicine',
'college_chemistry', 'astronomy', 'college_computer_science',
'global_facts', 'high_school_chemistry',
'high_school_computer_science'

Data Processing

  • ํ˜„์žฌ documents.jsonl์—๋Š” ์ •๋‹ต์— ํ•ด๋‹นํ•˜๋Š” ๋ฌธ์„œ๋งŒ ์กด์žฌํ•˜๊ณ  ์งˆ๋ฌธ(ํ˜น์€ ์งˆ์˜)๊ฐ€ ์—†๋‹ค.
  • ์ฃผ์–ด์ง„ ๋ฌธ์„œ์— ๋Œ€ํ•œ ์งˆ์˜๊ฐ€ ์—†์œผ๋ฏ€๋กœ ๋‹ค๋ฅธ LLM์„ ํ™œ์šฉํ•˜์—ฌ ์ƒ์„ฑ.
  • ๋ฌด๋ฃŒ API์ธ Google์˜ Gemini๋ฅผ ์ด์šฉํ•ด ์ƒ์„ฑ.

์งˆ๋ฌธ ์ƒ์„ฑ์€ ๋‹ค์Œ์˜ ๊ฐ„๋‹จํ•œ ๋ช…๋ น์–ด๋ฅผ ํ†ตํ•ด์„œ ์ƒ์„ฑํ–ˆ๋‹ค.

f"""{document}

์œ„ ๋ฌธ์„œ์—์„œ ์งˆ๋ฌธ 5๊ฐœ ๋งŒ๋“ค์–ด์ค˜
"""
  • ๋ฉ˜ํ† ๋‹˜์˜ ์กฐ์–ธ์— ๋”ฐ๋ผ์„œ ์งˆ์˜์™€ ์‘๋‹ต์˜ pair๋ฅผ Cosine Embedding Loss๋กœ ์ตœ์ ํ™” ์‹œ๋„.
  • ์ด๋ฅผ ์œ„ํ•ด์„œ๋Š” ์งˆ์˜์™€ ์‘๋‹ต ํŽ˜์–ด๋“ค์„, positive pair์™€ negative pair๋“ค๋กœ ๋งŒ๋“ค์–ด์•ผํ•œ๋‹ค.
  • Positive pair๋Š” ๊ด€๋ จ์ด ์žˆ๋Š” ์งˆ์˜์™€ ์‘๋‹ต ์Œ์ด๋‹ค.
  • Negative pair๋Š” ๊ด€๋ จ์ด ์—†๋Š” ์งˆ์˜์™€ ์‘๋‹ต ์Œ์ด๋‹ค.

e.g. ์งˆ์˜ Q๊ฐ€ "ํƒœ์–‘์˜ ์ง€๋ฆ„์ด ์–ผ๋งˆ์•ผ?" ์ผ ๋•Œ,

๊ด€๋ จ์ด ์žˆ๋Š” ๋ฌธ์„œ๋Š” "ํƒœ์–‘์€ ์ง€๊ตฌ ์ง€๋ฆ„์˜ 109๋ฐฐ์ธ 139๋งŒใŽž, ๋ฌด๊ฒŒ๋Š” ์ง€๊ตฌ๋ณด๋‹ค ๋ฌด๋ ค 33๋งŒ 2,900๋ฐฐ๋‚˜ ๋ฌด๊ฒ์Šต๋‹ˆ๋‹ค. ํƒœ์–‘๊ณ„ ์ „์ฒด ์งˆ๋Ÿ‰์˜ 99.8% ์ด์ƒ์„ ์ฐจ์ง€ํ•˜๊ณ , ํƒœ์–‘๊ณ„์˜ ์ค‘์‹ฌ์— ์œ„์น˜ํ•˜์—ฌ ์ง€๊ตฌ๋ฅผ ํฌํ•จํ•œ 8๊ฐœ ํ–‰์„ฑ๊ณผ ์œ„์„ฑ, ํ˜œ์„ฑ ๋“ฑ์˜ ์šด๋™์„ ์ง€๋ฐฐํ•˜๊ณ  ์žˆ๋Š” ๋ณ„์ž…๋‹ˆ๋‹ค." ์ด๊ณ ,

๊ด€๋ จ์ด ์—†๋Š” ๋ฌธ์„œ๋Š” "์ธ๊ณต์ง€๋Šฅ(AI)์€ ์ปดํ“จํ„ฐ์—์„œ ์Œ์„ฑ ๋ฐ ์ž‘์„ฑ๋œ ์–ธ์–ด๋ฅผ ๋ณด๊ณ  ์ดํ•ดํ•˜๊ณ  ๋ฒˆ์—ญํ•˜๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„์„ํ•˜๊ณ  ์ถ”์ฒœํ•˜๋Š” ๊ธฐ๋Šฅ์„ ํฌํ•จํ•˜์—ฌ ๋‹ค์–‘ํ•œ ๊ณ ๊ธ‰ ๊ธฐ๋Šฅ์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ์ผ๋ จ์˜ ๊ธฐ์ˆ ์ž…๋‹ˆ๋‹ค."๋‹ค.

4. Modeling

Model descrition

Baseline์— ์žˆ๋Š” Elasticsearch๋ฅผ ์ตœ๋Œ€ํ•œ ํ™œ์šฉํ•˜๊ณ , OpenAI API๋ฅผ ํ™œ์šฉํ•˜์—ฌ"gpt-3.5-turbo-1106" ๋ชจ๋ธ์„ ํ†ตํ•ด ๋‹ต๋ณ€์„ ์ƒ์„ฑ.
Elasticsearch์˜ sparse retrieval๊ณผ dense retrieval์„ ํ†ตํ•ด LLM์ด ์ฐธ์กฐํ•  topk reference ์ƒ์„ฑ.
ํ•ด๋‹น ๋ ˆํผ๋Ÿฐ์Šค์™€ ํ”„๋กฌํ”„ํŠธ ์—”์ง€๋‹ˆ์–ด๋ง์„ ํ†ตํ•ด์„œ ์ •๋‹ต ์ƒ์„ฑ.

Sparse Retrieval Elasticsearch์˜ ์—ญ์ƒ‰์ธ์„ ์‚ฌ์šฉํ•œ๋‹ค. ์—ญ์ƒ‰์ธ์˜ ๊ธฐ๋ณธ ๋žญํ‚น ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ BM25๋‹ค.
์—ญ์ƒ‰์ธ์— ์‚ฌ์šฉํ•œ ๋‹จ์–ด, token์€ sentence_transformers์˜ "snunlp/KR-SBERT-V40K-klueNLI-augSTS"์˜ encode๋กœ ์ƒ์„ฑํ•œ๋‹ค.
soynlp๋ฅผ ํ™œ์šฉํ•˜์—ฌ NER์„ sentence_transformer์— vocab์„ ๋“ฑ๋กํ•˜๋ คํ–ˆ์œผ๋‚˜ ์ˆ˜ํ–‰ํ•˜์ง€ ์•Š์•˜๋‹ค.

Dense Retrieval

  • KLUE-RoBERTa Fine-Tuning with Cosine Embedding Loss ํ•œ๊ตญ์–ด๋กœ ํ•™์Šต๋œ KLUE-RoBERTa๋ฅผ ํŒŒ์ธํŠœ๋‹ํ•œ ๋‹ค์Œ ํ•ด๋‹นํ•˜๋Š” token embedding์œผ๋กœ dense retrieval ์ˆ˜ํ–‰์„ ์‹œ๋„ํ–ˆ์œผ๋‚˜ Elasticsearch์™€ ๊ฒฐํ•ฉํ•˜์ง€ ๋ชปํ•ด์„œ ์‚ฌ์šฉํ•˜์ง€ ๋ชปํ–ˆ๋‹ค.
  • sentence_transformers์˜ "snunlp/KR-SBERT-V40K-klueNLI-augSTS" ์‚ฌ์šฉ
    ๋ฒ ์ด์Šค๋ผ์ธ ์ฝ”๋“œ์— ์žˆ๋Š” sentence_transformers์˜ "snunlp/KR-SBERT-V40K-klueNLI-augSTS"๋ฅผ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ•˜์—ฌ dense retvieval ์ˆ˜ํ–‰.

Prompt Engineering

  • persona_qa์™€ persona_function_calling์˜ ํ”„๋กฌํ”„ํŠธ๋ฅผ ๋ณ€๊ฒฝํ•ด์„œ ๊ณผํ•™ ๋ฌธ์„œ์ธ์ง€ ์•„๋‹Œ์ง€๋ฅผ LLM์ด ํŒ๋‹จํ•ด์„œ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ˜ํ™˜ํ•œ๋‹ค.
๊ฒฐ๊ณผ๋Š” json ํ˜•ํƒœ๋กœ ์ƒ์„ฑํ•˜๊ณ , 'is_science' ํ•„๋“œ๋ฅผ ์ถ”๊ฐ€ํ•˜์—ฌ ์งˆ๋ฌธ์ด ๊ณผํ•™ ์ง€์‹์— ๊ด€๋ จ๋œ ๋‚ด์šฉ์ด๋ฉด true๋ฅผ, ๊ณผํ•™ ์ง€์‹์— ๊ด€๋ จ๋œ ๋‚ด์šฉ์ด ์•„๋‹ˆ๋ผ๋ฉด false๋ฅผ value๋กœ ๋งŒ๋“ ๋‹ค. ๋‹ต๋ณ€ ํ•„๋“œ์˜ ์ด๋ฆ„์€ 'answer'๋กœ ํ†ต์ผํ•œ๋‹ค.

LoRA (Low-Rank Adaptation of Large Language Models) ์ ์šฉ

  • LoRA๋ฅผ "snunlp/KR-SBERT-V40K-klueNLI-augSTS"์— ์ ์šฉํ•ด์„œ ํŒŒ์ธํŠœ๋‹ํ•ด์„œ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋„๋ชจํ–ˆ๋‹ค.
  • Loss๋Š” ์•„๋ž˜ ์‹์˜ Cosine Embedding Loss๋ฅผ ์‚ฌ์šฉํ–ˆ๋‹ค. ๊ด€๋ จ์žˆ๋Š” Q์™€ D (document)๋Š” ๊ฐ€๊น๊ฒŒ, ๊ด€๋ จ์—†๋Š” Q์™€ D๋Š” ์„œ๋กœ ๋ฉ€๋„๋ก ํ•™์Šตํ•œ๋‹ค.

image

Modeling Process

image

5. Result

Leader Board

  • Public Leaderboard

image

  • Final Leaderboard

image

Presentation

PPT ํŒŒ์ผ

Retrospective

  • ๋ฉ˜ํ† ๋‹˜์ด ์งˆ์˜์™€ ์‘๋‹ต์— ๋Œ€ํ•œ hard negative pairs๋ฅผ ์ƒ์„ฑํ•˜๋ผ๊ณ  ์กฐ์–ธํ•ด์ฃผ์…จ๋‹ค. ์ด๋Š” ๋˜‘๊ฐ™์€ ๋ฌผ๋ฆฌํ•™์ด๋ผ๋„ ์ „์ž๊ธฐํ•™๊ณผ ์–‘์ž์—ญํ•™์ด ์„œ๋กœ ๋‹ค๋ฅด๊ธฐ ๋•Œ๋ฌธ์—, ์ปค๋‹ค๋ž—๊ฒŒ๋Š” ๊ฐ™์€ ๋„๋ฉ”์ธ์ผ์ง€๋ผ๋„ ์„ธ๋ถ€ ๋‚ด์šฉ์„ ๋‹ฌ๋ฆฌํ•˜์—ฌ ๋ณด๋‹ค ์„ฌ์„ธํ•˜๊ฒŒ ํ•™์Šต์„ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•œ๋‹ค. ํ•˜์ง€๋งŒ ์ด๋Š” ์‚ฌ๋žŒ์ด ์ˆ˜์ž‘์—…์œผ๋กœ ๋งค์นญ์„ ํ•ด์•ผํ•˜๊ณ  ๋„๋ฉ”์ธ ์ง€์‹์ด ๋งŽ์ด ์š”๊ตฌ๋˜๊ธฐ ๋•Œ๋ฌธ์— ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์—†์—ˆ๋‹ค.

  • ์•„๋ž˜์™€ ๊ฐ™์ด ์• ๋งคํ•œ ์ผ€์ด์Šค์— ๋Œ€ํ•œ ์ฒ˜๋ฆฌ๊ฐ€ ๋ถ€์กฑํ–ˆ๋‹ค.

Test 3
Question: [{'role': 'user', 'content': 'ํ†ตํ•™ ๋ฒ„์Šค์˜ ๊ฐ€์น˜์— ๋Œ€ํ•ด ๋งํ•ด์ค˜.'}]
Answer: ํ†ตํ•™ ๋ฒ„์Šค๋Š” ํ•™์ƒ๋“ค์—๊ฒŒ ๋งค์šฐ ์ค‘์š”ํ•œ ๊ฐ€์น˜๋ฅผ ์ง€๋‹ˆ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ํ•™๊ต ๋ฒ„์Šค๋Š” ๋งŽ์€ ํ•™์ƒ๋“ค์„ ์šด์†กํ•˜๊ณ , ํ•™๊ต๋กœ ํ–ฅํ•˜๋Š” ๊ธธ์— ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์„ ์•ˆ์ „ํ•˜๊ฒŒ ํƒœ์›Œ์ค๋‹ˆ๋‹ค. ๋˜ํ•œ, ํ•™๊ต ๋ฒ„์Šค๋Š” ํ•™๊ต์˜ ๊ต์œก ํ™œ๋™ ๋ฐ ํ•™์ƒ๋“ค์˜ ํŽธ์˜๋ฅผ ์ฆ์ง„์‹œํ‚ค๋ฉฐ, ํ•™๊ต๋กœ ํ–ฅํ•˜๋Š” ๊ธธ์— ๋งŽ์€ ์‚ฌ๋žŒ๋“ค์˜ ๊ตํ†ต๋Ÿ‰์„ ์ค„์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ํ•™๊ต ๋ฒ„์Šค๋Š” ํ•™๊ต์˜ ๊ตํ†ต ์ฒด์ฆ์„ ํ•ด๊ฒฐํ•˜๊ณ , ํ•™์ƒ๋“ค์—๊ฒŒ ํŽธ๋ฆฌํ•จ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ํ†ตํ•™ ๋ฒ„์Šค๋Š” ๋งค์šฐ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.

๋˜ํ•œ, ์ด ๋ฒ„์Šค์˜ ์šดํ–‰ ์ •๋ณด์— ๋”ฐ๋ฅด๋ฉด, ์ด ๋ฒ„์Šค๋Š” ํ•˜๋ฃจ์— 280๋ช… ์ด์ƒ์˜ ์Šน๊ฐ์„ ์šด์†กํ•˜๋ฉฐ, ํ‰๊ท  ์šดํ–‰ ์‹œ๊ฐ„์€ 4์‹œ๊ฐ„์ž…๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ด ๋ฒ„์Šค์˜ ํ‰๊ท  ์†๋„๋Š” 70 km/h์ž…๋‹ˆ๋‹ค.

๋˜ํ•œ, ์ด ๋ฒ„์Šค๋Š” 8085A ๋งˆ์ดํฌ๋กœ์ปจํŠธ๋กค๋Ÿฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ALE(Auto Latch Enable) ๋™์ž‘์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ ๋ฒ„์Šค๋ฅผ ์ œ์–ดํ•ฉ๋‹ˆ๋‹ค. ALE ๋™์ž‘์€ ๋งˆ์ดํฌ๋กœ์ปจํŠธ๋กค๋Ÿฌ๊ฐ€ ์™ธ๋ถ€ ์‹ ํ˜ธ์™€ ์ƒํ˜ธ ์ž‘์šฉํ•  ๋•Œ ์‚ฌ์šฉ๋˜๋ฉฐ, ๋ฐ์ดํ„ฐ ๋ฒ„์Šค๋ฅผ ์ œ์–ดํ•˜์—ฌ ํšจ์œจ์ ์œผ๋กœ ๋™์ž‘ํ•˜๋„๋ก ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.

ํ†ตํ•™ ๋ฒ„์Šค๋ผ๋ฉด ๊ต์œกํ•™์ด๋‚˜ ๊ด€๋ จ ๋ถ„์•ผ๋ผ๊ณ  ์ƒ๊ฐํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ, ๋ฒ„์Šค ์ž์ฒด๋Š” ๊ธฐ๊ณ„๊ณตํ•™์ด๋‚˜ ์ „์ž๊ณตํ•™, ๊ตํ†ต๊ณตํ•™๊ณผ ๊ด€๋ จ๋œ ๋‚ด์šฉ์ด๋ผ LLM์ด ๊ณผํ•™๋ฌธ์„œ์ธ์ง€ ์•„๋‹Œ์ง€ ์ œ๋Œ€๋กœ ํŒ๋‹จํ•  ์ˆ˜ ์—†์—ˆ๋‹ค.

  • Sentence Transformer์ด ์•„๋‹Œ Huggingface์˜ ๋ชจ๋ธ์„ ํŒŒ์ธ ํŠœ๋‹ ํ›„ ์ ์šฉํ•˜์ง€ ๋ชปํ•ด์„œ ์•„์‰ฌ์› ๋‹ค.

  • ์œ ์ € ์‚ฌ์ „์„ ๋งŒ๋“ค์–ด์„œ konlpy์—๋Š” ์‚ฌ์ „ ๋“ฑ๋ก์„ ํ•ด๋ดค์œผ๋‚˜ sentence_transformer์—๋Š” ์ ์šฉํ•˜์ง€ ๋ชปํ•ด์„œ ์•„์‰ฝ๋‹ค.

  • ๊ตฌ์ฒด์ ์ธ ์งˆ์˜ ๋‚ด์šฉ์ด ์•„๋‹Œ ๋‚ด์šฉ๋“ค์„ ๋ ˆํผ๋Ÿฐ์Šค๋กœ ์‚ผ์•„์„œ LLM์ด ์ •๋‹ต์„ ์ƒ์„ฑํ–ˆ๊ธฐ ๋•Œ๋ฌธ์— ์ ์ˆ˜๊ฐ€ ๋‚ฎ์•˜๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด "๋‚˜๋ฌด ๋ถ„๋ฅ˜"์— ๋Œ€ํ•ด์„œ ์•Œ๋ ค๋‹ฌ๋ผํ•œ ์งˆ๋ฌธ์— ๋Œ€ํ•ด์„œ "๋‚˜๋ฌด ๋ถ„๋ฅ˜"๋ž‘ ์ƒ๊ด€์—†๋Š” ๋‚˜๋ฌด ๋‚ด์šฉ๋“ค์ด ๋†’์€ ์ ์ˆ˜๋ฅผ ์–ป์€ ๋ ˆํŽ˜๋Ÿฐ์Šค๊ฐ€ ๋˜์—ˆ๋‹ค. LLM์˜ ์ƒ์„ฑ ๊ฒฐ๊ณผ์— ๋Œ€ํ•œ ๋ถ„์„์„ ์‹œ๊ฐ„ ๋ถ€์กฑ์œผ๋กœ ์ธํ•ด ํ•˜์ง€ ๋ชปํ•ด์„œ ์•„์‰ฝ๋‹ค.

Meeting Log

Reference

About

upstage-ai-final-ir2 created by GitHub Classroom

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors