I connected localGPT to my corpus building system - To Train LLMs #481
clearsitedesigns started this conversation in Show and tell
I designed a system that builds a data store on any topic. I ingest the content with my custom overlapping-chunk embedding function, then use localGPT to run a series of chained commands, with the model acting as a stand-in between steps. The model is tuned to respond (I adjust several hyperparameters for this) to a series of questions that I pose against the content in a chain, and it outputs the results as a "new training" source for an LLM in the form of question-and-answer pairs. A secondary config adds a Madlib-style question-and-answer path to vary how the questions are phrased. From there, I save the output as question-and-answer instruct pairs, and a series of validation checks helps determine where hallucinations occur.
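To give a rough idea of the shape of the loop, here is a minimal sketch of how Madlib-style question templating and a simple hallucination check could feed instruct pairs into a JSONL file. The names below (`ask`, `QUESTION_TEMPLATES`, `build_pairs`, `instruct_pairs.jsonl`) are placeholders for illustration, not the real implementation, which is wired into the localGPT retrieval chain shown in the logs further down.

```python
import json
import random

# Placeholder for the retrieval-QA call; in practice this would go through
# the localGPT chain backed by the llama.cpp model seen in the timings below.
def ask(question: str) -> str:
    raise NotImplementedError("wire this to your local model")

# Madlib-style templates: the same underlying question is phrased several
# ways so the resulting instruct pairs are not all worded identically.
QUESTION_TEMPLATES = [
    "What does the source material say about {topic}?",
    "Summarize the key points regarding {topic}.",
    "Explain {topic} as if teaching a newcomer.",
]

def build_pairs(topics, out_path="instruct_pairs.jsonl", min_words=20):
    """Generate question/answer instruct pairs and append them as JSONL."""
    with open(out_path, "a", encoding="utf-8") as f:
        for topic in topics:
            question = random.choice(QUESTION_TEMPLATES).format(topic=topic)
            answer = ask(question)
            # Crude validation check: skip very short answers or answers
            # that never mention the topic, which often signal refusals
            # or hallucinated content.
            if len(answer.split()) < min_words or topic.lower() not in answer.lower():
                continue
            f.write(json.dumps({"instruction": question, "output": answer}) + "\n")
```

In the actual pipeline the validation step is more involved than the single keyword check above; the sketch only shows where that check sits relative to the generation loop.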
I thought I would share just a little information about this. The output below is just a sample; the process runs in the background for 2,000 cycles (the typical number needed to adjust a LoRA adapter).
llama_print_timings: load time = 2014.40 ms
llama_print_timings: sample time = 388.07 ms / 551 runs ( 0.70 ms per token, 1419.83 tokens per second)
llama_print_timings: prompt eval time = 42211.96 ms / 989 tokens ( 42.68 ms per token, 23.43 tokens per second)
llama_print_timings: eval time = 32695.94 ms / 550 runs ( 59.45 ms per token, 16.82 tokens per second)
llama_print_timings: total time = 76212.72 ms
Length of raw answer in tokens: 348
Length of query in tokens: 5
Llama.generate: prefix-match hit
llama_print_timings: load time = 2014.40 ms
llama_print_timings: sample time = 12.78 ms / 18 runs ( 0.71 ms per token, 1408.56 tokens per second)
llama_print_timings: prompt eval time = 43205.15 ms / 1000 tokens ( 43.21 ms per token, 23.15 tokens per second)
llama_print_timings: eval time = 1055.25 ms / 18 runs ( 58.63 ms per token, 17.06 tokens per second)
llama_print_timings: total time = 44436.19 ms
Length of raw answer in tokens: 5
llama_print_timings: load time = 2014.40 ms
llama_print_timings: sample time = 356.10 ms / 501 runs ( 0.71 ms per token, 1406.93 tokens per second)
llama_print_timings: prompt eval time = 33024.80 ms / 778 tokens ( 42.45 ms per token, 23.56 tokens per second)
llama_print_timings: eval time = 29268.26 ms / 500 runs ( 58.54 ms per token, 17.08 tokens per second)
llama_print_timings: total time = 63490.88 ms
Length of raw answer in tokens: 318