Commit 8374537

Initial commit
1 parent cb64977 commit 8374537

5 files changed: 250 additions and 0 deletions
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
#Copyright (c) 2024 Oracle and/or its affiliates.
#
#The Universal Permissive License (UPL), Version 1.0
#
#Subject to the condition set forth below, permission is hereby granted to any
#person obtaining a copy of this software, associated documentation and/or data
#(collectively the "Software"), free of charge and under any and all copyright
#rights in the Software, and any and all patent rights owned or freely
#licensable by each licensor hereunder covering either (i) the unmodified
#Software as contributed to or provided by such licensor, or (ii) the Larger
#Works (as defined below), to deal in both
#
#(a) the Software, and
#(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
#one is included with the Software (each a "Larger Work" to which the Software
#is contributed by such licensors),
#
#without restriction, including without limitation the rights to copy, create
#derivative works of, display, perform, and distribute the Software and make,
#use, sell, offer for sale, import, export, have made, and have sold the
#Software and the Larger Work(s), and to sublicense the foregoing rights on
#either these or other terms.
#
#This license is subject to the following condition:
#The above copyright notice and either this complete permission notice or at
#a minimum a reference to the UPL must be included in all copies or
#substantial portions of the Software.
#
#THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
#IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
#FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
#AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
#LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
#OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
#SOFTWARE.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
### How to Configure an AI Model as an HTTP Endpoint Using TorchServe

1. Download and test the model using the download script ../basic-utility-scripts/download\_and\_test\_huggingface\_model.py

2. Go to the folder where the downloaded model is located (in this example, /share/app/llama3-8b-instruct) and perform the following steps as a regular user.

3. Put handler.py into this folder. The handler is a Python file that implements the inference logic: it provides the methods 'preprocess', 'inference', and 'postprocess', plus an umbrella function 'handle' that invokes them in that order.

4. Put config.properties into this folder. TorchServe recognizes this file name, and step 6 also passes the file explicitly with --ts-config. The file defines the performance and scalability characteristics of the model, such as the batch size and the number of 'workers', that is, the processes that serve inference requests for the model.

5. Archive the model along with all required files into a .mar file that TorchServe can load, using this command:

torch-model-archiver --model-name llama3-8b-instruct --version 1.0 --serialized-file /share/app/llama3-8b-instruct/model-00001-of-00004.safetensors --handler /share/app/llama3-8b-instruct/handler.py --extra-files "/share/app/llama3-8b-instruct/config.json,/share/app/llama3-8b-instruct/generation\_config.json,/share/app/llama3-8b-instruct/model-00002-of-00004.safetensors,/share/app/llama3-8b-instruct/model-00003-of-00004.safetensors,/share/app/llama3-8b-instruct/model-00004-of-00004.safetensors,/share/app/llama3-8b-instruct/model.safetensors.index.json,/share/app/llama3-8b-instruct/tokenizer.json,/share/app/llama3-8b-instruct/tokenizer\_config.json,/share/app/llama3-8b-instruct/special\_tokens\_map.json" --export-path model\_store --force

6. Start TorchServe:

torchserve --start --ts-config /share/app/llama3-8b-instruct/config.properties --model-store model\_store --models llama3-8b-instruct.mar

7. You can use the following basic commands with the running TorchServe:
* How to stop TorchServe: torchserve --stop
* How to send an inference request: curl -X POST http://127.0.0.1:8080/predictions/llama3-8b-instruct -T input\_example.json
* How to pass the payload directly in an inference request: curl -X POST http://localhost:8080/predictions/llama3-8b-instruct -H "Content-Type: application/json" -d '{ "input": "Hello, who are you?" }'
* How to monitor GPU utilization during inference processing: watch -n 1 nvidia-smi
* How to check if TorchServe is healthy: curl http://localhost:8080/ping
* How to check the list of models exposed via TorchServe: curl http://localhost:8081/models
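
The curl commands in step 7 can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names (`build_inference_request`, `predict`, `is_healthy`) are illustrative and not part of TorchServe, the addresses assume the ports used in the commands above, and the `input_text` key is an assumption taken from input\_example.json.

```python
import json
import urllib.request

# Assumed address and model name, matching the curl commands above.
INFERENCE_URL = "http://127.0.0.1:8080"
MODEL_NAME = "llama3-8b-instruct"


def build_inference_request(prompt, base_url=INFERENCE_URL, model=MODEL_NAME):
    """Build (but do not send) a POST request for /predictions/<model>."""
    payload = json.dumps({"input_text": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predictions/{model}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(prompt, timeout=120):
    """Send the prompt to a running TorchServe instance and return the response body."""
    request = build_inference_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def is_healthy(base_url=INFERENCE_URL, timeout=5):
    """Return True if GET /ping answers with HTTP 200, False if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and connection errors are OSError subclasses
        return False
```

With the server running, `is_healthy()` corresponds to the ping check and `predict("Hello, who are you?")` to the inference request.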
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
#Copyright (c) 2024 Oracle and/or its affiliates.
#
#The Universal Permissive License (UPL), Version 1.0

# Path where models are stored
model_store=/share/app/llama3-8b-instruct/

# Number of workers per model
default_workers_per_model=1

# Inference API settings
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082

# Model snapshot settings
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"llama3-8b-instruct":{"1.0":{"defaultVersion":true,"marName":"llama3-8b-instruct.mar","minWorkers":1,"maxWorkers":1,"batchSize":1,"maxBatchDelay":100,"responseTimeout":120}}}}

#model_name=llama3-8b-instruct
#initial_workers=1
#max_workers=1
#llama3-8b-instruct.mar=1
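
The `model_snapshot` value is a JSON document packed into a single properties line, which makes the per-model settings hard to read. The sketch below is not part of the commit: it parses that line with the standard library and prints the worker, batch, and timeout settings for each model version. The function name `read_model_snapshot` is illustrative, and the parsing assumes simple `key=value` lines without Java-properties escaping.

```python
import json

# Two lines copied from the config.properties shown above.
CONFIG_TEXT = '''
default_workers_per_model=1
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"llama3-8b-instruct":{"1.0":{"defaultVersion":true,"marName":"llama3-8b-instruct.mar","minWorkers":1,"maxWorkers":1,"batchSize":1,"maxBatchDelay":100,"responseTimeout":120}}}}
'''


def read_model_snapshot(text):
    """Return the parsed model_snapshot JSON, or None if the key is absent."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        if key.strip() == "model_snapshot":
            return json.loads(value)
    return None


snapshot = read_model_snapshot(CONFIG_TEXT)
for name, versions in snapshot["models"].items():
    for version, settings in versions.items():
        print(
            name,
            version,
            "workers:", settings["minWorkers"], "to", settings["maxWorkers"],
            "batchSize:", settings["batchSize"],
            "responseTimeout:", settings["responseTimeout"],
        )
```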
Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
#Copyright (c) 2024 Oracle and/or its affiliates.
#
#The Universal Permissive License (UPL), Version 1.0

import json
import logging

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

logger = logging.getLogger(__name__)


class SimpleTextGenerationHandler:
    def __init__(self):
        self.initialized = False

    def initialize(self, context):
        logger.info("YR: model initialization")
        # Directory from which the model files are loaded
        self.model_dir = context.system_properties.get("model_dir")
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        logger.info("YR: loading model")
        # Load the model and tokenizer from the local directory
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_dir,
            torch_dtype=torch.bfloat16
        ).to(self.device)

        self.tokenizer = AutoTokenizer.from_pretrained(self.model_dir)

        logger.info("YR: setting pipeline")
        # Use the Hugging Face pipeline for text generation
        self.pipe = pipeline(
            "text-generation",
            model=self.model,
            tokenizer=self.tokenizer,
            device=self.device  # Same device as the model: GPU if available, else CPU
        )

        self.initialized = True

    def preprocess(self, data):
        # Only the first request is read (batchSize is 1 in config.properties)
        body = data[0].get("body")
        if isinstance(body, (bytes, bytearray)):
            body = body.decode("utf-8")
        if isinstance(body, str):
            try:
                parsed = json.loads(body)
            except json.JSONDecodeError:
                parsed = None  # Not JSON: use the raw text as the prompt
            if isinstance(parsed, dict):
                body = parsed
        # For a JSON object, take the prompt from "input_text" (input_example.json)
        # or "input" (README curl example)
        if isinstance(body, dict):
            body = body.get("input_text") or body.get("input")
        return body

    def inference(self, input_text):
        try:
            result = self.pipe(input_text)[0]['generated_text']
            return [result]
        except Exception as e:
            raise RuntimeError(f"Error during inference: {str(e)}") from e

    def postprocess(self, inference_output):
        # Return one JSON string per request
        return [json.dumps({"generated_text": inference_output[0]})]


_service = SimpleTextGenerationHandler()


def handle(data, context):
    # Entry point: load the model on the first call, then run the three stages
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None

    data = _service.preprocess(data)
    predictions = _service.inference(data)
    return _service.postprocess(predictions)
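
To see the request and response shapes without loading the model, the same chain of preprocess, inference, and postprocess can be exercised with a stand-in generator. This is a sketch, not the handler from the commit: `EchoHandler` and its canned continuation are hypothetical, and the request layout (a list of dicts with a `body` entry) is assumed from how `preprocess` reads its input above.

```python
import json


class EchoHandler:
    """Stand-in handler: same method chain as handler.py, but no model is loaded."""

    def preprocess(self, data):
        body = data[0].get("body")
        # Request bodies may arrive as raw bytes
        return body.decode("utf-8") if isinstance(body, (bytes, bytearray)) else body

    def inference(self, input_text):
        # A real handler would call the text-generation pipeline here
        return [f"{input_text} [generated continuation]"]

    def postprocess(self, inference_output):
        # One JSON string per request
        return [json.dumps({"generated_text": inference_output[0]})]

    def handle(self, data):
        if data is None:
            return None
        return self.postprocess(self.inference(self.preprocess(data)))


# Hypothetical request in the shape that preprocess expects
fake_request = [{"body": b"Once upon a long ago..."}]
response = EchoHandler().handle(fake_request)
print(response)
```

The response is a list with one JSON string, mirroring the single-request batch configured in config.properties.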
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
{
  "input_text": "Once upon a long ago..."
}
