## Quick Start: Example Walkthrough

In this section, we will walk through a simple example to demonstrate how to use CLDK. We will:

* Set up a local Ollama server to interact with CodeLLMs
* Build a simple code summarization pipeline for a Java application

### Prerequisites

Before we begin, make sure you have the following prerequisites installed:

* Python 3.11 or later
* Ollama v0.3.4 or later
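
You can quickly verify both from a terminal:

```bash
python3 --version   # should report 3.11 or later
ollama --version    # should report 0.3.4 or later
```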

### Step 1: Set up an Ollama server

If you don't already have Ollama, please download and install it from [Ollama](https://ollama.com/download).

Once you have Ollama installed, start the server and make sure it is running.

If you're on Linux or WSL, you can check that the server is running with the following command (on macOS, the Ollama menu bar icon indicates that the server is up):

```bash
sudo systemctl status ollama
```

You should see output similar to the following:

```bash
➜ sudo systemctl status ollama
● ollama.service - Ollama Service
     Loaded: loaded (/etc/systemd/system/ollama.service; enabled; preset: enabled)
     Active: active (running) since Sat 2024-08-10 20:39:56 EDT; 17s ago
   Main PID: 23069 (ollama)
      Tasks: 19 (limit: 76802)
     Memory: 1.2G (peak: 1.2G)
        CPU: 6.745s
     CGroup: /system.slice/ollama.service
             └─23069 /usr/local/bin/ollama serve
```

If not, you may have to start the server manually, which you can do with the following command:

```bash
sudo systemctl start ollama
```
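
On any platform, you can also confirm that the server is reachable over HTTP (assuming it is listening on the default port, 11434):

```bash
curl http://localhost:11434
# Expected response: Ollama is running
```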

#### Pull the latest version of the Granite 8B Instruct model from Ollama

To pull the latest version of the Granite 8B Instruct model from Ollama, run the following command:

```bash
ollama pull granite-code:8b-instruct
```

Check to make sure the model was successfully pulled by running the following command:

```bash
ollama run granite-code:8b-instruct 'Write a function to print hello world in python'
```

The output should be similar to the following:

```
➜ ollama run granite-code:8b-instruct 'Write a function to print hello world in python'

def say_hello():
    print("Hello World!")
```
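
You can also list the locally available models to confirm the pull succeeded:

```bash
ollama list
```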

### Step 2: Install CLDK

You can install the latest version of CLDK from [PyPI](https://pypi.org/project/cldk/):

```bash
pip install cldk
```
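
The pipeline in Step 3 also uses the [Ollama Python client](https://pypi.org/project/ollama/) to talk to the local server, so install it alongside CLDK:

```bash
pip install ollama
```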

Once CLDK is installed, you can import it into your Python code:

```python
from cldk import CLDK
```
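
As a quick smoke test, confirm that the package imports cleanly:

```bash
python -c "from cldk import CLDK; print('CLDK imported successfully')"
```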

### Step 3: Build a code summarization pipeline

Now that we have set up the Ollama server and installed CLDK, we can build a simple code summarization pipeline for a Java application.

1. Let's download a sample Java application (Apache Commons CLI):

    * Download and unzip the sample Java application:
      ```bash
      wget https://github.com/apache/commons-cli/archive/refs/tags/rel/commons-cli-1.7.0.zip -O commons-cli-1.7.0.zip && unzip commons-cli-1.7.0.zip
      ```
    * Record the path to the sample Java application:
      ```bash
      export JAVA_APP_PATH=/path/to/commons-cli-1.7.0
      ```
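    * Optionally, sanity-check the path; for a Maven project like commons-cli, the project root should contain the project's `pom.xml`:
      ```bash
      ls "$JAVA_APP_PATH/pom.xml"
      ```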

Below is a simple code summarization pipeline for a Java application using CLDK. It does the following things:

* Creates a new instance of the CLDK class (see comment `# (1)`)
* Creates an analysis object over the Java application (see comment `# (2)`)
* Iterates over all the files in the project (see comment `# (3)`)
* Iterates over all the classes in the file (see comment `# (4)`)
* Iterates over all the methods in the class (see comment `# (5)`)
* Reads the contents of the file containing the method (see comment `# (6)`)
* Initializes the tree-sitter utils for the class file content (see comment `# (7)`)
* Sanitizes the class for analysis (see comment `# (8)`)
* Formats the instruction for the given focal method and class (see comment `# (9)`)
* Prompts the local model on Ollama (see comment `# (10)`)
* Prints the instruction and LLM output (see comment `# (11)`)

```python
# code_summarization_for_java.py

import os
from pathlib import Path

import ollama

from cldk import CLDK


def format_inst(code, focal_method, focal_class, language):
    """
    Format the instruction for the given focal method and class.
    """
    inst = f"Question: Can you write a brief summary for the method `{focal_method}` in the class `{focal_class}` below?\n"

    inst += "\n"
    inst += f"```{language}\n"
    inst += code
    inst += "```" if code.endswith("\n") else "\n```"
    inst += "\n"
    return inst


def prompt_ollama(message: str, model_id: str = "granite-code:8b-instruct") -> str:
    """Prompt the local model on Ollama."""
    response_object = ollama.generate(model=model_id, prompt=message)
    return response_object["response"]


if __name__ == "__main__":
    # (1) Create a new instance of the CLDK class
    cldk = CLDK(language="java")

    # (2) Create an analysis object over the Java application
    analysis = cldk.analysis(project_path=os.getenv("JAVA_APP_PATH"))

    # (3) Iterate over all the files in the project
    for file_path, class_file in analysis.get_symbol_table().items():
        class_file_path = Path(file_path).absolute().resolve()
        # (4) Iterate over all the classes in the file
        for type_name, type_declaration in class_file.type_declarations.items():
            # (5) Iterate over all the methods in the class
            for method in type_declaration.callable_declarations.values():

                # (6) Read the contents of the file containing the method
                code_body = class_file_path.read_text()

                # (7) Initialize the tree-sitter utils for the class file content
                tree_sitter_utils = cldk.tree_sitter_utils(source_code=code_body)

                # (8) Sanitize the class for analysis
                sanitized_class = tree_sitter_utils.sanitize_focal_class(method.declaration)

                # (9) Format the instruction for the given focal method and class
                instruction = format_inst(
                    code=sanitized_class,
                    focal_method=method.declaration,
                    focal_class=type_name,
                    language="java",
                )

                # (10) Prompt the local model on Ollama
                llm_output = prompt_ollama(
                    message=instruction,
                    model_id="granite-code:8b-instruct",
                )

                # (11) Print the instruction and LLM output
                print(f"Instruction:\n{instruction}")
                print(f"LLM Output:\n{llm_output}")
```
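
With the Ollama server running and `JAVA_APP_PATH` exported, run the pipeline; it prints an instruction and a model-generated summary for each method in the project:

```bash
python code_summarization_for_java.py
```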