Commit afc896b

Merge branch 'main' into support-slim-jsons-from-codeanalyzer
2 parents fcd7558 + 0020e55 commit afc896b

2 files changed, with 10 additions and 8 deletions

.gitignore

Lines changed: 4 additions & 1 deletion
@@ -44,4 +44,7 @@ scratch*
 __pycache__/
 *.py[cod]
 .python-version
-.venv/
+.venv/
+
+# Build files
+dist/

README.md

Lines changed: 6 additions & 7 deletions
@@ -1,7 +1,9 @@
 # CodeLLM-Devkit: A Python library for seamless interaction with CodeLLMs
 
-![image](./docs/assets/cldk.png)
+![codellm-devkit logo](https://github.com/IBM/codellm-devkit/blob/main/docs/assets/cldk.png?raw=true)
+
 [![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110/)
+[![License](https://img.shields.io/badge/License-Apache%202.0-green.svg)](https://opensource.org/licenses/Apache-2.0)
 
 Codellm-devkit (CLDK) is a multilingual program analysis framework that bridges the gap between traditional static analysis tools and Large Language Models (LLMs) specialized for code (CodeLLMs). Codellm-devkit allows developers to streamline the process of transforming raw code into actionable insights by providing a unified interface for integrating outputs from various analysis tools and preparing them for effective use by CodeLLMs.
 
@@ -43,6 +45,7 @@ For any questions, feedback, or suggestions, please contact the authors:
 - [Pull the latest version of Granite 8b instruct model from ollama](#pull-the-latest-version-of-granite-8b-instruct-model-from-ollama)
 - [Step 2: Install CLDK](#step-2--install-cldk)
 - [Step 3: Build a code summarization pipeline](#step-3--build-a-code-summarization-pipeline)
+- [Publication (papers and blogs related to CLDK)](#publication-papers-and-blogs-related-to-cldk)
 
 ## Architectural and Design Overview
 
@@ -80,14 +83,12 @@ Each language comprises of two key components: data models and backends.
 
 1. **Data Models:** These are high level abstractions that represent the various language constructs and componentes in a structured format using pydantic. This confers a high degree of flexibility and extensibility to the models as well as allowing for easy accees of various data components via a simple dot notation. In addition, the data models are designed to be easily serializable and deserializable, making it easy to store and retrieve data from various sources.
 
-
 2. **Analysis Backends:** These are the components that are responsible for interfacing with the various program analysis tools. The core backends are Treesitter, Javaparse, WALA, LLVM, and CodeQL. The backends are responsible for handling the user requests and delegating them to the appropriate analysis tools. The analysis tools perfrom the requisite analysis and return the results to the user. The user merely calls one of several high-level API functions such as `get_method_body`, `get_method_signature`, `get_call_graph`, etc. and the backend takes care of the rest.
 
 Some langugages may have multiple backends. For example, Java has WALA, Javaparser, Treesitter, and CodeQL backends. The user has freedom to choose the backend that best suits their needs.
 
 We are currently working on implementing the retrieval and prompting components. The retrieval component will be responsible for retrieving the relevant code snippets from the codebase for RAG usecases. The prompting component will be responsible for generating the prompts for the CodeLLMs using popular prompting frameworks such as `PDL`, `Guidance`, or `LMQL`.
 
-
 ## Quick Start: Example Walkthrough
 
 In this section, we will walk through a simple example to demonstrate how to use CLDK. We will:
@@ -102,7 +103,6 @@ Before we begin, make sure you have the following prerequisites installed:
 * Python 3.11 or later
 * Ollama v0.3.4 or later
 
-
 ### Step 1: Set up an Ollama server
 
 If don't already have ollama, please download and install it from here: [Ollama](https://ollama.com/download).
@@ -161,10 +161,10 @@ def say_hello():
 
 ### Step 2: Install CLDK
 
-You may install the latest version of CLDK from our GitHub repository:
+You may install the latest version of CLDK from [PyPi](https://pypi.org/project/cldk/):
 
 ```python
-pip install git+https://github.com/IBM/codellm-devkit.git
+pip install cldk
 ```
 
 Once CLDK is installed, you can import it into your Python code:
@@ -188,7 +188,6 @@ Now that we have set up the ollama server and installed CLDK, we can build a sim
 export JAVA_APP_PATH=/path/to/commons-cli-1.7.0
 ```
 
-
 Below is a simple code summarization pipeline for a Java application using CLDK. It does the following things:
 
 * Creates a new instance of the CLDK class (see comment `# (1)`)
