Merge pull request #1723 from ysocarras-oracle/sentiment+category-analysis

al3xne · web-flow · commit feacfa2d62ba · 2025-04-28T13:16:01.000+03:00
Sentiment+category analysis
diff --git a/ai/generative-ai-service/sentiment+categorization/LICENSE b/ai/generative-ai-service/sentiment+categorization/LICENSE
@@ -0,0 +1,35 @@
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+The Universal Permissive License (UPL), Version 1.0
+
+Subject to the condition set forth below, permission is hereby granted to any
+person obtaining a copy of this software, associated documentation and/or data
+(collectively the "Software"), free of charge and under any and all copyright
+rights in the Software, and any and all patent rights owned or freely
+licensable by each licensor hereunder covering either (i) the unmodified
+Software as contributed to or provided by such licensor, or (ii) the Larger
+Works (as defined below), to deal in both
+
+(a) the Software, and
+(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
+one is included with the Software (each a "Larger Work" to which the Software
+is contributed by such licensors),
+
+without restriction, including without limitation the rights to copy, create
+derivative works of, display, perform, and distribute the Software and make,
+use, sell, offer for sale, import, export, have made, and have sold the
+Software and the Larger Work(s), and to sublicense the foregoing rights on
+either these or other terms.
+
+This license is subject to the following condition:
+The above copyright notice and either this complete permission notice or at
+a minimum a reference to the UPL must be included in all copies or
+substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/ai/generative-ai-service/sentiment+categorization/README.md b/ai/generative-ai-service/sentiment+categorization/README.md
@@ -0,0 +1,32 @@
+# Customer Message Analyzer
+ 
+The Customer Message Analyzer is a tool designed to analyze customer messages through unsupervised categorization, sentiment analysis, and summary reporting. It helps businesses understand customer feedback without requiring extensive manual labeling or analysis.
+ 
+ 
+Reviewed: 01.04.2025
+ 
+# When to use this asset?
+ 
+Customer service teams, product managers, and marketing professionals would use this asset when they need to quickly understand large volumes of customer feedback, identify trends, and make data-driven decisions to improve products or services.
+ 
+# How to use this asset?
+ 
+To use the Customer Message Analyzer, follow these steps:
+
+1. Provide the customer messages as a CSV file with `ID` and `Message` columns.
+2. The system automatically groups the messages into a three-level category hierarchy (primary, secondary, tertiary) based on their content.
+3. Each message receives a sentiment score from 1 (very negative) to 10 (very positive).
+4. Review the generated summary report highlighting dominant themes, sentiment trends, and actionable insights.
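
The steps above map onto the three LLM calls in `backend/feedback_agent.py` (summarize, categorize, generate report). The following sketch traces that data flow using a hypothetical `fake_llm` stub instead of the OCI model, so it runs without credentials. The JSON field names `summary` and `sentiment_score` are assumptions, because the real prompt definitions are not shown in this diff.

```python
import json
from typing import Callable, Dict, List

# Hypothetical signature for the model call: (stage name, payload) -> response text
LLM = Callable[[str, object], str]


def run_pipeline(messages: List[List[str]], llm: LLM) -> Dict[str, object]:
    """Summarize -> categorize -> report, in the same order as the FeedbackAgent graph."""
    summaries = json.loads(llm("summarize", messages))
    categories = json.loads(llm("categorize", summaries))
    by_id = {cat["id"]: cat for cat in categories}
    # Attach the category levels to the summary with the same id
    merged = [{**s, **by_id[s["id"]]} for s in summaries if s["id"] in by_id]
    report = llm("report", merged)  # the report is free text, not JSON
    return {"categories": merged, "report": report}


def fake_llm(stage: str, payload: object) -> str:
    """Canned answers (hypothetical) so the sketch runs without OCI credentials."""
    if stage == "summarize":
        return json.dumps([
            {"id": row[0], "summary": row[1], "sentiment_score": 2} for row in payload
        ])
    if stage == "categorize":
        return json.dumps([
            {"id": s["id"], "primary_category": "Product",
             "secondary_category": "Condition", "tertiary_category": "Damaged on arrival"}
            for s in payload
        ])
    return f"Report covering {len(payload)} message(s)"


result = run_pipeline([["5", "The product I received was damaged."]], fake_llm)
print(result["report"])  # Report covering 1 message(s)
```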
+ 
+# Useful Links (Optional)
+
+- [Confluence](https://confluence.oraclecorp.com/confluence/x/DaCEoAE)
+    - Internal Reusable Assets
+ 
+# License
+ 
+Copyright (c) 2025 Oracle and/or its affiliates.
+ 
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+ 
+See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
diff --git a/ai/generative-ai-service/sentiment+categorization/files/README.md b/ai/generative-ai-service/sentiment+categorization/files/README.md
@@ -0,0 +1,54 @@
+# Batch Message Analysis and Categorization Demo
+This demo showcases an AI-powered solution for analyzing batches of customer messages, categorizing them into hierarchical levels, extracting sentiment scores, and generating structured reports.
+
+## Key Features
+* **Hierarchical Categorization**: Automatically categorizes messages into three levels of hierarchy:
+	+ Primary Category: High-level categorization
+	+ Secondary Category: Mid-level categorization, building upon primary categories
+	+ Tertiary Category: Low-level categorization, providing increased specificity and detail
+* **Sentiment Analysis**: Extracts sentiment scores for each message, ranging from very negative (1) to very positive (10)
+* **Structured Reporting**: Generates a comprehensive report analyzing the batch of messages, including:
+	+ Category distribution across all three levels
+	+ Sentiment score distribution
+	+ Summaries of key findings and insights
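
The three-level hierarchy can be pictured as a nested mapping from primary to secondary to tertiary category, with message IDs at the leaves. The sketch below builds that tree with `dict.setdefault`. It mirrors what `group_by_category_level` in `backend/message_handler.py` does, and the category labels in the sample records are invented for illustration.

```python
from typing import Dict, List


def group_by_levels(records: List[dict]) -> Dict[str, Dict[str, Dict[str, List]]]:
    """Nest message IDs as primary -> secondary -> tertiary -> [ids]."""
    tree: Dict[str, Dict[str, Dict[str, List]]] = {}
    for rec in records:
        # Create each level on first sight, then append the id at the leaf
        leaf = (
            tree.setdefault(rec["primary_category"], {})
            .setdefault(rec["secondary_category"], {})
            .setdefault(rec["tertiary_category"], [])
        )
        leaf.append(rec["id"])
    return tree


records = [
    {"id": 2, "primary_category": "Delivery", "secondary_category": "Timeliness",
     "tertiary_category": "Late delivery"},
    {"id": 5, "primary_category": "Product", "secondary_category": "Condition",
     "tertiary_category": "Damaged on arrival"},
    {"id": 19, "primary_category": "Product", "secondary_category": "Condition",
     "tertiary_category": "Damaged on arrival"},
]
print(group_by_levels(records))
# {'Delivery': {'Timeliness': {'Late delivery': [2]}}, 'Product': {'Condition': {'Damaged on arrival': [5, 19]}}}
```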
+
+## Data Requirements
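+* The demo reads customer messages from `backend/data/complaints_messages.csv`. To analyze your own messages, replace that file or point `messages_path` in `backend/feedback_agent.py` at your own CSV.
+* The CSV file must contain an `ID` column and a `Message` column holding the message text.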
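
`read_messages` in `backend/message_handler.py` extracts the `ID` and `Message` columns and silently skips any column that is absent, which produces short rows. Below is a minimal sketch of a stricter loader, assuming you would rather fail fast on a malformed file. `load_messages` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from typing import List, Sequence, TextIO

# Columns that read_messages() extracts by default
REQUIRED_COLUMNS = ("ID", "Message")


def load_messages(handle: TextIO, required: Sequence[str] = REQUIRED_COLUMNS) -> List[List[str]]:
    """Return [[ID, Message], ...] rows, raising if a required column is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []  # accessing fieldnames consumes the header row
    missing = [col for col in required if col not in header]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {missing}")
    return [[row[col] for col in required] for row in reader]


sample = io.StringIO('ID,Message\n1,"The delivery was late, and the packaging was damaged."\n')
print(load_messages(sample))
# [['1', 'The delivery was late, and the packaging was damaged.']]
```

For a real file, pass `open(path, newline="", encoding="utf-8")` as the handle, matching how `read_messages` opens the CSV.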
+
+## Getting Started
+To run the demo, follow these steps:
+1. Clone the repository using `git clone`.
+2. Place your CSV file of customer messages in the `backend/data` folder.
+3. Install dependencies using `pip install -r requirements.txt`.
+4. Run the application using `streamlit run app.py`.
+
+## Example Use Cases
+* Analyze customer feedback from surveys, reviews, or social media platforms to identify trends and patterns.
+* Inform product development and customer support strategies by understanding customer sentiment and preferences.
+* Optimize marketing campaigns by targeting specific customer segments based on their interests and concerns.
+
+## Technical Details
+* The solution uses Oracle Cloud Infrastructure (OCI) Generative AI, a fully managed service that provides access to large language models.
+* Specifically, this demo uses the Cohere Command R+ model, called through LangChain's `ChatOCIGenAI` class and orchestrated as a LangGraph state graph (summarize, categorize, generate report).
+* Hierarchical categorization, sentiment analysis, and structured report generation are all performed by this model, using prompts retrieved with `llm_config.get_prompt()`.
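
`FeedbackAgent.initialize_model()` looks up `llm_config.MODEL_REGISTRY[model_name]` and reads seven keys from it, but the contents of `backend/utils/llm_config.py` are not shown in this diff. A hypothetical registry entry with those keys might look like the following. Every value (model ID, regional endpoint, compartment OCID, auth profile, generation settings) is a placeholder assumption to adapt to your tenancy.

```python
# Hypothetical entry for backend/utils/llm_config.py. The keys are the ones
# FeedbackAgent.initialize_model() reads; every value is an assumption.
MODEL_REGISTRY = {
    "cohere_oci": {
        "model_id": "cohere.command-r-plus",  # assumed model ID; check your region's model list
        "service_endpoint": "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
        "compartment_id": "ocid1.compartment.oc1..<your-compartment-ocid>",
        "provider": "cohere",
        "auth_type": "API_KEY",
        "auth_profile": "DEFAULT",  # profile name in ~/.oci/config
        "model_kwargs": {"temperature": 0, "max_tokens": 4000},
    }
}

# Sanity check: every key initialize_model() reads is present
REQUIRED_KEYS = {"model_id", "service_endpoint", "compartment_id", "provider",
                 "auth_type", "auth_profile", "model_kwargs"}
missing = REQUIRED_KEYS - set(MODEL_REGISTRY["cohere_oci"])
print(missing or "all keys present")  # all keys present
```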
+
+## Output
+The demo displays an interactive dashboard with the generated report, including:
+* Category distribution across all three levels
+* Sentiment score distribution
+* Summaries of key findings and insights
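
The sentiment score distribution, together with the per-category view implied by the `pages/SentimentByCat.py` page name, can be computed from the categorized records using only the standard library. The following is a minimal sketch that assumes each record carries a `sentiment_score` field on the 1 to 10 scale. That field name is hypothetical, since the prompt definitions are not shown here.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List


def sentiment_overview(records: List[dict]) -> Dict[str, object]:
    """Score histogram plus mean score per primary category (scores are 1-10)."""
    histogram = Counter(rec["sentiment_score"] for rec in records)
    per_category: Dict[str, List[int]] = defaultdict(list)
    for rec in records:
        per_category[rec["primary_category"]].append(rec["sentiment_score"])
    return {
        "histogram": dict(sorted(histogram.items())),
        "mean_by_category": {cat: round(mean(scores), 2) for cat, scores in per_category.items()},
    }


records = [
    {"id": 2, "primary_category": "Delivery", "sentiment_score": 3},
    {"id": 5, "primary_category": "Product", "sentiment_score": 2},
    {"id": 10, "primary_category": "Customer Service", "sentiment_score": 1},
    {"id": 19, "primary_category": "Product", "sentiment_score": 2},
]
print(sentiment_overview(records))
```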
+
+## Contributing
+We welcome contributions to improve and expand the capabilities of this demo. Please fork the repository and submit a pull request with your changes.
+
+## License
+Copyright (c) 2025 Oracle and/or its affiliates.
+ 
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+ 
+See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
diff --git a/ai/generative-ai-service/sentiment+categorization/files/app.py b/ai/generative-ai-service/sentiment+categorization/files/app.py
@@ -0,0 +1,16 @@
+import streamlit as st
+
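+st.set_page_config(
+    page_title="Customer Message Analyzer",
+    page_icon="👋",
+)
+
+st.write("# Customer Message Analyzer 👋")
+
+st.sidebar.success("Select a page above.")
+
+st.markdown(
+    """
+Analyze a batch of customer messages with hierarchical categorization,
+per-message sentiment scores (1 = very negative, 10 = very positive),
+and an LLM-generated summary report. Select a page in the sidebar to start.
+"""
+)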
diff --git a/ai/generative-ai-service/sentiment+categorization/files/backend/data/complaints_messages.csv b/ai/generative-ai-service/sentiment+categorization/files/backend/data/complaints_messages.csv
@@ -0,0 +1,31 @@
+ID,Message
+1,I had to cancel my order because of poor service.
+2,"The delivery was late, and the packaging was damaged."
+3,I was sent the wrong color of the product.
+4,My order was incomplete when it arrived.
+5,The product I received was damaged.
+6,The quality of the product is much worse than expected.
+7,The product stopped working after a short period of time.
+8,The product doesn’t match the description on the website.
+9,I’ve had to contact customer service multiple times for the same issue.
+10,Customer support was not helpful at all.
+11,The quality of the product was poor.
+12,The product was much smaller than I expected.
+13,I had trouble finding the product on your website.
+14,The instructions were unclear and hard to follow.
+15,The website was difficult to navigate during my purchase.
+16,I received the wrong size and need a replacement.
+17,I was given false information about the product.
+18,The product stopped working after a short period of time.
+19,The product arrived damaged and unusable.
+20,The product arrived in terrible condition.
+21,The product arrived damaged and unusable.
+22,The customer service was slow to respond.
+23,The product was missing some essential accessories.
+24,I didn’t receive any confirmation email for my order.
+25,The product wasn’t compatible with my other appliances.
+26,The product is faulty and doesn’t work properly.
+27,The product didn’t fit as expected.
+28,The product was extremely hard to set up.
+29,I am unhappy with the design of the product.
+30,The website was difficult to navigate during my purchase.
diff --git a/ai/generative-ai-service/sentiment+categorization/files/backend/feedback_agent.py b/ai/generative-ai-service/sentiment+categorization/files/backend/feedback_agent.py
@@ -0,0 +1,129 @@
+import json
+import logging
+import os
+from typing import List
+
+from langchain_community.chat_models.oci_generative_ai import ChatOCIGenAI
+from langchain_core.messages import HumanMessage, SystemMessage
+from langchain_core.pydantic_v1 import BaseModel
+from langgraph.checkpoint.memory import MemorySaver
+from langgraph.graph import END, StateGraph
+
+import backend.message_handler as handler
+import backend.utils.llm_config as llm_config
+
+# Set up logging
+logging.getLogger("oci").setLevel(logging.DEBUG)
+# Resolve the sample CSV relative to this module so it is found from any working directory
+messages_path = os.path.join(os.path.dirname(__file__), "data", "complaints_messages.csv")
+
+
+class AgentState(BaseModel):
+    messages_info: List = []
+    categories: List = []
+    reports: List = []
+
+
+class FeedbackAgent:
+    def __init__(self, model_name: str = "cohere_oci"):
+        self.model_name = model_name
+        self.model = self.initialize_model()
+        self.memory = MemorySaver()
+        self.builder = self.setup_graph()
+        self.messages = self.read_messages()
+
+    def initialize_model(self):
+        if self.model_name not in llm_config.MODEL_REGISTRY:
+            raise ValueError(f"Unknown model: {self.model_name}")
+
+        model_config = llm_config.MODEL_REGISTRY[self.model_name]
+
+        return ChatOCIGenAI(
+            model_id=model_config["model_id"],
+            service_endpoint=model_config["service_endpoint"],
+            compartment_id=model_config["compartment_id"],
+            provider=model_config["provider"],
+            auth_type=model_config["auth_type"],
+            auth_profile=model_config["auth_profile"],
+            model_kwargs=model_config["model_kwargs"],
+        )
+
+    def read_messages(self):
+        messages = handler.read_messages(filepath=messages_path)
+        return handler.batchify(messages, 30)
+
+    def summarization_node(self, state: AgentState):
+        batch = self.messages
+        response = self.model.invoke(
+            [
+                SystemMessage(
+                    content=llm_config.get_prompt(self.model_name, "SUMMARIZATION")
+                ),
+                HumanMessage(content=f"Message batch: {batch}"),
+            ]
+        )
+        state.messages_info = state.messages_info + [json.loads(response.content)]
+        return {"messages_info": state.messages_info}
+
+    def categorization_node(self, state: AgentState):
+        batch = state.messages_info
+        response = self.model.invoke(
+            [
+                SystemMessage(
+                    content=llm_config.get_prompt(
+                        self.model_name, "CATEGORIZATION_SYSTEM"
+                    )
+                ),
+                HumanMessage(
+                    content=llm_config.get_prompt(
+                        self.model_name, "CATEGORIZATION_USER"
+                    ).format(MESSAGE_BATCH=batch)
+                ),
+            ]
+        )
+        content = [json.loads(response.content)]
+        state.categories = state.categories + handler.match_categories(batch, content)
+        return {"categories": state.categories}
+
+    def generate_report_node(self, state: AgentState):
+        response = self.model.invoke(
+            [
+                SystemMessage(
+                    content=llm_config.get_prompt(self.model_name, "REPORT_GEN")
+                ),
+                HumanMessage(content=f"Message info: {state.categories}"),
+            ]
+        )
+        return {"reports": [response.content]}
+
+    def setup_graph(self):
+        builder = StateGraph(AgentState)
+        builder.add_node("summarize", self.summarization_node)
+        builder.add_node("categorize", self.categorization_node)
+        builder.add_node("generate_report", self.generate_report_node)
+
+        builder.set_entry_point("summarize")
+        builder.add_edge("summarize", "categorize")
+        builder.add_edge("categorize", "generate_report")
+
+        builder.add_edge("generate_report", END)
+        return builder.compile(checkpointer=self.memory)
+
+    def get_graph(self):
+        return self.builder.get_graph()
+
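+    def run(self):
+        thread = {"configurable": {"thread_id": "1"}}
+        # stream() requires an input; start from an empty state like run_step_by_step()
+        initial_state = {"messages_info": [], "categories": [], "reports": []}
+        for s in self.builder.stream(initial_state, thread):
+            print(f"\n\n{s}")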
+
+    def run_step_by_step(self):
+        thread = {"configurable": {"thread_id": "1"}}
+        initial_state = {
+            "messages_info": [],
+            "categories": [],
+            "reports": [],
+        }
+        for state in self.builder.stream(initial_state, thread):
+            yield state  # Yield each intermediate step to allow step-by-step execution
diff --git a/ai/generative-ai-service/sentiment+categorization/files/backend/feedback_wrapper.py b/ai/generative-ai-service/sentiment+categorization/files/backend/feedback_wrapper.py
@@ -0,0 +1,25 @@
+from backend.feedback_agent import FeedbackAgent
+
+
+class FeedbackAgentWrapper:
+    def __init__(self):
+        self.agent = FeedbackAgent()
+        self.run_graph = self.agent.run_step_by_step()
+
+    def get_nodes_edges(self):
+        graph_data = self.agent.get_graph()
+        nodes = list(graph_data.nodes.keys())
+        edges = [(edge.source, edge.target) for edge in graph_data.edges]
+        return nodes, edges
+
+    def run_step_by_step(self):
+        try:
+            action_output = next(self.run_graph)
+            current_node = list(action_output.keys())[0]
+        except StopIteration:
+            action_output = {}
+            current_node = "FINALIZED"
+        return current_node, action_output
+
+    def get_graph(self):
+        return self.agent.get_graph()
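
A caller such as a Streamlit page can advance the graph one node at a time through `FeedbackAgentWrapper.run_step_by_step()`, which returns the name of the node that just ran, or `"FINALIZED"` once the generator is exhausted. The sketch below reproduces that contract against a hypothetical stub generator, so it runs without OCI or LangGraph. Like the wrapper, it assumes each streamed item is a `{node_name: state_update}` dict, which is what LangGraph's default `"updates"` stream mode yields.

```python
from typing import Dict, Iterator, Tuple


def stub_graph_run() -> Iterator[Dict[str, dict]]:
    # Hypothetical stand-in for FeedbackAgent.run_step_by_step():
    # one {node_name: state_update} dict per executed node.
    yield {"summarize": {"messages_info": ["..."]}}
    yield {"categorize": {"categories": ["..."]}}
    yield {"generate_report": {"reports": ["..."]}}


def next_step(run: Iterator[Dict[str, dict]]) -> Tuple[str, dict]:
    """Same contract as FeedbackAgentWrapper.run_step_by_step()."""
    try:
        output = next(run)
        return next(iter(output)), output  # first key is the node name
    except StopIteration:
        return "FINALIZED", {}


run = stub_graph_run()
visited = []
while True:
    node, _ = next_step(run)
    if node == "FINALIZED":
        break
    visited.append(node)
print(visited)  # ['summarize', 'categorize', 'generate_report']
```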
diff --git a/ai/generative-ai-service/sentiment+categorization/files/backend/message_handler.py b/ai/generative-ai-service/sentiment+categorization/files/backend/message_handler.py
@@ -0,0 +1,53 @@
+import csv
+from typing import List, Sequence
+
+
+def read_messages(
+    filepath: str, columns: Sequence[str] = ("ID", "Message")
+) -> List[List[str]]:
+    with open(filepath, newline="", encoding="utf-8") as file:
+        reader = csv.DictReader(file)
+        extracted_data = []
+
+        for row in reader:
+            extracted_row = [row[col] for col in columns if col in row]
+            extracted_data.append(extracted_row)
+
+    return extracted_data
+
+
+def batchify(lst, batch_size):
+    return [lst[i : i + batch_size] for i in range(0, len(lst), batch_size)]
+
+
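+def match_categories(summaries, categories):
+    """Copy the three category levels onto each summary with the same "id".
+
+    Both arguments are single-element lists wrapping the parsed LLM output,
+    so only index 0 is used. Entries are matched by id rather than by list
+    position; summaries without a matching category entry are skipped.
+    """
+    categories_by_id = {cat["id"]: cat for cat in categories[0]}
+    result = []
+    for elem in summaries[0]:
+        cat = categories_by_id.get(elem["id"])
+        if cat is None:
+            continue  # no category returned for this message
+        for level in ("primary_category", "secondary_category", "tertiary_category"):
+            elem[level] = cat[level]
+        result.append(elem)
+    return result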
+
+
+def group_by_category_level(categories_list):
+    result = {}
+
+    for category in categories_list:
+        primary = category["primary_category"]
+        secondary = category["secondary_category"]
+        tertiary = category["tertiary_category"]
+
+        if primary not in result:
+            result[primary] = {}
+
+        if secondary not in result[primary]:
+            result[primary][secondary] = {}
+
+        if tertiary not in result[primary][secondary]:
+            result[primary][secondary][tertiary] = []
+
+        result[primary][secondary][tertiary].append(category["id"])
+
+    return result
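
`batchify` splits the message list into consecutive chunks of at most `batch_size`. The shipped `complaints_messages.csv` has 30 rows and the agent uses a batch size of 30, so there is exactly one batch, and `summarization_node` sends `self.messages` (the full list of batches) in a single prompt. If you load a larger file, one option is to analyze each batch separately and concatenate the results, as sketched below. `process_all_batches` and `fake_analyze` are hypothetical names, and the stub stands in for the summarize and categorize LLM calls.

```python
from typing import Callable, List


def batchify(items: List, batch_size: int) -> List[List]:
    """Consecutive chunks of at most batch_size (same logic as message_handler.batchify)."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


def process_all_batches(messages: List[List[str]], batch_size: int,
                        analyze: Callable[[List[List[str]]], List[dict]]) -> List[dict]:
    """Run `analyze` on each batch and concatenate the per-message results in order."""
    results: List[dict] = []
    for batch in batchify(messages, batch_size):
        results.extend(analyze(batch))
    return results


def fake_analyze(batch: List[List[str]]) -> List[dict]:
    # Hypothetical analyzer: one result dict per [ID, Message] row
    return [{"id": row[0], "length": len(row[1])} for row in batch]


messages = [[str(i), f"message {i}"] for i in range(1, 8)]
print(len(process_all_batches(messages, 3, fake_analyze)))  # 7
print(batchify([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6], [7]]
```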
diff --git a/ai/generative-ai-service/sentiment+categorization/files/backend/utils/config.py b/ai/generative-ai-service/sentiment+categorization/files/backend/utils/config.py
diff --git a/ai/generative-ai-service/sentiment+categorization/files/backend/utils/llm_config.py b/ai/generative-ai-service/sentiment+categorization/files/backend/utils/llm_config.py
diff --git a/ai/generative-ai-service/sentiment+categorization/files/backend/utils/prompts.py b/ai/generative-ai-service/sentiment+categorization/files/backend/utils/prompts.py
diff --git a/ai/generative-ai-service/sentiment+categorization/files/pages/SentimentByCat.py b/ai/generative-ai-service/sentiment+categorization/files/pages/SentimentByCat.py