diff --git a/ai/ai-document-understanding/ai-email-invoice/files/README.md b/ai/ai-document-understanding/ai-email-invoice/files/README.md index 55bd972ab..18e40a477 100644 --- a/ai/ai-document-understanding/ai-email-invoice/files/README.md +++ b/ai/ai-document-understanding/ai-email-invoice/files/README.md @@ -1,6 +1,6 @@ # Invoice Document Processing from Gmail into ERP Systems using OCI Document Understanding & Oracle Integration Cloud -Reviewed: 30.10.2024 +Reviewed: 04.11.2025 # Introduction diff --git a/ai/gen-ai-agents/assistant-secretary-agent/files/local_requirements.txt b/ai/gen-ai-agents/assistant-secretary-agent/files/local_requirements.txt index cc18cf9d2..153fcef1c 100644 --- a/ai/gen-ai-agents/assistant-secretary-agent/files/local_requirements.txt +++ b/ai/gen-ai-agents/assistant-secretary-agent/files/local_requirements.txt @@ -38,7 +38,7 @@ langchain-core==0.3.29 langchain-experimental==0.3.4 langchain-text-splitters==0.3.9 langgraph==0.2.62 -langgraph-checkpoint==2.0.8 +langgraph-checkpoint==3.0.0 langgraph-sdk==0.1.51 langsmith==0.2.11 markdown-it-py==3.0.0 diff --git a/ai/gen-ai-agents/crewai-oci-integration/LICENSE b/ai/gen-ai-agents/crewai-oci-integration/LICENSE new file mode 100644 index 000000000..fb2e1fcb6 --- /dev/null +++ b/ai/gen-ai-agents/crewai-oci-integration/LICENSE @@ -0,0 +1,21 @@ +MIT License + +Copyright (c) 2025 Luigi Saetta + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/ai/gen-ai-agents/crewai-oci-integration/README.md b/ai/gen-ai-agents/crewai-oci-integration/README.md new file mode 100644 index 000000000..7978b2299 --- /dev/null +++ b/ai/gen-ai-agents/crewai-oci-integration/README.md @@ -0,0 +1,107 @@ +# CrewAI ↔ OCI Generative AI Integration + +This repository provides examples and configuration guidelines for integrating **[CrewAI](https://github.com/joaomdmoura/crewAI)** with **Oracle Cloud Infrastructure (OCI) Generative AI** services. +The goal is to demonstrate how CrewAI agents can seamlessly leverage OCI-hosted models through the **LiteLLM gateway**. + +Reviewed: 31.10.2025 + +--- + +## 🔐 Security Configuration + +Before running the demos, you must configure access credentials for OCI. + +In these examples, we use a **locally stored key pair** for authentication. +Ensure your local OCI configuration (`~/.oci/config` and private key) is correctly set up and accessible to the Python SDK. + +To correctly start the **LiteLLM gateway** you need to create and configure correctly a **config.yml** file. 
To create this file, use the [template](./config_template.yml).

In addition, you must be **entitled** to use the OCI Generative AI service in your tenancy. If you haven't used OCI GenAI yet, ask your tenancy administrator to set up the **required policies**.

---

## 🧩 Demos Included

- [Simple CrewAI Agent](./simple_test_crewai_agent.py) — basic CrewAI agent interacting with an LLM through OCI
- [OCI Consumption Report](./crew_agent_mcp02.py)
- *(More demos to be added soon)*

---

## 📦 Dependencies

The project relies on the following main packages:

| Dependency | Purpose |
|-------------|----------|
| **CrewAI** | Framework for creating multi-agent workflows |
| **OCI Python SDK** | Access OCI services programmatically |
| **LiteLLM (Gateway)** | OpenAI-compatible proxy for accessing OCI Generative AI models |

To connect CrewAI to OCI models, we use a **LiteLLM gateway**, which exposes OCI GenAI via an **OpenAI-compatible** REST API.

---

## ⚙️ Environment Setup

1. **Create a Conda environment**
```bash
conda create -n crewai python=3.11
```

2. **Activate** the environment
```
conda activate crewai
```

3. **Install** the required **packages**
```
pip install -U oci litellm "litellm[proxy]" crewai
```

4. **Run** the LiteLLM gateway

Start the LiteLLM gateway using your configuration file (config.yml):
```
./start_gateway.sh
```

Make sure the gateway starts successfully and is listening on the configured port (e.g., http://localhost:4000/v1); a direct gateway call is sketched at the end of this README if you want to verify it independently of CrewAI.

## 🧠 Test the Integration

Run the sample CrewAI agent to verify that CrewAI can connect to OCI through LiteLLM:

```
python simple_test_crewai_agent.py
```

If the setup is correct, you should see the agent's output produced by an OCI model.

## Integrate Agents with **MCP** Servers

Install this additional package:

```
pip install 'crewai-tools[mcp]'
```

You can test the MCP integration with the [OCI Consumption Report](./crew_agent_mcp02.py) demo, which generates a consumption report for your tenancy (top 5 compartments, over 4 weeks).

To get this demo up and running:
* download the code for the MCP server from [here](https://github.com/oracle-devrel/technology-engineering/blob/main/ai/gen-ai-agents/mcp-oci-integration/mcp_consumption.py)
* start the MCP server on a free port (for example, 9500)
* register the server URL in the [source](./crew_agent_mcp02.py), in the section:
```
server_params = {
    "url": "http://localhost:9500/mcp",
    "transport": "streamable-http"
}
```

If you don't want to secure the communication with the MCP server (with JWT), set
```
ENABLE_JWT_TOKEN = False
```
in the config.py file.
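
## Verifying the Gateway Directly (Optional)

As a quick sanity check independent of CrewAI, you can call the gateway with any OpenAI-compatible client. The sketch below is based on assumptions taken from this repository: the gateway is listening on http://localhost:4000/v1, the `grok4-fast-oci` alias from the template config is loaded, and no LiteLLM master key is enforced (the demo scripts pass a placeholder key). It requires the `openai` Python package.

```python
# Minimal sketch: call the LiteLLM gateway through its OpenAI-compatible API.
# Assumptions: gateway on localhost:4000, model alias "grok4-fast-oci" from config.yml,
# placeholder API key accepted (adjust if you configured a LiteLLM master key).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",  # LiteLLM proxy endpoint
    api_key="sk-local-any",               # placeholder; replace if a master key is set
)

response = client.chat.completions.create(
    model="grok4-fast-oci",  # alias defined in config.yml, served by OCI GenAI
    messages=[{"role": "user", "content": "Reply with one short sentence to confirm connectivity."}],
    max_tokens=50,
)

print(response.choices[0].message.content)
```

If this prints a model response, the gateway and your OCI credentials are working, and the CrewAI demos should be able to connect as well.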
+ diff --git a/ai/gen-ai-agents/crewai-oci-integration/config_template.yml b/ai/gen-ai-agents/crewai-oci-integration/config_template.yml new file mode 100644 index 000000000..c44fc9eb3 --- /dev/null +++ b/ai/gen-ai-agents/crewai-oci-integration/config_template.yml @@ -0,0 +1,36 @@ +# config.yaml for litellm with OCI Grok models +litellm_settings: + drop_params: true + # drop unsupported params instead of 500 errors + additional_drop_params: ["max_retries"] + +# Common OCI connection parameters +common_oci: &common_oci + provider: oci + oci_region: us-chicago-1 + oci_serving_mode: ON_DEMAND + supports_tool_calls: true + oci_user: your-oci-user-ocid + oci_fingerprint: your-oci-api-key-fingerprint + oci_tenancy: your-oci-tenancy-ocid + oci_compartment_id: your-oci-compartment-ocid + oci_key_file: /path/to/your/oci_api_key.pem + api_key: key4321 + + +# List of models +model_list: + - model_name: grok4-oci + litellm_params: + <<: *common_oci # merge common OCI params + model: oci/xai.grok-4 + + - model_name: grok4-fast-oci + litellm_params: + <<: *common_oci + model: oci/xai.grok-4-fast-reasoning + +general_settings: + telemetry: false + proxy_logging: false + allow_model_alias: true diff --git a/ai/gen-ai-agents/crewai-oci-integration/crew_agent_mcp01.py b/ai/gen-ai-agents/crewai-oci-integration/crew_agent_mcp01.py new file mode 100644 index 000000000..e717315ef --- /dev/null +++ b/ai/gen-ai-agents/crewai-oci-integration/crew_agent_mcp01.py @@ -0,0 +1,58 @@ +""" +CrewAI agent with MCP + +This one is doing Deep research using internet search tools via MCP server. + +see: + https://docs.crewai.com/en/mcp/overview + https://docs.crewai.com/en/mcp/multiple-servers +""" +import os +from crewai import Agent, Task, Crew, LLM +from crewai_tools import MCPServerAdapter + +# Disable telemetry, tracing, and logging +os.environ["CREWAI_LOGGING_ENABLED"] = "false" +os.environ["CREWAI_TELEMETRY_ENABLED"] = "false" +os.environ["CREWAI_TRACING_ENABLED"] = "false" + +llm = LLM( + model="grok4-fast-oci", + # LiteLLM proxy endpoint + base_url="http://localhost:4000/v1", + api_key="sk-local-any", + temperature=0.2, + max_tokens=4000, +) + +server_params = { + "url": "http://localhost:8500/mcp", + "transport": "streamable-http" +} + +# Create agent with MCP tools +with MCPServerAdapter(server_params, connect_timeout=60) as mcp_tools: + print(f"Available tools: {[tool.name for tool in mcp_tools]}") + + research_agent = Agent( + role="Research Analyst", + goal="Find and analyze information using advanced search tools", + backstory="Expert researcher with access to multiple data sources", + llm=llm, + tools=mcp_tools, + verbose=True + ) + + # Create task + research_task = Task( + description="Research the latest developments in AI agent frameworks", + expected_output="Comprehensive research report with citations", + agent=research_agent + ) + + # Create and run crew + crew = Crew(agents=[research_agent], tasks=[research_task]) + + result = crew.kickoff() + + print(result) \ No newline at end of file diff --git a/ai/gen-ai-agents/crewai-oci-integration/crew_agent_mcp02.py b/ai/gen-ai-agents/crewai-oci-integration/crew_agent_mcp02.py new file mode 100644 index 000000000..675405604 --- /dev/null +++ b/ai/gen-ai-agents/crewai-oci-integration/crew_agent_mcp02.py @@ -0,0 +1,79 @@ +""" +CrewAI agent with MCP + +This one is analyzing tenant consumption via MCP server. 
+ +see: + https://docs.crewai.com/en/mcp/overview + https://docs.crewai.com/en/mcp/multiple-servers +""" +import os +from datetime import datetime +from crewai import Agent, Task, Crew, LLM +from crewai_tools import MCPServerAdapter + +# Disable telemetry, tracing, and logging +os.environ["CREWAI_LOGGING_ENABLED"] = "false" +os.environ["CREWAI_TELEMETRY_ENABLED"] = "false" +os.environ["CREWAI_TRACING_ENABLED"] = "false" + +llm = LLM( + model="grok4-oci", + # LiteLLM proxy endpoint + base_url="http://localhost:4000/v1", + api_key="sk-local-any", + temperature=0., + max_tokens=4000, +) + +# OCI consumption +server_params = { + "url": "http://localhost:9500/mcp", + "transport": "streamable-http" +} + +# Create agent with MCP tools +with MCPServerAdapter(server_params, connect_timeout=60) as mcp_tools: + print(f"Available tools: {[tool.name for tool in mcp_tools]}") + + research_agent = Agent( + role="OCI Consumption Analyst", + goal="Find and analyze information about OCI tenant consumption.", + backstory="Expert analyst with access to multiple data sources", + llm=llm, + tools=mcp_tools, + max_iter=30, + max_retry_limit=5, + verbose=True + ) + + # Create task + research_task = Task( + description="Identify the top 5 compartments by consumption (amount) for the OCI tenant " + "in the weeks of the month of september 2025, analyze the trends and provide insights on usage patterns." + "Analyze fully the top 5 compartments. Use only the amount, not the quantity.", + expected_output="Comprehensive report with data-backed insights.", + agent=research_agent + ) + + # Create and run crew + crew = Crew(agents=[research_agent], tasks=[research_task]) + + result = crew.kickoff() + + print(result) + + # --- Save the result to a Markdown file --- + # Create an output directory if it doesn’t exist + output_dir = "reports" + os.makedirs(output_dir, exist_ok=True) + + # Use timestamped filename for clarity + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + output_path = os.path.join(output_dir, f"oci_consumption_report_{timestamp}.md") + + # Write the result + with open(output_path, "w", encoding="utf-8") as f: + f.write(str(result)) + + print(f"\n✅ Report saved successfully to: {output_path}") \ No newline at end of file diff --git a/ai/gen-ai-agents/crewai-oci-integration/multi_agent_report.py b/ai/gen-ai-agents/crewai-oci-integration/multi_agent_report.py new file mode 100644 index 000000000..ca25d72e1 --- /dev/null +++ b/ai/gen-ai-agents/crewai-oci-integration/multi_agent_report.py @@ -0,0 +1,270 @@ +""" +Multi-agent report builder with CrewAI + LiteLLM (OCI) +- Planner -> generates outline +- Multiple Section Writers -> draft each section +- Synthesizer -> compiles final report + +Run: + python multi_agent_report.py "Subject to analyze" +""" + +import os +import sys +from typing import List +from pydantic import BaseModel, Field +from crewai import Agent, Task, Crew, LLM + +# --- Disable CrewAI phone-home/logs in locked-down environments --- +os.environ["CREWAI_LOGGING_ENABLED"] = "false" +os.environ["CREWAI_TELEMETRY_ENABLED"] = "false" +os.environ["CREWAI_TRACING_ENABLED"] = "false" + +# Make Instructor/OpenAI client use your LiteLLM proxy +os.environ.setdefault("OPENAI_API_KEY", "sk-local-any") +os.environ.setdefault("OPENAI_BASE_URL", "http://localhost:4000/v1") + + +# ========================= +# LLM CONFIG (LiteLLM proxy) +# ========================= +def make_llm(): + print("\n=== CONFIGURING LLM ===") + return LLM( + model="grok4-fast-oci", # your LiteLLM model alias + 
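+        # NOTE: this alias must match a "model_name" entry in the gateway's config.yml (see config_template.yml)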
base_url="http://localhost:4000/v1", # LiteLLM proxy endpoint + api_key="sk-local-any", + temperature=0.2, + max_tokens=4000, + ) + + +# ========================= +# STRUCTURED OUTPUT MODELS +# ========================= +class Outline(BaseModel): + subject: str + title: str + sections: List[str] = Field(..., description="Ordered list of section titles") + + +class SectionDraft(BaseModel): + section_title: str + key_points: List[str] + content: str + + +class FinalReport(BaseModel): + subject: str + outline: Outline + executive_summary: str + sections: List[SectionDraft] + + +# ========================= +# AGENTS +# ========================= +def make_planner(llm: LLM) -> Agent: + print("=== DEFINING PLANNER AGENT ===") + return Agent( + role="Planner", + goal="Create a clear, logically ordered outline for a technical report.", + backstory=( + "A senior analyst with strong information architecture skills. " + "Produces pragmatic outlines tailored to enterprise readers." + ), + llm=llm, + allow_delegation=False, + ) + + +def make_section_writer(llm: LLM) -> Agent: + print("=== DEFINING SECTION WRITER AGENT ===") + return Agent( + role="Section Writer", + goal="Write concise, well-structured sections from an outline.", + backstory=( + "A staff technical writer focused on clarity, correctness, and actionable insights. " + "Avoids fluff and repetition; uses bullet points where helpful." + ), + llm=llm, + allow_delegation=False, + ) + + +def make_synthesizer(llm: LLM) -> Agent: + print("=== DEFINING SYNTHESIZER AGENT ===") + return Agent( + role="Report Synthesizer", + goal="Assemble a coherent, polished report from multiple section drafts.", + backstory=( + "An editor who specializes in executive summaries and narrative cohesion. " + "Ensures consistency of tone, terminology, and depth across sections." + ), + llm=llm, + allow_delegation=False, + ) + + +# ========================= +# SINGLE-TASK CREWS HELPERS +# (We run 3 stages: plan -> write -> synthesize) +# ========================= +def run_planner(subject: str, llm: LLM) -> Outline: + planner = make_planner(llm) + + print("=== DEFINING PLANNER TASK ===") + plan_task = Task( + description=( + "Create a structured outline for a technical report on the subject:\n" + f"SUBJECT: {subject}\n\n" + "Constraints:\n" + "- Audience: enterprise architects / AI platform owners.\n" + "- Depth: practical and decision-oriented.\n" + "- Include 5–8 sections, ordered logically.\n" + "- Title should be short and informative.\n" + "Return ONLY a valid JSON object matching the schema.\n" + ), + expected_output=( + "A JSON object with: 'subject', 'title', and 'sections' (array of section titles)." + ), + agent=planner, + output_pydantic=Outline, + ) + + print("=== RUNNING PLANNER CREW ===") + crew = Crew(agents=[planner], tasks=[plan_task]) + _ = crew.kickoff() + + outline = plan_task.output.pydantic # type: ignore + if not outline or not outline.sections: + raise RuntimeError("Planner produced no sections. 
Check LLM config or prompts.") + return outline + + +def run_section_writers(outline: Outline, llm: LLM) -> List[SectionDraft]: + writer = make_section_writer(llm) + + section_tasks: List[Task] = [] + print("=== DEFINING SECTION TASKS ===") + for idx, section in enumerate(outline.sections, start=1): + t = Task( + description=( + f"Write the section #{idx} titled '{section}' for a report titled '{outline.title}' " + f"on the subject '{outline.subject}'.\n\n" + "Deliverables:\n" + "- 4–7 key bullet points (actionable and non-redundant).\n" + "- A concise section narrative (120–250 words), no marketing fluff.\n" + "- Avoid repeating content from other sections.\n" + "Return ONLY a valid JSON object matching the schema.\n" + ), + expected_output=( + "A JSON object with 'section_title', 'key_points' (array of strings), and 'content' (string)." + ), + agent=writer, + output_pydantic=SectionDraft, + ) + section_tasks.append(t) + + print("=== RUNNING SECTION WRITERS CREW ===") + crew = Crew(agents=[writer], tasks=section_tasks) + _ = crew.kickoff() + + drafts: List[SectionDraft] = [] + for t in section_tasks: + p = getattr(t.output, "pydantic", None) + if not p: + raise RuntimeError( + f"Section task for '{t.description[:60]}...' produced no structured output." + ) + drafts.append(p) + return drafts + + +def run_synthesizer( + outline: Outline, drafts: List[SectionDraft], llm: LLM +) -> FinalReport: + synthesizer = make_synthesizer(llm) + + print("=== DEFINING SYNTHESIS TASK ===") + # Prepare a compact representation of drafts for the synthesizer's context + drafts_context = "\n\n".join( + [ + f"[{i+1}] {d.section_title}\n- " + + "\n- ".join(d.key_points) + + f"\n\n{d.content}" + for i, d in enumerate(drafts) + ] + ) + + synth_task = Task( + description=( + f"Assemble the final report for SUBJECT: {outline.subject}\n" + f"TITLE: {outline.title}\n\n" + "You are given the drafted sections below. Your job:\n" + "1) Produce a crisp executive summary (120–180 words)\n" + "2) Preserve the order of sections.\n" + "3) Normalize terminology and tone across sections.\n" + "4) Do not introduce new claims; keep it faithful to the drafts.\n" + "Return ONLY a valid JSON object matching the schema.\n\n" + f"DRAFTED SECTIONS:\n{drafts_context}\n" + ), + expected_output=( + "A JSON object with: 'subject', 'outline' (with subject/title/sections), " + "'executive_summary' (string), and 'sections' (array of {section_title,key_points,content})." 
+ ), + agent=synthesizer, + output_pydantic=FinalReport, + ) + + print("=== RUNNING SYNTHESIS CREW ===") + crew = Crew(agents=[synthesizer], tasks=[synth_task]) + _ = crew.kickoff() + + final_report = synth_task.output.pydantic # type: ignore + if not final_report: + raise RuntimeError("Synthesizer produced no structured report.") + return final_report + + +# ========================= +# MAIN +# ========================= +def main(): + if len(sys.argv) < 2: + print('Usage: python multi_agent_report.py "Your subject here"') + sys.exit(1) + + subject = sys.argv[1].strip() + print(f"\n=== SUBJECT ===\n{subject}\n") + + llm = make_llm() + + # Stage 1: Plan + outline = run_planner(subject, llm) + print("\n=== OUTLINE (structured) ===") + print(outline.model_dump_json(indent=2)) + + # Stage 2: Write sections + drafts = run_section_writers(outline, llm) + print("\n=== FIRST SECTION DRAFT (preview) ===") + print(drafts[0].model_dump_json(indent=2)) + + # Stage 3: Synthesize final report + final_report = run_synthesizer(outline, drafts, llm) + print("\n=== FINAL REPORT (structured) ===") + print(final_report.model_dump_json(indent=2)) + + # Optional: also print a readable text version + print("\n=== FINAL REPORT (readable) ===\n") + print(f"# {final_report.outline.title}\n") + print("## Executive Summary\n") + print(final_report.executive_summary.strip(), "\n") + for i, sec in enumerate(final_report.sections, start=1): + print(f"## {i}. {sec.section_title}") + if sec.key_points: + print("\n- " + "\n- ".join(sec.key_points)) + print("\n" + sec.content.strip() + "\n") + + +if __name__ == "__main__": + main() diff --git a/ai/gen-ai-agents/crewai-oci-integration/simple_test_crewai_agent.py b/ai/gen-ai-agents/crewai-oci-integration/simple_test_crewai_agent.py new file mode 100644 index 000000000..463a50b0b --- /dev/null +++ b/ai/gen-ai-agents/crewai-oci-integration/simple_test_crewai_agent.py @@ -0,0 +1,53 @@ +""" +Test CrewAI with LiteLLM and OCI Generative AI +""" + +import os +from crewai import Agent, Task, Crew, LLM + +# Disable telemetry, tracing, and logging +os.environ["CREWAI_LOGGING_ENABLED"] = "false" +os.environ["CREWAI_TELEMETRY_ENABLED"] = "false" +os.environ["CREWAI_TRACING_ENABLED"] = "false" + +# Configure the LLM (Grok model served via LiteLLM proxy on OCI) +print("\n=== CONFIGURING LLM ===") + +llm = LLM( + model="grok4-fast-oci", + # LiteLLM proxy endpoint + base_url="http://localhost:4000/v1", + api_key="sk-local-any", + temperature=0.2, + max_tokens=4000, +) + +# Define the agent +print("=== DEFINING AGENT ===") +researcher = Agent( + role="Researcher", + goal="Analyze documents and synthesize insights.", + backstory="Expert in enterprise Generative AI.", + llm=llm, +) + +# Define the task assigned to the agent +print("=== DEFINING TASK ===") +task = Task( + description="Summarize in 10 bullet points the pros and cons of using LiteLLM with OCI Generative AI.", + expected_output="A 10-bullet summary, clear and non-redundant.", + agent=researcher, +) + +# Create the crew (collection of agents and tasks) +print("=== CREATING CREW ===") +crew = Crew(agents=[researcher], tasks=[task]) + +# Execute the crew and print the result +print("") +print("\n=== EXECUTING CREW ===\n") + +result = crew.kickoff() + +print("\n=== CREW RESULT ===\n") +print(result) diff --git a/ai/gen-ai-agents/crewai-oci-integration/start_gateway.sh b/ai/gen-ai-agents/crewai-oci-integration/start_gateway.sh new file mode 100755 index 000000000..6c4be304e --- /dev/null +++ 
b/ai/gen-ai-agents/crewai-oci-integration/start_gateway.sh @@ -0,0 +1,2 @@ +# export LITELLM_LOG=DEBUG +litellm --config ./config.yml --port 4000 \ No newline at end of file diff --git a/ai/gen-ai-agents/custom-rag-agent/requirements.txt b/ai/gen-ai-agents/custom-rag-agent/requirements.txt index 70910f34f..e2786e97f 100644 --- a/ai/gen-ai-agents/custom-rag-agent/requirements.txt +++ b/ai/gen-ai-agents/custom-rag-agent/requirements.txt @@ -43,7 +43,7 @@ executing==2.2.0 faiss-cpu==1.11.0.post1 fastapi==0.115.14 fastjsonschema==2.21.1 -fastmcp==2.9.2 +fastmcp==2.13.0 filetype==1.2.0 flatbuffers==25.2.10 fonttools==4.56.0 @@ -94,7 +94,7 @@ langchain-text-splitters==0.3.8 langchain-unstructured==0.1.6 langdetect==1.0.9 langgraph==0.5.0 -langgraph-checkpoint==2.1.0 +langgraph-checkpoint==3.0.0 langgraph-prebuilt==0.5.1 langgraph-sdk==0.1.55 langsmith==0.4.4 @@ -192,7 +192,7 @@ soupsieve==2.6 SQLAlchemy==2.0.38 sse-starlette==2.3.6 stack-data==0.6.3 -starlette==0.47.2 +starlette==0.49.1 streamlit==1.43.0 sympy==1.14.0 tabulate==0.9.0 diff --git a/ai/gen-ai-agents/mcp-oci-integration/requirements.txt b/ai/gen-ai-agents/mcp-oci-integration/requirements.txt index 6f93652d8..0df9d6bdc 100644 --- a/ai/gen-ai-agents/mcp-oci-integration/requirements.txt +++ b/ai/gen-ai-agents/mcp-oci-integration/requirements.txt @@ -24,7 +24,7 @@ docstring_parser==0.17.0 docutils==0.22 email-validator==2.3.0 exceptiongroup==1.3.0 -fastmcp==2.12.2 +fastmcp==2.13.0 frozenlist==1.7.0 gitdb==4.0.12 GitPython==3.1.45 @@ -109,7 +109,7 @@ smmap==5.0.2 sniffio==1.3.1 SQLAlchemy==2.0.43 sse-starlette==3.0.2 -starlette==0.47.3 +starlette==0.49.1 streamlit==1.49.1 tenacity==9.1.2 toml==0.10.2 diff --git a/ai/gen-ai-agents/travel-agent/requirements.txt b/ai/gen-ai-agents/travel-agent/requirements.txt index c8a1954f0..1596be57a 100644 --- a/ai/gen-ai-agents/travel-agent/requirements.txt +++ b/ai/gen-ai-agents/travel-agent/requirements.txt @@ -77,7 +77,7 @@ langchain-community==0.3.24 langchain-core==0.3.60 langchain-text-splitters==0.3.8 langgraph==0.4.5 -langgraph-checkpoint==2.0.26 +langgraph-checkpoint==3.0.0 langgraph-prebuilt==0.1.8 langgraph-sdk==0.1.69 langsmith==0.3.42 @@ -146,7 +146,7 @@ sniffio==1.3.1 soupsieve==2.7 SQLAlchemy==2.0.41 stack-data==0.6.3 -starlette==0.47.2 +starlette==0.49.1 streamlit==1.45.1 tenacity==9.1.2 terminado==0.18.1 diff --git a/ai/generative-ai-service/car-insurance-cahtbot/LICENSE b/ai/generative-ai-service/car-insurance-chatbot/LICENSE similarity index 100% rename from ai/generative-ai-service/car-insurance-cahtbot/LICENSE rename to ai/generative-ai-service/car-insurance-chatbot/LICENSE diff --git a/ai/generative-ai-service/car-insurance-cahtbot/README.md b/ai/generative-ai-service/car-insurance-chatbot/README.md similarity index 100% rename from ai/generative-ai-service/car-insurance-cahtbot/README.md rename to ai/generative-ai-service/car-insurance-chatbot/README.md diff --git a/ai/generative-ai-service/car-insurance-cahtbot/files/config.py b/ai/generative-ai-service/car-insurance-chatbot/files/config.py similarity index 100% rename from ai/generative-ai-service/car-insurance-cahtbot/files/config.py rename to ai/generative-ai-service/car-insurance-chatbot/files/config.py diff --git a/ai/generative-ai-service/car-insurance-cahtbot/files/motor-insurance-chatbot.py b/ai/generative-ai-service/car-insurance-chatbot/files/motor-insurance-chatbot.py similarity index 100% rename from ai/generative-ai-service/car-insurance-cahtbot/files/motor-insurance-chatbot.py rename to 
ai/generative-ai-service/car-insurance-chatbot/files/motor-insurance-chatbot.py diff --git a/ai/generative-ai-service/car-insurance-cahtbot/files/style.css b/ai/generative-ai-service/car-insurance-chatbot/files/style.css similarity index 100% rename from ai/generative-ai-service/car-insurance-cahtbot/files/style.css rename to ai/generative-ai-service/car-insurance-chatbot/files/style.css diff --git a/ai/generative-ai-service/complex-document-rag/.gitignore b/ai/generative-ai-service/complex-document-rag/.gitignore new file mode 100644 index 000000000..01b77f130 --- /dev/null +++ b/ai/generative-ai-service/complex-document-rag/.gitignore @@ -0,0 +1,26 @@ +# macOS system files +.DS_Store + +.env +# Python cache +**/__pycache__/ + +# Virtual environments +venv/ + +# Local config +config.py + +# Data folders +data/ +embeddings/ +charts/ +reports/ + +# Logs +*.log +logs/ + +# Text files (except requirements.txt) +*.txt +!requirements.txt \ No newline at end of file diff --git a/ai/generative-ai-service/complex-document-rag/README.md b/ai/generative-ai-service/complex-document-rag/README.md index dc0a5c3d7..4120deea5 100644 --- a/ai/generative-ai-service/complex-document-rag/README.md +++ b/ai/generative-ai-service/complex-document-rag/README.md @@ -2,7 +2,7 @@ An enterprise-grade Retrieval-Augmented Generation (RAG) system for generating comprehensive business reports from multiple document sources using Oracle Cloud Infrastructure (OCI) Generative AI services. -Reviewed date: 22.09.2025 +Reviewed date: 03.11.2025 ## Features @@ -14,7 +14,7 @@ Reviewed date: 22.09.2025 - **Citation Tracking**: Source attribution with references - **Multi-Language Support**: Generate reports in English, Arabic, Spanish, and French - **Visual Analytics**: Automatic chart and table generation from data - +![Application screenshot](files/images/screenshot1.png) ## Prerequisites - Python 3.11+ diff --git a/ai/generative-ai-service/complex-document-rag/files/agents/agent_factory.py b/ai/generative-ai-service/complex-document-rag/files/agents/agent_factory.py index 9987f0113..e4332cab9 100644 --- a/ai/generative-ai-service/complex-document-rag/files/agents/agent_factory.py +++ b/ai/generative-ai-service/complex-document-rag/files/agents/agent_factory.py @@ -445,38 +445,45 @@ def _process_batch(self, batch: List[Dict[str, Any]]) -> List[str]: prompt = "\n".join(prompt_parts) self.log_prompt(prompt, f"ChunkRewriter (Batch of {len(batch)})") - response = self.llm.invoke([DummyMessage(prompt)]) - - # Handle different LLM response styles - if hasattr(response, "content"): - text = response.content.strip() - elif isinstance(response, list) and isinstance(response[0], dict): - text = response[0].get("generated_text") or response[0].get("text") - if not text: - raise ValueError("⚠️ No valid 'generated_text' found in response.") - text = text.strip() - else: - raise TypeError(f"⚠️ Unexpected response type: {type(response)} — {response}") - - self.log_response(text, f"ChunkRewriter (Batch of {len(batch)})") - rewritten_chunks = self._parse_batch_response(text, len(batch)) - rewritten_chunks = [self._clean_chunk_text(chunk) for chunk in rewritten_chunks] - - # Enhanced logging with side-by-side comparison - paired = list(zip(batch, rewritten_chunks)) - for i, (original_chunk, rewritten_text) in enumerate(paired, 1): - # Get the actual raw chunk text, not the metadata - original_text = original_chunk.get("text", "") - metadata = original_chunk.get("metadata", {}) - - # Use demo logger for visual comparison if available - if 
DEMO_MODE and hasattr(logger, 'chunk_comparison'): - # Pass the actual chunk text, not metadata - logger.chunk_comparison(original_text, rewritten_text, metadata) + try: + response = self.llm.invoke([DummyMessage(prompt)]) + + # Handle different LLM response styles + if hasattr(response, "content"): + text = response.content.strip() + elif isinstance(response, list) and isinstance(response[0], dict): + text = response[0].get("generated_text") or response[0].get("text") + if not text: + raise ValueError("⚠️ No valid 'generated_text' found in response.") + text = text.strip() else: - logger.info(f"⚙ Rewritten Chunk {i}:\n{rewritten_text}\nMetadata: {json.dumps(metadata, indent=2)}\n") - - return rewritten_chunks + raise TypeError(f"⚠️ Unexpected response type: {type(response)} — {response}") + + self.log_response(text, f"ChunkRewriter (Batch of {len(batch)})") + rewritten_chunks = self._parse_batch_response(text, len(batch)) + rewritten_chunks = [self._clean_chunk_text(chunk) for chunk in rewritten_chunks] + + # Enhanced logging with side-by-side comparison + paired = list(zip(batch, rewritten_chunks)) + for i, (original_chunk, rewritten_text) in enumerate(paired, 1): + # Get the actual raw chunk text, not the metadata + original_text = original_chunk.get("text", "") + metadata = original_chunk.get("metadata", {}) + + # Use demo logger for visual comparison if available + if DEMO_MODE and hasattr(logger, 'chunk_comparison'): + # Pass the actual chunk text, not metadata + logger.chunk_comparison(original_text, rewritten_text, metadata) + else: + logger.info(f"⚙ Rewritten Chunk {i}:\n{rewritten_text}\nMetadata: {json.dumps(metadata, indent=2)}\n") + + return rewritten_chunks + + except Exception as e: + # Handle timeout and other errors gracefully + logger.error(f"❌ Batch processing failed: {e}") + # Return None for each chunk to indicate failure (not empty strings!) + return [None] * len(batch) def _parse_batch_response(self, response_text: str, expected_chunks: int) -> List[str]: @@ -581,8 +588,7 @@ def _detect_comparison_query(self, query: str) -> bool: """Use LLM to detect whether the query involves a comparison.""" prompt = f""" Does the query below involve a **side-by-side comparison between two or more named entities such as companies, organizations, or products**? - -Exclude comparisons to frameworks (e.g., CSRD, ESRS), legal standards, or regulations — those do not count. +Include comparisons to frameworks (e.g., CSRD, ESRS), legal standards, or regulations. Query: "{query}" @@ -641,228 +647,206 @@ def extract_first_json_list(text): return re.findall(r'"([^"]+)"', text) def _extract_entities(self, query: str) -> List[str]: - """Use LLM to extract entity names, then normalize + dedupe.""" - prompt = f""" -Extract company/organization names mentioned in the query and return a CLEANED JSON list. + """Prefer exact vector-store tags typed by the user; LLM only as fallback.""" + import re + logger = getattr(self, "logger", None) or __import__("logging").getLogger(__name__) -CLEANING RULES (apply to each name before returning): -- Lowercase everything. -- Remove legal suffixes at the end: plc, ltd, inc, llc, lp, l.p., corp, corporation, co., co, s.a., s.a.s., ag, gmbh, bv, nv, oy, ab, sa, spa, pte, pvt, pty, srl, sro, k.k., kk, kabushiki kaisha. -- Remove punctuation except internal ampersands (&). Collapse multiple spaces. -- No duplicates. 
+ # --- 0) known tag set from your vector store (lowercased) --- + # Populate this once at init: self.known_tags = {id.lower() for id in vector_store_ids()} + known = getattr(self, "known_tags", None) -CONSTRAINTS: -- Return ONLY a JSON list of strings, e.g. ["aelwyn","elinexa"] -- No prose, no keys, no explanations. -- Do not include standards, clause numbers, sectors, or generic words like "entity". -- If none are present, return []. + tagged = [] -Examples: -Query: "Compare Aelwyn vs Elinexa PLC policies" -Return: ["aelwyn","elinexa"] + # A) Existing FY/Q pattern (kept) + tagged += [m.group(0) for m in re.finditer( + r"\b[A-Za-z][A-Za-z0-9\-]*_(?:FY|Q[1-4])\d{2,4}\b", query, flags=re.I + )] -Query: "Barclays (UK) and JPMorgan Chase & Co." -Return: ["barclays","jpmorgan chase & co"] + # B) NEW: generic "_" e.g., "mof_2022", "mof_2024" + tagged += [m.group(0) for m in re.finditer( + r"\b[A-Za-z][A-Za-z0-9\-]*_\d{2,4}\b", query + )] -Query: "What are Microsoft’s 2030 targets?" -Return: ["microsoft"] + # C) (Optional but useful) quoted tokens like "mof_2022" + tagged += [m.group(1) for m in re.finditer( + r'"([A-Za-z0-9][A-Za-z0-9_\-]{1,80})"', query + )] -Query: "No company here" -Return: [] + # De-dup preserve order (case-insensitive) + seen = set() + tagged_unique: List[str] = [] + for t in tagged: + k = t.lower() + if k not in seen: + # If we know the store IDs, only keep those that exist + if not known or k in known: + seen.add(k) + tagged_unique.append(t) + + # --- Early return: if user typed valid tags, trust them verbatim --- + if tagged_unique: + if logger: + logger.info(f"[Entity Extractor] Exact tags: {tagged_unique}") + return tagged_unique + + # --- Fallback: your original LLM extraction (unchanged) --- + prompt = f""" + Extract company/organization names mentioned in the query and return a CLEANED JSON list. -Now process this query: + CLEANING RULES (apply to each name before returning): + - Lowercase everything. + - Remove legal suffixes at the end: plc, ltd, inc, llc, lp, l.p., corp, corporation, co., co, s.a., s.a.s., ag, gmbh, bv, nv, oy, ab, sa, spa, pte, pvt, pty, srl, sro, k.k., kk, kabushiki kaisha. + - Remove punctuation except internal ampersands (&). Collapse multiple spaces. + - No duplicates. -{query} -""" + CONSTRAINTS: + - Return ONLY a JSON list of strings, e.g. ["aelwyn","elinexa"] + - No prose, no keys, no explanations. + - Do not include standards, clause numbers, sectors, or generic words like "entity". + - If none are present, return []. + + Now process this query: + + {query} + """ try: raw = self.llm(prompt).strip() - print(raw) entities = self.extract_first_json_list(raw) - # Keep strings only and strip whitespace entities = [e.strip() for e in entities if isinstance(e, str) and e.strip()] - # Deduplicate while preserving order - seen = set() - cleaned: List[str] = [] + final: List[str] = [] + seen2 = set() + for e in entities: - if e.lower() not in seen: - seen.add(e.lower()) - cleaned.append(e) + k = e.lower() + if (not known or k in known) and k not in seen2: + seen2.add(k) + final.append(e) - if not cleaned: - logger.warning(f"[Entity Extractor] No plausible entities extracted from LLM output: {entities}") + if not final and logger: + logger.warning(f"[Entity Extractor] No plausible entities extracted. 
LLM: {entities} | tags: []") - logger.info(f"[Entity Extractor] Raw: {raw} | Cleaned: {cleaned}") - return cleaned + if logger: + logger.info(f"[Entity Extractor] Raw: {raw} | Tags: [] | Final: {final}") + return final except Exception as e: - logger.warning(f"⚠️ Failed to robustly extract entities via LLM: {e}") + if logger: + logger.warning(f"⚠️ Failed to robustly extract entities via LLM: {e}") return [] + def plan( - self, - query: str, - context: List[Dict[str, Any]] | None = None, - is_comparison_report: bool = False - ) -> tuple[list[Dict[str, Any]], list[str], bool]: - """ - Strategic planner that returns structured topics with steps. - Supports both comparison and single-entity analysis with consistent output format. + self, + query: str, + context: List[Dict[str, Any]] | None = None, + is_comparison_report: bool = False, + comparison_mode: str | None = None, # kept for compatibility, not used to hardcode content + provided_entities: Optional[List[str]] = None + ) -> tuple[list[Dict[str, Any]], list[str], bool]: """ - raw = None - is_comparison = self._detect_comparison_query(query) or is_comparison_report - entities = self._extract_entities(query) - logger.info(f"[Planner] Detected entities: {entities} | Comparison task: {is_comparison}") - - if is_comparison and len(entities) < 2: - logger.warning(f"⚠️ Comparison task detected but only {len(entities)} entity found: {entities}") - is_comparison = False # fallback to single-entity mode + PROMPT-DRIVEN PLANNER + - Derive section topics from the user's TASK PROMPT (not hardcoded). + - For each topic, emit one mirrored retrieval step per entity. + - Output shape: List[{"topic": str, "steps": List[str]}], plus (entities, is_comparison). - ctx = "\n".join(f"{i+1}. {c['content']}" for i, c in enumerate(context or [])) - - if is_comparison: - template = """ - You are a strategic planning agent generating grouped research steps for a comparative analysis report. + Returns: + (plan, entities, is_comparison) + """ - TASK: {query} + # 1) Determine comparison intent and entities (keep your existing logic) + is_comparison = self._detect_comparison_query(query) or is_comparison_report - OBJECTIVE: - Break the task into high-level comparison **topics**. For each topic, generate **two steps** — one per entity. + if provided_entities: + entities = [e for e in provided_entities if isinstance(e, str) and e.strip()] + logger.info(f"[Planner] Using provided entities: {entities}") + else: + entities = self._extract_entities(query) + logger.info(f"[Planner] Detected entities: {entities} | Comparison task: {is_comparison}") - RULES: - - Keep topic titles focused and distinct (e.g., "Scope 1 Emissions") - - Use a consistent step format: "Find (something) for (Entity)" - - Use only these entities: {entities} + # If comparison requested but <2 entities, degrade gracefully to single-entity mode + if is_comparison and len(entities) < 2: + logger.warning(f"⚠️ Comparison requested but only {len(entities)} entity found: {entities}. Falling back to single-entity.") + is_comparison = False + # 2) Ask the LLM ONLY for topics (strings), not full objects — we’ll build steps ourselves + # This avoids fragile JSON with missing "topic" keys. + topic_prompt = f""" +Extract the main section topics from the TASK PROMPT. +Use the user's own headings/bullets/order when present. +If none are explicit, infer 5–10 concise, non-overlapping topics that reflect the user's request. 
- EXAMPLE: - [ - {{ - "topic": "Net-Zero Targets", - "steps": [ - "Find net-zero targets for Company-A", - "Find net-zero targets for Company-B" - ] - }} - ] +TASK PROMPT: +{query} - TASK: {query} +Return ONLY a JSON array of strings, e.g. ["Executive Summary","Revenue Analysis","Profitability"]. +No prose, no keys, no markdown. +""" + self.log_prompt(topic_prompt, "Planner: Topic Extraction") + raw_topics = None + topics: list[str] = [] + try: + raw_topics = self.llm(topic_prompt).strip() + json_str = UniversalJSONCleaner.clean_and_extract_json(raw_topics, expected_type="array") + parsed = UniversalJSONCleaner.parse_with_validation(json_str, expected_structure=None) + if isinstance(parsed, list): + # Keep only non-empty strings + topics = [str(t).strip() for t in parsed if isinstance(t, (str, int, float)) and str(t).strip()] + except Exception as e: + logger.error(f"❌ Topic extraction failed: {e}") + logger.debug(f"Raw topic response:\n{raw_topics}") + + # 2b) Hard fallback: if still empty, derive topics from obvious headings in the query + if not topics: + # Grab capitalized/bulleted lines as headings + lines = [ln.strip() for ln in (query or "").splitlines()] + bullets = [ln.lstrip("-*• ").strip() for ln in lines if ln.strip().startswith(("-", "*", "•"))] + caps = [ln for ln in lines if ln and ln == ln.title() and len(ln.split()) <= 8] + candidates = bullets or caps + if candidates: + topics = [t for t in candidates if len(t) >= 3][:10] + + # 2c) Ultimate fallback: generic buckets (kept minimal, not domain-specific) + if not topics: + topics = [ + "Executive Summary", + "Key Metrics", + "Section 1", + "Section 2", + "Section 3", + "Risks & Considerations", + "Conclusion" + ] - ENTITIES: {entities} - Respond ONLY with valid JSON. - Use standard double quotes (") for all JSON keys and string values. - You MAY and SHOULD use single quotes (') *inside* string values for possessives (e.g., "CEO's"). - Do NOT use curly or smart quotes. - Do NOT write `"CEO"s"`, only `"CEO's"`. - """ - else: - if not entities: - logger.warning("⚠️ No entity found in query — using fallback") - entities = ["The Company"] - template = """ - You are a planning agent decomposing a task for a single entity into structured research topics. - -TASK: {query} - -OBJECTIVE: -Break this into 3–10 key topics. Under each topic, include 1–2 retrieval-friendly steps. - -RULES: -- Keep topics distinct and concrete (e.g., Carbon Disclosure) -- Use only these entities: {entities} -- Use a consistent step format: "Find (something) for (Entity)" - -EXAMPLE: -[ -{{ - "topic": "Carbon Disclosure for Company-A", - "steps": [ - "Find 2023 Scope 1 and 2 emissions for Company-A" - ] -}}, -{{ - "topic": "Company-A Diversity Strategy", - "steps": [ - "Analyze gender and ethnicity diversity at Company-A" - ] -}} -] -Respond ONLY with valid JSON. -Do NOT use possessive forms (e.g., do NOT write "Aelwyn's Impact"). Instead, write "Impact for Aelwyn" or "Impact of Aelwyn". -Use the format: "Find (something) for (Entity)" -Do NOT use curly or smart quotes. 
+ # 3) Build plan objects and MIRROR steps across entities (no hardcoded content) + plan: list[dict] = [] + for t in topics: + t_clean = str(t).strip() + if not t_clean: + continue - """ + if is_comparison and len(entities) >= 2: + # One retrieval step per entity — mirrored wording + steps = [f"Find all items requested under '{t_clean}' for {entities[0]}", + f"Find all items requested under '{t_clean}' for {entities[1]}"] + else: + # Single entity (or unknown) + e0 = entities[0] if entities else "The Entity" + steps = [f"Find all items requested under '{t_clean}' for {e0}"] - messages = ChatPromptTemplate.from_template(template).format_messages( - query=query, - context=ctx, - entities=entities - ) - full_prompt = "\n".join(str(m.content) for m in messages) - self.log_prompt(full_prompt, "Planner") + plan.append({"topic": t_clean, "steps": steps}) + # 4) Log and return try: - raw = self.llm.invoke(messages).content.strip() - self.log_response(raw, "Planner") - cleaned = UniversalJSONCleaner.clean_and_extract_json(raw, expected_type="array") + self.log_response(json.dumps(plan, ensure_ascii=False, indent=2), "Planner: Plan (topics→steps)") + except Exception: + pass - plan = UniversalJSONCleaner.parse_with_validation( - cleaned, expected_structure="Array of objects with 'topic' and 'steps' keys" - ) + return plan, entities, is_comparison - if not isinstance(plan, list): - raise ValueError("Parsed plan is not a list") - - for section in plan: - if not isinstance(section, dict): - raise ValueError("Section is not a dict") - if "topic" not in section or "steps" not in section: - raise ValueError("Missing 'topic' or 'steps'") - if not isinstance(section["topic"], str): - raise ValueError("Topic must be a string") - if not isinstance(section["steps"], list): - raise ValueError("Steps must be a list") - if not all(isinstance(s, str) for s in section["steps"]): - raise ValueError("Each step must be a string") - - # Optional: Validate entity inclusion if this was a comparison task - if is_comparison and entities: - for section in plan: - step_text = " ".join(section["steps"]).lower() - for entity in entities: - if entity.lower() not in step_text: - logger.warning( - f"⚠️ Entity '{entity}' not found in steps for topic: '{section['topic']}'" - ) - - return plan, entities, is_comparison - except Exception as e: - logger.error(f"❌ Failed to parse planner output: {e}") - logger.error(f"Raw response:\n{raw}") - # Attempt a minimal prompt instead of hardcoded fallback - try: - fallback_prompt = f""" - Return a JSON list of 5 objects like this: - [{{ - "topic": "X and Y", - "steps": ["Find X for The Company", "Analyze Y for The Company"] - }}] - TASK: {query} - Respond with valid JSON - """ - raw_fallback = self.llm(fallback_prompt).strip() - cleaned_fallback = UniversalJSONCleaner.clean_and_extract_json(raw_fallback) - fallback_plan = UniversalJSONCleaner.parse_with_validation( - cleaned_fallback, expected_structure="Array of objects with 'topic' and 'steps' keys" - ) - return fallback_plan, entities, is_comparison - except Exception as inner_e: - logger.error(f"🛑 Fallback planner also failed: {inner_e}") - raise RuntimeError("Both planner and fallback planner failed") from inner_e class ResearchAgent(Agent): diff --git a/ai/generative-ai-service/complex-document-rag/files/agents/report_writer_agent.py b/ai/generative-ai-service/complex-document-rag/files/agents/report_writer_agent.py index f11ddde9e..302f1577e 100644 --- a/ai/generative-ai-service/complex-document-rag/files/agents/report_writer_agent.py 
+++ b/ai/generative-ai-service/complex-document-rag/files/agents/report_writer_agent.py @@ -5,63 +5,226 @@ import uuid import logging import datetime -import matplotlib.pyplot as plt - import math +import re +from docx.oxml.shared import OxmlElement +from docx.text.run import Run logger = logging.getLogger(__name__) logging.basicConfig(level=logging.INFO) os.makedirs("charts", exist_ok=True) + + +_MD_TOKEN_RE = re.compile(r'(\*\*.*?\*\*|__.*?__|\*.*?\*|_.*?_)') + +def add_inline_markdown_paragraph(doc, text: str): + """ + Creates a paragraph and renders lightweight inline Markdown: + **bold** or __bold__ → bold run + *italic* or _italic_ → italic run + Everything else is plain text. No links/lists/code handling. + """ + p = doc.add_paragraph() + i = 0 + for m in _MD_TOKEN_RE.finditer(text): + # leading text + if m.start() > i: + p.add_run(text[i:m.start()]) + token = m.group(0) + # strip the markers + if token.startswith('**') or token.startswith('__'): + content = token[2:-2] + run = p.add_run(content) + run.bold = True + else: + content = token[1:-1] + run = p.add_run(content) + run.italic = True + i = m.end() + # trailing text + if i < len(text): + p.add_run(text[i:]) + return p + def add_table(doc, table_data): - """Create a professionally styled Word table from list of dicts.""" + """Create a Word table from list of dicts or list of lists, robustly.""" if not table_data: return - + headers = [] - seen = set() - for row in table_data: - for k in row.keys(): - if k not in seen: - headers.append(k) - seen.add(k) - - # Create table with proper styling + rows_normalized = [] + + # Case 1: list of dicts + if isinstance(table_data[0], dict): + seen = set() + for row in table_data: + for k in row.keys(): + if k not in seen: + headers.append(k) + seen.add(k) + rows_normalized = table_data + + # Case 2: list of lists + elif isinstance(table_data[0], (list, tuple)): + max_len = max(len(row) for row in table_data) + headers = [f"Col {i+1}" for i in range(max_len)] + for row in table_data: + rows_normalized.append({headers[i]: row[i] if i < len(row) else "" + for i in range(max_len)}) + + else: + headers = ["Value"] + rows_normalized = [{"Value": str(row)} for row in table_data] + table = doc.add_table(rows=1, cols=len(headers)) table.style = 'Table Grid' - - # Style header row + header_row = table.rows[0] for i, h in enumerate(headers): cell = header_row.cells[i] cell.text = str(h) - # Make header bold for paragraph in cell.paragraphs: for run in paragraph.runs: run.bold = True - # Add data rows - for row in table_data: + for row in rows_normalized: row_cells = table.add_row().cells for i, h in enumerate(headers): row_cells[i].text = str(row.get(h, "")) +def _color_for_label(label: str, entities: list[str] | tuple[str, ...] | None, + base="#a9bbbc", e1="#437c94", e2="#c74634") -> str: + """Pick a bar color based on whether a label mentions one of the entities.""" + if not entities: + return base + lbl = label.lower() + ents = [e for e in entities if isinstance(e, str)] + if len(ents) >= 1 and ents[0].lower() in lbl: + return e1 + if len(ents) >= 2 and ents[1].lower() in lbl: + return e2 + return base + + +def detect_units(chart_data: dict, title: str = "") -> str: + """Detect units of measure from chart data and title.""" + # Common patterns for currency + currency_patterns = [ + (r'\$|USD|usd|dollar', 'USD'), + (r'€|EUR|eur|euro', 'EUR'), + (r'£|GBP|gbp|pound', 'GBP'), + (r'¥|JPY|jpy|yen', 'JPY'), + (r'₹|INR|inr|rupee', 'INR'), + ] + + # Common patterns for other units - order matters! 
+ unit_patterns = [ + (r'million|millions|mn|mln|\$m|\$M', 'Million'), + (r'billion|billions|bn|bln|\$b|\$B', 'Billion'), + (r'thousand|thousands|k|\$k', 'Thousand'), + (r'percentage|percent|%', '%'), + (r'tonnes|tons|tonne|ton', 'Tonnes'), + (r'co2e|CO2e|co2|CO2', 'CO2e'), + (r'kwh|kWh|KWH', 'kWh'), + (r'mwh|MWh|MWH', 'MWh'), + (r'kg|kilogram|kilograms', 'kg'), + (r'employees|headcount|people', 'Employees'), + (r'days|day', 'Days'), + (r'hours|hour|hrs', 'Hours'), + (r'years|year|yrs', 'Years'), + ] + + # Check title and keys for units - also check values if they're strings + combined_text = title.lower() + " " + " ".join(str(k).lower() for k in chart_data.keys()) + # Also check string values which might contain unit info + for v in chart_data.values(): + if isinstance(v, str): + combined_text += " " + v.lower() + + detected_currency = None + detected_scale = None + detected_unit = None + + # Check for currency + for pattern, unit in currency_patterns: + if re.search(pattern, combined_text, re.IGNORECASE): + detected_currency = unit + break + + # Check for scale (million, billion, etc.) + for pattern, unit in unit_patterns[:4]: # First 4 are scales + if re.search(pattern, combined_text, re.IGNORECASE): + detected_scale = unit + break + + # Check for other units + for pattern, unit in unit_patterns[4:]: # Rest are units + if re.search(pattern, combined_text, re.IGNORECASE): + detected_unit = unit + break + + # Combine detected elements + if detected_currency and detected_scale: + return f"{detected_scale} {detected_currency}" + elif detected_currency: + # If we detect currency but no scale, look for financial context clues + if 'revenue' in combined_text or 'sales' in combined_text or 'income' in combined_text: + # Financial data without explicit scale often means millions + if 'fy' in combined_text or 'fiscal' in combined_text or 'quarterly' in combined_text: + return "Million USD" # Corporate financials are typically in millions + return detected_currency + return detected_currency + elif detected_unit: + if detected_scale and detected_unit not in ['%', 'Employees', 'Days', 'Hours', 'Years']: + return f"{detected_scale} {detected_unit}" + return detected_unit + elif detected_scale: + # If we only have scale (like "Million") without currency, check for financial context + if any(term in combined_text for term in ['revenue', 'cost', 'profit', 'income', 'sales', 'expense', 'financial']): + return f"{detected_scale} USD" + return detected_scale + + # For financial metrics without explicit units, default to "Million USD" + if any(term in combined_text for term in ['revenue', 'sales', 'profit', 'income', 'cost', 'expense', 'financial', 'fiscal', 'fy20']): + return "Million USD" + + return "Value" # Default fallback + + +def format_value_with_units(value: float, units: str) -> str: + """Format a value with appropriate precision based on units.""" + if '%' in units: + return f"{value:.1f}%" + elif 'Million' in units or 'Billion' in units: + return f"{value:,.1f}" + elif value >= 1000: + return f"{value:,.0f}" + else: + return f"{value:.1f}" + + +def make_chart(chart_data: dict, title: str = "", + entities: list[str] | tuple[str, ...] | None = None, + units: str | None = None) -> str | None: + """Generate a chart with conditional formatting and fallback for list values. + If `entities` contains up to two names, bars whose labels include those names + are highlighted in two distinct colors. Otherwise a default color is used. + Units are detected automatically or can be passed explicitly. 
+ """ -def make_chart(chart_data: dict, title: str = "") -> str | None: - """Generate a chart with conditional formatting and fallback for list values.""" - import numpy as np import textwrap os.makedirs("charts", exist_ok=True) clean = {} for k, v in chart_data.items(): - # NEW: Reduce lists to latest entry if all elements are numeric + # Reduce lists to latest numeric entry if isinstance(v, list): if all(isinstance(i, (int, float)) for i in v): - v = v[-1] # use the latest value + v = v[-1] else: continue @@ -78,47 +241,56 @@ def make_chart(chart_data: dict, title: str = "") -> str | None: labels = list(clean.keys()) values = list(clean.values()) + + # Detect units if not provided + if not units: + units = detect_units(chart_data, title) + + # Update title to include units if not already present + if units and units != "Value" and units.lower() not in title.lower(): + title = f"{title} ({units})" - # Decide chart orientation based on label length and count - create more variety + # Decide orientation max_label_length = max(len(label) for label in labels) if labels else 0 - - # More nuanced decision for chart orientation - if len(clean) > 12: # Many items -> horizontal + if len(clean) > 12: horizontal = True - elif max_label_length > 40: # Very long labels -> horizontal + elif max_label_length > 40: horizontal = True - elif len(clean) <= 4 and max_label_length <= 20: # Few items, short labels -> vertical + elif len(clean) <= 4 and max_label_length <= 20: horizontal = False - elif len(clean) <= 6 and max_label_length <= 30: # Medium items, medium labels -> vertical + elif len(clean) <= 6 and max_label_length <= 30: horizontal = False - else: # Default to horizontal for edge cases + else: horizontal = True - fig, ax = plt.subplots(figsize=(12, 8)) # Increased figure size for better readability + fig, ax = plt.subplots(figsize=(12, 8)) if horizontal: - # Wrap long labels for horizontal charts wrapped_labels = ['\n'.join(textwrap.wrap(label, width=40)) for label in labels] - bars = ax.barh(wrapped_labels, values, color=["#2e7d32" if "aelwyn" in l.lower() else "#f9a825" if "elinexa" in l.lower() else "#4472C4" for l in labels]) - ax.set_xlabel("Value") + colors = [_color_for_label(l, entities) for l in labels] + bars = ax.barh(wrapped_labels, values, color=colors) + ax.set_xlabel(units) # Use detected units instead of "Value" ax.set_ylabel("Category") for bar in bars: width = bar.get_width() - ax.annotate(f"{width:.1f}", xy=(width, bar.get_y() + bar.get_height() / 2), xytext=(5, 0), - textcoords="offset points", ha='left', va='center', fontsize=8) + formatted_value = format_value_with_units(width, units) + ax.annotate(formatted_value, xy=(width, bar.get_y() + bar.get_height() / 2), + xytext=(5, 0), textcoords="offset points", + ha='left', va='center', fontsize=8) else: - # Wrap long labels for vertical charts wrapped_labels = ['\n'.join(textwrap.wrap(label, width=15)) for label in labels] - bars = ax.bar(range(len(labels)), values, color=["#2e7d32" if "aelwyn" in l.lower() else "#f9a825" if "elinexa" in l.lower() else "#4472C4" for l in labels]) - ax.set_ylabel("Value") + colors = [_color_for_label(l, entities) for l in labels] + bars = ax.bar(range(len(labels)), values, color=colors) + ax.set_ylabel(units) # Use detected units instead of "Value" ax.set_xlabel("Category") ax.set_xticks(range(len(labels))) ax.set_xticklabels(wrapped_labels, ha='center', va='top') - for bar in bars: height = bar.get_height() - ax.annotate(f"{height:.1f}", xy=(bar.get_x() + bar.get_width() / 2, height), 
xytext=(0, 5), - textcoords="offset points", ha='center', va='bottom', fontsize=8) + formatted_value = format_value_with_units(height, units) + ax.annotate(formatted_value, xy=(bar.get_x() + bar.get_width() / 2, height), + xytext=(0, 5), textcoords="offset points", + ha='center', va='bottom', fontsize=8) ax.set_title(title[:100]) ax.grid(axis="y" if not horizontal else "x", linestyle="--", alpha=0.6) @@ -126,21 +298,18 @@ def make_chart(chart_data: dict, title: str = "") -> str | None: filename = f"chart_{uuid.uuid4().hex}.png" path = os.path.join("charts", filename) - fig.savefig(path, dpi=300, bbox_inches='tight') # Higher DPI and tight bbox for better quality + fig.savefig(path, dpi=300, bbox_inches='tight') plt.close(fig) return path - - def append_to_doc(doc, section_data: dict, level: int = 2, citation_map: dict | None = None): """Append section to document with heading, paragraph, table, chart, and citations.""" heading = section_data.get("heading", "Untitled Section") - # Use the level parameter to control heading hierarchy doc.add_heading(heading, level=level) text = section_data.get("text", "").strip() - + # Add citations to the text if sources are available if text and citation_map and section_data.get("sources"): citation_numbers = [] @@ -148,14 +317,13 @@ def append_to_doc(doc, section_data: dict, level: int = 2, citation_map: dict | source_key = f"{source.get('file', 'Unknown')}_{source.get('sheet', '')}_{source.get('entity', '')}" if source_key in citation_map: citation_numbers.append(citation_map[source_key]) - if citation_numbers: - # Add unique citation numbers at the end of the text unique_citations = sorted(set(citation_numbers)) citations_str = " " + "".join([f"[{num}]" for num in unique_citations]) text = text + citations_str - + if text: + add_inline_markdown_paragraph(doc, text) doc.add_paragraph(text) table_data = section_data.get("table", []) @@ -176,17 +344,23 @@ def append_to_doc(doc, section_data: dict, level: int = 2, citation_map: dict | else: flattened_chart_data[k] = v - chart_path = make_chart(flattened_chart_data, title=heading) + # Pass dynamic entities (if present) so colors match those names + entities = section_data.get("entities") + # Pass units if available in section data + units = section_data.get("units") + chart_path = make_chart(flattened_chart_data, title=heading, entities=entities, units=units) if chart_path: doc.add_picture(chart_path, width=Inches(6)) last_paragraph = doc.paragraphs[-1] last_paragraph.alignment = 1 # center + def save_doc(doc, filename: str = "_report.docx"): """Save the Word document.""" doc.save(filename) logger.info(f"✅ Report saved: {filename}") + class SectionWriterAgent: def __init__(self, llm, tokenizer=None): self.llm = llm @@ -197,34 +371,26 @@ def __init__(self, llm, tokenizer=None): print("⚠️ No tokenizer provided for SectionWriterAgent") def estimate_tokens(self, text: str) -> int: - # naive estimate: 1 token ≈ 4 characters for English-like text return max(1, len(text) // 4) def log_token_count(self, text: str, tokenizer=None, label: str = "Prompt"): if not text: print(f"⚠️ Cannot log tokens: empty text for {label}") return - if tokenizer: token_count = len(tokenizer.encode(text)) else: token_count = self.estimate_tokens(text) - print(f"{label} token count: {token_count}") - - - def write_section(self, section_title: str, context_chunks: list[dict]) -> dict: from collections import defaultdict - # Group chunks by entity and preserve metadata grouped = defaultdict(list) grouped_metadata = defaultdict(list) for 
chunk in context_chunks: entity = chunk.get("_search_entity", "Unknown") grouped[entity].append(chunk.get("content", "")) - # Preserve metadata for citations metadata = chunk.get("metadata", {}) grouped_metadata[entity].append(metadata) @@ -240,12 +406,15 @@ def write_section(self, section_title: str, context_chunks: list[dict]) -> dict: "text": f"Insufficient data for analysis. Entities: {entities}", "table": [], "chart_data": {}, - "sources": [] + "sources": [], + # propagate for downstream report logic + "is_comparison": False, + "entities": entities } def _write_single_entity_section(self, section_title: str, grouped_chunks: dict, entity: str, grouped_metadata: dict | None = None) -> dict: text = "\n\n".join(grouped_chunks[entity]) - + # Extract unique sources from metadata sources = [] if grouped_metadata and entity in grouped_metadata: @@ -260,7 +429,6 @@ def _write_single_entity_section(self, section_title: str, grouped_chunks: dict, }) seen_sources.add(source_key) - # OPTIMIZED: Shorter, more focused prompt for faster processing prompt = f"""Extract key data for {entity} on {section_title}. Return JSON: @@ -269,8 +437,12 @@ def _write_single_entity_section(self, section_title: str, grouped_chunks: dict, Data: {text[:2000]} -CRITICAL: Never use possessive forms (no apostrophes). Instead of "manager's approval" write "manager approval" or "approval from manager". Use "N/A" for missing data. Valid JSON only.""" - +CRITICAL RULES: +1. NEVER use possessive forms or apostrophes (no 's). + - Wrong: "Oracle's revenue", "company's growth" + - Right: "Oracle revenue", "company growth", "revenue of Oracle" +2. Use "N/A" for missing data. +3. Return valid JSON only - no apostrophes in text values.""" try: self.log_token_count(prompt, self.tokenizer, label=f"SingleEntity Prompt ({section_title})") @@ -288,14 +460,16 @@ def _write_single_entity_section(self, section_title: str, grouped_chunks: dict, chart_data = parsed.get("chart_data", {}) if isinstance(chart_data, str): try: - chart_data = ast.literal_eval(chart_data) + import ast as _ast + chart_data = _ast.literal_eval(chart_data) except Exception: chart_data = {} table = parsed.get("table", []) if isinstance(table, str): try: - table = ast.literal_eval(table) + import ast as _ast + table = _ast.literal_eval(table) except Exception: table = [] @@ -304,7 +478,10 @@ def _write_single_entity_section(self, section_title: str, grouped_chunks: dict, "text": parsed.get("text", ""), "table": table, "chart_data": chart_data, - "sources": sources + "sources": sources, + # NEW: carry entity info so charts/titles can highlight correctly + "is_comparison": False, + "entities": [entity] } except Exception as e: @@ -314,7 +491,9 @@ def _write_single_entity_section(self, section_title: str, grouped_chunks: dict, "text": f"Could not generate section due to error: {e}", "table": [], "chart_data": {}, - "sources": sources + "sources": sources, + "is_comparison": False, + "entities": [entity] } def _write_comparison_section(self, section_title: str, grouped_chunks: dict, entities: list[str], grouped_metadata: dict | None = None) -> dict: @@ -328,39 +507,43 @@ def _write_comparison_section(self, section_title: str, grouped_chunks: dict, en text_a = "\n\n".join(grouped_chunks[entity_a]) text_b = "\n\n".join(grouped_chunks[entity_b]) - # Construct prompt prompt = f""" - You are writing a structured section for a comparison report between {entity_a} and {entity_b}. +You are writing a structured section for a comparison report between {entity_a} and {entity_b}. 
- Topic: {section_title} +Topic: {section_title} - OBJECTIVE: - Summarize key data from the context and produce a clear, side-by-side comparison table. +OBJECTIVE: +Summarize key data from the context and produce a clear, side-by-side comparison table. - Always follow this exact structure in your JSON output: - - heading: A short, descriptive title for the section - - text: A 1–2 sentence overview comparing {entity_a} and {entity_b} - - table: List of dicts formatted as: Metric | {entity_a} | {entity_b} | Analysis - - chart_data: A dictionary of comparable numeric values to plot +Always follow this exact structure in your JSON output: +- heading: A short, descriptive title for the section +- text: A 1–2 sentence overview comparing {entity_a} and {entity_b} +- table: List of dicts formatted as: Metric | {entity_a} | {entity_b} | Analysis +- chart_data: A dictionary of comparable numeric values to plot - DATA: - === {entity_a} === - {text_a} +DATA: +=== {entity_a} === +{text_a} - === {entity_b} === - {text_b} +=== {entity_b} === +{text_b} - INSTRUCTIONS: - - Extract specific metrics (numbers, %, dates) from the data - - Use "N/A" if one entity is missing a value - - Use analysis terms like: "Higher", "Lower", "Similar", "{entity_a} Only", "{entity_b} Only" - - Do not echo file names or metadata - - Keep values human-readable (e.g., "18,500 tonnes CO2e") - - CRITICAL: Never use possessive forms (no apostrophes). Instead of "company's target" write "company target" or "target for company". +INSTRUCTIONS: +- Extract specific metrics (numbers, %, dates) from the data +- Use "N/A" if one entity is missing a value +- Use analysis terms like: "Higher", "Lower", "Similar", "{entity_a} only", "{entity_b} only" +- Do not echo file names or metadata +- Keep values human-readable (e.g., "18,500 tonnes CO2e") - Respond only in JSON format. - """ +CRITICAL RULES: +1. NEVER use possessive forms or apostrophes (no 's). + - Wrong: "Oracle's revenue", "company's performance" + - Right: "Oracle revenue", "company performance", "revenue of Oracle" +2. Ensure all JSON is valid - no apostrophes in text values. +3. Use proper escaping if quotes are needed in text. + +Respond only in valid JSON format. 
+""" try: if self.tokenizer: @@ -379,7 +562,6 @@ def _write_comparison_section(self, section_title: str, grouped_chunks: dict, en expected_structure="Object with 'heading', 'text', 'table', and 'chart_data' keys" ) - # Chart data cleanup chart_data = parsed.get("chart_data", {}) if isinstance(chart_data, str): try: @@ -390,7 +572,6 @@ def _write_comparison_section(self, section_title: str, grouped_chunks: dict, en if not isinstance(chart_data, dict): chart_data = {} - # Table cleanup table = parsed.get("table", []) if isinstance(table, str): try: @@ -415,7 +596,6 @@ def _write_comparison_section(self, section_title: str, grouped_chunks: dict, en if validated_row[entity_a] != "N/A" or validated_row[entity_b] != "N/A": validated.append(validated_row) - # Flatten chart_data if nested flat_chart_data = {} for k, v in chart_data.items(): if isinstance(v, dict): @@ -424,7 +604,7 @@ def _write_comparison_section(self, section_title: str, grouped_chunks: dict, en else: flat_chart_data[k] = v - # Extract unique sources from metadata + # Extract unique sources sources = [] if grouped_metadata: seen_sources = set() @@ -445,12 +625,14 @@ def _write_comparison_section(self, section_title: str, grouped_chunks: dict, en "text": parsed.get("text", ""), "table": validated, "chart_data": flat_chart_data, - "sources": sources + "sources": sources, + # NEW: signal comparison + entities for downstream styling and charts + "is_comparison": True, + "entities": [entity_a, entity_b] } except Exception as e: logger.error("⚠️ Failed to write comparison section: %s", e) - # Still try to extract sources sources = [] if grouped_metadata: seen_sources = set() @@ -465,39 +647,36 @@ def _write_comparison_section(self, section_title: str, grouped_chunks: dict, en "entity": entity }) seen_sources.add(source_key) - + return { "heading": section_title, "text": f"Could not generate summary due to error: {e}", "table": [], "chart_data": {}, - "sources": sources + "sources": sources, + "is_comparison": True, + "entities": entities } - class ReportWriterAgent: def __init__(self, doc=None, model_name: str = "unknown", llm=None): - # Don't store the document - create fresh one for each report self.model_name = model_name self.llm = llm # Store LLM for generating summaries def _generate_executive_summary(self, sections: list[dict], is_comparison: bool, entities: list[str], target_language: str = "english", query: str | None = None) -> str: - """Generate an executive summary based on actual section content and user query""" if not self.llm: return self._generate_intro_section(is_comparison, entities) - - # Extract key information from sections + section_summaries = [] for section in sections: heading = section.get("heading", "Unknown Section") text = section.get("text", "") if text: - section_summaries.append(f"**{heading}**: {text}") - + section_summaries.append(f"{heading}: {text}") + sections_text = "\n\n".join(section_summaries) - - # Add language instruction if not English + language_instruction = "" if target_language == "arabic": language_instruction = "\n\nIMPORTANT: Write the entire executive summary in Arabic (العربية). Use professional Arabic business terminology." @@ -505,12 +684,9 @@ def _generate_executive_summary(self, sections: list[dict], is_comparison: bool, language_instruction = "\n\nIMPORTANT: Write the entire executive summary in Spanish. Use professional Spanish business terminology." elif target_language == "french": language_instruction = "\n\nIMPORTANT: Write the entire executive summary in French. 
Use professional French business terminology." - - # Include user query context if available - query_context = "" - if query: - query_context = f"\nUser's Original Request:\n{query}\n" - + + query_context = f"\nUser's Original Request:\n{query}\n" if query else "" + if is_comparison: prompt = f""" You are writing an executive summary for a comparison report between {entities[0]} and {entities[1]}. @@ -523,6 +699,8 @@ def _generate_executive_summary(self, sections: list[dict], is_comparison: bool, Section Summaries: {sections_text} +CRITICAL: Never use possessive forms (no apostrophes). Write "Oracle revenue" not "Oracle's revenue", "company performance" not "company's performance". + Write in a professional, analytical tone. Focus on answering the user's specific request.{language_instruction} """ else: @@ -537,9 +715,11 @@ def _generate_executive_summary(self, sections: list[dict], is_comparison: bool, Section Summaries: {sections_text} +CRITICAL: Never use possessive forms (no apostrophes). Write "Oracle revenue" not "Oracle's revenue", "company performance" not "company's performance". + Write in a professional, analytical tone. Focus on answering the user's specific request.{language_instruction} """ - + try: response = self.llm.invoke([type("Msg", (object,), {"content": prompt})()]).content.strip() return response @@ -548,31 +728,27 @@ def _generate_executive_summary(self, sections: list[dict], is_comparison: bool, return self._generate_intro_section(is_comparison, entities) def _generate_conclusion(self, sections: list[dict], is_comparison: bool, entities: list[str], target_language: str = "english", query: str | None = None) -> str: - """Generate a conclusion based on actual section content and user query""" if not self.llm: return "This analysis provides insights based on available data from retrieved documents." - - # Extract key findings from sections + key_findings = [] for section in sections: heading = section.get("heading", "Unknown Section") text = section.get("text", "") table = section.get("table", []) - - # Extract key metrics from tables + if table and isinstance(table, list): - for row in table[:3]: # Top 3 rows + for row in table[:3]: if isinstance(row, dict): metric = row.get("Metric", "") if metric: key_findings.append(f"{heading}: {metric}") - + if text: key_findings.append(f"{heading}: {text}") - - findings_text = "\n".join(key_findings[:8]) # Limit to prevent token overflow - - # Add language instruction if not English + + findings_text = "\n".join(key_findings[:8]) + language_instruction = "" if target_language == "arabic": language_instruction = "\n\nIMPORTANT: Write the entire conclusion in Arabic (العربية). Use professional Arabic business terminology." @@ -580,12 +756,9 @@ def _generate_conclusion(self, sections: list[dict], is_comparison: bool, entiti language_instruction = "\n\nIMPORTANT: Write the entire conclusion in Spanish. Use professional Spanish business terminology." elif target_language == "french": language_instruction = "\n\nIMPORTANT: Write the entire conclusion in French. Use professional French business terminology." - - # Include user query context if available - query_context = "" - if query: - query_context = f"\nUser's Original Request:\n{query}\n" - + + query_context = f"\nUser's Original Request:\n{query}\n" if query else "" + if is_comparison: prompt = f""" Based on the analysis of {entities[0]} and {entities[1]}, write a conclusion that directly answers the user's request. 
@@ -599,6 +772,8 @@ def _generate_conclusion(self, sections: list[dict], is_comparison: bool, entiti - Provide actionable insights based on their specific needs - Include specific recommendations if appropriate +CRITICAL: Never use possessive forms (no apostrophes). Write "Oracle revenue" not "Oracle's revenue", "company growth" not "company's growth". + Focus on providing value for the user's specific use case.{language_instruction} """ else: @@ -614,97 +789,77 @@ def _generate_conclusion(self, sections: list[dict], is_comparison: bool, entiti - Provide actionable insights based on their specific needs - Include specific recommendations if appropriate +CRITICAL: Never use possessive forms (no apostrophes). Write "Oracle revenue" not "Oracle's revenue", "company growth" not "company's growth". + Focus on providing value for the user's specific use case.{language_instruction} """ - + try: response = self.llm.invoke([type("Msg", (object,), {"content": prompt})()]).content.strip() return response except Exception as e: logger.warning(f"Failed to generate conclusion: {e}") return "This analysis provides insights based on available data from retrieved documents." - + def _filter_failed_sections(self, sections: list[dict]) -> list[dict]: - """Filter out sections that contain error messages or failed processing""" filtered_sections = [] - + error_patterns = [ + "Could not generate", + "due to error:", + "Expecting ',' delimiter:", + "Failed to", + "Error:", + "Exception:", + "Traceback" + ] for section in sections: text = section.get("text", "") heading = section.get("heading", "") - - # Check for common error patterns - error_patterns = [ - "Could not generate", - "due to error:", - "Expecting ',' delimiter:", - "Failed to", - "Error:", - "Exception:", - "Traceback" - ] - - # Check if section contains error messages has_error = any(pattern in text for pattern in error_patterns) - if not has_error: filtered_sections.append(section) else: logger.info(f"🚫 Filtered out failed section: {heading}") - return filtered_sections - + def _apply_document_styling(self, doc): - """Apply professional styling to the document""" from docx.shared import Pt, RGBColor - from docx.enum.text import WD_ALIGN_PARAGRAPH - - # Set default font for the document style = doc.styles['Normal'] font = style.font font.name = 'Times New Roman' font.size = Pt(12) - - # Style headings heading1_style = doc.styles['Heading 1'] heading1_style.font.name = 'Times New Roman' heading1_style.font.size = Pt(18) heading1_style.font.bold = True - heading1_style.font.color.rgb = RGBColor(0x00, 0x00, 0x00) # Black - + heading1_style.font.color.rgb = RGBColor(0x00, 0x00, 0x00) heading2_style = doc.styles['Heading 2'] heading2_style.font.name = 'Times New Roman' heading2_style.font.size = Pt(14) heading2_style.font.bold = True - heading2_style.font.color.rgb = RGBColor(0x00, 0x00, 0x00) # Black - + heading2_style.font.color.rgb = RGBColor(0x00, 0x00, 0x00) + def _generate_report_title(self, is_comparison: bool, entities: list[str], query: str | None, sections: list[dict]) -> str: - """Generate a dynamic, informative report title based on user query""" if query and self.llm: - # Use LLM to generate a more specific title based on the query try: entity_context = f"{entities[0]} vs {entities[1]}" if is_comparison and len(entities) >= 2 else entities[0] if entities else "Organization" - prompt = f"""Generate a concise, professional report title (max 10 words) based on: User Query: {query} Entities: {entity_context} Type: {'Comparison' if 
is_comparison else 'Analysis'} Report +CRITICAL: Never use possessive forms (no apostrophes). Write "Oracle Performance" not "Oracle's Performance". + Return ONLY the title, no quotes or extra text.""" - title = self.llm.invoke([type("Msg", (object,), {"content": prompt})()]).content.strip() - # Clean up the title title = title.replace('"', '').replace("'", '').strip() - # Ensure it's not too long if len(title) > 100: title = title[:97] + "..." return title except Exception as e: logger.warning(f"Failed to generate dynamic title: {e}") - # Fall back to default title generation - - # Default title generation logic + if query: - # Extract key topics from the query query_lower = query.lower() if "esg" in query_lower or "sustainability" in query_lower: topic_type = "ESG & Sustainability" @@ -719,7 +874,6 @@ def _generate_report_title(self, is_comparison: bool, entities: list[str], query else: topic_type = "Business Analysis" else: - # Infer from section headings section_topics = [s.get("heading", "") for s in sections[:3]] if any("climate" in h.lower() or "carbon" in h.lower() for h in section_topics): topic_type = "Climate & Environmental" @@ -727,161 +881,123 @@ def _generate_report_title(self, is_comparison: bool, entities: list[str], query topic_type = "ESG & Sustainability" else: topic_type = "Business Analysis" - + if is_comparison and len(entities) >= 2: return f"{topic_type} Report: {entities[0]} vs {entities[1]}" elif entities: return f"{topic_type} Report: {entities[0]}" else: return f"{topic_type} Report" - + def _add_report_header(self, doc, report_title: str, is_comparison: bool, entities: list[str]): - """Add a professional report header with title, date, and metadata""" from docx.shared import Pt, RGBColor from docx.enum.text import WD_ALIGN_PARAGRAPH - - # Main title + title_paragraph = doc.add_heading(report_title, level=1) title_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER - - # Add subtitle with entity information + if is_comparison and len(entities) >= 2: subtitle = f"Comparative Analysis: {entities[0]} and {entities[1]}" elif entities: subtitle = f"Analysis of {entities[0]}" else: subtitle = "Comprehensive Analysis Report" - + subtitle_paragraph = doc.add_paragraph(subtitle) subtitle_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER subtitle_run = subtitle_paragraph.runs[0] subtitle_run.font.size = Pt(12) subtitle_run.italic = True - - # Add generation date and metadata + now = datetime.datetime.now() date_str = now.strftime("%B %d, %Y") time_str = now.strftime("%H:%M") - - doc.add_paragraph() # spacing - - # Create a professional metadata section + + doc.add_paragraph() metadata_paragraph = doc.add_paragraph() metadata_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER - metadata_text = f"Generated on {date_str} at {time_str}\nPowered by OCI Generative AI" metadata_run = metadata_paragraph.add_run(metadata_text) metadata_run.font.size = Pt(10) - metadata_run.font.color.rgb = RGBColor(0x70, 0x70, 0x70) # Gray color - - # Add separator line + metadata_run.font.color.rgb = RGBColor(0x70, 0x70, 0x70) + doc.add_paragraph() separator = doc.add_paragraph("─" * 50) separator.alignment = WD_ALIGN_PARAGRAPH.CENTER separator_run = separator.runs[0] separator_run.font.color.rgb = RGBColor(0x70, 0x70, 0x70) - - doc.add_paragraph() # spacing after header - + doc.add_paragraph() + def _detect_target_language(self, query: str | None) -> str: - """Detect the target language from the query""" if not query: return "english" - - query_lower = query.lower() - - # Arabic language indicators + q 
= query.lower() arabic_indicators = [ - "بالعربية", "باللغة العربية", "in arabic", "arabic report", "تقرير", + "بالعربية", "باللغة العربية", "in arabic", "arabic report", "تقرير", "تحليل", "باللغة العربيه", "عربي", "arabic language" ] - - # Check for Arabic script arabic_chars = any('\u0600' <= char <= '\u06FF' for char in query) - - # Check for explicit language requests - if any(indicator in query_lower for indicator in arabic_indicators) or arabic_chars: + if any(ind in q for ind in arabic_indicators) or arabic_chars: return "arabic" - - # Add more languages as needed - if "en español" in query_lower or "in spanish" in query_lower: + if "en español" in q or "in spanish" in q: return "spanish" - - if "en français" in query_lower or "in french" in query_lower: + if "en français" in q or "in french" in q: return "french" - return "english" - + def _ensure_language_consistency(self, sections: list[dict], target_language: str, query: str | None) -> list[dict]: - """Ensure all sections are in the target language""" if not self.llm or target_language == "english": return sections - logger.info(f"🔄 Ensuring language consistency for {target_language}") - corrected_sections = [] - for section in sections: corrected_section = section.copy() - - # Check and translate heading if needed heading = section.get("heading", "") + text = section.get("text", "") + table = section.get("table", []) + if heading and not self._is_in_target_language(heading, target_language): corrected_section["heading"] = self._translate_text(heading, target_language, "section heading") - - # Check and translate text if needed - text = section.get("text", "") if text and not self._is_in_target_language(text, target_language): corrected_section["text"] = self._translate_text(text, target_language, "section text") - - # Handle table translations - table = section.get("table", []) + if table and isinstance(table, list): corrected_table = [] for row in table: if isinstance(row, dict): corrected_row = {} for key, value in row.items(): - # Translate table headers and values - translated_key = self._translate_text(str(key), target_language, "table header") if not self._is_in_target_language(str(key), target_language) else str(key) - translated_value = self._translate_text(str(value), target_language, "table value") if not self._is_in_target_language(str(value), target_language) and not str(value).replace('.', '').replace(',', '').isdigit() else str(value) + k = str(key) + v = str(value) + translated_key = self._translate_text(k, target_language, "table header") if not self._is_in_target_language(k, target_language) else k + # keep numeric strings unchanged + if not self._is_in_target_language(v, target_language) and not v.replace('.', '').replace(',', '').isdigit(): + translated_value = self._translate_text(v, target_language, "table value") + else: + translated_value = v corrected_row[translated_key] = translated_value corrected_table.append(corrected_row) corrected_section["table"] = corrected_table - + corrected_sections.append(corrected_section) - return corrected_sections - + def _is_in_target_language(self, text: str, target_language: str) -> bool: - """Check if text is already in the target language""" if not text or target_language == "english": return True - if target_language == "arabic": - # Check if text contains Arabic characters arabic_chars = sum(1 for char in text if '\u0600' <= char <= '\u06FF') total_chars = sum(1 for char in text if char.isalpha()) if total_chars == 0: - return True # No alphabetic characters, 
assume it's fine - return arabic_chars / total_chars > 0.3 # At least 30% Arabic characters - - # Add more language detection logic as needed - return True # Default to assuming it's correct - + return True + return arabic_chars / total_chars > 0.3 + return True + def _translate_text(self, text: str, target_language: str, context: str = "") -> str: - """Translate text to target language using LLM""" if not text or not self.llm: return text - - language_names = { - "arabic": "Arabic", - "spanish": "Spanish", - "french": "French" - } - + language_names = {"arabic": "Arabic", "spanish": "Spanish", "french": "French"} target_lang_name = language_names.get(target_language, target_language.title()) - prompt = f"""Translate the following {context} to {target_lang_name}. Maintain the professional tone and technical accuracy. If it's already in {target_lang_name}, return it unchanged. @@ -889,7 +1005,6 @@ def _translate_text(self, text: str, target_language: str, context: str = "") -> Text to translate: {text} Translation:""" - try: response = self.llm.invoke([type("Msg", (object,), {"content": prompt})()]).content.strip() logger.info(f"Translated {context}: '{text[:50]}...' → '{response[:50]}...'") @@ -897,33 +1012,25 @@ def _translate_text(self, text: str, target_language: str, context: str = "") -> except Exception as e: logger.warning(f"Failed to translate {context}: {e}") return text - + def _generate_intro_section(self, is_comparison: bool, entities: list[str]) -> str: - """Fallback intro section when LLM is not available""" if is_comparison: - comparison_note = ( - f"This report compares data between {entities[0]} and {entities[1]} across key topics." - ) + comparison_note = f"This report compares data between {entities[0]} and {entities[1]} across key topics." else: comparison_note = f"This report presents information for {entities[0]}." - return ( f"{comparison_note} All data is sourced from retrieved documents and structured using LLM-based analysis.\n\n" "The analysis includes tables and charts where possible. Missing data is noted explicitly." ) - + def _organize_sections_with_llm(self, sections: list[dict], query: str | None, entities: list[str]) -> list[dict]: - """Use LLM to intelligently organize sections into a hierarchical structure""" if not query or not self.llm or not sections: return sections - - # Create a list of section titles section_info = [] for i, section in enumerate(sections): section_info.append(f"{i+1}. {section.get('heading', 'Untitled Section')}") - sections_list = "\n".join(section_info) - + prompt = f"""You are organizing sections for a report about {', '.join(entities)}. 
User's Original Request: @@ -942,7 +1049,7 @@ def _organize_sections_with_llm(self, sections: list[dict], query: str | None, e {{ "title": "Main Category Title from User's Request", "level": 1, - "sections": [1, 3, 5] // section numbers that belong under this category + "sections": [1, 3, 5] }}, {{ "title": "Another Main Category", @@ -950,7 +1057,7 @@ def _organize_sections_with_llm(self, sections: list[dict], query: str | None, e "sections": [2, 4, 6] }} ], - "orphan_sections": [7, 8] // sections that don't fit under any main category + "orphan_sections": [7, 8] }} IMPORTANT: @@ -964,39 +1071,29 @@ def _organize_sections_with_llm(self, sections: list[dict], query: str | None, e try: response = self.llm.invoke([type("Msg", (object,), {"content": prompt})()]).content.strip() - - # Clean and parse JSON response - import json - import re - - # Extract JSON from response + import json, re json_match = re.search(r'\{.*\}', response, re.DOTALL) if json_match: json_str = json_match.group() structure = json.loads(json_str) - - # Build organized sections list + organized = [] used_sections = set() - + for category in structure.get("structure", []): - # Add main category as a header-only section organized.append({ "heading": category.get("title", "Category"), "level": 1, "is_category_header": True }) - - # Add sections under this category for section_num in category.get("sections", []): - idx = section_num - 1 # Convert to 0-based index + idx = section_num - 1 if 0 <= idx < len(sections) and idx not in used_sections: section_copy = sections[idx].copy() section_copy["level"] = 2 organized.append(section_copy) used_sections.add(idx) - - # Add orphan sections at the end + for section_num in structure.get("orphan_sections", []): idx = section_num - 1 if 0 <= idx < len(sections) and idx not in used_sections: @@ -1004,33 +1101,23 @@ def _organize_sections_with_llm(self, sections: list[dict], query: str | None, e section_copy["level"] = 2 organized.append(section_copy) used_sections.add(idx) - - # Add any sections not mentioned in the structure + for i, section in enumerate(sections): if i not in used_sections: section_copy = section.copy() section_copy["level"] = 2 organized.append(section_copy) - + return organized - except Exception as e: logger.warning(f"Failed to organize sections with LLM: {e}") - # Return original sections if organization fails - pass - - # Return original sections if LLM organization fails or isn't attempted + return sections - - - + def _build_references_section(self, sections: list[dict]) -> tuple[dict, str]: - """Build a references section from all sources in sections and return citation map""" all_sources = [] citation_map = {} citation_counter = 1 - - # Collect all unique sources seen_sources = set() for section in sections: sources = section.get("sources", []) @@ -1041,137 +1128,108 @@ def _build_references_section(self, sections: list[dict]) -> tuple[dict, str]: citation_map[source_key] = citation_counter citation_counter += 1 seen_sources.add(source_key) - - # Build references text + references_text = [] for i, source in enumerate(all_sources, 1): file_name = source.get("file", "Unknown") sheet = source.get("sheet", "") entity = source.get("entity", "") - if sheet: ref_text = f"[{i}] {file_name}, Sheet: {sheet}" else: ref_text = f"[{i}] {file_name}" - if entity: ref_text += f" ({entity})" - references_text.append(ref_text) - + return citation_map, "\n".join(references_text) - + def write_report(self, sections: list[dict], filter_failures: bool = True, query: str | None = 
None) -> str: if not isinstance(sections, list): raise TypeError("Expected list of sections") - - # Detect requested language from query + target_language = self._detect_target_language(query) logger.info(f"🌐 Detected target language: {target_language}") - - # Filter out failed sections if requested + if filter_failures: sections = self._filter_failed_sections(sections) logger.info(f"📊 After filtering failures: {len(sections)} sections remaining") - - # Validate and fix language consistency across all sections + if target_language != "english": sections = self._ensure_language_consistency(sections, target_language, query) - - # Create a fresh document for each report to prevent accumulation + doc = Document() - - # Apply professional document styling self._apply_document_styling(doc) - - # Create reports directory if it doesn't exist + reports_dir = "reports" os.makedirs(reports_dir, exist_ok=True) - - # Extract metadata from sections - is_comparison = sections[0].get("is_comparison", False) if sections else False - entities = sections[0].get("entities", []) if sections else [] - - # Generate dynamic report title + + # NEW: infer comparison/entity context from first valid section (or defaults) + is_comparison = False + entities: list[str] = [] + for s in sections: + if "entities" in s: + entities = list(s.get("entities") or []) + if "is_comparison" in s: + is_comparison = bool(s.get("is_comparison")) + if entities: + break + report_title = self._generate_report_title(is_comparison, entities, query, sections) - - # Add professional header self._add_report_header(doc, report_title, is_comparison, entities) - # PARALLEL GENERATION of executive summary and conclusion while processing sections - from concurrent.futures import ThreadPoolExecutor, as_completed - - summary_and_conclusion_futures = [] - - if self.llm: # Only if LLM is available for intelligent generation + from concurrent.futures import ThreadPoolExecutor + if self.llm: with ThreadPoolExecutor(max_workers=2) as summary_executor: - # Start executive summary generation in parallel summary_future = summary_executor.submit( self._generate_executive_summary, sections, is_comparison, entities, target_language, query ) - summary_and_conclusion_futures.append(("summary", summary_future)) - - # Start conclusion generation in parallel conclusion_future = summary_executor.submit( self._generate_conclusion, sections, is_comparison, entities, target_language, query ) - summary_and_conclusion_futures.append(("conclusion", conclusion_future)) - - # Add executive summary + doc.add_heading("Executive Summary", level=2) - executive_summary = summary_future.result() # Wait for completion + executive_summary = summary_future.result() + add_inline_markdown_paragraph(doc, executive_summary) doc.add_paragraph(executive_summary) - doc.add_paragraph() # spacing + doc.add_paragraph() - # Organize sections hierarchically using LLM organized_sections = self._organize_sections_with_llm(sections, query, entities) - - # Build citation map before adding sections citation_map, references_text = self._build_references_section(organized_sections) - - # Add organized sections with citations + for section in organized_sections: if section.get("is_category_header"): - # This is a main category header doc.add_heading(section.get("heading", "Category"), level=1) else: - # Regular section with appropriate level and citations level = section.get("level", 2) append_to_doc(doc, section, level=level, citation_map=citation_map) - doc.add_paragraph() # spacing between sections 
+ doc.add_paragraph() - # Add conclusion doc.add_heading("Conclusion", level=2) - conclusion = conclusion_future.result() # Wait for completion + conclusion = conclusion_future.result() + add_inline_markdown_paragraph(doc, conclusion) doc.add_paragraph(conclusion) - - # Add References section (already built above) + if references_text: - doc.add_paragraph() # spacing + doc.add_paragraph() doc.add_heading("References", level=2) doc.add_paragraph(references_text) else: - # Fallback for when no LLM is available doc.add_heading("Executive Summary", level=2) executive_summary = self._generate_intro_section(is_comparison, entities) doc.add_paragraph(executive_summary) - doc.add_paragraph() # spacing + doc.add_paragraph() - # Build citation map citation_map, references_text = self._build_references_section(sections) - - # Add all sections with citations (no LLM available for organization) for section in sections: append_to_doc(doc, section, level=2, citation_map=citation_map) - doc.add_paragraph() # spacing between sections + doc.add_paragraph() doc.add_heading("Conclusion", level=2) conclusion = "This analysis provides insights based on available data from retrieved documents." doc.add_paragraph(conclusion) - - # Add References section (already built above) if references_text: - doc.add_paragraph() # spacing + doc.add_paragraph() doc.add_heading("References", level=2) doc.add_paragraph(references_text) @@ -1181,15 +1239,19 @@ def write_report(self, sections: list[dict], filter_failures: bool = True, query save_doc(doc, filepath) return filepath + # Example usage if __name__ == "__main__": doc = Document() sample_section = { "heading": "Climate Commitments", - "text": "Both Elinexa and Aelwyn have committed to net-zero targets...", - "table": [{"Bank": "Elinexa", "Target": "Net-zero 2050"}, - {"Bank": "Aelwyn", "Target": "Net-zero 2050"}], - "chart_data": {"Elinexa": 42, "Aelwyn": 36} + "text": "Both Acme Bank and Globex Bank have committed to net-zero targets...", + "table": [{"Bank": "Acme Bank", "Target": "Net-zero 2050"}, + {"Bank": "Globex Bank", "Target": "Net-zero 2050"}], + "chart_data": {"Acme Bank": 42, "Globex Bank": 36}, + # NEW: tell the pipeline which two entities are being compared + "entities": ["Acme Bank", "Globex Bank"], + "is_comparison": True } agent = ReportWriterAgent(doc) agent.write_report([sample_section]) diff --git a/ai/generative-ai-service/complex-document-rag/files/gradio.css b/ai/generative-ai-service/complex-document-rag/files/gradio.css index 9d847f9c6..3296d7199 100644 --- a/ai/generative-ai-service/complex-document-rag/files/gradio.css +++ b/ai/generative-ai-service/complex-document-rag/files/gradio.css @@ -1,12 +1,15 @@ /* ===== CLEAN LIGHT THEME ===== */ :root { - --primary-color: #ff6b35; - --secondary-color: #6c757d; - --background-color: #ffffff; - --surface-color: #ffffff; + --primary-color: #c74634; + --oracle-red: #c74634; + --secondary-color: #6f757e; + --background-color: #fffefe; + --surface-color: #fffefe; + --off-white: #fffefe; --border-color: #dee2e6; - --text-color: #212529; - --text-muted: #6c757d; + --text-color: #312d2a; + --text-muted: #6f7572; + --dark-grey: #404040; } /* ===== GLOBAL STYLING ===== */ @@ -36,7 +39,7 @@ /* ===== BUTTONS ===== */ .gr-button, button, .primary-button, .secondary-button { - background: white !important; + background: var(--off-white) !important; color: var(--primary-color) !important; border: 1px solid var(--primary-color) !important; padding: 10px 20px !important; @@ -46,43 +49,95 @@ letter-spacing: 0.5px 
!important; cursor: pointer !important; font-size: 12px !important; - transition: color 0.2s ease !important; + transition: background-color 0.2s ease, color 0.2s ease !important; } .gr-button:hover, button:hover, .primary-button:hover, .secondary-button:hover { background: #f8f8f8 !important; color: var(--primary-color) !important; + padding: 10px 20px !important; /* Keep same padding to prevent jumpy behavior */ } .gr-button:active, button:active, .primary-button:active, .secondary-button:active { background: #f0f0f0 !important; color: var(--primary-color) !important; + padding: 10px 20px !important; /* Keep same padding to prevent jumpy behavior */ } /* ===== TABS ===== */ -.gr-tabs .gr-tab-nav button { - background: #6c757d !important; - color: white !important; +/* Target all possible tab button selectors for Gradio */ +.gr-tabs .tab-nav button, +.gr-tabs .gr-tab-nav button, +div[role="tablist"] button, +button[role="tab"], +.gradio-container .gr-tabs button[role="tab"], +.gradio-container button.tab-nav-button { + background: #c74634 !important; + background-color: #c74634 !important; + color: #fffefe !important; border: none !important; + border-bottom: 3px solid transparent !important; /* Remove orange underline */ padding: 12px 20px !important; font-weight: 500 !important; text-transform: uppercase !important; letter-spacing: 0.5px !important; border-radius: 4px 4px 0 0 !important; margin-right: 2px !important; -} - -.gr-tabs .gr-tab-nav button.selected { - background: #495057 !important; -} - -.gr-tabs .gr-tab-nav button:hover { - background: #5a6268 !important; + transition: background-color 0.3s ease, border-bottom 0.3s ease !important; + opacity: 0.8 !important; +} + +/* Selected/Active tab with black underline */ +.gr-tabs .tab-nav button.selected, +.gr-tabs .gr-tab-nav button.selected, +div[role="tablist"] button.selected, +button[role="tab"][aria-selected="true"], +button[role="tab"].selected, +.gradio-container .gr-tabs button[role="tab"].selected, +.gradio-container button.tab-nav-button.selected { + background: #c74634 !important; + background-color: #c74634 !important; + opacity: 1 !important; + color: #fffefe !important; + font-weight: 500 !important; /* Keep same weight as non-selected to prevent jumpy behavior */ + border-bottom: 3px solid #312d2a !important; /* Black underline for active tab */ + padding: 12px 20px !important; /* Keep same padding */ +} + +/* Hover state for non-selected tabs */ +.gr-tabs .tab-nav button:hover:not(.selected), +.gr-tabs .gr-tab-nav button:hover:not(.selected), +div[role="tablist"] button:hover:not(.selected), +button[role="tab"]:hover:not([aria-selected="true"]), +button[role="tab"]:hover:not(.selected), +.gradio-container .gr-tabs button[role="tab"]:hover:not(.selected), +.gradio-container button.tab-nav-button:hover:not(.selected) { + background: #404040 !important; + background-color: #404040 !important; + color: #fffefe !important; + opacity: 1 !important; + padding: 12px 20px !important; /* Keep same padding */ +} + +/* Additional override for any nested spans or text elements in tabs */ +.gr-tabs button span, +button[role="tab"] span, +.gr-tabs button *, +button[role="tab"] * { + color: inherit !important; +} + +/* Remove any orange borders/underlines that might appear */ +button[role="tab"]::after, +button[role="tab"]::before, +.gr-tabs button::after, +.gr-tabs button::before { + display: none !important; } /* ===== COMPACT UPLOAD SECTIONS ===== */ .upload-section { - background: white !important; + background: 
var(--off-white) !important; border: 1px solid var(--border-color) !important; border-radius: 8px !important; padding: 12px !important; @@ -113,13 +168,13 @@ margin: 8px 0 !important; display: block !important; padding: 12px !important; - background: white !important; + background: var(--off-white) !important; border: 1px solid var(--primary-color) !important; } /* ===== INFERENCE LAYOUT ===== */ .inference-left-column, .inference-right-column { - background: white !important; + background: var(--off-white) !important; padding: 20px !important; } @@ -127,30 +182,37 @@ margin-bottom: 16px !important; } -.model-controls, .collection-controls { - background: white !important; +/* Make control sections more compact */ +.model-controls, .collection-controls, .processing-controls { + background: var(--off-white) !important; border: 1px solid var(--border-color) !important; border-radius: 6px !important; - padding: 12px !important; - margin-bottom: 12px !important; + padding: 8px !important; /* Reduced padding for compactness */ + margin-bottom: 8px !important; /* Reduced margin */ } .processing-controls { - background: white !important; border: 1px solid var(--primary-color) !important; - border-radius: 6px !important; - padding: 12px !important; - margin-bottom: 12px !important; } -.compact-query textarea { - min-height: 120px !important; - max-height: 150px !important; +/* Compact headers in control sections */ +.model-controls h4, +.collection-controls h4, +.processing-controls h4 { + font-size: 12px !important; + margin-bottom: 4px !important; +} + +/* Make query textarea much larger */ +.compact-query textarea, +.query-section textarea { + min-height: 360px !important; /* 3x larger than before */ + max-height: 450px !important; } /* ===== INPUT FIELDS ===== */ .gr-textbox, .gr-textbox textarea, .gr-textbox input { - background: white !important; + background: var(--off-white) !important; border: 1px solid var(--border-color) !important; border-radius: 4px !important; color: var(--text-color) !important; @@ -159,15 +221,22 @@ /* ===== DROPDOWNS ===== */ .gr-dropdown, .gr-dropdown select { - background: white !important; + background: var(--off-white) !important; border: 1px solid var(--border-color) !important; border-radius: 4px !important; color: var(--text-color) !important; } +/* Make dropdowns more compact */ +.model-controls .gr-dropdown, +.collection-controls .gr-dropdown { + padding: 6px !important; + font-size: 13px !important; +} + /* ===== FILE UPLOAD ===== */ .gr-file { - background: white !important; + background: var(--off-white) !important; border: 2px dashed var(--primary-color) !important; border-radius: 8px !important; padding: 20px !important; @@ -212,17 +281,68 @@ display: none !important; } -/* ===== FORCE WHITE BACKGROUNDS ===== */ +/* ===== FORCE OFF-WHITE BACKGROUNDS ===== */ .gr-group, .gr-form, .gr-block { - background: white !important; + background: var(--off-white) !important; } /* ===== DELETE BUTTON ===== */ .gr-button[variant="stop"] { - background: #dc3545 !important; - color: white !important; + background: var(--oracle-red) !important; + color: var(--off-white) !important; + border: 1px solid var(--oracle-red) !important; } .gr-button[variant="stop"]:hover { - background: #c82333 !important; + background: #a13527 !important; + border: 1px solid #a13527 !important; +} + +/* ===== CHECKBOXES - MORE COMPACT ===== */ +.gr-checkbox-group { + display: flex !important; + gap: 12px !important; + flex-wrap: wrap !important; +} + +.gr-checkbox-group label { + 
font-size: 13px !important; + margin-bottom: 0 !important; +} + +/* ===== COMPACT SETTINGS SECTION ===== */ +.compact-settings { + background: var(--off-white) !important; + border: 1px solid var(--border-color) !important; + border-radius: 6px !important; + padding: 8px !important; + margin-top: 8px !important; +} + +.compact-settings .gr-row { + margin-bottom: 4px !important; +} + +.compact-settings .gr-dropdown { + margin-bottom: 4px !important; +} + +.compact-settings .gr-dropdown label { + font-size: 12px !important; + margin-bottom: 2px !important; +} + +.compact-settings .gr-checkbox { + margin: 0 !important; + padding: 4px !important; +} + +.compact-settings .gr-checkbox label { + font-size: 12px !important; + margin: 0 !important; +} + +/* Remove extra spacing in compact settings */ +.compact-settings > div { + gap: 4px !important; } diff --git a/ai/generative-ai-service/complex-document-rag/files/gradio_app.py b/ai/generative-ai-service/complex-document-rag/files/gradio_app.py index e41b98ebe..9e52af416 100644 --- a/ai/generative-ai-service/complex-document-rag/files/gradio_app.py +++ b/ai/generative-ai-service/complex-document-rag/files/gradio_app.py @@ -1,6 +1,9 @@ #!/usr/bin/env python3 """Oracle Enterprise RAG System Interface.""" +# Disable telemetry first to prevent startup errors +import disable_telemetry + import gradio as gr import logging import os @@ -82,7 +85,7 @@ def __init__(self) -> None: self._initialize_vector_store(self.current_embedding_model) self._initialize_rag_agent( self.current_llm_model, - collection="Multi-Collection", + collection="multi", embedding_model=self.current_embedding_model ) @@ -257,7 +260,7 @@ def _initialize_processors(self) -> Tuple[Optional[XLSXIngester], Optional[PDFIn def _initialize_rag_agent( self, llm_model: str, - collection: str = "Multi-Collection", + collection: str = "multi", embedding_model: Optional[str] = None ) -> bool: """ @@ -265,7 +268,7 @@ def _initialize_rag_agent( Args: llm_model: Name of the LLM model to use - collection: Name of the collection to use (default: "Multi-Collection") + collection: Name of the collection to use (default: "multi") embedding_model: Optional embedding model to switch to Returns: @@ -483,72 +486,30 @@ def create_oracle_interface(): placeholder="Deletion results will appear here..." ) - with gr.Tab("SEARCH COLLECTIONS", id="search"): - gr.Markdown("### Search through your vector store collections") - - # Add embedding model selector for search tab - with gr.Row(): - embedding_model_selector_search = gr.Dropdown( - choices=rag_system.available_embedding_models, - value=rag_system.current_embedding_model, - label="Embedding Model for Search", - info="Select the embedding model to use for searching" - ) - - with gr.Row(): - search_query = gr.Textbox( - label="Search Query", - placeholder="Enter search terms...", - scale=3 - ) - search_collection = gr.Dropdown( - choices=["PDF Documents", "XLSX Documents"], - value="XLSX Documents", - label="Collection", - scale=1 - ) - search_results_count = gr.Slider( - minimum=1, - maximum=20, - value=5, - step=1, - label="Results", - scale=1 - ) - - search_btn = gr.Button("Search", variant="secondary", elem_classes=["secondary-button"]) - - search_results = gr.Textbox( - elem_id="scientific-results-box", - label="Search Results", - lines=25, - max_lines=30, - placeholder="Search results will appear here..." 
- ) - with gr.Tab("INFERENCE & QUERY", id="inference"): with gr.Row(): - # Left Column - Input Controls + # Left Column - Query Input with gr.Column(scale=1, elem_classes=["inference-left-column"]): - # Query Section + # Large Query Section with gr.Group(elem_classes=["query-section"]): query_input = gr.Textbox( label="Query", - lines=4, - max_lines=6, + lines=15, # Much larger query area + max_lines=20, placeholder="Enter your query here...", elem_classes=["compact-query"] ) + query_btn = gr.Button( "Run Query", elem_classes=["primary-button"], - size="sm", + size="lg", elem_id="run-query-btn" ) - # Model Configuration - with gr.Group(elem_classes=["model-controls"]): - gr.HTML("

<h4>Model Configuration</h4>

") + # Compact Configuration Section - All in one group + with gr.Group(elem_classes=["compact-settings"]): + # Model Configuration in one row with gr.Row(): llm_model_selector = gr.Dropdown( choices=rag_system.available_llm_models, @@ -559,26 +520,16 @@ def create_oracle_interface(): embedding_model_selector_query = gr.Dropdown( choices=rag_system.available_embedding_models, value=rag_system.current_embedding_model, - label="Embedding Model", + label="Embeddings", interactive=True, scale=1 ) - - # Data Sources - with gr.Group(elem_classes=["collection-controls"]): - gr.HTML("

<h4>Data Sources</h4>

") + + # Data Sources and Processing Mode in one compact row with gr.Row(): - collection_pdf = gr.Checkbox(label="Include PDF Collection", value=False) - collection_xlsx = gr.Checkbox(label="Include XLSX Collection", value=False) - - # Processing Mode - with gr.Group(elem_classes=["processing-controls"]): - gr.HTML("

<h4>Processing Mode</h4>

") - agent_mode = gr.Checkbox( - label="Use Agentic Workflow", - value=False, - info="Enable advanced reasoning and multi-step processing" - ) + collection_pdf = gr.Checkbox(label="Include PDF", value=False, scale=1) + collection_xlsx = gr.Checkbox(label="Include XLSX", value=False, scale=1) + agent_mode = gr.Checkbox(label="Agentic Mode", value=False, scale=1) # Right Column - Results with gr.Column(scale=1, elem_classes=["inference-right-column"]): @@ -636,11 +587,11 @@ def process_pdf_and_clear(file, model, entity): outputs=[collection_documents] ) - search_btn.click( - fn=lambda q, coll, emb, n: search_chunks(q, coll, emb, rag_system, n), - inputs=[search_query, search_collection, embedding_model_selector_search, search_results_count], - outputs=[search_results] - ) + # search_btn.click( + # fn=lambda q, coll, emb, n: search_chunks(q, coll, emb, rag_system, n), + # inputs=[search_query, search_collection, embedding_model_selector_search, search_results_count], + # outputs=[search_results] + # ) list_chunks_btn.click( fn=lambda coll, emb: list_all_chunks(coll, emb, rag_system), @@ -697,8 +648,11 @@ def handle_query_with_download(query, llm_model, embedding_model, include_pdf, i gr.update(visible=False) ) - # Actually process the query - response, report_path = process_query(query, llm_model, embedding_model, include_pdf, include_xlsx, agentic, rag_system) + # Actually process the query with entity parameters + # Pass empty strings for entities to trigger automatic detection + entity1 = "" # Will be automatically detected by the LLM + entity2 = "" # Will be automatically detected by the LLM + response, report_path = process_query(query, llm_model, embedding_model, include_pdf, include_xlsx, agentic, rag_system, entity1, entity2) progress(1.0, desc="Complete!") @@ -720,6 +674,7 @@ def handle_query_with_download(query, llm_model, embedding_model, include_pdf, i query_btn.click( fn=handle_query_with_download, + # inputs=[query_input, llm_model_selector, embedding_model_selector_query, collection_pdf, collection_xlsx, agent_mode, entity1_input, entity2_input], inputs=[query_input, llm_model_selector, embedding_model_selector_query, collection_pdf, collection_xlsx, agent_mode], outputs=[status_box, response_box, download_file], show_progress="full" diff --git a/ai/generative-ai-service/complex-document-rag/files/handlers/pdf_handler.py b/ai/generative-ai-service/complex-document-rag/files/handlers/pdf_handler.py index 6fb4caeed..b4889de09 100644 --- a/ai/generative-ai-service/complex-document-rag/files/handlers/pdf_handler.py +++ b/ai/generative-ai-service/complex-document-rag/files/handlers/pdf_handler.py @@ -53,8 +53,8 @@ def progress(*args, **kwargs): return "❌ ERROR: Vector store not initialized", "" file_path = Path(file.name) - chunks, doc_id = rag_system.pdf_processor.process_pdf(file_path, entity=entity) - + chunks, doc_id, _ = rag_system.pdf_processor.ingest_pdf(file_path, entity=entity) + print("PDF processor type:", type(rag_system.pdf_processor)) progress(0.7, desc="Adding to vector store...") converted_chunks = [ diff --git a/ai/generative-ai-service/complex-document-rag/files/handlers/query_handler.py b/ai/generative-ai-service/complex-document-rag/files/handlers/query_handler.py index f16a174d8..b9b5f9abf 100644 --- a/ai/generative-ai-service/complex-document-rag/files/handlers/query_handler.py +++ b/ai/generative-ai-service/complex-document-rag/files/handlers/query_handler.py @@ -20,9 +20,11 @@ def process_query( include_xlsx: bool, agentic: bool, rag_system, + entity1: str = 
"", + entity2: str = "", progress=gr.Progress() ) -> Tuple[str, Optional[str]]: - """Process a query using the RAG system""" + """Process a query using the RAG system with optional entity specification""" if not query.strip(): return "ERROR: Please enter a query", None @@ -108,11 +110,24 @@ def _safe_query(collection_label: str, text: str, n: int = 5): progress(0.8, desc="Generating response...") + # Prepare provided entities if any + provided_entities = [] + if entity1 and entity1.strip(): + provided_entities.append(entity1.strip().lower()) + if entity2 and entity2.strip(): + provided_entities.append(entity2.strip().lower()) + + # Log entities being used + if provided_entities: + logger.info(f"Using provided entities: {provided_entities}") + if all_results: + # Pass provided entities to the RAG system result = rag_system.rag_agent.process_query_with_multi_collection_context( query, all_results, - collection_mode=active_collection + collection_mode=active_collection, + provided_entities=provided_entities if provided_entities else None ) # Ensure result is a dictionary if not isinstance(result, dict): @@ -140,12 +155,12 @@ def query_collection(collection_type): """Query a single collection in parallel""" try: if collection_type == "pdf": - # Increased to 20 chunks for non-agentic workflows - results = _safe_query("pdf", query, n=20) + # Optimized to 10 chunks for faster processing + results = _safe_query("pdf", query, n=10) return ("PDF", results if results else []) elif collection_type == "xlsx": - # Increased to 20 chunks for non-agentic workflows - results = _safe_query("xlsx", query, n=20) + # Optimized to 10 chunks for faster processing + results = _safe_query("xlsx", query, n=10) return ("XLSX", results if results else []) else: return (collection_type.upper(), []) @@ -178,8 +193,11 @@ def query_collection(collection_type): return "No relevant information found in selected collections.", None # Use more chunks for better context in non-agentic mode - # Take top 20 chunks total (or all if less than 20) - chunks_to_use = retrieved_chunks[:20] + # Optimize chunk usage based on model + if llm_model == "grok-4": + chunks_to_use = retrieved_chunks[:15] # Can handle more context + else: + chunks_to_use = retrieved_chunks[:10] # Optimized for speed context_str = "\n\n".join(chunk["content"] for chunk in chunks_to_use) prompt = f"""You are an expert assistant. 
diff --git a/ai/generative-ai-service/complex-document-rag/files/handlers/vector_handler.py b/ai/generative-ai-service/complex-document-rag/files/handlers/vector_handler.py index 782700a0a..f37cfada1 100644 --- a/ai/generative-ai-service/complex-document-rag/files/handlers/vector_handler.py +++ b/ai/generative-ai-service/complex-document-rag/files/handlers/vector_handler.py @@ -319,8 +319,13 @@ def delete_all_chunks_in_collection(collection_name: str, embedding_model: str, client = rag_system.vector_store.client all_colls = client.list_collections() - # Find all physical collections for this logical group (e.g., xlsx_documents_*) - targets = [c for c in all_colls if c.name.startswith(f"{base_prefix}_")] + # Find all physical collections for this logical group (e.g., xlsx_documents_* or pdf_documents_*) + targets = [] + for c in all_colls: + # Handle both collection objects and dict representations + coll_name = getattr(c, 'name', None) or (c.get('name') if isinstance(c, dict) else str(c)) + if coll_name and coll_name.startswith(f"{base_prefix}_"): + targets.append((coll_name, c)) if not targets: return f"Collection group '{collection_name}' has no collections to delete." @@ -328,21 +333,31 @@ def delete_all_chunks_in_collection(collection_name: str, embedding_model: str, # Delete them all total_deleted_chunks = 0 deleted_names = [] - for coll in targets: + for coll_name, coll_obj in targets: try: count = 0 try: - count = coll.count() + # Get the actual collection object if we only have the name + if isinstance(coll_obj, str): + actual_coll = client.get_collection(coll_name) + else: + actual_coll = coll_obj + count = actual_coll.count() except Exception: pass + total_deleted_chunks += count - client.delete_collection(coll.name) - deleted_names.append(coll.name) - # Also drop from in-memory map if present + client.delete_collection(coll_name) + deleted_names.append(coll_name) + + # Clean up all in-memory references if hasattr(rag_system.vector_store, "collections"): - rag_system.vector_store.collections.pop(coll.name, None) + rag_system.vector_store.collections.pop(coll_name, None) + if hasattr(rag_system.vector_store, "collection_map"): + rag_system.vector_store.collection_map.pop(coll_name, None) + except Exception as e: - logging.error(f"Failed to delete collection '{coll.name}': {e}") + logging.error(f"Failed to delete collection '{coll_name}': {e}") # Recreate the CURRENT model's empty collection so the app keeps a live handle # Build full name like: {base_prefix}_{model_name}_{dimensions} @@ -357,16 +372,28 @@ def delete_all_chunks_in_collection(collection_name: str, embedding_model: str, new_full_name = f"{base_prefix}_{model_name}_{dims}" new_collection = client.get_or_create_collection(name=new_full_name, metadata=metadata) - # Refresh vector_store references for this base prefix + # Refresh ALL vector_store references comprehensively if hasattr(rag_system.vector_store, "collections"): rag_system.vector_store.collections[new_full_name] = new_collection - # Also store under the base key for compatibility with older code paths - rag_system.vector_store.collections[base_prefix] = new_collection - + + if hasattr(rag_system.vector_store, "collection_map"): + rag_system.vector_store.collection_map[new_full_name] = new_collection + # Also ensure the collection_map is properly updated + rag_system.vector_store.collection_map = { + k: v for k, v in rag_system.vector_store.collection_map.items() + if not k.startswith(f"{base_prefix}_") or k == new_full_name + } + 
rag_system.vector_store.collection_map[new_full_name] = new_collection + + # Update the specific collection references if base_prefix == "xlsx_documents": rag_system.vector_store.xlsx_collection = new_collection + if hasattr(rag_system.vector_store, "current_xlsx_collection_name"): + rag_system.vector_store.current_xlsx_collection_name = new_full_name elif base_prefix == "pdf_documents": rag_system.vector_store.pdf_collection = new_collection + if hasattr(rag_system.vector_store, "current_pdf_collection_name"): + rag_system.vector_store.current_pdf_collection_name = new_full_name # Nice summary deleted_list = "\n".join(f" • {name}" for name in deleted_names) if deleted_names else " • (none)" @@ -374,7 +401,7 @@ def delete_all_chunks_in_collection(collection_name: str, embedding_model: str, "✅ DELETION COMPLETED\n\n" f"Logical collection: {collection_name}\n" f"Collections removed: {len(deleted_names)}\n" - f"Total chunks deleted (best-effort): {total_deleted_chunks}\n" + f"Total chunks deleted: {total_deleted_chunks}\n" f"Deleted collections:\n{deleted_list}\n\n" "Recreated empty collection for current model:\n" f" • {new_full_name}\n" diff --git a/ai/generative-ai-service/complex-document-rag/files/handlers/xlsx_handler.py b/ai/generative-ai-service/complex-document-rag/files/handlers/xlsx_handler.py index 3c4ca27c0..95a118aaf 100644 --- a/ai/generative-ai-service/complex-document-rag/files/handlers/xlsx_handler.py +++ b/ai/generative-ai-service/complex-document-rag/files/handlers/xlsx_handler.py @@ -56,7 +56,15 @@ def progress(*args, **kwargs): return "❌ ERROR: Vector store not initialized", "" file_path = Path(file.name) - chunks, doc_id = rag_system.xlsx_processor.ingest_xlsx(file_path, entity=entity) + # Now returns 3 values: chunks, doc_id, and chunks_to_delete + result = rag_system.xlsx_processor.ingest_xlsx(file_path, entity=entity) + + # Handle both old (2-tuple) and new (3-tuple) return formats + if len(result) == 3: + chunks, doc_id, chunks_to_delete = result + else: + chunks, doc_id = result + chunks_to_delete = [] progress(0.7, desc="Adding to vector store...") @@ -69,6 +77,17 @@ def progress(*args, **kwargs): for chunk in chunks ] + # Delete original chunks FIRST if they were rewritten + if chunks_to_delete and hasattr(rag_system.vector_store, 'delete_chunks'): + progress(0.7, desc="Removing original chunks that were rewritten...") + try: + rag_system.vector_store.delete_chunks('xlsx_documents', chunks_to_delete) + logger.info(f"Deleted {len(chunks_to_delete)} original chunks that were rewritten") + except Exception as e: + logger.warning(f"Could not delete original chunks: {e}") + + # THEN add the new rewritten chunks to vector store + progress(0.8, desc="Adding rewritten chunks to vector store...") rag_system.vector_store.add_xlsx_chunks(converted_chunks, doc_id) progress(1.0, desc="Complete!") @@ -78,6 +97,9 @@ def progress(*args, **kwargs): actual_collection_name = rag_system.vector_store.xlsx_collection.name collection_name = f"{actual_collection_name} ({embedding_model})" + + # Count rewritten chunks + rewritten_count = sum(1 for chunk in chunks if chunk.get('metadata', {}).get('rewritten', False)) summary = f""" ✅ **XLSX PROCESSING COMPLETE** @@ -86,6 +108,7 @@ def progress(*args, **kwargs): **Document ID:** {doc_id} **Entity:** {entity} **Chunks created:** {len(chunks)} +**Chunks with rewritten content:** {rewritten_count} **Embedding model:** {embedding_model} **Collection:** {collection_name} @@ -123,4 +146,3 @@ def progress(*args, **kwargs): error_msg = f"❌ ERROR: 
Processing XLSX file failed: {str(e)}" logger.error(f"{error_msg}\n{traceback.format_exc()}") return error_msg, traceback.format_exc() - diff --git a/ai/generative-ai-service/complex-document-rag/files/images/screenshot1.png b/ai/generative-ai-service/complex-document-rag/files/images/screenshot1.png new file mode 100644 index 000000000..b93876012 Binary files /dev/null and b/ai/generative-ai-service/complex-document-rag/files/images/screenshot1.png differ diff --git a/ai/generative-ai-service/complex-document-rag/files/ingest_pdf.py b/ai/generative-ai-service/complex-document-rag/files/ingest_pdf.py index 83110ebd5..f2542c54c 100644 --- a/ai/generative-ai-service/complex-document-rag/files/ingest_pdf.py +++ b/ai/generative-ai-service/complex-document-rag/files/ingest_pdf.py @@ -1,159 +1,355 @@ +# pdf_ingester_v2.py +import logging, time, uuid, re, os from pathlib import Path from typing import List, Dict, Any, Optional, Tuple -import uuid -import time -import re import tiktoken +import pandas as pd + +# Hard deps you likely already have: import pdfplumber -import logging + +# Optional but recommended for tables +try: + import camelot + _HAS_CAMELOT = True +except Exception: + _HAS_CAMELOT = False + +# Optional for embedded files +try: + from pypdf import PdfReader + _HAS_PYPDF = True +except Exception: + _HAS_PYPDF = False logger = logging.getLogger(__name__) class PDFIngester: - def __init__(self, tokenizer: str = "BAAI/bge-small-en-v1.5", chunk_rewriter=None): + """ + PDF -> chunks with consistent semantics to XLSXIngester. + Strategy: + 1) Detect embedded spreadsheets -> delegate to XLSXIngester + 2) Try Camelot (lattice->stream) for vector tables + 3) Fallback to pdfplumber tables + 4) Extract remaining prose blocks + 5) Batch + select + batch-rewrite (same as XLSX flow) + """ + + def __init__(self, tokenizer: str = "BAAI/bge-small-en-v1.5", + chunk_rewriter=None, + batch_size: int = 16): + self.tokenizer_name = tokenizer self.chunk_rewriter = chunk_rewriter + self.batch_size = batch_size self.accurate_tokenizer = tiktoken.get_encoding("cl100k_base") - self.tokenizer_name = tokenizer self.stats = { 'total_chunks': 0, 'rewritten_chunks': 0, - 'processing_time': 0, - 'rewriting_time': 0 + 'high_value_chunks': 0, + 'processing_time': 0.0, + 'extraction_time': 0.0, + 'rewriting_time': 0.0, + 'selection_time': 0.0 } - logger.info("📄 PDF processor initialized") - + # ---------- Utility parity with XLSX ---------- def _count_tokens(self, text: str) -> int: if not text or not text.strip(): return 0 return len(self.accurate_tokenizer.encode(text)) - def _should_rewrite(self, text: str) -> bool: - if not text.strip() or self._count_tokens(text) < 120: - return False + def _is_high_value_chunk(self, text: str, metadata: Dict[str, Any]) -> int: + # Same heuristic as your XLSX version (copy/paste with tiny tweaks) + if len(text.strip()) < 100: + return 0 + score = 0 + if re.search(r'\d+\.?\d*\s*(%|MW|GW|tCO2|ktCO2|MtCO2|€|\$|£|million|billion)', + text, re.IGNORECASE): + score += 2 + key_terms = ['revenue','guidance','margin','cash flow','eps', + 'emission','target','reduction','scope','net-zero', + 'renewable','sustainability','biodiversity'] + score += min(2, sum(1 for term in key_terms if term in text.lower())) + if text.count('|') > 5: + score += 1 + skip_indicators = ['cover', 'disclaimer', 'notice', 'table of contents'] + if any(skip in text.lower()[:200] for skip in skip_indicators): + score = max(0, score - 2) + return min(5, score) - pipe_count = text.count('|') - number_ratio = 
sum(c.isdigit() for c in text) / len(text) if text else 0 - line_count = len(text.splitlines()) + def _batch_rows_by_token_count(self, rows: List[str], max_tokens: int = 400) -> List[List[str]]: + chunks, current, tok = [], [], 0.0 + for row in rows: + if not row or not row.strip(): + continue + est = len(row.split()) * 1.3 + if tok + est > max_tokens: + if current: chunks.append(current) + current, tok = [row], est + else: + current.append(row); tok += est + if current: chunks.append(current) + return chunks - is_tabular = (pipe_count > 10 or number_ratio > 0.3 or line_count > 20) - messy = 'nan' in text.lower() or 'null' in text.lower() - sentence_count = len([s for s in text.split('.') if s.strip()]) - is_prose = sentence_count > 3 and pipe_count < 5 + def _batch_rewrite_chunks(self, chunks_to_rewrite: List[Tuple[str, Dict[str, Any], int]]): + if not chunks_to_rewrite or not self.chunk_rewriter: + return chunks_to_rewrite + start = time.time() + results = [] - return (is_tabular or messy) and not is_prose + # Fast path if your rewriter supports batch + if hasattr(self.chunk_rewriter, 'rewrite_chunks_batch'): + BATCH_SIZE = min(self.batch_size, len(chunks_to_rewrite)) + batches = [chunks_to_rewrite[i:i+BATCH_SIZE] + for i in range(0, len(chunks_to_rewrite), BATCH_SIZE)] - def _rewrite_chunk(self, text: str, metadata: Dict[str, Any]) -> str: - if not self.chunk_rewriter: - return text + for bidx, batch in enumerate(batches, 1): + batch_input = [{'text': t, 'metadata': m} for (t, m, _) in batch] + try: + rewritten = self.chunk_rewriter.rewrite_chunks_batch(batch_input, batch_size=BATCH_SIZE) + except Exception as e: + logger.warning(f"⚠️ Batch {bidx} failed: {e}") + rewritten = [None]*len(batch) + for i, (orig_text, meta, idx) in enumerate(batch): + new_text = rewritten[i] if i < len(rewritten) else None + if new_text and new_text != orig_text: + meta = meta.copy() + meta['rewritten'] = True + self.stats['rewritten_chunks'] += 1 + results.append((new_text, meta, idx)) + else: + results.append((orig_text, meta, idx)) + else: + # Sequential fallback + for (t, m, idx) in chunks_to_rewrite: + try: + new_t = self.chunk_rewriter.rewrite_chunk(t, metadata=m).strip() + except Exception as e: + logger.warning(f"⚠️ Rewrite failed for chunk {idx}: {e}") + new_t = None + if new_t and new_t != t: + m = m.copy(); m['rewritten'] = True + self.stats['rewritten_chunks'] += 1 + results.append((new_t, m, idx)) + else: + results.append((t, m, idx)) + self.stats['rewriting_time'] += time.time() - start + return results + + # ---------- Ingestion helpers ---------- + def _find_embedded_spreadsheets(self, pdf_path: Path) -> List[Tuple[str, bytes]]: + if not _HAS_PYPDF: + return [] try: - rewritten = self.chunk_rewriter.rewrite_chunk(text, metadata=metadata).strip() - if rewritten: - self.stats['rewritten_chunks'] += 1 - return rewritten - except Exception as e: - logger.warning(f"⚠️ Rewrite failed: {e}") - return text - - def process_pdf( - self, - file_path: str | Path, - entity: Optional[str] = None, - max_rewrite_chunks: int = 100 - ) -> Tuple[List[Dict[str, Any]], str]: - start_time = time.time() - self.stats = { - 'total_chunks': 0, - 'rewritten_chunks': 0, - 'processing_time': 0, - 'rewriting_time': 0 - } - all_chunks = [] - rewrite_candidates = [] - document_id = str(uuid.uuid4()) + reader = PdfReader(str(pdf_path)) + names_tree = reader.trailer.get("/Root", {}).get("/Names", {}) + efiles = names_tree.get("/EmbeddedFiles", {}) + names = efiles.get("/Names", []) + pairs = list(zip(names[::2], names[1::2])) 
+ out = [] + for fname, ref in pairs: + spec = ref.getObject() + if "/EF" in spec and "/F" in spec["/EF"]: + data = spec["/EF"]["/F"].getData() + if str(fname).lower().endswith((".xlsx", ".xls", ".csv")): + out.append((str(fname), data)) + return out + except Exception: + return [] - # -------- 1. Validate Inputs -------- + def _extract_tables_with_camelot(self, pdf_path: Path, pages="all") -> List[pd.DataFrame]: + if not _HAS_CAMELOT: + return [] + dfs: List[pd.DataFrame] = [] try: - file = Path(file_path) - if not file.exists() or not file.is_file(): - raise FileNotFoundError(f"File not found: {file_path}") - if not str(file).lower().endswith(('.pdf',)): - raise ValueError(f"File must be a PDF: {file_path}") + # 1) lattice first + tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor="lattice", line_scale=40) + dfs.extend([t.df for t in tables] if tables else []) + # 2) stream fallback if sparse + if not dfs: + tables = camelot.read_pdf(str(pdf_path), pages=pages, flavor="stream", edge_tol=200) + dfs.extend([t.df for t in tables] if tables else []) except Exception as e: - logger.error(f"❌ Error opening file: {e}") - return [], document_id + logger.info(f"Camelot failed: {e}") + return dfs - if not entity or not isinstance(entity, str): - logger.error("❌ Entity name must be provided as a non-empty string when ingesting a PDF file.") - return [], document_id - entity = entity.strip().lower() - - logger.info(f"📄 Processing {file.name}") - - # -------- 2. Main Extraction -------- - try: - with pdfplumber.open(file) as pdf: - for page_num, page in enumerate(pdf.pages): - try: - text = page.extract_text() - except Exception as e: - logger.warning(f"⚠️ Failed to extract text from page {page_num+1}: {e}") + def _extract_tables_with_pdfplumber(self, pdf_path: Path) -> List[Tuple[pd.DataFrame, int]]: + out = [] + with pdfplumber.open(str(pdf_path)) as pdf: + for pno, page in enumerate(pdf.pages, 1): + try: + tables = page.extract_tables() or [] + except Exception: + tables = [] + for tbl in tables: + if not tbl or len(tbl) < 2: # need header + at least 1 row continue + df = pd.DataFrame(tbl[1:], columns=tbl[0]) + out.append((df, pno)) + return out - if not text or len(text.strip()) < 50: - logger.debug(f"Skipping short/empty page {page_num+1}") - continue + def _df_to_rows(self, df: pd.DataFrame) -> List[str]: + # Normalize like your XLSX rows + df = df.copy() + df = df.replace(r'\n', ' ', regex=True) + df.columns = [str(c).strip() for c in df.columns] + return [ " | ".join([str(v) for v in row if (pd.notna(v) and str(v).strip())]) + for _, row in df.iterrows() ] - metadata = { - "page": page_num + 1, - "source": str(file), - "filename": file.name, - "entity": entity, - "document_id": document_id, - "type": "pdf_page" - } + def _extract_prose_blocks(self, pdf_path: Path) -> List[Tuple[str, int]]: + blocks = [] + with pdfplumber.open(str(pdf_path)) as pdf: + for pno, page in enumerate(pdf.pages, 1): + try: + text = page.extract_text() or "" + except Exception: + text = "" + text = re.sub(r'[ \t]+\n', '\n', text) # unwrap ragged whitespace + text = re.sub(r'\n{3,}', '\n\n', text) + if len(text.strip()) >= 40: + blocks.append((text.strip(), pno)) + return blocks - self.stats['total_chunks'] += 1 + # ---------- Public API ---------- + def ingest_pdf(self, + file_path: str | Path, + entity: Optional[str] = None, + max_rewrite_chunks: int = 30, + min_chunk_score: int = 2, + delete_original_if_rewritten: bool = True, + prefer_tables_first: bool = True + ) -> Tuple[List[Dict[str, Any]], str, 
List[str]]: + """ + Returns (chunks, document_id, original_chunk_ids_to_delete) + """ + start = time.time() + self.stats = {k: 0.0 if 'time' in k else 0 for k in self.stats} + all_chunks: List[Dict[str, Any]] = [] + original_chunks_to_delete: List[str] = [] + doc_id = str(uuid.uuid4()) - if self._should_rewrite(text): - rewrite_candidates.append((text, metadata)) - else: - all_chunks.append({"content": text.strip(), "metadata": metadata}) - except Exception as e: - logger.error(f"❌ PDF read error: {e}") - return [], document_id + file = Path(file_path) + if not file.exists() or not file.is_file() or not file.suffix.lower() == ".pdf": + raise FileNotFoundError(f"Not a PDF: {file_path}") + if not entity or not isinstance(entity, str): + raise ValueError("Entity name must be provided") + entity = entity.strip().lower() - # -------- 3. Rewrite Candidates (if needed) -------- - rewritten_chunks = [] - try: - if self.chunk_rewriter and rewrite_candidates: - logger.info(f"🧠 Rewriting {min(len(rewrite_candidates), max_rewrite_chunks)} of {len(rewrite_candidates)} chunks") - rewrite_candidates = rewrite_candidates[:max_rewrite_chunks] - for text, metadata in rewrite_candidates: - rewritten = self._rewrite_chunk(text, metadata) - metadata = dict(metadata) # make a copy for safety - metadata["rewritten"] = True - rewritten_chunks.append({"content": rewritten, "metadata": metadata}) + # 0) Router: embedded spreadsheets? + embedded = self._find_embedded_spreadsheets(file) + if embedded: + # Save, then delegate to your XLSX flow for each + from your_xlsx_module import XLSXIngester # <-- import your class + xlsx_ingester = XLSXIngester(chunk_rewriter=self.chunk_rewriter) + for fname, data in embedded: + tmp = file.with_name(f"__embedded__{fname}") + with open(tmp, "wb") as f: f.write(data) + x_chunks, _, _ = xlsx_ingester.ingest_xlsx( + tmp, entity=entity, + max_rewrite_chunks=max_rewrite_chunks, + min_chunk_score=min_chunk_score, + delete_original_if_rewritten=delete_original_if_rewritten + ) + # Tag source and page unknown for embedded + for ch in x_chunks: + ch['metadata']['source_pdf'] = str(file) + ch['metadata']['embedded_file'] = fname + all_chunks.append(ch) + try: os.remove(tmp) + except Exception: pass + # Note: continue to extract PDF content as well (often desirable) + + # 1) Tables (Camelot → pdfplumber) + extraction_start = time.time() + table_chunks: List[Dict[str, Any]] = [] + if prefer_tables_first: + dfs = self._extract_tables_with_camelot(file) + if not dfs: + for df, pno in self._extract_tables_with_pdfplumber(file): + rows = self._df_to_rows(df) + if not rows: continue + chunks = self._batch_rows_by_token_count(rows) + for cidx, rows_batch in enumerate(chunks): + content = f"Detected Table (pdfplumber)\n" + "\n".join(rows_batch) + meta = { + "page": pno, + "source": str(file), + "filename": file.name, + "entity": entity, + "document_id": doc_id, + "type": "pdf_table", + "extractor": "pdfplumber" + } + table_chunks.append({'id': f"{doc_id}_chunk_{len(all_chunks)+len(table_chunks)}", + 'content': content, 'metadata': meta}) else: - rewritten_chunks = [{"content": text, "metadata": metadata} for text, metadata in rewrite_candidates] - except Exception as e: - logger.warning(f"⚠️ Error rewriting chunks: {e}") - for text, metadata in rewrite_candidates: - rewritten_chunks.append({"content": text, "metadata": metadata}) + # Camelot doesn't preserve page numbers directly; we’ll mark unknown unless available on t.parsing_report + for t_idx, df in enumerate(dfs): + rows = self._df_to_rows(df) 
+ if not rows: continue + chunks = self._batch_rows_by_token_count(rows) + for cidx, rows_batch in enumerate(chunks): + content = f"Detected Table (camelot)\n" + "\n".join(rows_batch) + meta = { + "page": None, # could be added by parsing report if needed + "source": str(file), + "filename": file.name, + "entity": entity, + "document_id": doc_id, + "type": "pdf_table", + "extractor": "camelot", + "table_index": t_idx + } + table_chunks.append({'id': f"{doc_id}_chunk_{len(all_chunks)+len(table_chunks)}", + 'content': content, 'metadata': meta}) - all_chunks.extend(rewritten_chunks) + # 2) Prose blocks + prose_chunks: List[Dict[str, Any]] = [] + for text, pno in self._extract_prose_blocks(file): + meta = { + "page": pno, "source": str(file), "filename": file.name, + "entity": entity, "document_id": doc_id, "type": "pdf_page_text" + } + prose_chunks.append({'id': f"{doc_id}_chunk_{len(all_chunks)+len(table_chunks)+len(prose_chunks)}", + 'content': text, 'metadata': meta}) - # -------- 4. Finalize IDs and Metadata -------- - try: - for i, chunk in enumerate(all_chunks): - chunk["id"] = f"{document_id}_chunk_{i}" - chunk.setdefault("metadata", {}) - chunk["metadata"]["document_id"] = document_id - except Exception as e: - logger.warning(f"⚠️ Error finalizing chunk IDs: {e}") + extracted = (table_chunks + prose_chunks) if prefer_tables_first else (prose_chunks + table_chunks) + all_chunks.extend(extracted) + self.stats['extraction_time'] = time.time() - extraction_start + self.stats['total_chunks'] = len(all_chunks) + + # 3) Smart selection + rewriting (same semantics as XLSX) + if self.chunk_rewriter and max_rewrite_chunks > 0 and all_chunks: + # score + selection_start = time.time() + scored = [] + for i, ch in enumerate(all_chunks): + s = self._is_high_value_chunk(ch['content'], ch['metadata']) + if s >= min_chunk_score: + scored.append((ch['content'], ch['metadata'], i, s)) + self.stats['high_value_chunks'] += 1 + scored.sort(key=lambda x: x[3], reverse=True) + to_rewrite = [(t, m, idx) for (t, m, idx, _) in scored[:max_rewrite_chunks]] + self.stats['selection_time'] = time.time() - selection_start + + # rewrite + rewritten = self._batch_rewrite_chunks(to_rewrite) + for new_text, new_meta, original_idx in rewritten: + if new_meta.get('rewritten'): + original_id = all_chunks[original_idx]['id'] + if delete_original_if_rewritten: + # replace in place but mark original id for vector-store deletion + original_chunks_to_delete.append(original_id) + new_id = f"{original_id}_rewritten" + all_chunks[original_idx]['id'] = new_id + all_chunks[original_idx]['content'] = new_text + all_chunks[original_idx]['metadata'] = {**all_chunks[original_idx]['metadata'], **new_meta, + "original_chunk_id": original_id} - self.stats['processing_time'] = time.time() - start_time - logger.info(f"✅ PDF processing complete in {self.stats['processing_time']:.2f}s — Total: {len(all_chunks)}") + self.stats['processing_time'] = time.time() - start + logger.info(f"✅ PDF processed: {file.name} — chunks: {len(all_chunks)}; " + f"extract {self.stats['extraction_time']:.2f}s; " + f"rewrite {self.stats['rewriting_time']:.2f}s") - return all_chunks, document_id + return all_chunks, doc_id, original_chunks_to_delete diff --git a/ai/generative-ai-service/complex-document-rag/files/ingest_xlsx.py b/ai/generative-ai-service/complex-document-rag/files/ingest_xlsx.py index a3729ba4d..22d3d355a 100644 --- a/ai/generative-ai-service/complex-document-rag/files/ingest_xlsx.py +++ 
b/ai/generative-ai-service/complex-document-rag/files/ingest_xlsx.py @@ -116,8 +116,8 @@ def _batch_rows_by_token_count(self, rows: List[str], max_tokens: int = 400) -> return chunks - def _batch_rewrite_chunks(self, chunks_to_rewrite: List[Tuple[str, Dict[str, Any]]]) -> List[Tuple[str, Dict[str, Any]]]: - """Fast parallel batch rewriting""" + def _batch_rewrite_chunks(self, chunks_to_rewrite: List[Tuple[str, Dict[str, Any], int]]) -> List[Tuple[str, Dict[str, Any], int]]: + """Fast parallel batch rewriting - now returns tuples with indices""" if not chunks_to_rewrite or not self.chunk_rewriter: return chunks_to_rewrite @@ -140,22 +140,29 @@ def _batch_rewrite_chunks(self, chunks_to_rewrite: List[Tuple[str, Dict[str, Any logger.info(f"📦 Processing {len(batches)} batches of size {BATCH_SIZE}") - def process_batch(batch_idx: int, batch: List[Tuple[str, Dict[str, Any]]]): - batch_input = [{'text': text, 'metadata': metadata} for text, metadata in batch] + def process_batch(batch_idx: int, batch: List[Tuple[str, Dict[str, Any], int]]): + batch_input = [{'text': text, 'metadata': metadata} for text, metadata, _ in batch] try: logger.info(f" Processing batch {batch_idx + 1}/{len(batches)}") rewritten_texts = self.chunk_rewriter.rewrite_chunks_batch(batch_input, batch_size=BATCH_SIZE) batch_result = [] - for i, (original_text, metadata) in enumerate(batch): + for i, (original_text, metadata, chunk_idx) in enumerate(batch): rewritten_text = rewritten_texts[i] if i < len(rewritten_texts) else None - if rewritten_text and rewritten_text != original_text: + # Check for None (failure) or empty string (failure) explicitly + if rewritten_text is None or rewritten_text == "": + logger.warning(f" ⚠️ Chunk {chunk_idx} rewriting failed, keeping original") + batch_result.append((original_text, metadata, chunk_idx)) + elif rewritten_text != original_text: + # Successfully rewritten and different from original metadata = metadata.copy() metadata["rewritten"] = True + metadata["original_chunk_id"] = f"{metadata.get('document_id', '')}_chunk_{chunk_idx}" self.stats['rewritten_chunks'] += 1 - batch_result.append((rewritten_text, metadata)) + batch_result.append((rewritten_text, metadata, chunk_idx)) else: - batch_result.append((original_text, metadata)) + # Rewritten but same as original (no changes needed) + batch_result.append((original_text, metadata, chunk_idx)) logger.info(f" ✅ Batch {batch_idx + 1} complete") return batch_result @@ -178,19 +185,20 @@ def process_batch(batch_idx: int, batch: List[Tuple[str, Dict[str, Any]]]): else: # Fallback to sequential processing logger.info(f"🔄 Sequential rewriting for {len(chunks_to_rewrite)} chunks") - for text, metadata in chunks_to_rewrite: + for text, metadata, chunk_idx in chunks_to_rewrite: try: rewritten = self.chunk_rewriter.rewrite_chunk(text, metadata=metadata).strip() if rewritten: metadata = metadata.copy() metadata["rewritten"] = True + metadata["original_chunk_id"] = f"{metadata.get('document_id', '')}_chunk_{chunk_idx}" self.stats['rewritten_chunks'] += 1 - results.append((rewritten, metadata)) + results.append((rewritten, metadata, chunk_idx)) else: - results.append((text, metadata)) + results.append((text, metadata, chunk_idx)) except Exception as e: logger.warning(f"Failed to rewrite chunk: {e}") - results.append((text, metadata)) + results.append((text, metadata, chunk_idx)) self.stats['rewriting_time'] = time.time() - start_time return results @@ -200,9 +208,14 @@ def ingest_xlsx( file_path: str | Path, entity: Optional[str] = None, 
max_rewrite_chunks: int = 30, # Reasonable default - min_chunk_score: int = 2 # Only rewrite chunks with score >= 2 - ) -> Tuple[List[Dict[str, Any]], str]: - """Fast XLSX processing with smart chunk selection""" + min_chunk_score: int = 2, # Only rewrite chunks with score >= 2 + delete_original_if_rewritten: bool = True # New parameter + ) -> Tuple[List[Dict[str, Any]], str, List[str]]: + """Fast XLSX processing with smart chunk selection + + Returns: + Tuple of (chunks, document_id, original_chunk_ids_to_delete) + """ start_time = time.time() self.stats = { @@ -216,6 +229,7 @@ def ingest_xlsx( } all_chunks = [] document_id = str(uuid.uuid4()) + original_chunks_to_delete = [] # Validate inputs file = Path(file_path) @@ -297,38 +311,43 @@ def ingest_xlsx( # Smart chunk selection for rewriting selection_start = time.time() if self.chunk_rewriter and max_rewrite_chunks > 0: - # Score all chunks + # Score all chunks and include their indices scored_chunks = [] - for chunk in all_chunks: + for i, chunk in enumerate(all_chunks): score = self._is_high_value_chunk(chunk['content'], chunk['metadata']) if score >= min_chunk_score: - scored_chunks.append((chunk['content'], chunk['metadata'], score)) + scored_chunks.append((chunk['content'], chunk['metadata'], i, score)) self.stats['high_value_chunks'] += 1 # Sort by score and take top N - scored_chunks.sort(key=lambda x: x[2], reverse=True) - chunks_to_rewrite = [(text, meta) for text, meta, _ in scored_chunks[:max_rewrite_chunks]] + scored_chunks.sort(key=lambda x: x[3], reverse=True) + chunks_to_rewrite = [(text, meta, idx) for text, meta, idx, _ in scored_chunks[:max_rewrite_chunks]] self.stats['selection_time'] = time.time() - selection_start - logger.info(f"🎯 Selected {len(chunks_to_rewrite)} high-value chunks from {self.stats['high_value_chunks']} candidates in {self.stats['selection_time']:.2f}s") + logger.info(f"Selected {len(chunks_to_rewrite)} high-value chunks from {self.stats['high_value_chunks']} candidates in {self.stats['selection_time']:.2f}s") if chunks_to_rewrite: # Rewrite selected chunks rewritten = self._batch_rewrite_chunks(chunks_to_rewrite) - # Create mapping for quick lookup - rewritten_map = {} - for text, meta in rewritten: - if meta.get('rewritten'): - key = f"{meta['sheet']}_{meta.get('chunk_index', 0)}" - rewritten_map[key] = text - # Update original chunks with rewritten content - for chunk in all_chunks: - key = f"{chunk['metadata']['sheet']}_{chunk['metadata'].get('chunk_index', 0)}" - if key in rewritten_map: - chunk['content'] = rewritten_map[key] - chunk['metadata']['rewritten'] = True + for rewritten_text, rewritten_meta, original_idx in rewritten: + if rewritten_meta.get('rewritten'): + # Store the original chunk ID for deletion + original_chunk_id = all_chunks[original_idx]['id'] + if delete_original_if_rewritten: + original_chunks_to_delete.append(original_chunk_id) + + # Create NEW ID for rewritten chunk (append _rewritten) + new_chunk_id = f"{original_chunk_id}_rewritten" + + # Update the chunk with rewritten content and NEW ID + all_chunks[original_idx]['id'] = new_chunk_id + all_chunks[original_idx]['content'] = rewritten_text + all_chunks[original_idx]['metadata'] = rewritten_meta + all_chunks[original_idx]['metadata']['original_chunk_id'] = original_chunk_id + + logger.info(f"✅ Replaced chunk {original_idx} with rewritten version (new ID: {new_chunk_id})") self.stats['processing_time'] = time.time() - start_time @@ -339,6 +358,8 @@ def ingest_xlsx( logger.info(f"📊 Total chunks: {len(all_chunks)}") 
logger.info(f"🎯 High-value chunks: {self.stats['high_value_chunks']}") logger.info(f"🔥 Rewritten chunks: {self.stats['rewritten_chunks']}") + if original_chunks_to_delete: + logger.info(f"🗑️ Original chunks to delete: {len(original_chunks_to_delete)}") logger.info(f"\n⏱️ TIMING BREAKDOWN:") logger.info(f" Extraction: {self.stats['extraction_time']:.2f}s") logger.info(f" Selection: {self.stats['selection_time']:.2f}s") @@ -348,7 +369,7 @@ def ingest_xlsx( logger.info(f" Speed: {len(all_chunks)/self.stats['processing_time']:.1f} chunks/sec") logger.info(f"{'='*60}\n") - return all_chunks, document_id + return all_chunks, document_id, original_chunks_to_delete def main(): """CLI interface""" @@ -359,6 +380,7 @@ def main(): parser.add_argument("--max-rewrite", type=int, default=30, help="Maximum chunks to rewrite") parser.add_argument("--min-score", type=int, default=2, help="Minimum score for rewriting (0-5)") parser.add_argument("--no-rewrite", action="store_true", help="Skip chunk rewriting") + parser.add_argument("--keep-originals", action="store_true", help="Keep original chunks even if rewritten") args = parser.parse_args() @@ -388,11 +410,12 @@ def main(): # Process file try: - chunks, doc_id = processor.ingest_xlsx( + chunks, doc_id, chunks_to_delete = processor.ingest_xlsx( args.input, entity=args.entity, max_rewrite_chunks=args.max_rewrite, - min_chunk_score=args.min_score + min_chunk_score=args.min_score, + delete_original_if_rewritten=not args.keep_originals ) # Save results @@ -400,7 +423,8 @@ def main(): result_data = { "document_id": doc_id, "chunks": chunks, - "stats": processor.stats + "stats": processor.stats, + "original_chunks_to_delete": chunks_to_delete } with open(args.output, "w", encoding="utf-8") as f: diff --git a/ai/generative-ai-service/complex-document-rag/files/local_rag_agent.py b/ai/generative-ai-service/complex-document-rag/files/local_rag_agent.py index b5bf1c580..5b8b29f55 100644 --- a/ai/generative-ai-service/complex-document-rag/files/local_rag_agent.py +++ b/ai/generative-ai-service/complex-document-rag/files/local_rag_agent.py @@ -74,7 +74,7 @@ class OCIModelHandler: "grok-4": { "model_id": os.getenv("OCI_GROK_4_MODEL_ID"), "request_type": "generic", - "max_output_tokens": 120000, + "max_output_tokens": 8000, # Reduced from 120000 for faster response "default_params": { "temperature": 1, "top_p": 1 @@ -84,7 +84,7 @@ class OCIModelHandler: "model_id": os.getenv("OCI_GROK_3_MODEL_ID", os.getenv("GROK_MODEL_ID")), "request_type": "generic", - "max_output_tokens": 16000, + "max_output_tokens": 8000, # Reduced from 16000 for consistency "default_params": { "temperature": 0.7, "top_p": 0.9 @@ -94,7 +94,7 @@ class OCIModelHandler: "model_id": os.getenv("OCI_GROK_3_FAST_MODEL_ID", os.getenv("GROK_MODEL_ID")), "request_type": "generic", - "max_output_tokens": 16000, + "max_output_tokens": 4000, # Optimized for speed "default_params": { "temperature": 0.7, "top_p": 0.9 @@ -197,13 +197,29 @@ def __init__(self, model_name: str = "grok-3", config_profile: str = "DEFAULT", region = self.model_config.get("region", "us-chicago-1") self.endpoint = f"https://inference.generativeai.{region}.oci.oraclecloud.com" - # Initialize OCI client + # Initialize OCI client with better retry and timeout settings config = oci.config.from_file("~/.oci/config", config_profile) + + # Create a custom retry strategy for chunk rewriting operations + retry_strategy = oci.retry.RetryStrategyBuilder( + max_attempts=3, + retry_max_wait_between_calls_seconds=10, + retry_base_sleep_time_seconds=2, 
+            retry_exponential_growth_multiplier=2,
+            retry_eligible_service_errors=[429, 500, 502, 503, 504],
+            service_error_retry_config={
+                -1: []  # Retry on timeout errors
+            }
+        ).add_service_error_check(
+            service_error_retry_config={-1: []},
+            service_error_retry_on_any_5xx=True
+        ).get_retry_strategy()
+
         self.client = oci.generative_ai_inference.GenerativeAiInferenceClient(
             config=config,
             service_endpoint=self.endpoint,
-            retry_strategy=oci.retry.NoneRetryStrategy(),
-            timeout=(10, 240)
+            retry_strategy=retry_strategy,
+            timeout=(30, 120)  # 30 s connect, 120 s read; tuned for chunk rewriting
         )
 
         print(f"✅ Initialized OCI handler for {model_name}")
@@ -359,7 +375,7 @@ def get_model_info(self) -> Dict[str, Any]:
 
 class RAGSystem:
     def __init__(self, vector_store: EnhancedVectorStore = None, model_name: str = None, use_cot: bool = False, skip_analysis: bool = False,
-                 quantization: str = None, use_oracle_db: bool = True, collection: str = "Multi-Collection",
+                 quantization: str = None, use_oracle_db: bool = True, collection: str = "multi",
                  embedding_model: str = "cohere-embed-multilingual-v3.0"):
         """Initialize local RAG agent with vector store and local LLM
@@ -484,9 +500,54 @@ def __init__(self, vector_store: EnhancedVectorStore = None, model_name: str = N
                 tokenizer=self.tokenizer
             )
         logger.info(f"Agents initialized: {list(self.agents.keys())}")
+        # --- known tag cache loaded from vector store - helps identify entities in the query ---
+        self.known_tags: set[str] = set()
+        try:
+            self.refresh_known_tags()
+        except Exception as e:
+            logger.warning(f"[RAG] Could not load known tags on init: {e}")
+
+    def _vector_store_all_ids(self) -> list[str]:
+        """
+        Return ALL canonical document/entity IDs (tags) from the vector store.
+        Tries a few common method names to avoid tight coupling.
+        """
+        vs = self.vector_store
+        # Try common APIs
+        for attr in ("list_ids", "get_all_ids", "get_all_document_ids", "all_ids"):
+            if hasattr(vs, attr) and callable(getattr(vs, attr)):
+                try:
+                    ids = getattr(vs, attr)()
+                    return [str(x) for x in ids]
+                except Exception as e:
+                    logger.debug(f"[RAG] {self._safe_name(vs)}.{attr} failed: {e}")
+        # Fallback: try listing collections and aggregating
+        try:
+            if hasattr(vs, "list_collections"):
+                coll_names = vs.list_collections()
+                ids = []
+                for c in coll_names:
+                    try:
+                        ids.extend(vs.list_ids(collection=c))
+                    except Exception:
+                        pass
+                return [str(x) for x in ids]
+        except Exception as e:
+            logger.debug(f"[RAG] Could not enumerate collections: {e}")
+        return []
+
+    def refresh_known_tags(self) -> None:
+        """
+        Populate self.known_tags (lowercased) from the vector store.
+        Call this after any ingest/update that changes IDs.
+        """
+        ids = self._vector_store_all_ids()
+        self.known_tags = {s.lower() for s in ids if isinstance(s, str)}
+        logger.info(f"[RAG] known_tags loaded: {len(self.known_tags)}")
+
+    @staticmethod
+    def _safe_name(obj) -> str:
+        return getattr(obj, "__class__", type(obj)).__name__
+
     def _initialize_sub_agents(self, llm_model: str) -> bool:
         """
         Initializes agents for agentic workflows (planner, researcher, etc.)
@@ -521,22 +582,28 @@ def _initialize_sub_agents(self, llm_model: str) -> bool:
     def process_query_with_multi_collection_context(self, query: str,
                                                     multi_collection_context: List[Dict[str, Any]],
                                                     is_comparison_report: bool = False,
-                                                    collection_mode: str = "multi") -> Dict[str, Any]:
-        """Process a query with pre-retrieved multi-collection context"""
+                                                    collection_mode: str = "multi",
+                                                    provided_entities: Optional[List[str]] = None) -> Dict[str, Any]:
+        """Process a query with pre-retrieved multi-collection context and optional provided entities"""
         logger.info(f"Processing query with {len(multi_collection_context)} multi-collection chunks")
+        if provided_entities:
+            logger.info(f"Using provided entities: {provided_entities}")
 
         if self.use_cot:
-            return self._process_query_with_report_agent(query, multi_collection_context, is_comparison_report, collection_mode=collection_mode)
+            return self._process_query_with_report_agent(query, multi_collection_context, is_comparison_report,
+                                                         collection_mode=collection_mode, provided_entities=provided_entities)
         else:
             # For non-CoT mode, use the context directly
             return self._generate_response(query, multi_collection_context)
 
+
     def _process_query_with_report_agent(
         self,
         query: str,
         multi_collection_context: Optional[List[Dict[str, Any]]] = None,
         is_comparison_report: bool = False,
-        collection_mode: str = "multi"
+        collection_mode: str = "multi",
+        provided_entities: Optional[List[str]] = None
     ) -> Dict[str, Any]:
         """
         Report agent pipeline:
@@ -558,8 +625,10 @@ def _process_query_with_report_agent(
 
         # STEP 1: Plan the report
         logger.info("Planning report sections...")
+        if provided_entities:
+            logger.info(f"Using provided entities for planning: {provided_entities}")
         try:
-            result = planner.plan(query, is_comparison_report=is_comparison_report)
+            result = planner.plan(query, is_comparison_report=is_comparison_report, provided_entities=provided_entities)
             if not isinstance(result, tuple) or len(result) != 3:
                 raise ValueError(f"Planner returned unexpected format: {type(result)} → {result}")
             plan, entities, is_comparison = result
@@ -799,7 +868,7 @@ def main():
     parser = argparse.ArgumentParser(description="Query documents using local LLM")
     parser.add_argument("--query", required=True, help="Query to search for")
    parser.add_argument("--embed", default="oracle", choices=["oracle", "chromadb"], help="embed backend to use")
-    parser.add_argument("--model", default="qwen2", help="Model to use (default: qwen2)")
+    parser.add_argument("--model", default="grok-3", help="Model to use (default: grok-3)")
     parser.add_argument("--collection", help="Collection to search (PDF, Repository, General Knowledge)")
     parser.add_argument("--use-cot", action="store_true", help="Use Chain of Thought reasoning")
     parser.add_argument("--store-path", default="embed", help="Path to ChromaDB store")
diff --git a/ai/generative-ai-service/complex-document-rag/files/oci_embedding_handler.py b/ai/generative-ai-service/complex-document-rag/files/oci_embedding_handler.py
index 2d7a7a19f..82404bb64 100644
--- a/ai/generative-ai-service/complex-document-rag/files/oci_embedding_handler.py
+++ b/ai/generative-ai-service/complex-document-rag/files/oci_embedding_handler.py
@@ -88,6 +88,9 @@ def __init__(self,
             config_profile: OCI config profile to use
             compartment_id: OCI compartment ID
         """
+        # Load environment variables from .env file if not already loaded
+        load_dotenv()
+
         self.model_name = model_name
 
         # Validate model name
@@ -100,6 +103,10 @@ def __init__(self,
         # Set compartment ID - check both 
OCI_COMPARTMENT_ID and COMPARTMENT_ID for compatibility self.compartment_id = compartment_id or os.getenv("OCI_COMPARTMENT_ID") or os.getenv("COMPARTMENT_ID") + # Log if compartment ID is missing + if not self.compartment_id: + logger.error("❌ No compartment ID found. Please set COMPARTMENT_ID or OCI_COMPARTMENT_ID in .env file") + # Set endpoint region based on model configuration (supports multiple OCI regions) endpoint_region = self.model_config.get("endpoint", "us-chicago-1") self.endpoint = f"https://inference.generativeai.{endpoint_region}.oci.oraclecloud.com" diff --git a/ai/generative-ai-service/complex-document-rag/files/requirements.txt b/ai/generative-ai-service/complex-document-rag/files/requirements.txt index c1f17bffb..9d81f7410 100644 --- a/ai/generative-ai-service/complex-document-rag/files/requirements.txt +++ b/ai/generative-ai-service/complex-document-rag/files/requirements.txt @@ -13,7 +13,7 @@ pdfplumber==0.11.4 python-docx==1.1.2 # NLP and Embeddings -transformers==4.53.0 +transformers==4.44.2 tokenizers==0.19.1 tiktoken==0.7.0 diff --git a/ai/generative-ai-service/complex-document-rag/files/vector_store.py b/ai/generative-ai-service/complex-document-rag/files/vector_store.py index 04eecc5b4..f5f4ce16f 100644 --- a/ai/generative-ai-service/complex-document-rag/files/vector_store.py +++ b/ai/generative-ai-service/complex-document-rag/files/vector_store.py @@ -4,7 +4,7 @@ Extends the existing VectorStore to support OCI Cohere embeddings alongside ChromaDB defaults """ from oci_embedding_handler import OCIEmbeddingHandler, EmbeddingModelManager -import logging +import logging, numbers from typing import List, Dict, Any, Optional, Union, Tuple from pathlib import Path import chromadb @@ -27,143 +27,137 @@ def __init__(self, *args, **kwargs): "VectorStore is an abstract base class. Use EnhancedVectorStore instead." 
) - - class EnhancedVectorStore(VectorStore): """Enhanced vector store with multi-embedding model support (SAFER VERSION)""" - def __init__(self, persist_directory: str = "embeddings", embedding_model: str = "cohere-embed-multilingual-v3.0", embedder=None): + def __init__(self, persist_directory: str = "embeddings", + embedding_model: str = "cohere-embed-multilingual-v3.0", + embedder=None): self.embedding_manager = EmbeddingModelManager() - self.embedding_model_name = embedding_model # string (name) - self.embedder = embedder # object (has .embed_query/.embed_documents) + self.embedding_model_name = embedding_model + self.embedder = embedder self.embedding_dimensions = getattr(embedder, "model_config", {}).get("dimensions", None) if embedder else None - - # If embedder is provided, use it; otherwise fall back to embedding manager - if embedder: - self.embedding_model = embedder - else: - self.embedding_model = self.embedding_manager.get_model(embedding_model) + # Resolve embedding handler + self.embedding_model = embedder or self.embedding_manager.get_model(embedding_model) + + # Chroma client (ensure Settings import: from chromadb.config import Settings) self.client = chromadb.PersistentClient( path=persist_directory, - settings=Settings(allow_reset=True) + settings=Settings(allow_reset=True, anonymized_telemetry=False) ) - # Always get dimensions from the embedding manager or embedder - embedding_dim = None - if embedder: - # Use the provided embedder's dimensions - info = embedder.get_model_info() - if info and "dimensions" in info: - embedding_dim = info["dimensions"] - else: - raise ValueError( - f"Cannot determine embedding dimensions from provided embedder." - ) - elif isinstance(self.embedding_model, str): - # Try to get from embedding_manager - embedding_info = self.embedding_manager.get_model_info(self.embedding_model_name) - if embedding_info and "dimensions" in embedding_info: - embedding_dim = embedding_info["dimensions"] - else: - raise ValueError( - f"Unknown embedding dimension for model '{self.embedding_model_name}'." - " Please update your EmbeddingModelManager to include this info." - ) - else: - # Should have a get_model_info() method - info = self.embedding_model.get_model_info() - if info and "dimensions" in info: - embedding_dim = info["dimensions"] + # Resolve dimensions once + self._embedding_dim = self._resolve_dimensions() + + # Internal maps/handles + self.collections: dict[str, Any] = {} + self.collection_map = self.collections # alias + + # Create/bind base collections (pdf/xlsx) for current model+dim + self._ensure_base_collections(self._embedding_dim) + + logger.info(f"✅ Enhanced vector store initialized with {self.embedding_model_name} ({self._embedding_dim}D)") + + # --- Utility: sanitize metadata before sending to Chroma --- + def _safe_metadata(self, metadata: dict) -> dict: + """Ensure Chroma-compatible metadata (convert everything non-str → str).""" + safe = {} + for k, v in (metadata or {}).items(): + key = str(k) + if isinstance(v, str): + safe[key] = v + elif isinstance(v, numbers.Number): # catches numpy.int64, Decimal, etc. + safe[key] = str(v) + elif v is None: + continue else: - raise ValueError( - f"Cannot determine embedding dimensions for non-string embedding model {self.embedding_model}." 
- ) + safe[key] = str(v) + return safe + + def _as_int(self, x): + try: + return int(x) + except Exception: + return None + def _resolve_dimensions(self) -> int: + if self.embedder: + info = self.embedder.get_model_info() + if info and "dimensions" in info: + return int(info["dimensions"]) + raise ValueError("Cannot determine embedding dimensions from provided embedder.") + if isinstance(self.embedding_model, str): + info = self.embedding_manager.get_model_info(self.embedding_model_name) + if info and "dimensions" in info: + return int(info["dimensions"]) + raise ValueError(f"Unknown embedding dimension for model '{self.embedding_model_name}'.") + # non-string handler + info = self.embedding_model.get_model_info() + if info and "dimensions" in info: + return int(info["dimensions"]) + raise ValueError("Cannot determine embedding dimensions for non-string embedding model.") + + def _ensure_base_collections(self, embedding_dim: int): + base_collection_names = ["pdf_documents", "xlsx_documents"] metadata = { "hnsw:space": "cosine", - "embedding_model": self.embedding_model_name, - "embedding_dimensions": embedding_dim + "embedding_model": self.embedding_model_name, # keep int in memory + "embedding_dimensions": embedding_dim # keep int in memory } - base_collection_names = [ - "pdf_documents", "xlsx_documents" - ] - - self.collections = {} - for base_name in base_collection_names: full_name = f"{base_name}_{self.embedding_model_name}_{embedding_dim}" - try: - # Check for exact match first - existing_collections = self.client.list_collections() - by_name = {c.name: c for c in existing_collections} - if full_name in by_name: - coll = by_name[full_name] - actual_dim = coll.metadata.get("embedding_dimensions", None) - if actual_dim != embedding_dim: - # This should never happen unless DB is corrupt - logger.error( - f"❌ Dimension mismatch for collection '{full_name}'. Expected {embedding_dim}, found {actual_dim}." - ) - raise ValueError( - f"Collection '{full_name}' has dim {actual_dim}, but expected {embedding_dim}." - ) - collection = coll - logger.info(f"🎯 Using existing collection '{full_name}' ({embedding_dim}D, {coll.count()} chunks)") - else: - # Safe: only ever create the *fully qualified* name - collection = self.client.get_or_create_collection( - name=full_name, - metadata=metadata - ) - logger.info(f"🗂️ Created new collection '{full_name}' with dimension {embedding_dim}") + # Prefer fast path: get_or_create with safe metadata + coll = self.client.get_or_create_collection( + name=full_name, + metadata=self._safe_metadata(metadata) # ← sanitize only here + ) - self.collections[full_name] = collection + # Defensive dim check (cast back to int if Chroma stored as str) + actual_dim = self._as_int((coll.metadata or {}).get("embedding_dimensions")) + if actual_dim and actual_dim != embedding_dim: + logger.error(f"❌ Dimension mismatch for '{full_name}'. 
Expected {embedding_dim}, found {actual_dim}.") + raise ValueError(f"Collection '{full_name}' has dim {actual_dim}, expected {embedding_dim}.") - # For direct access: always the selected model/dim + self.collections[full_name] = coll if base_name == "pdf_documents": - self.pdf_collection = collection - elif base_name == "xlsx_documents": - self.xlsx_collection = collection + self.pdf_collection = coll + self.current_pdf_collection_name = full_name + else: + self.xlsx_collection = coll + self.current_xlsx_collection_name = full_name + logger.info(f"🗂️ Ready collection '{full_name}' ({embedding_dim}D, {coll.count()} chunks)") except Exception as e: logger.error(f"❌ Failed to create or get collection '{full_name}': {e}") raise - # Only include full names in the map; never ambiguous short names - self.collection_map = self.collections - - logger.info(f"✅ Enhanced vector store initialized with {embedding_model} ({embedding_dim}D)") - - def get_collection_key(self, base_name: str) -> str: - # Build the correct key for a base collection name - embedding_dim = ( - self.get_embedding_info()["dimensions"] - if hasattr(self, "get_embedding_info") - else 1024 - ) - return f"{base_name}_{self.embedding_model_name}_{embedding_dim}" + return f"{base_name}_{self.embedding_model_name}_{self._embedding_dim}" def _find_collection_variants(self, base_name: str): """ - Yield (name, collection) for all collections in the DB that start with base_name + "_", - across ANY embedding model/dimension (not just the ones cached at init). + Yield (name, collection) for all collections that start with base_name+"_". + Never create here—only fetch existing collections. """ for c in self.client.list_collections(): try: - name = c.name - except Exception: - # Some clients return plain dicts name = getattr(c, "name", None) or (c.get("name") if isinstance(c, dict) else None) + except Exception: + name = None if not name: continue - if name.startswith(base_name + "_"): - # get_or_create is fine; if it exists it just returns it - yield name, self.client.get_or_create_collection(name=name) + if not name.startswith(base_name + "_"): + continue + try: + coll = self.client.get_collection(name=name) # ← get (NOT get_or_create) + yield name, coll + except Exception as e: + logger.warning(f"Skip collection {name}: {e}") + def list_documents(self, collection_name: str) -> List[Dict[str, Any]]: """ @@ -524,145 +518,249 @@ def _add_cite(self, meta: Union[Dict[str, Any], "Metadata"]) -> Dict[str, Any]: return meta + def delete_chunks(self, collection_name: str, chunk_ids: List[str]): + """Delete specific chunks from a collection by their IDs + + Args: + collection_name: Name of the collection (e.g., 'xlsx_documents', 'pdf_documents') + chunk_ids: List of chunk IDs to delete + """ + if not chunk_ids: + return + + try: + # Get the appropriate collection + if collection_name == "xlsx_documents": + collection = self.xlsx_collection + elif collection_name == "pdf_documents": + collection = self.pdf_collection + else: + # Try to get from collection map + collection = self.collection_map.get(collection_name) + if not collection: + # Try with current model/dimension suffix + full_name = self.get_collection_key(collection_name) + collection = self.collection_map.get(full_name) + + if not collection: + logger.error(f"Collection {collection_name} not found") + return + + # Delete the chunks + collection.delete(ids=chunk_ids) + logger.info(f"✅ Deleted {len(chunk_ids)} chunks from {collection_name}") + + except Exception as e: + logger.error(f"❌ 
Failed to delete chunks: {e}") + raise + def add_xlsx_chunks(self, chunks: List[Dict[str, Any]], document_id: str): """Add XLSX chunks to the vector store with proper embedding handling""" if not chunks: return - + # Extract texts and metadata - texts = [chunk["content"] for chunk in chunks] - metadatas = [chunk["metadata"] for chunk in chunks] - ids = [chunk["id"] for chunk in chunks] - - # Check collection metadata to see what dimensions are expected + texts = [c["content"] for c in chunks] + metadatas = [self._add_cite(c.get("metadata", {})) for c in chunks] # add cite & normalize + ids = [c["id"] for c in chunks] + + # Normalize expected dimensions/model from collection metadata collection_metadata = self.xlsx_collection.metadata or {} - expected_dimensions = collection_metadata.get('embedding_dimensions') - expected_model = collection_metadata.get('embedding_model') - - # Handle embeddings based on model type + expected_dimensions = self._as_int(collection_metadata.get("embedding_dimensions")) + expected_model = collection_metadata.get("embedding_model") + + # Path A: chroma-default (Chroma embeds on add) if isinstance(self.embedding_model, str): - # ChromaDB default - let ChromaDB handle embeddings + # If the collection expects non-384, error early (your policy) if expected_dimensions and expected_dimensions != 384: logger.error(f"❌ Collection expects {expected_dimensions}D but using ChromaDB default (384D)") - raise ValueError(f"Dimension mismatch: collection expects {expected_dimensions}D, ChromaDB default is 384D") - - self.xlsx_collection.add( - documents=texts, - metadatas=metadatas, - ids=ids - ) - else: - # Use OCI embeddings + raise ValueError( + f"Dimension mismatch: collection expects {expected_dimensions}D, ChromaDB default is 384D" + ) + + # Optional: warn if the collection was created without an embedding function bound (older Chroma) try: - embeddings = self.embedding_model.embed_documents(texts) - actual_dimensions = len(embeddings[0]) if embeddings and embeddings[0] else 0 - - if expected_dimensions and actual_dimensions != expected_dimensions: - # Try to find or create the correct collection - correct_collection_name = f"xlsx_documents_{self.embedding_model_name}_{actual_dimensions}" - logger.warning(f"⚠️ Dimension mismatch: collection '{self.xlsx_collection.name}' expects {expected_dimensions}D, embedder produces {actual_dimensions}D") - logger.info(f"🔍 Looking for correct collection: {correct_collection_name}") - - try: - # Try to get the correct collection - correct_collection = self.client.get_collection(correct_collection_name) - logger.info(f"✅ Found correct collection: {correct_collection_name}") - except: - # Create new collection with correct dimensions - metadata = { - "hnsw:space": "cosine", - "embedding_model": self.embedding_model_name, - "embedding_dimensions": actual_dimensions - } - correct_collection = self.client.create_collection( - name=correct_collection_name, - metadata=metadata - ) - logger.info(f"✅ Created new collection: {correct_collection_name}") - - # Add to the correct collection - correct_collection.add( - documents=texts, - metadatas=metadatas, - ids=ids, - embeddings=embeddings - ) - - # Update the reference for future use - self.xlsx_collection = correct_collection - self.collections[correct_collection_name] = correct_collection - - logger.info(f"✅ Added {len(chunks)} XLSX chunks to {correct_collection_name}") - else: - # Dimensions match, proceed normally - self.xlsx_collection.add( - documents=texts, - metadatas=metadatas, - 
ids=ids, - embeddings=embeddings - ) - logger.info(f"✅ Added {len(chunks)} XLSX chunks to {self.embedding_model_name}") - + self.xlsx_collection.add(documents=["probe"], metadatas=[{}], ids=["__probe__tmp__"]) + self.xlsx_collection.delete(ids=["__probe__tmp__"]) except Exception as e: - logger.error(f"❌ Failed to add chunks with OCI embeddings: {e}") - raise # Don't silently fall back - this causes dimension mismatches + logger.warning(f"⚠️ Chroma default embedding may not be bound; add() failed probe: {e}") + + # Add documents directly (Chroma will embed) + # Consider batching if many chunks + self.xlsx_collection.add(documents=texts, metadatas=metadatas, ids=ids) + logger.info(f"✅ Added {len(chunks)} XLSX chunks to {self.embedding_model_name} (chroma-default)") + return + + # Path B: OCI (you provide embeddings explicitly) + try: + embeddings = self.embedding_model.embed_documents(texts) + if not embeddings or not embeddings[0] or not hasattr(embeddings[0], "__len__"): + raise RuntimeError("Embedder returned empty/invalid embeddings") + + actual_dimensions = len(embeddings[0]) + + if expected_dimensions and actual_dimensions != expected_dimensions: + # Try to find or create the correct collection + correct_collection_name = f"xlsx_documents_{self.embedding_model_name}_{actual_dimensions}" + logger.warning( + f"⚠️ Dimension mismatch: collection '{self.xlsx_collection.name}' " + f"expects {expected_dimensions}D, embedder produces {actual_dimensions}D" + ) + logger.info(f"🔍 Looking for correct collection: {correct_collection_name}") + + try: + correct_collection = self.client.get_collection(correct_collection_name) + logger.info(f"✅ Found correct collection: {correct_collection_name}") + except Exception: + # Create new collection with correct dimensions (sanitize metadata for Chroma) + metadata = { + "hnsw:space": "cosine", + "embedding_model": self.embedding_model_name, + "embedding_dimensions": actual_dimensions, # keep as int internally + } + correct_collection = self.client.create_collection( + name=correct_collection_name, + metadata=self._safe_metadata(metadata) # ← sanitize only here + ) + logger.info(f"✅ Created new collection: {correct_collection_name}") + + # Add to the correct collection (explicit vectors) + # Consider batching if many chunks + correct_collection.add( + documents=texts, + metadatas=metadatas, + ids=ids, + embeddings=embeddings + ) + + # Update the reference for future use + self.xlsx_collection = correct_collection + self.collections[correct_collection_name] = correct_collection + + logger.info(f"✅ Added {len(chunks)} XLSX chunks to {correct_collection_name}") + else: + # Dimensions match, proceed normally + self.xlsx_collection.add( + documents=texts, + metadatas=metadatas, + ids=ids, + embeddings=embeddings + ) + logger.info(f"✅ Added {len(chunks)} XLSX chunks to {self.embedding_model_name}") + + except Exception as e: + logger.error(f"❌ Failed to add chunks with OCI embeddings: {e}") + raise # Keep explicit; prevents silent dimension drift + def add_pdf_chunks(self, chunks: List[Dict[str, Any]], document_id: str): - """Add PDF chunks to the vector store with proper embedding handling""" + """Add PDF chunks to the vector store with proper embedding handling.""" if not chunks: return - - # Extract texts and metadata - texts = [chunk["content"] for chunk in chunks] - metadatas = [chunk["metadata"] for chunk in chunks] - ids = [chunk["id"] for chunk in chunks] - - # Check collection metadata to see what dimensions are expected - collection_metadata = 
self.pdf_collection.metadata or {} - expected_dimensions = collection_metadata.get('embedding_dimensions') - expected_model = collection_metadata.get('embedding_model') - - # Handle embeddings based on model type and expected dimensions + + # Extract texts and metadata; add cite + normalize metadata + texts = [c["content"] for c in chunks] + metadatas = [self._add_cite(c.get("metadata", {})) for c in chunks] + ids = [c["id"] for c in chunks] + + # Collection expectations (cast back to int to avoid string/int mismatches) + coll_meta = self.pdf_collection.metadata or {} + expected_dimensions = self._as_int(coll_meta.get("embedding_dimensions")) + expected_model = coll_meta.get("embedding_model") + + # A) chroma-default path (Chroma embeds on add) if isinstance(self.embedding_model, str): - # String identifier - check if it matches expected model if expected_model and self.embedding_model_name != expected_model: - logger.warning(f"⚠️ Model mismatch: collection expects '{expected_model}', got '{self.embedding_model_name}'") - - if expected_dimensions == 384 or self.embedding_model_name == "chromadb-default": - # ChromaDB default - let ChromaDB handle embeddings - logger.info(f"📝 Using ChromaDB default embeddings ({expected_dimensions or 384}D)") - self.pdf_collection.add( + logger.warning( + f"⚠️ Model mismatch: collection expects '{expected_model}', got '{self.embedding_model_name}'" + ) + + # Your policy: chroma-default is 384D only + if expected_dimensions and expected_dimensions != 384: + raise ValueError( + f"Dimension mismatch: collection expects {expected_dimensions}D, " + f"but chroma-default produces 384D. Recreate the collection with chroma-default " + f"or switch to the correct OCI embedder." + ) + + # Optional: probe add for older Chroma builds without an embedding_function bound + try: + self.pdf_collection.add(documents=["__probe__"], metadatas=[{}], ids=["__probe__"]) + self.pdf_collection.delete(ids=["__probe__"]) + except Exception as e: + logger.warning(f"⚠️ Chroma default embedder may not be bound; add() probe failed: {e}") + + # Add (consider batching if very large) + self.pdf_collection.add(documents=texts, metadatas=metadatas, ids=ids) + logger.info(f"✅ Added {len(chunks)} PDF chunks via chroma-default (384D)") + return + + # B) OCI path (explicit embeddings) + try: + embeddings = self.embedding_model.embed_documents(texts) + if not embeddings or not embeddings[0] or not hasattr(embeddings[0], "__len__"): + raise RuntimeError("Embedder returned empty/invalid embeddings") + + actual_dimensions = len(embeddings[0]) + + # If the target collection's dim doesn't match, route/create the correct one + if expected_dimensions and actual_dimensions != expected_dimensions: + logger.warning( + f"⚠️ Dimension mismatch: collection '{self.pdf_collection.name}' expects " + f"{expected_dimensions}D, embedder produced {actual_dimensions}D" + ) + correct_name = f"pdf_documents_{self.embedding_model_name}_{actual_dimensions}" + try: + correct_collection = self.client.get_collection(correct_name) + # Sanity check: if it already contains data of a different dim (shouldn’t happen), bail + probe_meta = correct_collection.metadata or {} + probe_dim = self._as_int(probe_meta.get("embedding_dimensions")) + if probe_dim and probe_dim != actual_dimensions: + raise RuntimeError( + f"Existing collection '{correct_name}' is {probe_dim}D, expected {actual_dimensions}D" + ) + logger.info(f"✅ Found correct PDF collection: {correct_name}") + except Exception: + # Create with sanitized metadata (only at API 
boundary) + md = { + "hnsw:space": "cosine", + "embedding_model": self.embedding_model_name, + "embedding_dimensions": actual_dimensions, # keep int internally + } + correct_collection = self.client.get_or_create_collection( + name=correct_name, + metadata=self._safe_metadata(md) # sanitize here + ) + logger.info(f"🆕 Created PDF collection: {correct_name}") + + # Add to the correct collection + correct_collection.add( documents=texts, metadatas=metadatas, - ids=ids + ids=ids, + embeddings=embeddings ) + + # Re-point handles + self.pdf_collection = correct_collection + self.collections[correct_name] = correct_collection + self.current_pdf_collection_name = correct_name + + logger.info(f"✅ Added {len(chunks)} PDF chunks to {correct_name}") else: - # Expected OCI model but got string - this is a configuration error - logger.error(f"❌ Configuration error: Expected {expected_model} ({expected_dimensions}D) but OCI embedding handler failed to initialize") - logger.error(f"💡 Falling back to ChromaDB default, but this will cause dimension mismatch!") - raise ValueError(f"Cannot add {expected_dimensions}D embeddings using ChromaDB default (384D). Please fix OCI configuration or recreate collection with chromadb-default.") - else: - # Use OCI embeddings - try: - embeddings = self.embedding_model.embed_documents(texts) - actual_dimensions = len(embeddings[0]) if embeddings and embeddings[0] else 0 - - if expected_dimensions and actual_dimensions != expected_dimensions: - logger.error(f"❌ Dimension mismatch: collection expects {expected_dimensions}D, embedder produces {actual_dimensions}D") - raise ValueError(f"Dimension mismatch: collection expects {expected_dimensions}D, got {actual_dimensions}D") - - logger.info(f"📝 Using OCI embeddings ({actual_dimensions}D)") + # Dimensions match; add directly self.pdf_collection.add( documents=texts, metadatas=metadatas, ids=ids, embeddings=embeddings ) - except Exception as e: - logger.error(f"❌ Failed to add PDF chunks with OCI embeddings: {e}") - raise # Don't fall back silently - this causes dimension mismatches - - logger.info(f"✅ Added {len(chunks)} PDF chunks to {self.embedding_model_name}") + logger.info(f"✅ Added {len(chunks)} PDF chunks ({actual_dimensions}D)") + + except Exception as e: + logger.error(f"❌ Failed to add PDF chunks with OCI embeddings: {e}") + raise # keep explicit; prevents silent dimension drift + @@ -875,7 +973,7 @@ def query_pdf_collection( } self.pdf_collection = self.client.get_or_create_collection( name=correct_collection_name, - metadata=metadata + metadata=self._safe_metadata(metadata) ) logger.info(f"✅ Created new PDF collection: {correct_collection_name}") actual_dim = handler_dim @@ -923,75 +1021,6 @@ def query_pdf_collection( return [] - def OLD_query_pdf_collection(self, query: str, n_results: int = 3, entity: Optional[str] = None, add_cite: bool = False) -> List[Dict[str, Any]]: - """Query PDF collection with embedding support and optional citation markup.""" - try: - # Build filter - where_filter = {"entity": entity.lower()} if entity else None - - # ✅ Minimal guard – blow up early if dims mismatch - if (self.pdf_collection.metadata or {}).get("embedding_dimensions") != (self.get_embedding_info() or {}).get("dimensions"): - raise ValueError( - f"EMBEDDING_DIMENSION_MISMATCH: collection expects " - f"{(self.pdf_collection.metadata or {}).get('embedding_dimensions')}D, " - f"current handler has {(self.get_embedding_info() or {}).get('dimensions')}D" - ) - - # Query by embedding or text, depending on backend - if 
isinstance(self.embedding_model, str): - # ChromaDB default - results = self.pdf_collection.query( - query_texts=[query], - n_results=n_results, - where=where_filter, - include=["documents", "metadatas", "distances"] - ) - else: - try: - query_embedding = self.embedding_model.embed_query(query) - results = self.pdf_collection.query( - query_embeddings=[query_embedding], - n_results=n_results, - where=where_filter, - include=["documents", "metadatas", "distances"] - ) - except Exception as e: - logger.error(f"❌ OCI query embedding failed: {e}") - # Fallback to text query - results = self.pdf_collection.query( - query_texts=[query], - n_results=n_results, - where=where_filter, - include=["documents", "metadatas", "distances"] - ) - - # Format results with optional citation - formatted_results = [] - docs = results.get("documents", [[]])[0] - metas = results.get("metadatas", [[]])[0] - dists = results.get("distances", [[]])[0] if "distances" in results else [0.0] * len(docs) - - for i, (doc, meta, dist) in enumerate(zip(docs, metas, dists)): - out = { - "content": doc, - "metadata": meta if meta else {}, - "distance": dist - } - if add_cite and hasattr(self, "_add_cite"): - meta_with_cite = self._add_cite(meta) - out["metadata"] = meta_with_cite - out["content"] = f"{doc} {meta_with_cite['cite']}" - formatted_results.append(out) - - return formatted_results - - except Exception as e: - logger.error(f"❌ Error querying PDF collection: {e}") - return [] - - - - def inspect_xlsx_chunk_metadata(self, limit: int = 10): """ Print stored metadata from the XLSX vector store for debugging. @@ -1070,14 +1099,21 @@ def bind_collections_for_model(self, embedding_model: str) -> None: "embedding_model": self.embedding_model_name, "embedding_dimensions": embedding_dim } - + logger.info( + "Create/get collections: PDF=%r, XLSX=%r | meta=%r (dim_field=%s:%s)", + pdf_name, + xlsx_name, + metadata, + "embedding_dimensions", + type(metadata.get("embedding_dimensions")).__name__, + ) self.pdf_collection = self.client.get_or_create_collection( name=pdf_name, - metadata=metadata + metadata=self._safe_metadata(metadata) ) self.xlsx_collection = self.client.get_or_create_collection( name=xlsx_name, - metadata=metadata + metadata=self._safe_metadata(metadata) ) # Cache for debugging diff --git a/ai/generative-ai-service/hr-goal-alignment/files/requirements.txt b/ai/generative-ai-service/hr-goal-alignment/files/requirements.txt index 3a8639536..0f4efd44b 100644 --- a/ai/generative-ai-service/hr-goal-alignment/files/requirements.txt +++ b/ai/generative-ai-service/hr-goal-alignment/files/requirements.txt @@ -74,7 +74,7 @@ pydantic_core==2.33.1 pydantic-settings==2.8.1 pydeck==0.9.1 pyOpenSSL==24.3.0 -pypdf==5.4.0 +pypdf==6.1.3 python-dateutil==2.9.0.post0 python-docx==1.1.2 python-dotenv==1.1.0 diff --git a/ai/oracle-digital-assistant/README.md b/ai/oracle-digital-assistant/README.md index df8e2562b..30317cb2c 100644 --- a/ai/oracle-digital-assistant/README.md +++ b/ai/oracle-digital-assistant/README.md @@ -18,6 +18,21 @@ Reviewed: 21.08.2025 - [ODA Pro styled](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/oracle-digital-assistant/oda-pro-styled) - A customizable chat interface for ODA +- [ODA Concierge Template](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/oracle-digital-assistant/templates/concierge-template) + - An easy to use Q&A template for ODA + +- [ODA Concierge+Agent 
Template](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/oracle-digital-assistant/templates/concierge-agent-template) + - An easy to use Q&A with Agent template for ODA + +- [ODA HCM Template](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/oracle-digital-assistant/templates/hcm-ml) + - Multilingual HCM template to combine with Fusion HCM skill + +- [ODA AI Services](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/oracle-digital-assistant/templates/ai-services) + - Template skill to consume AI Services speech/vision/document/language + +- [ODA AI Agent with doc-groups](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/oracle-digital-assistant/templates/agent-doc-groups) + - Template skill to use AI Agent with different document groups + ## Cloud Coaching & Live Labs - [Cloud Coaching - Art of the possible Digital Assistant](https://www.youtube.com/watch?v=zPmfjuYQCGg&t=49s) diff --git a/ai/oracle-digital-assistant/templates/agent-doc-groups/LICENSE b/ai/oracle-digital-assistant/templates/agent-doc-groups/LICENSE new file mode 100644 index 000000000..2685138fc --- /dev/null +++ b/ai/oracle-digital-assistant/templates/agent-doc-groups/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. + +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. + +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. 
\ No newline at end of file
diff --git a/ai/oracle-digital-assistant/templates/agent-doc-groups/README.md b/ai/oracle-digital-assistant/templates/agent-doc-groups/README.md
new file mode 100644
index 000000000..c12f08eb3
--- /dev/null
+++ b/ai/oracle-digital-assistant/templates/agent-doc-groups/README.md
@@ -0,0 +1,27 @@
+# Oracle Digital Assistant AI Agent with document groups
+
+This template is an ODA skill for using AI Agent with different document groups.
+It restricts AI Agent to a specific group of documents when answering a prompt.
+There are several scenarios in which this can be used with ODA:
+- Limit a skill to a certain subject
+- Define document groups per intent/flow
+- Define document groups based on the user's role
+
+Reviewed: 31.10.2025
+
+Setup:
+- Import the mdAgent1 skill in ODA
+- In the skill configuration you can define one or more document groups
+- In the sample flow you can pass the document group in the API call
+- In AI Agent you define document groups by:
+  - [adding metadata when uploading docs](https://docs.oracle.com/en-us/iaas/Content/generative-ai-agents/RAG-tool-object-storage-guidelines.htm#add-metadata-header)
+  - setting the `type` metadata attribute in _all.metadata.json in the root of your Object Storage bucket
+  - (re)running the ingestion job in Knowledge Bases
+
+# License
+
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+
+See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
diff --git a/ai/oracle-digital-assistant/templates/agent-doc-groups/files/_all.metadata.json b/ai/oracle-digital-assistant/templates/agent-doc-groups/files/_all.metadata.json
new file mode 100644
index 000000000..05fca0f1c
--- /dev/null
+++ b/ai/oracle-digital-assistant/templates/agent-doc-groups/files/_all.metadata.json
@@ -0,0 +1,22 @@
+{
+  "HR-general.pdf": {
+    "metadataAttributes": {
+      "type": "HCM"
+    }
+  },
+  "ITsupport.pdf": {
+    "metadataAttributes": {
+      "type": "ICT"
+    }
+  },
+  "expenses.pdf": {
+    "metadataAttributes": {
+      "type": "HCM"
+    }
+  },
+  "SALES/revrec.pdf": {
+    "metadataAttributes": {
+      "type": "SALES"
+    }
+  }
+}
\ No newline at end of file
diff --git a/ai/oracle-digital-assistant/templates/agent-doc-groups/files/mdAgent1(1.0).zip b/ai/oracle-digital-assistant/templates/agent-doc-groups/files/mdAgent1(1.0).zip
new file mode 100644
index 000000000..d020e25ae
Binary files /dev/null and b/ai/oracle-digital-assistant/templates/agent-doc-groups/files/mdAgent1(1.0).zip differ
diff --git a/ai/oracle-digital-assistant/templates/ai-services/LICENSE b/ai/oracle-digital-assistant/templates/ai-services/LICENSE
new file mode 100644
index 000000000..2685138fc
--- /dev/null
+++ b/ai/oracle-digital-assistant/templates/ai-services/LICENSE
@@ -0,0 +1,35 @@
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+The Universal Permissive License (UPL), Version 1.0
+
+Subject to the condition set forth below, permission is hereby granted to any
+person obtaining a copy of this software, associated documentation and/or data
+(collectively the "Software"), free of charge and under any and all copyright
+rights in the Software, and any and all patent rights owned or freely
+licensable by each licensor hereunder covering either (i) the unmodified
+Software as contributed to or provided by such licensor, or (ii) the Larger
+Works (as defined below), to deal in both
+
+(a) the Software, and
+(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
+one is included with the Software (each a "Larger Work" to which the Software
+is contributed by such licensors),
+
+without restriction, including without limitation the rights to copy, create
+derivative works of, display, perform, and distribute the Software and make,
+use, sell, offer for sale, import, export, have made, and have sold the
+Software and the Larger Work(s), and to sublicense the foregoing rights on
+either these or other terms.
+
+This license is subject to the following condition:
+The above copyright notice and either this complete permission notice or at
+a minimum a reference to the UPL must be included in all copies or
+substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
\ No newline at end of file
diff --git a/ai/oracle-digital-assistant/templates/ai-services/README.md b/ai/oracle-digital-assistant/templates/ai-services/README.md
new file mode 100644
index 000000000..d02e5c786
--- /dev/null
+++ b/ai/oracle-digital-assistant/templates/ai-services/README.md
@@ -0,0 +1,18 @@
+# Oracle Digital Assistant AI Services Template
+
+This template is a skill for quickly setting up integration with AI Vision, AI Document Understanding, AI Language, and AI Speech from ODA.
+
+Reviewed: 31.10.2025
+
+Setup:
+- Import the mdAI_services skill in ODA
+- For each of the services, a flow is defined that connects to one of the AI Services
+- Make sure you change the connections to point to your own AI Service instances
+
+# License
+
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+
+See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
diff --git a/ai/oracle-digital-assistant/templates/ai-services/files/mdAI_services(1.0).zip b/ai/oracle-digital-assistant/templates/ai-services/files/mdAI_services(1.0).zip
new file mode 100644
index 000000000..fae1a4961
Binary files /dev/null and b/ai/oracle-digital-assistant/templates/ai-services/files/mdAI_services(1.0).zip differ
diff --git a/ai/oracle-digital-assistant/templates/concierge-agent-template/LICENSE b/ai/oracle-digital-assistant/templates/concierge-agent-template/LICENSE
new file mode 100644
index 000000000..2685138fc
--- /dev/null
+++ b/ai/oracle-digital-assistant/templates/concierge-agent-template/LICENSE
@@ -0,0 +1,35 @@
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+The Universal Permissive License (UPL), Version 1.0
+
+Subject to the condition set forth below, permission is hereby granted to any
+person obtaining a copy of this software, associated documentation and/or data
+(collectively the "Software"), free of charge and under any and all copyright
+rights in the Software, and any and all patent rights owned or freely
+licensable by each licensor hereunder covering either (i) the unmodified
+Software as contributed to or provided by such licensor, or (ii) the Larger
+Works (as defined below), to deal in both
+
+(a) the Software, and
+(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
+one is included with the Software (each a "Larger Work" to which the Software
+is contributed by such licensors),
+
+without restriction, including without limitation the rights to copy, create
+derivative works of, display, perform, and distribute the Software and make,
+use, sell, offer for sale, import, export, have made, and have sold the
+Software and the Larger Work(s), and to sublicense the foregoing rights on
+either these or other terms.
+
+This license is subject to the following condition:
+The above copyright notice and either this complete permission notice or at
+a minimum a reference to the UPL must be included in all copies or
+substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
\ No newline at end of file
diff --git a/ai/oracle-digital-assistant/templates/concierge-agent-template/README.md b/ai/oracle-digital-assistant/templates/concierge-agent-template/README.md
new file mode 100644
index 000000000..09aa6398a
--- /dev/null
+++ b/ai/oracle-digital-assistant/templates/concierge-agent-template/README.md
@@ -0,0 +1,20 @@
+# Oracle Digital Assistant Concierge with Agent Template
+
+This template is a skill for quickly setting up a Knowledge bot that falls back to AI Agent whenever no matching answer is found.
+
+Reviewed: 31.10.2025
+
+Concierge-Agent Template WebSDK
+Import the mdAgentConcierge skill in ODA to invoke AI Agent whenever no matching answer intent is found
+
+Setup:
+- First, set up the Concierge template as described [here](https://github.com/oracle-devrel/technology-engineering/tree/main/ai/oracle-digital-assistant/templates/concierge-template)
+- Next, import the mdAgentConcierge skill in ODA to invoke AI Agent whenever no matching answer intent is found
+
+# License
+
+Copyright (c) 2025 Oracle and/or its affiliates.
+
+Licensed under the Universal Permissive License (UPL), Version 1.0.
+
+See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details.
diff --git a/ai/oracle-digital-assistant/templates/concierge-agent-template/files/mdAgentConcierge(24.10).zip b/ai/oracle-digital-assistant/templates/concierge-agent-template/files/mdAgentConcierge(24.10).zip new file mode 100644 index 000000000..cc619f36f Binary files /dev/null and b/ai/oracle-digital-assistant/templates/concierge-agent-template/files/mdAgentConcierge(24.10).zip differ diff --git a/ai/oracle-digital-assistant/templates/concierge-template/README.md b/ai/oracle-digital-assistant/templates/concierge-template/README.md index abb023530..e0722e3e5 100644 --- a/ai/oracle-digital-assistant/templates/concierge-template/README.md +++ b/ai/oracle-digital-assistant/templates/concierge-template/README.md @@ -2,7 +2,7 @@ The Concierge Template is a skill for quickly setting up a Knowledge bot. -Reviewed: 22.09.2025 +Reviewed: 31.10.2025 Concierge Template WebSDK diff --git a/ai/oracle-digital-assistant/templates/concierge-template/files/mdAgentConcierge(24.10).zip b/ai/oracle-digital-assistant/templates/concierge-template/files/mdAgentConcierge(24.10).zip new file mode 100644 index 000000000..cc619f36f Binary files /dev/null and b/ai/oracle-digital-assistant/templates/concierge-template/files/mdAgentConcierge(24.10).zip differ diff --git a/app-dev/app-integration-and-automation/oracle-integration-cloud/01-oic-connectivity-agent/README.md b/app-dev/app-integration-and-automation/oracle-integration-cloud/01-oic-connectivity-agent/README.md index 895d6acec..a2dbd4632 100644 --- a/app-dev/app-integration-and-automation/oracle-integration-cloud/01-oic-connectivity-agent/README.md +++ b/app-dev/app-integration-and-automation/oracle-integration-cloud/01-oic-connectivity-agent/README.md @@ -32,4 +32,4 @@ Copyright (c) 2025 Oracle and/or its affiliates. Licensed under the Universal Permissive License (UPL), Version 1.0. -See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/app-dev/app-integration-and-automation/oracle-integration-cloud/01-oic-connectivity-agent/LICENSE) for more details. diff --git a/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/LICENSE b/app-dev/app-integration-and-automation/oracle-integration-cloud/06-oic-cicd-quickstart/LICENSE similarity index 99% rename from cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/LICENSE rename to app-dev/app-integration-and-automation/oracle-integration-cloud/06-oic-cicd-quickstart/LICENSE index 4427bb286..8dc7c0703 100644 --- a/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/LICENSE +++ b/app-dev/app-integration-and-automation/oracle-integration-cloud/06-oic-cicd-quickstart/LICENSE @@ -1,4 +1,3 @@ - Copyright (c) 2025 Oracle and/or its affiliates. The Universal Permissive License (UPL), Version 1.0 diff --git a/app-dev/app-integration-and-automation/oracle-integration-cloud/07-oic-rest-write&retrieve-blob-atpadapter/LICENSE b/app-dev/app-integration-and-automation/oracle-integration-cloud/07-oic-rest-write&retrieve-blob-atpadapter/LICENSE new file mode 100644 index 000000000..8dc7c0703 --- /dev/null +++ b/app-dev/app-integration-and-automation/oracle-integration-cloud/07-oic-rest-write&retrieve-blob-atpadapter/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. 
+ +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. + +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. \ No newline at end of file diff --git a/app-dev/app-integration-and-automation/oracle-integration-cloud/08-oic-cicd-vbs-shellsteps/LICENSE b/app-dev/app-integration-and-automation/oracle-integration-cloud/08-oic-cicd-vbs-shellsteps/LICENSE new file mode 100644 index 000000000..8dc7c0703 --- /dev/null +++ b/app-dev/app-integration-and-automation/oracle-integration-cloud/08-oic-cicd-vbs-shellsteps/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. + +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. 
+ +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. \ No newline at end of file diff --git a/cloud-infrastructure/compute-including-hpc/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files/requirements.txt b/cloud-infrastructure/compute-including-hpc/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files/requirements.txt index 687b8f428..39cb19627 100644 --- a/cloud-infrastructure/compute-including-hpc/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files/requirements.txt +++ b/cloud-infrastructure/compute-including-hpc/ai-infra-gpu/ai-infrastructure/rag-langchain-vllm-mistral/files/requirements.txt @@ -204,7 +204,7 @@ soupsieve==2.6 spider-client==0.0.27 SQLAlchemy==2.0.40 stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work -starlette==0.47.2 +starlette==0.49.1 striprtf==0.0.26 sympy==1.13.1 tenacity==9.1.2 diff --git a/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/README.md b/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/README.md deleted file mode 100644 index db73fe39c..000000000 --- a/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/README.md +++ /dev/null @@ -1,31 +0,0 @@ -# OSSA Checklist - -OSSA stands for 'Oracle Software Security Assurance', which encompasses the security measures taken by Oracle to ensure the safety of its software solutions. It's important to note that security is an ongoing process, so we continuously update and improve our OSSA checklist to uphold the highest standards of security. - -CIS Oracle Cloud Infrastructure Foundations Benchmark provides prescriptive guidance for establishing a secure -baseline configuration for the Oracle Cloud Infrastructure environment. The scope of this benchmark is to -establish a base level of security for anyone utilizing the included Oracle Cloud Infrastructure services. The current OSSA checklist is aligned with the CIS OCI Benchmark 1.2.0 - -Reviewed: 18.11.2024 - -# When to use this asset? - -For every Oracle Cloud Infrastructure implementation, best practices and guidelines need to be validated against the OSSA checklist. - -# How to use this asset? - -Validate the solution before project closure against OSSA checklist controls and seek justification against any non-compliant entry. - - -# Useful Links - -- Download the full [CIS OCI Benchmark from the link](https://www.cisecurity.org/benchmark/oracle_cloud) for the full details of the remediation steps. - - -# License - -Copyright (c) 2025 Oracle and/or its affiliates. - -Licensed under the Universal Permissive License (UPL), Version 1.0. - -See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. 
diff --git a/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/files/OSSA-checklist.xlsx b/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/files/OSSA-checklist.xlsx deleted file mode 100644 index 4fbbf168c..000000000 Binary files a/cloud-infrastructure/infrastructure-security/shared-assets/security-checklist/files/OSSA-checklist.xlsx and /dev/null differ diff --git a/cloud-infrastructure/networking/multicloud/README.md b/cloud-infrastructure/networking/multicloud/README.md index 06ed46c52..50c05ef5a 100644 --- a/cloud-infrastructure/networking/multicloud/README.md +++ b/cloud-infrastructure/networking/multicloud/README.md @@ -29,6 +29,7 @@ Reviewed: 10.10.2025 ## Reference Architectures +- [About DNS resolution in Oracle Database@Google Cloud](https://docs.oracle.com/en/solutions/dns-resolution-oracle-db-at-google-cloud/index.html) - [Learn About multicloud Architecture Framework](https://docs.oracle.com/en/solutions/learn-about-multicloud-arch-framework/index.html) - [Use AWS endpoint service to securely connect applications to Oracle Autonomous Database](https://docs.oracle.com/en/solutions/adb-endpoint-in-aws/index.html) - [Resolve DNS records seamlessly in OCI multicloud architectures](https://docs.oracle.com/en/solutions/resolve-dns-oci/index.html#GUID-84375E55-F207-4A72-84E8-C17CE0CE6BF3) diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/LICENSE b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/LICENSE new file mode 100644 index 000000000..f5385ce4e --- /dev/null +++ b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. + +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. + +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/README.md b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/README.md
new file mode 100644
index 000000000..02750f4b5
--- /dev/null
+++ b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/README.md
@@ -0,0 +1,61 @@
+# Implement Multicloud cross-region disaster recovery for Oracle Database@X (Azure, Google Cloud, AWS)
+
+This repository contains Terraform code that deploys all the networking needed to implement cross-region disaster recovery for Oracle Database@X (Azure, Google Cloud, AWS).
+It configures all the networking detailed in the following reference architectures in Oracle Cloud Infrastructure:
+
+- [Implement cross-region disaster recovery for Exadata Database on Oracle Database@Azure](https://docs.oracle.com/en/solutions/exadb-dr-on-db-azure/index.html)
+- [Implement cross-region disaster recovery for Exadata Database Service on Google Cloud](https://docs.oracle.com/en/solutions/exadb-dr-on-db-google-cloud/index.html)
+- [Oracle Database@AWS Achieves Gold MAA Certification for Maximum Availability Architecture](https://blogs.oracle.com/maa/post/oracle-databaseaws-achieves-gold-maa-certification)
+
+Reviewed: 10.11.2025
+
+## Architecture diagram
+
+![ExaDB-D-DR-DB-Azure](./images/exadb-dr-db-azure.png)
+
+## Requirements
+
+- An active Oracle Cloud Infrastructure account
+- An Oracle Exadata Database@X deployment in both the primary and the standby region
+- API Key Authentication for the OCI Terraform provider -> https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/terraformproviderconfiguration.htm
+- A compartment for deploying the infrastructure managed by this Terraform code
+- Terraform
+- Permission to manage virtual-network-family resources, granted to a specific group within a compartment in your Oracle Cloud Infrastructure tenancy
+- The OCIDs of the Exadata Database@X VCN Primary and VCN Standby
+- Non-overlapping IP address ranges for the Hub VCN Primary and Standby
+
+## Steps
+
+- Duplicate the "terraform.tfvars.template" file and rename it to "terraform.tfvars"
+- In the new "terraform.tfvars" file, complete the "OCI Tenancy Credentials" and "Oracle Cloud Infrastructure Variables" sections
+
+## Deployment
+
+Create the resources using the following commands:
+
+```bash
+terraform init
+terraform plan
+terraform apply
+```
+
+## Note: after successfully running terraform apply, the administrator must still configure the following
+
+- Route rule in the VCN Primary default route table with destination VCN Standby CIDR and target LPG (a Terraform sketch of both route rules follows this list)
+- Route rule in the VCN Standby default route table with destination VCN Primary CIDR and target LPG
+- Update VCN Primary security lists and NSG
+- Update VCN Standby security lists and NSG
+- Complete the Data Guard association.
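+
+As a rough illustration of the first two route-rule bullets above, the sketch below shows what the rules could look like if you chose to manage them from Terraform rather than adding them manually in the OCI Console. It is only a sketch under assumptions: the resource names are hypothetical, they are not part of this module, and the `oci_core_default_route_table` resource takes over the entire default route table of each Exadata VCN, so it is only safe when no other rules live there.
+
+```hcl
+# Hypothetical sketch, not part of this module: route each Exadata VCN to the
+# other region's CIDR via the LPGs created by this stack. Managing a default
+# route table from Terraform replaces ALL of its rules, which is why the
+# README leaves this as a manual post-apply step.
+resource "oci_core_default_route_table" "vcn_primary_default_rt" {
+  provider                   = oci.region-primary
+  manage_default_resource_id = data.oci_core_vcn.vcn_primary.default_route_table_id
+
+  route_rules {
+    network_entity_id = oci_core_local_peering_gateway.primary_local_peering_gateway.id
+    destination       = data.oci_core_vcn.vcn_standby.cidr_block
+    destination_type  = "CIDR_BLOCK"
+  }
+}
+
+resource "oci_core_default_route_table" "vcn_standby_default_rt" {
+  provider                   = oci.region-standby
+  manage_default_resource_id = data.oci_core_vcn.vcn_standby.default_route_table_id
+
+  route_rules {
+    network_entity_id = oci_core_local_peering_gateway.standby_local_peering_gateway.id
+    destination       = data.oci_core_vcn.vcn_primary.cidr_block
+    destination_type  = "CIDR_BLOCK"
+  }
+}
+```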
+ +Use the following command to destroy the deployment: + +```bash +terraform destroy +``` +## Acknowledgements +### Author +- Ricardo Anda, Oracle +### Contributors + - Emiel Ramakers, Oracle + - Ejaz Akram, Oracle + - Julien Silverston, Oracle diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/data.tf b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/data.tf new file mode 100644 index 000000000..fa803bc67 --- /dev/null +++ b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/data.tf @@ -0,0 +1,17 @@ +############################## +# ------ Region Primary ---- # +############################## + +data "oci_core_vcn" "vcn_primary" { + provider = oci.region-primary + vcn_id = var.vcn_primary_ocid +} + +############################## +# ------ Region Standby ---- # +############################## + +data "oci_core_vcn" "vcn_standby" { + provider = oci.region-standby + vcn_id = var.vcn_standby_ocid +} \ No newline at end of file diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/network-oci.tf b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/network-oci.tf new file mode 100644 index 000000000..241d0e44f --- /dev/null +++ b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/network-oci.tf @@ -0,0 +1,286 @@ +############################## +# ------ Region Primary ---- # +############################## + +# ------ Create Hub VCN Primary +resource "oci_core_vcn" "hub_vcn_primary" { + provider = oci.region-primary + compartment_id = var.compartment_ocid + display_name = var.hub_vcn_primary_name + cidr_block = var.hub_vcn_primary_cidr_block +} + +# ------ Create Hub VCN Primary Transit DRG Route Table +resource "oci_core_route_table" "hub_vcn_primary_transit_drg_rt" { + provider = oci.region-primary + compartment_id = var.compartment_ocid + vcn_id = oci_core_vcn.hub_vcn_primary.id + display_name = var.hub_vcn_primary_transit_drg_rt_name + route_rules { + network_entity_id = oci_core_local_peering_gateway.hub_primary_local_peering_gateway.id + destination = data.oci_core_vcn.vcn_primary.cidr_block + destination_type = "CIDR_BLOCK" + } +} + +# ------ Create Hub VCN Primary Transit LPG Route Table +resource "oci_core_route_table" "hub_vcn_primary_transit_lpg_rt" { + provider = oci.region-primary + compartment_id = var.compartment_ocid + vcn_id = oci_core_vcn.hub_vcn_primary.id + display_name = var.hub_vcn_primary_transit_drg_lpg_name + route_rules { + network_entity_id = oci_core_drg.primary_drg.id + destination = data.oci_core_vcn.vcn_standby.cidr_block + destination_type = "CIDR_BLOCK" + } +} + +# ------ Create Hub Primary LPG +resource "oci_core_local_peering_gateway" "hub_primary_local_peering_gateway" { + provider = oci.region-primary + compartment_id = var.compartment_ocid + vcn_id = oci_core_vcn.hub_vcn_primary.id + display_name = var.hub_primary_local_peering_gateway_name + peer_id = oci_core_local_peering_gateway.primary_local_peering_gateway.id + route_table_id = oci_core_route_table.hub_vcn_primary_transit_lpg_rt.id +} + +# ------ Create Primary LPG +resource "oci_core_local_peering_gateway" "primary_local_peering_gateway" { + provider = oci.region-primary + compartment_id = var.compartment_ocid + vcn_id = var.vcn_primary_ocid + display_name = var.primary_local_peering_gateway_name +} + +# ------ Create Primary DRG +resource "oci_core_drg" "primary_drg" { + provider = oci.region-primary + compartment_id = var.compartment_ocid + 
display_name = var.oci_primary_drg_name +} + +# ------ Create Primary DRG Hub VCN attachment +resource "oci_core_drg_attachment" "primary_drg_vcn_attachment" { + provider = oci.region-primary + vcn_id = oci_core_vcn.hub_vcn_primary.id + drg_id = oci_core_drg.primary_drg.id + drg_route_table_id = oci_core_drg_route_table.primary_drg_vcn_route_table.id + route_table_id = oci_core_route_table.hub_vcn_primary_transit_drg_rt.id + display_name = var.primary_drg_vcn_attachment_name +} + +# ------ Create Primary DRG VCN Route Table + +resource "oci_core_drg_route_table" "primary_drg_vcn_route_table" { + provider = oci.region-primary + display_name = var.primary_drg_vcn_route_table_name + drg_id = oci_core_drg.primary_drg.id + import_drg_route_distribution_id = oci_core_drg_route_distribution.primary_drg_route_distribution.id +} + +# ------ Create Primary DRG RPC Route Table + +resource "oci_core_drg_route_table" "primary_drg_rpc_route_table" { + provider = oci.region-primary + display_name = var.primary_drg_rpc_route_table_name + drg_id = oci_core_drg.primary_drg.id +} + +# ------ Create Primary DRG RPC Route Table rule + +resource "oci_core_drg_route_table_route_rule" "primary_drg_route_table_route_rule_primary_client_subnet" { + provider = oci.region-primary + drg_route_table_id = oci_core_drg_route_table.primary_drg_rpc_route_table.id + destination = data.oci_core_vcn.vcn_primary.cidr_block + destination_type = "CIDR_BLOCK" + next_hop_drg_attachment_id = oci_core_drg_attachment.primary_drg_vcn_attachment.id +} + +# ------ Create Primary DRG Route Distribution + +resource "oci_core_drg_route_distribution" "primary_drg_route_distribution" { + provider = oci.region-primary + distribution_type = "IMPORT" + display_name = var.primary_drg_route_distribution_name + drg_id = oci_core_drg.primary_drg.id +} + +# ------ Create Primary DRG Route Distribution Statement for RPC + +resource "oci_core_drg_route_distribution_statement" "primary_drg_route_distribution_rpc" { + provider = oci.region-primary + drg_route_distribution_id = oci_core_drg_route_distribution.primary_drg_route_distribution.id + action = "ACCEPT" + match_criteria { + match_type = "DRG_ATTACHMENT_TYPE" + attachment_type = "REMOTE_PEERING_CONNECTION" + } + priority = "1" +} + +# ------ Create Primary DRG RPC + +resource "oci_core_remote_peering_connection" "primary_drg_remote_peering_connection" { + provider = oci.region-primary + compartment_id = var.compartment_ocid + drg_id = oci_core_drg.primary_drg.id + display_name = var.primary_drg_remote_peering_connection_name + peer_id = oci_core_remote_peering_connection.standby_drg_remote_peering_connection.id + peer_region_name = var.standby_region +} + +# ------ Modify Primary DRG RT for RPC + +resource "oci_core_drg_attachment_management" "primary_drg_rpc_attachment" { + provider = oci.region-primary + attachment_type = "REMOTE_PEERING_CONNECTION" + compartment_id = var.compartment_ocid + network_id = oci_core_remote_peering_connection.primary_drg_remote_peering_connection.id + drg_id = oci_core_drg.primary_drg.id + drg_route_table_id = oci_core_drg_route_table.primary_drg_rpc_route_table.id +} + +############################## +# ------ Region Standby ---- # +############################## + +# ------ Create Hub VCN Standby +resource "oci_core_vcn" "hub_vcn_standby" { + provider = oci.region-standby + compartment_id = var.compartment_ocid + display_name = var.hub_vcn_standby_name + cidr_block = var.hub_vcn_standby_cidr_block +} + +# ------ Create Hub VCN Standby Transit DRG Route Table 
+resource "oci_core_route_table" "hub_vcn_standby_transit_drg_rt" { + provider = oci.region-standby + compartment_id = var.compartment_ocid + vcn_id = oci_core_vcn.hub_vcn_standby.id + display_name = var.hub_vcn_standby_transit_drg_rt_name + route_rules { + network_entity_id = oci_core_local_peering_gateway.hub_standby_local_peering_gateway.id + destination = data.oci_core_vcn.vcn_standby.cidr_block + destination_type = "CIDR_BLOCK" + } +} + +# ------ Create Hub VCN Standby Transit LPG Route Table +resource "oci_core_route_table" "hub_vcn_standby_transit_lpg_rt" { + provider = oci.region-standby + compartment_id = var.compartment_ocid + vcn_id = oci_core_vcn.hub_vcn_standby.id + display_name = var.hub_vcn_standby_transit_drg_lpg_name + route_rules { + network_entity_id = oci_core_drg.standby_drg.id + destination = data.oci_core_vcn.vcn_primary.cidr_block + destination_type = "CIDR_BLOCK" + } +} + +# ------ Create Hub Standby LPG +resource "oci_core_local_peering_gateway" "hub_standby_local_peering_gateway" { + provider = oci.region-standby + compartment_id = var.compartment_ocid + vcn_id = oci_core_vcn.hub_vcn_standby.id + display_name = var.hub_standby_local_peering_gateway_name + peer_id = oci_core_local_peering_gateway.standby_local_peering_gateway.id + route_table_id = oci_core_route_table.hub_vcn_standby_transit_lpg_rt.id +} + +# ------ Create Standby LPG +resource "oci_core_local_peering_gateway" "standby_local_peering_gateway" { + provider = oci.region-standby + compartment_id = var.compartment_ocid + vcn_id = var.vcn_standby_ocid + display_name = var.standby_local_peering_gateway_name +} + +# ------ Create Standby DRG +resource "oci_core_drg" "standby_drg" { + provider = oci.region-standby + compartment_id = var.compartment_ocid + display_name = var.oci_standby_drg_name +} + +# ------ Create Standby DRG Hub VCN attachment +resource "oci_core_drg_attachment" "standby_drg_vcn_attachment" { + provider = oci.region-standby + vcn_id = oci_core_vcn.hub_vcn_standby.id + drg_id = oci_core_drg.standby_drg.id + drg_route_table_id = oci_core_drg_route_table.standby_drg_vcn_route_table.id + route_table_id = oci_core_route_table.hub_vcn_standby_transit_drg_rt.id + display_name = var.standby_drg_vcn_attachment_name +} + + +# ------ Create Standby DRG VCN Route Table + +resource "oci_core_drg_route_table" "standby_drg_vcn_route_table" { + provider = oci.region-standby + display_name = var.standby_drg_vcn_route_table_name + drg_id = oci_core_drg.standby_drg.id + import_drg_route_distribution_id = oci_core_drg_route_distribution.standby_drg_route_distribution.id +} + +# ------ Create Standby DRG RPC Route Table + +resource "oci_core_drg_route_table" "standby_drg_rpc_route_table" { + provider = oci.region-standby + display_name = var.standby_drg_rpc_route_table_name + drg_id = oci_core_drg.standby_drg.id +} + +# ------ Create Standby DRG RPC Route Table rule + +resource "oci_core_drg_route_table_route_rule" "standby_drg_route_table_route_rule_primary_client_subnet" { + provider = oci.region-standby + drg_route_table_id = oci_core_drg_route_table.standby_drg_rpc_route_table.id + destination = data.oci_core_vcn.vcn_standby.cidr_block + destination_type = "CIDR_BLOCK" + next_hop_drg_attachment_id = oci_core_drg_attachment.standby_drg_vcn_attachment.id +} + +# ------ Create Standby DRG Route Distribution + +resource "oci_core_drg_route_distribution" "standby_drg_route_distribution" { + provider = oci.region-standby + distribution_type = "IMPORT" + display_name = var.standby_drg_route_distribution_name + 
drg_id = oci_core_drg.standby_drg.id +} + +# ------ Create Standby DRG Route Distribution Statement for RPC + +resource "oci_core_drg_route_distribution_statement" "standby_drg_route_distribution_rpc" { + provider = oci.region-standby + drg_route_distribution_id = oci_core_drg_route_distribution.standby_drg_route_distribution.id + action = "ACCEPT" + match_criteria { + match_type = "DRG_ATTACHMENT_TYPE" + attachment_type = "REMOTE_PEERING_CONNECTION" + } + priority = "1" +} + +# ------ Create Standby DRG RPC + +resource "oci_core_remote_peering_connection" "standby_drg_remote_peering_connection" { + provider = oci.region-standby + compartment_id = var.compartment_ocid + drg_id = oci_core_drg.standby_drg.id + display_name = var.standby_drg_remote_peering_connection_name +} + +# ------ Modify Standby DRG RT for RPC + +resource "oci_core_drg_attachment_management" "standby_drg_rpc_attachment" { + provider = oci.region-standby + attachment_type = "REMOTE_PEERING_CONNECTION" + compartment_id = var.compartment_ocid + network_id = oci_core_remote_peering_connection.standby_drg_remote_peering_connection.id + drg_id = oci_core_drg.standby_drg.id + drg_route_table_id = oci_core_drg_route_table.standby_drg_rpc_route_table.id +} \ No newline at end of file diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/provider.tf b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/provider.tf new file mode 100644 index 000000000..1ce78ad49 --- /dev/null +++ b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/provider.tf @@ -0,0 +1,28 @@ +terraform { + required_providers { + oci = { + source = "oracle/oci" + version = ">= 7.0.0" + } + } +} + +# ------ Initialize Oracle Terraform provider Primary Region +provider "oci" { + alias = "region-primary" + user_ocid = var.user + private_key_path = var.private_key_path + fingerprint = var.fingerprint + region = var.primary_region + tenancy_ocid = var.tenancy +} + +# ------ Initialize Oracle Terraform provider Standby Region +provider "oci" { + alias = "region-standby" + user_ocid = var.user + private_key_path = var.private_key_path + fingerprint = var.fingerprint + region = var.standby_region + tenancy_ocid = var.tenancy +} \ No newline at end of file diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/terraform.tfvars.template b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/terraform.tfvars.template new file mode 100644 index 000000000..c9f3dee83 --- /dev/null +++ b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/terraform.tfvars.template @@ -0,0 +1,20 @@ +############################ +# OCI Tenancy Credentials # +############################ + +primary_region="xx-yyyyyy-a" # Region identifier https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm +standby_region="xz-yyyccc-b" # Region identifier https://docs.oracle.com/en-us/iaas/Content/General/Concepts/regions.htm +user="ocid1.user.............." +tenancy="ocid1.tenancy............." +compartment_ocid="ocid1.compartment............." 
# Compartment designated for hosting all network-related resources +fingerprint="XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX:XX" +private_key_path= "xxxxxxxxxxxx.pem" + +########################################### +# Oracle Cloud Infrastructure Variables # +########################################### + +hub_vcn_primary_cidr_block = "X.X.Z.Z/X" +hub_vcn_standby_cidr_block = "X.X.B.B/X" +vcn_primary_ocid = "ocid1.vcn.xxxxxxxxxxxxx" +vcn_standby_ocid = "ocid1.vcn.yyyyyyyyyyyyy" \ No newline at end of file diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/variables.tf b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/variables.tf new file mode 100644 index 000000000..56532f5d1 --- /dev/null +++ b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/files/variables.tf @@ -0,0 +1,162 @@ +############################ +# OCI Tenancy Credentials # +############################ +variable "tenancy" { + description = "User Tenancy OCID" +} + +variable "compartment_ocid" { + description = "User Compartment OCID" +} + +variable "primary_region" { + description = "User Primary Region Value" +} + +variable "standby_region" { + description = "User Standby Region Value" +} + +variable "user" { + description = "User OCID" +} + +variable "fingerprint" { + description = "User Private Key Fingerprint" +} + +variable "private_key_path" { + description = "User Private Key Path" +} + +########################################### +# Oracle Cloud Infrastructure Variables # +########################################### + +variable "hub_vcn_primary_name" { + description = "Hub VCN Primary name" + default = "hub_vcn_primary" +} + +variable "hub_vcn_primary_cidr_block" { + description = "Hub VCN Primary CIDR" + default = "10.15.0.0/24" +} + +variable "hub_vcn_primary_transit_drg_rt_name" { + description = "Hub VCN Primary Transit DRG RT name" + default = "hub_vcn_primary_transit_drg_rt" +} + +variable "hub_vcn_primary_transit_drg_lpg_name" { + description = "Hub VCN Primary Transit LPG RT name" + default = "hub_vcn_primary_transit_lpg_rt" +} + +variable "vcn_primary_ocid" { + description = "VCN Primary ocid" +} + +variable "hub_primary_local_peering_gateway_name" { + description = "Hub Primary Local Peering Gateway name " + default = "hub_primary_lpg" +} + +variable "primary_local_peering_gateway_name" { + description = "Primary Local Peering Gateway name " + default = "primary_lpg" +} + +variable "oci_primary_drg_name" { + description = "Primary DRG name" + default = "primary_drg" +} + +variable "primary_drg_vcn_attachment_name" { + description = "Primary DRG Hub VCN attachment name" + default = "primary_drg_hub_vcn_att" +} + +variable "hub_vcn_standby_name" { + description = "Hub VCN Standby name" + default = "hub_vcn_standby" +} + +variable "hub_vcn_standby_cidr_block" { + description = "Hub VCN Standby CIDR" + default = "10.16.0.0/24" +} + +variable "hub_vcn_standby_transit_drg_rt_name" { + description = "Hub VCN Standby Transit DRG RT name" + default = "hub_vcn_standby_transit_drg_rt" +} + +variable "hub_vcn_standby_transit_drg_lpg_name" { + description = "Hub VCN Standby Transit LPG RT name" + default = "hub_vcn_standby_transit_lpg_rt" +} + +variable "vcn_standby_ocid" { + description = "VCN Standby ocid" +} + +variable "hub_standby_local_peering_gateway_name" { + description = "Hub Standby Local Peering Gateway name " + default = "hub_standby_lpg" +} + +variable "standby_local_peering_gateway_name" { + description = "Standby Local Peering 
Gateway name " + default = "standby_lpg" +} + +variable "oci_standby_drg_name" { + description = "Standby DRG name" + default = "standby_drg" +} + +variable "standby_drg_vcn_attachment_name" { + description = "Standby DRG Hub VCN attachment name" + default = "standby_drg_hub_vcn_att" +} + +variable "primary_drg_vcn_route_table_name" { + description = "Primary DRG VCN RT name" + default = "primary_drg_vcn_rt" +} + +variable "primary_drg_rpc_route_table_name" { + description = "Primary DRG RPC RT name" + default = "primary_drg_rpc_rt" +} + +variable "standby_drg_vcn_route_table_name" { + description = "Standby DRG VCN RT name" + default = "standby_drg_vcn_rt" +} + +variable "standby_drg_rpc_route_table_name" { + description = "Standby DRG RPC RT name" + default = "standby_drg_rpc_rt" +} + +variable "primary_drg_route_distribution_name" { + description = "Primary DRG Route Distribution name" + default = "primary_drg_rd" +} + +variable "primary_drg_remote_peering_connection_name" { + description = "Primary DRG Remote Peering Connection name" + default = "primary_drg_rpc" +} + +variable "standby_drg_route_distribution_name" { + description = "Standby DRG Route Distribution name" + default = "standby_drg_rd" +} + +variable "standby_drg_remote_peering_connection_name" { + description = "Standby DRG Remote Peering Connection name" + default = "standby_drg_rpc" +} \ No newline at end of file diff --git a/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/images/exadb-dr-db-azure.png b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/images/exadb-dr-db-azure.png new file mode 100644 index 000000000..3081b1d7a Binary files /dev/null and b/cloud-infrastructure/networking/multicloud/multicloud-dr-terraform/images/exadb-dr-db-azure.png differ diff --git a/cloud-infrastructure/private-cloud-and-edge/README.md b/cloud-infrastructure/private-cloud-and-edge/README.md index 013a32606..26859c36c 100644 --- a/cloud-infrastructure/private-cloud-and-edge/README.md +++ b/cloud-infrastructure/private-cloud-and-edge/README.md @@ -1,6 +1,6 @@ # Private Cloud and Edge -Reviewed: 16.10.2025 +Updated: 7.11.2025 Oracle’s distributed cloud delivers the benefits of cloud with greater control and flexibility. Oracle’s distributed cloud lineup includes: @@ -15,9 +15,10 @@ This section focuses on Dedicated Cloud and Hybrid Cloud services. 
- [Oracle Compute Cloud@Customer on oracle.com](https://www.oracle.com/cloud/compute/cloud-at-customer/) - [Oracle Compute Cloud@Customer Isolated on oracle.com](https://www.oracle.com/cloud/compute/cloud-at-customer-isolated/) +- [Oracle Roving Edge Infrastructure on oracle.com](https://www.oracle.com/cloud/roving-edge-infrastructure/) - [Oracle Dedicated Region on oracle.com](https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/) - [Oracle Alloy on oracle.com](https://www.oracle.com/cloud/alloy/) -- [Oracle Cloud Isolated Region](https://www.oracle.com/government/govcloud/isolated/) +- [Oracle Cloud Isolated Region on oracle.com](https://www.oracle.com/government/govcloud/isolated/) ## License diff --git a/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/blogs/README.md b/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/blogs/README.md index 747276262..7db0a0718 100644 --- a/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/blogs/README.md +++ b/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/blogs/README.md @@ -1,8 +1,10 @@ # Compute Cloud@Customer -Reviewed: 16.10.2025 +Updated: 10.11.2025 -## Blogs, Press Releases, News Articles, Videos & Podcasts +Collection of blogs and other publications (Press Releases, News Articles, Videos & Podcasts) relevant to Compute Cloud@Customer (CCATC). The collection will continue to be updated as needed. + +## Blogs and other publications Blogs / Press Releases 1. [Oracle Recognized as a Leader in the 2025 Gartner® Magic Quadrant™ for Distributed Hybrid Infrastructure](https://www.oracle.com/news/announcement/oracle-recognized-as-a-leader-in-the-2025-gartner-magic-quadrant-for-distributed-hybrid-infrastructure-2025-09-10/) (10/Sep/2025) @@ -30,8 +32,10 @@ Blogs / Press Releases 23. [Oracle Linux Powers Oracle Compute Cloud@Customer](https://blogs.oracle.com/linux/post/oracle-linux-powers-oracle-compute-cloud-at-customer) (9/Aug/2023) 24. [Oracle Compute Cloud@Customer](https://blogs.oracle.com/cloud-infrastructure/post/oracle-compute-cloud-at-customer) (9/Aug/2023) 25. [Druid Software Delivers 3GPP-Compliant Enterprise Core on Oracle Cloud](https://blogs.oracle.com/cloud-infrastructure/post/druid-software-3gpp-compliant-enterprise-core) (17/Apr/2023) -26. [Unlocking 5G with End-to-End Distributed Cloud](https://blogs.oracle.com/cloud-infrastructure/post/unlocking-5g-with-end-to-end-distributed-cloud) (21/Jun/2022) -27. [Compute Cloud@Customer: Hybrid Cloud Compute for Your Data Center](https://blogs.oracle.com/infrastructure/post/compute-cloud-at-customer-hybrid-cloud-compute-for-your-data-center) (13/Jun/2022) +26. [Pacemaker / Corosync fencing on Oracle Private Cloud Appliance X9-2](https://blogs.oracle.com/oracle-systems/post/pacemaker-corosync-fencing-on-oracle-private-cloud-appliance-x9-2) (8/Oct/2022) +27. [Unlocking 5G with End-to-End Distributed Cloud](https://blogs.oracle.com/cloud-infrastructure/post/unlocking-5g-with-end-to-end-distributed-cloud) (21/Jun/2022) +28. [Announcing: Preview of Compute Cloud@Customer](https://blogs.oracle.com/cloud-infrastructure/post/dedicated-regions-now-available-with-smaller-footprint-and-lower-price-point) (21/Jun/2022) +29. 
[Compute Cloud@Customer: Hybrid Cloud Compute for Your Data Center](https://blogs.oracle.com/infrastructure/post/compute-cloud-at-customer-hybrid-cloud-compute-for-your-data-center) (13/Jun/2022) ## License diff --git a/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/mos-notes/README.md b/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/mos-notes/README.md index 167827a2f..3a123dd14 100644 --- a/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/mos-notes/README.md +++ b/cloud-infrastructure/private-cloud-and-edge/compute-cloud-at-customer/mos-notes/README.md @@ -1,6 +1,6 @@ # Compute Cloud@Customer Support Notes -Updated: 17.6.2025 +Updated: 10.11.2025 Collection of My Cloud Oracle Support (MCOS) & My Oracle Support (MOS) notes relevant to Compute Cloud@Customer (CCATC) and Private Cloud Appliance (PCA) notes that are relevant to CCATC. The collection will continue to be updated as needed and is intended for use by CCATC administrators or anyone working on the CCATC service. @@ -24,6 +24,7 @@ Collection of My Cloud Oracle Support (MCOS) & My Oracle Support (MOS) notes rel - [[CCATC] How to Update a Compute Cloud@Customer x509 Certificate in OCI's IDP (Doc ID 3056675.1)](https://support.oracle.com/epmos/faces/DocumentDisplay?id=3056675.1) - [[PCA 3.x] How to Add a Secondary VNIC to an Instance (Doc ID 3059065.1)](https://support.oracle.com/epmos/faces/DocumentDisplay?id=3059065.1) - [OCI Edge Cloud Supported Database Releases (Doc ID 3061220.1)](https://support.oracle.com/epmos/faces/DocumentDisplay?id=3061220.1) +- [[PCA 3.x] How to create a PCS cluster between instances (Doc ID 3080424.1)](https://support.oracle.com/epmos/faces/DocumentDisplay?id=3080424.1) ## License diff --git a/cloud-infrastructure/private-cloud-and-edge/dedicated-cloud/README.md b/cloud-infrastructure/private-cloud-and-edge/dedicated-cloud/README.md new file mode 100644 index 000000000..cca3b7086 --- /dev/null +++ b/cloud-infrastructure/private-cloud-and-edge/dedicated-cloud/README.md @@ -0,0 +1,23 @@ +# Dedicated Cloud + +Updated: 10.11.2025 + +Oracle Dedicated Cloud includes two distinct deployment models: Oracle Dedicated Region and Oracle Alloy + +Dedicated Region is a rapidly deployed full-stack OCI region with an expandable footprint starting as small as 3 Racks, optimized for diverse environments. The pre-configured modular infrastructure and streamlined service design enables a seamless experience while accelerating time-to-market. Access more than 150 OCI services for a complete cloud journey, from migration to modernization to innovation. Run nearly any workload, even mission-critical and AI workloads, and address stringent sovereignty and regulatory requirements. + +Oracle Alloy is a complete cloud infrastructure platform that enables partners to become cloud providers and offer a full range of cloud services to expand their businesses. Partners control the commercial and customer experience of Oracle Alloy and can customize and extend it to address their specific market needs. Oracle Alloy is designed to give partners better control over the change management process and operations to fulfill regulatory and sovereignty requirements. + +## Useful Links + +- [Oracle Dedicated Region on oracle.com](https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/) +- [Oracle Alloy on oracle.com](https://www.oracle.com/cloud/alloy/) + + +## License + +Copyright (c) 2025 Oracle and/or its affiliates. 
+ +Licensed under the Universal Permissive License (UPL), Version 1.0. + +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE.txt) for more details. diff --git a/cloud-infrastructure/private-cloud-and-edge/dedicated-cloud/blogs/README.md b/cloud-infrastructure/private-cloud-and-edge/dedicated-cloud/blogs/README.md new file mode 100644 index 000000000..7083db44d --- /dev/null +++ b/cloud-infrastructure/private-cloud-and-edge/dedicated-cloud/blogs/README.md @@ -0,0 +1,53 @@ +# Dedicated Cloud + +Updated: 10.11.2025 + +Collection of blogs and other publications (Press Releases, News Articles, Videos & Podcasts) relevant to Oracle Dedicated Cloud (Dedicated Region and Alloy). The collection will continue to be updated as needed. + +## Dedicated Cloud - Common + +1. [Explore the Future of Cloud with OCI Dedicated Region and Oracle Alloy at AI World](https://blogs.oracle.com/cloud-infrastructure/post/ai-world-2025-dedicated-region) (14/Oct/2025) +2. [Enhancing Cloud Security and Sovereignty with Thales & Oracle Alloy - Post-Webinar Recap](https://blogs.oracle.com/cloud-infrastructure/post/oracle-alloy-thales-cloud-security-sovereignty-webinar-recap) (29/Sep/2025) +3. [Unveiling the Future of Cloud Computing: A Sneak Peek into Dedicated Cloud at Oracle AI World 2025](https://blogs.oracle.com/cloud-infrastructure/post/dedicated-cloud-at-oracle-ai-world-2025) (27/Aug/2025) +4. [Tackle your sovereignty obligations with Oracle Cloud Infrastructure](https://blogs.oracle.com/cloud-infrastructure/post/tackle-sovereignty-obligations-with-oci) (22/Apr/2025) +5. [Announcing New AI Infrastructure Capabilities with NVIDIA Blackwell for Public, On-Premises, and Service Provider Clouds](https://blogs.oracle.com/cloud-infrastructure/post/supercluster-nvidia-blackwell-dedicated-alloy) (27/Mar/2025) +6. [https://blogs.oracle.com/cloud-infrastructure/post/oracle-and-nvidia-deliver-sovereign-ai-anywhere](https://blogs.oracle.com/cloud-infrastructure/post/oracle-and-nvidia-deliver-sovereign-ai-anywhere) (18/Mar/2025) +7. [Oracle recognized as a Leader for a second year in the/2024 Gartner Magic Quadrant for Distributed Hybrid Infrastructure](https://blogs.oracle.com/cloud-infrastructure/post/2024-gartner-mq-distributed-hybrid-infrastructure) (10/Oct/2024) +8. [Oracle sovereign cloud solutions: Implement more personnel requirements](https://blogs.oracle.com/cloud-infrastructure/post/oracle-sovereign-cloud-operations-support-personnel-requirements) (27/Jun/2023) +9. [Solving Key Industry Use Cases with Oracle Dedicated Cloud](https://blogs.oracle.com/cloud-infrastructure/post/key-industry-use-cases-oracle-dedicated-cloud) (14/Mar/2025) +10. [Oracle recognized as Leader in 2023 Gartner Magic Quadrant for Distributed Hybrid Infrastructure](https://blogs.oracle.com/cloud-infrastructure/post/2023-gartner-magic-quadrant-dist-hybrid-infra) (6/Nov/2023) +11. [Delivering full public cloud services without risk to regulated enterprises and public sector organizations](https://blogs.oracle.com/cloud-infrastructure/post/delivering-full-public-cloud-services) (17/Apr/2023) +12. [Oracle sovereign cloud solutions: Using realms for enhanced cloud isolation](https://blogs.oracle.com/cloud-infrastructure/post/sovereign-cloud-realms-enhanced-isolation) (10/Apr/2023) +13. [Oracle sovereign cloud solutions: Choose where your data is located](https://blogs.oracle.com/cloud-infrastructure/post/oracle-sovereign-cloud-choose-where-data-located) (2/Mar/2023) +14. 
[OCI’s distributed cloud: meeting customer needs beyond the public cloud](https://blogs.oracle.com/cloud-infrastructure/post/oci-distributed-cloud-meeting-customer-needs) (18/Oct/2022) + +## Dedicated Cloud - Dedicated Region + +1. [Meeting Customer Requirements for On-Premises Public Cloud](https://blogs.oracle.com/cloud-infrastructure/post/better-than-the-competition-meeting-customer-requirements-for-on-premises-public-cloud) (21/Jun/2022) +2. [OCI Dedicated Regions now available with smaller footprint and lower price point](https://blogs.oracle.com/cloud-infrastructure/post/dedicated-regions-now-available-with-smaller-footprint-and-lower-price-point) (21/Jun/2022) +3. [Oracle Dedicated Region - A complete cloud solution in your data center](https://blogs.oracle.com/cloud-infrastructure/post/oracle-dedicated-region-a-complete-cloud-solution-in-your-data-center) (21/Jun/2022) +4. [Oracle Dedicated Region Cloud@Customer meets migration requirements better than AWS Outposts](https://blogs.oracle.com/cloud-infrastructure/post/oracle-dedicated-region-cloudcustomer-meets-migration-requirements-better-than-aws-outposts) (20/Aug/2021) +5. [Oracle Dedicated Region Cloud@Customer meets bare metal and VMware requirements better than AWS Outposts](https://blogs.oracle.com/cloud-infrastructure/post/oracle-dedicated-region-cloudcustomer-meets-bare-metal-and-vmware-requirements-better-than-aws-outposts) (12/Jul/2021) +6. [Unlocking 5G with End-to-End Distributed Cloud](https://blogs.oracle.com/cloud-infrastructure/post/unlocking-5g-with-end-to-end-distributed-cloud) (21/Jun/2022) +7. [Does AWS Outposts' Pricing/Performance Match Oracle Dedicated Region Cloud@Customer?](https://blogs.oracle.com/cloud-infrastructure/post/oracle-dedicated-region-cloudcustomers-performance-and-pricing-advantages-over-aws-outposts) (7/May/2021) +8. [Oracle’s distinct approach on hybrid and multicloud](https://blogs.oracle.com/cloud-infrastructure/post/oracles-distinct-approach-on-hybrid-and-multicloud) (30/Apr/2021) +9. [Oracle Dedicated Region Cloud@Customer meets data sovereignty and security requirements better than AWS Outposts](https://blogs.oracle.com/cloud-infrastructure/post/oracle-dedicated-region-cloudcustomer-meets-data-sovereignty-and-security-requirements-better-than-aws-outposts) (31/Mar/2021) +10. [Six reasons customers choose Oracle Dedicated Region Cloud@Customer over AWS Outposts](https://blogs.oracle.com/cloud-infrastructure/post/six-reasons-customers-choose-oracle-dedicated-region-cloudcustomer-over-aws-outposts) (9/Mar/2021) +11. [Oracle Dedicated Region Cloud@Customer: Experience the cloud anywhere](https://blogs.oracle.com/cloud-infrastructure/post/oracle-dedicated-region-cloudcustomer-experience-the-cloud-anywhere) (4/Mar/2021) +12. [Resetting the boundaries of hybrid cloud flexibility and control](https://blogs.oracle.com/cloud-infrastructure/post/resetting-the-boundaries-of-hybrid-cloud-flexibility-and-control) (9/Feb/2021) +13. [First Principles: Shrink wrap the cloud scale](https://blogs.oracle.com/cloud-infrastructure/post/first-principles-shrink-wrap-the-cloud-scale) (1/Dec/2020) +14. [Announcing Oracle Dedicated Region Cloud@Customer and Oracle Autonomous Database on Exadata Cloud@Customer](https://blogs.oracle.com/cloud-infrastructure/post/announcing-oracle-dedicated-region-cloudcustomer-and-oracle-autonomous-database-on-exadata-cloudcustomer) (8/Jul/2020) + +## Dedicated Cloud - Alloy + +1. 
[Enhancing Cloud Security and Sovereignty with Thales & Oracle Alloy - Post-Webinar Recap](https://blogs.oracle.com/cloud-infrastructure/post/oracle-alloy-thales-cloud-security-sovereignty-webinar-recap) (29/Sep/2025) +2. [The Latest Updates in Oracle Alloy - Expanding Control & Flexibility](https://blogs.oracle.com/cloud-infrastructure/post/the-latest-updates-in-oracle-alloy) (23/May/2025) +3. [Delivering ongoing innovation and superior experiences with Oracle Alloy](https://blogs.oracle.com/cloud-infrastructure/post/delivering-ongoing-innovation-oracle-alloy) (19/Sep/2023) + +## License + +Copyright (c) 2025 Oracle and/or its affiliates. + +Licensed under the Universal Permissive License (UPL), Version 1.0. + +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE.txt) for more details. diff --git a/cloud-infrastructure/virtualization-solutions/README.md b/cloud-infrastructure/virtualization-solutions/README.md index 59bc38298..2405057ab 100644 --- a/cloud-infrastructure/virtualization-solutions/README.md +++ b/cloud-infrastructure/virtualization-solutions/README.md @@ -7,7 +7,7 @@ Cloud Virtualization Solutions area focuses on providing deep technical guidance - Oracle Secure Desktops - Oracle Cloud Migrations (OCM) -Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # License diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/README.md index 130a19dc1..6276be937 100644 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/README.md +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/README.md @@ -3,7 +3,7 @@ Red Hat OpenShift can be hosted on OCI as a self-run platform. Oracle provides terraform templates for easy implementation and platform integration. 
-Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # Useful Links @@ -16,14 +16,12 @@ Reviewed: 06.11.2024 - [Use multiple and floating Egress IP(s) by leveraging OCI VLANs](openshift-floating-egress-ip/README.md) - [Enable Seamless Access to Red Hat OpenShift Container Platform on OCI from On-Premises to VCNs in the Same Region](https://docs.oracle.com/en/learn/oci-openshift-vcn/) +- [Deploying Red Hat OpenShift on OCI using Assisted Installer Method](https://github.com/oracle-devrel/technology-engineering/blob/main/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/index.md) ## OpenShift Day-2 - Operations - [Using OCI Object storage for the OpenShift Internal Registry](enable-image-registry/README.md) -- [Adding extra worker nodes to your Assisted installed cluster](assisted-cluster-add-host/README.md) - -## Videos - -- [Red Hat OpenShift on Oracle Cloud Infrastructure ](https://www.youtube.com/watch?v=_3WMrRVRD1o) +- [Add a New Worker Node to an Existing OpenShift Cluster Using the Assisted Installer](https://github.com/oracle-devrel/technology-engineering/blob/main/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/index.md) + # Reusable Assets diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/README.md new file mode 100644 index 000000000..d58c752f7 --- /dev/null +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/README.md @@ -0,0 +1,25 @@ +# OpenShift on OCI - Adding new host to an existing cluster using Assisted Installer + +This guide explains how to add new worker or compute nodes to an existing Red Hat OpenShift cluster on Oracle Cloud Infrastructure (OCI) using the Assisted Installer. It outlines prerequisites, OCI network configurations, and the automated process for integrating new hosts. The document helps engineers efficiently scale OpenShift clusters on OCI while ensuring seamless integration, consistency, and performance. + +Reviewed: 12.11.2025 + +# When to use this asset? + +Use this document when expanding an existing Red Hat OpenShift cluster on Oracle Cloud Infrastructure (OCI) by adding new worker or compute nodes using the Assisted Installer. It serves as a practical guide for architects and engineers to plan and execute cluster scaling operations securely and efficiently, ensuring new nodes integrate seamlessly with existing networking, storage, and OCI-native services. + +# Instructions for Utilizing This Asset + +Use this document as a foundation for planning and executing the addition of new worker or compute nodes to an existing Red Hat OpenShift cluster on Oracle Cloud Infrastructure (OCI) using the Assisted Installer. It provides example workflows, architecture references, and configuration guidelines that can be tailored to match specific environments and customer requirements, ensuring seamless integration of new hosts into the existing cluster infrastructure. + +# Conclusion + +Adding new hosts to an existing Red Hat OpenShift cluster on Oracle Cloud Infrastructure (OCI) using the Assisted Installer simplifies and automates the process of scaling cluster capacity. This approach ensures seamless integration of additional compute resources while maintaining consistency, performance, and reliability across the environment. 
Contributors are encouraged to share feedback, raise questions, and collaborate to further enhance the scalability experience and operational efficiency of OpenShift on OCI. + +# License + +Copyright (c) 2025 Oracle and/or its affiliates. + +Licensed under the Universal Permissive License (UPL), Version 1.0. + +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/0.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/0.png new file mode 100644 index 000000000..df957e81b Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/0.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/1.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/1.png new file mode 100644 index 000000000..6195b1b05 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/1.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/10.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/10.png new file mode 100644 index 000000000..2b1c62e09 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/10.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/11.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/11.png new file mode 100644 index 000000000..3d5c90829 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/11.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/12.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/12.png new file mode 100644 index 000000000..c285add09 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/12.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/13.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/13.png new file mode 100644 index 000000000..9d61bd413 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/13.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/14.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/14.png new file mode 100644 index 000000000..f0ec4380d Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/14.png differ diff --git 
a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/15.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/15.png new file mode 100644 index 000000000..a0a018ba0 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/15.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/16.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/16.png new file mode 100644 index 000000000..e76ae82a5 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/16.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/17.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/17.png new file mode 100644 index 000000000..9f7e390d9 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/17.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/18.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/18.png new file mode 100644 index 000000000..d76c90467 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/18.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/19.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/19.png new file mode 100644 index 000000000..34eb04465 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/19.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/2.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/2.png new file mode 100644 index 000000000..a513fb135 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/2.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/20.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/20.png new file mode 100644 index 000000000..a56fa3492 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/20.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/21.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/21.png new file mode 100644 index 000000000..d5f0848f1 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/21.png differ diff --git 
a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/22.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/22.png new file mode 100644 index 000000000..3722d2aff Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/22.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/23.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/23.png new file mode 100644 index 000000000..9c4b51e02 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/23.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/3.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/3.png new file mode 100644 index 000000000..47af4ddd4 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/3.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/4.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/4.png new file mode 100644 index 000000000..225f24776 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/4.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/5.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/5.png new file mode 100644 index 000000000..3ab663b3c Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/5.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/6.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/6.png new file mode 100644 index 000000000..0024a8bf0 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/6.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/7.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/7.png new file mode 100644 index 000000000..c14ddb1e8 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/7.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/8.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/8.png new file mode 100644 index 000000000..752249a01 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/8.png differ diff --git 
a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/9.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/9.png new file mode 100644 index 000000000..086c6d654 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/images/9.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/index.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/index.md new file mode 100755 index 000000000..5907564be --- /dev/null +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/add-new-host-assisted-method/files/index.md @@ -0,0 +1,162 @@ + +# **Add a New Worker Node to an Existing OpenShift Cluster Using the Assisted Installer** + + +  + + +
## Overview

In an earlier [tutorial](https://github.com/oracle-devrel/technology-engineering/blob/main/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/index.md), we discussed how to deploy OpenShift on OCI using the Assisted Installer method. In this tutorial, we cover how to add an additional worker node to an existing OpenShift cluster.

When adding a new node, you can choose between a virtual machine-based node and a bare metal node. The list of supported shapes that can be added to the cluster is available in the [Oracle Documentation](https://docs.oracle.com/en-us/iaas/Content/openshift-on-oci/overview.htm#supported-shapes).
## Architecture

Here is a sample architecture:

![OpenShift Architecture](./images/0.png "Architecture Diagram")
## Before we begin

Make sure the following are in place:

- An active OpenShift cluster installed via the Assisted Installer method
- The Red Hat account that was used for the cluster deployment
- Access to the OCI Console with appropriate privileges; refer to the Oracle [documentation](https://docs.oracle.com/en-us/iaas/Content/openshift-on-oci/install-prereq.htm#install-prereq-account)
- An SSH key pair (a sample command for generating one follows this list)
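If you do not already have an SSH key pair, you can generate one locally. A minimal sketch; the key type, file name, and comment below are just example values:

```bash
# Generate an SSH key pair to embed in the discovery ISO (example values)
ssh-keygen -t ed25519 -f ~/.ssh/openshift_worker -C "openshift-worker-node"

# The public key is what you paste into the Red Hat console later
cat ~/.ssh/openshift_worker.pub
```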
## High Level Steps

1. Generate the discovery ISO from the Red Hat console
2. Upload the ISO to OCI Object Storage
3. Download the add-node Terraform stack from the oci-openshift GitHub repository
4. Run the stack in OCI Resource Manager
5. Finish the node installation from the Red Hat console
6. Approve and add the node to the cluster
## Step #1 – Generate the installation ISO

1. Log in to the Red Hat console at console.redhat.com.
2. Open the navigation menu and click on Red Hat OpenShift.
3. Click on Cluster List, then click on the cluster name.
4. Click on the Add hosts tab, then click the Add host button.

![OpenShift Architecture](./images/1.png "Node Addition")

5. Click Next on the cluster details page (choose x86_64).

6. On the Generate Discovery ISO tab, paste your SSH public key and click the Generate Discovery ISO button.

7. Click the Download Discovery ISO button; the ISO download starts.

8. Keep the window open, as we will come back to it later.

![OpenShift Architecture](./images/2.png "Node Addition")

![OpenShift Architecture](./images/3.png "Node Addition")

![OpenShift Architecture](./images/4.png "Node Addition")
## Step #2 – Upload the ISO to Object Storage

1. Log in to the OCI tenancy. From the navigation menu, click on Storage, followed by Buckets.
2. Select the compartment (vaibhav-demo in my example). Create a bucket if you don't have one, or select the bucket where you wish to upload the ISO.
3. Under the Objects tab, click on Upload Objects and upload the ISO downloaded earlier.

![OpenShift Architecture](./images/5.png "Node Addition")

4. Click on the three dots to the right of the uploaded object and click on Create Pre-Authenticated Request.

5. Keep the default values and click the Create Pre-Authenticated Request button.

6. Keep the resulting link handy, as it will be needed later. The same upload and PAR creation can also be done from the command line, as shown in the sketch after this step.

![OpenShift Architecture](./images/6.png "Node Addition")
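If you prefer the CLI over the console, the upload and pre-authenticated request can be created with the OCI CLI. A minimal sketch; the bucket name, PAR name, ISO file name, and expiry date are assumptions to adjust for your environment:

```bash
# Upload the discovery ISO to an existing bucket (example names)
oci os object put \
  --bucket-name openshift-images \
  --file ./discovery_image_add-node.iso \
  --name discovery_image_add-node.iso

# Create a read-only pre-authenticated request for the uploaded object
oci os preauth-request create \
  --bucket-name openshift-images \
  --name add-node-iso-par \
  --object-name discovery_image_add-node.iso \
  --access-type ObjectRead \
  --time-expires "2026-12-31T00:00:00Z"

# The command returns an "access-uri"; prefix it with your region's
# Object Storage endpoint to get the full PAR link used in Step #4.
```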
## Step #3 – Download the Terraform Stack for node addition

1. Download the latest Terraform stack for node addition from https://github.com/oracle-quickstart/oci-openshift/releases (a command-line example follows below).
2. At the time of writing this tutorial, 1.4.2 is the latest release. Click on the file name and the download should start.

![OpenShift Architecture](./images/7.png "Node Addition")
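The same zip can be fetched from the command line. A sketch only; the release tag and asset file name below are hypothetical, so copy the real asset URL from the releases page before using it:

```bash
# Example only: replace the tag and zip name with the actual asset
# listed on the oci-openshift releases page
curl -L -o add-nodes.zip \
  "https://github.com/oracle-quickstart/oci-openshift/releases/download/v1.4.2/add-nodes.zip"
```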
## Step #4 – Run the OCI Stack

1. Log in to the OCI tenancy. Click on the navigation menu, followed by Developer Services.
2. Under Resource Manager, click on Stacks.
3. Select the compartment (vaibhav-demo in my example) and click the Create stack button.
4. Under My configuration, upload the zip file downloaded in the previous step and click Next.

![OpenShift Architecture](./images/8.png "Node Addition")

5. Enter the details below:

   a. Choose the compartment where the existing OpenShift cluster is deployed.

   b. Enter the cluster name.

   c. Enter the PAR link for the add-node ISO created earlier.

   d. During OpenShift cluster creation, we created the tag namespace "openshift-tags". Map the compartment where it resides.
![OpenShift Architecture](./images/9.png "Node Addition")

   e. During cluster creation, the deployment process created a VCN and three subnets. Map the VCN and the subnets.
![OpenShift Architecture](./images/10.png "Node Addition")

   f. Here we are adding a new worker/compute node, not a master/control plane node. Keep the "Control Plane Node Count" at 0.
![OpenShift Architecture](./images/11.png "Node Addition")

   g. Under "Compute Node Configuration", enter the shape name for the new worker node. Here I am adding one bare metal server. Keep the default values for the other parameters and click Next.

   If you are adding a virtual machine-based worker node, you also need to specify the OCPUs and RAM. For a bare metal shape, the entire RAM and OCPU capacity of the physical server is used by default.
![OpenShift Architecture](./images/12.png "Node Addition")

   h. Review the configuration, check the "Run apply" option, and click the Apply button. The stack should complete successfully, and the new node will be visible in the OCI Console. If you prefer to drive Resource Manager from the CLI, see the sketch after this step.
![OpenShift Architecture](./images/13.png "Node Addition") ![OpenShift Architecture](./images/14.png "Node Addition") ![OpenShift Architecture](./images/15.png "Node Addition")
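Running the stack through the console is the path documented above; for automation, a similar create-and-apply flow can be approximated with the OCI CLI. A sketch under assumptions: the compartment and stack OCIDs, display name, and zip name are placeholders, and the stack's own variables (cluster name, PAR link, VCN OCIDs, and so on) would still need to be supplied according to the schema in the downloaded zip:

```bash
# Create a Resource Manager stack from the downloaded zip (example values)
oci resource-manager stack create \
  --compartment-id ocid1.compartment.oc1..exampleuniqueID \
  --config-source ./add-nodes.zip \
  --display-name "openshift-add-worker-node"

# Kick off an apply job for the newly created stack
oci resource-manager job create-apply-job \
  --stack-id ocid1.ormstack.oc1..exampleuniqueID \
  --execution-plan-strategy AUTO_APPROVED
```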
## Step #5 – Finish the node installation on the Red Hat portal

1. A few minutes after completing Step 4, the new node should appear on the Red Hat portal.
2. Click on the "Install Ready Hosts" button.

![OpenShift Architecture](./images/16.png "Node Addition")

3. This starts the installation process, which should finish in a few minutes.

![OpenShift Architecture](./images/17.png "Node Addition") ![OpenShift Architecture](./images/18.png "Node Addition")
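Step #6 below approves the node from the OpenShift console; the same approval can also be done with the oc CLI. A minimal sketch, assuming you are logged in to the cluster with cluster-admin rights:

```bash
# List the certificate signing requests generated by the new node
oc get csr

# Approve any pending CSRs (run again if a second serving CSR appears)
oc get csr -o name | xargs oc adm certificate approve

# Verify the new worker node reaches the Ready state
oc get nodes
```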
## Step #6 – Approve/Add the node from the OpenShift Cluster console
+ +1. Post completing step 5, after few minutes we should see the node in pending state on the OpenShift Cluster +2. We need to approve the discovered node followed by certificate signing request. +3. The node in few minutes should change to ready state. + +![OpenShift Architecture](./images/19.png "Node Addition") ![OpenShift Architecture](./images/20.png "Node Addition") ![OpenShift Architecture](./images/21.png "Node Addition") ![OpenShift Architecture](./images/23.png "Node Addition") ![OpenShift Architecture](./images/22.png "Node Addition") + + + +# Acknowledgments + +- **Author** - Vaibhav Tiwari (Oracle Virtualization BlackBelt) diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/README.md deleted file mode 100644 index c0777cf67..000000000 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/README.md +++ /dev/null @@ -1,156 +0,0 @@ -# Adding a Host to an Assisted Installed OpenShift Cluster on Oracle Cloud (OCI) - -This guide provides detailed instructions on adding a host to an OpenShift cluster installed via the Assisted Installer, specifically in the Oracle Cloud Infrastructure (OCI). The process includes generating a discovery ISO, creating a custom image, configuring OCI load balancers, launching a new instance, and approving the host in the OpenShift console. - -Reviewed: 06.11.2024 - -## Prerequisites - -Before starting, ensure the following: - -- A functioning OpenShift cluster installed via the Assisted Installer on OCI. -- You have access to the OpenShift Assisted Cluster and the OpenShift console. -- You have privileges to manage instances and load balancers within OCI - - -## Steps -### 1. Create Add Host Discovery ISO - -1. Log in to the **OpenShift Console** (https://console.redhat.com/openshift/), go to your cluster list and select your cluster. -2. Navigate to **Add Hosts** tab. - -| - -3. Click on **Add Host** button. -4. Follow the wizard to configure and generate the **Discovery ISO**, you can add an SSH public key to this ISO if you later require direct SSH access. -5. Once the ISO is generated, download it locally. - - - -### 2. Create Custom Image in Oracle Cloud (OCI) Based on Add Host Discovery ISO - -OCI requires a custom image to boot a new instance with the Discovery ISO embedded. This is a different ISO that what was used for creating the initial cluster! - -#### Create Custom Image Using OCI Commands or Console - -1. **Upload the ISO to an Object Storage Bucket**: - - Upload the discovery ISO to an OCI Object Storage bucket in your tenancy. - - - -2. **Create a Custom Image from the Discovery ISO**: - - Go to **Compute > Custom Images** (in the OCI Console) - - Click **Import-discovery-image` - - **Operating System**: RHEL - - **Bucket / Object Name**: Select the bucket where you uploaded the ISO file and select under object name the ISO file. - - Set the **Image Type** to: QCOW2 - - Set the **Launch Mode** to: Paravirtualized mode - - click on **Import image** - - - -3. **Modify Image Capabilities** - - After the custom image is created, click on **Edit image capabilities** - - Set the firmware available options to ONLY **UEFI_64** - - - - -### 3. 
Modify the Oracle Cloud Infrastructure (OCI) Load Balancer - -To allow the new host to communicate with the OpenShift cluster, you need to modify your OCI OpenShift APP Load Balancer to allow traffic on port **22624**. This port is used for Machine Config Server (MCS) communication. By default only the internal API load balancer is configured for this. - -1. Navigate to **Networking > Load Balancer**. -2. Select the **api apps** load balancer used by your OpenShift cluster. -3. Create a new backend set and set the health check to: - - Protocol: HTTP - - Port: 22624 - - Interval: 10000 - - Timeout: 3000 - - Number of retries: 3 - - Status code: 200 - - URL path: /healthz - - response: .* - - - -4. Add the Management Nodes to this backend set, by clicking on the **Add Backends** option. Set the port for each backend to 22624. - - - - -5. Wait until your backend is in healthy state, then under **Listeners** menu, add a istener to: - - Allow incoming traffic on port **22624** (Machine Configuration). - - Ensure the listener forwards this traffic to the newly created backend. - - - -5. Modify the NSG (Network Security Group) assigned to this load balancer to allowin incomming traffic on TCP/22624 from the intenal VCN Network - - Go to the main page of the load balancer - - In the Load Balancer Information section you will see the assigned NSG. Click on this NSG - - Add a rule for incoming (ingress) traffic. Set the source CIDR range to the CIDR range of your VCN. The Protocol to TCP and destination port to 22624 - - - - -### 4. Launch a New Instance Using the Custom Image - -Once the custom image is created and the load balancer is configured, you can launch a new instance as worker node that will register with the OpenShift cluster. - -1. In the **OCI Console**, go to **Compute > Instances**. -2. Click **Create Instance** and configure the instance: - - **Image**: Choose the custom image you created (`openshift-discovery-image`). - - **Shape**: Select an appropriate shape (e.g., VM.Standard.E4.Flex). - - **Network**: Attach the instance to the correct VCN and subnet that the OpenShift cluster uses. Usse the private subnet for your instance. - - It is recommended to have an openshift worknode with min 4 cores and 16 GB Ram. - - The worker node needs at minimum a 100GB Disk and it is recommended to have this 30 VPUs assigned - - Click on **create** to launch the instance - - | - -3. **NSG Assignment**: When the node is being created you can click on the **edit** link behing the Network Security Groups in the primary VNIC section. - - Set the NSG to the NSG for the Openshift Worker Nodes (cluster-compute-nsg) - - - -4. **Set the correct tag**: After the instance is up an running (Green: Running state). Add the correct openshift tag - - Navigate to **[More Actions]** on the main page of the instance - - Click on **Add Tags** - - Select the Tag Namespace used for this Openshift cluster. Likely the name of your cluster - - Set the tag key to **compute** - - - -### 5. Install ready nodes in the Openshift console (https://console.redhat.com/openshift/) - -It will take a few minutes, but at somepoint your new node should show on the **Add hosts** tab. Wait for the host to become in the **Ready state**. It likely will first show **Insufficient**, just be patient. - -When the node is in ready state, you can click on the **[Install Ready Nodes]** - - - -This will take some time. When it get to the **Installed** state, the node will reboot and after a few minute shoud up as node in your Cluster Console. 
- -### 5. Approve the Host in the OpenShift Cluster Console - -After the new instance boots and registers with the OpenShift cluster, it must be approved from the OpenShift console. - -1. Log in to your **OpenShift Web Console** of your cluster. -2. Go to **Compute > Nodes**. -3. You new worker node should appear here. - - - -4. Click on the Discovered link and **Aprove** that the node is added to your cluster. - - - -5. As a final step you likely also need to approve the new nodes certificate. Click on the **Not Ready** link and approve the Certificate Signing process. - - - -You node will now be accepted as a new worker node for your Openshift cluster and you will automatically start seeing pods running on this new node. ---- - -## Conclusion - -Following this guide, you will successfully add a new host to your OpenShift cluster on Oracle Cloud Infrastructure (OCI). The new host will be automatically configured and integrated into the cluster after it is approved via the OpenShift web console. - - -# License -Copyright (c) 2025 Oracle and/or its affiliates. -Licensed under the Universal Permissive License (UPL), Version 1.0. -See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/1. clusteroverview.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/1. clusteroverview.png deleted file mode 100644 index b2579bde4..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/1. clusteroverview.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/10. NSG-LB.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/10. NSG-LB.png deleted file mode 100644 index c25587d58..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/10. NSG-LB.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/11. AddNode1.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/11. AddNode1.png deleted file mode 100644 index 1ec4cb2e0..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/11. AddNode1.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/12. AddNode2.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/12. AddNode2.png deleted file mode 100644 index 1565271f8..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/12. AddNode2.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/13. NodeSetNSG.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/13. NodeSetNSG.png deleted file mode 100644 index bde7c20ab..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/13. 
NodeSetNSG.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/14. AddTag.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/14. AddTag.png deleted file mode 100644 index d48816db3..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/14. AddTag.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/15. WaitNodeReady.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/15. WaitNodeReady.png deleted file mode 100644 index 94e016b2e..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/15. WaitNodeReady.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/16. NewWorker.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/16. NewWorker.png deleted file mode 100644 index 9425d9936..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/16. NewWorker.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/17. NewWorkerApprove.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/17. NewWorkerApprove.png deleted file mode 100644 index aa6435ccc..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/17. NewWorkerApprove.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/18. NewWorkerApprove2.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/18. NewWorkerApprove2.png deleted file mode 100644 index 5083e9060..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/18. NewWorkerApprove2.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/2. addHost1.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/2. addHost1.png deleted file mode 100644 index 676914495..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/2. addHost1.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/3. DownloadISO.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/3. DownloadISO.png deleted file mode 100644 index 6ebc8bb1f..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/3. DownloadISO.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/4. uploadISO.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/4. 
uploadISO.png deleted file mode 100644 index 78db0fd08..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/4. uploadISO.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/5. importImage.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/5. importImage.png deleted file mode 100644 index 1fe07a88c..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/5. importImage.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/6. editImageCapabilities.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/6. editImageCapabilities.png deleted file mode 100644 index 4f94abfe6..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/6. editImageCapabilities.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/7. CreateBackend.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/7. CreateBackend.png deleted file mode 100644 index 04336dd28..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/7. CreateBackend.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/8. AddBackenNodes.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/8. AddBackenNodes.png deleted file mode 100644 index 144e10e9f..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/8. AddBackenNodes.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/9. CreateListener.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/9. CreateListener.png deleted file mode 100644 index e531ac55a..000000000 Binary files a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/9. CreateListener.png and /dev/null differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/README.md deleted file mode 100644 index f9754e5f1..000000000 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/assisted-cluster-add-host/files/README.md +++ /dev/null @@ -1,7 +0,0 @@ -# License - -Copyright (c) 2025 Oracle and/or its affiliates. - -Licensed under the Universal Permissive License (UPL), Version 1.0. - -See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. 
diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/README.md new file mode 100644 index 000000000..53d733cc6 --- /dev/null +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/README.md @@ -0,0 +1,25 @@ +# OpenShift on OCI - Deployment using Assisted Installer + +This repository provides a concise, step-by-step guide to deploying Red Hat OpenShift on Oracle Cloud Infrastructure (OCI) using the Assisted Installer. It outlines the deployment architecture, key requirements, and integration with OCI services, serving as a practical reference for engineers and architects implementing OpenShift clusters through an automated installation process. + +Reviewed: 07.11.2025 + +# When to use this asset? + +Use this document when planning or deploying Red Hat OpenShift on Oracle Cloud Infrastructure (OCI) with the Assisted Installer. It serves as a practical guide for architects and engineers to set up a secure, scalable, and automated OpenShift environment leveraging OCI’s native services. + +# Instructions for Utilizing This Asset + +Use this document as a foundation for defining and planning your OpenShift deployment on Oracle Cloud Infrastructure (OCI) using the Assisted Installer. It includes example deployment architectures and reference diagrams that can be customized with environment-specific configurations and customer-specific details as needed. + +# Conclusion + +Red Hat OpenShift on Oracle Cloud Infrastructure (OCI), deployed using the Assisted Installer, streamlines the process of setting up and managing containerized environments in the cloud. This approach enables faster provisioning, improved scalability, and simplified operations. All contributors are encouraged to share feedback, raise questions, and collaborate to continuously enhance the deployment experience and solution effectiveness. + +# License + +Copyright (c) 2025 Oracle and/or its affiliates. + +Licensed under the Universal Permissive License (UPL), Version 1.0. + +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. 
diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/0.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/0.png new file mode 100644 index 000000000..5fc6fa96e Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/0.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/1.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/1.png new file mode 100644 index 000000000..24e9ff234 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/1.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/10.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/10.png new file mode 100644 index 000000000..57565b52c Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/10.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/11.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/11.png new file mode 100644 index 000000000..eaeac3bc9 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/11.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/12.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/12.png new file mode 100644 index 000000000..754e3e8e5 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/12.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/13.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/13.png new file mode 100644 index 000000000..3036c5f94 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/13.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/14.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/14.png new file mode 100644 index 000000000..ac002f489 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/14.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/15.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/15.png new file mode 100644 index 000000000..b7217082f Binary files /dev/null and 
b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/15.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/16.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/16.png new file mode 100644 index 000000000..b00560d80 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/16.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/17.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/17.png new file mode 100644 index 000000000..5fd780ff4 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/17.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/18.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/18.png new file mode 100644 index 000000000..a32f518d7 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/18.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/19.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/19.png new file mode 100644 index 000000000..58bb73335 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/19.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/2.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/2.png new file mode 100644 index 000000000..e2d479f18 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/2.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/20.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/20.png new file mode 100644 index 000000000..0a41c9d03 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/20.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/21.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/21.png new file mode 100644 index 000000000..68763b6a0 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/21.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/22.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/22.png new file 
mode 100644 index 000000000..41b5ed345 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/22.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/23.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/23.png new file mode 100644 index 000000000..910f2f93a Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/23.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/24.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/24.png new file mode 100644 index 000000000..7439c45b5 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/24.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/25.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/25.png new file mode 100644 index 000000000..866b5ed3a Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/25.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/26.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/26.png new file mode 100644 index 000000000..e35466db7 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/26.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/27.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/27.png new file mode 100644 index 000000000..41d750559 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/27.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/28.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/28.png new file mode 100644 index 000000000..bd70e3230 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/28.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/29.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/29.png new file mode 100644 index 000000000..fc2c54e2b Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/29.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/3.png 
b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/3.png new file mode 100644 index 000000000..d28df1dda Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/3.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/30.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/30.png new file mode 100644 index 000000000..90ca35215 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/30.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/31.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/31.png new file mode 100644 index 000000000..f6001aef9 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/31.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/4.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/4.png new file mode 100644 index 000000000..2ed9be977 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/4.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/5.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/5.png new file mode 100644 index 000000000..7948d7658 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/5.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/6.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/6.png new file mode 100644 index 000000000..cb0242647 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/6.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/7.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/7.png new file mode 100644 index 000000000..00d097958 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/7.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/8.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/8.png new file mode 100644 index 000000000..26156ca56 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/8.png differ diff --git 
a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/9.png b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/9.png new file mode 100644 index 000000000..8c560351c Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/images/9.png differ diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/index.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/index.md new file mode 100755 index 000000000..6db3ab03a --- /dev/null +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/deploy-new-cluster-assisted-method/files/index.md @@ -0,0 +1,219 @@ + +# **Deploying Red Hat OpenShift on OCI using Assisted Installer Method** + + +  + + +

+## Overview
+ +Red Hat OpenShift is an enterprise-grade Kubernetes platform that enables organizations to build, deploy, and manage both containerized applications and virtual machines on a unified platform. + +Now officially certified on Oracle Cloud Infrastructure (OCI), OpenShift offers a fully supported and optimized environment on OCI’s high-performance virtual machine and bare metal shapes. Customers benefit from native OCI integrations such as Block Storage, Load Balancer, Object Storage, and Identity services, along with enterprise-level security, scalability, and automation. + +Deploying OpenShift on OCI delivers a consistent hybrid and multi-cloud experience, enabling businesses to modernize applications, integrate legacy VMs, and accelerate innovation. + +This tutorial provides a step-by-step guide to deploy OpenShift on OCI, covering architecture, networking, storage, installation, and essential configurations—helping you quickly build a production-ready OpenShift cluster that supports both containers and VMs on Oracle Cloud. + +  + +

+## Architecture
+ +As part of the deployment framework, we deploy Control Plane (Master) and Compute (Worker) Nodes. + +Storage is backed by OCI Block Volume. + +We recommend carefully planning the deployment — including the compartment structure, VCN range, and the placement of all components. + +![OpenShift Architecture](./images/0.png "OpenShift Assisted Cluster Deployment") + +  + +

+## Before we begin
+ +- Active RedHat account +- Access to OCI Console with appropriate privileges. Please refer to the Oracle [documentation](https://docs.oracle.com/en-us/iaas/Content/openshift-on-oci/install-prereq.htm#install-prereq-account) +- SSH key pair (a generation sketch follows this list) +- A domain name +- Decide the compartment where we need to deploy the solution + - I have created/deployed the solution in a single compartment, vaibhav-demo + - However, we can have different/multiple compartments too + +  + +
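If you do not already have an SSH key pair for the cluster hosts, the following is a minimal sketch of generating one locally (the file path and comment are illustrative; any key type accepted by the Assisted Installer works):

```bash
# Generate a new ed25519 key pair; adjust the output path to your own conventions
ssh-keygen -t ed25519 -f ~/.ssh/openshift_oci -C "openshift-on-oci"

# The public key is what gets pasted into the Assisted Installer in Step #1
cat ~/.ssh/openshift_oci.pub
```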

+## High Level Steps
+ +1. Create the OpenShift Cluster & generate the ISO image on the RedHat console +2. Upload the ISO to OCI Object Storage +3. Create the tag namespace in the OCI compartment +4. Run the OCI OpenShift stack from the OCI console +5. Finish installing the cluster from the RedHat console + +  + +

+## Step #1 – Generate the installation ISO
+ +1. Log in to the RedHat console at console.redhat.com +2. Open the navigation menu and click on RedHat OpenShift. Click on Cluster List followed by Create Cluster + +![OpenShift Architecture](./images/1.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/2.png "OpenShift Assisted Cluster Deployment") + +3. Under the Cloud tab, scroll down and click on Oracle Cloud Infrastructure + +4. Click on Interactive + +5. Enter the cluster name followed by the base domain. Choose a stable version; we recommend the latest one. + +![OpenShift Architecture](./images/3.png "OpenShift Assisted Cluster Deployment") + +6. Click on the drop-down for Integrate with external partner platforms, choose Oracle Cloud Infrastructure, and click Next + +![OpenShift Architecture](./images/4.png "OpenShift Assisted Cluster Deployment") + +7. Hit Next on the Operators screen. On the Host Discovery tab, click on Add hosts, paste the public SSH key, and click on Generate Discovery ISO. + + a. After downloading the ISO, you can click the Cancel button. + + b. Do not close this page, as we will come back to it after the installation. + +![OpenShift Architecture](./images/5.png "OpenShift Assisted Cluster Deployment") + +  + +

+## Step #2 – Upload the ISO to Object Storage
+ +1. Log in to the OCI tenancy. From the navigation menu, click on Storage followed by Buckets +2. Select the compartment (vaibhav-demo in my example). Create a bucket if you don't have one, or select the bucket where you wish to upload the ISO +3. Under the Objects tab, click on Upload Objects and upload the ISO downloaded earlier + +![OpenShift Architecture](./images/6.png "OpenShift Assisted Cluster Deployment") + +4. Click on the three dots to the right of the uploaded object and click on Create Pre-Authenticated Request + +5. Keep the default values and click on the Create Pre-Authenticated Request button. + +6. Keep the link handy, as it will be needed later (a CLI alternative is sketched after this list) + +![OpenShift Architecture](./images/7.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/8.png "OpenShift Assisted Cluster Deployment") + + +
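If you prefer the OCI CLI over the console for this step, here is a hedged sketch of the same upload and pre-authenticated request flow (bucket name, object name, and expiry date are placeholders; verify the options against your installed CLI version):

```bash
# Upload the discovery ISO to an existing bucket (names are illustrative)
oci os object put \
  --bucket-name openshift-iso-bucket \
  --file ./discovery_image_demo.iso \
  --name discovery_image_demo.iso

# Create a read-only pre-authenticated request (PAR) for the uploaded object;
# the response contains an access-uri that is prepended with your region's
# Object Storage endpoint to form the full PAR link used in Step #4
oci os preauth-request create \
  --bucket-name openshift-iso-bucket \
  --name openshift-discovery-iso-par \
  --access-type ObjectRead \
  --object-name discovery_image_demo.iso \
  --time-expires "2026-01-31T00:00:00Z"
```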

+## Step #3 – Create Tag Namespace
+ +1. Browse to the URL https://github.com/oracle-quickstart/oci-openshift/releases and download the latest resource attribution tags archive. +2. At the time of writing this tutorial, create-resource-attribution-tags-v1.4.2.zip is the latest. +3. Click on the file name to download it + +![OpenShift Architecture](./images/9.png "OpenShift Assisted Cluster Deployment") + +4. Log in to the OCI tenancy. From the navigation menu, click on Developer Services. + +5. Click on Stacks under Resource Manager. + +6. Click on the Create Stack button and select the zip file downloaded above. Click next. + +![OpenShift Architecture](./images/10.png "OpenShift Assisted Cluster Deployment") + +7. Select the right compartment (vaibhav-demo in my example) and click next. Select Run Apply and click on the Create button. + +![OpenShift Architecture](./images/11.png "OpenShift Assisted Cluster Deployment") + +8. Once the stack finishes successfully, open the navigation menu and click on Governance & Administration. + +9. Click on Tag Namespaces under Tenancy Management. + +10. Choose the right compartment (vaibhav-demo in my example) and you should see the openshift-tags namespace (a CLI check is sketched after this list) + +![OpenShift Architecture](./images/12.png "OpenShift Assisted Cluster Deployment") + +  + +
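As an optional check without the console, a small OCI CLI sketch to confirm the tag namespace exists (the compartment OCID is a placeholder):

```bash
# List tag namespaces in the target compartment and look for openshift-tags
oci iam tag-namespace list \
  --compartment-id ocid1.compartment.oc1..exampleuniqueid \
  --all \
  --output table
```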

+## Step #4 – Create the OpenShift cluster
+ +**Create repository for Oracle Cloud Agent (Optional)** + +1. Log in to the OCI tenancy. From the navigation menu, click on Marketplace followed by All Applications +2. Type Cloud Agent and click on the Terraform stack +3. Click on the Export button on the far right + +![OpenShift Architecture](./images/13.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/14.png "OpenShift Assisted Cluster Deployment") + +4. Choose the right compartment (vaibhav-demo in my example) and create a new public repository. Accept the terms and click on the Export Package button at the top right. + +5. Verify the work request completes successfully. + +![OpenShift Architecture](./images/15.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/16.png "OpenShift Assisted Cluster Deployment") + +  + +**Run the Cluster Wizard** + +1. From the navigation menu, click on Developer Services followed by RedHat OpenShift + +![OpenShift Architecture](./images/17.png "OpenShift Assisted Cluster Deployment") + +2. The process will redirect you, and we now need to run the Terraform stack to initiate the cluster deployment process. Click Next. + +3. Enter the details + + a. Make sure we are in the right compartment (vaibhav-demo in my example). + + b. Enter the same cluster name that we specified in step 1 + + c. Enter the ISO PAR link generated in step 2. + + ![OpenShift Architecture](./images/18.png "OpenShift Assisted Cluster Deployment") + + d. Choose the compartment (vaibhav-demo in my example) where we created the tag namespace in step 3 + + ![OpenShift Architecture](./images/19.png "OpenShift Assisted Cluster Deployment") + + e. We can keep the defaults for the Control Plane and Compute nodes or change them as per the requirement. + + f. In the demo, I do not want to create a public DNS zone or a public OCI LB. It is all private. + + g. Enter the same domain name as mentioned in step 1. I will let the wizard create the VCN for us. Enter the VCN details + + ![OpenShift Architecture](./images/20.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/21.png "OpenShift Assisted Cluster Deployment") + + h. Choose the latest CSI driver followed by the Cloud Agent details. Verify the information, apply and run the stack. + + At the time of writing this tutorial, v1.32.0-UHP is the latest. Here is the [link](https://github.com/oracle-quickstart/oci-openshift/tree/main/custom_manifests/oci-ccm-csi-drivers) for future reference + + ![OpenShift Architecture](./images/22.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/23.png "OpenShift Assisted Cluster Deployment") + + i. Monitor and verify the stack completes successfully. Click on the Outputs tab and copy dynamic_custom_manifest. Save the content in a notepad file with extension .yml (for example demo-cluster.yml). This will be used later + + ![OpenShift Architecture](./images/24.png "OpenShift Assisted Cluster Deployment") + + j. On the OCI console, you should see the Instances, the VCN, and the Load Balancers (a CLI check is sketched after this section). + + ![OpenShift Architecture](./images/25.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/26.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/27.png "OpenShift Assisted Cluster Deployment") + +  + +
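As an alternative to checking the console, a rough sketch of listing the resources the stack should have created (the compartment OCID is a placeholder):

```bash
# Compartment used by the OpenShift stack (placeholder OCID)
COMPARTMENT_ID=ocid1.compartment.oc1..exampleuniqueid

# Control plane and compute instances
oci compute instance list --compartment-id "$COMPARTMENT_ID" --output table

# VCN and load balancers created by the stack
oci network vcn list --compartment-id "$COMPARTMENT_ID" --output table
oci lb load-balancer list --compartment-id "$COMPARTMENT_ID" --output table
```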

+## Step #5 – Complete the Cluster Installation
+ +1. Navigate to the RedHat console and you should see the Control Plane and Compute Nodes on the portal. + +2. Mark the Compute Nodes as Worker and the Control Plane as Control Plane Nodes, then click next. + +![OpenShift Architecture](./images/28.png "OpenShift Assisted Cluster Deployment") + +3. Click next on the storage & networking tab with the default values. It is not recommended to make any changes. + +4. On the Custom Manifest tab, enter the file name as saved above in the notepad file. Drag and drop the yml file into the content box. Click next + +![OpenShift Architecture](./images/29.png "OpenShift Assisted Cluster Deployment") + +5. Validate the summary and click on the Install Cluster button. After a successful installation, we will see the cluster details on the RedHat portal. + +6. Download the Kubeconfig file. Copy the kubeadmin password and the URL. These will be needed to access the cluster. + +![OpenShift Architecture](./images/30.png "OpenShift Assisted Cluster Deployment") ![OpenShift Architecture](./images/31.png "OpenShift Assisted Cluster Deployment") + +  + +# Acknowledgments + +- **Author** - Vaibhav Tiwari (Oracle Virtualization BlackBelt) diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/enable-image-registry/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/enable-image-registry/README.md index 083368b6f..bc35ebd88 100644 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/enable-image-registry/README.md +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/enable-image-registry/README.md @@ -1,6 +1,6 @@ # Setting up OpenShift Image Registry to use OCI Object Storage Bucket -Reviewed: 06.11.2024 +Reviewed: 11.11.2025 ## Prerequisites You need to have the OpenShift CLI tool installed and properly configured. diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-agent-based-install/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-agent-based-install/README.md index d27ccf216..798a5a881 100644 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-agent-based-install/README.md +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-agent-based-install/README.md @@ -7,7 +7,7 @@ This guide is structured into two parts: Part 1: Covers the creation of necessary resources and sets up the OpenShift control plane. Part 2: Focuses on scaling the OpenShift cluster by adding worker nodes to support containerized workloads. -Reviewed: 05.02.2025 +Reviewed: 05.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-discovery-questionnaire/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-discovery-questionnaire/README.md index d74e56f75..adb16fc38 100644 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-discovery-questionnaire/README.md +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-discovery-questionnaire/README.md @@ -2,7 +2,7 @@ This document can be used as a reference questionnaire to collect the required details for a project involving Red Hat OpenShift. -Reviewed: 11.11.2024 +Reviewed: 12.11.2025 # When to use this asset?
diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-floating-egress-ip/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-floating-egress-ip/README.md index 784718c80..d76ecbf55 100644 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-floating-egress-ip/README.md +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-floating-egress-ip/README.md @@ -13,6 +13,9 @@ https://docs.oracle.com/en-us/iaas/Content/VMware/Tasks/ocvsmanagingl2net.htm We can use these VLANs for our OpenShift worker nodes as secondary NIC interfaces and route the EgressIP traffic over. +This document provides the guidelines to configure Floating Egress IP for OpenShift environments on OCI. + +Reviewed: 12.11.2025 ## Create VLAN and assign addition vNIC(s) to worker nodes attached to the VLAN in OCI diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-multi-cluster-domain-management/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-multi-cluster-domain-management/README.md index 4acd8d6fc..a67424cad 100644 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-multi-cluster-domain-management/README.md +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-multi-cluster-domain-management/README.md @@ -1,7 +1,7 @@ # Multi-Cluster OpenShift on OCI: Implementing Shared and Unique Domain Architectures This repository provides architectural guidance for implementing a common base domain across multiple OpenShift Container Platform (OCP) clusters in Oracle Cloud Infrastructure (OCI). Designed for customers requiring unified DNS naming while maintaining cluster isolation. -Reviewed: 14.04.2025 +Reviewed: 12.11.2025 # When to use this asset? Use this guide when: diff --git a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-solution-definition-document/README.md b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-solution-definition-document/README.md index 64ecbdfa5..6b4cf8e30 100644 --- a/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-solution-definition-document/README.md +++ b/cloud-infrastructure/virtualization-solutions/openshift-on-oci/openshift-solution-definition-document/README.md @@ -2,7 +2,7 @@ This repository provides a comprehensive guide for deploying the Red Hat OpenShift Container Platform on Oracle Cloud Infrastructure (OCI). It outlines a high-level solution definition, including deployment architecture and the migration process for containerized workloads from an existing OpenShift environment—whether on-premises or in another cloud. The document captures the current state architecture, requirements, and a prospective state, along with potential project scope and anticipated timelines for implementation. -Reviewed: 11.04.2025 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/README.md index 9cd15c0ce..a0c8bb9ac 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/README.md @@ -2,7 +2,7 @@ Oracle Cloud Migrations enables customers to migrate virtual machines to Oracle Cloud Infrastructure (OCI) Compute instances. 
It helps customers eliminate manual migration tasks and ultimately reduces errors in asset discovery and migration planning and execution. -Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # Table of Contents @@ -21,7 +21,7 @@ Reviewed: 06.11.2024 - [OCM Deployment Guide - Migrate VMs from an on-premises VMware environment to Oracle Cloud Compute VMs using Oracle Cloud Migrations service](https://docs.oracle.com/en/learn/ocm-migrate-on-prem-vm/) - This tutorial provides step-by-step guidelines for configuring the Oracle Cloud Migrations service, to enable customers to migrate their virtual machines from an on-premises VMware environment to Oracle Cloud Compute VMs. -- [Preparing your Windows VMs for successful migrations](https://github.com/oracle-devrel/technology-engineering/tree/main/cloud-infrastructure/vmware-solutions/oracle-cloud-migrations/windows-migrations) +- [Preparing your Windows VMs for successful migrations](https://github.com/oracle-devrel/technology-engineering/tree/main/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations) ## Videos @@ -31,7 +31,7 @@ Reviewed: 06.11.2024 # Reusable Assets -- [OCM Solution Definition Document](https://github.com/oracle-devrel/technology-engineering/tree/main/cloud-infrastructure/vmware-solutions/oracle-cloud-migrations/ocm-solution-definition-document) +- [OCM Solution Definition Document](https://github.com/oracle-devrel/technology-engineering/tree/main/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document) diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/README.md index 0fe6d9f25..2f7600a2b 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/README.md @@ -1,7 +1,7 @@ # Oracle Cloud Migrations - Workload Migration Solution Definition This repository contains an in-depth guide for the migration of VMware workloads to OCI Compute VMs. It offers a high-level solution definition of the deployment architecture and migration process of workloads from a current VMware environment to OCI Compute. The document is aimed at capturing the current state architecture with requirements and provides a prospective state, potential project scope, and anticipated timelines for the migration. -Reviewed: 11.11.2024 +Reviewed: 12.11.2025 # When to use this asset? This document serves as an integral asset for individuals and organizations seeking to deploy re-platform their VMware workloads and migrate OCI Compute VMs.
diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/files/Oracle-Cloud-Migration-Solution-Definition-Template.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/files/Oracle-Cloud-Migration-Solution-Definition-Template.md index 81a8fb0fc..c419a5711 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/files/Oracle-Cloud-Migration-Solution-Definition-Template.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/ocm-solution-definition-document/files/Oracle-Cloud-Migration-Solution-Definition-Template.md @@ -233,20 +233,19 @@ Below is the current high-level architecture of the customer's on-premises VMwar * A Compartment in the tenancy. This can be a new or existing compartment. -* Appropriate policy and permissions in place to manage Oracle Cloud Migrations and required components in the selected compartment. +* Run the prerequisites provided on the Cloud Migration Console page. This will: -* Please find details about required IAM Polices at: IAM and Oracle Cloud Migrations Policies. and On-prem vCenter roles and permissions. + * Create tenancy- and compartment-specific policies + * Create an object storage bucket for replication + * Create namespaces for use by OCM + * Create a vault for holding vCenter credentials * Supported vSphere environment (6.5 and Above). Supported vSphere versions & Operating systems. Supported vSphere versions and Operating systems. -* Provide agent dependency, which is a 3rd party package required by remote agent appliance for it’s function. Oracle Cloud Migrations replication function running on the remote agent appliance depends on the appropriate VMware Virtual Disk Development Kit (VDDK) agent to perform the snapshot operations on the VMware VM disk. This can be downloaded from theVMware portal. +* Provide agent dependency, which is a 3rd party package required by the remote agent appliance for its function. Oracle Cloud Migrations replication function running on the remote agent appliance depends on the appropriate VMware Virtual Disk Development Kit (VDDK) agent to perform the snapshot operations on the VMware VM disk. This can be downloaded from the VMware portal. At this moment OCM only supports the VDDK 7.0.2 version. For more information and download links for vSphere VDDK, see vSphere VDDK. -* Create a Private Object Storage bucket in the OCI tenancy, to store the source asset snapshots. -* Create a vault to store the credentials used by the Oracle Cloud Migrations Service. - -* Object Storage Configuration: OCI Object Storage will be used to store the replicated VM data by Oracle Cloud Migrations service from on-premises environment. Oracle Cloud migration service Being a SAAS offering is deployed at tenancy level within the OCI region. diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/README.md index 8d98662a8..3643c9f8f 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/README.md @@ -2,7 +2,7 @@ This is a guide on how to migrate source environments based on the Microsoft Windows operating system.
-Reviewed: 11.11.2024 +Reviewed: 12.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/files/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/files/README.md index b0614dff2..8ba8b6641 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/files/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-migrations/windows-migrations/files/README.md @@ -10,7 +10,7 @@ Microsoft Windows instances run on OCI Compute shapes using paravirtualized driv [You can download the Oracle VirtIO drivers for Windows for Oracle's e-delivery site](https://docs.oracle.com/en/operating-systems/oracle-linux/kvm-virtio/) -**IMPORTANT**: Use the new VirtIO 2.0.1 or 2.1.0 Drivers, as the previous version (2.0) will result in an inaccessible boot device error. +**IMPORTANT**: Use the new VirtIO 2.3 Drivers ### Replication issues with source instances running Microsoft Windows @@ -43,4 +43,4 @@ Copyright (c) 2025 Oracle and/or its affiliates. Licensed under the Universal Permissive License (UPL), Version 1.0. -See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. \ No newline at end of file +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/README.md index d9e9037a4..3118b9a66 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/README.md @@ -2,7 +2,7 @@ Oracle Cloud VMware Solution is based on VMware Cloud Foundation (VCF) and provides a fully supported, customizable cloud environment for VMware deployments and migrations. The solution delivers a full-stack software-defined data center (SDDC), including VMware’s vCenter, ESXi, NSX, and vSAN. Specific use cases targeted by Oracle Cloud VMware Solution include data center and application migration, hybrid extension, on-demand capacity, and disaster recovery. -Reviewed: 23.09.2025 +Reviewed: 12.11.2025 # Table of Contents diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/README.md index a25438d7b..e0053a112 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/README.md @@ -2,7 +2,7 @@ This repository contains a detailed guide for the disaster recovery of VMware workloads to Oracle Cloud VMware Solution. It offers a high-level solution definition of the deployment architecture and tools like Site Recovery Manager or HCX. The document is aimed at capturing the current state architecture and provides a prospective state, potential project scope, RPO/RTO requirements and target OCVS architecture. -Reviewed: 11.06.2024 +Reviewed: 12.11.2025 # When to use this asset? 
diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/files/OCVS-Disaster-Recovery-Solution-Definition-Template.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/files/OCVS-Disaster-Recovery-Solution-Definition-Template.md index 8ea4e428f..7228fd680 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/files/OCVS-Disaster-Recovery-Solution-Definition-Template.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/disaster-recovery-to-ocvs-solution-definition/files/OCVS-Disaster-Recovery-Solution-Definition-Template.md @@ -348,6 +348,9 @@ According to the feature requirements, the HCX license type should be carefully [HCX License types](https://docs.oracle.com/en-us/iaas/Content/VMware/Concepts/ocvsoverview.htm#aboutsoftware__hcx-license-types) +__Please note:__ According to [Broadcom’s official release notes for VMware HCX 4.11](https://techdocs.broadcom.com/us/en/vmware-cis/hcx/vmware-hcx/4-11/hcx-4-11-release-notes/vmware-hcx-411-release-notes.html), +the **HCX Disaster Recovery (HCX DR)** feature has been **deprecated** and is planned for removal in a future release. + __Please note:__ If you are using OCVS with Standard Shapes, then the HCX enterprise is included in the subscription and no additional cost is required. __Please Note:__ Please check the VMware interoperability matrix for version compatibility of VMware HCX with the source vSphere environment. diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/discovery-questionnaire/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/discovery-questionnaire/README.md index 8ec66f6e0..f8dd3a22d 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/discovery-questionnaire/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/discovery-questionnaire/README.md @@ -2,7 +2,7 @@ This document can be used as a reference questionnaire to collect the required details for a project involving Oracle Cloud VMware Solution. -Reviewed: 11.11.2024 +Reviewed: 12.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/README.md index f4bbfff29..daba90759 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/README.md @@ -8,7 +8,7 @@ Terraform can be used both to provision and manage an OCVS environment. In the f Examples created by: Richard Garsthagen, feedback is welcome! Please see the 'Issue' feature in GitHub. 
-Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # License diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/addhost-multiad/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/addhost-multiad/README.md index dfecdb695..42cbe33be 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/addhost-multiad/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/addhost-multiad/README.md @@ -2,7 +2,7 @@ Automate the provisioning and management of an OCVS environment. -Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-denseio-single-cluster-sddc/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-denseio-single-cluster-sddc/README.md index 84fe44f2e..cb691a765 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-denseio-single-cluster-sddc/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-denseio-single-cluster-sddc/README.md @@ -2,7 +2,7 @@ Automate the provisioning and management of an OCVS environment. -Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-standard-single-cluster-sddc/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-standard-single-cluster-sddc/README.md index af770c02c..4017b0bb4 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-standard-single-cluster-sddc/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/multi-host-standard-single-cluster-sddc/README.md @@ -2,7 +2,7 @@ Automate the provisioning and management of an OCVS environment. -Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/singlehost-sddc/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/singlehost-sddc/README.md index dc62d4d94..c1d15c0c2 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/singlehost-sddc/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/singlehost-sddc/README.md @@ -2,7 +2,7 @@ Automate the provisioning and management of an OCVS environment. -Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # When to use this asset? 
diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/vlan/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/vlan/README.md index 2f26b7696..4001456b3 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/vlan/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/ocvs-terrafom-automation/vlan/README.md @@ -2,7 +2,7 @@ Automate the provisioning and management of an OCVS environment. -Reviewed: 06.11.2024 +Reviewed: 12.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/vmware-migration-solution-definition/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/vmware-migration-solution-definition/README.md index b0774502b..ef2ce06ce 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/vmware-migration-solution-definition/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/vmware-migration-solution-definition/README.md @@ -2,7 +2,7 @@ This repository contains an in-depth guide for the migration of VMware workloads to Oracle Cloud VMware Solution. It offers a high-level solution definition of the deployment architecture and migration process of workloads from a current VMware environment to Oracle Cloud VMware Solution. The document is aimed at capturing the current state architecture with requirements and provides a prospective state, potential project scope, and anticipated timelines for the migration. -Reviewed: 11.11.2024 +Reviewed: 12.11.2025 # When to use this asset? diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/LICENSE b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/LICENSE new file mode 100644 index 000000000..46c0c79d9 --- /dev/null +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. + +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. 
+ +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/README.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/README.md new file mode 100644 index 000000000..ae2fc358b --- /dev/null +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/README.md @@ -0,0 +1,25 @@ +# Workload Migration to OCI Comprehensive Guide + +This guide provides a detailed technical overview of migration methodologies and tooling — including VMware HCX, RackWare, and Oracle Cloud Migrations (OCM) — used to transition workloads from both VMware and non-VMware environments to Oracle Cloud Infrastructure (OCI). It outlines key requirements, tool capabilities, architectural approaches, and decision frameworks to support large-scale enterprise migrations with minimal disruption. + +Reviewed: 12.11.2025 + +# When to use this asset? + +Use this document when planning or executing migrations from on-premises or other cloud environments to OCI Native services or Oracle Cloud VMware Solution (OCVS). It covers virtualized, bare-metal, and mixed-environment scenarios. + +# Instructions for Utilising This Asset + +Use this guide as a reference and planning framework for OCI and OCVS migration projects. It includes decision diagrams, tool comparison, guidance and best practices. + +# Conclusion + +Migrating workloads to OCI requires a comprehensive assessment of the existing environment, target architecture design, and methodical execution. By following the approaches and best practices outlined in this guide, organizations can achieve a secure, efficient, and low-risk migration to Oracle Cloud Infrastructure. + +# License + +Copyright (c) 2025 Oracle and/or its affiliates. + +Licensed under the Universal Permissive License (UPL), Version 1.0. + +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. 
diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/153850.png b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/153850.png new file mode 100644 index 000000000..01e8f8e90 Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/153850.png differ diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/155917.png b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/155917.png new file mode 100644 index 000000000..66b9b12ef Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/155917.png differ diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/Decision tree.drawio.png b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/Decision tree.drawio.png new file mode 100644 index 000000000..02aa56a0c Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/Decision tree.drawio.png differ diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/Decisiontree.png b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/Decisiontree.png new file mode 100644 index 000000000..d1c92539d Binary files /dev/null and b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/images/Decisiontree.png differ diff --git a/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/workload-migration-to-oci-comprehensive-guide.md b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/workload-migration-to-oci-comprehensive-guide.md new file mode 100644 index 000000000..cda227ba1 --- /dev/null +++ b/cloud-infrastructure/virtualization-solutions/oracle-cloud-vmware-solution/workload-migration-to-oci-comprehensive-guide/files/workload-migration-to-oci-comprehensive-guide.md @@ -0,0 +1,394 @@ +# **Workload Migration to OCI – Comprehensive Guide** + + +## **1. Introduction.** + +Oracle Cloud Infrastructure (OCI) is a global cloud services platform offering a comprehensive portfolio of IaaS, PaaS, SaaS, and DaaS capabilities across distributed datacenters. It enables enterprises to run virtual machines, containers, databases, AI/ML workloads, storage, networking, and VMware environments at scale in the cloud. + +Migrating VMware and non-VMware workloads to OCI is a strategic initiative that requires a thorough assessment of the source environment, detailed planning, target architecture alignment, and technical validation. 
+ +This document provides a technical overview of migration methodologies and tooling, including VMware HCX, RackWare, and Oracle Cloud Migrations (OCM), which are used to transition workloads from VMware and non-VMware platforms into OCI. It outlines key requirements, tool capabilities, architectural approaches, and decision criteria to support enterprise-scale migrations with minimal disruption. + +## **2. Source and Target Platforms.** + +This guide addresses the most common on-premises environments (VMware vSphere, Microsoft Hyper-V, KVM, and physical x86 servers) and maps them to the appropriate Oracle Cloud Infrastructure (OCI) landing zones. + +Enterprises can choose between two primary target platforms in OCI, based on workload characteristics, technical requirements, operational preferences, and budget considerations: +- Oracle Cloud VMware Solution (OCVS): Purpose-built for lift-and-shift migrations of existing VMware environments. OCVS preserves the full VMware software-defined datacenter (vSphere, vSAN, NSX, vCenter), enabling organizations to maintain existing tools, processes, and operational models with minimal disruption. +- OCI Native Compute Instances: Designed for replatformed or cloud-native workloads, OCI Compute offers secure, elastic, and high-performance virtual machines provisioned directly within OCI. This option is ideal for modernizing applications, integrating with OCI-native services, or optimizing costs and scalability. + +Both platforms integrate seamlessly with the broader OCI ecosystem of managed services, including databases, networking, observability, identity, and security. This ensures that organizations can not only migrate but also modernize, optimize, and scale their IT environments while preserving business continuity and operational resilience. + +**Target platform comparison.** + +| Category | OCI Native Instances | Oracle Cloud VMware Solution | +| --------------- | -------------------------------------------------------------- | ----------------------------------------------------- | +| Hypervisor | OCI KVM | VMware ESXi | +| Management Tools | OCI Console | vCenter, HCX, NSX, vSAN | +| Best Use Case | Cloud-native apps, replatformed VMs, containers | Lift-and-shift of VMware estates | +| Compute Shapes | Flexibly defined OCPU/RAM; Standard, DenseIO, GPU, HPC, Ampere | Dedicated Bare Metal DenseIO, Standard and GPU shapes | +| Networking | OCI Virtual Cloud Network (VCN) | VMware NSX-T | +| Primary Storage | OCI Block Volumes | vSAN or OCI Block Storage | +| Supported OS | Latest Linux/Windows distributions | All OS supported by vSphere | +| Migration Tools | Oracle Cloud Migrations (OCM), RackWare | VMware HCX (Advanced/Enterprise), RackWare | + +## **3. Migration scenarios.** + +This document outlines the principal migration paths for transitioning workloads from on-premises environments to Oracle Cloud Infrastructure (OCI), covering both virtualized and bare-metal sources. The following scenarios are considered: + +- **VMware vSphere → Oracle Cloud VMware Solution (OCVS):** A seamless lift-and-shift approach that moves VMware workloads to OCVS while preserving the full vSphere stack (ESXi, vCenter, vSAN, and NSX). This method minimizes operational disruption by avoiding re-platforming and enables advanced capabilities such as Layer 2 network extension, IP address retention, and live workload mobility using tools like VMware HCX.
+
+- **VMware vSphere → OCI Native Compute Instances:** In this scenario, VMware VMs are replatformed into OCI Native Compute. The process involves converting VM formats, adapting networking and storage, and integrating with OCI services. Migration tooling includes Oracle Cloud Migrations (OCM), RackWare, or custom image imports to perform discovery, replication, and deployment.
+
+- **Microsoft Hyper-V / KVM → OCI Native Compute Instances:** Migration from Hyper-V or KVM requires VM format conversion and deployment into OCI’s compute environment. This path suits organizations aiming to modernize workloads or consolidate platforms. Tools such as RackWare or HCX Enterprise with OSAM support automated discovery, dependency mapping, and migration wave planning.
+
+- **Microsoft Hyper-V / KVM → Oracle Cloud VMware Solution (OCVS):** Although less common, this option consolidates workloads under a single VMware SDDC on OCI for standardization. It requires cross-hypervisor conversion tools to preserve VM configuration attributes and migrate workloads reliably.
+
+- **Physical x86 Servers → OCI Native or OCVS:** Bare-metal workloads can be migrated directly to OCI, either as OCI Native Compute Instances or as VMs in OCVS. This path is often chosen for legacy applications or workloads running on physical infrastructure that require modernization or consolidation. Migration tools such as RackWare provide OS-level replication and transformation for smooth cutover.
+
+## **4. Migration Tools Overview.**
+
+This section provides a comparative overview of the core tools available for migrating workloads to Oracle Cloud Infrastructure (OCI). The choice of tooling depends on the source platform, target landing zone (OCI Native or Oracle Cloud VMware Solution), workload characteristics, and operational requirements.
+
+### **4.1 VMware HCX: Seamless VMware-to-VMware Mobility.**
+
+**VMware HCX** is the preferred enterprise-grade mobility solution for vSphere-to-vSphere migrations. Integrated natively with OCVS, HCX enables live and bulk migrations, network extension, and non-disruptive workload transitions.
+
+**Licensing**: HCX is bundled with OCVS deployments. Depending on the chosen Bare Metal shape, customers receive HCX Advanced or HCX Enterprise licenses.
+
+**Key Capabilities**
+
+- Bulk Migration: Parallel batch migration of VMs with scheduled cutovers. (Advanced Licensing)
+- vMotion: Live migration of a single powered-on workload with zero downtime. (Advanced Licensing)
+- Cold Migration: Migration of a powered-off VM. (Advanced Licensing)
+- Replication Assisted vMotion: Live migration of multiple powered-on workloads with zero downtime. (Enterprise Licensing)
+- OS-Assisted Migration (OSAM): Enables guest-based migration of Hyper-V or KVM VMs to OCVS. (Enterprise Licensing)
+- L2 Network Extension: Seamless IP preservation and extended subnets across OCI and on-premises.
+- Mobility Optimized Networking: Allows VMs on a Layer 2 extension to route more efficiently.
+
+**Bulk Migration**
+
+Bulk Migration allows the parallel migration of multiple virtual machines in batches from a source environment to a target environment, such as from on-premises VMware vSphere to Oracle Cloud VMware Solution (OCVS).
+
+Administrators can group VMs into migration waves and schedule cutover times to minimize business disruption.
+
+The migration process involves an initial full synchronization of the majority of a VM’s data in advance, with only the delta changes being synchronized during the cutover window.
During the pre-copy phase, HCX maintains an RPO of 2 hours by default.
+
+This approach is ideal for planned migrations where downtime can be scheduled in advance and large numbers of VMs must be moved efficiently. To the application user, the cutover appears as a guest OS reboot, because the source VM is powered down and the moved copy is powered on at the destination.
+
+This is the preferred option for migrating VMs, as it allows you to migrate at scale. A copy of the VM is left behind on-premises: its name is amended to include a POSIX timestamp (this helps with auditing) and it is powered off with its vNIC disconnected from the network (so that, if it is ever powered on again by accident, there is no chance of a conflict on the network).
+
+**Key Considerations**
+
+- VMs with RDMs in Physical Compatibility mode are not supported.
+
+- VMs with ISOs attached can't be migrated (HCX can force unmount if needed).
+
+- Snapshots are not migrated.
+
+- VMs with Multi-Writer/Shared VMDKs cannot be migrated.
+
+- VMs must be running at least Hardware Level 7 to be migrated.
+
+**vMotion Migration**
+
+Enables the live migration of powered-on workloads from the source vSphere environment to OCVS without downtime. This is achieved by transferring memory, CPU state, and active network connections over the HCX Interconnect. This method is typically used for smaller numbers of VMs that require zero-downtime relocation.
+
+**Key Considerations**
+
+- Requires 150 Mbps or higher throughput.
+
+- VMs with any kind of RDM are not supported for migration.
+
+- VMs must be at a hardware level of 9 or above to be migrated.
+
+- VMs with Multi-Writer/Shared VMDKs cannot be migrated.
+
+- vMotion migration only supports serial migrations (until the first vMotion migration has finished, another one cannot start).
+
+- HCX deactivates Changed Block Tracking (CBT) before migration.
+
+- VM Encryption is not supported.
+
+
+**Cold Migration**
+
+Cold migration uses the same network path as HCX vMotion to transfer a powered-off virtual machine. During a cold migration, the virtual machine IP address and the MAC address are preserved. Cold migrations must satisfy the vMotion requirements.
+
+This feature is seldom used, as it has the same limitations as vMotion. In our experience, disconnecting the vNIC of the VM, powering it on, and then using Bulk Migration has been the way to work around the cold migration limitations and constraints.
+
+**Key Considerations**
+
+Same as vMotion above.
+
+
+**Replication Assisted vMotion (RAV)**
+
+Combines replication and vMotion to support the migration of larger workloads with zero downtime. The majority of the VM’s disk data is replicated in advance, followed by a short vMotion event to transfer the remaining changes (CPU/memory). RAV is particularly useful for migrating large, busy VMs that would otherwise require long vMotion windows or exceed vMotion thresholds.
+
+Just like Bulk Migration, there is an initial sync, after which an RPO of 2 hours is maintained.
+
+With RAV, multiple VMs are replicated simultaneously. When the replication phase reaches the switchover window, a delta vMotion cycle is initiated to perform a quick, live switchover. **Live switchover happens serially.**
+
+
+**Key Considerations:**
+
+- Replication Assisted vMotion creates two folders at the destination site. One folder contains the virtual machine infrastructure definition, and the other contains the virtual machine disk information.
This is normal behavior for RAV migrations and has no impact on the functionality of the virtual machine at the destination site. The only way to fix this is to do a Storage vMotion at the destination after the migration has completed.
+
+- Requires 150 Mbps or higher throughput.
+
+- VMs with Physical Compatibility mode RDMs are not supported for migration.
+
+- VMs must be at a hardware level of 9 or above to be migrated.
+
+- HCX deactivates Changed Block Tracking (CBT) before migration.
+
+
+**OS-Assisted Migration (OSAM)**
+
+OS-Assisted Migration enables the migration of workloads from non-vSphere hypervisors, such as Microsoft Hyper-V or KVM, into OCVS. Unlike vMotion or Bulk Migration, OSAM performs a guest-level migration by installing an HCX migration agent within the operating system. The agent copies the VM’s disk and configuration data to the target vSphere environment, where the VM is then reconstructed. This method is particularly useful for consolidating heterogeneous environments into VMware-based infrastructure.
+
+OSAM requires an HCX Enterprise license and is typically used for one-time migrations of workloads that cannot be moved using standard vSphere-based replication methods.
+
+**Key Considerations**
+
+- Replication begins with a full synchronization transfer to the destination site. The guest virtual machine remains online during replication until the final delta synchronization.
+- After the full sync, the switchover can be immediate or follow a specific schedule, just like Bulk Migration.
+- The final delta sync starts when the switchover phase starts; until then, a continuous sync of changes is maintained.
+- HCX performs a hardware mapping of the replicated volumes to ensure proper operation, including updates of the software stack on the replica. This fix-up process includes adding drivers and modifying the OS configuration files at the destination. The migrated virtual machine reboots during this process.
+- VMware Tools is installed on the migrated virtual machine and migration completes.
+- OSAM does not support P2V.
+- If the source VM does not power off, HCX will attempt to power off the replica VM.
+  - **If the replica powers off successfully:** It remains connected to its NICs. You can then manually power off the source VM and power on the replica.
+
+  - **If the replica fails to power off:** Both the source and replica remain powered on, but the replica is disconnected from the network. In this case, manually enable the NICs for the replica in vCenter, power off the source VM (if it is still running), and then power on the migrated VM.
+
+

+ +![Deciding on the correct HCX migration type](./images/153850.png) + +

+
+**Layer 2 Network Extension**
+
+L2 Network Extension enables seamless extension of Layer 2 broadcast domains from an on-premises datacenter to OCVS. This allows virtual machines to retain their existing IP addresses, avoiding the need for re-IPing during migration.
+
+By bridging networks across the HCX Interconnect, workloads can move between sites without changes to IP configuration or gateway addresses, which is critical for application compatibility and minimizing disruption during phased migrations.
+
+L2E supports operational models where applications are split across sites, enabling hybrid cloud architectures with consistent networking.
+
+Traditionally, extending a Layer 2 network (VLAN) was considered to be very risky, but HCX has made it simple and reliable with its Network Extension High Availability configuration.
+
+**HCX L2E High Availability (HA)**
+
+**Overview**
+
+In a standard HCX L2E setup, an **HCX Network Extension appliance** is deployed at both the source and destination sites to bridge Layer 2 networks across locations.
+High Availability (HA) mode ensures that network extension services remain operational even if the active appliance fails.
+
+This feature requires HCX Enterprise licensing.
+
+**How It Works**
+
+**Dual Appliances:** HCX deploys two Network Extension appliances in an HA pair per extended network — an **Active** and a **Standby**.
+
+**Heartbeat Monitoring:** The appliances communicate via a heartbeat channel to detect failures.
+
+**Automatic Failover:** If the active appliance becomes unavailable (e.g., VM crash, host failure, appliance upgrade), the standby appliance automatically takes over Layer 2 extension duties.
+
+**Seamless Transition:** The switchover happens without requiring IP changes, preserving workload connectivity during migration or hybrid operations.
+
+**Key Considerations**
+
+- **Resource Usage:** HA mode requires **twice the appliance resources** (CPU, RAM, storage).
+
+- **Bandwidth:** Ensure sufficient bandwidth for both appliances to handle failover scenarios.
+
+- **Licensing:** HCX Advanced includes L2E, and HA functionality is part of the Enterprise feature set.
+
+- **Appliance Placement:** Place HA pairs on separate hosts or clusters to avoid a single point of failure.
+
+- **Limitations:** Not supported for every network type (e.g., management networks), and some failover events may briefly drop a few packets before recovery.
+
+- Each Active and Standby pair is managed as an HA group, which includes upgrading and redeploying appliances. The process for redeploying and updating HA groups is the same as with standalone appliances, except that the operation is applied to both Active and Standby appliances at both the source and remote sites.
+
+**HCX Mobility Optimized Networking (MON)**
+
+**Mobility Optimized Networking** (MON) is an HCX feature that ensures virtual machines migrated with **Layer 2 Network Extension (L2E)** can still route traffic efficiently after migration — without being forced to hairpin back to the source site for external network access.
+
+When you extend a network with HCX L2E, VMs retain their original IP addresses. This is great for avoiding re-IPing, but it also means that, by default, routing for that subnet remains tied to the **gateway** in the source site.
+Without MON, a VM migrated to OCVS might send all north-south traffic back to the on-premises gateway, causing:
+
+- **Suboptimal routing / higher latency**
+
+- **Increased bandwidth usage** on the interconnect
+
+- **Potential bottlenecks** if WAN bandwidth is limited
+
+MON solves this by enabling a **local default gateway** at the destination site for the extended network.
+
+**How It Works**
+
+- MON is enabled **per extended network** in HCX and then on a per-VM basis.
+
+- When a VM is migrated to the destination site, MON updates the VM’s **default gateway** to point to the **local gateway** at the target site instead of the source site.
+
+- Local routing tables are adjusted so the VM’s traffic to external networks uses the closest available egress point.
+
+- If the VM moves back to the source site, HCX automatically reverts it to the original gateway configuration.
+
+
+**Key Considerations:**
+
+- MON requires an **HCX Enterprise** license.
+
+- Must be explicitly enabled per extended network.
+
+- Works only with HCX L2 Network Extensions.
+
+- Requires an NSX-T Tier-1 gateway at the destination to host the local default gateway.
+
+- May require updates to firewall rules or routing policies to account for the change in egress point.
+
+- MON has scaling limits that may change with each release of HCX.
+
+- It can make troubleshooting networking issues harder, so it should only be enabled on networks and VMs that really need it.
+

+ +![Mobility Optimized Networking](./images/155917.png) + +

+
+
+With its powerful suite of capabilities, HCX is a cornerstone of any successful VMware migration strategy.
+
+### **4.2 RackWare Heterogeneous Workload Portability.**
+
+RackWare is a cloud-agnostic workload mobility and resilience platform that simplifies migration, disaster recovery, and backup across physical, virtual, and cloud-native environments. Its core product, the RackWare Management Module (RMM), provides agentless, policy-driven automation to move and protect workloads.
+
+**Key Benefits**
+
+- Broad Compatibility – Supports VMware, Hyper-V, KVM, physical servers, and containers (Kubernetes). Works with all major public clouds, including OCI, AWS, Azure, and GCP.
+- Agentless & Lightweight – No permanent agents required; minimal impact on production systems.
+- Multi-Use Platform – One solution for migration, DR, and backup—reducing tool sprawl and complexity.
+- Efficient Replication – Delta-based sync ensures faster migrations and near-zero RPO/RTO for DR.
+- Scalability – Handles migrations from a few to thousands of workloads in coordinated “waves.”
+- Oracle Integration – Available on Oracle Cloud Marketplace; supports OCI and on-prem Oracle environments (C3, PCA).
+
+**Key Capabilities**
+- Migration – Workload migrations with automated sizing and IP/DNS remapping.
+- Disaster Recovery – Policy-based DR with automated failover/fallback and non-disruptive testing.
+- Backup – Application-consistent snapshots, long-term retention, and granular restore.
+- Heterogeneous Support – Windows, Linux, and container workloads.
+
+RMM is available in the OCI Marketplace with built-in support for Oracle Cloud VMware Solution (OCVS) and OCI Native migrations.
+
+### **4.3 Oracle Cloud Migrations (OCM).**
+
+Oracle Cloud Migrations is a managed service that automates the migration of workloads—specifically VMware virtual machines and AWS EC2 instances—to Oracle Cloud Infrastructure (OCI). It streamlines every step of the process—from discovery and planning to replication and deployment—using the OCI Console, CLI, or API.
+
+**Key Capabilities**
+
+Automated Asset Discovery & Inventory:
+- For VMware, a remote agent appliance is deployed to discover VMs and their metadata.
+- For AWS, the service performs agentless discovery of EC2 instances and EBS volumes.
+- Discovered assets are stored in an OCI-hosted Inventory, along with performance data available in the OCI Monitoring service.
+
+Migration Planning & Execution
+- Assets are grouped into Migration Projects, each containing one or more Migration Plans.
+- The service recommends target OCI configurations—such as compute shape and placement—based on source attributes and performance metrics. Users can customize these plans and evaluate cost estimates.
+- Supports incremental replication of VM data to OCI.
+
+Secure, Compliant Access & Governance
+- Integrates with OCI Identity and Access Management (IAM) and Vault to ensure secure authentication, authorization, and management of credentials.
+- Administrators must configure compartments, dynamic groups, and IAM policies to enable migration components and access control.
+
+Supports VMware and AWS Source Environments
+- VMware vSphere versions 6.5 through 8.0 (vSphere 8.0 support requires VDDK 7.0U2).
+- AWS EC2 (x86/EBS-backed).
+- Supports a wide range of Linux and Windows guest OS versions.
+
+Oracle Cloud Migrations provides a streamlined, end-to-end migration experience for VMware and AWS workloads moving into OCI.
From discovery through validation and cutover, the service offers automation, governance integration, cost insights, and incremental replication. It’s ideal for organizations seeking a secure, self-service migration path that integrates directly with OCI infrastructure.
+
+
+## **5. Assessment and discovery mapping.**
+
+A structured assessment, planning, and testing phase is essential for validating the migration design, minimizing risk, and ensuring a smooth transition to Oracle Cloud Infrastructure (OCI). This phase should include:
+
+- Workload Discovery & Classification – Use inventory tools (e.g., RVTools, OCM discovery, vSphere inventory, or RackWare) to map out virtual machines, applications, and dependencies. Classify workloads by criticality (mission-critical, business-critical, dev/test) and migration complexity.
+- Dependency Mapping – Identify application interdependencies, DNS records, firewall rules, and IP address requirements. Pay special attention to workloads requiring strict IP preservation or low-latency communication.
+- Right-Sizing & Capacity Planning – Assess CPU, memory, and storage utilization to define the required OCI shape, OCVS node count, and storage architecture (OCI Block Volumes or vSAN). Consider current utilization and growth factors.
+- Network & Security Planning – Validate OCI networking design, including subnets, security lists, and FastConnect/IPSec VPN. Plan for Layer 2 extension where IP preservation is required.
+- Testing & Validation – Conduct pilot migrations for representative workloads before executing large-scale cutovers. Validate performance, failover, and recovery procedures.
+
+
+## **6. Tooling - Decision Tree.**
+
+Selecting the right migration tool depends on platform compatibility, workload complexity, and migration objectives. The decision tree below helps identify the most suitable migration tool based on the source and target environments; a minimal code sketch of the same logic follows the diagram.
+

+ +![Decision tree ](./images/Decisiontree.png) + +

+
+**VMware HCX**
+- Native VMware solution for VMware-to-VMware migrations.
+- Supports vMotion, Bulk Migration, Replication Assisted vMotion (RAV), and Layer 2 extension.
+- Best choice for seamless workload mobility between on-premises VMware and Oracle Cloud VMware Solution (OCVS).
+- Reduces downtime through live and replication-assisted migrations.
+
+**RackWare**
+- Cloud-agnostic and hypervisor-independent.
+- Supports physical, virtual, and cloud-native platforms.
+- Operates at the OS level, offering replication and failback options.
+- Recommended for heterogeneous environments where non-VMware workloads or physical servers must be migrated alongside VMware workloads.
+
+**Oracle Cloud Migrations (OCM)**
+- Best suited for VMware to OCI Native Compute migrations.
+- Ideal for low-complexity, non-mission-critical workloads.
+- Automatically discovers virtual machines and instances in the source environment.
+
+## **7. Special considerations for enterprise and mission-critical databases.**
+
+While VM-level migration tools (e.g., HCX, RackWare, OCM) can handle the majority of workloads, they may not be sufficient for enterprise-scale, mission-critical applications where downtime is unacceptable. For large and transaction-heavy databases, a pure VM-level migration introduces significant challenges due to:
+
+- Database size (terabytes or petabytes).
+- Continuous write activity (transaction-heavy workloads).
+- The requirement for zero or near-zero downtime.
+
+To migrate such mission-critical workloads and databases, you might consider dedicated solutions and architectures:
+
+- Oracle Databases – Use Oracle Data Guard or GoldenGate for robust replication, synchronization, and failover capabilities.
+- Microsoft SQL Server – Implement Always On Availability Groups to ensure transactional consistency and minimize downtime.
+- Microsoft Active Directory – Use native AD replication between domain controllers to maintain consistency.
+- Microsoft Exchange Server – Leverage Exchange Hybrid configurations or Database Availability Groups (DAGs) for continuity during migration.
+
+By combining VM-level mobility with application-aware replication, enterprises can achieve data consistency, reduced downtime, and a resilient cutover strategy for their most business-critical workloads.
+
+
+## **8. Best practices & guidance.**
+
+To ensure a smooth and resilient transition to Oracle Cloud Infrastructure (OCI), the following best practices should be incorporated into any migration strategy:
+
+- Adopt a phased migration approach – Start with lower-priority or non-production workloads to validate tooling, processes, and network designs. Use early phases as learning cycles before addressing mission-critical systems.
+
+- Leverage Layer 2 network extension – For workloads requiring strict IP preservation, deploy VMware HCX L2 extensions with sufficient bandwidth and redundancy. This reduces reconfiguration effort and ensures application continuity.
+
+- Validate bandwidth and connectivity – Confirm that FastConnect or VPN capacity can handle large-scale data transfers without impacting production traffic. Monitor latency, throughput, and error rates throughout the migration.
+
+- Establish rollback procedures – Define clear fallback and recovery scenarios in case of migration failure. Document rollback steps and ensure that teams are trained to execute them under time pressure.
+ +- Implement robust testing – Conduct end-to-end validation of application behavior, security controls, and performance benchmarks in the target environment before go-live. Include functional, load, and failover testing. + +- Engage stakeholders early – Involve application owners, database administrators, network and security teams during the planning stage. Align technical decisions with business objectives, compliance requirements, and service-level agreements (SLAs). + +- Maintain comprehensive documentation – Capture migration plans, cutover runbooks, rollback steps, and lessons learned to ensure repeatability and knowledge transfer across teams. + +- Use automation where possible – Leverage orchestration and migration tooling (e.g., HCX policies, RackWare workflows, OCM migration plans) to reduce manual effort and minimize human error. + +- Plan for post-migration optimization – After workloads are stable in OCI, review performance, right-size compute shapes, and integrate with OCI’s managed services (e.g., monitoring, security, backup) to maximize efficiency and cost savings. + +- Prioritize security and compliance – Ensure IAM policies, network security lists, encryption settings, and audit configurations are validated before workloads are exposed to production use. + diff --git a/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/README.md b/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/README.md index ce680dcb1..80cf24965 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/README.md @@ -4,7 +4,7 @@ The Oracle Cloud Infrastructure Secure Desktops service allows an administrator Secure Desktops is ideal for organizations that need to provide employees with controlled access to a preconfigured desktop environment. An administrator can create pools of desktops in their tenancy, based on existing compute shapes or custom images. All configuration for the desktop and Oracle Cloud Infrastructure is completed by the administrator, making it possible for non-technical users to securely access and use a remote desktop for their day-to-day work. Secure Desktops controls all access to the remote desktops, protecting Oracle Cloud Infrastructure resources and customer data from malicious client activity. -Reviewed: 11.11.2024 +Reviewed: 12.11.2025 # Useful Links diff --git a/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/secure-desktops-solution-definition/README.md b/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/secure-desktops-solution-definition/README.md index ba9194f44..b2d5d925e 100644 --- a/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/secure-desktops-solution-definition/README.md +++ b/cloud-infrastructure/virtualization-solutions/oracle-secure-desktops/secure-desktops-solution-definition/README.md @@ -2,7 +2,7 @@ This repository contains a detailed guide for hosting VDI instances on Oracle Secure Desktop. It offers a high-level solution definition of the deployment architecture. The document is aimed at capturing the current state architecture and provides a prospective state, potential project scope, deployment requirments and target Secure Desktop architecuture. -Reviewed: 11.11.2024 +Reviewed: 12.11.2025 # When to use this asset? 
diff --git a/data-platform/data-development/sqldev-copilot-vscode-sqcl-mcp/README.md b/data-platform/data-development/oracle-mcp-server-sqldev-vscode/README.md similarity index 100% rename from data-platform/data-development/sqldev-copilot-vscode-sqcl-mcp/README.md rename to data-platform/data-development/oracle-mcp-server-sqldev-vscode/README.md diff --git a/data-platform/data-science/oracle-data-science/r-conda-oci-data-science/.ipynb_checkpoints/LICENSE-checkpoint b/data-platform/data-science/oracle-data-science/r-conda-oci-data-science/.ipynb_checkpoints/LICENSE-checkpoint new file mode 100644 index 000000000..8dc7c0703 --- /dev/null +++ b/data-platform/data-science/oracle-data-science/r-conda-oci-data-science/.ipynb_checkpoints/LICENSE-checkpoint @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. + +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. + +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. \ No newline at end of file diff --git a/data-platform/data-science/oracle-data-science/your-first-data-science-project/LICENSE b/data-platform/data-science/oracle-data-science/your-first-data-science-project/LICENSE new file mode 100644 index 000000000..8dc7c0703 --- /dev/null +++ b/data-platform/data-science/oracle-data-science/your-first-data-science-project/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. 
+ +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. + +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. \ No newline at end of file diff --git a/data-platform/data-science/oracle-data-science/your-first-data-science-project/README.md b/data-platform/data-science/oracle-data-science/your-first-data-science-project/README.md new file mode 100644 index 000000000..945f4f9df --- /dev/null +++ b/data-platform/data-science/oracle-data-science/your-first-data-science-project/README.md @@ -0,0 +1,47 @@ + +# Overview + +Your First Data Science Project demonstrates how to build a complete end-to-end data science workflow using Oracle Cloud Infrastructure (OCI) Data Science Platform. +The project walks through the main stages of a typical machine learning lifecycle — from data preparation to model deployment and inference — using practical examples. + +Reviewed: 2025.11.10 + +# What You’ll Learn + +This project covers the following steps: + +Data ingestion + +Data preprocessing and visualization + +Model training and validation + +Model explainability + +Model deployment + +Endpoint invocation for predictions + +# Prerequisites + +Access to OCI Data Science Platform + +Basic familiarity with Python and machine learning concepts + +A valid compartment, resource principal and policies configured for Data Science services. More details can be found in the Guide for Your First Data Science Project prerequisites.pdf + +# How to Use + +Open the provided notebook in your OCI Data Science Notebook Session. + +Select the following conda environment: automlx234_p310_cpu_x86_64_v1 + +Run the notebook cells in sequence to reproduce the complete workflow. + +# License + +Copyright (c) 2025 Oracle and/or its affiliates. + +Licensed under the Universal Permissive License (UPL), Version 1.0. 
+ +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. diff --git a/data-platform/data-science/oracle-data-science/your-first-data-science-project/files/Guide for Your First Data Science Project prerequisites.pdf b/data-platform/data-science/oracle-data-science/your-first-data-science-project/files/Guide for Your First Data Science Project prerequisites.pdf new file mode 100644 index 000000000..26650eeab Binary files /dev/null and b/data-platform/data-science/oracle-data-science/your-first-data-science-project/files/Guide for Your First Data Science Project prerequisites.pdf differ diff --git a/data-platform/data-science/oracle-data-science/your-first-data-science-project/files/adult_income shared.ipynb b/data-platform/data-science/oracle-data-science/your-first-data-science-project/files/adult_income shared.ipynb new file mode 100644 index 000000000..e2acb29c6 --- /dev/null +++ b/data-platform/data-science/oracle-data-science/your-first-data-science-project/files/adult_income shared.ipynb @@ -0,0 +1,1090 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "a325c96e-756f-4628-9a3d-9c6976e340e6", + "metadata": {}, + "source": [ + "Conda environment: automlx251_p311_cpu_x86_64_v2\\\n", + "Created Data: 09/11/2025\\\n", + "By: Assaf Rabinowicz, EMEA Data Science Team" + ] + }, + { + "cell_type": "markdown", + "id": "a736e6a0-a81e-47c7-b4b0-14d6b19ac027", + "metadata": { + "tags": [] + }, + "source": [ + "# 1. Import Packages\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05af78e8-c25b-4d10-9d8d-1b1ab07ea30f", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# third-party open-source packages\n", + "import pandas as pd\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.datasets import fetch_openml\n", + "from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, accuracy_score\n", + "from xgboost import XGBClassifier\n", + "import os\n", + "import requests\n", + "\n", + "# Oracle packages\n", + "import automlx\n", + "from automlx import init\n", + "import oci\n", + "from oci.object_storage import UploadManager\n", + "import ads\n", + "from ads.common.model_metadata import UseCaseType\n", + "from ads.model import GenericModel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "de12f611-a70e-4a02-9e87-b97fc3fc1a94", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# hash symbol used for commenting\n", + "# Ctrl+ Enter for running the code\n", + "# Enter for a new line" + ] + }, + { + "cell_type": "markdown", + "id": "56cbba26-8ac5-472b-a644-2b18b2905306", + "metadata": { + "tags": [] + }, + "source": [ + "# 2. 
Data Import, Exploration and Pre-Processing" + ] + }, + { + "cell_type": "markdown", + "id": "88efbdc4-5e9b-47c2-9288-6a893351d6c9", + "metadata": { + "tags": [] + }, + "source": [ + "## 2.1 Data Import" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cba6a5ee-431f-423f-8a74-1ad122106941", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "data = fetch_openml(name=\"adult\", version=2, as_frame=True) # https://www.openml.org/search?type=data&sort=version&status=any&order=asc&exact_name=adult\n", + "df = data.frame" + ] + }, + { + "cell_type": "markdown", + "id": "09607f12-8397-4851-b1f5-c55cb9602a8f", + "metadata": { + "tags": [] + }, + "source": [ + "### 2.1.1 Bonus: Importing from the atteached block volume" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d922fe16-9240-4369-ac45-c20d74be1836", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#file_path=\"your_path\" # an example for a path: '/home/datascience/df_sample.csv'. Commonly you need to use /home/datascience before the visable path.\n", + "#df = pd.read_csv(file_path)" + ] + }, + { + "cell_type": "markdown", + "id": "5a7fd513-3096-4c9b-a543-f00600b869c9", + "metadata": {}, + "source": [ + "## 2.2 Data Structure" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "588a89f0-1b57-4b8a-bf9a-78501b5fe26a", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "72bcb542-cf3d-4d4b-82ec-052562f883c5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "23a72a8d-d408-46f3-aa86-3d5c4ac8e2eb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# We would like to create a formula that uses the features for predicting the target variable" + ] + }, + { + "cell_type": "markdown", + "id": "a77498e7-2ea6-4d16-ac2a-06fce64e3eb9", + "metadata": {}, + "source": [ + "## 2.2 Data Analysis and Processing" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf68aa72-abe3-45e7-b0f7-bfc072d8d0ee", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df.drop(['fnlwgt'], axis=1,inplace=True) # dropping 'sampling weights' column for simplification" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47fec1cd-5592-41d0-ad6e-ed0172b7e220", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "round(df.describe(percentiles=[]),1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70c6f89f-c6ff-457a-83d4-c280d3846453", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df.describe(include=['category']).round(1)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fa2d9ed8-3044-4f02-bee7-18fa5e26b981", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df['class'] = (df['class'] == '>50K').astype(int)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "13bd0e3c-6d12-4579-92c0-f7168257d9a6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fe510568-98ef-4497-b9c9-7d83065aa69d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "pd.plotting.scatter_matrix(df)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e479730e-0206-46a2-87ce-e2205f96dffa", + 
"metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "pd.plotting.scatter_matrix(df[['education-num','capital-gain']])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "acbaf252-8cea-440b-a619-c6de9758e70c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#!conda install seaborn -y" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5388fd5b-b856-4b86-a369-57954c5be3e6", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import seaborn as sns\n", + "\n", + "sns.histplot(df, x='education-num', hue=\"class\",multiple=\"dodge\", bins=30)\n", + "plt.title(\"Distribution of Education-Num by Salery Class\")\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0885ecb5-da3f-4cb1-9c0a-fc3bb4847e06", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "sns.boxplot(data=df, x=\"class\", y=\"education-num\")\n", + "plt.title(\"Distribution of Education-Num by Salery Class\")\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "232c7ebc-dd80-4eca-b103-1f017bf1daa0", + "metadata": { + "tags": [] + }, + "source": [ + "# 3. Model Training" + ] + }, + { + "cell_type": "markdown", + "id": "f411e856-5827-449d-a622-1d739ce4369b", + "metadata": {}, + "source": [ + "## 3.1 Train and Test Split" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ca7bd821-27ba-495c-91ed-4683b3c541d4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "X = df.drop('class', axis=1)\n", + "y = df['class']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4fe9fee8-0190-4639-a886-adcf6ec45bdd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "X_train, X_test, y_train, y_test = train_test_split(X, y,test_size=0.3) # \n", + "\n", + "print(\"Train shape:\", X_train.shape)\n", + "print(\"Test shape:\", X_test.shape)" + ] + }, + { + "cell_type": "markdown", + "id": "0c5adb4f-f2a5-4d4d-a66a-404396fd1242", + "metadata": {}, + "source": [ + "## 3.2 Using AutoML Pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5429e219-640d-40e1-bd9e-1726c4ebd9cd", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "init(engine='local')" + ] + }, + { + "cell_type": "markdown", + "id": "3b4f6e81-87e5-42bc-bf57-0333da5f4601", + "metadata": {}, + "source": [ + "Optinal Tasks:\\\n", + "classification, regression, anomaly_detection, forecasting, recommendation\n", + "\n", + "Optional algorithms for classification are: \\\n", + "AdaBoostClassifier, DecisionTreeClassifier, ExtraTreesClassifier, TorchMLPClassifier\n", + "KNeighborsClassifier, LGBMClassifier\n", + "LinearSVC, LogisticRegression, RandomForestClassifier\n", + "SVC, XGBClassifier, GaussianNB" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ab8b68fd-9c73-437f-81dd-108b68d047d5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "pipeline1 = automlx.Pipeline(task='classification',model_list=['LogisticRegression', 'RandomForestClassifier','XGBClassifier'],max_tuning_trials =10)\n", + "# model_list and max_tuning_trials were added to reduce fitting time. 
Removing them allows training a potentially better model.\n", + "# The automl pipeline has a rich api: https://docs.oracle.com/en-us/iaas/tools/automlx/latest/latest/automl.html " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05d17ab5-125c-4806-8ff8-f02b347873ab", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "pipeline1.fit(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "id": "d8231e29-8246-481b-9e51-8d4f7343a55f", + "metadata": {}, + "source": [ + "The pipeline includes several main steps:\n", + "1. Data pre-processing\n", + "2. Algorithm selection - based on existing data, predicting which algorithm is the best for your data \n", + "3. Sample size reduction try ('adaptive sampling')\n", + "4. Features reduction try ('feature selection')\n", + "5. Model hyperparameters selection ('model tuning')\n", + "6. Model fitting with the selected hypterparameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "2c11eac3-2c78-4d24-a846-7f9f9f2f1f72", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "y_train_pred = pipeline1.predict(X_train)\n", + "y_test_pred = pipeline1.predict(X_test)\n", + "print(y_test_pred[0:20])" + ] + }, + { + "cell_type": "markdown", + "id": "bde41405-8df0-48f8-b895-589e2bf5f083", + "metadata": {}, + "source": [ + "### 3.2.1 Understanding the Automl Pipeline Selection" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5cd90f60-07f0-4a96-bd75-b8e6a2370ace", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [], + "source": [ + "#pipeline1.completed_trials_summary_" + ] + }, + { + "cell_type": "markdown", + "id": "54aa7349-0a0e-4255-aa61-0bae1659ba86", + "metadata": {}, + "source": [ + "## 3.3 Modeling with other open-sources" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f554c534-b949-48f5-bf58-1abbdca961a7", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "X_train_encoded = pd.get_dummies(X_train)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "04930ab0-5865-43a0-bbff-6f339179d5db", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "model = XGBClassifier(max_depth=5, n_estimators=200, learning_rate=0.01,eval_metric='logloss')\n", + "model.fit(X_train_encoded, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "432bcdcb-f54d-45da-8e76-565af87b5102", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "y_train_pred_xgboost = model.predict(X_train_encoded)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f8c871ad-ed53-48de-bad1-249185657a26", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "np.bincount(y_train_pred_xgboost)" + ] + }, + { + "cell_type": "markdown", + "id": "aca66782-a828-42fb-8ba4-429b8507a511", + "metadata": { + "tags": [] + }, + "source": [ + "# 4. 
Model Validation and Explainabilty" + ] + }, + { + "cell_type": "markdown", + "id": "a6796d7b-1839-43d2-9f82-d944c2511d70", + "metadata": { + "tags": [] + }, + "source": [ + "## 4.1 Model Validation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "24eb410e-6b33-4e9a-aea8-a865316d2e77", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "acc_test = accuracy_score(y_test, y_test_pred) * 100\n", + "print('Model Accuracy, test: ',acc_test.round(1))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0cdbbf74-570c-4d67-89c3-b80ce2d1d40d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "cm_test = confusion_matrix(y_test, y_test_pred)\n", + "cm_test_pct = cm_test / cm_test.sum(axis=1, keepdims=True) * 100\n", + "\n", + "ConfusionMatrixDisplay(cm_test_pct, display_labels=['<=50K', '>50K']).plot(cmap='Blues', values_format=\".1f\")\n", + "plt.title('Confusion Matrix - Test Set [%]')\n", + "\n", + "plt.savefig('confusion_matrix.png', dpi=300)\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "4ad8baa1-1433-4408-9f80-5da52da656d3", + "metadata": {}, + "source": [ + "## 4.2 Saving the confusion matrices in the object storage" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a300a62b-54c8-42a6-a6bc-ece51b63d631", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "signer = oci.auth.signers.get_resource_principals_signer()\n", + "object_storage = oci.object_storage.ObjectStorageClient({}, signer=signer)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "15df4394-f4c5-4cbe-9d66-21afcf3f575d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "namespace = object_storage.get_namespace().data\n", + "bucket_name = \"data-science-reports\"\n", + "file_name = \"confusion_matrix2\"\n", + "local_path = \"/home/datascience/confusion_matrix.png\" # make sure to add '/home/datascience/' to the path." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b2c34668-7716-4c93-86b8-5461cf5dda1d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "upload_manager = UploadManager(object_storage, allow_parallel_uploads=True)\n", + "upload_manager.upload_file(\n", + " namespace_name=namespace,\n", + " bucket_name=bucket_name,\n", + " object_name=file_name,\n", + " file_path=local_path\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "5d7f4c4b-08fc-4a13-8061-7590fd8217b1", + "metadata": {}, + "source": [ + "### 4.2.1 Bonus: Interacting with Object Storage and ADB" + ] + }, + { + "cell_type": "markdown", + "id": "4780a408-d303-4f12-b589-564ba8b1dc19", + "metadata": {}, + "source": [ + "#### 4.2.1 Reading a table from object storage" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31350eaf-bfbc-46c8-8368-54e07f175ad4", + "metadata": {}, + "outputs": [], + "source": [ + "# import io\n", + "\n", + "# signer = oci.auth.signers.get_resource_principals_signer()\n", + "# object_storage = oci.object_storage.ObjectStorageClient({}, signer=signer)\n", + "\n", + "# namespace = object_storage.get_namespace().data\n", + "# bucket_name='data-science-reports'\n", + "# file_name= 'testagg_day_0.csv'\n", + "\n", + "# obj = object_storage.get_object(namespace, bucket_name, file_name)\n", + "# df = pd.read_csv(io.BytesIO(obj.data.content))" + ] + }, + { + "cell_type": "markdown", + "id": "1b22fd08-0d1e-4745-b64e-8e14df9ef0bd", + "metadata": {}, + "source": [ + "#### 4.2.1 Reading a table from the database" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "33f6c85f-c42d-4057-9d8b-e9b9514f0a39", + "metadata": {}, + "outputs": [], + "source": [ + "# import ads\n", + "\n", + "# connection_parameters = {\n", + "# \"user_name\": \"\",\n", + "# \"password\": \"\",\n", + "# \"service_name\": \"\",\n", + "# \"wallet_location\": \"/full/path/to/my_wallet.zip\", # download the wallet file from the databse\n", + "# }\n", + "\n", + "# df = pd.DataFrame.ads.read_sql(\n", + "# \"SELECT * FROM SH.SALES\",\n", + "# connection_parameters=connection_parameters,\n", + "# )\n" + ] + }, + { + "cell_type": "markdown", + "id": "2e2f9884-14ec-46e5-9a65-969d538f16f8", + "metadata": {}, + "source": [ + "## 4.3 Explainability" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ad2091ea-9107-406e-83b5-a676891d28f4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "explainer = automlx.MLExplainer(pipeline1,\n", + " X_train,\n", + " y_train,\n", + " target_names=[\"<=50K\", \">50K\"],\n", + " task=\"classification\")\n", + "\n", + "y_train = (y_train == \">50K\").astype(int)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3b971fb2-e9c1-4091-a167-e3fe7db03574", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "result_explain_model_default = explainer.explain_model()" + ] + }, + { + "cell_type": "markdown", + "id": "43d9fb85-69a0-4e7f-bc5d-f47398650afa", + "metadata": {}, + "source": [ + "### 4.3.1 Gloabal Explainability" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ce86b25a-f49e-4ab2-b5a6-14f86136089d", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "result_explain_model_default.show_in_notebook() # based on permutation" + ] + }, + { + "cell_type": "markdown", + "id": "67371780-8fbb-4c84-bf0e-21ccfe9ef9e8", + "metadata": { + "tags": [] + }, + "source": [ + "### 4.3.2 Local Explainability" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "da1c513f-5225-4415-93b8-19220f4ae0e0", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "index = 0\n", + "X_train.iloc[[index]]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a0c91f93-f905-4b95-9087-d3173fa455c4", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "actual=y_train[index]\n", + "prediction=pipeline1.predict(X_train.iloc[[index]])[0]\n", + "print('actual: ',actual)\n", + "print('prediction: ',prediction)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f463c6b3-e9ad-42e9-aafe-5dd611562c32", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "explainer.configure_explain_prediction(tabulator_type=\"kernel_shap\",\n", + " sampling={'technique': 'random', 'n_samples': 2000})\n", + "result_explain_prediction_kernel_shap = explainer.explain_prediction(X_train.iloc[[index]])\n", + "result_explain_prediction_kernel_shap[0].show_in_notebook()" + ] + }, + { + "cell_type": "markdown", + "id": "04cf260e-3a3b-45f2-9047-f573c31d18b0", + "metadata": { + "tags": [] + }, + "source": [ + "## 4.4 Bonus: Notebook Explorer" + ] + }, + { + "cell_type": "markdown", + "id": "3e6ed89d-34a4-482b-8e70-ade333a81a22", + "metadata": { + "tags": [] + }, + "source": [ + "# 5 Deployment" + ] + }, + { + "cell_type": "markdown", + "id": "ace2e128-9673-4adb-873a-559771d741bb", + "metadata": {}, + "source": [ + "## 5.1 Prepare the Artifacts (Serializiation) Using ADS" + ] + }, + { + "cell_type": "markdown", + "id": "6b05ef1e-ea60-4ead-9827-e8dd5dab1aa0", + "metadata": {}, + "source": [ + "* Create the files required for deployment and pack them together.\n", + "* Besides the model, the following required files are generated automatically: `score.py`, `runtime.yaml`, `input_schema.json`, `output_schema.json`\n", + "* Optional info can be added, such as: `inference_conda_env`, `training_conda_env`\n", + "\n", + "* The following frameworks have an automated prepare function: TensorFlow, PyTorch, scikit-learn, XGBoost, LightGBM, SparkPipelineModel, AutoMlx, transformers\n", + "* In addition" + ] + }, + { + "cell_type": "markdown", + "id": "cb7c0e6b-877b-43af-8b3e-6418c341cc9f", + "metadata": {}, + "source": [ + "ADS takes you through the deployment process in a simple way" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ea8b8fa0-cdfa-4265-b659-017aef1f2b32", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "ads.set_auth(\"resource_principal\") # a signer for all ads operations, managed automatically" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5d05ef6e-a6d9-499f-bcf7-890c1fc39793", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "automl_model = GenericModel(estimator=pipeline1, artifact_dir=\"automl_model_artifact2\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "35cc92ab-6285-46c5-9be9-92bd19832847", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "automl_model.summary_status()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7c70f5ed-0ace-4987-9970-830310df3734", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "conda_env=\"automlx251_p311_cpu_x86_64_v2\"\n", + "automl_model.prepare(inference_conda_env=conda_env,\n", + " training_conda_env=conda_env,\n", + " use_case_type=UseCaseType.BINARY_CLASSIFICATION,\n", + " X_sample=X_test,\n", + " force_overwrite=True)" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "id": "e93a7a8e-bace-4d07-a0bc-1c3a029fcfec", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "automl_model.summary_status()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9573715-9015-44d0-b779-cfc005ec46ed", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "automl_model.verify(X_test.iloc[:20], auto_serialize_data=True)" + ] + }, + { + "cell_type": "markdown", + "id": "85a58e48-3cfa-4eac-86b6-ebcc0ac9545a", + "metadata": {}, + "source": [ + "## 5.2 Register" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b4828b67-06ba-48e1-b425-6333b4dff9e8", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "model_id = automl_model.save(display_name=\"Demo Adults Income Model 1\")" + ] + }, + { + "cell_type": "markdown", + "id": "a651d3c2-d540-45c9-82d7-42112445736e", + "metadata": {}, + "source": [ + "## 5.3 Deploy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f1706b2d-73eb-4a46-992e-ea721ae5f81b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#automl_model.deploy(display_name=\"Demo Adults Income Model 1\")" + ] + }, + { + "cell_type": "markdown", + "id": "8d4757f6-7509-4516-b5c0-9eb635965a6b", + "metadata": { + "tags": [] + }, + "source": [ + "# 6. Inference " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "492f82fb-8090-497a-8a47-16324936dffb", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "auth = oci.auth.signers.get_resource_principals_signer()\n", + "\n", + "endpoint = ''\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d8c01a8f-5e9a-40e0-8d26-c71a4ca4e6aa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "body = {\n", + " \"data\": '''[\n", + " {\n", + " \"age\": 37,\n", + " \"workclass\": \"Private\",\n", + " \"education\": \"Bachelors\",\n", + " \"education-num\": 13,\n", + " \"marital-status\": \"Married-civ-spouse\",\n", + " \"occupation\": \"Exec-managerial\",\n", + " \"relationship\": \"Husband\",\n", + " \"race\": \"White\",\n", + " \"sex\": \"Male\",\n", + " \"capital-gain\": 500,\n", + " \"capital-loss\": 0,\n", + " \"hours-per-week\": 40,\n", + " \"native-country\": \"United-States\"\n", + " }\n", + " ]'''\n", + "}\n", + "# play with the capital-gain variable to see changes in prediction" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3acd4b19-f89c-4488-a631-f22183adc39b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "requests.post(endpoint, json=body, auth=auth).json()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c5f3379b-5885-490d-aa72-2fa29dd4ea10", + "metadata": { + "jupyter": { + "source_hidden": true + }, + "tags": [] + }, + "outputs": [], + "source": [ + "df_example = pd.DataFrame([{\n", + " \"age\": 37,\n", + " \"workclass\": \"Private\",\n", + " \"education\": \"Bachelors\",\n", + " \"education-num\": 13,\n", + " \"marital-status\": \"Married-civ-spouse\",\n", + " \"occupation\": \"Exec-managerial\",\n", + " \"relationship\": \"Husband\",\n", + " \"race\": \"White\",\n", + " \"sex\": \"Male\",\n", + " \"capital-gain\": 0,\n", + " \"capital-loss\": 0,\n", + " \"hours-per-week\": 40,\n", + " \"native-country\": \"United-States\"\n", + "}])\n", + "\n", + "# Convert DataFrame to JSON (orientation='records' creates a list of dicts)\n", + "body = {\n", + " \"data_type\": \"pandas.core.frame.DataFrame\",\n", + " \"data\": 
df_example.to_json(orient='records')\n", + "}" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python [conda env:automlx251_p311_cpu_x86_64_v2]", + "language": "python", + "name": "conda-env-automlx251_p311_cpu_x86_64_v2-py" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.11" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/data-platform/data-science/oracle-vector-search/README.md b/data-platform/data-science/oracle-vector-search/README.md index 462441ce6..abe2cc22a 100644 --- a/data-platform/data-science/oracle-vector-search/README.md +++ b/data-platform/data-science/oracle-vector-search/README.md @@ -1,9 +1,46 @@ # Oracle Vector Search -This page covers Oracle Vector Search. +Oracle AI Vector Search is designed for Artificial Intelligence (AI) workloads and allows you to query data based on semantics, rather than keywords. The VECTOR data type is introduced with the release of Oracle AI Database 26ai, providing the foundation to store vector embeddings alongside business data in the database. Using embedding models, you can transform unstructured data into vector embeddings that can then be used for semantic queries on business data. In order to use the VECTOR data type and its related features, the COMPATIBLE initialization parameter must be set to 23.4.0 or higher. -Reviewed: 2025.09.26 +Reviewed: 2025.11.26 +# Useful Links + +## Documentation + +- [What is a Vector Database](https://www.oracle.com/database/vector-database/) +- [Oracle.com](https://www.oracle.com/database/ai-vector-search/) +- [Oracle AI Vector Search FAQ](https://www.oracle.com/database/ai-vector-search/faq/) +- [Oracle AI Vector Search Technical Architecture](https://docs.oracle.com/en/database/oracle/oracle-database/26/vsiad/aivs_genarch.html) +- [Oracle AI Vector Search User's Guide](https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/overview-ai-vector-search.html) +- [Vector Search PL/SQL Packages](https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/vector-search-pl-sql-packages-node.html) +- [What's New for Oracle AI Vector Search](https://docs.oracle.com/en/database/oracle/oracle-database/26/vecse/whats-new-oracle-ai-vector-search.html) +- [Oracle Machine Learning AI models (Downloads)](https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/fU1V-voY2VBhhqMPjhCC57Up77ROK9u6GN_j3-uGi_EzIdHm9XDn-RfnZS5bV0cN/n/adwc4pm/b/OML-ai-models/o/Oracle%20Machine%20Learning%20AI%20models.htm) + +## Blogs + +- [Oracle Database Insider AI vector search blogs - complete list](https://blogs.oracle.com/database/category/db-vector-search) +- [Getting Started with Oracle AI Database AI Vector Search](https://blogs.oracle.com/database/post/getting-started-with-oracle-database-23ai-ai-vector-search) +- [Indexing Guidelines with AI Vector Search](https://blogs.oracle.com/database/post/indexing-guidelines-with-ai-vector-search) +- [Using HNSW Vector Indexes in AI Vector Search](https://blogs.oracle.com/database/post/using-hnsw-vector-indexes-in-ai-vector-search) +- [Using IVF Vector Indexes in AI Vector Search](https://blogs.oracle.com/database/post/using-ivf-vector-indexes) +- [Using Hybrid Vector Indexes in AI Vector Search](https://blogs.oracle.com/database/post/using-hybrid-vector-indexes) +- [Getting started with vectors in Oracle 
Database](https://blogs.oracle.com/coretec/post/getting-started-with-vectors-in-23ai) +- [Hybrid Vector Index - a combination of AI Vector Search with Text Search](https://blogs.oracle.com/coretec/post/hybrid-vector-index-the-combination-of-full-text-and-semantic-vector-search) +- [More Examples on Hybrid Vector Search](https://blogs.oracle.com/coretec/post/more-examples-on-hybrid-vector-search) + + +## LiveLabs + +- [AI Vector Search Livelabs - complete list](https://livelabs.oracle.com/pls/apex/f?p=133:100:101524290096807::::SEARCH:Ai%20Vector%20search) +- [Oracle AI Vector Search - Basics](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=1070&clear=RR,180&session=123357260636138) +- [Getting Started with AI Vector Search](https://livelabs.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=4166&clear=RR,180&session=4085850813889) +- [RAG Application using AI Vector Search in Oracle Autonomous Database and APEX](https://livelabs.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=4021&clear=RR,180&session=4085850813889) +- [RAG example with Oracle AI Vector Search](https://livelabs.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=4127&clear=RR,180&session=4085850813889) +- [Use of Graph RAG and Vector Search](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=3953&clear=RR,180&session=123357260636138) +- [Oracle AI Vector Search - Using Vector Embedding Models with Nodejs](https://apexapps.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=3926&clear=RR,180&session=123357260636138) +- [Oracle AI Vector Search - Using Vector Embedding Models with Python](https://livelabs.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=3928&clear=RR,180&session=4085850813889) +- [Oracle AI Vector Search - Using Vector Embedding Models with JDBC](https://livelabs.oracle.com/pls/apex/r/dbpm/livelabs/view-workshop?wid=3925&clear=RR,180&session=4085850813889) # License diff --git a/data-platform/data-security/README.md b/data-platform/data-security/README.md index f9c9f1943..3e42eaedc 100644 --- a/data-platform/data-security/README.md +++ b/data-platform/data-security/README.md @@ -7,7 +7,7 @@ Security of Data is at the core of our products. As a team, we focus on Securi - The stand-alone products: Audit Vault and Database Firewall, Key Vault - The Enterprise Manager pack: Data Masking and Subsetting -Reviewd: 03.09.24 +Reviewed: 30.10.25 # Team Publications @@ -27,7 +27,6 @@ Reviewd: 03.09.24 - [Oracle.com](https://www.oracle.com) - [Oracle Security Blog](https://blogs.oracle.com/security/) - [Oracle Cloud Security Database Blogs](https://blogs.oracle.com/cloudsecurity/category/ocs-database-security) -- [Oracle Autonomous Database Security](https://videohub.oracle.com/media/Safeguarding%20your%20data%3A%20Oracle%20Autonomous%20Database%20Security/1_c4f4qui6?elq_mid=231948&sh=25121261326111887129186815826312&cmid=) # License @@ -35,4 +34,4 @@ Copyright (c) 2025 Oracle and/or its affiliates. Licensed under the Universal Permissive License (UPL), Version 1.0. -See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE.txt) for more details. 
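The new data-platform/data-science/oracle-vector-search/README.md above describes the VECTOR data type and semantic (distance-based) queries in prose only. The following is a minimal, illustrative sketch of that idea from Python, assuming a database release with AI Vector Search enabled and the python-oracledb thin driver; the connection details, the demo_docs table, and the tiny hard-coded 3-dimensional embeddings are placeholders (real embeddings would come from an embedding model), so treat it as a sketch rather than a tested asset.

```python
# Illustrative sketch only: credentials, table name and embeddings are placeholders.
import oracledb

conn = oracledb.connect(user="demo", password="demo_pwd", dsn="localhost/FREEPDB1")
cur = conn.cursor()

# Keep a text column next to its vector embedding.
cur.execute("""
    CREATE TABLE demo_docs (
        id        NUMBER PRIMARY KEY,
        text      VARCHAR2(4000),
        embedding VECTOR(3, FLOAT32)
    )""")

# TO_VECTOR accepts the textual form of a vector; real values come from an embedding model.
rows = [
    (1, "invoice document processing", "[0.11, 0.92, 0.05]"),
    (2, "team holiday schedule", "[0.85, 0.10, 0.30]"),
]
cur.executemany("INSERT INTO demo_docs VALUES (:1, :2, TO_VECTOR(:3))", rows)
conn.commit()

# Semantic top-k query: order by vector distance instead of keyword matching.
cur.execute("""
    SELECT id, text
    FROM demo_docs
    ORDER BY VECTOR_DISTANCE(embedding, TO_VECTOR(:1), COSINE)
    FETCH FIRST 1 ROWS ONLY""", ["[0.12, 0.90, 0.08]"])
print(cur.fetchall())
```

Vector indexing (HNSW or IVF, as covered in the blog links above) only becomes relevant once the table is large enough that an exact ORDER BY scan is too slow.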
diff --git a/data-platform/modernise/README.md b/data-platform/modernise/README.md index 2234dd824..9cc75e87f 100644 --- a/data-platform/modernise/README.md +++ b/data-platform/modernise/README.md @@ -52,6 +52,8 @@ Reviewed: 07.10.2025 - Detailed, step-by-step instructions for Migrating Database from existing ExaDB-D to latest generation of ExaDB-D , available on Amalraj Puthenchira's public platform - [Migrate and Upgrade your Oracle Base Database to Exadata Database Service on Exascale Infrastructure using AutoUpgrade](https://amalrajputhenchira.wordpress.com/2025/03/03/migrate-your-oracle-base-database-to-exadata-database-service-on-exascale-infrastructure-using-autoupgrade/) - Detailed, step-by-step instructions for Migrate and Upgrade Oracle Base Database 19c to ExaDB-XS using AutoUpgrade , available on Amalraj Puthenchira's public platform +- [Exadata Cloud Infrastructure Migration Automation Utility for ExaDB-D and ExaDB-C@C](https://youtu.be/0GA-c24nFd4?si=Rl7E0Fd0jfSglxb8) + - Detailed Oracle Developer On Demand video with step-by-step instructions for migrating Infrastructure, VM Cluster and Databases on ExaDB-D and ExaDB-C@C using the new Automation Utility, delivered by Amalraj Puthenchira in collaboration with Black Belts from Mission Critical Team and Exadata Cloud@Customer Team. # Useful Links - [OCI Migration Hub - Migrate Oracle Databases to OCI](https://www.oracle.com/database/cloud-migration/) diff --git a/data-platform/modernise/zero-downtime-migration/README.md b/data-platform/modernise/zero-downtime-migration/README.md index 50992aaf5..a9340277b 100644 --- a/data-platform/modernise/zero-downtime-migration/README.md +++ b/data-platform/modernise/zero-downtime-migration/README.md @@ -19,6 +19,8 @@ Reviewed: 07.10.2025 - Cloud Coaching Webinar, including a technical demonstration, delivered by Amalraj Puthenchira and Bilegt Bat Ochir around the use of ZDM for efficient migration into OCI. - [Fast-track your Journey to AI with Oracle and Azure (Migration to Oracle ADB23ai@Azure using Zero-Downtime Migration Logical Online) - Developer Coaching Webinar](https://www.youtube.com/watch?v=SanGj96PoxI) - Developer Coaching Webinar, including a technical demonstration, delivered by Mihai Costeanu and Emiel Ramakers around the use of ZDM for efficient migration to Oracle ADB23ai@Azure . +- [Modernise your Database with Autonomous Database on Oracle Database@Google Cloud - Developer Coaching Webinar](https://youtu.be/5zdOtUEfa1E?si=FeS6xhRf2nxEWSjA) + - Developer Coaching Webinar delivered by Amalraj Puthenchira and Neeraj Pandita covering the migration solution, provisioning of key components and a live demo of ZDM logical online migration to Autonomous Database on Oracle Database@Google Cloud. ## Technical Guides diff --git a/data-platform/open-source-data-platforms/README.md b/data-platform/open-source-data-platforms/README.md index f11240139..7ccc49df2 100644 --- a/data-platform/open-source-data-platforms/README.md +++ b/data-platform/open-source-data-platforms/README.md @@ -11,6 +11,7 @@ Open Source Data Platforms continue to extend the Oracle Data Platform with Key - OCI OpenSearch - OCI NoSQL + # License Copyright (c) 2025 Oracle and/or its affiliates.
diff --git a/data-platform/open-source-data-platforms/oci-big-data-service/README.md b/data-platform/open-source-data-platforms/oci-big-data-service/README.md index 4e54b8591..ed7365b29 100644 --- a/data-platform/open-source-data-platforms/oci-big-data-service/README.md +++ b/data-platform/open-source-data-platforms/oci-big-data-service/README.md @@ -3,7 +3,7 @@ Oracle Big Data Service is a fully managed, automated cloud service that provide Easily create secure and scalable Hadoop-based data lakes that can quickly process large amounts of data. BDS is based on Oracle [Hadoop Distribution](https://docs.oracle.com/en-us/iaas/Content/bigdata/overview.htm#overview-odh). -Reviewed: 04.06.2024 +Reviewed: 11.11.2025 # Table of Contents diff --git a/data-platform/open-source-data-platforms/oci-cache/README.md b/data-platform/open-source-data-platforms/oci-cache/README.md index 29c006f45..004907002 100644 --- a/data-platform/open-source-data-platforms/oci-cache/README.md +++ b/data-platform/open-source-data-platforms/oci-cache/README.md @@ -2,7 +2,7 @@ OCI Cache with Redis is a comprehensive, managed in-memory caching solution built on the foundation of open source Redis. This fully managed service accelerates data reads and writes, significantly enhancing application response times and database performance to provide an improved customer experience. -Reviewed: 05.06.2024 +Reviewed: 11.11.2025 # Table of Contents @@ -14,6 +14,7 @@ Reviewed: 05.06.2024 - [Leverage OCI-managed Redis and PostgreSQL for your e-commerce application](https://docs.oracle.com/en/solutions/oci-redis-postgresql/index.html#GUID-DD63C617-DEEE-4357-B203-A3CFDF1B34EC) - [Enable a Low Code Modular LLM App Engine using Oracle Integration and OCI Generative AI](https://docs.oracle.com/en/solutions/oci-generative-ai-integration/index.html#GUID-0C310162-34D9-4EB2-978D-FBAFB931E637) +- [Upgrade OCI Cache with Redis to Valkey](https://medium.com/@devpiotrekk/upgrade-oci-cache-with-redis-to-valkey-d3c01deb8733) # Useful Links diff --git a/data-platform/open-source-data-platforms/oci-cache/connect-via-nlb/README.md b/data-platform/open-source-data-platforms/oci-cache/connect-via-nlb/README.md index 53b5616c3..05d791674 100644 --- a/data-platform/open-source-data-platforms/oci-cache/connect-via-nlb/README.md +++ b/data-platform/open-source-data-platforms/oci-cache/connect-via-nlb/README.md @@ -1,6 +1,6 @@ # Connect OCI Cache from a local machine via a Network Load Balancer(NLB) -Reviewed: 13.08.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/README.md b/data-platform/open-source-data-platforms/oci-data-flow/README.md index d5bfb27b6..cd1aad781 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/README.md @@ -1,7 +1,7 @@ # OCI Data Flow Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets—without infrastructure to deploy or manage. Developers can also use Spark Streaming to perform cloud ETL on their continuously produced streaming data. 
This enables rapid application delivery because developers can focus on app development, not infrastructure management -Reviewed: 05.06.2024 +Reviewed: 11.11.2025 # Table of Contents @@ -15,6 +15,8 @@ Reviewed: 05.06.2024 - [Machine Learning with OCI Data Flow](https://www.youtube.com/watch?v=A6uVbK7wQb4) - [Extracting data from Salesforce in near real-time using OCI Data Flow — Part 1](https://medium.com/@eloi-lopes29/extracting-data-from-salesforce-in-near-real-time-using-oci-data-flow-part-1-f096886b9fcd) - [OCI Audit Logs to Object Storage with Data Flow](https://blogs.oracle.com/cloud-infrastructure/post/behind-the-scenes-shrinking-log-analysis) +- [Predict energy consumption with OCI Data Flow and Spark MLlib](https://medium.com/data-engineer-things/predict-energy-consumption-with-oci-data-flow-and-spark-mllib-74626c4db56a) +- [Apache Spark with OCI Data Flow and Oracle Autonomous Database](https://medium.com/@sylwekdec/apache-spark-with-oci-data-flow-and-oracle-autonomous-database-bd96055445ee) # Useful Links diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/DeltaLake_Optimize/readme.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/DeltaLake_Optimize/readme.md index cf21f07c9..5e5ecf0b5 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/DeltaLake_Optimize/readme.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/DeltaLake_Optimize/readme.md @@ -1,5 +1,7 @@ # Delta Lake Optimization +Reviewed: 11.11.2025 + Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets—without infrastructure to deploy or manage. Developers can also use Spark Streaming to perform cloud ETL on their continuously produced streaming data. However Spark structured streaming application can produce thousants of small files (according to microbatching and number of executors), which leads to performance degradadion. diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/Streaming_from_ObjectStorage/readme.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/Streaming_from_ObjectStorage/readme.md index 7163aa3c1..2a715606c 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/Streaming_from_ObjectStorage/readme.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/Streaming_from_ObjectStorage/readme.md @@ -1,5 +1,7 @@ # OCI Data Flow Reading files from Object Storage in Streaming mode +Reviewed: 11.11.2025 + Sometimes you would like to continously monitor a Object Storage (S3 compatible) location and incrementally process new incoming data.
With Spark we can create a StreamingQuery using ObjectStorage source and process data from files in streaming mode .... without streaming platform. All we need is to use spark.readStream with a location - object storage or S3 compatible. diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/build-and-deploy-app-from-oci-ds/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/build-and-deploy-app-from-oci-ds/README.md index f9bb91aaa..a577c1719 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/build-and-deploy-app-from-oci-ds/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/build-and-deploy-app-from-oci-ds/README.md @@ -1,6 +1,6 @@ # Build and Deploy an OCI Data Flow application using OCI Data Science -Reviewed: 04.06.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-AWS-S3/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-AWS-S3/README.md index 5fe4afdf2..51ab29b58 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-AWS-S3/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-AWS-S3/README.md @@ -1,6 +1,6 @@ # OCI Data Flow Connection to AWS S3 -Reviewed: 10.07.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-Snowflake/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-Snowflake/README.md index 380508b36..249af4a34 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-Snowflake/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-Snowflake/README.md @@ -1,6 +1,6 @@ # OCI Data Flow Connection to Snowflake -Reviewed: 10.07.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-adw/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-adw/README.md index aa8f4028f..8dbcbb859 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-adw/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-adw/README.md @@ -1,6 +1,6 @@ # Load data from Autonomous Database into OCI Data Flow -Reviewed: 05.06.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-postgresql/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-postgresql/README.md index 1e01901e5..c0d5e7272 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-postgresql/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-postgresql/README.md @@ -1,6 +1,6 @@ # OCI Data Flow Connection to PostgreSQL -Reviewed: 10.07.2024 +Reviewed: 11.11.2025 # When to use this asset? 
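The Streaming_from_ObjectStorage example above comes down to pointing spark.readStream at a bucket location so that each newly arriving file is processed as a micro-batch, without a separate streaming platform. A minimal, illustrative PySpark sketch of that pattern follows; it assumes a Spark session with the OCI HDFS connector available (for example an OCI Data Flow run), and the bucket, namespace, prefix and JSON schema are placeholders rather than values taken from the asset itself.

```python
# Illustrative sketch only: bucket, namespace, prefix and schema are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("object-storage-streaming-demo").getOrCreate()

# Streaming file sources need an explicit schema for the incoming JSON files.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
])

# Every new file landing under the prefix is picked up in the next micro-batch.
incoming = (
    spark.readStream
    .schema(schema)
    .json("oci://my-bucket@my-namespace/incoming/")
)

# The console sink is enough to observe the stream in a demo; a real job would
# write to Object Storage, ADW or a Delta table instead.
query = (
    incoming.writeStream
    .format("console")
    .option("checkpointLocation", "oci://my-bucket@my-namespace/checkpoints/demo/")
    .start()
)
query.awaitTermination()
```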
diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-salesforce/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-salesforce/README.md index 49e9b1ca7..dbf2e7ce0 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-salesforce/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/connect-to-salesforce/README.md @@ -1,6 +1,6 @@ # Connect to Salesforce using OCI Data Flow -Reviewed: 05.06.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/db-connectors-and-streaming/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/db-connectors-and-streaming/README.md index 037c2b959..7c2ecd4c6 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/db-connectors-and-streaming/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/db-connectors-and-streaming/README.md @@ -1,6 +1,6 @@ # OCI Data Flow Connection to ADB/PostgreSQL/OCI Streaming from OCI Data Science Notebook using Spark -Reviewed: 9.08.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/load-data-to-adw/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/load-data-to-adw/README.md index 7bb3870a1..15654862f 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/load-data-to-adw/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/load-data-to-adw/README.md @@ -1,6 +1,6 @@ # Load data from Autonomous Database into OCI Data Flow -Reviewed: 05.06.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/run-from-functions/README.md b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/run-from-functions/README.md index d899b0380..70a6c8b6b 100644 --- a/data-platform/open-source-data-platforms/oci-data-flow/code-examples/run-from-functions/README.md +++ b/data-platform/open-source-data-platforms/oci-data-flow/code-examples/run-from-functions/README.md @@ -1,6 +1,6 @@ # Triggering OCI Data Flow from OCI Functions -Reviewed: 05.06.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-no-sql/README.md b/data-platform/open-source-data-platforms/oci-no-sql/README.md index 8b3ad8b28..5bb426c9f 100644 --- a/data-platform/open-source-data-platforms/oci-no-sql/README.md +++ b/data-platform/open-source-data-platforms/oci-no-sql/README.md @@ -1,7 +1,7 @@ # OCI NoSQL Oracle NoSQL Database Cloud Service makes it easy for developers to build applications using document, fixed schema, and key-value database models, delivering predictable single-digit millisecond response times with data replication for high availability. The service offers active-active regional replication, ACID transactions, serverless scaling, comprehensive security, and low pay-per-use pricing for both on-demand and provisioned capacity modes, including 100% compatibility with on-premises Oracle NoSQL Database. 
-Reviewed: 04.06.2024 +Reviewed: 11.11.2025 # Table of Contents diff --git a/data-platform/open-source-data-platforms/oci-opensearch/code-examples/anomaly-detection-ons/README.md b/data-platform/open-source-data-platforms/oci-opensearch/code-examples/anomaly-detection-ons/README.md index fb0679698..1d13ab084 100644 --- a/data-platform/open-source-data-platforms/oci-opensearch/code-examples/anomaly-detection-ons/README.md +++ b/data-platform/open-source-data-platforms/oci-opensearch/code-examples/anomaly-detection-ons/README.md @@ -1,6 +1,6 @@ # Run Anomaly Detection in OCI OpenSearch and receive an alert using Oracle Notification Service -Reviewed: 10.04.2025 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-opensearch/code-examples/nginx-server/README.md b/data-platform/open-source-data-platforms/oci-opensearch/code-examples/nginx-server/README.md index 83a8c6073..bddeb4c6e 100644 --- a/data-platform/open-source-data-platforms/oci-opensearch/code-examples/nginx-server/README.md +++ b/data-platform/open-source-data-platforms/oci-opensearch/code-examples/nginx-server/README.md @@ -1,6 +1,6 @@ # Create a NGINX server to access the OCI OpenSearch Dashboards -Reviewed: 22.07.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-opensearch/code-examples/rag-oci-opensearch-genai-service/README.md b/data-platform/open-source-data-platforms/oci-opensearch/code-examples/rag-oci-opensearch-genai-service/README.md index edd5c8518..635099d94 100644 --- a/data-platform/open-source-data-platforms/oci-opensearch/code-examples/rag-oci-opensearch-genai-service/README.md +++ b/data-platform/open-source-data-platforms/oci-opensearch/code-examples/rag-oci-opensearch-genai-service/README.md @@ -1,6 +1,6 @@ # Create a full RAG pipeline using OCI OpenSearch and the GenAI service -Reviewed: 05.06.2024 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-opensearch/readme.md b/data-platform/open-source-data-platforms/oci-opensearch/readme.md index 142326a88..0de6d1234 100644 --- a/data-platform/open-source-data-platforms/oci-opensearch/readme.md +++ b/data-platform/open-source-data-platforms/oci-opensearch/readme.md @@ -3,7 +3,7 @@ OCI Search with OpenSearch is a managed service that you can use to build in-app Search with OpenSearch handles all the management and operations of search clusters, including operations such as security updates, upgrades, resizing, and scheduled backups. This allows you to focus your resources on building features for your OpenSearch solutions. 
-Reviewed: 04.06.2024 +Reviewed: 11.11.2025 # Table of Contents @@ -18,6 +18,8 @@ Reviewed: 04.06.2024 - [Retrieval Augmented Generation with OCI OpenSearch and GenAI service](https://github.com/bobpeulen/oci_opensearch/blob/main/oci_opensearch_rag_auto.ipynb) A notebook describing and performing all steps to create and store a custom embedding model in the OCI OpenSearch cluster and create a full RAG pipeline (OCI OpenSearch as Vector database and in-memory engine and the GenAI service (cohere) as LLM) - [LiveLabs: Search and visualize data with OCI Search Service with OpenSearch](https://apexapps.oracle.com/pls/apex/f?p=133:180:6071760449919::::wid:3427) +- [How to Interact with OpenSearch?](https://www.linkedin.com/pulse/how-interact-opensearch-isma%C3%ABl-hassane-j7z9f/?trackingId=TbuchZqCSDa9X65sWhx8xw%3D%3D) +- [Architecting with OpenSearch](https://www.linkedin.com/pulse/architecting-opensearch-isma%C3%ABl-hassane-gz1jf/?trackingId=xw1G1Yq9SMutkogC3FIJmg%3D%3D) # Useful Links diff --git a/data-platform/open-source-data-platforms/oci-postgresql/README.md b/data-platform/open-source-data-platforms/oci-postgresql/README.md index c2152e366..6ee810deb 100644 --- a/data-platform/open-source-data-platforms/oci-postgresql/README.md +++ b/data-platform/open-source-data-platforms/oci-postgresql/README.md @@ -1,4 +1,7 @@ # OCI PostgreSQL + +Reviewed: 11.11.2025 + OCI Database with PostgreSQL is a fully managed PostgreSQL-compatible service with intelligent sizing, tuning, and high durability. The service automatically scales storage as database tables are created and dropped, making management easier on you and optimizing storage spend. @@ -21,6 +24,8 @@ OCI Database with PostgreSQL is designed for high availability by offering durab - [OCI PostgreSQL to OCI PostgreSQL cross-region replication with OCI GoldenGate — Part 3](https://medium.com/@devpiotrekk/oci-postgresql-to-oci-postgresql-cross-region-replication-with-oci-goldengate-oci-goldengate-4ccd5dea4d6c) - [OCI PostgreSQL replication with pglogical](https://medium.com/@devpiotrekk/replicating-oci-database-with-postgresql-using-pglogical-118182ff08f9) - [OCI PostgreSQL vector search with pgvector - Part 1](https://medium.com/@devpiotrekk/vector-search-with-pgvector-and-oci-database-with-postgresql-part-1-0915e5296148) +- [Benchmarking OCI Database with PostgreSQL](https://medium.com/@andreumdorokhinum/benchmarking-oci-database-with-postgresql-0a665e575fde) +- [Migrate PostgreSQL to OCI PostgreSQL using OCI Object Storage and Rclone](https://medium.com/@sylwekdec/migrate-postgresql-to-oci-postgresql-using-oci-object-storage-and-rclone-a61ef97c5b96) # Useful Links diff --git a/data-platform/open-source-data-platforms/oci-postgresql/code-examples/connect-to-oac/README.md b/data-platform/open-source-data-platforms/oci-postgresql/code-examples/connect-to-oac/README.md index 8bd02775d..8f5064f3e 100644 --- a/data-platform/open-source-data-platforms/oci-postgresql/code-examples/connect-to-oac/README.md +++ b/data-platform/open-source-data-platforms/oci-postgresql/code-examples/connect-to-oac/README.md @@ -1,6 +1,6 @@ # Connect OCI PostgreSQL to Oracle Analytics Cloud -Reviewed: 29.07.2024 +Reviewed: 11.11.2025 # When to use this asset? 
diff --git a/data-platform/open-source-data-platforms/oci-postgresql/code-examples/postgis-geoserver/README.md b/data-platform/open-source-data-platforms/oci-postgresql/code-examples/postgis-geoserver/README.md index bc04a09c6..f79b92716 100644 --- a/data-platform/open-source-data-platforms/oci-postgresql/code-examples/postgis-geoserver/README.md +++ b/data-platform/open-source-data-platforms/oci-postgresql/code-examples/postgis-geoserver/README.md @@ -1,6 +1,6 @@ # Connect OCI PostgreSQL to Oracle Analytics Cloud -Reviewed: 30.09.2025 +Reviewed: 11.11.2025 # When to use this asset? diff --git a/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/README.md b/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/README.md index c2f974327..b07ca3eaa 100644 --- a/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/README.md +++ b/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/README.md @@ -1,7 +1,7 @@ # OCI Streaming with Apache Kafka A fully managed Kafka service that allows you to build real-time, distributed data streaming pipelines so you can collect, process, store, and move millions of events per minute in a cost-efficient manner that’s 100% compatible with open source Apache Kafka. -Reviewed: 18.09.2025 +Reviewed: 11.11.2025 # Table of Contents @@ -20,6 +20,7 @@ Reviewed: 18.09.2025 - [Kafka UI & Kafka Connect Setup with OCI OpenSearch](https://github.com/oracle-devrel/technology-engineering/tree/main/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/kafka-ui-connect-setup-with-oci-opensearch) - [OCI GoldenGate connection to OCI Streaming with Apache Kafka](https://github.com/oracle-devrel/technology-engineering/tree/main/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/goldengate_oci_streaming-with-apache-kafka) +- [Configuring Schema registry and Kafka connect with AKHQ(UI) for OSAK](https://github.com/oracle-devrel/technology-engineering/tree/main/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup) # License diff --git a/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup/LICENSE b/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup/LICENSE new file mode 100644 index 000000000..46c0c79d9 --- /dev/null +++ b/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup/LICENSE @@ -0,0 +1,35 @@ +Copyright (c) 2025 Oracle and/or its affiliates. 
+ +The Universal Permissive License (UPL), Version 1.0 + +Subject to the condition set forth below, permission is hereby granted to any +person obtaining a copy of this software, associated documentation and/or data +(collectively the "Software"), free of charge and under any and all copyright +rights in the Software, and any and all patent rights owned or freely +licensable by each licensor hereunder covering either (i) the unmodified +Software as contributed to or provided by such licensor, or (ii) the Larger +Works (as defined below), to deal in both + +(a) the Software, and +(b) any piece of software and/or hardware listed in the lrgrwrks.txt file if +one is included with the Software (each a "Larger Work" to which the Software +is contributed by such licensors), + +without restriction, including without limitation the rights to copy, create +derivative works of, display, perform, and distribute the Software and make, +use, sell, offer for sale, import, export, have made, and have sold the +Software and the Larger Work(s), and to sublicense the foregoing rights on +either these or other terms. + +This license is subject to the following condition: +The above copyright notice and either this complete permission notice or at +a minimum a reference to the UPL must be included in all copies or +substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup/schema-registry.md b/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup/schema-registry.md index 7bdc56cd3..d10d410d6 100644 --- a/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup/schema-registry.md +++ b/data-platform/open-source-data-platforms/oci-streaming-with-apache-kafka/code-examples/schema-registry-akhq-setup/schema-registry.md @@ -1,5 +1,7 @@ # Configuring Schema registry and Kafka connect with AKHQ(UI) for OSAK +Review: 11.11.2025 + ### Prerequistes Working instance of OCI Streaming with Apache Kafka diff --git a/data-platform/open-source-data-platforms/oci-streaming/README.md b/data-platform/open-source-data-platforms/oci-streaming/README.md index ad6c60582..99c20e33b 100644 --- a/data-platform/open-source-data-platforms/oci-streaming/README.md +++ b/data-platform/open-source-data-platforms/oci-streaming/README.md @@ -1,7 +1,7 @@ # OCI Streaming The Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable solution for ingesting and consuming high-volume data streams in real-time. Use Streaming for any use case in which data is produced and processed continually and sequentially in a publish-subscribe messaging model. 
-Reviewed: 04.06.2024 +Reviewed: 11.11.2025 # Table of Contents diff --git a/data-platform/open-source-data-platforms/oci-streaming/code-examples/fake-producer-consumer/README.md b/data-platform/open-source-data-platforms/oci-streaming/code-examples/fake-producer-consumer/README.md index 4d1a230e8..b41a09c9e 100644 --- a/data-platform/open-source-data-platforms/oci-streaming/code-examples/fake-producer-consumer/README.md +++ b/data-platform/open-source-data-platforms/oci-streaming/code-examples/fake-producer-consumer/README.md @@ -1,6 +1,6 @@ # Example of Producing and Consuming for OCI Streaming -Reviewed: 22.10.2024 +Reviewed: 11.11.2025 1. Create compute instance. Oracle Linux 7. 2. Run the below to install Git, clone the repo, and install several packages diff --git a/data-platform/open-source-data-platforms/oci-streaming/code-examples/mosquitto_node-red/README.md b/data-platform/open-source-data-platforms/oci-streaming/code-examples/mosquitto_node-red/README.md index 99d17e905..6575f254a 100644 --- a/data-platform/open-source-data-platforms/oci-streaming/code-examples/mosquitto_node-red/README.md +++ b/data-platform/open-source-data-platforms/oci-streaming/code-examples/mosquitto_node-red/README.md @@ -1,5 +1,7 @@ # Create and run Mosquitto & Node-RED, connecting to OCI Streaming +Reviewed: 11.11.2025 + The below creates a Mosquitto instance on OCI and adds configuration to handle the incoming KPN IoT platform traffic. KPN needs CA signed certificates and encrypted messages, and username/password auth. - Create instance with CentOS 7 image @@ -29,9 +31,9 @@ The below creates a Mosquitto instance on OCI and adds configuration to handle t mosquitto_pub -h localhost -t test_topic -m "hello world" ``` -- Create a password file. Run the below. In the example, 'bob' is the username. Password will be prompted when you run. +- Create a password file. Run the below. In the example, 'username' is the username. Password will be prompted when you run. ``` - sudo mosquitto_passwd -c /etc/mosquitto/passwd bob + sudo mosquitto_passwd -c /etc/mosquitto/passwd username ``` - Create the CA keys. Public IP should be added to public DNS. When prompted for domain, use the full Domain. @@ -65,13 +67,13 @@ The below creates a Mosquitto instance on OCI and adds configuration to handle t - Test with credentials ``` - mosquitto_sub -h localhost -t kpnthings -u "bob" -P "password" -p 8883 - mosquitto_pub -h localhost -t "kpnthings" -m "hello world" -u "bob" -P "password" -p 8883 + mosquitto_sub -h localhost -t kpnthings -u "username" -P "password" -p 8883 + mosquitto_pub -h localhost -t "kpnthings" -m "hello world" -u "username" -P "password" -p 8883 - Test with credentials and certificate. 
``` - mosquitto_pub -h mosquitto-demo.cooldemo.org -t kpnthings -m "hello again" -p 8883 --cafile /etc/ssl/certs/ca-bundle.crt -u "bob" -P "password" - mosquitto_sub -h mosquitto-demo.cooldemo.org -t kpnthings -p 8883 --cafile /etc/ssl/certs/ca-bundle.crt -u "bob" -P "password" + mosquitto_pub -h mosquitto-demo.cooldemo.org -t kpnthings -m "hello again" -p 8883 --cafile /etc/ssl/certs/ca-bundle.crt -u "username" -P "password" + mosquitto_sub -h mosquitto-demo.cooldemo.org -t kpnthings -p 8883 --cafile /etc/ssl/certs/ca-bundle.crt -u "username" -P "password" ``` diff --git a/data-platform/oracle-ai-data-platform/README.md b/data-platform/oracle-ai-data-platform/README.md new file mode 100644 index 000000000..b893f48e7 --- /dev/null +++ b/data-platform/oracle-ai-data-platform/README.md @@ -0,0 +1,47 @@ +# Oracle AI Data Platform + +Oracle AI Data Platform provides streamlined, secure, and seamless data management, +analysis, and collaboration. +Oracle AI Data Platform is designed for enterprises that need to: +• Streamline Data Discovery and Governance: AI Data Platform provides a centralized +metadata repository (Master Catalog) that enhances searchability and governance of +structured and unstructured data. +• Enable Secure Data Collaboration: Through RBAC-based access control, AI Data +Platform allows different teams to work on shared datasets while maintaining strict security +policies. +• Accelerate Data Preparation and Processing: With built-in notebooks and workflow +orchestration, users can clean, transform, and enrich data efficiently. +• Support Advanced Analytics and AI/ML: AI Data Platform integrates with Apache Spark, +allowing data scientists and analysts to run complex computations and model training +directly within their data lake. +• Ensure Seamless Integration Across Data Sources: AI Data Platform supports external +catalogs from Autonomous Database (ADB), Object Storage (OS), and third-party data +sources, enabling users to query and analyze data without duplication. + +Reviewed: 13.11.2025 + +# Table of Contents +1. [Team Publications](#team-publications) +2. [Useful Links](#useful-links) + +# Team Publications + +## Specialists Blogs for various features & functionality + +- [Bringing Fusion Applications Data and eBusiness Suite data together in Oracle AI Data platform](https://medium.com/@DoubleUP66/bringing-fusion-applications-data-and-ebusiness-suite-data-together-in-oracle-ai-data-platform-3efd01c42dbc) + +# Useful Links + +## Public Homepage and official documentation content + +- [Oracle AI Data Platform: Public Homepage](https://www.oracle.com/ai-data-platform/) +- [Oracle AI Data Platform: Official documentation](https://docs.oracle.com/en/cloud/paas/ai-data-platform/) +- [Oracle AI Data Platform: Product Guides](https://docs.oracle.com/en/cloud/paas/ai-data-platform/books.html) + +# License + +Copyright (c) 2024 Oracle and/or its affiliates. + +Licensed under the Universal Permissive License (UPL), Version 1.0. + +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. 
diff --git a/manageability-and-operations/observability-and-manageability/README.md b/manageability-and-operations/observability-and-manageability/README.md index 644977ac6..f075889a9 100644 --- a/manageability-and-operations/observability-and-manageability/README.md +++ b/manageability-and-operations/observability-and-manageability/README.md @@ -4,7 +4,7 @@ The Observability and Manageability (O&M) platform is a suite of OCI services th   -Reviewed: Reviewed: 15.10.2025 +Reviewed: 05.11.2025   @@ -35,7 +35,7 @@ Reviewed: Reviewed: 15.10.2025 | EBS | WIP | Coming Soon| | Webogic | WIP | Coming Soon| | Apex | WIP | Coming Soon| -| .... | | | +| OCI CI Container Instance | [Link](https://github.com/adibirzu/oci-container-monitoring) | | | .... | | | | .... | | | diff --git a/manageability-and-operations/observability-and-manageability/logging-analytics/finops/README.md b/manageability-and-operations/observability-and-manageability/logging-analytics/finops/README.md new file mode 100644 index 000000000..5cb9f77a4 --- /dev/null +++ b/manageability-and-operations/observability-and-manageability/logging-analytics/finops/README.md @@ -0,0 +1,56 @@ +# Mastering Cloud Cost Control with OCI Log Analytics + +Oracle Cloud Infrastructure (OCI) provides several built-in tools to +help users monitor, analyze, and control cloud spending. Among these +tools are OCI Cost Analysis and Scheduled Reports, which offer +visibility into usage patterns and cost trends over time. These tools +are valuable for high-level reporting and day-to-day cost tracking, +especially when trying to stay within budget or identify cost anomalies. + +However, for more in-depth analysis—such as breaking down spending +across departments, correlating costs with specific resource tags, or +building custom dashboards—access to raw cost and usage data becomes +essential. This is where the ability to export and analyze detailed cost +reports becomes particularly useful. + +OCI is fully compliant with the FinOps Foundation’s FOCUS (FinOps Open +Cost and Usage Specification) standard. The FOCUS report provides a +standardized and comprehensive dataset that includes detailed +information about costs, services, compartments, tags, and more. This +standardized format makes it easier to integrate OCI cost data into +third-party tools or advanced analytics platforms. + +In this asset I import the +FOCUS report into OCI Log Analytics. By doing this, you can take +advantage of powerful querying capabilities and visualization features +within the Log Analytics platform. This approach allows you to build +customized dashboards, run advanced queries, and perform granular +analysis tailored to your organization’s specific needs. + + +Reviewed: 29.10.2025 + +# When to use this asset? + +[**Better Cloud cost control**] +OCI’s cost analysis tool provides a great high-level view of your cloud costs, but sometimes you need more detailed customization. For instance, you might want to track when you hit an overage, monitor monthly costs, and calculate the percentage of your budget consumed. Additionally, having visibility at the resource level, rather than just service categories, can give you deeper insights into where your money is going. + +[**Custom alerts on budgeting**] OCI’s Budget tool lets you set alerts based on compartments, tenancy, or billing tags. However, there may be instances where you need more granular control.
For example, you might want alerts triggered when a new service is used, or to restrict usage to a specific set of services. This enables tighter cost control and better oversight over your cloud spending. + +[**Cross-Referencing Costs with Resource Utilization**] In a cloud environment, it’s crucial to monitor both resource allocation and resource utilization on the same dashboard. This approach allows you to identify inefficiencies, such as underutilized resources that may be driving up costs. By aligning cost data with utilization insights, you can optimize infrastructure usage and better manage your cloud expenditures. + + +# How to use this asset? + +You can follow the instructions in the [step-by-step guide](https://github.com/oracle-quickstart/oci-o11y-solutions/tree/main/knowledge-content/FinOps/files) + + +# License + +Copyright (c) 2025 Oracle and/or its affiliates. + +Licensed under the Universal Permissive License (UPL), Version 1.0. + +See [LICENSE](https://github.com/oracle-devrel/technology-engineering/blob/main/LICENSE) for more details. + + diff --git a/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/README.md b/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/README.md index ced928edc..921bb1003 100644 --- a/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/README.md +++ b/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/README.md @@ -3,7 +3,7 @@ This section will bring some examples of how to create OCI Monitoring custom metric namespaces to extend the default, out-of-the-box, OCI Monitoring metrics for OCI resources. -Reviewed: 18.11.2024 +Reviewed: 05.11.2025 # Team Publications diff --git a/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-FN-services-limit-monitoring/README.md b/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-FN-services-limit-monitoring/README.md index 104065ab8..be17289a4 100644 --- a/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-FN-services-limit-monitoring/README.md +++ b/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-FN-services-limit-monitoring/README.md @@ -1,6 +1,6 @@ # Using OCI Functions to create OCI Monitoring custom metric namespace: Services Limit monitoring example use case -Reviewed: 15.11.2024 +Reviewed: 05.11.2025 ## 1. INTRODUCTION diff --git a/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-python-SDK-services-limit-monitoring/README.md b/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-python-SDK-services-limit-monitoring/README.md index 166b32ddb..44afe1dd6 100644 --- a/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-python-SDK-services-limit-monitoring/README.md +++ b/manageability-and-operations/observability-and-manageability/oci-monitoring/custom-metrics/custom-metric-python-SDK-services-limit-monitoring/README.md @@ -1,7 +1,7 @@ # Using Python SDK to create OCI Monitoring custom metric namespace: Services Limit monitoring example use case -Reviewed: 15.11.2024 +Reviewed: 05.11.2025 ## 1.
INTRODUCTION diff --git a/security/identity-and-access-management/README.md b/security/identity-and-access-management/README.md index 726d7c261..a4884f1f6 100644 --- a/security/identity-and-access-management/README.md +++ b/security/identity-and-access-management/README.md @@ -7,8 +7,8 @@ The Identity and Access Management group under the Technology Engineering Securi - Solution Assistance - Workshops to enable the partners/customers -Reviewed: 01.04.2025 - +Reviewed: 11.11.2025 + Table of Contents 1. [Team Publications](#team-publications) diff --git a/security/identity-and-access-management/oracle-access-governance/README.md b/security/identity-and-access-management/oracle-access-governance/README.md index c0610941a..e670f95f7 100644 --- a/security/identity-and-access-management/oracle-access-governance/README.md +++ b/security/identity-and-access-management/oracle-access-governance/README.md @@ -6,7 +6,7 @@ Access Governance is a cloud native identity governance and administration (IGA) Oracle Access Governance enables integration with a wide range of authoritative sources (trusted source of identities and their attributes) and managed systems (applications containing account and permissions). **For the most common integration patterns, please see the [Reusable Assets Overview](#reusable-assets-overview) section below**. -Reviewed: 20.03.2025 +Reviewed: 11.11.2025 # Useful Links @@ -24,10 +24,10 @@ Reviewed: 20.03.2025 - [Oracle Access Governance: Securing the identity posture for enterprise and cloud applications](https://blogs.oracle.com/cloud-infrastructure/post/securing-identity-posture) - [Intelligent Cloud Delivered Access Governance with Prescriptive Analytics](https://blogs.oracle.com/cloudsecurity/post/intelligent-cloud-delivered-access-governance-with-prescriptive-analytics) -## OAG Training & Live Labs +## OAG Training & Tutorials - [Cloud Coaching - Oracle Access Governance - Identity Governance and Access Reviews (Video)](https://www.youtube.com/watch?v=9reHN697x6g) -- [Demo & Labs](https://luna.oracle.com/lab/6345863c-42c4-4f17-96fc-130278ac4b1f/steps) +- [Oracle Access Governance Tutorials](https://docs.oracle.com/en/cloud/paas/access-governance/tutorials.html) # Reusable Assets Overview diff --git a/security/identity-and-access-management/oracle-access-governance/dbat-os-accounts-sample/README.md b/security/identity-and-access-management/oracle-access-governance/dbat-os-accounts-sample/README.md index 8fab0eccc..df93dcce2 100644 --- a/security/identity-and-access-management/oracle-access-governance/dbat-os-accounts-sample/README.md +++ b/security/identity-and-access-management/oracle-access-governance/dbat-os-accounts-sample/README.md @@ -6,7 +6,7 @@ At the time of writing, this capability is not offered natively in OAG. The described integration and data can be used for all supported user/account lifecycle operations in OAG, including use in access certification. Note that this simulates a connected system, therefore changes to OS level user access will be reflected in the targeted database tables. -Review Date: 04.08.2025 +Review Date: 11.11.2025 # When to use this asset? 
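Looking back at the oci-monitoring/custom-metrics READMEs above (in particular the Python SDK services-limit example), the mechanism they build on is posting datapoints into a custom metric namespace through the Monitoring API. The sketch below is illustrative only: the namespace, compartment OCID, metric name, dimensions and value are made-up placeholders, and authentication is assumed to come from a standard ~/.oci/config profile rather than the instance or resource principals those assets may use.

```python
# Illustrative sketch only: namespace, compartment OCID, metric name, dimensions
# and the value 42.0 are placeholders; a real job would compute them from a limits query.
import datetime
import oci

config = oci.config.from_file()

# Custom metrics are posted to the region's telemetry-ingestion endpoint.
monitoring = oci.monitoring.MonitoringClient(
    config,
    service_endpoint=f"https://telemetry-ingestion.{config['region']}.oraclecloud.com",
)

details = oci.monitoring.models.PostMetricDataDetails(
    metric_data=[
        oci.monitoring.models.MetricDataDetails(
            namespace="custom_service_limits",
            compartment_id="ocid1.compartment.oc1..exampleuniqueid",
            name="limit_usage_percent",
            dimensions={"service": "compute", "limit_name": "standard-e4-core-count"},
            datapoints=[
                oci.monitoring.models.Datapoint(
                    timestamp=datetime.datetime.now(datetime.timezone.utc),
                    value=42.0,
                )
            ],
        )
    ]
)

monitoring.post_metric_data(details)
```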
diff --git a/security/identity-and-access-management/oracle-access-governance/ebs-hrms-oci-iam/README.md b/security/identity-and-access-management/oracle-access-governance/ebs-hrms-oci-iam/README.md index 21ef4fdc9..ed6536481 100644 --- a/security/identity-and-access-management/oracle-access-governance/ebs-hrms-oci-iam/README.md +++ b/security/identity-and-access-management/oracle-access-governance/ebs-hrms-oci-iam/README.md @@ -1,6 +1,6 @@ # EBS HRMS to OCI IAM integration -Review Date: 20.03.2025 +Review Date: 11.11.2025 # When to use this asset? diff --git a/security/identity-and-access-management/oracle-access-governance/fusion-hcm-ebs-msad/README.md b/security/identity-and-access-management/oracle-access-governance/fusion-hcm-ebs-msad/README.md index 137ec6a45..9f4cbaa2f 100644 --- a/security/identity-and-access-management/oracle-access-governance/fusion-hcm-ebs-msad/README.md +++ b/security/identity-and-access-management/oracle-access-governance/fusion-hcm-ebs-msad/README.md @@ -1,6 +1,6 @@ # Fusion HCM & EBS to MS AD integration -Review Date: 20.03.2025 +Review Date: 11.11.2025 # When to use this asset? diff --git a/security/identity-and-access-management/oracle-access-governance/fusion-hcm-msentraid/README.md b/security/identity-and-access-management/oracle-access-governance/fusion-hcm-msentraid/README.md index 24bf50a8d..7bf18bedb 100644 --- a/security/identity-and-access-management/oracle-access-governance/fusion-hcm-msentraid/README.md +++ b/security/identity-and-access-management/oracle-access-governance/fusion-hcm-msentraid/README.md @@ -1,6 +1,6 @@ # Fusion HCM to Entra ID integration -Review Date: 20.03.2025 +Review Date: 11.11.2025 # When to use this asset? diff --git a/security/identity-and-access-management/oracle-access-governance/postman-rest-request-sample/README.md b/security/identity-and-access-management/oracle-access-governance/postman-rest-request-sample/README.md index 456b19fcb..2ac9328bb 100644 --- a/security/identity-and-access-management/oracle-access-governance/postman-rest-request-sample/README.md +++ b/security/identity-and-access-management/oracle-access-governance/postman-rest-request-sample/README.md @@ -2,7 +2,7 @@ A Postman collection of sample REST API requests for Oracle Access Governance (OAG) that showcases the ability to submit requests, trigger guardrail violations and interrogate OAG objects using REST API calls. Note that these samples are meant for reference only and are not intended for use in production systems. -Review Date: 12.09.2025 +Review Date: 11.11.2025 # When to use this asset? diff --git a/security/identity-and-access-management/oracle-access-manager/README.md b/security/identity-and-access-management/oracle-access-manager/README.md index 9637051af..1e173eb3f 100644 --- a/security/identity-and-access-management/oracle-access-manager/README.md +++ b/security/identity-and-access-management/oracle-access-manager/README.md @@ -2,14 +2,14 @@ Oracle Access Management provides innovative new services that complement traditional access management capabilities. It provides Web SSO with MFA, coarse-grained authorization, session management, standard SAML Federation, and OAuth capabilities to enable secure access to external cloud and mobile applications. It can be easily integrated with the Oracle Identity Cloud Service to support hybrid access management capabilities that can help customers protect on-premises and cloud applications seamlessly. 
-Reviewed: 28.10.2024 +Reviewed: 11.11.2025 # Useful Links ## General Product Links - [Oracle Access Manager Product Page](https://www.oracle.com/middleware/technologies/access-management.html) -- [Oracle Access Manager Documentation](https://docs.oracle.com/en/middleware/idm/suite/12.2.1.3/) +- [Oracle Access Manager Documentation](https://docs.oracle.com/en/middleware/idm/access-manager/14.1.2/index.html) # License diff --git a/security/identity-and-access-management/oracle-directory-services/README.md b/security/identity-and-access-management/oracle-directory-services/README.md index c62ff3461..71dc3d7d5 100644 --- a/security/identity-and-access-management/oracle-directory-services/README.md +++ b/security/identity-and-access-management/oracle-directory-services/README.md @@ -2,7 +2,7 @@ Oracle Unified Directory is part of Oracle's comprehensive directory solution offering for robust identity management deployments. Enable enterprise directory scalability with an all-in-one solution that provides the services required for high performance and massive scale. -Reviewed: 28.10.2024 +Reviewed: 11.11.2025 # Useful Links @@ -10,6 +10,7 @@ Reviewed: 28.10.2024 - [Oracle Directory Services Product Page](https://www.oracle.com/in/security/identity-management/directory-services/) - [Oracle Directory Services Product Tour](https://www.oracle.com/webfolder/s/quicktours/paas/pt-sec-oud/index.html) +- [Oracle Directory Services Documentation](https://docs.oracle.com/en/middleware/idm/unified-directory/14.1.2/index.html) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/README.md b/security/identity-and-access-management/oracle-identity-governance/README.md index 18a560004..8959615fe 100644 --- a/security/identity-and-access-management/oracle-identity-governance/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/README.md @@ -2,7 +2,7 @@ Oracle Identity Governance provides complete user lifecycle management and rich access entitlement controls across a wide range of services for both on-premises and cloud. Now supports microservices to discover common access patterns, optimize role-based access control, and automate the process of role publishing to Oracle Identity Governance. Oracle Identity Governance manages user provisioning and de-provisioning and provides actionable identity intelligence that enables rapid remediation of high-risk user entitlements. -Review Date: 28.10.2024 +Review Date: 11.11.2025 # Useful Links @@ -10,14 +10,14 @@ Review Date: 28.10.2024 - [Oracle Identity Governance Product Page](https://www.oracle.com/security/identity-management/governance/) - Oracle Identity Governance Documentation - - [OIG Public Documentation](https://docs.oracle.com/en/middleware/idm/suite/12.2.1.4/books.html) + - [OIG Public Documentation](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/index.html) - [OIG Connectors Documentation](https://docs.oracle.com/en/middleware/idm/identity-governance-connectors/12.2.1.3/index.html) ## Oracle University OIG course In this course, you learn essential concepts about implementing identity management solutions with Oracle Identity Governance. 
-- https://education.oracle.com/oracle-identity-governance-12c-essentials/courP_9411 +- https://learn.oracle.com/ols/course/oracle-identity-governance-12c-essentials/67157/67772 ## Application Onboarding with Oracle Identity Governance (OIG) diff --git a/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/password-reset-on-user-role-change/README.md b/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/password-reset-on-user-role-change/README.md index cfe84f0df..ac6fe9e6b 100644 --- a/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/password-reset-on-user-role-change/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/password-reset-on-user-role-change/README.md @@ -6,7 +6,7 @@ Note that by "user role" we are reffering to the user's role attribute (also kno Developed on and compatible with OIG 11g R2 PS3 and above. -Review Date: 28.10.2024 +Review Date: 11.11.2025 # When to use this asset? @@ -35,7 +35,7 @@ Once registered the code will be run automatically when user lifecycle events oc # Useful Links -[Oracle Identity Governance developer's guide - Developing plugins](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/developing-plug-ins.html#GUID-7F4EE3EA-076C-45DB-B13D-2905AB5AF6CB) +[Oracle Identity Governance developer's guide - Developing plugins](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/developing-plug-ins.html) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/user-lifecycle-event-notification/README.md b/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/user-lifecycle-event-notification/README.md index e9769687c..7fd8a95fa 100644 --- a/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/user-lifecycle-event-notification/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/event-handler-samples/user-lifecycle-event-notification/README.md @@ -4,7 +4,7 @@ This asset contains the code and deployment items for an event handler that send Developed on and compatible with OIG 11g R2 PS3 and above. -Review Date: 28.10.2024 +Review Date: 11.11.2025 # When to use this asset? 
@@ -37,7 +37,7 @@ Once registered the code will be run automatically when the configured lifecycle # Useful Links -[Oracle Identity Governance developer's guide - Developing plugins](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/developing-plug-ins.html#GUID-7F4EE3EA-076C-45DB-B13D-2905AB5AF6CB) +[Oracle Identity Governance developer's guide - Developing plugins](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/developing-plug-ins.html) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/postman-scim-samples/README.md b/security/identity-and-access-management/oracle-identity-governance/postman-scim-samples/README.md index 450f0782e..9f288493f 100644 --- a/security/identity-and-access-management/oracle-identity-governance/postman-scim-samples/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/postman-scim-samples/README.md @@ -2,7 +2,7 @@ A Postman collection of sample SCIM API requests for Oracle Identity Governance (OIG) that showcases the ability to quickly create organizations, managers and users via SCIM API calls. Note that these samples are meant for reference only and are not intended for use in production systems. -Review Date: 04.08.2024 +Review Date: 11.11.2025 # When to use this asset? @@ -22,7 +22,7 @@ The collection can be used for demonstration purposes, to showcase the SCIM capa # Useful Links -- [Oracle Identity Governance SCIM API reference](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/using-scim-rest-services.html) +- [Oracle Identity Governance SCIM API reference](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/using-scim-rest-services.html) - [Postman collections guide](https://learning.postman.com/docs/collections/collections-overview/) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-extension-notification/README.md b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-extension-notification/README.md index 4b93cc784..080523a83 100644 --- a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-extension-notification/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-extension-notification/README.md @@ -6,7 +6,7 @@ The scheduled task needs to be used in conjunction with the Extend Access WebSer Developed on and compatible with OIG 11g R2 PS3 and above. -Review Date: 28.10.2024 +Review Date: 11.11.2025 # When to use this asset? @@ -47,11 +47,11 @@ The following items need to be populated as part of the scheduled job parameters - SMTP Mail Server TLS: Enable or disable SMTP TLS, e.g. No - SMTP Mail Server Port: Port of the SMTP Mail server, e.g. 25 -[Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omusg/managing-jobs-1.html#GUID-71BB3623-AEE2-4F64-BBD4-D921DCA39D7C) on how to manually start or schedule a job. +[Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omadm/managing-scheduler.html#GUID-32651CE3-2B3B-4BAA-8DDA-CEFD6AB26EBF) on how to manually start or schedule a job. 
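For orientation, here is a minimal Python sketch of the kind of user-creation call the collection demonstrates. The host, port, context root, and credentials are placeholders for illustration only and must be adapted to your environment; the payload follows the standard SCIM 2.0 core user schema (see the OIG SCIM API reference linked below for the exact endpoint and any OIG-specific extension schemas).

```
import requests

# Placeholder values -- adjust to your OIG environment before use.
BASE_URL = "https://oig.example.com:14000/idaas/im/scim/v1"  # assumed context root
AUTH = ("xelsysadm", "<password>")                            # demo credentials only

# Standard SCIM 2.0 core user payload (RFC 7643); OIG may accept additional
# extension schemas for organization and manager attributes.
payload = {
    "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
    "userName": "jdoe",
    "name": {"givenName": "Jane", "familyName": "Doe"},
    "emails": [{"value": "jane.doe@example.com", "type": "work"}],
    "active": True,
}

response = requests.post(
    f"{BASE_URL}/Users",
    json=payload,
    auth=AUTH,
    headers={"Content-Type": "application/scim+json"},
)
response.raise_for_status()
print("Created user id:", response.json().get("id"))
```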
# Useful Links -[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/developing-scheduled-tasks.html#GUID-F62EF833-1E70-41FC-9DCC-C1EAB407D151) +[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/developing-scheduled-tasks.html) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-termination-notification/README.md b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-termination-notification/README.md index 9d49d9f91..56c41a2d9 100644 --- a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-termination-notification/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/access-termination-notification/README.md @@ -6,7 +6,7 @@ In case access extensions or a more complex handling of email contents are also Developed on and compatible with OIG 11g R2 PS3 and above. -Review Date: 28.10.2024 +Review Date: 11.11.2025 # When to use this asset? @@ -32,11 +32,11 @@ The following items need to be populated as part of the scheduled job parameters - Days Before Expiration: Number of days before the email is sent, e.g. 7 - Email Template Name: Email template name for the email, e.g. Access_Termination_Template -[Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omusg/managing-jobs-1.html#GUID-71BB3623-AEE2-4F64-BBD4-D921DCA39D7C) on how to manually start or schedule a job. +[Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omadm/managing-scheduler.html#GUID-32651CE3-2B3B-4BAA-8DDA-CEFD6AB26EBF) on how to manually start or schedule a job. # Useful Links -[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/developing-scheduled-tasks.html#GUID-F62EF833-1E70-41FC-9DCC-C1EAB407D151) +[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/developing-scheduled-tasks.html) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/loa-account-disable/README.md b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/loa-account-disable/README.md index 98ddd17c3..f5c1c2430 100644 --- a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/loa-account-disable/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/loa-account-disable/README.md @@ -29,11 +29,11 @@ Please see the useful link below for detailed build and deployment steps. - Ensure you have specified a relevant value for the `LOA end date user attribute` scheduler parameter field in the scheduled task definition. Note that a either a UDF (User-defined field) or a pre-existing user attribute can be used. This value needs to contain the attribute's display label, not the backend name (e.g., `User Login`, not `USR_LOGIN`). 
-- [Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omusg/managing-jobs-1.html#GUID-71BB3623-AEE2-4F64-BBD4-D921DCA39D7C) on how to manually start or schedule a job. +- [Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omadm/managing-scheduler.html#GUID-32651CE3-2B3B-4BAA-8DDA-CEFD6AB26EBF) on how to manually start or schedule a job. # Useful Links -[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/developing-scheduled-tasks.html#GUID-F62EF833-1E70-41FC-9DCC-C1EAB407D151) +[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/developing-scheduled-tasks.html) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/temporary-user-disable/README.md b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/temporary-user-disable/README.md index 225362dd8..4d52272ce 100644 --- a/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/temporary-user-disable/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/scheduled-task-samples/temporary-user-disable/README.md @@ -6,7 +6,7 @@ Can be used as a basis to demonstrate or further customize a "leave of absence" Developed on and compatible with OIG 11g R2 PS3 and above. -Review Date: 24.10.2024 +Review Date: 11.11.2025 # When to use this asset? @@ -29,11 +29,11 @@ Please see the useful link below for detailed build and deployment steps. - Ensure you have specified a relevant value for the `Temporary disable date user attribute` scheduler parameter field in the scheduled task definition. Note that a either a UDF (User-defined field) or a pre-existing user attribute can be used. This value needs to contain the attribute's display label, not the backend name (e.g., `User Login`, not `USR_LOGIN`). -- [Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omusg/managing-jobs-1.html#GUID-71BB3623-AEE2-4F64-BBD4-D921DCA39D7C) on how to manually start or schedule a job. +- [Consult this section](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omadm/managing-scheduler.html#GUID-32651CE3-2B3B-4BAA-8DDA-CEFD6AB26EBF) on how to manually start or schedule a job. 
# Useful Links -[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/developing-scheduled-tasks.html#GUID-F62EF833-1E70-41FC-9DCC-C1EAB407D151) +[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/developing-scheduled-tasks.html) # License diff --git a/security/identity-and-access-management/oracle-identity-governance/webservice-samples/extend-access-ws/README.md b/security/identity-and-access-management/oracle-identity-governance/webservice-samples/extend-access-ws/README.md index 28d7b3d03..20aa0b29a 100644 --- a/security/identity-and-access-management/oracle-identity-governance/webservice-samples/extend-access-ws/README.md +++ b/security/identity-and-access-management/oracle-identity-governance/webservice-samples/extend-access-ws/README.md @@ -6,7 +6,7 @@ The scheduled task needs to be used in conjunction with the Access Extension Not Developed on and compatible with OIG 11g R2 PS3 and above. -Review Date: 28.10.2024 +Review Date: 11.11.2025 # When to use this asset? @@ -63,7 +63,7 @@ Please see the useful link below for detailed build and deployment steps. [The Java API for RESTful Web Services (JAX-RS)](https://www.oracle.com/technical-resources/articles/java/jax-rs.html) [JSR 311: JAX-RS: The JavaTM API for RESTful Web Services](https://jcp.org/en/jsr/detail?id=311) -[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/12.2.1.4/omdev/developing-scheduled-tasks.html#GUID-F62EF833-1E70-41FC-9DCC-C1EAB407D151) +[Oracle Identity Governance developer's guide - Developing scheduled tasks](https://docs.oracle.com/en/middleware/idm/identity-governance/14.1.2/omdev/developing-scheduled-tasks.html) # License diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/README.md b/security/security-design/shared-assets/oci-security-health-check-standard/README.md index b72639240..332ab42e8 100644 --- a/security/security-design/shared-assets/oci-security-health-check-standard/README.md +++ b/security/security-design/shared-assets/oci-security-health-check-standard/README.md @@ -2,7 +2,7 @@ Owner: Olaf Heimburger -Version: 250722 (cis_report.py version 3.0.1) for CIS OCI Foundation Benchmark 3.0.0 +Version: 251104 (cis_report.py version 3.1.0) for CIS OCI Foundation Benchmark 3.0.0 # Introduction ![Flyer](./files/resources/OCI_Security_Health_Check_Standard.png) @@ -57,22 +57,22 @@ See the *OCI Security Health Check - Standard Edition* in action and watch the [ Before running the *OCI Security Health Check - Standard Edition* you should download and verify it. - - Download the latest distribution [oci-security-health-check-standard-250722.zip](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.zip). + - Download the latest distribution [oci-security-health-check-standard-251104.zip](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.zip). 
- Download the respective checksum file: - - [oci-security-health-check-standard-250722.sha512](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512). - - [oci-security-health-check-standard-250722.sha512256](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512256). + - [oci-security-health-check-standard-251104.sha512](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512). + - [oci-security-health-check-standard-251104.sha512256](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512256). - Verify the integrity of the distribution. Both files must be in the same directory (for example, in your downloads directory). On MacOS: ``` cd - shasum -a 512256 -c oci-security-health-check-standard-250722.sha512256 + shasum -a 512256 -c oci-security-health-check-standard-251104.sha512256 ``` On Linux (including Cloud Shell): ``` cd - sha512sum -c oci-security-health-check-standard-250722.sha512 + sha512sum -c oci-security-health-check-standard-251104.sha512 ``` **Reject the downloaded file if the check fails!** @@ -85,10 +85,10 @@ In OCI Cloud Shell you can do a short cut without downloading the files mentione 2. Open Cloud Shell 3. Run these commands in your Cloud Shell: ``` - wget -q https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.zip - wget -q https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512 - sha512sum -c oci-security-health-check-standard-250722.sha512 - unzip -q oci-security-health-check-standard-250722.zip + wget -q https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.zip + wget -q https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512 + sha512sum -c oci-security-health-check-standard-251104.sha512 + unzip -q oci-security-health-check-standard-251104.zip ``` ## Prepare the OCI Tenancy diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/README.md b/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/README.md index 0c1bfb930..dea8b8f3d 100644 --- a/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/README.md +++ b/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/README.md @@ -2,7 +2,7 @@ Owner: Olaf Heimburger -Version: 250722 (cis_report.py version 
3.0.1) for CIS OCI Foundation Benchmark 3.0.0 +Version: 251104 (cis_report.py version 3.1.0) for CIS OCI Foundation Benchmark 3.0.0 ## When to use this asset? @@ -47,22 +47,22 @@ Tested on **OCI Cloud Shell** with **Public network**, **Oracle Linux**, **MacOS Before running the *OCI Security Health Check - Standard Edition* you should download and verify it. - - Download the latest distribution [oci-security-health-check-standard-250722.zip](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.zip). + - Download the latest distribution [oci-security-health-check-standard-251104.zip](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.zip). - Download the respective checksum file: - - [oci-security-health-check-standard-250722.sha512](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512). - - [oci-security-health-check-standard-250722.sha512256](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512256). + - [oci-security-health-check-standard-251104.sha512](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512). + - [oci-security-health-check-standard-251104.sha512256](https://github.com/oracle-devrel/technology-engineering/raw/main/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512256). - Verify the integrity of the distribution. Both files must be in the same directory (for example, in your downloads directory). 
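The platform-specific commands follow. If neither `shasum` nor `sha512sum` is available, the same check for the `.sha512` file can be scripted; below is a minimal Python sketch, assuming the usual `<hex>  <filename>` checksum format (the `.sha512256` file uses SHA-512/256 and would need a different digest).

```
import hashlib
import sys

def verify_sha512(checksum_file):
    """Verify each '<hex>  <filename>' entry of a sha512sum-style checksum file."""
    all_ok = True
    with open(checksum_file) as fh:
        for line in fh:
            if not line.strip():
                continue
            expected, filename = line.split(maxsplit=1)
            filename = filename.strip()
            digest = hashlib.sha512()
            with open(filename, "rb") as data:
                for chunk in iter(lambda: data.read(1024 * 1024), b""):
                    digest.update(chunk)
            ok = digest.hexdigest() == expected
            print(f"{filename}: {'OK' if ok else 'FAILED'}")
            all_ok = all_ok and ok
    return all_ok

if __name__ == "__main__":
    # Usage: python verify_checksum.py oci-security-health-check-standard-251104.sha512
    sys.exit(0 if verify_sha512(sys.argv[1]) else 1)
```

Like `sha512sum -c`, this prints one OK/FAILED line per entry and exits non-zero on any mismatch.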
On MacOS: ``` cd - shasum -a 512256 -c oci-security-health-check-standard-250722.sha512256 + shasum -a 512256 -c oci-security-health-check-standard-251104.sha512256 ``` On Linux (including Cloud Shell): ``` cd - sha512sum -c oci-security-health-check-standard-250722.sha512 + sha512sum -c oci-security-health-check-standard-251104.sha512 ``` **Reject the downloaded file when the check fails!** @@ -225,7 +225,7 @@ allow group 'Default'/'grp-auditors' to inspect vcns in compartment = 14: @@ -4069,6 +4196,7 @@ def __report_cis_analyze_tenancy_data(self): self.cis_foundations_benchmark_3_0['1.5']['Total'] = self.__identity_domains self.cis_foundations_benchmark_3_0['1.6']['Total'] = self.__identity_domains + def __cis_check_users(self): # 1.7 Check - Local Users w/o MFA for user in self.__users: if not(user['is_federated']) and user['can_use_console_password'] and not (user['is_mfa_activated']) and user['lifecycle_state']: @@ -4089,6 +4217,7 @@ def __report_cis_analyze_tenancy_data(self): "user_name": user['name'], "user_id": user['id'], "key_id": key['id'], + "domain_deeplink": user['domain_deeplink'], 'fingerprint': key['fingerprint'], # 'inactive_status': key['inactive_status'], # 'lifecycle_state': key['lifecycle_state'], @@ -4112,6 +4241,7 @@ def __report_cis_analyze_tenancy_data(self): "user_id": user['id'], "id": key['id'], 'display_name': key['display_name'], + "domain_deeplink": user['domain_deeplink'], # 'inactive_status': key['inactive_status'], # 'lifecycle_state': key['lifecycle_state'], 'time_created': key['time_created'], @@ -4134,10 +4264,11 @@ def __report_cis_analyze_tenancy_data(self): "user_name": user['name'], "user_id": user['id'], "id": key['id'], + "domain_deeplink": user['domain_deeplink'], "description": key['description'], # "inactive_status": key['inactive_status'], # "lifecycle_state": key['lifecycle_state'], - "time_created": key['time_created'], + "time_created": key['time_created'] # "time_expires": key['time_expires'], # "token": key['token'] } @@ -4148,7 +4279,7 @@ def __report_cis_analyze_tenancy_data(self): # CIS Total 1.10 Adding - Keys to CIS Total self.cis_foundations_benchmark_3_0['1.10']['Total'].append( key) - # CIS 1.11 Check - Old DB Password + # CIS 1.11 Check - Old DB Password #__iso_time_format1 = "%Y-%m-%dT%H:%M:%S.%fZ" for user in self.__users: if user['database_passwords']: @@ -4160,8 +4291,9 @@ def __report_cis_analyze_tenancy_data(self): "user_name": user['name'], "user_id": user['id'], "id": key['ocid'], + "domain_deeplink": user['domain_deeplink'], "description": key['description'], - "time_created": key['time_created'], + "time_created": key['time_created'] # "expires-on": key['expires_on'] } @@ -4186,7 +4318,7 @@ def __report_cis_analyze_tenancy_data(self): # CIS 1.13 Check - This check is complete uses email verification # Iterating through all users to see if they have API Keys and if they are active users for user in self.__users: - if user['external_identifier'] is None and user['lifecycle_state'] and not (user['email_verified']): + if not (user['is_federated'] and user['lifecycle_state']) and user['external_identifier'] is None and user['lifecycle_state'] and not user['email_verified']: self.cis_foundations_benchmark_3_0['1.13']['Status'] = False self.cis_foundations_benchmark_3_0['1.13']['Findings'].append( user) @@ -4194,54 +4326,6 @@ def __report_cis_analyze_tenancy_data(self): # CIS Total 1.13 Adding - All IAM Users for to CIS Total self.cis_foundations_benchmark_3_0['1.13']['Total'] = self.__users - # CIS 1.14 Check - Ensure 
Dynamic Groups are used for OCI instances, OCI Cloud Databases and OCI Function to access OCI resources - # Iterating through all dynamic groups ensure there are some for fnfunc, instance or autonomous. Using reverse logic so starts as a false - for dynamic_group in self.__dynamic_groups: - if any(oci_resource.upper() in str(dynamic_group['matching_rule'].upper()) for oci_resource in self.cis_iam_checks['1.14']['resources']): - self.cis_foundations_benchmark_3_0['1.14']['Status'] = True - else: - self.cis_foundations_benchmark_3_0['1.14']['Findings'].append( - dynamic_group) - # Clearing finding - if self.cis_foundations_benchmark_3_0['1.14']['Status']: - self.cis_foundations_benchmark_3_0['1.14']['Findings'] = [] - - # CIS Total 1.14 Adding - All Dynamic Groups for to CIS Total - self.cis_foundations_benchmark_3_0['1.14']['Total'] = self.__dynamic_groups - - # CIS 1.15 Check - Ensure storage service-level admins cannot delete resources they manage. - # Iterating through all policies - for policy in self.__policies: - if policy['name'].lower() not in ['tenant admin policy', 'psm-root-policy']: - for statement in policy['statements']: - for resource in self.cis_iam_checks['1.15']: - if "allow group".upper() in statement.upper() and "to manage ".upper() in statement.upper() and resource.upper() in statement.upper(): - split_statement = statement.split("where") - if len(split_statement) == 2: - clean_where_clause = split_statement[1].upper().replace(" ", "").replace("'", "") - if all(permission.upper() in clean_where_clause for permission in self.cis_iam_checks['1.15'][resource]) and \ - not(all(permission.upper() in clean_where_clause for permission in self.cis_iam_checks['1.15-storage-admin'][resource])): - debug("__report_cis_analyze_tenancy_data CIS 1.15 no permissions to delete storage: " + str(policy['name'])) - pass - # Checking if this is the Storage admin with allowed - elif all(permission.upper() in clean_where_clause for permission in self.cis_iam_checks['1.15-storage-admin'][resource]) and \ - not(all(permission.upper() in clean_where_clause for permission in self.cis_iam_checks['1.15'][resource])): - debug("__report_cis_analyze_tenancy_data CIS 1.15 storage admin policy is: " + str(policy['name'])) - pass - else: - self.cis_foundations_benchmark_3_0['1.15']['Findings'].append(policy) - debug("__report_cis_analyze_tenancy_data CIS 1.15 else policy is\n: " + str(policy['name'])) - - else: - self.cis_foundations_benchmark_3_0['1.15']['Findings'].append(policy) - - if self.cis_foundations_benchmark_3_0['1.15']['Findings']: - self.cis_foundations_benchmark_3_0['1.15']['Status'] = False - else: - self.cis_foundations_benchmark_3_0['1.15']['Status'] = True - - # CIS Total 1.15 Adding - All IAM Policies for to CIS Total - self.cis_foundations_benchmark_3_0['1.15']['Total'] = self.__policies # CIS 1.16 Check - Users with API Keys over 45 days @@ -4249,14 +4333,14 @@ def __report_cis_analyze_tenancy_data(self): login_over_45_days = None api_key_over_45_days = None - if user['lifecycle_state']: # and not(user['is_federated']) and user['can_use_console_password']: - debug(f'__report_cis_analyze_tenancy_data CIS 1.16 Login Over 45 days is: {login_over_45_days}') + if user['lifecycle_state'] and user['can_use_console_password'] and not(user['is_federated']): #and user['can_use_console_password']: if user['last_successful_login_date']: last_successful_login_date = user['last_successful_login_date'].split(".")[0] if self.local_user_time_max_datetime > 
datetime.datetime.strptime(last_successful_login_date, self.__iso_time_format): login_over_45_days = True debug(f"__report_cis_analyze_tenancy_data CIS 1.16 Last login is {user['last_successful_login_date']} and max login is {self.local_user_time_max_datetime}") else: + debug(f"__report_cis_analyze_tenancy_data CIS 1.16 Last login is {user['last_successful_login_date']} and max login is {self.local_user_time_max_datetime}") login_over_45_days = False else: debug("__report_cis_analyze_tenancy_data CIS 1.16 No Last login") @@ -4266,20 +4350,16 @@ def __report_cis_analyze_tenancy_data(self): debug("__report_cis_analyze_tenancy_data CIS 1.16 INACTIVE USE") login_over_45_days = False - if user['api_keys']: - debug("__report_cis_analyze_tenancy_data CIS 1.16 API Key Check") - for api_key in user['api_keys']: - if api_key['apikey_used_in_45_days']: - api_key_over_45_days = True - else: - debug("__report_cis_analyze_tenancy_data CIS 1.16 API Key used in under 45 days") - api_key_over_45_days = True - # else: - # api_key_over_45_days = False - - debug(f"__report_cis_analyze_tenancy_data CIS 1.16 User: {user['id']}") - debug(f'__report_cis_analyze_tenancy_data CIS 1.16 Over Login Over 45: {login_over_45_days}') - debug(f'__report_cis_analyze_tenancy_data CIS 1.16 Over API Key Over 45: {api_key_over_45_days}') + if user['api_keys'] and user['lifecycle_state']: + print("__report_cis_analyze_tenancy_data CIS 1.16 API Key Check") + api_key_over_45_days = not(all(key.get('apikey_used_in_45_days', False) for key in user['api_keys'])) + else: + api_key_over_45_days = False + + debug(f"__report_cis_analyze_tenancy_data CIS 1.16 User: {user['name']}") + debug(f"__report_cis_analyze_tenancy_data CIS 1.16 Domain: {user['domain_deeplink']}") + debug(f'__report_cis_analyze_tenancy_data CIS 1.16 Login Over 45: {login_over_45_days}') + debug(f'__report_cis_analyze_tenancy_data CIS 1.16 API Key Over 45: {api_key_over_45_days}') if login_over_45_days or api_key_over_45_days: finding = user.copy() finding['login_over_45_days'] = login_over_45_days @@ -4291,7 +4371,7 @@ def __report_cis_analyze_tenancy_data(self): else: self.cis_foundations_benchmark_3_0['1.16']['Status'] = True - # CIS Total 1.15 Adding - All IAM Policies for to CIS Total + # CIS Total 1.16 Adding - All IAM Policies for to CIS Total self.cis_foundations_benchmark_3_0['1.16']['Total'] = self.__users @@ -4307,8 +4387,25 @@ def __report_cis_analyze_tenancy_data(self): self.cis_foundations_benchmark_3_0['1.17']['Status'] = True # CIS Total 1.17 Adding - All IAM Policies for to CIS Total self.cis_foundations_benchmark_3_0['1.17']['Total'] = self.__users + + def __cis_check_dynamic_groups(self): + # CIS 1.14 Check - Ensure Dynamic Groups are used for OCI instances, OCI Cloud Databases and OCI Function to access OCI resources + # Iterating through all dynamic groups ensure there are some for fnfunc, instance or autonomous. 
Using reverse logic so starts as a false + for dynamic_group in self.__dynamic_groups: + if any(oci_resource.upper() in str(dynamic_group['matching_rule'].upper()) for oci_resource in self.cis_iam_checks['1.14']['resources']): + self.cis_foundations_benchmark_3_0['1.14']['Status'] = True + else: + self.cis_foundations_benchmark_3_0['1.14']['Findings'].append( + dynamic_group) + # Clearing finding + if self.cis_foundations_benchmark_3_0['1.14']['Status']: + self.cis_foundations_benchmark_3_0['1.14']['Findings'] = [] + + # CIS Total 1.14 Adding - All Dynamic Groups for to CIS Total + self.cis_foundations_benchmark_3_0['1.14']['Total'] = self.__dynamic_groups - # CIS 2.1, 2.2, & 2.5 Check - Security List Ingress from 0.0.0.0/0 on ports 22, 3389 + def __cis_check_network_security(self): + # CIS 2.1, 2.2 Check - Security List Ingress from 0.0.0.0/0 on ports 22, 3389 for sl in self.__network_security_lists: for irule in sl['ingress_security_rules']: if irule['source'] == "0.0.0.0/0" and irule['protocol'] == '6': @@ -4392,7 +4489,7 @@ def __report_cis_analyze_tenancy_data(self): self.cis_foundations_benchmark_3_0['2.4']['Findings'].append(nsg) break - # CIS Total 2.2 & 2.4 Adding - All NSGs Instances to CIS Total + # CIS Total 2.3 & 2.4 Adding - All NSGs Instances to CIS Total self.cis_foundations_benchmark_3_0['2.3']['Total'] = self.__network_security_groups self.cis_foundations_benchmark_3_0['2.4']['Total'] = self.__network_security_groups @@ -4434,40 +4531,36 @@ def __report_cis_analyze_tenancy_data(self): if autonomous_database['lifecycle_state'] not in [ oci.database.models.AutonomousDatabaseSummary.LIFECYCLE_STATE_TERMINATED, oci.database.models.AutonomousDatabaseSummary.LIFECYCLE_STATE_TERMINATING, oci.database.models.AutonomousDatabaseSummary.LIFECYCLE_STATE_UNAVAILABLE ]: if not (autonomous_database['whitelisted_ips']) and not (autonomous_database['subnet_id']): self.cis_foundations_benchmark_3_0['2.8']['Status'] = False - self.cis_foundations_benchmark_3_0['2.8']['Findings'].append( - autonomous_database) + self.cis_foundations_benchmark_3_0['2.8']['Findings'].append(autonomous_database) elif autonomous_database['whitelisted_ips']: for value in autonomous_database['whitelisted_ips']: - if '0.0.0.0/0' in str(autonomous_database['whitelisted_ips']): + if '0.0.0.0/0' in str(value): self.cis_foundations_benchmark_3_0['2.8']['Status'] = False - self.cis_foundations_benchmark_3_0['2.8']['Findings'].append( - autonomous_database) + self.cis_foundations_benchmark_3_0['2.8']['Findings'].append(autonomous_database) # CIS Total 2.8 Adding - All ADBs to CIS Total self.cis_foundations_benchmark_3_0['2.8']['Total'] = self.__autonomous_databases - # From CIS 2.0 CIS 4.1 Check - Ensure Audit log retention == 365 - Only checking in home region - # if self.__audit_retention_period >= 365: - # self.cis_foundations_benchmark_3_0['4.1']['Status'] = True - + def __cis_check_compute_instances(self): for instance in self.__Instance: - # CIS Check 3.1 Metadata Service v2 Enabled - if instance['instance_options'] is None or not(instance['instance_options']['are_legacy_imds_endpoints_disabled']): - debug(f"__report_cis_analyze_tenancy_data {instance['display_name']} doesn't disable IMDSv1") - self.cis_foundations_benchmark_3_0['3.1']['Status'] = False - self.cis_foundations_benchmark_3_0['3.1']['Findings'].append(instance) - - # CIS Check 3.2 Secure Boot enabled - if instance['platform_config'] is None or not(instance['platform_config']['is_secure_boot_enabled']): - debug(f"__report_cis_analyze_tenancy_data 
{instance['display_name']} doesn't enable secure boot") - self.cis_foundations_benchmark_3_0['3.2']['Status'] = False - self.cis_foundations_benchmark_3_0['3.2']['Findings'].append(instance) - - # CIS Check 3.3 Encryption in Transit enabled - if instance['launch_options'] is None or not(instance['launch_options']['is_pv_encryption_in_transit_enabled']): - debug(f"__report_cis_analyze_tenancy_data {instance['display_name']} doesn't enable encryption in transit") - self.cis_foundations_benchmark_3_0['3.3']['Status'] = False - self.cis_foundations_benchmark_3_0['3.3']['Findings'].append(instance) + if instance['lifecycle_state'] not in ["TERMINATED","TERMINATING"]: + # CIS Check 3.1 Metadata Service v2 Enabled + if instance['instance_options'] is None or not(instance['instance_options']['are_legacy_imds_endpoints_disabled']): + debug(f"__report_cis_analyze_tenancy_data {instance['display_name']} doesn't disable IMDSv1") + self.cis_foundations_benchmark_3_0['3.1']['Status'] = False + self.cis_foundations_benchmark_3_0['3.1']['Findings'].append(instance) + + # CIS Check 3.2 Secure Boot enabled + if instance['platform_config'] is None or not(instance['platform_config']['is_secure_boot_enabled']): + debug(f"__report_cis_analyze_tenancy_data {instance['display_name']} doesn't enable secure boot") + self.cis_foundations_benchmark_3_0['3.2']['Status'] = False + self.cis_foundations_benchmark_3_0['3.2']['Findings'].append(instance) + + # CIS Check 3.3 Encryption in Transit enabled + if instance['launch_options'] is None or not(instance['launch_options']['is_pv_encryption_in_transit_enabled']): + debug(f"__report_cis_analyze_tenancy_data {instance['display_name']} doesn't enable encryption in transit") + self.cis_foundations_benchmark_3_0['3.3']['Status'] = False + self.cis_foundations_benchmark_3_0['3.3']['Findings'].append(instance) # CIS Total 3.1 Adding - All Instances to CIS Total self.cis_foundations_benchmark_3_0['3.1']['Total'] = self.__Instance @@ -4476,6 +4569,7 @@ def __report_cis_analyze_tenancy_data(self): # CIS Total 3.3 Adding - All Instances to CIS Total self.cis_foundations_benchmark_3_0['3.3']['Total'] = self.__Instance + def __cis_check_tagging_and_monitoring(self): # CIS Check 4.1 - Check for Default Tags in Root Compartment # Iterate through tags looking for ${iam.principal.name} for tag in self.__tag_defaults: @@ -4657,6 +4751,7 @@ def __report_cis_analyze_tenancy_data(self): # CIS Check 4.17 Total - Adding All Buckets to total self.cis_foundations_benchmark_3_0['4.17']['Total'] = self.__buckets + def __cis_check_storage(self): # CIS Section 5.1 Bucket Checks # Generating list of buckets names for bucket in self.__buckets: @@ -4704,7 +4799,7 @@ def __report_cis_analyze_tenancy_data(self): boot_volume) self.cis_foundations_benchmark_3_0['5.2.2']['Status'] = False - # CIS Check 4.2.2 Total - Adding All Block Volumes to total + # CIS Check 5.2.2 Total - Adding All Block Volumes to total self.cis_foundations_benchmark_3_0['5.2.2']['Total'] = self.__boot_volumes # CIS Section 5.3.1 FSS Checks @@ -4719,6 +4814,7 @@ def __report_cis_analyze_tenancy_data(self): # CIS Check 4.3.1 Total - Adding All Block Volumes to total self.cis_foundations_benchmark_3_0['5.3.1']['Total'] = self.__file_storage_system + def __cis_check_assets(self): # CIS Section 6 Checks # Checking if more than one compartment because of the ManagedPaaS Compartment if len(self.__compartments) < 2: @@ -4774,21 +4870,41 @@ def __obp_init_regional_checks(self): "drgs": [], "findings": [], "status": False - }, + } } + 
########################################################################## - # OBP Budgets Check + # OBP Budgets Check ########################################################################## def __obp_check_budget(self): if len(self.__budgets) > 0: for budget in self.__budgets: - if budget['alert_rule_count'] > 0 and budget['target_compartment_id'] == self.__tenancy.id: - self.obp_foundations_checks['Cost_Tracking_Budgets']['Status'] = True - self.obp_foundations_checks['Cost_Tracking_Budgets']['OBP'].append(budget) + if ( + budget["alert_rule_count"] > 0 + and budget["target_compartment_id"] == self.__tenancy.id + ): + for alert in budget["alerts"]: + if alert.type == "FORECAST": + self.obp_foundations_checks["Cost_Tracking_Budgets"]["Status"] = True + self.obp_foundations_checks["Cost_Tracking_Budgets"]["OBP"].append(budget) + break + else: + self.obp_foundations_checks["Cost_Tracking_Budgets"]["Findings"].append(budget) else: - self.obp_foundations_checks['Cost_Tracking_Budgets']['Findings'].append(budget) + self.obp_foundations_checks["Cost_Tracking_Budgets"]["Findings"].append(budget) + + ####################################### + # OBP Quotas Checks + ####################################### + def __obp_check_quotas(self): + if self.__quotas: + self.obp_foundations_checks['Quotas']['Status'] = True + self.obp_foundations_checks['Quotas']['OBP'] = self.__quotas + ####################################### + # OBP Audit Logs to SIEM check + ####################################### def __obp_check_audit_log_compartments(self): # Building a Hash Table of Parent Child Hierarchy for Audit dict_of_compartments = {} @@ -4922,7 +5038,10 @@ def __obp_check_audit_log_compartments(self): exists_already = list(filter(lambda source: source['id'] == record['id'] and source['region'] == record['region'], self.obp_foundations_checks['SIEM_Audit_Log_All_Comps']['OBP'])) if not exists_already: self.obp_foundations_checks['SIEM_Audit_Log_All_Comps']['OBP'].append(record) - + + ####################################### + # OBP Cloud Guard Check + ####################################### def __obp_check_cloud_guard(self): ####################################### # Cloud Guard Checks @@ -5270,6 +5389,58 @@ def __obp_check_bucket_logs(self): self.obp_foundations_checks['SIEM_Read_Bucket_Logs']['Status'] = True + ####################################### + # OBP Service Limit Check + ####################################### + def __obp_check_close_service_limits(self): + if True: + for limit in self.__service_limits: + # If the limit is greater than 80% we should note it for an OBP + if limit['service_limit_availability'] and limit['service_limit_availability'] >= 80.0: + self.obp_foundations_checks['Service_Limits']['Findings'].append(limit) + else: + self.obp_foundations_checks['Service_Limits']['OBP'].append(limit) + + if self.obp_foundations_checks['Service_Limits']['Findings']: + self.obp_foundations_checks['Service_Limits']['Status'] = False + elif self.obp_foundations_checks['Service_Limits']['OBP']: + self.obp_foundations_checks['Service_Limits']['Status'] = True + ####################################### + # OBP ADB Checks + ####################################### + def __obp_check_adbs(self): + for adb in self.__autonomous_databases: + if not adb['is_mtls_connection_required']: + self.obp_foundations_checks['ADB_MTLS']['Findings'].append(adb) + else: + self.obp_foundations_checks['ADB_MTLS']['OBP'].append(adb) + if not adb['encryption_key']['provider'] == 'ORACLE_MANAGED': + 
self.obp_foundations_checks['ADB_CMK']['Findings'].append(adb) + else: + self.obp_foundations_checks['ADB_CMK']['OBP'].append(adb) + + if not adb['private_endpoint_ip']: + self.obp_foundations_checks['ADB_Private_IP']['Findings'].append(adb) + else: + self.obp_foundations_checks['ADB_Private_IP']['OBP'].append(adb) + + if not adb['data_safe_status'] == "REGISTERED": + self.obp_foundations_checks['ADB_DataSafe']['Findings'].append(adb) + else: + self.obp_foundations_checks['ADB_DataSafe']['OBP'].append(adb) + + if not adb['customer_contacts']: + self.obp_foundations_checks['ADB_Contacts']['Findings'].append(adb) + else: + self.obp_foundations_checks['ADB_Contacts']['OBP'].append(adb) + + for key in self.obp_foundations_checks.keys(): + if key.startswith("ADB_"): + if self.obp_foundations_checks[key]['Findings']: + self.obp_foundations_checks[key]['Status'] = False + else: + self.obp_foundations_checks[key]['Status'] = True + ########################################################################## # Analyzes Tenancy Data for Oracle Best Practices Report ########################################################################## @@ -5282,7 +5453,9 @@ def __obp_analyze_tenancy_data(self): self.__obp_check_certificates() self.__obp_check_bucket_logs() self.__obp_check_subnet_logs() - + self.__obp_check_close_service_limits() + self.__obp_check_adbs() + self.__obp_check_quotas() ########################################################################## # Orchestrates data collection and CIS report generation @@ -5319,8 +5492,8 @@ def __report_generate_cis_report(self, level): "Total": (str(len(recommendation['Total'])) if len(recommendation['Total']) > 0 else " "), "Compliance Percentage Per Recommendation": compliance_percentage, "Title": recommendation['Title'], - "CIS v8": recommendation['CISv8'], - "CCCS Guard Rail": recommendation['CCCS Guard Rail'], + self.__primary_framework_name : recommendation[self.__primary_framework_name], + self.__other_framework_name : recommendation[self.__other_framework_name], "Filename": report_filename if len(recommendation['Findings']) > 0 else " ", "Remediation": self.cis_report_data[key]['Remediation'] } @@ -5356,7 +5529,7 @@ def __report_generate_cis_report(self, level): summary_file_name = self.__print_to_json_file("cis", "summary_report", summary_report) summary_files.append(summary_file_name) - summary_file_name = self.__report_generate_html_summary_report("cis", "html_summary_report", summary_report) + summary_file_name = self.__report_generate_html_summary_report("cis", "summary_report", summary_report) summary_files.append(summary_file_name) if OUTPUT_DIAGRAMS: @@ -5373,7 +5546,7 @@ def __report_generate_cis_report(self, level): for key, recommendation in self.cis_foundations_benchmark_3_0.items(): if recommendation['Level'] <= level: - report_file_name = self.__print_to_csv_file("cis", recommendation['section'] + "_" + recommendation['recommendation_#'], recommendation['Findings']) + report_file_name = self.__print_to_csv_file("cis", f"{recommendation['section']}_{recommendation['recommendation_#']}", recommendation['Findings']) if report_file_name and self.__output_bucket: self.__os_copy_report_to_object_storage( self.__output_bucket, report_file_name) @@ -5381,8 +5554,10 @@ def __report_generate_cis_report(self, level): ########################################################################## # Generate summary diagrams ########################################################################## - diagram_colors = ['#4C825C','#C74634'] + 
diagram_colors = ['#4C825C', '#C74634'] diagram_values = ['Compliant', 'Non-compliant'] + diagram_colors_na = ['#4C825C', '#C74634', '#E0DEDE'] + diagram_values_na = ['Compliant', 'Non-compliant', 'Not applicable'] diagram_sections = ( 'Identity and Access Management', 'Networking', @@ -5397,17 +5572,22 @@ def __report_generate_cis_report(self, level): ########################################################################## # __cis_compliance ########################################################################## - def __cis_compliance(self, filename, title, values=None): + def __cis_compliance(self, filename, title, values=None, has_na_values=False): plt.close('all') - plt.figure(figsize=(6,5)) - wegdes, labels, pcttexts = plt.pie(values, labels=self.diagram_values, colors=self.diagram_colors, autopct='%.0f%%', wedgeprops={'linewidth': 3.0, 'edgecolor': 'white'}, startangle=90, counterclock=False, radius=1.1) + plt.figure(figsize=(6, 5)) + labels = self.diagram_values + colors = self.diagram_colors + if has_na_values: + labels = self.diagram_values_na + colors = self.diagram_colors_na + wegdes, labels, pcttexts = plt.pie(values, labels=labels, colors=colors, autopct='%.0f%%', wedgeprops={'linewidth': 3.0, 'edgecolor': 'white'}, startangle=90, counterclock=False, radius=1.1) for t in labels: t.set_fontweight(self.diagram_fontweight) for p in pcttexts: p.set_fontweight(self.diagram_fontweight) p.set_color(self.diagram_fontcolor_reverse) plt.title(title, fontweight=self.diagram_fontweight, pad=30.0) - plt.savefig(filename) + plt.savefig(filename, transparent=True) ########################################################################## # __cis_compliance_by_area @@ -5415,7 +5595,7 @@ def __cis_compliance(self, filename, title, values=None): def __cis_compliance_by_area(self, filename, title, section_values=None): plt.close('all') height = 0.4 - fig, ax = plt.subplots(figsize=(10,5), layout='compressed') + fig, ax = plt.subplots(figsize=(10, 5), layout='compressed') y = np.arange(len(self.diagram_sections)) p = ax.barh(y - height/2, section_values[self.diagram_values[0]], height, color=self.diagram_colors[0]) ax.bar_label(p, padding=-16, color=self.diagram_fontcolor_reverse, fontweight=self.diagram_fontweight) @@ -5427,7 +5607,7 @@ def __cis_compliance_by_area(self, filename, title, section_values=None): ax.set_yticklabels(self.diagram_sections, fontweight=self.diagram_fontweight) ax.invert_yaxis() plt.tick_params(left=False, right=False, labelbottom=False, bottom=False) - plt.savefig(filename) + plt.savefig(filename, transparent=True) ########################################################################## # __generate_compliance_diagram @@ -5435,13 +5615,16 @@ def __cis_compliance_by_area(self, filename, title, section_values=None): def __generate_compliance_diagram(self, header, file_subject, data): compliant = 0 non_compliant = 0 + not_applicable = 0 for finding in data: if finding['Compliant'] == 'Yes': compliant += 1 + elif finding['Compliant'] == 'N/A': + not_applicable += 1 else: non_compliant += 1 cis_compliance_file = self.__get_output_file_path(header, file_subject, '.png') - self.__cis_compliance(cis_compliance_file, 'CIS Recommendation Compliance', [compliant, non_compliant]) + self.__cis_compliance(cis_compliance_file, 'CIS Recommendation Compliance', [compliant, non_compliant, not_applicable] if not_applicable > 0 else [compliant, non_compliant], has_na_values=True if not_applicable > 0 else False) return cis_compliance_file 
########################################################################## @@ -5453,10 +5636,13 @@ def __generate_compliance_by_area_diagram(self, header, file_subject, data): for section in self.diagram_sections: compliant = 0 non_compliant = 0 + not_applicable = 0 for finding in data: if section in finding['Section']: if finding['Compliant'] == 'Yes': compliant += 1 + elif finding['Compliant'] == 'N/A': + not_applicable += 1 else: non_compliant += 1 compliants.append(compliant) @@ -5537,12 +5723,18 @@ def __report_generate_html_summary_report(self, header, file_subject, data): .u30brand{height:50px;display:flex;flex-direction:column;justify-content:center;align-items:flex-start;max-width:1344px;padding:0 48px;margin:0 auto} .u30brandw1{display:flex;flex-direction:row;color:#fff;text-decoration:none;align-items:center} @media (max-width:1024px){.u30brand{padding:0 24px}} #u30skip2,#u30skip2content{transform:translateY(-100%);position:fixed} .rtl #u30{direction:rtl} #td_override { background: #fff; border-bottom: 1px solid rgba(122,115,110,0.2) !important } -
""") +
+
""") html_file.write(f'

{html_title.replace("-", "–")}

') + html_file.write(""" +
+
+
+ """) html_file.write(f'

Tenancy Name: {self.__tenancy.name}

') # Get the extract date r = result[0] - extract_date = r['extract_date'].replace('T',' ') + extract_date = r['extract_date'].replace('T', ' ') html_file.write(f'
Extract Date: {extract_date} UTC
') html_file.write('
') if OUTPUT_DIAGRAMS: @@ -5581,7 +5773,7 @@ def __report_generate_html_summary_report(self, header, file_subject, data): column_width = '63%' html_file.write(f'{th}') html_file.write('') - # Creating HTML Table of the summary report + # Creating the compliant HTML Table of the summary report html_appendix = [] for row in result: compliant = row['Compliant'] @@ -5609,13 +5801,14 @@ def __report_generate_html_summary_report(self, header, file_subject, data): html_file.write('Remediation') html_file.write(f'{str(row["Remediation"])}') html_file.write('Level') - html_file.write('CIS v8') - html_file.write('CCCS Guard Rail') + html_file.write(f'{self.__primary_framework_name}') + html_file.write(f'{self.__other_framework_name}') html_file.write('File') html_file.write(f'{str(row["Level"])}') - cis_v8 = str(row["CIS v8"]).replace("[","").replace("]","").replace("'","") - html_file.write(f'{cis_v8}') - html_file.write(f'{str(row["CCCS Guard Rail"])}') + primary_framework = str(row[self.__primary_framework_name]).replace("[", "").replace("]", "").replace("'", "") + other_framework = str(row[self.__other_framework_name]).replace("[", "").replace("]", "").replace("'", "") + html_file.write(f'{primary_framework}') + html_file.write(f'{other_framework}') v = str(row['Filename']) if v == ' ': html_file.write(' ') @@ -5648,11 +5841,11 @@ def __report_generate_html_summary_report(self, header, file_subject, data): column_width = '63%' html_file.write(f'{th}') html_file.write('') - # Creating HTML Table of the summary report + # Creating the non-compliant HTML Table of the summary report html_appendix = [] for row in result: compliant = row['Compliant'] - if compliant == 'Yes': + if compliant != 'No': continue html_appendix.append(row['Recommendation #']) text_color = 'red' @@ -5678,13 +5871,14 @@ def __report_generate_html_summary_report(self, header, file_subject, data): html_file.write('Remediation') html_file.write(f'{str(row["Remediation"])}') html_file.write('Level') - html_file.write('CIS v8') - html_file.write('CCCS Guard Rail') + html_file.write(f'{self.__primary_framework_name}') + html_file.write(f'{self.__other_framework_name}') html_file.write('File') html_file.write(f'{str(row["Level"])}') - cis_v8 = str(row["CIS v8"]).replace("[", "").replace("]", "").replace("'", "") - html_file.write(f'{cis_v8}') - html_file.write(f'{str(row["CCCS Guard Rail"])}') + primary_framework = str(row[self.__primary_framework_name]).replace("[", "").replace("]", "").replace("'", "") + other_framework = str(row[self.__other_framework_name]).replace("[", "").replace("]", "").replace("'", "") + html_file.write(f'{primary_framework}') + html_file.write(f'{other_framework}') v = str(row['Filename']) if v == ' ': html_file.write(' ') @@ -5734,7 +5928,7 @@ def __report_generate_html_summary_report(self, header, file_subject, data):
\n')
-            print("HTML: " + file_subject.ljust(22) + " --> " + file_path)
+            print(f"HTML: {file_subject.ljust(22)} --> {file_path}")

            # Used by Upload
            return file_path
@@ -5755,10 +5949,11 @@ def __report_generate_obp_report(self):
         # Adding data to summary report
         for key, recommendation in self.obp_foundations_checks.items():
             padding = str(key).ljust(25, " ")
-            print(padding + "\t\t" + str(recommendation['Status']) + "\t" + "\t" + str(len(recommendation['Findings'])) + "\t" + "\t" + str(len(recommendation['OBP'])))
+            compliant = ("Yes" if recommendation['Status'] is True else "No" if recommendation['Status'] is False else "N/A")
+            print(padding + "\t\t" + compliant + "\t" + "\t" + str(len(recommendation['Findings'])) + "\t" + "\t" + str(len(recommendation['OBP'])))
             record = {
                 "Recommendation": str(key),
-                "Compliant": ('Yes' if recommendation['Status'] else 'No'),
+                "Compliant": compliant,
                 "OBP": (str(len(recommendation['OBP'])) if len(recommendation['OBP']) > 0 else " "),
                 "Findings": (str(len(recommendation['Findings'])) if len(recommendation['Findings']) > 0 else " "),
                 "Documentation": recommendation['Documentation']
@@ -5824,7 +6019,8 @@ def __collect_tenancy_data(self):
         if self.__obp_checks:
             obp_home_region_functions = [
                 self.__budget_read_budgets,
-                self.__cloud_guard_read_cloud_guard_targets
+                self.__cloud_guard_read_cloud_guard_targets,
+                self.__quota_read,
             ]
         else:
             obp_home_region_functions = []
@@ -5886,8 +6082,8 @@ def __collect_tenancy_data(self):

         if self.__all_resources:
             all_resources = [
-                self.__network_topology_dump,
-                self.__search_resources_all_resources_in_tenancy
+                self.__search_resources_all_resources_in_tenancy,
+                self.__service_limits_utilization
             ]
         else:
             all_resources = []
@@ -5941,6 +6137,7 @@ def __report_generate_raw_data_output(self):
             "keys_and_vaults": self.__kms_keys,
             "ons_subscriptions": self.__subscriptions,
             "budgets": self.__budgets,
+            "quotas" : self.__quotas,
             "service_connectors": list(self.__service_connectors.values()),
             "network_fastconnects": list(itertools.chain.from_iterable(self.__network_fastconnects.values())),
             "network_ipsec_connections": list(itertools.chain.from_iterable(self.__network_ipsec_connections.values())),
@@ -5949,7 +6146,8 @@ def __report_generate_raw_data_output(self):
             "regions": self.__raw_regions,
             "network_drg_attachments": list(itertools.chain.from_iterable(self.__network_drg_attachments.values())),
             "instances": self.__Instance,
-            "certificates" : self.__raw_oci_certificates
+            "certificates" : self.__raw_oci_certificates,
+            "service_limits" : self.__service_limits
         }
         for key in raw_csv_files:
             rfn = self.__print_to_csv_file('raw_data', key, raw_csv_files[key])
@@ -5957,19 +6155,11 @@ def __report_generate_raw_data_output(self):

         raw_json_files = {
             "all_resources": self.__all_resources_json,
-            "oci_network_topologies": oci.util.to_dict(self.__network_topology_json)
         }
         for key in raw_json_files:
             rfn = self.__print_to_json_file('raw_data', key, raw_json_files[key])
             list_report_file_names.append(rfn)

-        raw_pkl_files = {
-            "oci_network_topologies": self.__network_topology_json
-        }
-        for key in raw_pkl_files:
-            rfn = self.__print_to_pkl_file('raw_data', key, raw_json_files[key])
-            list_report_file_names.append(rfn)
-
         if self.__output_bucket:
             for raw_report in list_report_file_names:
                 if raw_report:
@@ -6054,6 +6244,8 @@ def __print_to_csv_file(self, header, file_subject, data):
             writer.writeheader()

             for row in result:
+                if 'rules' in row:
+                    row['rules'] = str(row['rules']).replace('\n', '')
                 writer.writerow(row)
                 # print(row)

@@ -6100,32 +6292,6 @@ def __print_to_json_file(self, header, file_subject, data):
         except Exception as e:
             raise Exception("Error in print_to_json_file: " + str(e.args))

-    ##########################################################################
-    # Print to PKL
-    ##########################################################################
-    def __print_to_pkl_file(self, header, file_subject, data):
-
-        try:
-            # if no data
-            if len(data) == 0:
-                return None
-
-            # get the file name of the PKL
-            file_path = self.__get_output_file_path(header, file_subject, '.pkl')
-
-            # Writing to json file
-            with open(file_path, 'wb') as pkl_file:
-                pickle.dump(data,pkl_file)
-
-
-            print("PKL: " + file_subject.ljust(22) + " --> " + file_path)
-
-            # Used by Upload
-            return file_path
-
-
-        except Exception as e:
-            raise Exception("Error in __print_to_pkl_file: " + str(e.args))

    ##########################################################################
    # Orchestrates Data collection and reports
@@ -6173,8 +6339,8 @@ def get_obp_checks(self):
    # Create CSV Hyperlink
    ##########################################################################
    def __generate_csv_hyperlink(self, url, name):
-        if len(url) < 255:
-            return '=HYPERLINK("' + url + '","' + name + '")'
+        if len(url) < 2079: # Excel limit
+            return f'=HYPERLINK("{url}","{name}")'
        else:
            return url

@@ -6384,7 +6550,7 @@ def execute_report():
    config, signer = create_signer(cmd.file_location, cmd.config_profile, cmd.is_instance_principals, cmd.is_delegation_token, cmd.is_security_token)
    config['retry_strategy'] = oci.retry.DEFAULT_RETRY_STRATEGY
    report = CIS_Report(config, signer, cmd.proxy, cmd.output_bucket, cmd.report_directory, cmd.report_prefix, cmd.report_summary_json, cmd.print_to_screen, \
-                       cmd.regions, cmd.raw, cmd.obp, cmd.redact_output, oci_url=cmd.oci_url, debug=cmd.debug, all_resources=cmd.all_resources, disable_api_keys=cmd.disable_api_usage_check)
+                       cmd.regions, cmd.raw, cmd.obp, cmd.redact_output, oci_url=cmd.oci_url, debug=cmd.debug, all_resources=cmd.all_resources, disable_api_keys=cmd.disable_api_usage_check)
    csv_report_directory = report.generate_reports(int(cmd.level))

    if OUTPUT_TO_XLSX:
@@ -6425,8 +6591,12 @@ def execute_report():
                reader = csv.reader(f)
                for r, row in enumerate(reader):
                    for c, col in enumerate(row):
-                        # Skipping the deep link due to formating errors in xlsx
-                        if "=HYPERLINK" not in col:
+                        # Format URL only if the column starts with "=HYPERLINK"
+                        if col.startswith("=HYPERLINK"):
+                            url_info = re.findall(r'"(.*?)"', col)
+                            if url_info and len(url_info[0]) < 2079: # Excel Link limit
+                                worksheet.write_url(r, c, url_info[0], string=url_info[1])
+                        else:
                            worksheet.write(r, c, col)
                worksheet.autofilter(0, 0, r - 1, c - 1)
                worksheet.autofit()
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/standard.sh b/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/standard.sh
index a1f092241..b6d4ceec1 100755
--- a/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/standard.sh
+++ b/security/security-design/shared-assets/oci-security-health-check-standard/files/oci-security-health-check-standard/standard.sh
@@ -7,7 +7,7 @@
 #
 # Author: Olaf Heimburger
 #
-VERSION=250722
+VERSION=251104
 graal_version=24.2.2

 OS_TYPE=$(uname)
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/Example_Output.png b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/Example_Output.png
index 7d58190f6..659ecb210 100644
Binary files a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/Example_Output.png and b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/Example_Output.png differ
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512 b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512
deleted file mode 100644
index 41dc40bce..000000000
--- a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512
+++ /dev/null
@@ -1 +0,0 @@
-ede5f2a1f3889d4bf7a6503e272b4844f0dd9eb011045961d036f0c5b02a09f2fc0c0531d4560ec1b32f2a935cee8c94f6301695b813089103fbfc1682c4a96a oci-security-health-check-standard-250722.zip
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512256 b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512256
deleted file mode 100644
index 6724995f7..000000000
--- a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.sha512256
+++ /dev/null
@@ -1 +0,0 @@
-d6477bf3727cecf96848bb0ffe0eb918f776433b029090fe08aab3aefae4f72d oci-security-health-check-standard-250722.zip
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.zip b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.zip
deleted file mode 100644
index ee7247030..000000000
Binary files a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-250722.zip and /dev/null differ
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512 b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512
new file mode 100644
index 000000000..e79a7f0da
--- /dev/null
+++ b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512
@@ -0,0 +1 @@
+3af7a1fa96792e1fd23ee12c57833fd7e602c7ad8785d41275c7b5839712b405e31d11ec3760660b8a8cf75331e7f3ca7376cd9779f70d9c6732bbf96701cd85 oci-security-health-check-standard-251104.zip
\ No newline at end of file
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512256 b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512256
new file mode 100644
index 000000000..a183f6288
--- /dev/null
+++ b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.sha512256
@@ -0,0 +1 @@
+fd68eb70092eb37c82539aac426dfa601dc15bab337ae94225dc8b0afb465a1b oci-security-health-check-standard-251104.zip
\ No newline at end of file
diff --git a/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.zip b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.zip
new file mode 100644
index 000000000..f92c250ef
Binary files /dev/null and b/security/security-design/shared-assets/oci-security-health-check-standard/files/resources/oci-security-health-check-standard-251104.zip differ