Customer financial profiler #1128

Closed
sharathrushi wants to merge 2 commits into google-gemini:main from sharathrushi:customer-financial-profiler

Conversation

@sharathrushi

Given a customer and their financial documents, this notebook does the following:

  1. Checks, using Google Search, whether the customer has been linked to money laundering or other illegal activities.
  2. Retrieves the customer's monthly income and total asset value.

This is useful for financial institutions assessing loan eligibility.
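
The screening step in item 1 could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `screen_customer` and its prompt are hypothetical, and it assumes the google-genai SDK's `client.models.generate_content` call with the Google Search tool passed as a plain dict in `config`.

```python
def screen_customer(client, model_id: str, customer_name: str) -> str:
    """Ask the model, grounded with Google Search, about adverse public reports.

    `client` is assumed to be a google-genai Client (or any object that
    exposes the same `models.generate_content` method).
    """
    prompt = (
        f"Search for public reports linking {customer_name} to money laundering "
        "or other illegal activity. Summarize what you find and cite sources. "
        "If nothing relevant is found, say so explicitly."
    )
    response = client.models.generate_content(
        model=model_id,
        contents=prompt,
        # Assumption: passing the tool as a dict enables Search grounding.
        config={"tools": [{"google_search": {}}]},
    )
    return response.text
```

A search result of this kind is only a signal for a human reviewer to follow up on, since a name match on its own does not identify a person.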

Colab link

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions github-actions bot added status:awaiting review PR awaiting review from a maintainer component:examples Issues/PR referencing examples folder labels Feb 2, 2026
@gemini-code-assist

Summary of Changes

Hello @sharathrushi, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request delivers a new Colab notebook that automates the process of generating a financial profile for customers. It integrates AI capabilities to perform critical checks, such as identifying potential involvement in illicit activities through public information, and intelligently extracts structured financial data from diverse document formats, including scanned PDFs. This tool is designed to enhance the efficiency and accuracy of financial assessments for institutions.

Highlights

  • New Customer Financial Profiler Notebook: Introduces Customer_financial_profiler.ipynb, a new Colab notebook designed for comprehensive customer financial analysis.
  • AML Screening Integration: The notebook incorporates a function for Anti-Money Laundering (AML) screening, utilizing Google Search to check for customer involvement in illegal activities.
  • Advanced Document Processing: Features robust PDF processing capabilities, including OCR (using nougat-ocr) to extract text from both text-based and scanned PDF financial documents.
  • Financial Data Extraction: Extracts and aggregates key financial information such as monthly income, movable assets (cash/stocks), and immovable assets (real estate) from processed documents, presenting the output in a structured JSON format.
  • README Update: The examples/README.md file has been updated to list and describe the new "Customer Financial Profiler" notebook, making it discoverable within the examples.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new example notebook, Customer_financial_profiler.ipynb, which demonstrates how to use the Gemini API for a financial use case: analyzing customer documents to create a financial profile. The notebook is well-structured and provides a practical example of using multimodal and grounding features.

My review includes several suggestions to improve the notebook's maintainability, robustness, and adherence to the repository's style guide. Key points include:

  • Removing hardcoded dependencies on external repositories.
  • Improving the robustness of JSON extraction by using the API's structured output feature.
  • Aligning the code with the style guide regarding import placement and helper function visibility.
  • Cleaning up unused code and variables.

I've also suggested an improvement to the README.md file for better formatting. Overall, this is a valuable addition to the cookbook, and with these changes, it will be even better.

Comment on lines +320 to +332
"source": [
"# Replace below code to load your customer/s financial documents\n",
"file_numbers = [1, 2, 3]\n",
"base_github_url = \"https://github.com/sharathrushi/generative-ai/blob/587bddd11aea294f9ae512e1a860465452449e87/Jesse%20Nathan_\"\n",
"\n",
"for num in file_numbers:\n",
" github_url = f\"{base_github_url}{num}.pdf\"\n",
" raw_github_url = github_url.replace(\"github.com\", \"raw.githubusercontent.com\").replace(\"/blob\", \"\")\n",
" file_name = raw_github_url.split('/')[-1].replace('%20', ' ') # Added .replace('%20', ' ')\n",
" print(f\"Downloading '{file_name}'...\")\n",
" !wget -O \"{file_name}\" \"{raw_github_url}\"\n",
" print(f\"Downloaded '{file_name}' to local storage.\\n\")"
]

critical

The notebook downloads sample files from a hardcoded URL pointing to a specific commit in a personal GitHub repository. This is not maintainable and makes the notebook difficult to reproduce if the external repository changes or becomes unavailable. According to best practices for example notebooks, dependencies like sample files should be hosted within the same repository to ensure they are self-contained.

Please add the PDF files to this repository (e.g., in a resources or testdata folder) and update the code to load them from a relative path.
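
A minimal sketch of what loading from a relative path might look like, assuming the sample PDFs are checked in under a folder next to the notebook. The function `find_customer_pdfs` and the folder name are hypothetical and only illustrate the idea; the file naming pattern follows the `Jesse Nathan_<n>.pdf` names in the quoted cell.

```python
from pathlib import Path


def find_customer_pdfs(resources_dir: str, customer_name: str) -> list[Path]:
    """Return the customer's PDFs from a folder checked into the repository."""
    folder = Path(resources_dir)
    # Match files such as "Jesse Nathan_1.pdf"; sort for a stable processing order.
    # Note: the name is used inside a glob pattern, so it should not contain
    # glob wildcards such as "*" or "[".
    return sorted(folder.glob(f"{customer_name}_*.pdf"))
```

With this approach the `wget` loop and the URL rewriting are no longer needed, and the notebook keeps working even if the personal repository is renamed or deleted.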

Comment on lines +693 to +708
"def verify_and_extract(text, target_name):\n",
" prompt = f\"\"\"\n",
" Analyze the following text from multiple sources regarding {target_name}.\n",
" 1. Verify if all sources document is primarily about {target_name}. (Yes/No)\n",
" 2. Aggregate the values to retrieve the following information:\n",
" - Monthly Income\n",
" - Immovable Assets (Real Estate)\n",
" - Movable Assets (Cash/Stocks)\n",
" 3. Format as JSON.\n",
" Text: {text[:10000]} # Truncate for token limits\n",
" \"\"\"\n",
" response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=prompt\n",
" )\n",
" return response.text"

high

The function verify_and_extract asks the model for a JSON output but relies on fragile regex parsing of a markdown block later in the code (line 863). A more robust approach is to use the Gemini API's structured output feature.

You can specify response_mime_type="application/json" and provide a schema so that the model returns a valid JSON object directly. The notebook already defines a json_schema variable in a later cell that could serve as the starting point, although it is currently a string of placeholder values and would first need to be rewritten as an actual schema. Consider moving the schema definition before this function and passing it to generate_content through the config argument (generation_config in the older SDK).
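
As a rough sketch of that approach, assuming the google-genai SDK accepts a plain dict for `config` with `response_mime_type` and `response_schema` keys: `PROFILE_SCHEMA` below is a hypothetical, trimmed-down schema (the notebook's `json_schema` has many more fields), and `client` and `model_id` are passed in explicitly here only to keep the example self-contained.

```python
import json

# Hypothetical, reduced schema for illustration; extend it to cover
# immovable and movable assets as needed.
PROFILE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "all_sources_primarily_about_target_name": {"type": "BOOLEAN"},
        "monthly_income": {
            "type": "OBJECT",
            "properties": {
                "value": {"type": "NUMBER"},
                "currency": {"type": "STRING"},
            },
            "required": ["value", "currency"],
        },
    },
    "required": ["all_sources_primarily_about_target_name", "monthly_income"],
}


def verify_and_extract(client, model_id, text, target_name):
    """Request schema-constrained JSON instead of parsing a markdown block."""
    prompt = (
        f"Analyze the following text from multiple sources regarding {target_name}. "
        "Verify that the sources are primarily about this person and aggregate "
        f"the monthly income.\n\nText: {text[:10000]}"  # truncated as in the original
    )
    response = client.models.generate_content(
        model=model_id,
        contents=prompt,
        config={
            "response_mime_type": "application/json",
            "response_schema": PROFILE_SCHEMA,
        },
    )
    # The reply should already be a JSON document, so no regex is needed.
    return json.loads(response.text)
```

Returning a parsed dict rather than raw text also means downstream cells can read fields such as `profile["monthly_income"]["value"]` directly.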

"source": [
"# This step might ask you to restart the session for the installed packages to be reflected\n",
"\n",
"%pip install -U -q pymupdf nougat-ocr tools \"albumentations==1.3.1\""

medium

The pip install command includes a package named tools. This is a very generic name, and the package does not appear to be used anywhere in the notebook. Installing unnecessary packages should be avoided as it can lead to unexpected conflicts and increases the setup time.

%pip install -U -q pymupdf nougat-ocr "albumentations==1.3.1"

Comment on lines +358 to +366
"source": [
"import io\n",
"import os\n",
"import json\n",
"import re\n",
"import pymupdf as fitz\n",
"from PIL import Image\n",
"from transformers import AutoProcessor, AutoModelForVision2Seq"
]

medium

This cell imports multiple modules at once. According to the repository style guide, imports should be placed right before they are first used to improve readability and avoid large, monolithic import cells. This makes it easier for readers to understand where each module is being utilized.

References
  1. The style guide states that imports should be placed when they are first used, and to avoid having a big 'import' cell at the beginning. (link)
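
For illustration, the layout could look roughly like the sketch below, where each import sits in the cell that first needs it. The cell boundaries are marked with comments, and the values of `person_of_interest` and `sample_reply` are made up for the example.

```python
# Cell: list the customer's files (os is imported where it is first needed)
import os

person_of_interest = "Jesse Nathan"  # hypothetical value for illustration
person_of_interest_files = [
    f for f in os.listdir(".") if f.startswith(person_of_interest)
]

# Cell: parse the model's JSON reply (json is imported only at this point)
import json

sample_reply = '{"monthly_income": {"value": 5000, "currency": "USD"}}'  # made-up data
profile = json.loads(sample_reply)
```

The same pattern would apply to `pymupdf`, `PIL`, and `transformers`: each import moves to the cell that opens the PDF, builds the image, or loads the OCR model.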

Comment on lines +513 to +521
"source": [
"all_files = os.listdir('.')\n",
"person_of_interest_files = [f for f in all_files if f.startswith(person_of_interest)]\n",
"results = []\n",
"model_id = \"facebook/nougat-small\"\n",
"local_model_dir = \"./nougat_local_model\"\n",
"processor = AutoProcessor.from_pretrained(model_id)\n",
"model_base = AutoModelForVision2Seq.from_pretrained(model_id)"
]

medium

This cell has a few issues that affect code clarity:

  1. The variables results and local_model_dir are initialized but never used. Unused code should be removed.
  2. The variable name model_id shadows the MODEL_ID variable defined earlier for the Gemini model. This can be confusing for readers.

Renaming the local model_id and removing unused variables will make the code cleaner and easier to understand.

all_files = os.listdir('.')
person_of_interest_files = [f for f in all_files if f.startswith(person_of_interest)]
nougat_model_id = "facebook/nougat-small"
processor = AutoProcessor.from_pretrained(nougat_model_id)
model_base = AutoModelForVision2Seq.from_pretrained(nougat_model_id)

},
"outputs": [],
"source": [
"def add_citations(response):\n",

medium

According to the repository style guide, helper functions should be in a collapsible cell to keep the notebook clean. Please add # @title Helper function to add citations as the first line of this code cell to hide it by default.

References
  1. The style guide recommends hiding necessary but uninteresting code, like helper functions, in a toggleable code cell by adding # @title as the first line. (link)
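
A short sketch of such a cell is shown below. The `# @title` comment on the first line is what Colab uses to collapse the cell; the function body is a simplified stand-in with a hypothetical signature, not the notebook's actual `add_citations(response)` implementation.

```python
# @title Helper function to add citations
def add_citations(text, sources):
    """Append a numbered source list to the text (illustrative stand-in)."""
    if not sources:
        return text
    refs = "\n".join(f"[{i}] {url}" for i, url in enumerate(sources, start=1))
    return f"{text}\n\nSources:\n{refs}"
```

Outside Colab the first line is an ordinary comment, so the cell still runs unchanged in other Jupyter environments.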

Comment on lines +718 to +759
"source": [
"json_schema = \"\"\"\n",
" {\n",
" \"all_sources_primarily_about_target_name\": \"Yes/No\",\n",
" \"aggregated_values\": {\n",
" \"monthly_income\": {\n",
" \"value\": \"number\",\n",
" \"currency\": \"string\",\n",
" \"period\": \"string\"\n",
" },\n",
" \"immovable_assets\": [\n",
" {\n",
" \"description\": \"string\",\n",
" \"value\": \"number\",\n",
" \"currency\": \"string\",\n",
" \"type\": \"string\",\n",
" \"valuation_basis\": \"string\",\n",
" \"age_of_property_years\": \"number\"\n",
" }\n",
" ],\n",
" \"movable_assets\": {\n",
" \"stocks\": [\n",
" {\n",
" \"company_name\": \"string\",\n",
" \"number_of_shares\": \"number\",\n",
" \"par_value_per_share\": \"number\",\n",
" \"currency_per_share\": \"string\",\n",
" \"total_value\": \"number\",\n",
" \"total_value_currency\": \"string\",\n",
" \"owner\": \"string\"\n",
" }\n",
" ],\n",
" \"cash\": {\n",
" \"value\": \"number or 0\",\n",
" \"currency\": \"string or null\",\n",
" \"note\": \"string\"\n",
" }\n",
" }\n",
" }\n",
" }\n",
" \"\"\""
]

medium

The json_schema variable is defined in a code cell but is never used. If it's intended for documentation, it should be in a markdown cell. If it's for use with the model, it should be passed to the API to enforce structured output (as mentioned in another comment). Leaving it as an unused variable in a code cell is confusing for the reader.

| [Entity extraction](./Entity_Extraction.ipynb) | Use Gemini API to speed up some of your tasks, such as searching through text to extract needed information. Entity extraction with a Gemini model is a simple query, and you can ask it to retrieve its answer in the form that you prefer. | Embeddings | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Entity_Extraction.ipynb) |
| [Google I/O 2025 Live coding session](./Google_IO2025_Live_Coding.ipynb) | Play with the notebook used during the Google I/O 2025 live coding session delivered by the Google DeepMind DevRel team. Work with the Gemini API SDK, know and practice with the GenMedia models, the thinking capable models, start using the Gemini API tools and more! | Gemini API and its models and features | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Google_IO2025_Live_Coding.ipynb) |

| [Customer Financial Profiler](./Customer_financial_profiler.ipynb) | demonstrates estimating customer financial status Given a customer's financial documents such as payslips, rental agreements, house valuation, shares.Estimating customer monthly income, movable assets, immovable assets.This is helpful for financial institutions to check for loan eligibility, estimating customer's value | Gemini API and its models and features | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Customer_financial_profiler.ipynb) |

medium

The description for the new notebook is a single long line, which makes the table difficult to read and breaks the table formatting. Please wrap the text and fix the minor grammatical issues for better readability.

Suggested change
| [Customer Financial Profiler](./Customer_financial_profiler.ipynb) | This notebook demonstrates how to estimate a customer's financial status from documents like payslips and rental agreements. It's a useful example for financial institutions assessing loan eligibility. | Gemini API and its models and features | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Customer_financial_profiler.ipynb) |
