Customer financial profiler #1128
Summary of Changes
Hello @sharathrushi, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request delivers a new Colab notebook that automates the process of generating a financial profile for customers. It integrates AI capabilities to perform critical checks, such as identifying potential involvement in illicit activities through public information, and intelligently extracts structured financial data from diverse document formats, including scanned PDFs. This tool is designed to enhance the efficiency and accuracy of financial assessments for institutions.
Code Review
This pull request introduces a new example notebook, Customer_financial_profiler.ipynb, which demonstrates how to use the Gemini API for a financial use case: analyzing customer documents to create a financial profile. The notebook is well-structured and provides a practical example of using multimodal and grounding features.
My review includes several suggestions to improve the notebook's maintainability, robustness, and adherence to the repository's style guide. Key points include:
- Removing hardcoded dependencies on external repositories.
- Improving the robustness of JSON extraction by using the API's structured output feature.
- Aligning the code with the style guide regarding import placement and helper function visibility.
- Cleaning up unused code and variables.
I've also suggested an improvement to the README.md file for better formatting. Overall, this is a valuable addition to the cookbook, and with these changes, it will be even better.
"source": [
"# Replace below code to load your customer/s financial documents\n",
"file_numbers = [1, 2, 3]\n",
"base_github_url = \"https://github.com/sharathrushi/generative-ai/blob/587bddd11aea294f9ae512e1a860465452449e87/Jesse%20Nathan_\"\n",
"\n",
"for num in file_numbers:\n",
" github_url = f\"{base_github_url}{num}.pdf\"\n",
" raw_github_url = github_url.replace(\"github.com\", \"raw.githubusercontent.com\").replace(\"/blob\", \"\")\n",
" file_name = raw_github_url.split('/')[-1].replace('%20', ' ') # Added .replace('%20', ' ')\n",
" print(f\"Downloading '{file_name}'...\")\n",
" !wget -O \"{file_name}\" \"{raw_github_url}\"\n",
" print(f\"Downloaded '{file_name}' to local storage.\\n\")"
]
The notebook downloads sample files from a hardcoded URL pointing to a specific commit in a personal GitHub repository. This is not maintainable and makes the notebook difficult to reproduce if the external repository changes or becomes unavailable. According to best practices for example notebooks, dependencies like sample files should be hosted within the same repository to ensure they are self-contained.
Please add the PDF files to this repository (e.g., in a resources or testdata folder) and update the code to load them from a relative path.
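A minimal sketch of what loading the samples from a relative path could look like. The folder name resources and the helper name find_customer_pdfs are assumptions made for illustration and are not part of the PR; the file-name prefix mirrors the sample files the notebook downloads.

```python
from pathlib import Path


def find_customer_pdfs(customer_name: str, resources_dir: str = "resources") -> list[Path]:
    """Return a customer's PDF files from a folder inside the repository.

    Files are matched by prefix, e.g. "Jesse Nathan_1.pdf" for the
    customer "Jesse Nathan". The folder name is a hypothetical choice.
    """
    folder = Path(resources_dir)
    if not folder.is_dir():
        raise FileNotFoundError(f"Sample folder not found: {folder}")
    # Sort for a stable, reproducible processing order.
    return sorted(folder.glob(f"{customer_name}_*.pdf"))
```

Because the files live next to the notebook, the cell no longer depends on a commit hash in an external repository or on wget being available.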
"def verify_and_extract(text, target_name):\n",
" prompt = f\"\"\"\n",
" Analyze the following text from multiple sources regarding {target_name}.\n",
" 1. Verify if all sources document is primarily about {target_name}. (Yes/No)\n",
" 2. Aggregate the values to retrieve the following information:\n",
" - Monthly Income\n",
" - Immovable Assets (Real Estate)\n",
" - Movable Assets (Cash/Stocks)\n",
" 3. Format as JSON.\n",
" Text: {text[:10000]} # Truncate for token limits\n",
" \"\"\"\n",
" response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=prompt\n",
" )\n",
" return response.text"
The function verify_and_extract asks the model for a JSON output but relies on fragile regex parsing of a markdown block later in the code (line 863). A more robust approach is to use the Gemini API's structured output feature.
You can specify response_mime_type="application/json" and provide a schema to ensure the model returns a valid JSON object directly. The notebook already defines a json_schema variable in a later cell that could be used for this. Consider moving the json_schema definition before this function and updating generate_content to pass it through its config argument.
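As a rough sketch of the structured-output approach, the function below passes a JSON MIME type and a schema to generate_content and parses the reply with json.loads. It assumes a google-genai style client (for example genai.Client()) whose generate_content accepts a config dictionary; the client is passed in as a parameter here, and RESPONSE_SCHEMA is a simplified, hypothetical schema rather than the notebook's full one.

```python
import json

# Simplified, hypothetical schema for illustration; the notebook's real
# schema would cover every field it needs.
RESPONSE_SCHEMA = {
    "type": "OBJECT",
    "properties": {
        "all_sources_primarily_about_target_name": {"type": "BOOLEAN"},
        "monthly_income": {"type": "NUMBER"},
        "currency": {"type": "STRING"},
    },
    "required": ["all_sources_primarily_about_target_name", "monthly_income"],
}


def verify_and_extract(client, model_id: str, text: str, target_name: str) -> dict:
    """Ask the model for JSON that follows RESPONSE_SCHEMA and parse it."""
    prompt = (
        f"Analyze the following text regarding {target_name}. "
        "Verify that every source is primarily about this person and "
        "report their monthly income.\n"
        f"Text: {text}"
    )
    response = client.models.generate_content(
        model=model_id,
        contents=prompt,
        config={
            "response_mime_type": "application/json",
            "response_schema": RESPONSE_SCHEMA,
        },
    )
    # With a JSON MIME type the reply is plain JSON, so no regex is needed.
    return json.loads(response.text)
```

Returning a parsed dict instead of raw text also means later cells do not have to search for a markdown code block in the response.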
"source": [
"# This step might ask you to restart the session for the installed packages to be reflected\n",
"\n",
"%pip install -U -q pymupdf nougat-ocr tools \"albumentations==1.3.1\""
The pip install command includes a package named tools. This is a very generic name, and the package does not appear to be used anywhere in the notebook. Installing unnecessary packages should be avoided as it can lead to unexpected conflicts and increases the setup time.
%pip install -U -q pymupdf nougat-ocr "albumentations==1.3.1"
"source": [
"import io\n",
"import os\n",
"import json\n",
"import re\n",
"import pymupdf as fitz\n",
"from PIL import Image\n",
"from transformers import AutoProcessor, AutoModelForVision2Seq"
]
This cell imports multiple modules at once. According to the repository style guide, imports should be placed right before they are first used to improve readability and avoid large, monolithic import cells. This makes it easier for readers to understand where each module is being utilized.
References
- The style guide states that imports should be placed when they are first used, and to avoid having a big 'import' cell at the beginning. (link)
"source": [
"all_files = os.listdir('.')\n",
"person_of_interest_files = [f for f in all_files if f.startswith(person_of_interest)]\n",
"results = []\n",
"model_id = \"facebook/nougat-small\"\n",
"local_model_dir = \"./nougat_local_model\"\n",
"processor = AutoProcessor.from_pretrained(model_id)\n",
"model_base = AutoModelForVision2Seq.from_pretrained(model_id)"
]
This cell has a few issues that affect code clarity:
- The variables results and local_model_dir are initialized but never used. Unused code should be removed.
- The variable name model_id shadows the MODEL_ID variable defined earlier for the Gemini model. This can be confusing for readers.
Renaming the local model_id and removing unused variables will make the code cleaner and easier to understand.
all_files = os.listdir('.')
person_of_interest_files = [f for f in all_files if f.startswith(person_of_interest)]
nougat_model_id = "facebook/nougat-small"
processor = AutoProcessor.from_pretrained(nougat_model_id)
model_base = AutoModelForVision2Seq.from_pretrained(nougat_model_id)
},
"outputs": [],
"source": [
"def add_citations(response):\n",
According to the repository style guide, helper functions should be in a collapsible cell to keep the notebook clean. Please add # @title Helper function to add citations as the first line of this code cell to hide it by default.
References
- The style guide recommends hiding necessary but uninteresting code, like helper functions, in a toggleable code cell by adding # @title as the first line. (link)
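For illustration, a collapsible helper cell could look like the sketch below. The body of add_citations is not shown in this excerpt, so this is a hypothetical stand-in rather than the PR's code: it assumes the response exposes grounding_supports and grounding_chunks under candidates[0].grounding_metadata, and it treats segment.end_index as a character offset into response.text.

```python
# @title Helper function to add citations


def add_citations(response) -> str:
    """Insert "[n](uri)" markers into the response text after each grounded segment."""
    text = response.text
    metadata = response.candidates[0].grounding_metadata
    supports = metadata.grounding_supports or []
    chunks = metadata.grounding_chunks or []

    # Insert from the end of the text backwards so earlier indices stay valid.
    for support in sorted(supports, key=lambda s: s.segment.end_index, reverse=True):
        links = [
            f"[{i + 1}]({chunks[i].web.uri})"
            for i in (support.grounding_chunk_indices or [])
            if i < len(chunks)
        ]
        if links:
            end = support.segment.end_index
            text = text[:end] + " " + ", ".join(links) + text[end:]
    return text
```

In Colab, the # @title line turns the cell into a form whose code can be collapsed, so readers see only the title unless they expand it.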
"source": [
"json_schema = \"\"\"\n",
" {\n",
" \"all_sources_primarily_about_target_name\": \"Yes/No\",\n",
" \"aggregated_values\": {\n",
" \"monthly_income\": {\n",
" \"value\": \"number\",\n",
" \"currency\": \"string\",\n",
" \"period\": \"string\"\n",
" },\n",
" \"immovable_assets\": [\n",
" {\n",
" \"description\": \"string\",\n",
" \"value\": \"number\",\n",
" \"currency\": \"string\",\n",
" \"type\": \"string\",\n",
" \"valuation_basis\": \"string\",\n",
" \"age_of_property_years\": \"number\"\n",
" }\n",
" ],\n",
" \"movable_assets\": {\n",
" \"stocks\": [\n",
" {\n",
" \"company_name\": \"string\",\n",
" \"number_of_shares\": \"number\",\n",
" \"par_value_per_share\": \"number\",\n",
" \"currency_per_share\": \"string\",\n",
" \"total_value\": \"number\",\n",
" \"total_value_currency\": \"string\",\n",
" \"owner\": \"string\"\n",
" }\n",
" ],\n",
" \"cash\": {\n",
" \"value\": \"number or 0\",\n",
" \"currency\": \"string or null\",\n",
" \"note\": \"string\"\n",
" }\n",
" }\n",
" }\n",
" }\n",
" \"\"\""
]
The json_schema variable is defined in a code cell but is never used. If it's intended for documentation, it should be in a markdown cell. If it's for use with the model, it should be passed to the API to enforce structured output (as mentioned in another comment). Leaving it as an unused variable in a code cell is confusing for the reader.
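Note that json_schema, as written, is a template of type names ("number", "string", "Yes/No") rather than a schema an API could enforce. One possible way to reuse it is to convert the template into a schema dictionary, as in the sketch below. The helper name template_to_schema, the trimmed TEMPLATE, and the uppercase type names (chosen to resemble the Gemini API's schema types) are assumptions for illustration.

```python
import json

# Trimmed copy of the template style used by json_schema in the notebook.
TEMPLATE = """
{
  "all_sources_primarily_about_target_name": "Yes/No",
  "aggregated_values": {
    "monthly_income": {"value": "number", "currency": "string"},
    "immovable_assets": [{"description": "string", "value": "number"}],
    "cash": {"value": "number or 0", "currency": "string or null"}
  }
}
"""


def template_to_schema(node):
    """Convert a type-name template into a schema dict (hypothetical helper)."""
    if isinstance(node, dict):
        return {
            "type": "OBJECT",
            "properties": {key: template_to_schema(value) for key, value in node.items()},
        }
    if isinstance(node, list):
        # The template lists one example element per array.
        return {"type": "ARRAY", "items": template_to_schema(node[0])}
    # Leaves are descriptive strings such as "number", "number or 0", "string".
    return {"type": "NUMBER" if str(node).startswith("number") else "STRING"}


schema = template_to_schema(json.loads(TEMPLATE))
```

The resulting dictionary could then be supplied as the response schema, while the original template remains readable documentation of the expected shape.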
| [Entity extraction](./Entity_Extraction.ipynb) | Use Gemini API to speed up some of your tasks, such as searching through text to extract needed information. Entity extraction with a Gemini model is a simple query, and you can ask it to retrieve its answer in the form that you prefer. | Embeddings | [](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Entity_Extraction.ipynb) |
| [Google I/O 2025 Live coding session](./Google_IO2025_Live_Coding.ipynb) | Play with the notebook used during the Google I/O 2025 live coding session delivered by the Google DeepMind DevRel team. Work with the Gemini API SDK, know and practice with the GenMedia models, the thinking capable models, start using the Gemini API tools and more! | Gemini API and its models and features | [](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Google_IO2025_Live_Coding.ipynb) |
| [Customer Financial Profiler](./Customer_financial_profiler.ipynb) | demonstrates estimating customer financial status Given a customer's financial documents such as payslips, rental agreements, house valuation, shares.Estimating customer monthly income, movable assets, immovable assets.This is helpful for financial institutions to check for loan eligibility, estimating customer's value | Gemini API and its models and features | [](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Customer_financial_profiler.ipynb) |
The description for the new notebook is a single long line, which makes the table difficult to read and breaks the table formatting. Please wrap the text and fix the minor grammatical issues for better readability.
| [Customer Financial Profiler](./Customer_financial_profiler.ipynb) | demonstrates estimating customer financial status Given a customer's financial documents such as payslips, rental agreements, house valuation, shares.Estimating customer monthly income, movable assets, immovable assets.This is helpful for financial institutions to check for loan eligibility, estimating customer's value | Gemini API and its models and features | [](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Customer_financial_profiler.ipynb) |
| [Customer Financial Profiler](./Customer_financial_profiler.ipynb) | This notebook demonstrates how to estimate a customer's financial status from documents like payslips and rental agreements. It's a useful example for financial institutions assessing loan eligibility. | Gemini API and its models and features | [](https://colab.research.google.com/github/google-gemini/cookbook/blob/main/examples/Customer_financial_profiler.ipynb) |
Given a customer and their financial documents, this notebook does the following:
This is helpful for financial institutions' loan-eligibility use cases.
Colab link