
Commit bbf7124

added credit karma financial insights for use cases (#35039)
* added credit_karma_financial_insights for use cases
* fix feedback
* minor fixes
* changed the name
* fixed the name
* updated the image
1 parent bb7f445 commit bbf7124


3 files changed

+135
-0
lines changed

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
---
title: "Credit Karma: Leveraging Apache Beam for Enhanced Financial Insights"
name: "Credit Karma: Beam for Financial Insights"
icon: /images/logos/powered-by/credit-karma.png
hasNav: true
category: study
cardTitle: "Credit Karma: Leveraging Apache Beam for Enhanced Financial Insights"
cardDescription: "With Apache Beam and Dataflow, Credit Karma achieved a 99% uptime for critical data pipelines, a significant jump from 80%. This reliability, coupled with faster development (1 engineer vs. 3 estimated), has been crucial for enabling real-time financial insights for our more than 140 million members."
authorName: "Raj kiran gupta Katakam"
authorPosition: "Staff Machine Learning Engineer @ Credit Karma"
authorImg: /images/case-study/credit_karma/raj_katakam.jpeg
publishDate: 2025-05-25T00:12:00+00:00
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- div with class case-study-opinion is displayed at the top left area of the case study page -->
<div class="case-study-opinion">
<div class="case-study-opinion-img">
<img src="/images/logos/powered-by/credit-karma.png"/>
</div>
<blockquote class="case-study-quote-block">
<p class="case-study-quote-text">
“Intuit Credit Karma is a financial management platform that aims to propel its more than 140 million members forward wherever they are on their financial journey by leveraging data and AI to connect the dots on their finances.”
</p>
<div class="case-study-quote-author">
<div class="case-study-quote-author-img">
<img src="/images/case-study/credit_karma/raj_katakam.jpeg">
</div>
<div class="case-study-quote-author-info">
<div class="case-study-quote-author-name">
Raj kiran gupta Katakam
</div>
<div class="case-study-quote-author-position">
Staff Machine Learning Engineer @ Credit Karma
</div>
</div>
</div>
</blockquote>
</div>

<!-- div with class case-study-post is the case study page main content -->
<div class="case-study-post">

# Credit Karma: Leveraging Apache Beam for Enhanced Financial Insights

## Background

Credit Karma is a personal finance technology company that powers financial progress for its more than 140 million members through personalized insights and recommendations. Credit Karma uses data and AI platforms to create complex machine learning models that provide real-time insights so its members can make more informed financial decisions. The company uses Apache Beam to transform raw data into intelligent insights to power machine learning models.

## Journey to Apache Beam

Credit Karma's previous system relied on in-house tools built with Akka Streams and Kubernetes. That system could no longer meet the company's needs. Specifically, it lacked the ability to:

* Intelligently partition real-time data streams into windows for analysis and transformation.
* Maintain state at low cost and complexity, which fraud detection on money transactions depends on.
* Reduce overall development time, so that solutions reach data science stakeholders faster.

Credit Karma chose Apache Beam to address these shortcomings and provide the missing capabilities.

## Use Cases

Credit Karma uses Apache Beam for a broad range of data processing requirements. The most important is real-time data transformation that feeds its machine learning models. Key applications include:

* preprocessing data and constructing graphs for live model scoring,
* large-scale ETL (Extract, Transform, Load) operations for analytics, and
* **real-time aggregation of features to deliver near-instantaneous insights to models**.

The real-time aggregation of features is particularly important for fraud prevention. Metrics such as average spending over time or transaction frequency can be strong indicators of fraudulent activity. Credit Karma relies on three Apache Beam capabilities to meet the stringent demands of real-time data processing:

* windowing, which analyzes data streams over sliding time windows;
* stateful processing, which maintains per-key state across data elements; and
* triggers, which control when aggregated results are emitted based on conditions in the data stream.

These demands include low-latency data ingestion, sub-10 ms end-to-end latency, and self-service feature creation for data scientists.

These requirements were difficult to meet with Credit Karma's previous infrastructure. Credit Karma adopted Apache Beam together with Google Cloud Dataflow, a fully managed service for running Apache Beam pipelines. This combination has simplified and accelerated the development of real-time features and drastically reduced time to market. As a result, Credit Karma can iterate quickly in critical areas such as fraud detection, ultimately improving the customer experience.
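
The windowed aggregates described above, such as average spending and transaction frequency over sliding time windows, can be illustrated with a small plain-Python model. This is a sketch, not Credit Karma's implementation, and it does not use the Beam SDK. It mimics how sliding windows assign each event to every overlapping window, then computes a per-member count and average for each window. The `Txn` type, the function names, and the one-hour window with a ten-minute period are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    member_id: str  # hypothetical field names, for illustration only
    amount: float
    ts: int  # event time, in seconds


def sliding_windows(ts, size, period):
    """Return every [start, end) window of length `size` that contains `ts`,
    with window starts aligned to multiples of `period` (offset 0)."""
    start = ts - ts % period  # latest window start at or before ts
    windows = []
    while start > ts - size:  # window [start, start + size) still contains ts
        windows.append((start, start + size))
        start -= period
    return sorted(windows)


def windowed_features(txns, size, period):
    """Per (member, window): transaction count and average amount."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t in txns:
        for w in sliding_windows(t.ts, size, period):
            key = (t.member_id, w)
            sums[key] += t.amount
            counts[key] += 1
    return {k: {"count": counts[k], "avg": sums[k] / counts[k]} for k in counts}


if __name__ == "__main__":
    txns = [Txn("m1", 20.0, 100), Txn("m1", 40.0, 700)]
    # 1-hour windows starting every 10 minutes
    features = windowed_features(txns, size=3600, period=600)
    print(features[("m1", (0, 3600))])  # {'count': 2, 'avg': 30.0}
```

In an actual Beam pipeline, window assignment would come from `beam.WindowInto(window.SlidingWindows(size, period))`, and the per-key average from a combiner such as `beam.combiners.Mean.PerKey()`. The hand-rolled loop above only makes the semantics visible.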

### A Use Case in Focus: Aggregated Features for Fraud Detection

In one use case, Credit Karma implemented an aggregation strategy that differs from traditional streaming jobs. Such jobs typically rely on either event-driven or timer-driven triggers, but this use case required both kinds of triggers to operate simultaneously within the same job:

* Event-driven triggers computed aggregated features and stored them in live databases read by the model scoring engines.
* Timer-driven triggers handled the expiration and removal of those features.
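
The following minimal plain-Python model shows one keyed job that handles both trigger paths. It is an illustrative sketch under stated assumptions, not the Beam API or Credit Karma's code:

* Each member's running total and count stand in for Beam per-key state.
* A dictionary stands in for the live feature database.
* Features are assumed to expire a fixed TTL after the member's last transaction.

Re-setting the expiry on every event mirrors how re-setting a Beam timer replaces its earlier firing time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MemberState:
    total: float = 0.0
    count: int = 0
    expiry: Optional[int] = None  # event-time timer; one per member, re-set on every event


class FeatureAggregator:
    """Hypothetical model: the event-driven path emits features, the timer-driven path expires them."""

    def __init__(self, ttl: int):
        self.ttl = ttl  # assumed fixed time-to-live after a member's last event
        self.state: Dict[str, MemberState] = {}
        self.feature_store: Dict[str, dict] = {}  # stand-in for the live scoring database

    def on_event(self, member_id: str, amount: float, ts: int) -> None:
        # Event-driven path: update per-key state and upsert the aggregated feature.
        st = self.state.setdefault(member_id, MemberState())
        st.total += amount
        st.count += 1
        st.expiry = ts + self.ttl  # replaces any earlier expiry for this member
        self.feature_store[member_id] = {
            "txn_count": st.count,
            "avg_amount": st.total / st.count,
        }

    def advance_watermark(self, watermark: int) -> List[str]:
        # Timer-driven path: fire every timer at or before the watermark,
        # then clear that member's state and remove its feature.
        expired = [m for m, st in self.state.items()
                   if st.expiry is not None and st.expiry <= watermark]
        for m in expired:
            del self.state[m]
            self.feature_store.pop(m, None)
        return expired


if __name__ == "__main__":
    agg = FeatureAggregator(ttl=3600)
    agg.on_event("m1", 50.0, ts=0)
    agg.on_event("m2", 10.0, ts=500)
    agg.on_event("m1", 150.0, ts=1000)
    print(agg.feature_store["m1"])  # {'txn_count': 2, 'avg_amount': 100.0}
    print(agg.advance_watermark(4100))  # ['m2']
```

In the Beam Python SDK, a comparable design would be a stateful `DoFn`. It would declare state with `ReadModifyWriteStateSpec` or `CombiningValueStateSpec`, plus an event-time timer declared with `TimerSpec(..., TimeDomain.WATERMARK)` and handled by an `@on_timer` method.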

In this pipeline, Apache Beam processes millions of transactions per day, primarily structured data, totaling about 100 GB per day. The streaming pipeline combines state with both event-based and timer-based triggers. It reads from BigQuery and Pub/Sub and writes to BigQuery, Spanner, Pub/Sub, and Firestore. Feature creation uses de-identified data.

<div class="post-scheme">
<a href="/images/case-study/credit_karma/credit_karma_financial_insights.png" target="_blank" title="Click to enlarge">
<img src="/images/case-study/credit_karma/credit_karma_financial_insights.png" alt="Diagram of Aggregated Features for Fraud Detection at Credit Karma">
</a>
</div>
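
For capacity planning, it helps to turn the pipeline's stated daily volume of about 100 GB into a sustained rate. The helper below is a back-of-the-envelope sketch. It assumes traffic is spread evenly across the day, although real traffic peaks higher. It also uses decimal units (1 GB = 10^9 bytes). The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(per_day: float) -> float:
    """Average per-second rate for a daily total, assuming uniform traffic."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    bytes_per_sec = average_rate(100e9)  # ~100 GB per day, from the case study
    print(f"{bytes_per_sec / 1e6:.2f} MB/s")  # 1.16 MB/s
```

The average alone understates what the pipeline must absorb at peak, so a headroom factor would normally be applied on top of it.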

### Advantages of Using Dataflow for This Use Case

* **Simple Setup**: Easy to implement within Credit Karma's environment because the company already uses GCP extensively.
* **High-Performing Streaming Engine**: Research suggests that Dataflow's streaming engine outperforms Spark Structured Streaming and rivals Apache Flink, the de facto open-source standard for streaming applications.
* **Non-Critical Path**: Aggregations are computed asynchronously, outside the transaction flow, so Dataflow is not on the critical path for completing transactions.
* **Minimal Maintenance**: As a Google-managed service, Dataflow requires little upkeep from the team.
* **Built-In Monitoring**: Prebuilt monitoring dashboards and alerting are available through Cloud Monitoring and Cloud Logging.
* **Integrated State Storage**: Dataflow's streaming engine uses Bigtable for state storage. This eliminates the need for external state management and the on-call burden that comes with it.
* **Easy Updates**: Established guidelines allow existing Dataflow jobs to be updated without data loss, as long as schemas remain backward compatible.
* **Straightforward Backfilling**: Computed data is written to BigQuery, and the pipeline can be lifted and replayed against any BigQuery table. This simplifies backfilling, although backfill speed has not been tested.
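
The "Easy Updates" advantage depends on schemas staying backward compatible, and that precondition can be checked before an update with a small schema diff. The sketch below is hypothetical and rests on these assumptions:

* A schema is represented as a mapping from field name to `(type, mode)`, loosely following BigQuery-style `REQUIRED`/`NULLABLE` modes.
* Added nullable fields and a relaxation from `REQUIRED` to `NULLABLE` are treated as safe.

These rules are illustrative only. They are not the official compatibility rules of Dataflow or BigQuery.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: field name -> (type, mode)
Schema = Dict[str, Tuple[str, str]]


def compatibility_problems(old: Schema, new: Schema) -> List[str]:
    """Return reasons `new` is not backward compatible with `old` (an empty list means compatible)."""
    problems = []
    for name, (old_type, old_mode) in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
            continue
        new_type, new_mode = new[name]
        if new_type != old_type:
            problems.append(f"type change on {name}: {old_type} -> {new_type}")
        # Relaxing REQUIRED to NULLABLE is allowed; any other mode change is not.
        if new_mode != old_mode and not (old_mode == "REQUIRED" and new_mode == "NULLABLE"):
            problems.append(f"mode change on {name}: {old_mode} -> {new_mode}")
    for name, (_, mode) in new.items():
        # Existing records cannot supply a value for a new REQUIRED field.
        if name not in old and mode == "REQUIRED":
            problems.append(f"new required field: {name}")
    return problems


if __name__ == "__main__":
    v1 = {"member_id": ("STRING", "REQUIRED"), "amount": ("FLOAT", "NULLABLE")}
    v2 = {**v1, "merchant": ("STRING", "NULLABLE")}  # adds an optional field
    v3 = {**v1, "amount": ("STRING", "NULLABLE")}  # changes a field's type
    print(compatibility_problems(v1, v2))  # []
    print(compatibility_problems(v1, v3))  # ['type change on amount: FLOAT -> STRING']
```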

## Future Work

To enhance the member experience and provide more relevant insights into members' financial journeys, we propose to adapt this approach for an existing Credit Karma application use case.

Behavioral interaction events arrive in high volume, so incoming events alone keep the computed features fresh, and periodic timer-driven recomputation is unnecessary. We therefore plan to reuse the same technology, without the timer, to calculate and deliver application interaction intelligence in real time. This will allow us to deliver personalized content and experiences to members, leading to greater engagement and improved financial outcomes.

## Results

By using Apache Beam, Credit Karma has achieved significant improvements, including:

* Fewer alerts and lighter on-call duties
* Machine learning on streaming data
* Integration with a third-party service that tokenizes client data
* Greater flexibility and pluggability
* Infrastructure cost: Although the solution is costly, it has proven reliable for this use case.
* Faster time to value: One engineer delivered a working solution in one quarter, compared with the original estimate of three team members and two quarters.

Credit Karma's tech stack includes DataflowRunner, Google Cloud Platform (GCP), and custom containers. Its teams program in Java and Python.

<!-- case_study_feedback adds feedback buttons -->
{{< case_study_feedback "CreditKarmaFinancialInsights" >}}

</div>
<div class="clear-nav"></div>

website/www/site/data/en/quotes.yaml

Lines changed: 5 additions & 0 deletions
@@ -81,6 +81,11 @@
   logoUrl: /images/logos/powered-by/accenture.png
   linkUrl: case-studies/accenture_baltics/index.html
   linkText: Learn more
+- text: With Apache Beam and Dataflow, Credit Karma achieved a 99% uptime for critical data pipelines, a significant jump from 80%. This reliability, coupled with faster development (1 engineer vs. 3 estimated), has been crucial for enabling real-time financial insights for our more than 140 million members.
+  icon: icons/quote-icon.svg
+  logoUrl: images/logos/powered-by/credit-karma.png
+  linkUrl: case-studies/creditkarmainsights/index.html
+  linkText: Learn more
 - text: Have a story to share? Your logo could be here.
   icon: icons/quote-icon.svg
   logoUrl: images/logos/powered-by/blank.jpg