This repository was archived by the owner on Dec 30, 2024. It is now read-only.

Commit ce09a65

Update to version v1.5.0
1 parent 5dc54c3 commit ce09a65

289 files changed, with 38465 additions and 117292 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ node_modules
 cdk.out

 !**/cdk-solution-helper/index.js
+!get-cdk-version.js

 deployment/global-s3-assets
 deployment/regional-s3-assets

CHANGELOG.md

Lines changed: 18 additions & 0 deletions
@@ -5,6 +5,24 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.5.0] - 2021-07-22
+
+### Added
+
+- Ingest RSS feeds from more than 3,000 news websites across the world
+
+### Updated
+
+- AWS CDK version to 1.110.1
+- AWS SDK version to 2.945.0
+- Updated Nodejs Lambda runtimes to use Nodejs 14.x
+- Updated Amazon QuickSight analysis and dashboard to reflect the new ingestion source
+- Updated AWS StepFunction workflows to handle parallel ingestion (tweets from Twitter and RSS feeds from News websites)
+
+### Fixed
+
+- Truncated tweets through merging [GitHub pull request #26](https://github.com/awslabs/discovering-hot-topics-using-machine-learning/pull/26)
+
 ## [1.4.0] - 2021-02-04

 ### Added

NOTICE.txt

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ crhelper - Apache-2.0
 jest - MIT license
 momentjs - MIT license
 moto - Apache-2.0
+newscatcher - MIT license
 nock - MIT license
 node - MIT license
 pytest-cov - MIT license

README.md

Lines changed: 26 additions & 2 deletions
@@ -2,7 +2,7 @@

 The Discovering Hot Topics Using Machine Learning solution helps you identify the most dominant topics associated with your products, policies, events, and brands. Implementing this solution helps you react quickly to new growth opportunities, address negative brand associations, and deliver higher levels of customer satisfaction.

-The solution uses machine learning algorithms to automate digital asset (text and image) ingestion and perform near real-time topic modeling, sentiment analysis, and image detection. The solution then visualizes these large-scale customer analyses using an Amazon QuickSight dashboard. This guide provides step-by-step instructions to building a dashboard that provides you with the context and insights necessary to identify trends that help or harm you brand.
+The solution automates digital asset (text and image) ingestion from Twitter and RSS news feeds to provide near-real-time inferences using machine learning algorithms through Amazon Comprehend, Amazon Translate, and Amazon Rekognition to perform topic modeling, sentiment analysis, entity and key phrase detection, and detect any unsafe images. The solution then visualizes these large-scale customer analyses using an Amazon QuickSight dashboard. This guide provides step-by-step instructions for deploying this solution including a pre-built dashboard that provides you with the context and insights necessary to identify trends that help or harm your brand.

 The solution performs the following key features:

@@ -11,6 +11,8 @@ The solution performs the following key features:
 - **Determines if images associated with your brand contain unsafe content**: detects unsafe and negative imagery in content
 - **Helps customers identify insights in near real-time**: you can use a visualization dashboard to better understand context, threats, and opportunities almost instantly

+This solution deploys an AWS CloudFormation template that supports both Twitter and RSS feeds as data source options for ingestion, but the solution can be customized to aggregate other social media platforms and internal enterprise systems.
+
 For a detailed solution deployment guide, refer to [Discovering Hot Topics using Machine Learning](https://aws.amazon.com/solutions/implementations/discovering-hot-topics-using-machine-learning)

 ## On this Page
@@ -52,6 +54,7 @@ After you deploy the solution, use the included Amazon QuickSight dashboard to v
 - aws-lambda-dynamodb
 - aws-lambda-s3
 - aws-lambda-step-function
+- aws-sqs-lambda

 ## Deployment

@@ -64,26 +67,37 @@ The solution is deployed using a CloudFormation template with a lambda backed cu
 ```
 ├── deployment [folder containing build scripts]
 │   ├── cdk-solution-helper [A helper function to help deploy lambda function code through S3 buckets]
+│   ├── build-s3-dist.sh [Build script to build the solution]
 └── source [source code containing CDK App and lambda functions]
     ├── bin [entrypoint of the CDK application]
     ├── lambda [folder containing source code the lambda functions]
-    │   ├── firehose-text-proxy [lambda function to write text analysis output to Amazon Kinesis Firehose]
+    │   ├── capture_news_feed [lambda function to ingest news feeds]
     │   ├── firehose_topic_proxy [lambda function to write topic analysis output to Amazon Kinesis Firehose]
+    │   ├── firehose-text-proxy [lambda function to write text analysis output to Amazon Kinesis Firehose]
     │   ├── ingestion-consumer [lambda function that consumes messages from Amazon Kinesis Data Stream]
     │   ├── ingestion-producer [lambda function that makes Twitter API call and pushes data to Amazon Kinesis Data Stream]
     │   ├── integration [lambda function that publishes inference outputs to Amazon Events Bridge]
+    │   ├── layers [lambda layer function library]
+    │   │   ├── aws-nodesk-custom-config
+    │   ├── quicksight-custom-resources [lambda function to create Amazon QuickSight resources, example: data source, data sets, analysis and dashboards]
+    │   ├── shared [lambda layer function library (specific to python lambda runtimes)]
+    │   ├── solution_helper [lambda function that allows capturing metrics for this solution]
     │   ├── storage-firehose-processor [lambda function that writes data to S3 buckets to build a relational model]
     │   ├── wf-analyze-text [lambda function to detect sentiments, key phrases and entities using Amazon Comprehend]
     │   ├── wf-check-topic-model [lambda function to check status of topic modeling jobs on Amazon Comprehend]
+    │   ├── wf-detect-language [lambda function to detect language of ingested text content using Amazon Comprehend]
     │   ├── wf-detect-moderation-labels [lambda function to detect content moderation using Amazon Rekognition]
     │   ├── wf-extract-text-in-image [lambda function to extract text content from images using Amazon Rekognition]
     │   ├── wf-publish-text-inference [lambda function to publish Amazon Comprehend inferences]
     │   ├── wf-submit-topic-model [lambda function to submit topic modeling job]
     │   ├── wf-translate-text [lambda function to translate non-english text using Amazon Translate]
     │   └── wf_publish_topic_model [lambda function to publish topic modeling inferences from Amazon Comprehend]
     ├── lib
+    │   ├── aspects [CDK Aspects definitions to inject attributes during the prepare phase]
+    │   ├── awsnodejs-lambda-layer [Lambda layer construct for lambda functions that run on Nodejs runtime]
     │   ├── ingestion [CDK constructs for data ingestion]
     │   ├── integration [CDK constructs for Amazon Events Bridge]
+    │   ├── quicksight-custom-resources [CDK construct that invokes custom resources to create Amazon QuickSight resources]
     │   ├── storage [CDK constructs that define storage of the inference events]
     │   ├── text-analysis-workflow [CDK constructs for text analysis of ingested data]
     │   ├── topic-analysis-workflow [CDK constructs for topic visualization of ingested data]
@@ -136,13 +150,23 @@ $CF_TEMPLATE_BUCKET_NAME - The name of the S3 bucket where the CloudFormation te
 $QS_TEMPLATE_ACCOUNT - The account from which the Amazon QuickSight templates should be sourced for Amazon QuickSight Analysis and Dashboard creation
 ```

+- When creating and using buckets it is recommended to:
+
+  - Use randomized names or uuid as part of your bucket naming strategy.
+  - Ensure buckets are not public.
+  - Verify bucket ownership prior to uploading templates or code artifacts.
+
 - Deploy the distributable to an Amazon S3 bucket in your account. _Note:_ you must have the AWS Command Line Interface installed.

 ```
 aws s3 cp ./global-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
 aws s3 cp ./regional-s3-assets/ s3://my-bucket-name-<aws_region>/discovering-hot-topics-using-machine-learning/<my-version>/ --recursive --acl bucket-owner-full-control --profile aws-cred-profile-name
 ```

+## Collection of operational metrics
+
+This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. For more information, including how to disable this capability, please see the [implementation guide](https://docs.aws.amazon.com/solutions/latest/discovering-hot-topics-using-machine-learning/operational-metrics.html).
+
 ---

 Copyright 2020-2021 Amazon.com, Inc. or its affiliates. All Rights Reserved.
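
Review note on the bucket guidance added to the README: the first recommendation (randomized names) can be illustrated with a small Node.js sketch. The helper names `randomizedBucketBase` and `regionalBucketName`, and the 8-byte suffix length, are assumptions made for illustration and are not part of this commit. Since the `aws s3 cp` commands target `my-bucket-name-<aws_region>`, the random part goes into the base name and the region is appended last.

```javascript
const crypto = require('crypto');

// Hypothetical helper (not part of the solution): builds a bucket base name
// with a random hex suffix so the final name is hard to guess.
function randomizedBucketBase(prefix, randomBytes = 8) {
    const suffix = crypto.randomBytes(randomBytes).toString('hex');
    return `${prefix}-${suffix}`;
}

// Hypothetical helper: appends the region, matching the "<base>-<region>"
// convention used by the copy commands in the README.
function regionalBucketName(base, region) {
    const name = `${base}-${region}`;
    // S3 bucket names are 3-63 characters of lowercase letters, digits, dots,
    // and hyphens, and must begin and end with a letter or digit.
    if (!/^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/.test(name)) {
        throw new Error(`Invalid S3 bucket name: ${name}`);
    }
    return name;
}

const base = randomizedBucketBase('my-bucket-name');
console.log(regionalBucketName(base, 'us-east-1'));
```

The sketch only covers naming. Blocking public access and verifying bucket ownership before upload still have to be handled separately, for example through the bucket's own configuration.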

deployment/build-s3-dist.sh

Lines changed: 9 additions & 4 deletions
@@ -25,9 +25,6 @@
 [ "$DEBUG" == 'true' ] && set -x
 set -e

-# Important: CDK global version number
-cdk_version=1.78.0
-
 # Check to see if input has been provided:
 if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ] || [ -z "$4" ] || [ -z "$5" ] || [ -z "$6" ]; then
     echo "Please provide all required parameters for the build script"
@@ -100,10 +97,18 @@ echo "[Synth] CDK Project"
 echo "------------------------------------------------------------------------------"
 cd $source_dir

+# Important: CDK global version number
+cdk_version=$(node ../deployment/get-cdk-version.js) # Note: grabs from node_modules/aws-cdk/package.json
+
+echo "------------------------------------------------------------------------------"
+echo "[Install] Installing CDK $cdk_version"
+echo "------------------------------------------------------------------------------"
+
 npm install aws-cdk@$cdk_version

 ## Option to suppress the Override Warning messages while synthesizing using CDK
-# export overrideWarningsEnabled=false
+export overrideWarningsEnabled=false
+echo "setting override warning to $overrideWarningsEnabled"

 node_modules/aws-cdk/bin/cdk synth --output=$staging_dist_dir
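
Review note: `get-cdk-version.js` itself is not shown in this excerpt, so the following is only a sketch of what such a script could look like, based on the inline comment that the version is taken from `node_modules/aws-cdk/package.json`. The function name `readCdkVersion` and its `projectDir` parameter are assumptions, and the sketch presumes that dependencies were already installed before this step of the build.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical reconstruction (the real script is not part of this excerpt):
// read the "version" field of the locally installed aws-cdk package.
function readCdkVersion(projectDir) {
    const pkgPath = path.join(projectDir, 'node_modules', 'aws-cdk', 'package.json');
    const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf8'));
    return pkg.version;
}

// A script used as `cdk_version=$(node ...)` would print only the version,
// for example:
//   console.log(readCdkVersion(process.cwd()));
```

Printing nothing but the version string matters here because the shell captures the script's entire standard output into `cdk_version`.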

deployment/cdk-solution-helper/index.js

Lines changed: 21 additions & 14 deletions
@@ -17,6 +17,9 @@ const fs = require('fs');
 // Paths
 const global_s3_assets = '../global-s3-assets';

+// This regular expression also takes into account lambda functions defined in nested stacks
+const _regex = /[\w]*AssetParameters/g;
+
 // For each template in global_s3_assets ...
 fs.readdirSync(global_s3_assets).forEach(file => {
     // Import and parse template file
@@ -26,27 +29,34 @@ fs.readdirSync(global_s3_assets).forEach(file => {
     // Clean-up Lambda function code dependencies
     const resources = (template.Resources) ? template.Resources : {};
     const lambdaFunctions = Object.keys(resources).filter(function (key) {
-        return resources[key].Type === 'AWS::Lambda::Function';
+        return (resources[key].Type === 'AWS::Lambda::Function' || resources[key].Type === 'AWS::Lambda::LayerVersion');
     });

     lambdaFunctions.forEach(function (f) {
         const fn = template.Resources[f];
-        if (fn.Properties.Code.hasOwnProperty('S3Bucket')) {
+        let prop;
+        if (fn.Properties.hasOwnProperty('Code')) {
+            prop = fn.Properties.Code;
+        } else if (fn.Properties.hasOwnProperty('Content')) {
+            prop = fn.Properties.Content;
+        }
+
+        if (prop.hasOwnProperty('S3Bucket')){
             // Set the S3 key reference
-            let artifactHash = Object.assign(fn.Properties.Code.S3Bucket.Ref);
+            let artifactHash = Object.assign(prop.S3Bucket.Ref);
             // console.debug(`Old artificatHash is ${artifactHash}`);
-            artifactHash = artifactHash.replace(/[\w]*AssetParameters/g, '');
+            artifactHash = artifactHash.replace(_regex, '');
             artifactHash = artifactHash.substring(0, artifactHash.indexOf('S3Bucket'));
             // console.debug(`New artificatHash is ${artifactHash}`);
             const assetPath = `asset${artifactHash}`;
-            fn.Properties.Code.S3Key = `%%SOLUTION_NAME%%/%%VERSION%%/${assetPath}.zip`;
+            prop.S3Key = `%%SOLUTION_NAME%%/%%VERSION%%/${assetPath}.zip`;

             // Set the S3 bucket reference
-            fn.Properties.Code.S3Bucket = {
+            prop.S3Bucket = {
                 'Fn::Sub': '%%BUCKET_NAME%%-${AWS::Region}'
             };
         } else {
-            // console.debug(`Here is the fn dump ${JSON.stringify(fn)}`);
+            console.warn(`No S3Bucket Property found for ${JSON.stringify(prop)}`);
         }
     });
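
Review note: the hunk above now handles both `AWS::Lambda::Function` (property `Code`) and `AWS::Lambda::LayerVersion` (property `Content`) and derives the asset key from the `S3Bucket` parameter reference. The sketch below isolates that string handling so it can be tried on its own. The helper names `codeLocation` and `assetKeyFromRef` and the sample parameter name are illustrative assumptions; only the regular expression and the `replace`/`substring` steps are taken from the diff.

```javascript
// Same pattern as `_regex` in the diff: strips everything up to and including
// "AssetParameters", which also removes a nested-stack prefix.
const ASSET_PARAM_REGEX = /[\w]*AssetParameters/g;

// Hypothetical helper: returns the S3 location object of a Lambda function
// (`Code`) or a layer version (`Content`), or undefined if neither is set.
function codeLocation(resource) {
    const props = resource.Properties || {};
    return props.Code || props.Content;
}

// Hypothetical helper mirroring the string handling in the diff.
function assetKeyFromRef(ref) {
    const stripped = ref.replace(ASSET_PARAM_REGEX, '');
    const hash = stripped.substring(0, stripped.indexOf('S3Bucket'));
    return `asset${hash}.zip`;
}

// Illustrative parameter name; generated names carry a much longer hash.
const layer = {
    Type: 'AWS::Lambda::LayerVersion',
    Properties: {
        Content: { S3Bucket: { Ref: 'NestedStackAssetParametersabc123S3BucketDEF456' } }
    }
};

const location = codeLocation(layer);
if (location && location.S3Bucket) {
    console.log(assetKeyFromRef(location.S3Bucket.Ref)); // prints "assetabc123.zip"
}
```

Unlike the diff, this sketch checks that a location object exists before reading `S3Bucket`, which avoids a `TypeError` on a resource that has neither `Code` nor `Content`. Sharing one `/g` regular expression between `replace` and `search`, as the diff does, is safe because neither call depends on a stale `lastIndex`.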

@@ -61,19 +71,19 @@ fs.readdirSync(global_s3_assets).forEach(file => {
                 'Fn::Join': [
                     '',
                     [
-                        'https://s3.',
+                        'https://%%TEMPLATE_BUCKET_NAME%%.s3.',
                         {
                             'Ref' : 'AWS::URLSuffix'
                         },
                         '/',
-                        `%%TEMPLATE_BUCKET_NAME%%/%%SOLUTION_NAME%%/%%VERSION%%/${fn.Metadata.nestedStackFileName}`
+                        `%%SOLUTION_NAME%%/%%VERSION%%/${fn.Metadata.nestedStackFileName}`
                     ]
                 ]
             };

             const params = fn.Properties.Parameters ? fn.Properties.Parameters : {};
             const nestedStackParameters = Object.keys(params).filter(function(key) {
-                if (key.search(/[\w]*AssetParameters/g) > -1) {
+                if (key.search(_regex) > -1) {
                     return true;
                 }
                 return false;
@@ -87,12 +97,9 @@ fs.readdirSync(global_s3_assets).forEach(file => {
     // Clean-up parameters section
     const parameters = (template.Parameters) ? template.Parameters : {};
     const assetParameters = Object.keys(parameters).filter(function (key) {
-        console.debug(`key to analyze ${key}`);
-        if (key.search(/[\w]*AssetParameters/g) > -1) {
-            // console.debug('Pattern match');
+        if (key.search(_regex) > -1) {
             return true;
         }
-        // console.debug('Pattern did not match');
         return false;
     });
     assetParameters.forEach(function (a) {
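
Review note on the nested-stack hunk above: the template URL moves from a path-style address (`https://s3.<suffix>/<bucket>/...`) to a virtual-hosted-style address (`https://<bucket>.s3.<suffix>/...`). The sketch below rebuilds the new `Fn::Join` structure and resolves it locally to show the resulting URL shape. The helper names, the sample bucket and file names, and the local `resolveJoin` function are assumptions for illustration. In the actual script the `%%...%%` placeholders are kept as literals.

```javascript
// Hypothetical helper: builds the same Fn::Join shape as the updated script,
// with the bucket name as the host prefix.
function nestedTemplateUrl(templateBucket, solutionName, version, fileName) {
    return {
        'Fn::Join': [
            '',
            [
                `https://${templateBucket}.s3.`,
                { Ref: 'AWS::URLSuffix' },
                '/',
                `${solutionName}/${version}/${fileName}`
            ]
        ]
    };
}

// Local stand-in for CloudFormation's resolution: every non-string part is
// assumed to be the AWS::URLSuffix reference.
function resolveJoin(intrinsic, urlSuffix) {
    const [separator, parts] = intrinsic['Fn::Join'];
    return parts.map(part => (typeof part === 'string' ? part : urlSuffix)).join(separator);
}

const url = nestedTemplateUrl(
    'my-template-bucket',
    'discovering-hot-topics-using-machine-learning',
    'v1.5.0',
    'nested.template'
);
console.log(resolveJoin(url, 'amazonaws.com'));
```

With `AWS::URLSuffix` resolving to `amazonaws.com`, the sample resolves to a URL whose host is `my-template-bucket.s3.amazonaws.com` and whose path starts with the solution name.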

source/bin/discovering-hot-topics-app.ts

Lines changed: 5 additions & 4 deletions
@@ -14,11 +14,12 @@


 import * as cdk from '@aws-cdk/core';
+import { ApplytoLambda } from '../lib/aspects/apply-to-lambda';
 import { DiscoveringHotTopicsStack } from '../lib/discovering-hot-topics-stack';

 const app = new cdk.App();
-new DiscoveringHotTopicsStack(app, 'discovering-hot-topics-using-machine-learning', {
-    description: '(SO0122) - Discovering Hot Topics using Machine Learning. Version %%VERSION%%',
-    solutionID: 'SO0122',
-    solutionName: 'discovering-hot-topics-using-machine-learning'
+const dht = new DiscoveringHotTopicsStack(app, 'discovering-hot-topics-using-machine-learning', {
+    description: `(${app.node.tryGetContext('solution_id')}) - Discovering Hot Topics using Machine Learning. Version %%VERSION%%`
 });
+
+app.node.applyAspect(new ApplytoLambda(dht, 'CustomConfig'));

source/cdk.json

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 {
   "app": "npx ts-node bin/discovering-hot-topics-app.ts",
   "context": {
-    "quicksight_source_template_arn": "arn:aws:quicksight:us-east-1:%%TEMPLATE_ACCOUNT_ID%%:template/%%DIST_QUICKSIGHT_NAMESPACE%%_%%SOLUTION_NAME%%_%%DASHED_VERSION%%"
+    "quicksight_source_template_arn": "arn:aws:quicksight:us-east-1:%%TEMPLATE_ACCOUNT_ID%%:template/%%DIST_QUICKSIGHT_NAMESPACE%%_%%SOLUTION_NAME%%_%%DASHED_VERSION%%",
+    "solution_id": "SO0122",
+    "solution_name": "%%SOLUTION_NAME%%",
+    "solution_version": "%%VERSION%%"
   }
 }

source/images/architecture.png

69.6 KB

source/images/dashboard.png

-644 KB
