
Commit 761f211

Merge branch 'feature/tca' into main
2 parents: 5ea550d + 4c75622

14 files changed: +131 −22 lines

pca-server/src/pca/pca-aws-file-drop-trigger.py

Lines changed: 8 additions & 1 deletion

@@ -1,3 +1,11 @@
+"""
+This python function is triggered when a new audio file is dropped into the S3 bucket that has
+been configured for audio ingestion. It will ensure that no Transcribe job already exists for this
+filename, and will then trigger the main Step Functions workflow to process this file.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import json
 import urllib.parse
 import boto3
@@ -7,7 +15,6 @@
 def lambda_handler(event, context):
     # Load our configuration
     cf.loadConfiguration()
-    print("S3 Event: " + str(event["Records"][0]))

     # Get the object from the event and validate it exists
     s3 = boto3.client("s3")
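As background for the trigger above: S3 event notifications deliver URL-encoded object keys, and the docstring says the handler must confirm no Transcribe job already exists before starting the workflow. A minimal sketch of both steps, assuming a duck-typed Transcribe client; the helper names are illustrative, not the repository's actual code:

```python
import urllib.parse

def job_name_from_s3_event(event):
    # S3 event notifications URL-encode the object key ('+' for spaces),
    # so decode it before deriving a job name from the filename
    key = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"])
    return key.split("/")[-1]

def job_already_exists(job_name, transcribe_client):
    # Transcribe raises an exception for an unknown job name, which we
    # treat as "safe to start a new workflow execution"
    try:
        transcribe_client.get_transcription_job(TranscriptionJobName=job_name)
        return True
    except Exception:
        return False
```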

pca-server/src/pca/pca-aws-sf-bulk-files-count.py

Lines changed: 11 additions & 7 deletions

@@ -1,15 +1,18 @@
+"""
+This python function is part of the bulk files workflow. The system will load the Bulk configuration values
+once, and re-use them throughout the run, so the config values at the start of the run will remain valid.
+There is no quick way to count the files in an S3 bucket, so rather than track what's left in the bucket
+we just care about having any left to process, and instead count how far we've gotten.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import pcaconfiguration as cf
 import copy
 import boto3

+
 def lambda_handler(event, context):
-    """
-    Entrypoint for bulk loading audio files. The system will load the Bulk configuration values
-    once, and re-use them throughout the run, so the config values at the start of the run will
-    remain valid. There is not quick way to count the files in an S3 bucket, so rather than track
-    what's left in the bucket we just care about having any left to process and instead count
-    how far we've gotten instead.
-    """

     # Get our params, looking them up if we haven't got them
     if "sourceBucket" in event:
@@ -46,6 +49,7 @@ def lambda_handler(event, context):
     # Return current event data
     return sfData

+
 if __name__ == "__main__":
     event = {}
     print(lambda_handler(event, ""))
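The "any left to process" shortcut described in the new docstring can be sketched as follows. This is a hypothetical helper with a duck-typed S3 client, not the file's actual code: listing with `MaxKeys=1` answers the yes/no question without paging through the whole bucket.

```python
def any_files_left(s3_client, bucket, prefix=""):
    # One object is enough to answer "is there anything left?", so cap
    # the listing at a single key instead of counting the whole bucket
    resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return resp.get("KeyCount", 0) > 0
```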

pca-server/src/pca/pca-aws-sf-bulk-move-files.py

Lines changed: 11 additions & 4 deletions

@@ -1,11 +1,17 @@
+"""
+This python function is part of the bulk files workflow. Based upon the queueSpace parameter, this will
+move up to that many files into the PCA audio bucket, but only up to a maximum number as specified by
+the dripRate - this ensures that we don't overload the system.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import copy
 import boto3

+
 def lambda_handler(event, context):
-    """
-    Based upon the queueSpace parameter, this will move up to that many file into the PCA audio bucket, but
-    only up to a maximum number as specified by the dripRate - this ensures that we don't overload they system
-    """
+
     # Load our event
     sfData = copy.deepcopy(event)
     filesLimit = sfData["filesLimit"]
@@ -42,6 +48,7 @@ def lambda_handler(event, context):
     sfData.pop("queueSpace", None)
     return sfData

+
 if __name__ == "__main__":
     event = {
         "sourceBucket": "pca-bulk-upload",
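The queueSpace/dripRate interaction the docstring describes reduces to a clamp. A one-function sketch under that reading (the function name is illustrative):

```python
def files_to_move(queue_space, drip_rate):
    # Fill the available queue space, but never exceed the per-cycle
    # drip rate, and never go negative
    return max(0, min(queue_space, drip_rate))
```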

pca-server/src/pca/pca-aws-sf-bulk-queue-space.py

Lines changed: 13 additions & 5 deletions

@@ -1,6 +1,16 @@
+"""
+This python function is part of the bulk files workflow. Checks the current state of the Transcribe job queue,
+taking into account running and queued jobs. It then returns the calculated head-space in the queue that the
+Bulk process is able to use. If any of the API calls to Transcribe or S3 get throttled then we say the queue
+is full this cycle and carry on.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import copy
 import boto3

+
 def countTranscribeJobsInState(status, client, filesLimit):
     """
     Queries Transcribe for the number of jobs with the given status. If there are more than 100
@@ -15,12 +25,9 @@ def countTranscribeJobsInState(status, client, filesLimit):

     return found

+
 def lambda_handler(event, context):
-    """
-    Checks the current state of the Transcribe job queue, taking into account running and queued jobs.
-    It then returns the calculated head-space in the queue that the Bulk process is able to use. If any
-    of the API calls to Transcribe or S3 get throttled then we say the queue is full this cycle and carry on
-    """
+
     # Load our event, but we no longer need "filesToMove"
     sfData = copy.deepcopy(event)
     filesLimit = sfData["filesLimit"]
@@ -41,6 +48,7 @@ def lambda_handler(event, context):
     sfData["queueSpace"] = max(0, (filesLimit - found))
     return sfData

+
 if __name__ == "__main__":
     event = {
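The head-space calculation and the "throttled means full" rule from the docstring can be sketched with an injected counting function standing in for the real Transcribe calls; the status strings and helper name are assumptions for illustration:

```python
def queue_head_space(files_limit, count_jobs_in_state):
    # count_jobs_in_state(status) -> number of Transcribe jobs in that
    # state; any exception (e.g. throttling) means we report the queue
    # as full this cycle and carry on, as the docstring describes
    try:
        found = count_jobs_in_state("IN_PROGRESS") + count_jobs_in_state("QUEUED")
    except Exception:
        return 0
    return max(0, files_limit - found)
```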

pca-server/src/pca/pca-aws-sf-get-detected-language.py

Lines changed: 9 additions & 0 deletions

@@ -1,3 +1,11 @@
+"""
+This python function is part of the main processing workflow. It picks out the result of a transcription job
+and extracts the language code. This is only used on jobs that were started on a short audio clip with the
+sole purpose of language identification.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 from urllib.parse import urlparse
 import boto3
 import copy
@@ -39,6 +47,7 @@ def lambda_handler(event, context):
     sfData["langCode"] = transcribeJobInfo["LanguageCode"]
     return sfData

+
 # Main entrypoint for testing
 if __name__ == "__main__":
     event = {
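The file imports `urlparse`, presumably because a completed job references its output via an S3 URI. A small sketch of how such a URI splits into bucket and key (the helper name is illustrative, not the file's actual code):

```python
from urllib.parse import urlparse

def bucket_and_key(s3_uri):
    # "s3://bucket/some/key.json" -> ("bucket", "some/key.json")
    parsed = urlparse(s3_uri)
    return parsed.netloc, parsed.path.lstrip("/")
```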

pca-server/src/pca/pca-aws-sf-language-detection.py

Lines changed: 8 additions & 0 deletions

@@ -1,3 +1,11 @@
+"""
+This python function is part of the main processing workflow. It will create a 30-second clip of our original
+audio file and submit it to standard Amazon Transcribe, on the understanding that the next workflow step
+is interested in the detected language code that this job generates and not the transcript of the clip.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import copy
 import boto3
 import pcaconfiguration as cf
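The 30-second clip step could be done with ffmpeg, which is plausible given that a sibling file in this commit mentions FFPROBE; the actual clipping mechanism is an assumption, and this command-builder is purely illustrative:

```python
def build_clip_command(input_path, output_path, clip_seconds=30):
    # Copy only the first clip_seconds of audio; the transcript of this
    # clip is discarded, only the detected language code is used
    return ["ffmpeg", "-y", "-i", input_path,
            "-t", str(clip_seconds), "-acodec", "copy", output_path]
```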

pca-server/src/pca/pca-aws-sf-process-turn-by-turn.py

Lines changed: 8 additions & 5 deletions

@@ -1,6 +1,9 @@
 """
-Parses the output from an Amazon Transcribe job into turn-by-turn
-speech segments with sentiment analysis scores from Amazon Comprehend
+This python function is part of the main processing workflow. Parses the output from an Amazon Transcribe job into
+turn-by-turn speech segments with sentiment analysis scores from Amazon Comprehend.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
 """
 from pathlib import Path
 from datetime import datetime
@@ -408,7 +411,7 @@ def create_combined_tca_graphic(self):

         # Upload the graphic to S3
         s3Client = boto3.client('s3')
-        object_key = cf.appConfig[cf.CONF_PREFIX_PARSED_RESULTS] + "/tcaImagery/" + base_filename
+        object_key = "tcaImagery/" + base_filename
         s3Client.upload_file(chart_filename, cf.appConfig[cf.CONF_S3BUCKET_OUTPUT], object_key)

         # Remove the local file and return our S3 URL so that the UI can create signed URLs for browser rendering
@@ -1544,9 +1547,9 @@ def lambda_handler(event, context):
         # "key": "originalAudio/stereo.mp3",
         # "apiMode": "analytics",
         # "jobName": "stereo.mp3",
-        "key": "originalAudio/example-call.wav",
+        "key": "originalAudio/Auto1_GUID_001_AGENT_AndrewK_DT_2021-12-01T07-55-51.wav",
         "apiMode": "analytics",
-        "jobName": "example-call.wav",
+        "jobName": "Auto1_GUID_001_AGENT_AndrewK_DT_2021-12-01T07-55-51.wav",
         "langCode": "en-US",
         "transcribeStatus": "COMPLETED"
     }
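The core of "turn-by-turn" parsing is merging consecutive items from the same speaker into single segments. A minimal sketch of that grouping idea, not this file's actual implementation (the item shape is an assumption):

```python
def group_into_turns(items):
    # items: time-ordered (speaker, start_time, text) tuples; merge
    # consecutive entries from the same speaker into one turn
    turns = []
    for speaker, start, text in items:
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["text"] += " " + text
        else:
            turns.append({"speaker": speaker, "start": start, "text": text})
    return turns
```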

pca-server/src/pca/pca-aws-sf-start-transcribe-job.py

Lines changed: 10 additions & 0 deletions

@@ -1,3 +1,12 @@
+"""
+This python function is part of the main processing workflow. It will start a job in the Amazon Transcribe service,
+using whatever configuration parameters are set. It handles all of the cross-validation of parameters, and takes
+into account the audio format - it will then degrade certain feature requests; e.g. if you have configured the app
+to do channel-separated audio jobs but the audio file is mono then it will switch to speaker-separation mode.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import copy
 import boto3
 import subprocess
@@ -50,6 +59,7 @@ def delete_existing_job(job_name, transcribe, api_mode):
     # If the job has already been deleted then we don't need to take any action
     pass

+
 def count_audio_channels(bucket, key):
     '''
     Examines an audio file using the FFPROBE utility to determine the number of audio channels in the file. If
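The degrade behaviour the docstring describes (channel separation requires stereo; mono falls back to speaker separation) can be sketched as a simple decision function; the mode strings here are illustrative, not the repository's actual configuration values:

```python
def choose_separation_mode(configured_mode, channel_count):
    # Channel separation needs at least two audio channels; degrade to
    # speaker separation when the file turns out to be mono
    if configured_mode == "channel" and channel_count < 2:
        return "speaker"
    return configured_mode
```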

pca-server/src/pca/pca-aws-sf-transcribe-failed.py

Lines changed: 10 additions & 0 deletions

@@ -1,7 +1,16 @@
+"""
+This python function is part of the main processing workflow. It handles the clean-up for when the workflow fails
+for expected reasons, such as being unable to perform Language Identification, and clears up or moves any resources
+associated with this execution.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import boto3
 import pcaconfiguration as cf
 import pcacommon

+
 def lambda_handler(event, context):
     """
     When a file has failed to transcribe then we need to do two things:
@@ -35,6 +44,7 @@ def lambda_handler(event, context):
     # Return our input data as the final result
     return event

+
 if __name__ == "__main__":
     event = {
         "bucket": "ajk-call-analytics-demo",

pca-server/src/pca/pca-aws-sf-wait-for-transcribe-notification.py

Lines changed: 8 additions & 0 deletions

@@ -1,3 +1,11 @@
+"""
+This python function is part of the main processing workflow. It is called when a Transcribe job is started, and it
+will create an entry in a DynamoDB table that holds some job information and the Step Functions task token. The Step
+Function should then wait for another task to read this task token from DynamoDB and resume the execution.
+
+Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+SPDX-License-Identifier: Apache-2.0
+"""
 import json
 import boto3
 import os
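The task-token pattern the docstring describes stores the Step Functions token keyed by job name, so a later Lambda can look it up and call `send_task_success` to resume the execution. A sketch of the DynamoDB item shape; the attribute names are assumptions, as the real table schema is defined elsewhere in the stack:

```python
def build_token_item(job_name, task_token):
    # Key the item on the Transcribe job name and store the Step
    # Functions task token for a later callback; low-level DynamoDB
    # item format ({"S": ...} for string attributes)
    return {
        "PKJobId": {"S": job_name},
        "taskToken": {"S": task_token},
    }
```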

0 commit comments
