10 Oct, 2023
One problem with running long-running tasks on AWS in a serverless way is Lambda's 15-minute timeout.
An alternative is Fargate, but that requires setting up Docker containers, Elastic Container Registry (ECR) and a VPC, so it feels a bit less serverless.
If your long-running task is really just lots of smaller pieces of work, you can have a Lambda function that Step Functions calls repeatedly until the task completes. That's what I'll describe here.
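The template later in this post concentrates on retries and error handling rather than the loop itself. The looping pattern can be sketched roughly as follows. This is a minimal sketch under some assumptions. The event shape (`cursor`, `batch_size`, `done`), `ITEMS`, `SAFETY_MARGIN_MS` and `process_item` are all hypothetical. `run_until_done` only imitates locally what a Step Functions Choice state would do, which is to check `$.done` and loop back to the Task.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical list of small jobs; in practice this might live in DynamoDB or S3.
ITEMS: List[str] = [f"item-{i}" for i in range(10)]

# Hypothetical safety margin: stop well before the Lambda timeout is reached.
SAFETY_MARGIN_MS = 5_000


def process_item(item: str) -> None:
    """Hypothetical unit of work; replace with the real thing."""


def time_left_ms(context: Optional[Any]) -> int:
    # Real Lambda context objects provide get_remaining_time_in_millis();
    # when run locally with context=None, pretend there is plenty of time.
    if context is None:
        return 10**9
    return context.get_remaining_time_in_millis()


def handler(event: Dict[str, Any], context: Optional[Any] = None) -> Dict[str, Any]:
    """Process items from event['cursor'] until done, the batch is full, or time runs low."""
    cursor = event.get("cursor", 0)
    batch_size = event.get("batch_size", 100)
    processed = 0
    while (
        cursor < len(ITEMS)
        and processed < batch_size
        and time_left_ms(context) > SAFETY_MARGIN_MS
    ):
        process_item(ITEMS[cursor])
        cursor += 1
        processed += 1
    # The returned state becomes the next invocation's input; a Choice state
    # would test $.done and either end or loop back to this Task.
    return {**event, "cursor": cursor, "done": cursor >= len(ITEMS)}


def run_until_done(event: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
    """Local stand-in for the Task -> Choice -> Task loop in Step Functions."""
    invocations = 0
    while True:
        event = handler(event)
        invocations += 1
        if event["done"]:
            return event, invocations
```

With ten items and a `batch_size` of 3, `run_until_done({"cursor": 0, "batch_size": 3})` takes four invocations. Each loop iteration adds events to the execution history. That history has a quota (25,000 events for a Standard workflow), so a very long job may need to be split across executions.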
(Another alternative is to use the batch processing utility in Lambda Powertools, which uses SQS behind the scenes.)
To make this work we also want retries and custom error handling. Specifically, we want an Abort error not to be retried, and that is a little tricky. Read this to understand the background:
Now, we'll 'mock' the Lambda's behaviour using a value in DynamoDB, so that it either succeeds, raises an Error, or raises an Abort error. Success ends the state machine execution, an Error triggers retries with backoff, and an Abort moves to a Fallback state which outputs the parsed error and ends.
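The tricky part is that Step Functions uses the first retrier whose `ErrorEquals` matches, and `States.ALL` matches everything. Without an earlier retrier for `Abort` with `MaxAttempts: 0`, an Abort would be retried like any other error before the `Catch` ever saw it. The catchers are only consulted once the matching retrier has no attempts left. The sketch below is a simplified local model of that rule, not Step Functions itself. It ignores timing, backoff and the special `States.*` error names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Simplified model of the HelloWorld state's Retry and Catch fields.
RETRIERS: List[Dict] = [
    {"ErrorEquals": ["Abort"], "MaxAttempts": 0},
    {"ErrorEquals": ["States.ALL"], "MaxAttempts": 4},
]
CATCHERS: List[Dict] = [{"ErrorEquals": ["Abort"], "Next": "Fallback"}]


def matches(error: str, error_equals: Sequence[str]) -> bool:
    # States.ALL is a wildcard; otherwise the error name must be listed.
    return "States.ALL" in error_equals or error in error_equals


def first_match(error: str, rules: List[Dict]) -> Optional[int]:
    # Rules are checked in order and only the first match is used.
    for index, rule in enumerate(rules):
        if matches(error, rule["ErrorEquals"]):
            return index
    return None


def run_task(outcomes: List[Optional[str]], retriers: List[Dict] = RETRIERS) -> Tuple[str, int]:
    """Each outcome is None (success) or the error name the Lambda raised.

    Returns (where the execution went, number of invocations).
    """
    used = [0] * len(retriers)  # retries consumed by each retrier
    for attempt, error in enumerate(outcomes, start=1):
        if error is None:
            return ("Succeeded", attempt)
        index = first_match(error, retriers)
        if index is not None and used[index] < retriers[index]["MaxAttempts"]:
            used[index] += 1
            continue  # retry: the next outcome is the next invocation
        # Retries exhausted (or MaxAttempts is 0), so the catchers get a look.
        catcher = first_match(error, CATCHERS)
        if catcher is not None:
            return (CATCHERS[catcher]["Next"], attempt)
        return ("Failed", attempt)
    raise ValueError("ran out of outcomes while still retrying")
```

With both retriers, `run_task(["Abort"])` goes straight to `("Fallback", 1)`. If the `Abort` retrier is dropped, `run_task(["Abort"] * 5, retriers=RETRIERS[1:])` only reaches `("Fallback", 5)` after four pointless retries.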
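Deploy the stack by uploading the template below in the CloudFormation web UI (or any other way). You'll need to choose a name for the stack, such as Tasks, and a name for the DynamoDB table, such as Tasks. The stack name is prepended to the table name, so the actual table will be named TasksTasks. Because the template creates named IAM resources, you'll also need to acknowledge the IAM capabilities when creating the stack.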
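To deploy from a script instead of the web UI, a minimal sketch using boto3's CloudFormation client might look like this. The file name `tasks.yml` and the default names are assumptions. The template gives its IAM managed policies explicit names, so the sketch requests the `CAPABILITY_NAMED_IAM` capability. boto3 is imported inside `deploy` so the pure helpers can be used without it installed.

```python
from typing import Any, Dict


def table_name(stack_name: str, table_param: str) -> str:
    # Mirrors !Sub '${AWS::StackName}${TableName}' in the template.
    return f"{stack_name}{table_param}"


def create_stack_args(stack_name: str, table_param: str, template_body: str) -> Dict[str, Any]:
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [{"ParameterKey": "TableName", "ParameterValue": table_param}],
        # The managed policies have explicit names, so named-IAM capability is needed.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(stack_name: str = "Tasks", table_param: str = "Tasks", path: str = "tasks.yml") -> None:
    import boto3  # imported here so the helpers above work without boto3

    with open(path) as f:
        body = f.read()
    cfn = boto3.client("cloudformation")
    cfn.create_stack(**create_stack_args(stack_name, table_param, body))
    # Block until the stack is created; the waiter raises if creation fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```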
AWSTemplateFormatVersion: "2010-09-09"
Description: An example template with an IAM role for a Lambda state machine.
Parameters:
  TableName:
    Description: The name of the table (the stack name will be prepended)
    Type: String
    Default: Tasks
    MinLength: "1"
Resources:
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import boto3
          import os
          dynamodb = boto3.client('dynamodb', region_name=os.environ["AWS_REGION"])
          # Step Functions sees the exception class name as the error name,
          # so these must match the ErrorEquals values in the state machine.
          class Abort(Exception):
              pass
          class Error(Exception):
              pass
          def handler(event, context):
              response = dynamodb.get_item(
                  TableName=f'{os.environ["STACK_NAME"]}{os.environ["TABLE_NAME"]}',
                  Key={
                      'pk': {
                          'S': 'pk1',
                      },
                      'sk': {
                          'S': 'sk1',
                      }
                  }
              )
              # If the item doesn't exist yet this raises KeyError, which is retried like any other error.
              data = response['Item']['data']['S'].lower().strip()
              print('Got data:', data)
              if data == 'abort':
                  raise Abort('This is an Abort error which should get caught and result in transition to the Fallback state!')
              elif data == 'error':
                  raise Error('This is an Error which should cause a retry!')
              else:
                  event['data'] = data
                  return event
      Runtime: python3.11
      Timeout: "25"
      Architectures:
        - arm64
      Environment:
        Variables:
          STACK_NAME: !Sub '${AWS::StackName}'
          TABLE_NAME: !Sub '${TableName}'
  DynamoDbTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: !Sub '${AWS::StackName}${TableName}'
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: pk
          AttributeType: S
        - AttributeName: sk
          AttributeType: S
      KeySchema:
        - AttributeName: pk
          KeyType: HASH
        - AttributeName: sk
          KeyType: RANGE
      TimeToLiveSpecification:
        AttributeName: TimeToLive
        Enabled: true
  LambdaDynamoPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: !Sub '${AWS::StackName}LambdaDynamoPolicy'
      Description: Managed policy for a Lambda function launched by CloudFormation
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - dynamodb:GetItem
              - dynamodb:Query
              - dynamodb:PutItem
              - dynamodb:UpdateItem
            Resource:
              - !GetAtt DynamoDbTable.Arn
      # Define the role here, rather than the managed policy on the role, to avoid a circular dependency
      Roles:
        - !Ref LambdaExecutionRole
  LambdaCloudWatchPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      ManagedPolicyName: !Sub '${AWS::StackName}LambdaCloudWatchPolicy'
      Description: Managed policy for a Lambda function launched by CloudFormation
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - logs:CreateLogStream
            Resource:
              - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:${LambdaLogGroup}:*"
          - Effect: Allow
            Action:
              - logs:PutLogEvents
            Resource:
              - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:${LambdaLogGroup}:*"
      # Define the role here, rather than the managed policy on the role, to avoid a circular dependency
      Roles:
        - !Ref LambdaExecutionRole
  # See https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/147
  # https://typicalrunt.me/2019/09/20/enforcing-least-privilege-when-logging-lambda-functions-to-cloudwatch/
  # WARNING: If the Lambda function gets replaced, its name will change, so the log group will change, so the old logs will get deleted despite the retention period here.
  LambdaLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/aws/lambda/${LambdaFunction}"
      RetentionInDays: 30
  StatesExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - !Sub states.${AWS::Region}.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: StatesExecutionPolicy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - lambda:InvokeFunction
                Resource: '*'
  MyStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      RoleArn: !GetAtt StatesExecutionRole.Arn
      Definition:
        Comment: A Hello World example using an AWS Lambda function
        StartAt: HelloWorld
        States:
          HelloWorld:
            Type: Task
            Resource: !GetAtt LambdaFunction.Arn
            Retry:
              # Listed first so Abort never falls through to the States.ALL retrier
              - ErrorEquals: [Abort]
                MaxAttempts: 0
              - ErrorEquals: [States.ALL]
                IntervalSeconds: 20
                MaxAttempts: 4
                BackoffRate: 1.2
            Catch:
              - ErrorEquals: [Abort]
                Next: Fallback
            End: true
          Fallback:
            Type: Pass
            Parameters:
              Cause.$: States.StringToJson($.Cause)
            End: true
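When the `Catch` fires, the Fallback state's input is the error output. This is an object with `Error` and `Cause` fields, where `Cause` is the Lambda's error details serialised as a JSON string. `States.StringToJson` parses that string back into an object so it is readable in the console. A rough Python equivalent is below. The sample input is illustrative and trimmed, not a captured payload; the real `Cause` carries more detail, such as a stack trace.

```python
import json
from typing import Any, Dict

# Illustrative, trimmed error output of the kind the Catch passes to Fallback.
sample_input: Dict[str, Any] = {
    "Error": "Abort",
    "Cause": json.dumps({
        "errorMessage": "This is an Abort error which should get caught and result in transition to the Fallback state!",
        "errorType": "Abort",
    }),
}


def fallback(state_input: Dict[str, Any]) -> Dict[str, Any]:
    # Rough equivalent of Parameters: {"Cause.$": "States.StringToJson($.Cause)"}.
    # Parameters builds a new object, so only the parsed Cause is passed on.
    return {"Cause": json.loads(state_input["Cause"])}
```

`fallback(sample_input)["Cause"]["errorType"]` gives `"Abort"`, and the top-level `Error` field is not carried over.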
Now have a play:
Create an item in your DynamoDB table with pk set to pk1, sk set to sk1 and a string attribute data set to 'Error'.
Start an execution with any input, such as this:
{
"comment": "Change the pk1 sk1 'data' value to 'Error', 'Abort' or 'Success' to trigger different parts of the state machine."
}
Watch the retries occurring: the first comes after 20 seconds, and each one waits a little longer than the last. If you leave the value as 'Error', the execution fails once the four retries are used up.
Change the DynamoDB value to 'Abort' to watch a transition to the Fallback state, or to anything else, such as 'Success', to watch it succeed.
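The same experiment can be driven from a script. The following is a minimal sketch with boto3. The state machine ARN is a placeholder you would copy from the console, and the helper names are hypothetical. `retry_delays` shows the wait before each retry, assuming the documented behaviour that each wait is the previous one multiplied by `BackoffRate`.

```python
import json
from typing import Any, Dict, List


def retry_delays(interval_seconds: float, backoff_rate: float, max_attempts: int) -> List[float]:
    # Wait before retry n is IntervalSeconds * BackoffRate ** (n - 1).
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def mock_item(value: str) -> Dict[str, Dict[str, str]]:
    # Same key the Lambda reads (pk1/sk1) plus the string attribute 'data'.
    return {"pk": {"S": "pk1"}, "sk": {"S": "sk1"}, "data": {"S": value}}


def set_mock_value(table_name: str, value: str) -> None:
    import boto3  # imported lazily so the pure helpers work without boto3

    boto3.client("dynamodb").put_item(TableName=table_name, Item=mock_item(value))


def start(state_machine_arn: str, payload: Dict[str, Any]) -> str:
    import boto3

    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(stateMachineArn=state_machine_arn, input=json.dumps(payload))
    return response["executionArn"]
```

For example, `set_mock_value("TasksTasks", "Error")` followed by `start(arn, {"comment": "..."})`. With the template's settings, `retry_delays(20, 1.2, 4)` comes out at roughly 20, 24, 28.8 and 34.56 seconds.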
Some useful tools:
Format CloudFormation templates:
cfn-format -w tasks.yml
Future?
Copyright James Gardner 1996-2020 All Rights Reserved.