Categorias
Programação Technology

GitHub Actions workflow for deploying a scheduled task using AWS ECS and EventBridge Scheduler

Cron jobs are repetitive tasks scheduled to run periodically at fixed times, dates, or intervals. It typically automates system maintenance or administration. Some workloads that are still running on non-containerized platforms (VMs, bare metal, etc.) are suitable to be moved to Serveless with multiple alternatives, depending on the context of each task.

Considering AWS services, for most of the options EventBridge Scheduler will be used to manage tasks as it is capable of invoking lots of AWS services. One of them is invoking a containerized application, or ECS task.

Amazon EventBridge Scheduler is a serverless scheduler that allows you to create, run, and manage tasks from one central, managed service. Highly scalable, EventBridge Scheduler allows you to schedule millions of tasks that can invoke more than 270 AWS services and over 6,000 API operations. Without the need to provision and manage infrastructure, or integrate with multiple services, EventBridge Scheduler provides you with the ability to deliver schedules at scale and reduce maintenance costs.
-- https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html
(EventBridge Scheduler is recommended to be used instead of CloudWatch Scheduler with EventBridge rules)

I worked on an application running on EC2 to ECS that is still using its cron jobs. Cron jobs were migrated to EventBridge Scheduler. Our CI/CD uses GitHub Actions and Terraform. AWS provides actions that can create and deploy a ECS task definition (the container blueprint) to an ECS service, but there is no action to deploy an ECS task to the EventBridge Scheduler, as the cron task is not executed under a service.

To deploy the new code we have to write some code to the GitHub Action and I think it might benefit others in a similar context. We use Terraform as Infrastructure as a Code, so keep this in mind if you need to adapt to your IaaS solution.

There will be the full Yaml file here but I will comment parts of it separately afterwards.

name: Deploy Scheduled task XYZ

on:
  workflow_dispatch:
    inputs:
      imageHash:
        description: 'Image hash to deploy'
        required: true
        type: string
      environment:
        description: 'Environment to run tests against'
        type: environment
        required: true
        default: test

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  APP_NAME: application-name-here
  AWS_REGION: "ap-southeast-2"
  ECR_NAME: ecr-name-here
  ECS_CLUSTER: cluster-name-here
  IMAGE_NAME: application-image-name-here
  TASK_NAME: cron-task-name-here

permissions:
  id-token: write
  contents: read    # This is required for actions/checkout

jobs:
  deploy:
    name: Deploy to ${{ inputs.environment }}
    runs-on: ubuntu-latest
    environment: ${{ inputs.environment }}
    env:
      JOB_ENV: ${{ inputs.environment }}
    steps:
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets[format('iam_role_to_assume_{0}', inputs.environment)] }}
          role-session-name: github-ecr-push-workflow-${{ inputs.environment }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Verify image
        run: aws ecr describe-images --repository-name ${{ inputs.environment }}-${{ env.ECR_NAME }}-${{ env.APP_NAME }} --image-ids imageTag=${{ inputs.imageHash }}

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Download the task definition ${{ env.TASK_NAME }}
        run: aws ecs describe-task-definition --task-definition ${{ env.TASK_NAME }} --query taskDefinition > task-definition.json

      - name: Fill in the new image ID in the Amazon ECS task definition ${{ env.TASK_NAME }}
        id: task-def-cron
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
        with:
          task-definition: task-definition.json
          container-name: ${{ env.TASK_NAME }}
          image: ${{ env.ECR_REGISTRY }}/${{ env.JOB_ENV }}-${{ env.ECR_NAME }}-${{ env.APP_NAME }}:${{ inputs.imageHash }}

      - name: Deploy Amazon ECS task definition ${{ env.TASK_NAME }}
        id: deploy-cron
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def-cron.outputs.task-definition }}
          cluster: ${{ inputs.environment }}-${{ env.ECS_CLUSTER }}

      - name: Checkout infrastructure
        uses: actions/checkout@v4
        with:
          repository: orgnamehere/iaas-repo-here
          ref: main
          path: './working-path'
          token: ${{ secrets.PAT_TOKEN }}

      - name: Update schedule ${{ env.TASK_NAME }}
        working-directory: './working-path'
        env:
          GH_TOKEN: ${{ secrets.PAT_TOKEN }}
          INFRASTRUCTURE_FILE: 'path/to/your/module/terraform-file-here.tf'
          UNESCAPED_ARN: ${{ steps.deploy-cron.outputs.task-definition-arn }}
        run: |
          # Escape regexp non-safe characters from the ARN to prevent sed to fail
          export ESCAPED_ARN=${UNESCAPED_ARN//:/\\:}
          export ESCAPED_ARN=${ESCAPED_ARN//\//\\/}
          echo "Escaped ARN: $ARN"
          # Retrieve <appl-name>:<version> part of the ARN to use in PR 
          export ARRAY_ARN_PARTS=(${UNESCAPED_ARN//\// })
          export VERSION_PART=${ARRAY_ARN_PARTS[1]}
          export COMMIT_MESSAGE="DEPLOY: Deployment on ${{ inputs.environment }} - $VERSION_PART"
          # Use task definition version for branch name
          export BRANCH_NAME="deploy-${VERSION_PART//:/-}"
          git config user.email "[email protected]"
          git config user.name "Github Actions Pipeline"
          git checkout -b ${BRANCH_NAME}
          sed -i '/task_definition_arn /s/".*/'"\"${ESCAPED_ARN}"\"'/' $INFRASTRUCTURE_FILE
          git add ${{ env.INFRASTRUCTURE_FILE }}
          git commit -m "$COMMIT_MESSAGE"
          git push --set-upstream origin ${BRANCH_NAME}
          gh pr create --fill --body "- [x] $COMMIT_MESSAGE"

This pipeline will create the scheduled task definition, check out the IaaS repository, change the Scheduler task definition ARN and create a PR in the IaaS repository. We are using that also as a way to have approval to deploy to Production, but it can be automated if needed.

Let's comment on some parts:

on:
  workflow_dispatch:
    inputs:
      imageHash:
        description: 'Image hash to deploy'

It is good to separate the image creation from the deployment. This input is required assuming the image was created and published to the registry. This promotes reusability and flexibility.

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets[format('iam_role_to_assume_{0}', inputs.environment)] }}
          role-session-name: github-ecr-push-workflow-${{ inputs.environment }}
          aws-region: ${{ env.AWS_REGION }}

Environment is a required input and this pipeline can be executed against any environment you defined in your repository.

      - name: Deploy Amazon ECS task definition ${{ env.TASK_NAME }}
        id: deploy-cron
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
        with:
          task-definition: ${{ steps.task-def-cron.outputs.task-definition }}
          cluster: ${{ inputs.environment }}-${{ env.ECS_CLUSTER }}

Although the action name is "deploy task definition" it will create the task, but not deploy, as this is only possible when you provide a service input (by the time this article is being written). But we are not deploying to a service though, so the action will only create the task definition, but the EventBridge Scheduler remains calling the same task definition it was invoking before the creation of this task definition.

      - name: Checkout infrastructure
        uses: actions/checkout@v4
        with:
          repository: orgnamehere/iaas-repo-here
          ref: main
          path: './working-path'
          token: ${{ secrets.PAT_TOKEN }}

Using AWS CLI was an alternative we considered, but changing the target of the EventBridge scheduler becomes a little bit confusing and brings some cognitive complexity in case we need to change something. We decided then to fetch the IaaS repository and control the task definition version to be the target of the scheduler in Terraform code, so we could also be sure any dependency that the target change could have would be managed by Terraform, instead of another CLI change in this pipeline. We checkout the IaaS repo and save the path as ./working-path to keep the workspace clean. The name is your choice.

      - name: Update schedule ${{ env.TASK_NAME }}
        working-directory: './working-path'
        env:
          GH_TOKEN: ${{ secrets.PAT_TOKEN }}
          INFRASTRUCTURE_FILE: 'path/to/your/module/terraform-file-here.tf'
          UNESCAPED_ARN: ${{ steps.deploy-cron.outputs.task-definition-arn }}
        run: |
          # Escape regexp non-safe characters from the ARN to prevent sed to fail
          export ESCAPED_ARN=${UNESCAPED_ARN//:/\\:}
          export ESCAPED_ARN=${ESCAPED_ARN//\//\\/}
          echo "Escaped ARN: $ARN"
          # Retrieve <appl-name>:<version> part of the ARN to use in PR 
          export ARRAY_ARN_PARTS=(${UNESCAPED_ARN//\// })
          export VERSION_PART=${ARRAY_ARN_PARTS[1]}
          export COMMIT_MESSAGE="DEPLOY: Deployment on ${{ inputs.environment }} - $VERSION_PART"
          # Use task definition version for branch name
          export BRANCH_NAME="deploy-${VERSION_PART//:/-}"
          git config user.email "[email protected]"
          git config user.name "Github Actions Pipeline"
          git checkout -b ${BRANCH_NAME}
          sed -i '/task_definition_arn /s/".*/'"\"${ESCAPED_ARN}"\"'/' $INFRASTRUCTURE_FILE
          git add ${{ env.INFRASTRUCTURE_FILE }}
          git commit -m "$COMMIT_MESSAGE"
          git push --set-upstream origin ${BRANCH_NAME}
          gh pr create --fill --body "- [x] $COMMIT_MESSAGE"

This is where we use sed to search and replace the ARN in the terraform code. We scape the ARN before applying sed to not mess with the search regexp.

The terraform code expected to be changed will be something like this:

# main.tf
-        task_definition_arn     = "arn:aws:ecs:ap-southeast-2:123456789012:task-definition/task-definition-cron-name:57"
+       task_definition_arn     = "arn:aws:ecs:ap-southeast-2:123456789012:task-definition/task-definition-cron-name:58"

Links:

Categorias
Solving Problems

Solving problems 1: ECS, Event Bridge Scheduler, PHP, migrations

I love Mondays and Business as Usual. Solving problems is a delightful day-to-day task. Maybe this is what working with software means in the end. Do not take me wrong, it opens the doors for greenfield projects and experimentation. While mastering the business I can experiment, change and rebuild.

The solving problems series is just a way to share small ideas, experiences and outcomes of solving daily problems as I go. I wonder if some tips or experiences shared can help you build better what you are working on right now.


During the last months, I have been migrating an important PHP service to ECS Fargate along with the runtime upgrade. The service is composed of a lot of parts and we have been architecting the migration so the operation causes no downtime to customers, even when they are over four different continents and many time zones.

One very important part of the service is already running in production for some months with success. We are preparing the next service.

For the migration plan, we deployed infrastructure ahead of starting moving traffic, planned to daily incremental traffic switch, like 5, 10, 25, 50, 75, and close monitoring. Also prepared a second plan to avoid rollback in case some performance issue arises. While monitoring we created backlog tickets with the observability outcomes.

During migration phases prepare yourself beforehand for the initial (1%, or 5%) traffic switch, so you can catch quickly those hidden use cases that only happen in production and act quickly. If you do so, other phases are just a matter of watching how scaling works.

Using containers (of course Kubernetes is a great alternative) is a fantastic opportunity to upgrade PHP runtimes efficiently at the same time where we use a much better platform that helps with delivery and developer experiences. The very first and most important step I recommend is to review how you deal with your secret and environment variables. This is pivotal for the success of a smooth migration.

We can expect that those type of applications has a fair amount of cron jobs associated with them. This is a great opportunity to follow the old saying "use the right tool for the right problem" and my suggestion would be to rewrite it, turning it into Lambda or Step Functions, as applicable to each of what the cron job is doing. This is closer to what and how a job should run.

It happens that not always we can start refactoring right away, and then I can say that my experiences with Event Bridge Scheduler triggering ECS tasks (previously cron jobs) are great. They are interestingly cheap alternatives while waiting for the refactoring project to take over. Don't take this as your permanent solution though, because it is not just right and a waste of resources and couple the cron job too much with parts of the application not really related.

We were reviewing the backlog and observability results of the last service. As we could prioritise and execute some backlog tickets, the dashboard and metrics highlighted that we had some room to review scaling and resource thresholds. We changed them carefully, resulting in a bill ~50% cheaper, CPU and memory resource stable and no performance degradation.

Some notes:

  • Investing in test automation is good for your developer experience, site reliability and revenue; also a great support for technology improvements
  • It is worth taking a look at the ALBRequestCountPerTarget metric if you have CPU-heavy processes as you can better control how ECS will handle scale policies, avoiding peak of CPU where the CPU average metric is not enough for scaling

Links: