Traefik 2.3 + ECS + Fargate : Reverse proxy serverless in AWS

Traefik is a reverse proxy that has already been covered on this blog. It is particularly powerful when paired with containers, where it offers fine-grained, lightweight traffic management.

A few days ago, Containous, the company behind Traefik, announced the release of Traefik 2.3.0-rc2. This new version brings several changes, including:

  • The addition of a new service: Traefik Pilot
  • The ability to add plugins to Traefik
  • The addition of the ECS provider

I have already covered the first two points on this blog. Here I will focus on support for ECS (Elastic Container Service) on AWS through the new Traefik provider.

Disclaimer: this post is a translation of a blog post I wrote for my company. You can find the French version here, on the WeScale blog.

Traefik in the land of providers

What's a provider?

Like Terraform, Traefik relies on the notion of providers to define the services it connects to.

Each provider has its own vocabulary and configuration. The idea is to keep the Traefik core lightweight and to enable only the providers we actually use.

About fifteen providers are now available for Traefik, including Docker, Kubernetes, Rancher, Etcd and Consul.

A provider is the data source that Traefik uses to discover the backends it will route to.

Why the addition of the ECS provider changes the game

A bit of context: ECS is AWS's managed container orchestrator. It runs containers either on EC2 instances or on Fargate, another AWS service that runs containers in serverless mode.

With Fargate, we simply reserve resources and Amazon manages the underlying infrastructure for us (because serverless is not magic).

The ECS provider lets Traefik dynamically discover the resources managed by ECS and attach them to itself, which makes deployments more dynamic. Discovery works by polling: Traefik itself queries the AWS APIs.

It also removes the need for one AWS ALB per resource, or for a single ALB carrying a long list of rules: one ALB forwards everything to Traefik, and Traefik handles all the routing. Traefik can connect to ECS wherever Traefik itself is deployed. The example here runs Traefik on ECS for simplicity, but this is not required; you can host it wherever you want.
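To illustrate the "single ALB in front of Traefik" idea, here is a minimal Terraform sketch. It is not taken from the article's repository: the variables `alb_arn` and `vpc_id`, the resource labels and the health check settings are assumptions made for the example.

```hcl
# Hypothetical inputs: an existing ALB and the VPC it lives in.
variable "alb_arn" {
  type = string
}

variable "vpc_id" {
  type = string
}

# Target group that the Traefik tasks register into.
# Fargate tasks use awsvpc networking, so targets are registered by IP.
resource "aws_lb_target_group" "traefik" {
  name        = "traefik"
  port        = 80
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip"

  health_check {
    path = "/"
    # Assumption for this sketch: Traefik answers 404 when no router matches
    # the health check request, so 404 is accepted as healthy here.
    matcher = "200,404"
  }
}

# One listener with one default action: everything goes to Traefik,
# which then applies its own routing rules.
resource "aws_lb_listener" "http" {
  load_balancer_arn = var.alb_arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.traefik.arn
  }
}
```

With this layout, adding a new backend does not require touching the ALB: only the labels on the new ECS task change.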

The provider in action

Disclaimer

I propose a small hands-on with this provider, with a few caveats:

  • This hands-on uses a release candidate, so the final version may differ slightly.
  • It is built with Terraform, which is by no means a prerequisite; you can achieve the same result with CloudFormation, for example.
  • A bug currently affects metadata handling on ECS, which is why we use an access key here; this is not a good security practice on AWS.
  • This hands-on is a demonstration; in a production environment, several parameters would need stronger hardening.

What we are going to deploy

Description of the deployment

The complete Terraform deployment code is available here.

I won't describe the whole Terraform deployment here, as it is not the heart of this article. I will instead focus on the points that matter for Traefik and the nuances to keep in mind.

The IAM policy, attached to an IAM user in this example because of the bug described above:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TraefikECSReadAccess",
            "Effect": "Allow",
            "Action": [
                "ecs:ListClusters",
                "ecs:DescribeClusters",
                "ecs:ListTasks",
                "ecs:DescribeTasks",
                "ecs:DescribeContainerInstances",
                "ecs:DescribeTaskDefinition",
                "ec2:DescribeInstances"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

These are read-only permissions on the ECS and EC2 services; EC2 is included because ECS can use EC2 instances as its execution nodes.

Note that this policy can be restricted further, following the least-privilege model recommended by AWS, for example by reducing the set of ECS clusters that Traefik can see.

The policy shown here is taken from the official documentation.
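As an illustration of the least-privilege idea, the sketch below scopes the task-related actions to a single cluster. It is an assumption-laden example, not the documented Traefik policy: the variable `ecs_cluster_arn` and the statement names are hypothetical, and the split between scoped and unscoped actions should be checked against the IAM reference for ECS and tested with Traefik before use.

```hcl
# Hypothetical input: the ARN of the only cluster Traefik should see.
variable "ecs_cluster_arn" {
  type = string
}

data "aws_iam_policy_document" "traefik_scoped" {
  # Left on "*" in this sketch: these calls are not tied to one cluster.
  statement {
    sid = "Unscoped"

    actions = [
      "ecs:ListClusters",
      "ecs:DescribeTaskDefinition",
      "ec2:DescribeInstances",
    ]

    resources = ["*"]
  }

  # Describing clusters is limited to the chosen cluster.
  statement {
    sid       = "DescribeOneCluster"
    actions   = ["ecs:DescribeClusters"]
    resources = [var.ecs_cluster_arn]
  }

  # Task and container instance reads are allowed only inside that cluster,
  # using the ecs:cluster condition key.
  statement {
    sid = "TasksInOneCluster"

    actions = [
      "ecs:ListTasks",
      "ecs:DescribeTasks",
      "ecs:DescribeContainerInstances",
    ]

    resources = ["*"]

    condition {
      test     = "ArnEquals"
      variable = "ecs:cluster"
      values   = [var.ecs_cluster_arn]
    }
  }
}
```

The resulting `data.aws_iam_policy_document.traefik_scoped.json` can be attached in place of the broader policy.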

Deployment prerequisites

Because of the bug affecting role credential retrieval, you need to create a user, give it an AWS access key and attach the policy described above. This user is not included in the code available on GitLab, because Terraform cannot store the secret access key securely: the key ends up in the Terraform state. If you still want to create this user with Terraform, the following code is required:

/*
Workaround for issue Traefik#7096 : https://github.com/containous/traefik/issues/7096
*/
 
resource "aws_iam_user" "traefik" {
  name = "traefik"
  path = "/system/"
}
 
resource "aws_iam_access_key" "traefik" {
  user = aws_iam_user.traefik.name
}
 
data "aws_iam_policy_document" "traefik_user" {
  statement {
    sid = "main"
 
    actions = [
      "ecs:ListClusters",
      "ecs:DescribeClusters",
      "ecs:ListTasks",
      "ecs:DescribeTasks",
      "ecs:DescribeContainerInstances",
      "ecs:DescribeTaskDefinition",
      "ec2:DescribeInstances"
    ]
 
    resources = [
      "*",
    ]
  }
}
 
resource "aws_iam_user_policy" "traefik_user" {
  name   = "traefik_user"
  user   = aws_iam_user.traefik.name
  policy = data.aws_iam_policy_document.traefik_user.json
}
 
/*Store access keys in Secret manager to retrieve it with Fargate*/
resource "aws_secretsmanager_secret" "traefik_secret_access_key" {
  name        = "traefik-secret_access_key_value"
  description = "contains traefik secret access key"
}
 
resource "aws_secretsmanager_secret_version" "key" {
  secret_id     = aws_secretsmanager_secret.traefik_secret_access_key.id
  secret_string = aws_iam_access_key.traefik.secret
}
 
output "access_key" {
  value = aws_iam_access_key.traefik.id
}
 
output "secret_id" {
  value = aws_secretsmanager_secret.traefik_secret_access_key.id
}

The settings of the ECS tasks

Traefik's task definition is quite simple:

[
    {
      "name": "traefik",
      "image": "traefik:v2.3.0-rc2",
      "entryPoint": ["traefik", "--providers.ecs.clusters", "${ecs_cluster_name}", "--log.level", "DEBUG", "--providers.ecs.region", "${region}", "--api.insecure"],
      "essential": true,
      "logConfiguration":{
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "${loggroup}",
            "awslogs-region": "${region}",
            "awslogs-stream-prefix": "traefik"
        }
      },
      "environment": [{
        "name": "AWS_ACCESS_KEY_ID",
        "value": "${aws_access_key}"
      }],
      "secrets": [{
        "name": "AWS_SECRET_ACCESS_KEY",
        "valueFrom": "${secret_arn}"
      }],
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80
        },
        {
          "containerPort": 8080,
          "hostPort": 8080
        }
      ]
    }
  ]

The important information here is of course the entrypoint, the command that ECS executes when the container starts.

traefik --providers.ecs.clusters ${ecs_cluster_name} --log.level DEBUG --providers.ecs.region ${region} --api.insecure

Looking at the command-line parameters, we see:

  • traefik : the name of the binary to launch. It is mandatory, since I override the image's existing entrypoint.
  • --providers.ecs.clusters=${ecs_cluster_name} : the name of the ECS cluster in which Traefik must look for resources. This parameter can be passed multiple times, or Traefik can be told to search all clusters with "--providers.ecs.autoDiscoverClusters=true".
  • --log.level=DEBUG : raises the log verbosity to DEBUG.
  • --providers.ecs.region=${region} : although the documentation does not say so, this parameter is mandatory when authenticating with an access key.
  • --api.insecure : a purely optional parameter that gives access to the Traefik dashboard without authentication. In a production environment, it should of course not be enabled.

In addition, note the following about the environment variables:

  • The access key ID is loaded in plain text. This information is not sensitive, so it does not need to come from Secrets Manager.
  • The secret access key, on the other hand, is loaded from Secrets Manager so that it is not exposed. Although it is declared in a "secrets" section, it is still injected into the container as an environment variable; it is simply not visible from the ECS task definition.
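For the "secrets" section to work, the ECS task execution role must be allowed to read the secret. A minimal sketch of that permission follows; the variables `secret_arn` and `execution_role_name` and the resource labels are hypothetical names chosen for the example, not identifiers from the article's repository.

```hcl
# Hypothetical inputs: the secret created earlier and an existing execution role.
variable "secret_arn" {
  type = string
}

variable "execution_role_name" {
  type = string
}

# Read access to the single secret that holds the secret access key.
data "aws_iam_policy_document" "read_traefik_secret" {
  statement {
    sid       = "ReadTraefikSecret"
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [var.secret_arn]
  }
}

# Attach the permission to the task execution role, which ECS uses
# to fetch the secret before starting the container.
resource "aws_iam_role_policy" "read_traefik_secret" {
  name   = "read-traefik-secret"
  role   = var.execution_role_name
  policy = data.aws_iam_policy_document.read_traefik_secret.json
}
```

Scoping the statement to one secret ARN keeps the execution role from reading any other secret in the account.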

I chose to configure Traefik through the command line, but the same can be done with Traefik configuration files. For more information, have a look at the official documentation.
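The `${...}` placeholders in the Traefik task definition are filled in by Terraform. The sketch below shows one way to do that with `templatefile`, assuming the JSON is saved as `traefik.json.tpl` next to the Terraform code; the file name, the variable names and the CPU and memory sizes are assumptions made for the example.

```hcl
# Hypothetical inputs, one per placeholder used in the JSON template,
# plus the execution role used to pull the secret and write logs.
variable "ecs_cluster_name" { type = string }
variable "region" { type = string }
variable "loggroup" { type = string }
variable "aws_access_key" { type = string }
variable "secret_arn" { type = string }
variable "execution_role_arn" { type = string }

resource "aws_ecs_task_definition" "traefik" {
  family                   = "traefik"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc" # required by Fargate
  cpu                      = 256      # assumed size, adjust as needed
  memory                   = 512      # assumed size, adjust as needed

  # Needed so that ECS can read the secret and send logs to CloudWatch.
  execution_role_arn = var.execution_role_arn

  # Render the JSON template, replacing each ${...} placeholder.
  container_definitions = templatefile("${path.module}/traefik.json.tpl", {
    ecs_cluster_name = var.ecs_cluster_name
    region           = var.region
    loggroup         = var.loggroup
    aws_access_key   = var.aws_access_key
    secret_arn       = var.secret_arn
  })
}
```

Keeping the container definition in a template keeps the JSON readable, while Terraform remains the single place where cluster name, region and secret ARN are decided.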

The whoami task definition, the backend that exposes the information used by Traefik:

[
    {
      "name": "whoami",
      "image": "containous/whoami:v1.5.0",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 80
        }
      ],
      "dockerLabels": {
        "traefik.http.routers.whoami.rule": "Host(`${alb_endpoint}`)",
        "traefik.enable": "true"
      }
    }
  ]

Whoami is a minimalist image, created by Containous for demonstration purposes. It does not require any particular parameters.

I add two parameters through Docker labels on my container:

  • "traefik.enable: true" asks Traefik to reference this service.
  • "traefik.http.routers.whoami.rule" creates a Traefik "router" called whoami. Its rule matches every request whose Host header is the load balancer endpoint, so all traffic arriving through the load balancer is forwarded to this service. Rules are dynamic and can be more complex; once again, feel free to consult the official documentation.
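As an example of a slightly more complex rule, the sketch below builds the label map in Terraform and combines a `Host` matcher with a `PathPrefix` matcher. The variable `alb_endpoint` mirrors the placeholder used in the task definition; the `/whoami` prefix, the local name and the output name are assumptions made for the example.

```hcl
# Hypothetical input: the DNS name of the load balancer.
variable "alb_endpoint" {
  type = string
}

locals {
  # Labels for the whoami container. The router only matches requests
  # sent to the load balancer host AND whose path starts with /whoami.
  whoami_labels = {
    "traefik.enable"                   = "true"
    "traefik.http.routers.whoami.rule" = "Host(`${var.alb_endpoint}`) && PathPrefix(`/whoami`)"
  }
}

# JSON form of the labels, ready to be used as the "dockerLabels" value
# of a container definition.
output "whoami_docker_labels" {
  value = jsonencode(local.whoami_labels)
}
```

Path-based rules like this one let several services share the same host name behind a single Traefik instance.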

Let's deploy our infrastructure

We can now apply our Terraform code, which sets up the ECS cluster, the containers, the IAM resources and a load balancer in front.

After a few minutes, our Traefik server is available.

You can reach the Traefik dashboard at the URL of your load balancer (returned by Terraform), on port 8080.

In the services view, you should see a whoami service, which corresponds to our deployment.

As the screenshot below shows:

  • The provider is ECS
  • We see our "rule" that we had set on the container
  • The 3 deployed containers are clearly visible in the backend.

Accessing the load balancer URL returns the whoami page. When you refresh it, you should see the IP address change, which shows that requests are load-balanced across our service.
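The three backends come from running three copies of the whoami task. A minimal sketch of such an ECS service on Fargate follows; all variable names and the resource label are hypothetical, and the real deployment may be declared differently.

```hcl
# Hypothetical inputs describing where the service runs.
variable "cluster_id" { type = string }
variable "whoami_task_definition_arn" { type = string }
variable "subnet_ids" { type = list(string) }
variable "security_group_ids" { type = list(string) }

resource "aws_ecs_service" "whoami" {
  name            = "whoami"
  cluster         = var.cluster_id
  task_definition = var.whoami_task_definition_arn
  launch_type     = "FARGATE"
  desired_count   = 3 # three tasks, hence three backends in the dashboard

  # Fargate tasks get their own network interface in these subnets.
  # The security groups must let Traefik reach the tasks on port 80.
  network_configuration {
    subnets         = var.subnet_ids
    security_groups = var.security_group_ids
  }

  # No load_balancer block here: the tasks are not registered in the ALB,
  # Traefik discovers them through the ECS provider.
}
```

Changing `desired_count` is then enough to scale the backend; Traefik picks up the new tasks on its next poll.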

To conclude

Traefik adds a new string to its bow by enabling native discovery of services deployed in ECS.

For now, you can tell that the paint is not yet dry, both because of the bug I reported and because the documentation lacks clarity on some points. I therefore do not recommend using this feature in production in its current state: it is a release candidate, and it is better to wait for the stable version.

Nevertheless, this French tech product proves once again that it can evolve and adapt to the needs of its users. There is little doubt that ECS will soon be just another provider for Traefik.