If you haven’t heard of Grafana, you’re missing out. The folks on their team have built an incredible data visualization and monitoring tool designed to beautifully display platform metrics. If you’re interested in learning more about Grafana’s capabilities, check out their website.
I encountered Grafana while working on a monitoring project and wanted to explore the platform’s capabilities in an isolated environment. Cue AWS Fargate. Borrowing the marketing pitch from AWS:
AWS Fargate allows you to run Docker and AWS-hosted containers without having to manage servers or clusters.
To make things better, Grafana has a public Docker image. With all these tools conveniently aligning, I set a goal of deploying a Grafana instance as an AWS Fargate Service via a CloudFormation template.
Note: Deploying resources with CloudFormation will incur normal AWS usage charges. Be mindful of the created resources and delete any stacks you no longer need.
What is AWS CloudFormation?
Starting from the top, AWS CloudFormation allows you to create infrastructure as code. The entire architecture of an application, like the one we’re building today, can be stored in version control and deployed to any AWS account in an automated and secure manner.
CloudFormation template files are written in YAML or JSON and contain all the requirements for a deployment. Below is a YAML example which deploys almost all the requirements to run Grafana out-of-the-box on AWS Fargate.
Curious what all these different settings mean? Keep reading.
Documentation is always appreciated by the next individual to see your code. Make sure to document appropriately!
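As a minimal sketch of what this looks like in a template (the wording of the description is illustrative, not taken from the original file), the top-level Description key holds free text that appears alongside the stack in the CloudFormation console:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
# Free-text summary of what the stack creates; the wording here is illustrative.
Description: Deploys a Grafana container as an AWS Fargate service
```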
Parameters allow for structured customization of a CloudFormation deployment. In this case, the AWS VPC and Subnets are selected upon initiation of the deployment.
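A Parameters section along these lines would produce the VPC and Subnet pickers described above. The parameter names VpcId and SubnetIds are assumptions chosen for this sketch; the AWS-specific parameter types are what make the console render drop-downs of existing resources:

```yaml
Parameters:
  VpcId:                               # hypothetical parameter name
    Type: AWS::EC2::VPC::Id            # console shows a list of existing VPCs
    Description: VPC the Grafana service will run in
  SubnetIds:                           # hypothetical parameter name
    Type: List<AWS::EC2::Subnet::Id>   # console allows selecting several subnets
    Description: Subnets the Grafana tasks may be placed in
```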
This section creates a CloudWatch Log Group to be used with the Grafana deployment.
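A log group resource might look roughly like the following. The logical ID GrafanaLogGroup, the log group name, and the retention period are all assumptions made for illustration:

```yaml
Resources:
  GrafanaLogGroup:                         # hypothetical logical ID
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: /ecs/grafana-fargate   # hypothetical name
      RetentionInDays: 7                   # assumed value; omit to keep logs indefinitely
```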
Clusters house logical groupings of tasks and services in Fargate. To run a service, it must live in a cluster.
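For a Fargate-only setup the cluster definition can be very small. In this sketch the logical ID and cluster name are assumptions:

```yaml
Resources:
  GrafanaCluster:                  # hypothetical logical ID
    Type: AWS::ECS::Cluster
    Properties:
      ClusterName: grafana-cluster # hypothetical name
```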
To run containers in Fargate, they must be defined through a Task Definition. Task Definitions specify which images to use, CPU/memory limits, role definitions, and more.
Note: This CloudFormation template assumes ECS has been used in the account and that an IAM role named ecsTaskExecutionRole (role/ecsTaskExecutionRole) already exists.
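To make the pieces concrete, here is a sketch of a Fargate task definition for the public grafana/grafana image. The logical IDs, family name, and CPU/memory sizes are assumptions, and the block expects a log group resource with the logical ID GrafanaLogGroup to exist elsewhere in the same template:

```yaml
Resources:
  GrafanaTaskDefinition:             # hypothetical logical ID
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: grafana-dashboard      # hypothetical family name
      RequiresCompatibilities:
        - FARGATE
      NetworkMode: awsvpc            # the only network mode Fargate supports
      Cpu: '256'                     # assumed size (0.25 vCPU)
      Memory: '512'                  # assumed size (MiB); must be a valid pairing with Cpu
      # Relies on the pre-existing role mentioned in the note above
      ExecutionRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/ecsTaskExecutionRole
      ContainerDefinitions:
        - Name: grafana
          Image: grafana/grafana     # public image on Docker Hub
          PortMappings:
            - ContainerPort: 3000    # Grafana's default port
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref GrafanaLogGroup   # assumed log group resource
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: grafana
```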
Services are responsible for running and maintaining tasks. They also handle VPC/Subnet configuration, auto-scaling, and load-balancing configurations.
Specifically highlighting the NetworkConfiguration -> AwsvpcConfiguration section, I left out a SecurityGroups configuration option. This will deploy our Service using the VPC’s default Security Group. Feel free to change this as needed.
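A service definition matching that description could be sketched as follows. The service name comes from later in this post; the other logical IDs and the SubnetIds parameter are assumptions and must match whatever the rest of the template defines:

```yaml
Resources:
  GrafanaService:                           # hypothetical logical ID
    Type: AWS::ECS::Service
    Properties:
      ServiceName: grafana-dashboard-service
      Cluster: !Ref GrafanaCluster          # assumed cluster resource
      TaskDefinition: !Ref GrafanaTaskDefinition   # assumed task definition resource
      LaunchType: FARGATE
      DesiredCount: 1                       # keep one Grafana task running
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED           # gives each task a public IP
          Subnets: !Ref SubnetIds           # assumed List<AWS::EC2::Subnet::Id> parameter
          # SecurityGroups is left out on purpose, so the VPC's default
          # Security Group is applied to the task.
```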
Deploying to AWS
Now that we have a template file, we’ll deploy it using the CloudFormation interface as it provides a visual representation of the steps required. For those who prefer the AWS CLI, CloudFormation commands are available.
To begin deploying a template, navigate to the CloudFormation portal in AWS and select Create Stack.
Selecting a Template
The first step to deployment is selecting a CloudFormation template. The example file can be found on my GitHub. Although you can select a local file, I recommend first uploading the file to an S3 bucket.
The local file option also creates a copy in S3, so adding it yourself provides the freedom to define a preferred folder structure and naming convention. After providing a template file, click Next.
Once a template has been selected, details must be configured. In less complex configurations, this is just the stack name. Since our template defined parameters, they are also provided on this page. Select the VPC/Subnets to which this stack will deploy and click Next.
We won’t set any additional options for this deployment, so feel free to click Next. However, options allow tags to be associated with a deployment, specific permissions to be utilized, and the enablement of automated rollbacks and alarms.
There are no surprises in this last step. Review the details and initiate your deployment by selecting Create.
Monitoring the Deployment Process
As the template deploys, the Status column on the CloudFormation home page shows the build’s progress.
Additionally, more granular insight is available by selecting a build and navigating to the Events tab. Once a deployment completes, the status will switch to CREATE_COMPLETE.
If you would like to remove any stacks, simply select the stack name and click Actions -> Delete Stack.
One more step
I mentioned that this template deploys almost everything needed to run Grafana. The exception is modifying the Security Group applied to our Service to allow inbound access to Grafana’s default port: 3000.
Although Security Group ingress rules can be included in CloudFormation templates, I wanted to avoid automatically modifying inbound traffic rules in this demo for security’s sake.
Unless you modified the CloudFormation template, the VPC’s default Security Group should have been used. To modify the inbound rules, select EC2 -> Network & Security -> Security Groups. Select your Security Group, choose Inbound from the available tabs, and edit the existing configuration. I chose to allow all inbound requests to port 3000, but feel free to restrict access as needed.
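If you would rather manage the rule in the template than in the console, a dedicated Security Group is one option. This is a sketch of that alternative, not what the demo template does. The logical ID, parameter names, and the example CIDR range (a reserved documentation range) are assumptions:

```yaml
Parameters:
  VpcId:                                  # hypothetical parameter name
    Type: AWS::EC2::VPC::Id
  AllowedCidr:                            # hypothetical parameter name
    Type: String
    Default: 203.0.113.0/24               # placeholder range; replace with your own
    Description: CIDR range allowed to reach Grafana

Resources:
  GrafanaSecurityGroup:                   # hypothetical logical ID
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow inbound access to Grafana
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 3000                  # Grafana's default port
          ToPort: 3000
          CidrIp: !Ref AllowedCidr
```

The group would then be listed under SecurityGroups in the Service's AwsvpcConfiguration, so the task uses it instead of the VPC's default Security Group.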
Accessing Grafana’s Public IP
Now that inbound access is enabled, let’s find the public address of our created task. Jump to the ECS portal and select the newly created Cluster. Inside, we can see that our grafana-dashboard-service is active.
Changing to the Tasks tab, we have one running task.
Clicking on the task ID brings up an overview of the task details. You’re looking for the public IP. Once found, open a browser window with the public IP on port 3000. For example, if my public IP is 203.0.113.10, I will visit 203.0.113.10:3000.
You should now see the Grafana login page. By default, Grafana credentials are username: admin | password: admin. Be sure to change these after you log in. To get started inside Grafana, follow this guide.
Taking it a Step Further
This works great for test-driving functionality, but it brings to light another set of problems.
First, what if the existing Grafana task unexpectedly fails? Although the service will boot up another task, the new task will have a different public IP, and our previous method of accessing Grafana is no longer valid.
Additionally, if not customized, Grafana persists data (dashboards, users, datasources, etc) through a sqlite3 database within the Grafana container. If this container is no longer running, the sqlite3 database with our previously stored data is unavailable.
Although there are likely many ways to address these issues, here’s how I handled them.
Application Load Balancer and Target Groups
Since our Service can create Grafana tasks dynamically, a static upstream entry point allows the underlying containers to be accessed consistently. Services can be configured with Target Groups, giving Load Balancers visibility into any Task created by the Service.
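The wiring between those pieces can be sketched roughly as below. Every logical ID and parameter name here is an assumption. The sketch also assumes plain HTTP on port 80 and at least two subnets in different Availability Zones, which an Application Load Balancer requires. The Service block is deliberately trimmed to the properties that change:

```yaml
Resources:
  GrafanaLoadBalancer:                    # hypothetical logical ID
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Type: application
      Scheme: internet-facing
      Subnets: !Ref SubnetIds             # assumed parameter; needs two or more AZs

  GrafanaTargetGroup:                     # hypothetical logical ID
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      VpcId: !Ref VpcId                   # assumed parameter
      Protocol: HTTP
      Port: 3000
      TargetType: ip                      # awsvpc tasks register by IP address
      HealthCheckPath: /api/health        # Grafana's health endpoint

  GrafanaListener:                        # hypothetical logical ID
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref GrafanaLoadBalancer
      Protocol: HTTP
      Port: 80
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref GrafanaTargetGroup

  GrafanaService:
    Type: AWS::ECS::Service
    DependsOn: GrafanaListener            # target group must be attached to a listener first
    Properties:
      # ...remaining Service properties stay as they were...
      LoadBalancers:
        - ContainerName: grafana          # must match the container name in the task definition
          ContainerPort: 3000
          TargetGroupArn: !Ref GrafanaTargetGroup
```

With this in place, the load balancer's DNS name stays the same even when the Service replaces a failed task.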
Persistent Data Store
Grafana provides the ability to override default server configurations by adjusting environment variables. One of these options is the database used to store persistent dashboard details. Decoupling the database from the Grafana container allows each built task to reference a persistent data source.
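Grafana maps environment variables of the form GF_<SECTION>_<KEY> onto its configuration file, so the database section can be overridden from the task definition. In this sketch the database type, host, and name are placeholder assumptions, the fragment shows only the container definition, and credentials are intentionally left out:

```yaml
Resources:
  GrafanaTaskDefinition:                  # hypothetical logical ID
    Type: AWS::ECS::TaskDefinition
    Properties:
      # ...remaining task definition properties stay as they were...
      ContainerDefinitions:
        - Name: grafana
          Image: grafana/grafana
          Environment:
            - Name: GF_DATABASE_TYPE
              Value: mysql                              # assumed engine
            - Name: GF_DATABASE_HOST
              Value: grafana-db.example.internal:3306   # placeholder host
            - Name: GF_DATABASE_NAME
              Value: grafana                            # placeholder database name
```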
If you read the previous section and thought, “there’s no way I’m putting my database credentials into plain-text environment variables in a CloudFormation template,” then we had the same thought.
I modified the standard Grafana Dockerfile to utilize a different run.sh script, which retrieves an S3 file and obtains stored secrets based on IAM permissions through the AWS CLI. If you’re curious, here’s a post by AWS regarding the subject.
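For comparison, ECS also supports a Secrets property on container definitions, which injects a value from Systems Manager Parameter Store or Secrets Manager as an environment variable when the task starts. This is a different technique from the run.sh approach described above, shown only as a sketch. The parameter path is hypothetical, and the task execution role would need permission to read it (for example ssm:GetParameters):

```yaml
Resources:
  GrafanaTaskDefinition:                  # hypothetical logical ID
    Type: AWS::ECS::TaskDefinition
    Properties:
      # ...remaining task definition properties stay as they were...
      ContainerDefinitions:
        - Name: grafana
          Image: grafana/grafana
          Secrets:
            - Name: GF_DATABASE_PASSWORD
              # Hypothetical Parameter Store path holding the password
              ValueFrom: !Sub arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/grafana/db-password
```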
Another option for solving the above problems is to forgo AWS Fargate completely and run Grafana on a more traditional server stack. This is a viable solution, but defeats the learning experience of deploying Grafana on Fargate.
Grafana is an incredibly powerful tool and the team deserves a round of applause. Hopefully this guide proved to be useful in getting started with Grafana, Fargate, or learning more about AWS in general.
Enjoy posts like these? Follow me on Twitter @andepaulj