AMI Based Deployment

Andrew Spyker edited this page Sep 9, 2013 · 70 revisions

TL;DR

  • These instructions will create:
    • Two mandatory Netflix OSS Servers - Asgard (for application management) and Eureka (for application endpoint discovery)
    • One mandatory WebSphere eXtreme Scale Server - the catalog server (discovery server for the containers)
    • One cluster of WebSphere eXtreme Scale Containers to host the data
    • One cluster of Acme Air Auth Service micro-service servers
    • One cluster of Acme Air Web App front end web app servers
  • Once you have this basic setup, you can expand the setup to HA configurations and elastically scaled configurations with minimum effort

Licenses

  • Please review and agree to the licenses documented at AMI Licenses before proceeding.

General notes

  • Theoretically any instance size should work given tuning, but we used m1.mediums for all our deployments
  • For simple deployments, always deploy each image to a specific availability zone to decrease data transfer costs. For bigger HA deployments you'll want to balance your instances across availability zones
  • The below configuration avoids the need for a DNS domain and server. If you were to set this up with your own domain name, it would be easier in some areas. This approach was done to help users get started without DNS.
  • Sometimes instances will come up with an ip-10-1-1-1 private hostname and sometimes they will come up with a domU-10-1-1-1 private hostname. If you get a domU address, please kill and re-deploy the instance. As is, these hostnames won't work. This only happens 20% of the time. We need to address it, but for now we haven't.
  • All AMIs are based on Amazon Linux, so when you ssh in you need to log in as 'ec2-user' using your key file and then sudo to root.
  • We expect that other questions will come as people use the AMIs. Please see the AMI FAQ for a running list of clarifications.
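
The private hostname check from the notes above can be scripted. The following is a minimal sketch, assuming only the two prefixes mentioned there (ip- and domU-); the function name check_private_hostname is hypothetical and not part of the AMIs.

```shell
# Classify an EC2 private hostname following the note above:
# ip-* names are usable, domU-* names mean the instance should be re-deployed.
# The function name is illustrative only; it does not ship with the AMIs.
check_private_hostname() {
    case "$1" in
        ip-*)   echo "ok" ;;
        domU-*) echo "redeploy" ;;
        *)      echo "unknown" ;;
    esac
}

# Example (run on the instance itself):
#   check_private_hostname "$(hostname -s)"
```

Running this right after an instance boots gives a quick answer on whether to keep it or terminate and re-deploy it.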

Before doing anything be careful to select the right region and availability zones where appropriate

  • The below is written assuming "us-east-1" and "us-east-1a", but you could do this in any region and availability zone
  • Note that the AMI's are only currently available in "us-east-1" so this will only work in "us-east-1" for now.
  • When using the EC2 console, ensure that you select your region in the upper right drop down. Given I used "us-east-1", I selected "US East (N Virginia)"
  • When using the Asgard console, ensure you select your region from the top middle drop down. I used "us-east-1"
  • When creating clusters, ensure you delete all availability zones except for one that you use consistently. I used "us-east-1a" and deleted all others like "us-east-1b", "us-east-1c".
  • Beyond this tutorial, you can (and should) consider using multiple availability zones and regions for high availability, etc. However, this tutorial tries to simplify the setup and reduce cost by using a single region/availability zone.

Create a ssh key pair

  • Create an ssh key pair called 'acmeair-netflix' under the EC2 dashboard by clicking "Key Pairs" and then the "Create Key Pair" button
  • Download the key file and use this to ssh into your instances
  • For all following deployments or cluster creations use this key pair
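
As a rough sketch of how the downloaded key is used, the helper below prints the ssh command for a given public hostname. It assumes the key was saved as acmeair-netflix.pem in the current directory (the file name and the ssh_command helper are assumptions, not something the AMIs provide) and that you log in as 'ec2-user', the default user on Amazon Linux.

```shell
# Assumption: the EC2 console saved the private key as acmeair-netflix.pem
# in the current directory; override KEY_FILE if it lives elsewhere.
KEY_FILE="${KEY_FILE:-acmeair-netflix.pem}"

# Print (rather than run) the ssh command for an instance's public hostname,
# so it can be reviewed before use. The helper name is illustrative only.
ssh_command() {
    printf 'ssh -i %s ec2-user@%s\n' "$KEY_FILE" "$1"
}

# Example:
#   chmod 400 "$KEY_FILE"   # ssh refuses private keys that others can read
#   ssh_command ec2-1-2-3-4.compute-1.amazonaws.com
```

Once logged in, `sudo su` switches to root, as in the commands used throughout this page.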

Create a security group

  • Create a security group called 'acmeair-netflix'
  • Set up the security group to pass all network traffic
  • Note this is not a production configuration and this is only done for simplicity
  • For all following deployments or cluster creations use this security group

Deploy the Eureka AMI

  • Link to AMI
  • https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-b7fab5de
  • Contains
  • Eureka 1.1.111 snapshot build from 2013-08-27
  • customized with eureka.enableSelfPreservation=false and eureka.registration.enabled=false
  • (Screenshots: deploy screens 1 through 4)
  • Note the Eureka internal DNS name (of the form ip-X-X-X-X.ec2.internal) for later use in Asgard configuration
  • For this walkthrough, we'll consider the hostname of this instance to be 'ip-10-1-1-1'
  • You'll need to replace this with your actual internal DNS name for this instance.
  • Note that the screen shot showed setting the name tag to 'acmeair-eureka'. You should similarly name the next two instances 'acmeair-asgard' and 'acmeair-wxscat'
  • Validate that the instance came up correctly by visiting http://ec2-1-2-3-4.compute-1.amazonaws.com/eureka/ (replace ec2-1-2-3-4.compute-1.amazonaws.com with the correct public hostname)
  • At this point you should see a webpage that shows no instances currently registered with Eureka

Deploy the WXS Catalog AMI

  • Link to AMI
  • https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-87fab5ee
  • Contains
  • Acme Air application to allow later loading of data to the WXS grid
  • Note the WXS Catalog server's internal DNS name (of the form ip-X-X-X-X) for later use in Asgard configuration
    • For this walkthrough, we'll consider the hostname of this instance to be 'ip-10-2-2-2'
    • You'll need to replace this with your actual internal DNS name for this instance.
  • Validate that the instance came up correctly by logging in and looking at listHosts
sudo su
/opt/ObjectGrid/bin/xscmd.sh -c listHosts
  • At this point you should see the following
CWXSI0109W: The command listHosts did not find any started container servers.

Deploy the Asgard AMI

  • Link to AMI
  • https://console.aws.amazon.com/ec2/home?region=us-east-1#launchAmi=ami-affab5c6
  • Contains
  • Asgard 1.2
  • After it boots, go to http://ec2-1-2-3-4.compute-1.amazonaws.com:8080 (replace ec2-1-2-3-4.compute-1.amazonaws.com with the correct public hostname)
  • The username is tomcat
  • The password is acmeair-netflix-2013
  • You can and should modify this password under /opt/apache-tomcat-7.0.42/conf/tomcat-users.xml
  • Fill in the required fields for access id, secret key, and account number
  • After this Asgard goes to every region and downloads all AWS data, so it can take minutes for the loading screen to go away. Be patient and wait until you see the console.
  • You will see a message about Asgard not knowing about Eureka. This is ok for now as we haven't enabled the integration between Asgard and Eureka at this point.
  • Log in to the instance, become root, and stop Asgard
sudo su
killall -9 java
  • Add the Acmeair plugin to your Asgard config
  • issue the following commands
cd /root
cat asgard-post-deploy/Config.groovy.append >> ~/.asgard/Config.groovy
mv asgard-post-deploy/plugins ~/.asgard/plugins
  • Add our EC2 account ('665469383253') which holds the Asgard managed AMI's to the list of providers by adding to the publicResourceAccounts line in the file /root/.asgard/Config.groovy
cloud {
        accountName='prod'
        publicResourceAccounts=['665469383253']
}
  • edit the file /root/.asgard/plugins/AcmeAirUserDataProvider.groovy
  • put in the correct EC2 internal DNS names for the catalog server and eureka server
exportAcmeAirVar('EUREKA_ADDRESS', 'EUREKA_PRIVATE_DNS_NAME_HERE') +
exportAcmeAirVar('WXSCAT_ADDRESS', 'WXSCAT_PRIVATE_DNS_NAME_HERE')
  • Be sure they are the internal non-fully qualified hostnames. Do not use something like 'ip-10-1-1-1.ec2.internal' or 'ec2-54-1-1-1.compute-1.amazonaws.com'. For example (using the above Eureka and WXS catalog example private DNS names):
exportAcmeAirVar('EUREKA_ADDRESS', 'ip-10-1-1-1') +
exportAcmeAirVar('WXSCAT_ADDRESS', 'ip-10-2-2-2')
  • Note that passing server addresses through user data is not a standard Netflix approach. It is done here only for simplicity, to avoid the full-blown DNS configuration that a production environment would use
  • restart asgard and again wait patiently until the web console is available
service tomcatd start
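
The hostname rule for AcmeAirUserDataProvider.groovy (short internal names only) is easy to get wrong when copying names from the EC2 console. The sketch below strips any domain suffix and prints a line in the format used by the plugin file. Both helper names (short_hostname, acmeair_var_line) are hypothetical, and the sketch does not edit the file for you.

```shell
# Strip everything from the first dot onward, so that
# ip-10-1-1-1.ec2.internal becomes ip-10-1-1-1.
short_hostname() {
    printf '%s\n' "${1%%.*}"
}

# Print one exportAcmeAirVar(...) line for pasting into the plugin file.
# $1 is the variable name, $2 is the internal DNS name (short or qualified).
# The trailing " +" that joins the two lines in the file is not added here.
acmeair_var_line() {
    printf "exportAcmeAirVar('%s', '%s')\n" "$1" "$(short_hostname "$2")"
}

# Example:
#   acmeair_var_line EUREKA_ADDRESS ip-10-1-1-1.ec2.internal
#   acmeair_var_line WXSCAT_ADDRESS ip-10-2-2-2.ec2.internal
```

Note that this only shortens internal names. A public name such as 'ec2-54-1-1-1.compute-1.amazonaws.com' would be shortened too but is still the wrong hostname to use.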

General info on deploying clusters in Asgard

  • Note: deploy these clusters in order ensuring each is up before starting the next

Acme Air WXS Container Servers - ami-9dfab5f4

  • Load the asgard web console
  • Create an application by clicking on Apps->Applications->Create New Application
  • Create a cluster by clicking on Cluster->Clusters->Create New Auto Scaling Group
  • Validate that the instance came up correctly by logging in to the WXS Catalog server again and looking at listHosts (this time it should say 1 host after a minute or so)
sudo su
/opt/ObjectGrid/bin/xscmd.sh -c listHosts
  • This should return something like
Starting at: 2013-08-29 16:06:59.864

CWXSI0068I: Executing command: listHosts

Command listHosts is a technology preview.  The command usage and output is subject to change.

*** Show all online hosts for AcmeGrid data grid and mapSet map set.
   ip-10-1-1-1.ec2.internal

  Hosts matching         = 1
  Total known containers = 1
  Total known hosts      = 1

CWXSI0040I: The listHosts command completed successfully.

Ending at: 2013-08-29 16:07:03.951
  • Troubleshooting: If the WXS container host doesn't appear, follow these steps
  • ssh into the catalog server and restart it
sudo su
killall -9 java
service wxscat start
  • Kill the non-working WXS container instance using Asgard's terminate instance button, wait for auto scaling to create a replacement instance, and then repeat the listHosts validation to check that the host is shown
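
The listHosts validation can also be checked without reading the full output. This is a minimal sketch that extracts the "Total known hosts" value from output shaped like the sample above. The known_hosts helper is hypothetical, and since listHosts is marked as a technology preview its output format may change.

```shell
# Read xscmd listHosts output on stdin and print the "Total known hosts"
# value, or 0 when that line is absent (for example when no container
# servers have started yet). The helper name is illustrative only.
known_hosts() {
    awk -F'=' '/Total known hosts/ { gsub(/[ \t]/, "", $2); n = $2 }
               END { print n + 0 }'
}

# Example (as root on the WXS Catalog server):
#   /opt/ObjectGrid/bin/xscmd.sh -c listHosts | known_hosts
```

A result of 1 corresponds to the single-container state this section expects; 0 suggests following the troubleshooting steps.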

Acme Air Auth Service Servers - ami-67b0fe0e

  • Create an application, but use "Web Service" type and then create a cluster as before
  • Validate that the instance came up correctly by visiting http://ec2-1-2-3-4.compute-1.amazonaws.com/eureka/ (replace ec2-1-2-3-4.compute-1.amazonaws.com with the correct public eureka server hostname)
  • After about two minutes, you should see that the Eureka console shows one registered acmeair_auth_service service instance

Acme Air Web App Server - ami-e7b1ff8e

  • Create an application, but use "Web Application" type and then create a cluster as before
  • Validate that the instance came up correctly by visiting http://ec2-1-2-3-4.compute-1.amazonaws.com/eureka/ (replace ec2-1-2-3-4.compute-1.amazonaws.com with the correct public eureka server hostname)
  • After about two minutes, you should see that the Eureka console shows one registered acmeair_auth_service and one registered acmeair_webapp web application instance
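
The registration checks for the auth service and web app can be approximated from the command line instead of the Eureka console. The sketch below counts instances of an application in Eureka's XML registry listing. It assumes the listing is served at /eureka/v2/apps on the Eureka host and that each instance entry carries an <app> element; neither is confirmed by this page, and the count_registered helper is hypothetical.

```shell
# Read a Eureka registry listing (XML) on stdin and print how many
# instance entries belong to the application named in $1.
# Matching is case-insensitive because Eureka may report names in upper case.
# The helper name and the <app> element layout are assumptions.
count_registered() {
    grep -o '<app>[^<]*</app>' | grep -i -c "<app>$1</app>"
}

# Example (the /eureka/v2/apps path is an assumption for this AMI):
#   curl -s http://ec2-1-2-3-4.compute-1.amazonaws.com/eureka/v2/apps \
#     | count_registered acmeair_auth_service
```

If the count stays at 0 well past the two minute mark, fall back to the Eureka console to confirm before terminating anything.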

Load data into the data grid

  • Log in to the WXS Catalog Server and run the loader
sudo su
cd /opt/acmeair-netflix
./gradlew :acmeair-loader:run

Use the Acme Air application

  • load http://ec2-1-2-3-4.compute-1.amazonaws.com:8080 (replace ec2-1-2-3-4.compute-1.amazonaws.com with the correct public hostname of one of your Acme Air Web App Server instances)
  • you should see the Acme Air web application
  • Force known errors to occur
  • Click the "login" menu item twice (pressing the "ok" button each time to log in as uid0)
  • Note that you have to log in twice because Hystrix fails the first attempt due to initial JIT compilation and server startup cost.
  • After clearing this known error, click "Flights" menu item and look for flights between "Paris" and "New York" by using "Browse Flights", and look at your account with the "Account" menu item
  • You can book a set of flights and then cancel them by using the "Checkin" menu item

What to do next

  • You could size up any of the three clusters and show that the new instances are dynamically found by other servers
  • You could size down the web application or auth service and show that the other servers handle this appropriately
  • You could test HA by terminating an auth service instance and observe how the auto scaling group recovers that instance
  • You could make changes to a running instance (switch what application code is being run) and capture a new AMI and create a new version of a cluster or a whole new cluster
  • You could make Eureka HA
  • You could do load testing against the application using the main Acme Air JMeter scripts
  • You could secure the environment with better security groups
  • You could integrate with front end ELB's for the Web Application
  • You could upgrade the Netflix OSS components and servers
  • Or many other experiments ...