Running JPPF on Amazon's EC2, Rackspace, or other Cloud Services

From JPPF 5.2 Documentation




1 Java Cloud Toolkit

Apache jclouds® provides an open source Java toolkit for accessing EC2, Rackspace, and several other cloud providers. Its ComputeService interface provides methods for creating servers from either the provider's images or your own saved images, transferring files, executing commands in a new server's shell, destroying servers, and more. Servers can be managed individually or in groups, allowing your JPPF client to provision and configure servers on the fly.

When you create a cloud server with this toolkit, you programmatically have access to its NodeMetadata, including IP addresses and login credentials. By creating your driver and nodes in the right sequence, you can pass IP information between them, create different server types, and use job SLAs based on those IPs to direct different types of jobs to the servers you want.
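As a sketch of what provisioning JPPF node servers might look like with jclouds (the provider id, credentials, group name and sizing below are placeholders, and error handling is omitted):

```java
import java.util.Set;

import org.jclouds.ContextBuilder;
import org.jclouds.compute.ComputeService;
import org.jclouds.compute.ComputeServiceContext;
import org.jclouds.compute.RunNodesException;
import org.jclouds.compute.domain.NodeMetadata;
import org.jclouds.compute.domain.Template;

public class ProvisionNodes {
  public static void main(String[] args) throws RunNodesException {
    // build a ComputeService for EC2; "rackspace-cloudservers-us" works similarly
    ComputeService compute = ContextBuilder.newBuilder("aws-ec2")
        .credentials("ACCESS_KEY", "SECRET_KEY")   // placeholder credentials
        .buildView(ComputeServiceContext.class)
        .getComputeService();

    // describe the servers to create; templateBuilder() fills in sensible defaults
    Template template = compute.templateBuilder().minCores(8).build();

    // provision a group of servers to run JPPF nodes
    Set<? extends NodeMetadata> nodes =
        compute.createNodesInGroup("jppf-nodes", 4, template);

    for (NodeMetadata node : nodes) {
      // the private addresses are what the nodes should use to reach the driver
      System.out.println(node.getId() + " -> " + node.getPrivateAddresses());
    }
    compute.getContext().close();
  }
}
```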

2 Server discovery

Cloud servers do not allow multicast network communication, so JPPF nodes must know ahead of time which server to use, rather than relying on the auto-discovery feature. The server property file must therefore set:

jppf.discovery.enabled = false
jppf.peer.discovery.enabled = false

And the node property file must set:

jppf.discovery.enabled = false
jppf.server.host = IP_or_DNS_hostname

Similarly the client must set:

jppf.discovery.enabled = false
jppf.drivers = driverA
driverA.jppf.server.host = IP_or_DNS_hostname
driverA.jppf.server.port = 11111

Amazon, Rackspace, and others charge for network access to a public IP, so you'll want the nodes to communicate with the driver's internal 10.x.x.x address rather than its public IP. More on this below.

3 Firewall configuration

EC2 places all instances into “security groups” that define which network access is allowed. Make sure to start JPPF servers with a security group that allows access to the standard port 11111 and, if you use the management tools remotely, port 11198. You may also want to limit these rules to internal IPs if your clients, servers and nodes all run within EC2.

Rackspace cloud servers have no default restrictions on private IPs and ports at the same datacenter, so JPPF will work out-of-the-box on an all-cloud network. If added security is desired, you can create an Isolated Cloud Network with your own set of private IP addresses (192.168.x.x). In order to associate cloud servers with a dedicated (managed) server at Rackspace, you must request to configure RackConnect to merge your cloud and managed accounts and use all-private IPs.

4 Instance type

EC2 and Rackspace instance types vary in the number of cores and the amount of memory available, so you may want a different node property file and startup script for each instance type you start, with an appropriate number of threads. For instance, on an EC2 c1.xlarge instance with 8 cores, you might want one additional thread so the CPU stays busy when any one thread is waiting on I/O:

jppf.processing.threads = 9

If your tasks require more I/O, you may need to experiment to find the best completion rate. You may want to configure multiple JPPF nodes on the same server.
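The "cores + 1" starting point above can be derived at node startup from the detected core count; a minimal sketch (the class name and the printed property line are illustrative, and the +1 heuristic is simply the one suggested in the text):

```java
// Derive a starting value for jppf.processing.threads from the CPU count.
public class ThreadCountHint {
  // one extra thread so the CPU has work while any single thread blocks on I/O
  public static int processingThreads(int cores) {
    return cores + 1;
  }

  public static void main(String[] args) {
    int cores = Runtime.getRuntime().availableProcessors();
    System.out.println("jppf.processing.threads = " + processingThreads(cores));
  }
}
```

For I/O-heavy tasks you would treat this only as a starting point and tune the value experimentally, as noted above.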

5 IP Addresses

All EC2 and Rackspace instances have both a public IP address (assigned automatically, or an Elastic IP you have reserved) and a private internal 10.x.x.x address. You are charged for traffic between availability zones regardless of address, and even within the same zone if you use the external IP. So you'll want the systems to connect using the 10.x.x.x addresses.

Unfortunately, this complicates things a bit. Ideally you probably want to set up a pre-configured node image (AMI) and launch instances from that image as needed for your JPPF tasks. But you may not know the internal IP of the driver at the time the image is created, and you don't want to spend time creating a new AMI each time you launch a new driver. The following approaches will probably work:

One solution is to use a static elastic IP that you will always associate with the JPPF driver and eat the cost of EC2 traffic. It isn't that much really...

Or you can use DNS to publish your 10.x.x.x IP address for the driver before launching nodes, and configure the node AMI to use a fixed DNS hostname.

Or you can do a little programming with the EC2 or Rackspace API to pass the information around; this is the recommended approach. To this end, JPPF provides a configuration hook which allows a node to read its configuration from a source other than a static, local configuration file. For example, the node's configuration plugin can read a properties file from S3 instead of a file already on the node, while a matching startup task on the driver instance publishes the appropriate properties file to S3.
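A minimal sketch of the node side of this hook, kept dependency-free here: in an actual deployment the class would implement JPPF's org.jppf.utils.JPPFConfiguration.ConfigurationSource interface (whose single method returns the raw property stream) and be declared via the -Djppf.config.plugin system property. The URL, e.g. a readable S3 object, is an assumption for illustration:

```java
import java.io.InputStream;
import java.net.URL;
import java.util.Properties;

// Alternate configuration source: fetch the node's properties from a URL
// (such as an S3 object published by a driver startup task) instead of a
// local file. Class and method names mirror JPPF's ConfigurationSource hook.
public class RemoteConfigSource {
  private final String url;

  public RemoteConfigSource(String url) {
    this.url = url;
  }

  // mirrors ConfigurationSource.getPropertyStream(): return the raw stream
  public InputStream getPropertyStream() throws Exception {
    return new URL(url).openStream();
  }

  // convenience: load the stream into a Properties object
  public Properties load() throws Exception {
    Properties props = new Properties();
    try (InputStream in = getPropertyStream()) {
      props.load(in);
    }
    return props;
  }
}
```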

As of JPPF 4.1, you can use the getPrivateAddresses() method of the jclouds NodeMetadata class to obtain the private IP of a server, then use the runScriptOnNode() method of the ComputeService class to set an environment variable or publish the IP in a file, which can then be referenced using substitutions or includes.

There are lots of other approaches that will give you the same results – just have the server publish its location to some known location (including possibly the node itself) and have the node read this and dynamically create its properties instead of having a fixed file.
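The "publish your location" side of this pattern can be sketched with only the JDK: the driver machine finds its first site-local (10.x.x.x, 172.16.x.x or 192.168.x.x) address and writes it out as a properties fragment that nodes can pick up via an include or substitution. The output file name is illustrative:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Enumeration;

// Driver-side helper: publish the driver's private address to a well-known
// location that nodes can read when building their configuration.
public class PublishDriverAddress {
  // scan the network interfaces for the first site-local address
  public static String findSiteLocalAddress() throws Exception {
    Enumeration<NetworkInterface> ifaces = NetworkInterface.getNetworkInterfaces();
    while (ifaces.hasMoreElements()) {
      Enumeration<InetAddress> addrs = ifaces.nextElement().getInetAddresses();
      while (addrs.hasMoreElements()) {
        InetAddress addr = addrs.nextElement();
        if (addr.isSiteLocalAddress()) return addr.getHostAddress();
      }
    }
    return null; // no private address found
  }

  public static void main(String[] args) throws Exception {
    String ip = findSiteLocalAddress();
    // write a fragment a node can include; in practice this would go to S3,
    // DNS, or any other location the nodes know about
    Files.write(Paths.get("driver-address.properties"),
                ("jppf.server.host = " + ip + "\n").getBytes());
  }
}
```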


JPPF Copyright © 2005-2018