Read-replica Support for Apache HBase

With Amazon EMR release 5.7.0, you can now create read-replica Apache HBase clusters that point to the same underlying HBase tables in Amazon S3. Apache HBase is a distributed, non-relational database built for random, strictly consistent, real-time access to tables with billions of rows and millions of columns. By using read-replicas, you can increase availability by creating HBase clusters in different Amazon EC2 Availability Zones that read from the same dataset in Amazon S3.

Amazon EMR now supports launching clusters with custom Amazon Linux AMIs

You can now create Amazon EMR clusters with custom Amazon Machine Images (AMI) running Amazon Linux. This enables you to preload additional software on your AMI and use AMIs that you customize and control. You can also encrypt the Amazon EBS root volume of your AMIs with AWS Key Management Service (KMS) keys. Additionally, you can now adjust the Amazon EBS root volume size for instances in your Amazon EMR cluster. Previously, this was fixed at 10 GiB.

Introducing Amazon EC2 G3 Instances

You can now launch G3 instances, the latest generation of Amazon EC2 Accelerated Compute Instances. G3 instances make it easy to procure a powerful combination of GPU, CPU, and host memory for workloads such as 3D rendering, 3D visualizations, graphics-intensive remote workstations, video encoding, and virtual reality applications.

MRI takes to the Cloud

Illinois is deeply tied to the development of MRI technology. For example, the Nobel Prize-winning inventor of the technology, Paul Lauterbur, served on the faculty at Illinois for 22 years. And now Illinois researchers are innovating again, turning to the power of the cloud to make processing MRI data faster and more cost-effective than ever.

Brain imaging. Image provided by Dr. Sutton.

Dr. Brad Sutton talks about how moving MRI data processing to Amazon Web Services (AWS) is helping researchers at Illinois contribute to our evolving understanding of the human brain.

Technology Services (Tech Services): What does a typical project at the Beckman Institute’s Biomedical Imaging Center (BIC) look like?

Brad Sutton (BPS): Our most common imaging projects are neuroimaging, where research groups are trying to identify brain-based biomarkers of physiological differences in the brain, either between two groups of subjects (such as younger and older adults) or before and after an intervention. Interventions could include a wide variety of things, including aerobic exercise, cognitive training, or nutritional supplements.

We collect 1-2 hours of neuroimaging data on our MRI scanners, and then a variety of post-acquisition image processing and statistical steps need to be done to determine what is different in the brain measures between the two groups. Our MRI acquisitions provide information about brain structure, anatomy, blood flow, connectivity between different regions of the brain, and brain function during particular tasks or at rest. We also have methods that measure the mechanical properties of the brain or other aspects of brain physiology.

Image processing steps can be very computationally intensive. One typical measure that we may get (which is the one I did on AWS) is a structural connectivity map, looking at the white matter (cabling in the brain) to see the likelihood that different regions in the brain are wired together. A typical workflow to get structural connectivity would require about 16 hours of processing to segment an individual’s brain into distinct, labeled regions, about 16 hours to process diffusion-weighted data to determine which direction the cabling is going at every region in the brain, and then about 12 hours to determine which regions are connected to which other regions. When a typical study includes 50-100 subjects, this can exceed the computational capabilities of a particular lab. And this is for only one type of measure that we will extract from a neuroimaging session. Other types of measures, such as functional connectivity, may take similar amounts of time.
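To make the scale concrete, here is a minimal sketch of the arithmetic behind the figures quoted above. It assumes the three steps run one after another and that each subject contributes a single scan; the names `STEP_HOURS`, `hours_per_subject`, and `study_hours` are made up for illustration and are not part of any BIC tooling.

```python
# Approximate per-subject step durations in hours, as quoted in the interview.
# Assumption: the steps run sequentially, one scan per subject.
STEP_HOURS = {
    "segmentation": 16,          # label distinct brain regions
    "diffusion_processing": 16,  # estimate fiber direction in each region
    "connectivity": 12,          # determine which regions are connected
}


def hours_per_subject(step_hours=STEP_HOURS):
    """Total compute time for one subject's structural connectivity map."""
    return sum(step_hours.values())


def study_hours(n_subjects, step_hours=STEP_HOURS):
    """Total compute-hours for a study with n_subjects scans."""
    return n_subjects * hours_per_subject(step_hours)


if __name__ == "__main__":
    print("per subject:", hours_per_subject(), "hours")
    for n in (50, 100):
        print(n, "subjects:", study_hours(n), "compute-hours")
```

Under these assumptions, one subject takes about 44 hours, so a 50-100 subject study needs roughly 2,200 to 4,400 compute-hours for this one measure alone.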

Tech Services: Why did you want to use AWS? In particular, what features of AWS make your work/BIC’s work easier?

BPS: The project that I ran on AWS had ~230 subjects who were part of an intervention. For each subject, we had measures taken pre-intervention and post-intervention, meaning 460 data sets from which to extract the structural connectivity measures described above. We had run some test analyses on our own private cloud at BIC using the Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC, www.nitrc.org) computational images, which have all the common neuroimaging software installed. Our private cloud runs Eucalyptus and is similar to AWS, but it can launch only 30 dual-core workstations. The data in this project had to be analyzed quickly in order to meet grant-sponsor deadlines and to get our hypotheses addressed before the data set had to be shared with other neuroimaging researchers. Given that we had processing pipelines tested on the NITRC images and that these NITRC images are available for use on AWS, we decided to use the scaling capabilities of AWS to run all 460 analysis runs in parallel, potentially getting our entire processing pipeline done on all subjects in about 2 days.
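The difference between 30 workstations and full parallelism can be estimated with a rough calculation. The sketch below assumes each analysis run takes the roughly 44 hours (16 + 16 + 12) described earlier in the interview, that each machine handles one run at a time, and that runs are processed in equal batches; the function name `wall_clock_days` is hypothetical, and real scheduling would be less tidy.

```python
import math

# Assumption: one full structural connectivity run takes 16 + 16 + 12 hours.
HOURS_PER_RUN = 16 + 16 + 12
N_RUNS = 460  # ~230 subjects, pre- and post-intervention


def wall_clock_days(n_runs, n_machines, hours_per_run=HOURS_PER_RUN):
    """Estimate elapsed days when each machine handles one run at a time."""
    batches = math.ceil(n_runs / n_machines)
    return batches * hours_per_run / 24


if __name__ == "__main__":
    print("30 machines: %.1f days" % wall_clock_days(N_RUNS, 30))
    print("460 machines: %.1f days" % wall_clock_days(N_RUNS, N_RUNS))
```

With these assumptions, the 30-workstation private cloud would need about a month of continuous processing, while 460 parallel instances finish in just under two days, which matches the "about 2 days" estimate.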

Tech Services: What were you using before AWS?

BPS: The Biomedical Imaging Center has a Neuroimaging Compute Cloud, called BICNICC, that runs Eucalyptus and enables users of BIC to access properly configured machines to process their data. Some disk images on BICNICC include custom image reconstruction software that enables users to collect advanced acquisitions for which the MRI scanner cannot produce images on its own. BICNICC gives us a flexible environment for distributing tested workflows, though with limited scaling. The NITRC image is the most widely used image on BICNICC because it includes the main neuroimaging processing software packages. Many psychology and neuroscience users only have Windows workstations in their own labs, but this neuroimaging software must be run on a Linux workstation. The BICNICC resource, coupled with the NITRC image, gives users access to a Linux workstation through their web browser.

Tech Services: Can you describe the MRI processing workflow before and after AWS? What has changed?

BPS: AWS has enabled us to scale our processing capabilities. It is an ideal setup: users can test something locally without additional costs, on software similar to what they will have access to on AWS. They can run small pilot studies for minimal costs. When they are ready to scale and run their analysis, the transition requires only minimal changes to their processing scripts: they access data through S3 on AWS instead of through a network drive mount. An additional benefit of running this large INSIGHT data set on AWS is that it has provided useful information on how much this scaling costs. We were surprised by the low costs associated with storage of a very large dataset. We were also surprised to be able to get this much computing power at basically a cost of $5 per subject for analysis. When compared to the costs of acquiring the data, this is a very small price to get all the results almost immediately.
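The kind of script change described here might look something like the following sketch, which is not the BIC pipeline itself. It assumes inputs are identified either by a path on a mounted drive or by an `s3://` URI, and that the `boto3` library is installed on the AWS side; the helper names `split_s3_uri` and `fetch_input` and any bucket or file names are hypothetical.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_input(source, workdir):
    """Copy one input file into workdir from S3 or from a mounted drive."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    if source.startswith("s3://"):
        # Imported lazily so local test runs do not need boto3 installed.
        import boto3

        bucket, key = split_s3_uri(source)
        target = workdir / Path(key).name
        boto3.client("s3").download_file(bucket, key, str(target))
    else:
        # Local case: the data sits on a network drive mount.
        target = workdir / Path(source).name
        shutil.copy(source, target)
    return target
```

The rest of a processing script would then call `fetch_input` and work on the returned local path, so moving from the local cloud to AWS only changes the input location that is passed in.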

Tech Services: What does increased processing capacity allow you to do?

BPS: There are quite a few benefits to this scaling. First, we can try several different parameters related to analysis. We can also see how sensitive our results are to the specific parameters that we are using. Often, local computing resources are only enough for a single analysis run. Now, we can explore more about how fine-scale parcellation of the brain into more regions impacts the specificity of the structural connections in the brain.

Second, we can do large intermediate runs in order to be ready for a grant-sponsor site visit or to have preliminary results for a conference presentation. Previously, we would either make an analysis decision based on software available at the start of a multi-year project and try to keep up with the data acquisition, or wait until the end and then spend a significant, focused period applying the analysis to all the data. The AWS workflow enables us to update pipelines as new software becomes available.

Tech Services: Can you name a few specific projects that have really benefitted from the switch to AWS?

BPS: Since we just did this run in the last couple of months, other users are just now starting to include the modest costs of AWS in their grant submissions. We will be able to roll in wide-scale use of AWS resources as new projects start.

If you are a University of Illinois researcher interested in using AWS in your work, please visit https://aws.illinois.edu/ or email aws-support@illinois.edu for more information.