Cloud Computing with R

Author: Rudradev Sengupta

Cloud computing has become a buzzword in the context of high-performance computing. Over the past few years it has grown rapidly and has attracted the attention of many researchers across different fields. Put simply, cloud computing is “computing done in the cloud”, that is, over the internet: it is a type of internet-based computing that lets users access shared resources and services which are not physically available to them. Its use has increased significantly because it can be accessed from anywhere in the world, as long as the user has an active internet connection. It reduces the need to own and maintain local, on-site hardware and software and, therefore, reduces cost. In summary, cloud computing can be described as a scalable, customisable computing service, available for rent and maintained by a third-party provider.


There are four cloud deployment models, suited to different business needs:

• Private Cloud: resources for the dedicated use of a single organization.
• Community Cloud: resources for collaborative use by several organizations belonging to a community with shared interests.
• Public Cloud: resources owned by a third-party provider and made available to any individual or organisation that pays the provider for their use.
• Hybrid Cloud: a cloud computing environment which contains an orchestrated combination of resources from private, public and community cloud environments.

Similarly, based on the resources offered, there are three main service models:

• Software as a Service (SaaS): users are provided access to applications and software which are maintained by the provider.
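• Infrastructure as a Service (IaaS): users are provided access to virtualised computing infrastructure (servers, storage and networking) maintained by the provider, on which they install and manage their own operating systems and applications.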
• Platform as a Service (PaaS): users can develop and run their software solutions on cloud platforms which are maintained by the provider.

Powerful hardware is expensive. Moreover, institutional computing resources can be difficult to procure, maintain and use, and even when a user has access to them, they are shared with other users. For example, one might have to wait a few hours or days to get the number of cores a particular job needs, whereas in the cloud one can have almost instant access to one's own cluster.

This is why the cloud services provided by Amazon and Microsoft have become so popular. Both offer an easy and relatively inexpensive way to use computational resources in the cloud. They also offer a free tier (with limited resources) that any user can select to test whether the service is suitable for their purpose. The details of the free trial for the Azure cloud are available at . Details for the Amazon cloud are available at . If the user decides to use these services, it is easy to scale up and obtain different types of instances (virtual machines) to suit their requirements. For example, one user might need memory-optimized instances while another might need compute-optimized instances, and the resources can be customised to the requirements of each project.

These two platforms have become very popular among statisticians because of their integration with R.

Figure 1: RStudio login page in the Amazon cloud

In the Amazon cloud, AMIs (Amazon Machine Images) with RStudio pre-installed are available; these are maintained by Louis Aslett. When users launch an instance of one of these AMIs, they can log in to RStudio in the cloud through a web browser (Figure 1), and the instance is configured and ready for use within a few minutes. When finished, the user can either shut down and release (terminate) the instance, or keep it for future use, depending on the budget and other requirements. If you frequently launch instances with a specific configuration, it is straightforward to create your own AMI with the required software or R packages pre-installed, so that you do not need to reconfigure it every time you launch an instance.

More information about different types of instances can be found at .

The pricing details are available at

In 2017, Microsoft introduced the “doAzureParallel” R package, which enables users to harness Azure’s computing resources from within their local R session (Figure 2). It is a parallel backend for the foreach package that runs the user's parallel workload on a cluster of virtual machines in Azure. The package is well documented at . Demo code is available to help users quickly gain a better understanding of the package. The Azure cloud offers a set of services very similar to those available in the Amazon cloud. The pricing details for Azure are available at

Figure 2: The architecture of the “doAzureParallel” R package


Copyright: Image – kalawin jongpo/Getty Images