Ensure that the required system Dependencies are installed.
Check that your JAVA_HOME environment variable points to the Java JDK folder, that the GRADLE_HOME environment variable points to the GRADLE folder, and the gradle binary is in the PATH environment variable.
COMPSs will be installed within the $HOME/.local/ folder (or alternatively within the active virtual environment).
$ pip install pycompss -v
Important
Please, update the environment after installing COMPSs:
$ source ~/.bashrc # or alternatively reboot the machine
If installed within a virtual environment, deactivate and activate
it to ensure that the environment is properly updated.
Warning
If using Ubuntu 18.04 or higher, you will need to comment
some lines of your .bashrc and do a complete logout.
Please, check the Post installation
Section for detailed instructions.
Ensure that the required system Dependencies are installed.
Check that your JAVA_HOME environment variable points to the Java JDK folder, that the GRADLE_HOME environment variable points to the GRADLE folder, and the gradle binary is in the PATH environment variable.
COMPSs will be installed within the /usr/lib64/pythonX.Y/site-packages/pycompss/ folder.
$ sudo -E pip install pycompss -v
Important
Please, update the environment after installing COMPSs:
$ source /etc/profile.d/compss.sh # or alternatively reboot the machine
Warning
If using Ubuntu 18.04 or higher, you will need to comment
some lines of your .bashrc and do a complete logout.
Please, check the Post installation
Section for detailed instructions.
Ensure that the required system Dependencies are installed.
Check that your JAVA_HOME environment variable points to the Java JDK folder, that the GRADLE_HOME environment variable points to the GRADLE folder, and the gradle binary is in the PATH environment variable.
Since the PyCOMPSs CLI package is available in PyPI (pycompss-cli), it can be easily installed with pip as follows:
$ python3 -m pip install pycompss-cli
A complete guide about the PyCOMPSs CLI installation and usage can be found in the PyCOMPSs CLI Section.
Tip
Please, check the PyCOMPSs CLI Installation Section for further information regarding the requirements installation and troubleshooting.
Warning
For macOS distributions, only installations local to the user are supported (both with pip and building
from sources). This is due to the System Integrity Protection (SIP) implemented in the newest versions of
macOS, which does not allow modifications in the /System directory, even with root permissions on the
machine.
Write your first app
Choose your flavour:
Application Overview
A COMPSs application is composed of three parts:
Main application code: the code that is executed sequentially and
contains the calls to the user-selected methods that will be executed
by the COMPSs runtime as asynchronous parallel tasks.
Remote methods code: the implementation of the tasks.
Task definition interface: It is a Java annotated interface which
declares the methods to be run as remote tasks along with metadata
information needed by the runtime to properly schedule the tasks.
The main application file name has to match the name of the main class and
start with a capital letter; in this case it is Simple.java. The Java
annotated interface filename is the application name + Itf.java; in this
case it is SimpleItf.java. And the code that implements the remote
tasks is defined in the application name + Impl.java file; in this
case it is SimpleImpl.java.
All code examples are in the /home/compss/tutorial_apps/java/ folder
of the development environment.
Main application code
In COMPSs, the user’s application code is kept unchanged: no API calls
need to be included in the main application code in order to run the
selected tasks on the nodes.
The COMPSs runtime is in charge of replacing the invocations to the
user-selected methods with the creation of remote tasks, also taking care
of the access to files where required. Let’s consider the Simple
application example that takes an integer as input parameter and
increases it by one unit.
The main application code of Simple application is shown in the following
code block. It is executed sequentially until the call to the increment()
method. COMPSs, as mentioned above, replaces the call to this method with
the generation of a remote task that will be executed on an available node.
Code 1 Simple in Java (Simple.java)
package simple;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import simple.SimpleImpl;

public class Simple {

    public static void main(String[] args) {
        String counterName = "counter";
        int initialValue = Integer.parseInt(args[0]);

        //--------------------------------------------------------------//
        // Creation of the file which will contain the counter variable //
        //--------------------------------------------------------------//
        try {
            FileOutputStream fos = new FileOutputStream(counterName);
            fos.write(initialValue);
            System.out.println("Initial counter value is " + initialValue);
            fos.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

        //----------------------------------------------//
        //          Execution of the program            //
        //----------------------------------------------//
        SimpleImpl.increment(counterName);

        //----------------------------------------------//
        //    Reading from an object stored in a File   //
        //----------------------------------------------//
        try {
            FileInputStream fis = new FileInputStream(counterName);
            System.out.println("Final counter value is " + fis.read());
            fis.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

}
Remote methods code
The following code contains the implementation of the remote method of
the Simple application that will be executed remotely by COMPSs.
Task definition interface
This Java interface is used to declare the methods to be executed
remotely along with Java annotations that specify the necessary metadata
about the tasks. The metadata can be of three different types:
For each parameter of a method, the data type (currently File type,
primitive types and the String type are supported) and its
direction (IN, OUT, INOUT, COMMUTATIVE or CONCURRENT).
The Java class that contains the code of the method.
The constraints that a given resource must fulfill to execute the
method, such as the number of processors or main memory size.
The task description interface of the Simple app example is shown in the
following figure. It includes the description of the Increment() method
metadata. The method interface contains a single input parameter, a string
containing a path to the file counterFile. In this example there are
constraints on the minimum number of processors and minimum memory size
needed to run the method.
Code 3 Interface of the Simple application (SimpleItf.java)
A COMPSs Java application needs to be packaged in a jar file
containing the class files of the main code, of the methods
implementations and of the Itf annotation. This jar package can be
generated using the commands available in the Java SDK or by creating your
application as an Apache Maven project.
To integrate COMPSs in the Maven compile process, you just need to add the
compss-api artifact as a dependency in the application project.
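As a sketch, the dependency entry in the project's pom.xml could look like the following (the groupId and version shown here are illustrative assumptions, not taken from this guide; check your COMPSs installation for the exact coordinates):

```xml
<!-- Hypothetical coordinates; verify against your COMPSs release -->
<dependency>
    <groupId>es.bsc.compss</groupId>
    <artifactId>compss-api</artifactId>
    <version>3.1</version>
</dependency>
```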
To build the jar in the Maven case, use the following command:
$ mvn package
Next we provide a set of commands to compile the Java Simple application (detailed at
Java Sample applications).
$ cd tutorial_apps/java/simple/src/main/java/simple/
~/tutorial_apps/java/simple/src/main/java/simple$ javac *.java
~/tutorial_apps/java/simple/src/main/java/simple$ cd ..
~/tutorial_apps/java/simple/src/main/java$ jar cf simple.jar simple/
~/tutorial_apps/java/simple/src/main/java$ mv ./simple.jar ../../../jar/
In order to properly compile the code, the CLASSPATH variable has to
contain the path of the compss-engine.jar package. The default COMPSs
installation automatically adds this package to the CLASSPATH; please
check that your CLASSPATH environment variable contains the
compss-engine.jar location by running the following command:
$ echo $CLASSPATH | grep compss-engine
If the result of the previous command is empty, it means that the
compss-engine.jar package is missing from your classpath. We recommend
loading the variable automatically by editing the .bashrc file:
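For example, a line like the following would do (the path below assumes the default system-wide installation under /opt/COMPSs; adjust it to your actual installation):

```shell
# Add the COMPSs engine jar to the classpath (path assumes the default installation)
export CLASSPATH=$CLASSPATH:/opt/COMPSs/Runtime/compss-engine.jar
```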
In addition to Java, COMPSs supports the execution of applications
written in other languages by means of bindings. A binding manages the
interaction of the non-Java application with the COMPSs Java runtime,
providing the necessary language translation.
Let’s write your first Python application parallelized with PyCOMPSs.
Consider the following code:
Code 4 increment.py
import time

from pycompss.api.api import compss_wait_on
from pycompss.api.task import task


@task(returns=1)
def increment(value):
    time.sleep(value * 2)  # mimic some computational time
    return value + 1


def main():
    values = [1, 2, 3, 4]
    start_time = time.time()
    for pos in range(len(values)):
        values[pos] = increment(values[pos])
    values = compss_wait_on(values)
    assert values == [2, 3, 4, 5]
    print(values)
    print("Elapsed time: " + str(time.time() - start_time))


if __name__ == '__main__':
    main()
This code increments the elements of an array (values) by
iteratively calling the increment function.
The increment function sleeps for the number of seconds indicated by the
value parameter to represent some computational time.
In a normal Python execution, each element of the array is
incremented one after the other (sequentially), accumulating the
computational time.
PyCOMPSs is able to parallelize this loop thanks to its @task
decorator, and synchronize the results with the compss_wait_on
API call.
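Conceptually, what the @task decorator and compss_wait_on achieve is similar to submitting work as futures and then collecting the results. The following is a rough plain-Python analogy using concurrent.futures, not the PyCOMPSs API (the thread pool and the scaled-down sleep times are illustrative choices of this sketch):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def increment(value):
    time.sleep(value * 0.1)  # scaled-down stand-in for computational time
    return value + 1


values = [1, 2, 3, 4]
with ThreadPoolExecutor(max_workers=4) as pool:
    # Submitting returns immediately with futures,
    # much like calling a @task-decorated function.
    futures = [pool.submit(increment, v) for v in values]
    # Gathering results blocks until all tasks finish,
    # playing the role of compss_wait_on.
    results = [f.result() for f in futures]

print(results)  # [2, 3, 4, 5]
```

Unlike this sketch, PyCOMPSs does all of this transparently: the user code keeps its sequential shape and the runtime decides where and when each task runs.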
Note
If you are using the PyCOMPSs CLI (pycompss-cli),
it is time to deploy the COMPSs environment within your current folder:
$ pycompss init
Please, be aware that the first time it needs to download the docker image from the
repository, and it may take a while.
Copy and paste the increment code into increment.py.
Execution
Now let’s execute increment.py. To this end, we will use the
runcompss script provided by COMPSs:
$ runcompss -g increment.py
[Output in next step]
Or alternatively, the pycompss run command if using the PyCOMPSs CLI
(which wraps the runcompss command and launches it within the COMPSs’ docker
container):
$ pycompss run -g increment.py
[Output in next step]
Note
The -g flag enables the task dependency graph generation (used later).
The runcompss command has a lot of supported options that can be checked with the -h flag.
They can also be used within the pycompss run command.
Tip
It is possible to run also with the python command using the pycompss module,
which accepts the same flags as runcompss:
$ python -m pycompss -g increment.py # Parallel execution
[Output in next step]
Having PyCOMPSs installed also enables running the same code sequentially without the need of removing the PyCOMPSs syntax.
$ runcompss -g increment.py
[  INFO] Inferred PYTHON language
[  INFO] Using default location for project file: /opt/COMPSs/Runtime/configuration/xml/projects/default_project.xml
[  INFO] Using default location for resources file: /opt/COMPSs/Runtime/configuration/xml/resources/default_resources.xml
[  INFO] Using default execution type: compss

----------------- Executing increment.py --------------------------

WARNING: COMPSs Properties file is null. Setting default values
[(433)    API]  -  Starting COMPSs Runtime v3.1
[2, 3, 4, 5]
Elapsed time: 11.5068922043
[(4389)   API]  -  Execution Finished

------------------------------------------------------------
Nice! It ran successfully on my 8-core laptop, we got the expected output,
and PyCOMPSs was able to run the increment.py application in almost half
of the time required by the sequential execution. What happened under the hood?
COMPSs started a master and one worker (by default configured to execute up to four tasks at the same time)
and executed the application (offloading the tasks' execution to the worker).
Let’s check the task dependency graph to see the parallelism that
COMPSs has extracted and taken advantage of.
Task dependency graph
COMPSs stores the generated task dependency graph within the
$HOME/.COMPSs/<APP_NAME>_<00-99>/monitor directory in dot format.
The generated graph is the complete_graph.dot file, which can be
displayed with any dot viewer.
Tip
COMPSs provides the compss_gengraph script which converts the
given dot file into a pdf.
$ cd $HOME/.COMPSs/increment.py_01/monitor
$ compss_gengraph complete_graph.dot
$ evince complete_graph.pdf # or use any other pdf viewer you like
It is also available within the PyCOMPSs CLI:
$ cd $HOME/.COMPSs/increment.py_01/monitor
$ pycompss gengraph complete_graph.dot
$ evince complete_graph.pdf # or use any other pdf viewer you like
And you should see:
Figure 1 The dependency graph of the increment application
COMPSs has detected that the increment of each element is independent,
and consequently, that all of them can be done in parallel. In this
particular application, there are four increment tasks, and since
the worker is able to run four tasks at the same time, all of them can
be executed in parallel saving precious time.
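To see why parallelism roughly halves the elapsed time here, compare the ideal sequential time (the sum of the sleeps) with the ideal parallel time (the longest single sleep, since all four tasks run at once). The short sketch below just performs that arithmetic:

```python
# Each increment task sleeps value * 2 seconds (see increment.py)
sleeps = [v * 2 for v in [1, 2, 3, 4]]

sequential = sum(sleeps)  # one task after another
parallel = max(sleeps)    # four tasks at once, bounded by the slowest

print(sequential)  # 20
print(parallel)    # 8
```

The observed elapsed time (about 11.5 seconds in the earlier output) lies between these two bounds because of the runtime start-up and scheduling overhead.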
Check the performance
Let’s run it again with the tracing flag enabled:
$ runcompss -t increment.py
[  INFO] Inferred PYTHON language
[  INFO] Using default location for project file: /opt/COMPSs//Runtime/configuration/xml/projects/default_project.xml
[  INFO] Using default location for resources file: /opt/COMPSs//Runtime/configuration/xml/resources/default_resources.xml
[  INFO] Using default execution type: compss

----------------- Executing increment.py --------------------------

Welcome to Extrae 3.5.3
[... Extrae prolog ...]
WARNING: COMPSs Properties file is null. Setting default values
[(434)    API]  -  Starting COMPSs Runtime v3.1
[2, 3, 4, 5]
Elapsed time: 13.1016821861
[... Extrae epilog ...]
mpi2prv: Congratulations! ./trace/increment.py_compss_trace_1587562240.prv has been generated.
[(24117)    API]  -  Execution Finished

------------------------------------------------------------
The execution has finished successfully and the trace has been generated
in the $HOME/.COMPSs/<APP_NAME>_<00-99>/trace directory in prv format,
which can be displayed and analysed with PARAVER.
In the case of using the PyCOMPSs CLI, the trace will be generated
in the .COMPSs/<APP_NAME>_<00-99>/trace directory:
$ cd .COMPSs/increment.py_02/trace
$ wxparaver increment.py_compss_trace_*.prv
Once Paraver has started, let's visualize the tasks:
Click on File and then on Load Configuration
Look for /PATH/TO/COMPSs/Dependencies/paraver/cfgs/compss_tasks.cfg and click Open.
Note
In the case of using the PyCOMPSs CLI, the configuration files can be
obtained by downloading them from the COMPSs repository.
And you should see:
Figure 2 Trace of the increment application
The X axis represents time, and the Y axis the deployed processes
(the first three (1.1.1-1.1.3) belong to the master, and the fourth belongs
to the master process in the worker (1.2.1), whose events are
shown with the compss_runtime.cfg configuration file).
The increment tasks are depicted in blue.
We can quickly see that the four increment tasks have been executed in parallel
(one per core), and that their lengths are different (depending on the
computing time of the task represented by the time.sleep(value*2) line).
Paraver is a very powerful tool for performance analysis. For more information,
check the Tracing Section.
Note
If you are using the PyCOMPSs CLI, it is time to stop the COMPSs environment:
$ pycompss stop
Application Overview
As in Java, the application code is divided into three parts: the task definition
interface, the main code and the task implementations. These files must follow the
naming convention: <app_name>.idl for the interface file, <app_name>.cc for
the main code and <app_name>-functions.cc for the task implementations. The next
paragraphs provide an example of how to define these files for a matrix
multiplication parallelised by blocks.
Task Definition Interface
As in Java the user has to provide a task selection by means of an
interface. In this case the interface file has the same name as the main
application file plus the suffix “idl”, i.e. Matmul.idl, where the main
file is called Matmul.cc.
Code 5 Matmul.idl
interface Matmul
{
      // C functions
      void initMatrix(inout Matrix matrix,
                      in int mSize,
                      in int nSize,
                      in double val);

      void multiplyBlocks(inout Block block1,
                          inout Block block2,
                          inout Block block3);
};
The syntax of the interface file is shown in the previous code. Tasks
can be declared as classic C function prototypes, which allows keeping
compatibility with standard C applications. In the example, initMatrix
and multiplyBlocks are functions declared using their prototypes, as in a
C header file, but this code is C++ since they have objects as parameters
(objects of type Matrix or Block).
The grammar for the interface file is:
["static"] return-type task-name ( parameter {, parameter }* );
return-type = "void" | type
task-name = <qualified name of the function or method>
parameter = direction type parameter-name
direction = "in" | "out" | "inout"
type = "char" | "int" | "short" | "long" | "float" | "double" | "boolean" |
"char[<size>]" | "int[<size>]" | "short[<size>]" | "long[<size>]" |
"float[<size>]" | "double[<size>]" | "string" | "File" | class-name
class-name = <qualified name of the class>
Main Program
The following code shows an example of matrix multiplication written in C++.
Code 6 Matrix multiplication
#include "Matmul.h"
#include "Matrix.h"
#include "Block.h"

int N;        // MSIZE
int M;        // BSIZE
double val;

int main(int argc, char **argv) {
    Matrix A;
    Matrix B;
    Matrix C;

    N = atoi(argv[1]);
    M = atoi(argv[2]);
    val = atof(argv[3]);

    compss_on();

    A = Matrix::init(N, M, val);

    initMatrix(&B, N, M, val);
    initMatrix(&C, N, M, 0.0);

    cout << "Waiting for initialization...\n";

    compss_wait_on(B);
    compss_wait_on(C);

    cout << "Initialization ends...\n";

    C.multiply(A, B);

    compss_off();
    return 0;
}
The developer has to take into account the following rules:
A header file with the same name as the main file must be included,
in this case Matmul.h. This header file is automatically
generated by the binding and it contains other includes and
type-definitions that are required.
A call to the compss_on binding function is required to turn on
the COMPSs runtime.
As in the C language, out or inout parameters should be passed by
reference by means of the “&” operator before the parameter name.
Synchronization on a parameter can be done calling the
compss_wait_on binding function. The argument of this function
must be the variable or object we want to synchronize.
There is an implicit synchronization in the init method of
Matrix. It is not possible to know the address of “A” before exiting
the method call, and therefore it is necessary to synchronize beforehand
so that the copy of the returned value into “A” is correct.
A call to the compss_off binding function is required to turn
off the COMPSs runtime.
Functions file
The implementation of the tasks in a C or C++ program has to be provided
in a functions file. Its name must be the same as the main file followed
by the suffix “-functions”. In our case Matmul-functions.cc.
In the previous code, class methods have been encapsulated inside a
function. This is useful when the class method returns an object or a
value and we want to avoid the explicit synchronization when returning
from the method.
Additional source files
Other source files needed by the user application must be placed under
the directory “src”. In this directory the programmer must provide a
Makefile that compiles such source files in the proper way. When the
binding compiles the whole application it will enter into the src
directory and execute the Makefile.
It generates two libraries, one for the master application and another
for the worker application. The directive COMPSS_MASTER or
COMPSS_WORKER must be used in order to compile the source files for
each type of library. Both libraries will be copied into the lib
directory where the binding will look for them when generating the
master and worker applications.
Application Compilation
The user command “compss_build_app” compiles both master and
worker for a single architecture (e.g. x86-64, armhf, etc.). Thus,
whether you want to run your application on an Intel-based machine or an
ARM-based machine, this command is the tool you need.
When the target is the native architecture, the command to execute is
very simple:
$~/matmul_objects> compss_build_app Matmul
[ INFO ] Java libraries are searched in the directory: /usr/lib/jvm/java-1.8.0-openjdk-amd64//jre/lib/amd64/server
[ INFO ] Boost libraries are searched in the directory: /usr/lib/...
[ Info ] The target host is: x86_64-linux-gnu

Building application for master...
g++ -g -O3 -I. -I/Bindings/c/share/c_build/worker/files/ -c Block.cc Matrix.cc
ar rvs libmaster.a Block.o Matrix.o
ranlib libmaster.a

Building application for workers...
g++ -DCOMPSS_WORKER -g -O3 -I. -I/Bindings/c/share/c_build/worker/files/ -c Block.cc -o Block.o
g++ -DCOMPSS_WORKER -g -O3 -I. -I/Bindings/c/share/c_build/worker/files/ -c Matrix.cc -o Matrix.o
ar rvs libworker.a Block.o Matrix.o
ranlib libworker.a

...

Command successful.
Application Execution
The following environment variables must be defined before executing a
COMPSs C/C++ application:
After compiling the application, two directories, master and worker, are
generated. The master directory contains a binary named after the main
file, which is the master application; in our example it is called Matmul.
The worker directory contains another binary named after the main file
followed by the suffix “-worker”, which is the worker application; in
our example it is called Matmul-worker.
The runcompss script has to be used to run the application:
The generated task dependency graph is stored within the
$HOME/.COMPSs/<APP_NAME>_<00-99>/monitor directory in dot format.
The generated graph is the complete_graph.dot file, which can be
displayed with any dot viewer. COMPSs also provides the compss_gengraph
script which converts the given dot file into a pdf.
$ cd $HOME/.COMPSs/Matmul_02/monitor
$ compss_gengraph complete_graph.dot
$ evince complete_graph.pdf # or use any other pdf viewer you like
The following figure depicts the task dependency graph for
the Matmul application in its object version with 3x3 blocks matrices,
each one containing a 4x4 matrix of doubles. Each block in the result
matrix accumulates three block multiplications, i.e. three
multiplications of 4x4 matrices of doubles.
Figure 3 Matmul Execution Graph.
The light blue circle corresponds to the initialization of matrix “A” by
means of a method-task, which has an implicit synchronization inside.
The dark blue circles correspond to the other two initializations by
means of function-tasks; in this case the synchronizations are explicit
and must be provided by the developer after the task call. Both implicit
and explicit synchronizations are represented as red circles.
Each green circle is a partial matrix multiplication of a set of three: one
block from matrix “A” and the corresponding one from matrix “B”. The
result is written to the corresponding block in “C”, which accumulates the
partial block multiplications. Each multiplication set has an explicit
synchronization. All green tasks are method-tasks and they are executed
in parallel.