Installing Execution Engine

This page walks through the steps to install and configure Execution Engine on Linux, Windows, and macOS. To succeed, you will need to complete the following preparation steps:

First, install Docker Desktop for your system by following the instructions on the Docker website: https://docs.docker.com/engine/install/

Next, create a new folder on your computer named executionengine. This is where you will put your configuration settings.
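
As a minimal sketch, the folder can be created from a terminal like this. The name executionengine matches the folder referred to later on this page; the location (your current working directory) is an assumption, and any location you can write to should work.

```shell
# Create the configuration folder if it does not exist yet, then move into it.
mkdir -p executionengine
cd executionengine
```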

Choose how you would like to set up security. There are three options:

  1. OpenID Connect

  2. Keycloak

  3. No security

Because Execution Engine requires direct access to databases to run analyses, enabling security is recommended. The “No security” option is suitable for testing or when another security/login method is already in place.

To install Execution Engine without security, copy and paste the following YAML into a new text file named compose.yml in your executionengine folder. You may alter some of these settings if you wish. A common change is setting AUTH_ENABLED=true to enable OpenID Connect. You might also want to change the database credentials and the 32-character string that is used as the ENCRYPTION_KEY. You should not need to change anything else, but if you are familiar with Docker, go ahead and make whatever adjustments you need to the compose.yml file.

version: '3.8'

services:

  engine-ui:
    image: "executionengine.azurecr.io/execution-engine-ui:dev"
    ports:
      - "4200:4200"
    environment:
      - BACKEND_BASE_URL=http://localhost:8083/api/v1
      - AUTH_ENABLED=false
#      Required if you enable Open ID authentication by setting AUTH_ENABLED=true
#      - OIDC_AUTHORITY=https://login.microsoftonline.com/1234-34532-34534/v2.0
#      - OIDC_CLIENT_ID=2234e6c40-3451-49c6-a99e-234b8fff36a4b
#      - OIDC_REDIRECT_URI=http://localhost:5173/deck-portal

  engine-api:
    image: "executionengine.azurecr.io/execution-engine-api:dev"
    restart: always
    ports:
      - "8083:8083"
    environment:
      - SPRING_DATASOURCE_USERNAME=postgres
      - SPRING_DATASOURCE_PASSWORD=postgres
      - SPRING_DATASOURCE_URL=jdbc:postgresql://engine_db:5432/execution_engine
      - AUTHENTICATION_ENABLED=false
      - ENCRYPTION_KEY=PleaseEnterANew32CharacterString
    volumes:
      - ./studies:/app/files
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - execution_network
    depends_on:
      db:
        condition: service_healthy

  db:
    container_name: engine_db
    image: "postgres:15.4"
    restart: always
    environment:
      - POSTGRES_DB=execution_engine
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    networks:
      - execution_network
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: [ "CMD-SHELL", "pg_isready -U postgres" ]
      interval: 10s
      timeout: 5s
      retries: 10

networks:
  execution_network:
    name: execution_network

volumes:
  db-data:
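
If you want to replace the placeholder ENCRYPTION_KEY before saving the file, one way to produce a random 32-character string is sketched below. It assumes a Linux or macOS shell (or Git Bash/WSL on Windows) and assumes that a string of hexadecimal characters is acceptable as a key; the variable name NEW_KEY is only illustrative.

```shell
# Read 16 random bytes and print them as 32 lowercase hexadecimal characters.
NEW_KEY="$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')"

# Print the line in the form used in the environment section of compose.yml.
echo "ENCRYPTION_KEY=${NEW_KEY}"
```

The printed value can then be pasted over PleaseEnterANew32CharacterString in the engine-api section.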

Once you have saved the compose.yml file, open a terminal, shell, or command prompt and navigate to the executionengine folder you just created. Then run the following Docker command:

docker compose up -d

This tells Docker to pull the images needed to run the application and then start them. Note that this will not download any R runtime environments. If you are preparing an environment for offline execution, you will need to pull your R runtime images from the internet (wherever they are hosted) onto your local machine. You may also need to copy these images to a USB drive to transfer them to an air-gapped computer.
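
For the offline case, the transfer could be sketched with the standard docker pull, docker save, and docker load commands. The helper names save_images and load_images and the archive file name are illustrative assumptions, not part of Execution Engine, and you would substitute the names of your own R runtime images.

```shell
# Pull each named image, then bundle all of them into a single tar archive.
# Usage: save_images ARCHIVE IMAGE [IMAGE...]
save_images() {
  archive="$1"
  shift
  for image in "$@"; do
    docker pull "$image" || return 1
  done
  docker save -o "$archive" "$@"
}

# On the air-gapped machine, load every image contained in the archive.
# Usage: load_images ARCHIVE
load_images() {
  docker load -i "$1"
}
```

For example, save_images engine-images.tar executionengine.azurecr.io/execution-engine-ui:dev executionengine.azurecr.io/execution-engine-api:dev postgres:15.4 would bundle the three images named in compose.yml. After copying the archive to the offline machine, load_images engine-images.tar makes them available there.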

After running docker compose up -d and waiting a few minutes, your instance of Execution Engine should be live at http://localhost:4200.
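
Rather than guessing how long to wait, you could poll the address until it responds. The sketch below assumes curl is installed and that the UI answers a plain HTTP request at its root URL with a success status; the function name wait_for_url and the WAIT_SECONDS variable are illustrative.

```shell
# Request the URL repeatedly until it succeeds or the attempts run out.
# Usage: wait_for_url URL [ATTEMPTS]   (default: 30 attempts)
wait_for_url() {
  url="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return a non-zero status on HTTP error responses.
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts (10 seconds unless WAIT_SECONDS is set).
    sleep "${WAIT_SECONDS:-10}"
  done
  echo "not reachable: $url" >&2
  return 1
}
```

Calling wait_for_url http://localhost:4200 checks the UI. If it never becomes reachable, docker compose ps and docker compose logs engine-api are the usual places to look for a container that failed to start.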

Configure Data Sources

The first thing you will want to do is configure access to one or more data sources in the OMOP Common Data Model (CDM) format. To do this, go to the “Data Sources” tab and click the “New Data Source” button.

Make sure to fill out as much information as you have about your data source. A “data source” represents a single OMOP CDM instance. For example, if you have three OMOP CDMs in a single database you will need to add three data sources, one for each OMOP CDM instance.

The values that you enter in the form will be saved in Execution Engine’s internal database. The password will be encrypted. The values are then made available to R code as environment variables.

For example, if your CDM Name is “Synpuf 100k”, R programmers will be able to access this value in their code using Sys.getenv('DATA_SOURCE_NAME').

The database-related environment variables available to R programmers are:

| Data Source Field | Environment variable | Description |
|---|---|---|
| “Select a database” | DBMS_TYPE | The database management system: postgresql, redshift, sql server, oracle, snowflake, spark |
| CDM Name | DATA_SOURCE_NAME | The name of the CDM |
| OMOP CDM Version | CDM_VERSION | The version of the OMOP CDM being used. Either 5.3 or 5.4 |
| JDBC Connection String | CONNECTION_STRING | The JDBC connection string often used when connecting to the database with DatabaseConnector. |
| Database Catalog | DBMS_CATALOG | Some databases have a compound schema (e.g. catalog.schema). For example SQL Server usually has something like cdm53.dbo for the full schema name. This field can be left blank if you do not need it. |
| Database Server | DBMS_SERVER | The database server name (e.g. pgsqltest1.abcd1234.us-east-1.rds.amazonaws.com) |
| Database Name | DBMS_NAME | The name of the database used to connect. |
| Database Port | DBMS_PORT | The port of the database used to connect. |
| Database Username | DBMS_USERNAME | The username used to connect to the database. |
| Database Password | DBMS_PASSWORD | The password used to connect to the database. |
| CDM Schema | DBMS_SCHEMA | The name of the schema that contains the OMOP CDM. |
| Target Schema | TARGET_SCHEMA | A schema where the user has write access. |
| Result Schema | RESULT_SCHEMA | A schema where the user has write access. |
| Cohort Table Name | COHORT_TARGET_TABLE | The name of a table to use to store cohorts. |

Important: All of these variables are optional, and R study programmers can choose whether or not to use them in their R code. If a variable has not been set, the corresponding call, for example Sys.getenv('DATA_SOURCE_NAME'), will return the empty string "" by default.
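
To try study code outside Execution Engine, the same variables could be set by hand in a shell before starting R. This is only a sketch under that assumption: Execution Engine sets the variables itself at run time, and every value below is a made-up placeholder.

```shell
# Placeholder values for local testing only; replace them with your own.
export DBMS_TYPE="postgresql"
export DATA_SOURCE_NAME="Synpuf 100k"
export CDM_VERSION="5.4"
export DBMS_SCHEMA="cdm"

# A variable that was never set expands to an empty string here,
# which mirrors what Sys.getenv() returns in R for an unset variable.
echo "RESULT_SCHEMA is '${RESULT_SCHEMA:-}'"
```

With these exported, running Rscript -e 'cat(Sys.getenv("DATA_SOURCE_NAME"))' in the same shell should print the placeholder name, assuming R is installed locally.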