Sunday, January 01, 2017

Using AWS Machine Learning from ABAP to predict runtimes

Happy new year everybody!

Today I tried out Amazon's Machine Learning capabilities. After working through the basic AWS Machine Learning tutorial and getting to know how the guys at AWS deal with the subject, I got quite excited.



Everything sounds quite easy:

  1. Prepare example data in a single CSV file with good and distinct features for test and training purposes (a made-up sample is shown below this list)
  2. Create a data source from that CSV file, which basically means verifying that the column types were detected correctly and specifying a result column.
  3. Create a Machine Learning model from the data source and run an evaluation on it
  4. Create an endpoint, so your model becomes consumable via a URL-based service
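
Just to make step 1 more tangible, this is roughly what such a training CSV could look like for my use case. The column names mirror the record structure used in the ABAP report further down; the values and the target column runtime_in_mins are made up for illustration:

comp_version,release,os,db,db_used,is_unicode,company_industry1,scan_version,runtime_in_mins
SAP ECC 6.0,731,HP-UX,ORACLE 12,5000,X,Retail,16.01,180
SAP ECC 6.0,740,LINUX,ORACLE 12,1200,X,Automotive,16.01,65
SAP R/3 4.7,620,AIX,DB2,800,,Chemicals,15.02,40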

My example use case was to predict the runtime of one of our analysis tools - SNP System Scan - given some system parameters. In general, any software will probably benefit from good runtime predictions, as they are a good way to improve the user experience. We all know the infamous progress bar that quickly reaches 80% but then takes ages to get to 100%. As a human being I expect progress to be more... linear ;-)


So this seemed like a perfect starting point for exploring Machine Learning. I got my data prepared and ran through all the above steps. My data source consisted of numerical and categorical columns, but binary and text types are also available. Text is good for unstructured data such as natural language, but I did not get into that yet. Everything so far was quite easy and went well.

Now I needed to incorporate the results into the software, which is written in ABAP. Hmmm, no SDK for ABAP. Figured! But I still wanted to enable all my colleagues to take advantage of this new buzzword technology and play around with it, so I decided on a quick implementation using the proxy pattern.


So I created an ABAP based API that calls a PHP based REST service via HTTP, which in turn uses the AWS SDK for PHP to talk to the AWS Machine Learning endpoint I previously created.

For the ABAP part I wanted things to be as easy and as generic as possible, so the API should work with any ML model and any record structure. This is how ABAP application developers would interact with it:


REPORT  /snp/aws01_ml_predict_scan_rt.

PARAMETERS: p_comp TYPE string LOWER CASE OBLIGATORY DEFAULT 'SAP ECC 6.0'.
PARAMETERS: p_rel TYPE string LOWER CASE OBLIGATORY DEFAULT '731'.
PARAMETERS: p_os TYPE string LOWER CASE OBLIGATORY DEFAULT 'HP-UX'.
PARAMETERS: p_db TYPE string LOWER CASE OBLIGATORY DEFAULT 'ORACLE 12'.
PARAMETERS: p_db_gb TYPE i OBLIGATORY DEFAULT '5000'. "5 TB System
PARAMETERS: p_uc TYPE c AS CHECKBOX DEFAULT 'X'. "Is this a unicode system?
PARAMETERS: p_ind TYPE string LOWER CASE OBLIGATORY DEFAULT 'Retail'. "Industry
PARAMETERS: p_svers TYPE string LOWER CASE OBLIGATORY DEFAULT '16.01'. "Scan Version

START-OF-SELECTION.
  PERFORM main.

FORM main.
*"--- DATA DEFINITION -------------------------------------------------
  "Definition of the record, based on which a runtime prediction is to be made
  TYPES: BEGIN OF l_str_system,
          comp_version TYPE string,
          release TYPE string,
          os TYPE string,
          db TYPE string,
          db_used TYPE string,
          is_unicode TYPE c,
          company_industry1 TYPE string,
          scan_version TYPE string,
         END OF l_str_system.

  "AWS Machine Learning API Class
  DATA: lr_ml TYPE REF TO /snp/aws00_cl_ml.
  DATA: ls_system TYPE l_str_system.
  DATA: lv_runtime_in_mins TYPE i.
  DATA: lv_msg TYPE string.
  DATA: lr_ex TYPE REF TO cx_root.

*"--- PROCESSING LOGIC ------------------------------------------------
  TRY.
      CREATE OBJECT lr_ml.

      "set parameters
      ls_system-comp_version = p_comp.
      ls_system-release = p_rel.
      ls_system-os = p_os.
      ls_system-db = p_db.
      ls_system-db_used = p_db_gb.
      ls_system-is_unicode = p_uc.
      ls_system-company_industry1 = p_ind.
      ls_system-scan_version = p_svers.

      "execute prediction
      lr_ml->predict(
        EXPORTING
          iv_model   = 'ml-BtUpHOFhbQd' "ID of the ML model previously trained in AWS
          is_record  = ls_system
        IMPORTING
          ev_result  = lv_runtime_in_mins
      ).

      "output results
      lv_msg = /snp/cn00_cl_string_utils=>text( iv_text = 'Estimated runtime of &1 minutes' iv_1 = lv_runtime_in_mins ).
      MESSAGE lv_msg TYPE 'S'.

    CATCH cx_root INTO lr_ex.

      "output errors
      lv_msg = lr_ex->get_text( ).
      PERFORM display_lines USING lv_msg.

  ENDTRY.

ENDFORM.

FORM display_lines USING iv_multiline_text.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lt_lines TYPE stringtab.
  DATA: lv_line TYPE string.

*"--- PROCESSING LOGIC ------------------------------------------------
  "split into multiple lines...
  SPLIT iv_multiline_text AT cl_abap_char_utilities=>newline INTO TABLE lt_lines.
  LOOP AT lt_lines INTO lv_line.
    WRITE: / lv_line. "...and output each line individually
  ENDLOOP.

ENDFORM.
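
The proxy class /snp/aws00_cl_ml itself is not part of this post. Purely to illustrate the proxy idea, here is a minimal sketch of what its predict method could do internally: serialize the record to JSON and POST it to the PHP service via HTTP. The local class name, the service URL, the response format and the use of /ui2/cl_json as serializer are assumptions of mine, not the actual implementation:

"Illustrative sketch only - not the actual /SNP/AWS00_CL_ML implementation
CLASS lcl_ml_proxy DEFINITION.
  PUBLIC SECTION.
    METHODS predict
      IMPORTING iv_model  TYPE string
                is_record TYPE any
      EXPORTING ev_result TYPE string.
ENDCLASS.

CLASS lcl_ml_proxy IMPLEMENTATION.
  METHOD predict.
    DATA: lo_client TYPE REF TO if_http_client.
    DATA: lv_url    TYPE string.
    DATA: lv_body   TYPE string.

    "serialize the generic record structure to JSON
    lv_body = /ui2/cl_json=>serialize( data = is_record ).

    "open an HTTP connection to the PHP proxy service (URL is made up)
    CONCATENATE 'https://example.org/awsml/predict?model=' iv_model INTO lv_url.
    cl_http_client=>create_by_url(
      EXPORTING url    = lv_url
      IMPORTING client = lo_client ).

    "POST the record as a JSON body (error handling omitted for brevity)
    lo_client->request->set_method( if_http_request=>co_request_method_post ).
    lo_client->request->set_header_field( name = 'Content-Type' value = 'application/json' ).
    lo_client->request->set_cdata( lv_body ).
    lo_client->send( ).
    lo_client->receive( ).

    "assuming the PHP service returns just the predicted value as plain text
    ev_result = lo_client->response->get_cdata( ).
  ENDMETHOD.
ENDCLASS.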

Now on the PHP side I simply used the AWS SDK for PHP. Setting it up is as easy as extracting a ZIP file, requiring the auto-load mechanism and using the API. I wrote a little wrapper class that I could easily expose as a REST service (not shown here).

<?php

class SnpAwsMachineLearningApi {

   /**
   * Create an AWS ML Client Object
   */
   private function getClient($key,$secret) {
      return new Aws\MachineLearning\MachineLearningClient([
         'version' => 'latest',
         'region'  => 'us-east-1',
         'credentials' => [
            'key'    => $key,
            'secret' => $secret
         ],
      ]);
   }

   /**
   * Determine the URL of the Model Endpoint automatically
   */
   private function getEndpointUrl($model,$key,$secret) {

      //fetch metadata of the model
      $modelData = $this->getClient($key,$secret)->getMLModel([
         'MLModelId'=>$model,
         'Verbose'=>false
      ]);

      //check if model exists
      if(empty($modelData)) {
         throw new Exception("model ".$model." does not exist");
      }

      //getting the endpoint info
      $endpoint = $modelData['EndpointInfo'];

      //check if endpoint was created
      if(empty($endpoint)) {
         throw new Exception("no endpoint exists");
      }

      //check if endpoint is ready
      if($endpoint['EndpointStatus'] != 'READY') {
         throw new Exception("endpoint is not ready");
      }

      //return the endpoint url
      return $endpoint['EndpointUrl'];
   }

   /**
   * Execute a prediction
   */
   public function predict($model,$record,$key,$secret) {
      return $this->getClient($key,$secret)->predict(array(

          //provide the model name
         'MLModelId'       => $model,

         //make sure it's an associative array that is passed as the record
         'Record'          => json_decode(json_encode($record),true),

         //determine the URL of the endpoint automatically, assuming there is
         //exactly one
         'PredictEndpoint' => $this->getEndpointUrl($model,$key,$secret)
      ));
   }

}

And that is basically it. Of course, for the future it would be great to get rid of the PHP part and have a purely ABAP based SDK implementation, but again, this was supposed to be a quick and easy implementation.

Currently it enables ABAP developers to execute predictions against any trained model on the AWS Machine Learning platform without having to leave their own terrain.

In the future this could be extended to providing or updating data sources from ABAP internal tables, creating and training models on the fly, and of course abstracting things far enough that other Machine Learning providers can be plugged in. So why not explore the native SAP HANA capabilities next...
