Scriptella ETL

Data lines in a MAPPER report are extracted to a comma-separated-values (CSV) file and l oaded into corresponding database table using Scriptella ETL. The main advantage of using a lightweight ETL tool like Scriptella is that inputs are simplistic files: the Scriptella script is lucid XML and the data file is plain CSV. Such files are easily examined and edited in case customization or "tweaks" are required, while running the ETL process is uncomplicated.

For example, here is the Scriptella script that loads the .CSV file containing order entity data from the Footwear Orders sample MAPPER report into the FOOTWEAR_ORDER_LINEITEM Oracle table. Note the date and time fields in the MAPPER report are concatenated and parsed into java.util.Date for storage in ORDER_DATE_TIME, which is an Oracle DATE column. The script also loads the MJ_FOOTWEAR_ORDER_LINEITEM shadow table.

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <!--
      The following are passed as properties to this script:

      csvFileUrl: file as URL that contains CSV heading and data rows
      null_placeholder: string that represent a database NULL in CSV columns
      mjEtlUserId: integer id of existing user to use as created-by
      mprCabinet: integer cabinet number of original Mapper report
      mprDrawer: drawer letter of original Mapper report
      mprReport: integer report number of original Mapper report
    -->
    <description>
      <![CDATA[
      Initial load of legacy Mapper data from `12D3954 (FootWearOrders)`
        to com.arsi.mj.testapp.hibgen.model.FootwearOrderLineitem
      ]]>
    </description>

    <connection id="csv" driver="csv" url="${csvFileUrl}">
      null_string=${null_placeholder}
    </connection>

    <connection id="db" driver="oracle" classpath="../mj-lib/ojdbc6.jar"
      url="jdbc:oracle:thin:@localhost:1521:XE" user="hr" password="tiger" />

    <!-- CSV query is empty to select all lines in input file -->
    <query connection-id="csv">
      <!-- *** INSERT INTO table containing order entity data *** -->
      <script connection-id="db">
        <![CDATA[
        insert into FOOTWEAR_ORDER_LINEITEM (
          order_no, QTY, SHOE_TYPE, SHOE_GENDER, SHOE_SIZE,
          SHOE_COLOR, PRICE, ORDER_DATE_TIME, COST,
          CREATE_TIMESTAMP, DATE_OF_LAST_UPDATE, CREATED_BY_USER, UPDATE_USER_ID, FW_ORDER_LI_ID)
        values (?{OrdNo}, ?{Qty}, ?{Type}, ?{Gender}, ?{Siz},
                ?{Color}, ?{Price}, ?{etl.date.parse(Date+Time, 'yyyyMMddHH:mm')}, ?{Cost},
                CURRENT_TIMESTAMP, CURRENT_TIMESTAMP, ?{mjEtlUserId}, ?{mjEtlUserId},
                SEQ_FOOTWEAR_ORDER_LINEITEM.nextval)
        ]]>
      </script>

      <!-- *** INSERT INTO shadow table *** -->
      <script connection-id="db">
        <![CDATA[
        insert into MJ_FOOTWEAR_ORDER_LINEITEM (
          MJ_CABINET, MJ_DRAWER, MJ_REPORT, MJ_LINENUM, MJ_LINETYPE, FW_ORDER_LI_ID)
        values (?{mprCabinet}, ?{mprDrawer}, ?{mprReport}, ?{__mprLinenum}, ?{__mjLinetypeEnum},
                SEQ_FOOTWEAR_ORDER_LINEITEM.currval)
        ]]>
      </script>
    </query>
</etl>

The first several lines from the CSV file processed by the script are shown below:

"__mprLinenum","OrdNo","Qty","Type","Gender","Siz","Color","Price","Date","Time","Cost"
100,"32001","1","BSKBL","M","9½","White/Black-Hot Lime","130.00","25-MAR-98","07:37",""
101,"32001","2","RUN","F","9½","White/Zen Grey/Neptune Blue","65.00","25-MAR-98","07:37",""
102,"32001","4","BSKBL","M","9½","White/Black-Hot Lime","130.00","25-MAR-98","07:37",""
103,"32001","6","GOLF","F","9½","White/Fluorescent Pink","75.00","25-MAR-98","07:37",""
104,"32002","3","GOLF","M","12","White/Tech Grey","70.00","25-MAR-98","07:46",""
105,"32002","7","RUN","M","9½","White/Zen Grey/New Spruce","65.00","25-MAR-98","07:46",""
106,"32002","9","BSKBL","M","12","White/Black-Hot Lime","130.00","25-MAR-98","07:46",""

To load entity data from the MAPPER report into the database table, Scriptella executes the SQL INSERT statements in the script for each row in the CSV input file. Scriptella may be run from the command line, Ant, Maven or a Java application. MJ leverages the support for properties provided by Scriptella's EtlExecutor class to pass arguments to the script:

Map etlProps = new HashMap();
etlProps.put("csvFileUrl", new File("FootWearOrders_12D3954.csv").toURL().toString());
etlProps.put("null_placeholder", etlConfig.getNullString());
etlProps.put("mjEtlUserId", new Long(etlConfig.getEtlUserId()));
etlProps.put("mprCabinet", new Integer(3954));
etlProps.put("mprDrawer", "D");
etlProps.put("mprReport", new Integer(12));
EtlExecutor executor = EtlExecutor.newExecutor(fileScript.toURL(), etlProps);
executor.execute();