StelsXML JDBC Driver v1.3 Documentation

 

 

Contents 


Getting Started

Schema File

Driver Properties

Data Types

Supported SQL Syntax

Connection Example

Handling XML Namespaces

Supported XPath

Driver Modes

User-defined SQL functions

   

 

Getting Started


To process your XML documents via StelsXML JDBC driver, you should do the following easy steps:

 

Schema File


The schema file is used to define the specifications of tables the driver can later access using SQL queries. Each table is defined with the help of XPath expressions to various elements of an XML document (i.e. tag elements, attributes, etc).

Suppose we have an simple XML document that stores data about the company's employees:

<?xml version="1.0" encoding="UTF-8"?>

<employees>

     <document_name>Employees doc</document_name>

     <employee id="1">

         <first_name>Bill</first_name>

         <last_name>Adams</last_name>

         <age>25</age>

         <hire_date>12-06-1995</hire_date>

         <title>Java programmer</title>

     </employee>

     <employee id="2">

         <first_name>Mary</first_name>

         <last_name>Jones</last_name>

         <age>32</age>

         <hire_date>22-09-2001</hire_date>

         <title>Sales manager</title>

     </employee>

</employees>

 

And we want to generate the "employees" table based on it. In this case the table specification may look as follows:

<?xml version="1.0" encoding="UTF-8"?>

<schema>

  <table name="employees" file ="employees.xml" path="/employees/employee">

  <!-- The attribute 'name' specifies the name of the table-->

  <!-- The attribute 'file' specifies the file path to the XML document. This path may be absolute (e.g.: 'c:/xmlfiles/employees.xml') or relative to the schema file path (e.g.: 'employees.xml' or 'subdir/employees.xml'). Also, you can use the HTTP URL (e.g.: 'http://www.example.com/rss_feed.xml') or FTP URL (e.g.: ftp://user:passw@www.sample.com:21/test/test.xml') -->

  <!-- The attribute 'path' defines that table entries are based on an XPath expression (path) '/employees/employee', i.e. it is based on 'employee' elements-->

    <column name="documentname" type="string" path="/employees/document_name"/>

   <!-- The attribute 'name' specifies the name of the column-->

   <!-- The attribute 'type' specifies the data type of the column-->

   <!-- The attribute 'path' defines the XPath path to the XML element the column is based on. In this case the XPath expression '/employees/document_name' is an absolute path from the root of the XML document-->

    <column name="firstname" type="string" path="first_name"/>

   <!--The value 'first_name' of the 'path' attribute defines the XPath expression which is relative to the main specification defined in the 'path' attribute of the 'table' element.  I.e. the full path to the column is '/employees/employee/first_name'-->

    <column name="lastname" type="string" path="last_name"/>

    <column name="title" type="string" path="title"/>

    <column name="id" type="integer" path="@id"/>

   <!-- This 'path' attribute defines the XPath path to the attribute of the XML element-->

    <column name="age" type="integer" path="age"/>

    <column name="hiredate" type="datetime" path="hire_date"/>

  </table> 

</schema>

 

Now, after the table has been specified, we can access it using an SQL query (see Connection Example for more details).

 

Additional notes on the schema file:

 

Driver Properties


The driver supports a number of parameters that change default behavior of the driver.

These properties are:

 

charset is used to set the character encoding for output. To specify the input encoding you need to add the encoding declaration to the XML file: <?xml version="1.0" encoding="some_charset"?> (Default charset is "UTF-8")

 

dateFormat is used to specify a format for date/time values. A value of this property is a sequence of date/time formats separated by the '|' character, e.g: "dd.MM.yy | dd.MM | dd". (Default is "yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss | yyyy-MM-dd |  HH:mm:ss.SSS |  HH:mm:ss"). For more details about date/time format used in the driver please see chapter "time format syntax" in the java.util.SimpleDateFormat class documentation. 

 

decimalFormatInput,  decimalFormatOutput are used to specify input and output formats for double and float values in a CSV file. For instance, you can use these formats to specify currency values, e.g.: "###,###.##$". For more information about patterns used in "decimalFormat" please see the documentation for java.text.DecimalFormat class.

 

namespaceAware is used to enable namespace support when parsing XML documents (Default is "true")

 

namespaces is used to define XML namespaces used in the XML document (see Handling XML Namespaces for more details )

 

To set the driver properties, you can use one of the following ways:

1)  using Properties class:

Properties props = new java.util.Properties();
 
props.put("dateFormat","MM.dd.yyyy");
props.put("namespaceAware","false");
 
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0],props);
 
2) appending the properties to the URL:
Connection conn = 
DriverManager.getConnection(
"jdbc:jstels:xml:schema.xml?dateFormat=MM.dd.yyyy&namespaceAware=false");
 
3) adding the properties as attributes to the <table> tag in the schema file 
(only for local table properties such as "charset", "dateFormat", "namespaces", etc):

<?xml version="1.0" encoding="UTF-8"?>

<schema>

<table name="products" file ="products.xml" path="/products/product" 
charset="ISO-8859-2" dateFormat="dd-MM-yyyy">
	<column name="prodid" type="int" path="prodid"/>
	<column name="description" type="string" path="description"/>
</table>
</schema>

 

 

Data Types


The driver supports the following data types: INTEGER, BIGINT, FLOAT, DOUBLE, VARCHAR, DATETIME.

The table below demonstrates the mapping scheme between SlelsXML data types and JDBC data types:

StelsXML data type

 JDBC returned type (java.sql.Types.*)

Java class used in StelsXML

Integer, INT

java.sql.Types.INTEGER

java.lang.Integer

Bigint

java.sql.Types.BIGINT

java.lang.Long

FLOAT

java.sql.Types.FLOAT

java.lang.Float

Double

java.sql.Types.DOUBLE

java.lang.Double

BIGDECIMAL (recommended for storing currency values)

java.sql.Types.NUMERIC

java.math.BigDecimal

STRING, VARCHAR

java.sql.Types.VARCHAR

java.lang.String

DATETIME

java.sql.Types.TIMESTAMP

java.util.Date

 

 

Supported SQL Syntax


StelsXML supports the most part of ANSI 92 grammar like SELECT, INSERT, UPDATE, DELETE and CREATE statements.

 

An SQL query must meet the following conditions:

Examples of SELECT queries:

SELECT * FROM employees WHERE title = 'Java programmer' ORDER BY last_name

SELECT name FROM salesreps WHERE ( rep_office IN ( 22, 11, 12 ) )  OR ( manager IS NULL AND hire_date >= to_date ( '01-05-2002','dd-MM-yyyy' ) OR ( sales > quota AND NOT sales > 600000.0 )

SELECT city, target, sales FROM "offices.xml" WHERE region = 'Eastern' AND sales > target ORDER BY city

SELECT * FROM prices ps JOIN regions regs ON ps.regionid = regs.id JOIN products prod ON prod.prodid = ps.prodid

 

You can use the standard SQL operations INSERT, UPDATE, DELETE as well to add, update and delete records in tables.

Examples:

INSERT INTO employees (firstname, lastname, title, id, hiredate) VALUES('John', 'Doe', 'Web admin', 3, to_date('07:02:2007','dd:MM:yyyy'))

DELETE FROM employees WHERE lastname LIKE 'Henry%' 

UPDATE "c:/xmlfiles/customers.xml" SET credit_limit = 50000.00 WHERE company = 'Acme Mfg.'

 

 

Also, the driver supports a number of additional commands for processing XML files:

Examples:

SAVE TABLE employees // saves the table "employees" by using default file and columns settings of the corresponding specification "employees"

SAVE TABLE employees AS "c:/copy/employees_copy.xml" // saves "employees" to another file "employees_copy.xml" by using column settings of the specification "employees"

SAVE TABLE employees AS "employees_html_table.html" USING SPECIFICATION employees_html_table // saves "employees" to another file "employees_html_table.html" by using column settings of the specification "employees_html_table"

SAVE (SELECT * employees WHERE title LIKE '%programmer') AS "employees_programmers.xml" USING SPECIFICATION employees // saves the SQL query result to the file "employees_programmers.xml" by using column settings of the specification "employees"

CREATE TABLE html_table (td1 string, td2 string, td3 int) // creates the table "html_table" with three columns by using column settings of the corresponding specification "html_table".

CREATE TABLE employees USING SPECIFICATION // creates the table "employees" by using column settings of the corresponding specification "employees"

 

For more information about the supported SQL syntax, please see the SQL specification and examples of use containing in the StelsXML package.

 
 
 
Connection Example

This complete example shows how the driver is used. You can download this example with the driver package here.

The driver connection code:

import java.sql.*;

 

public class DriverTest {

 

  public static void main(String[] args) {

    try {

      // load the driver into memory

      Class.forName("jstels.jdbc.xml.XMLDriver");

 

      // create a connection. The first command line parameter is assumed to

      // be the file path to the schema file

      Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0]);

 

      // create a Statement object to execute the query with

      Statement stmt = conn.createStatement();

 

      // execute a query

      ResultSet rs = stmt.executeQuery("SELECT * FROM employees");

 

      // read data and put it to the console

      for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {

        System.out.print(rs.getMetaData().getColumnName(j) + "\t");

      }

      System.out.println();

 

      while (rs.next()) {

        for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) {

          System.out.print(rs.getObject(j) + "\t");

        }

        System.out.println();

      }

 

      // close the objects

      rs.close();

      stmt.close();

      conn.close();

    }

    catch (Exception e) {

      e.printStackTrace();

    }

  }

}

 

The specification for the table "employees" used in the SQL query above:

<?xml version="1.0" encoding="UTF-8"?>

<schema>

  <table name="employees" file ="employees.xml" path="/employees/employee" dateFormat="dd-MM-yyyy">

    <!-- In this example the file "employees.xml" is located in the same directory where the schema is-->

    <!-- The attribute "dateFormat" sets a local date format used in the specified XML file-->

    <column name="documentname" type="string" path="/employees/document_name"/>

    <column name="firstname" type="string" path="first_name"/>

    <column name="lastname" type="string" path="last_name"/>

    <column name="title" type="string" path="title"/>

    <column name="id" type="integer" path="@id"/>

    <column name="age" type="integer" path="age"/>

    <column name="hiredate" type="datetime" path="hire_date"/>

  </table> 

</schema>

 

The XML file "employees.xml":

<?xml version="1.0" encoding="UTF-8"?>

<employees>

     <document_name>Employees doc</document_name>

     <employee id="1">

         <first_name>Bill</first_name>

         <last_name>Adams</last_name>

         <age>25</age>

         <hire_date>12-06-1995</hire_date>

         <title>Java programmer</title>

     </employee>

     <employee id="2">

         <first_name>Mary</first_name>

         <last_name>Jones</last_name>

         <age>32</age>

         <hire_date>22-09-2001</hire_date>

         <title>Sales manager</title>

     </employee>

     <employee id="3">

         <first_name>Dan</first_name>

         <last_name>Roberts</last_name>

         <age>30</age>

         <hire_date>01-11-1999</hire_date>

         <title>Project manager</title>

     </employee>

     <employee id="4">

         <first_name>Paul</first_name>

         <last_name>Crauz</last_name>

         <age>24</age>

         <hire_date>02-05-2002</hire_date>

         <title>C++ programmer</title>

     </employee>

</employees>

 

The result of the test:

DOCUMENTNAME  FIRSTNAME LASTNAME TITLE           ID AGE HIREDATE          

Employees doc Bill      Adams    Java programmer 1  25  1995-06-12 00:00:00.0         

Employees doc Mary      Jones    Sales manager   2  32  2001-09-22 00:00:00.0         

Employees doc Dan       Roberts  Project manager 3  30  1999-11-01 00:00:00.0         

Employees doc Paul      Crauz    C++ programmer  4  24  2002-05-02 00:00:00.0         

 

 

Handling XML Namespaces


By default, XML parsing is namespace aware in the driver. You can change this with the namespaceAware property.

In order to process XML documents that use namespaces, you must specify each namespace using the driver property namespaces. The value format for this property is the following:

prefix1:namespace1|prefix2:namespace2| ...

where namespace is a URI identifier of the namespace used in a XML document and prefix is a prefix identifier for use in XPath expressions in the schema file. The prefix may be arbitrary.

 

Example:

The XML file "employees.xml" that uses the namespace "http://www.example.com/employees" and prefixed XML elements:

<?xml version="1.0" encoding="UTF-8"?>

<empl:employees xmlns:empl="http://www.example.com/employees">
     <empl:employee id="1">

         <empl:first_name>Bill</empl:first_name>

         <empl:last_name>Adams</empl:last_name>

         <empl:age>25</empl:age>

         <empl:hire_date>12-06-1995</empl:hire_date>

         <empl:title>Java programmer</empl:title>

     </empl:employee>

     <empl:employee id="2">

         <empl:first_name>Mary</empl:first_name>

         <empl:last_name>Jones</empl:last_name>

         <empl:age>32</empl:age>

         <empl:hire_date>22-09-2001</empl:hire_date>

         <empl:title>Sales manager</empl:title>

     </empl:employee>

</empl:employees>

 

The table specification for the "employees.xml":

<?xml version="1.0" encoding="UTF-8"?>

<schema>

  <table name="employees" file="employees.xml" path="/empl:employees/empl:employee" dateFormat="dd-MM-yyyy" namespaces="empl:http://www.example.com/employees">

    <!-- Note that in this case all XPath expressions must use the prefix 'empl:' -->

    <column name="firstname" type="string" path="empl:first_name"/>

    <column name="lastname" type="string" path="empl:last_name"/>

    <column name="title" type="string" path="empl:title"/>

    <column name="id" type="integer" path="@id"/>

    <column name="age" type="integer" path="empl:age"/>

    <column name="hiredate" type="datetime" path="empl:hire_date"/>

  </table> 

</schema>

 

 

Supported XPath


The following XPath expressions are supported:

  

Not supported:

 

 

Driver Modes


There are two main modes the driver can run in: the mode with data caching and the mode with data swapping. Running in the first mode (default mode), the driver caches XML files in the RAM making it possible to achieve maximum performance while processing XML. Keep in mind that Java Virtual Machine must have enough free memory allotted for storing large tables (use -Xms and -Xmx JVM options). You can also use the DROP TABLE <table name> FROM CACHE command to force tables to be removed from the cache.

 

The second mode is recommended for processing large XML files (>100 MB). To use this mode, set the driver property caching to false. There are also some properties for configuring this mode:

readOnlySubMode is used to set an additional submode. The submode readOnlySubMode = true  is optimized for using SELECT queries only, it doesn't support write operations. The submode readOnlySubMode = false supports all SQL operations. (by default this parameter is true).

tempPath - directory where temporary files will be created (by default it is a OS temporary directory, specified by JVM environment variable "java.io.tmpdir").

maxRecordsInMemoryPerTable is used to define how many records should be saved in the RAM for each table (by default - 5000 records).

maxSwapStringSize - default size for string columns in swap files (by default - 50 characters).

Example:

Properties props = new java.util.Properties();

props.setProperty("caching", "false");         // switch to the swapping mode
props.setProperty("tempPath", "c:/temp");     
props.setProperty("maxRecordsInMemoryPerTable", "7000");

props.setProperty("maxSwapStringSize", "60");
props.setProperty("readOnlySubMode", "true");

Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0], props);

 

To reduce the swap size and increase the overall performance in the swapping mode, we recommend to set the 'size' attribute for each string column in the schema file. If this attribute is not specified, the column size by default is equals to the value of the maxSwapStringSize property.

Example:

<?xml version="1.0" encoding="UTF-8"?>

<schema>

  <table name="employees" file="employees.xml" path="/employees/employee" dateFormat="dd-MM-yyyy">

    <column name="firstname" type="string" path="first_name" size="10"/>

    <column name="lastname" type="string" path="last_name" size="15"/>

    <column name="title" type="string" path="title" size="20"/>

    <column name="id" type="integer" path="@id"/>

    <column name="age" type="integer" path="age"/>

    <column name="hiredate" type="datetime" path="hire_date"/>

  </table> 

</schema>

 

 

User-defined SQL functions


You can use your own SQL functions in the driver. To use this feature, you should do the following: 

1) Create a static method that will act as an SQL function
    Mind that:

Example:

package my_pack;

public class MyFuncs{

// user-defined SQL function that formats the given argument into a date/time string with the specified format

public static String format_date(java.util.Date d, String format) {
    // process the null values

    if (d == null || format == null)
    return null;
    java.text.SimpleDateFormat sdf = new java.text.SimpleDateFormat(format);
   
// return a function result with java.lang.String type

    return sdf.format(d);
}
}

 

2) Register the function using the static method jstels.jdbc.xml.XMLDriver.registerFunction (String funcName, String handler). Where funcName argument is the name by which this method will be called in SQL queries and handler argument is the fully-qualified method name.

Example:

jstels.jdbc.xml.XMLDriver.registerFunction ("format_date", "my_pack.MyFuncs.format_date")

 

Also, you can use the driver property function:<my_func>.

Example:

Properties props = new java.util.Properties();
props.put("function:formate_date","my_pack.MyFuncs.format_date");
...  
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0], props);
// or append this property to the URL
Connection conn2 = DriverManager.getConnection("jdbc:jstels:xml:" + args[0]
 + "?function:formate_date=my_pack.MyFuncs.format_date");

 

3) Call the function in an SQL query

Example:

Statement st = connection.createStatement();

st.execute( "SELECT format_date(date_column, 'yyyy-MM-dd') FROM test" );

 

[HOME]   [TOP]