To process your XML documents via StelsXML JDBC driver, you should do the following easy steps:
Add the file xmldriver.jar to your classpath or extract the jar file in the directory of the application.
Create the schema file and add your table specifications (please see Schema File for more details).
Register the driver in your Java code:
|
Class.forName("jstels.jdbc.xml.XMLDriver"); |
Connect to the driver using java.sql.Connection class.
|
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:c:/xmlfiles/schema.xml"); |
The connection URL of StelsXML is jdbc:jstels:xml:schema_path, where schema_path may be the following:
▪ absolute or relative file path to the schema file:
|
|||
▪ FTP URL to the schema file:
▪ HTTP URL to the schema file:
▪ HTTP URL to the dynamic server page (JSP, ASP, PHP, CGI, etc) which returns the schema file as output:
|
Execute an SQL query using java.sql.Statement class:
|
Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery("SELECT * FROM test"); |
|
<?xml version="1.0" encoding="UTF-8"?> <employees> <document_name>Employees doc</document_name> <employee id="1"> <first_name>Bill</first_name> <last_name>Adams</last_name> <age>25</age> <hire_date>12-06-1995</hire_date> <title>Java programmer</title> </employee> <employee id="2"> <first_name>Mary</first_name> <last_name>Jones</last_name> <age>32</age> <hire_date>22-09-2001</hire_date> <title>Sales manager</title> </employee> </employees> |
|
<?xml version="1.0" encoding="UTF-8"?> <schema> <table name="employees" file ="employees.xml" path="/employees/employee"> <!-- The attribute 'name' specifies the name of the table--><!-- The attribute 'file' specifies the file path to the XML document. This path may be absolute (e.g.: 'c:/xmlfiles/employees.xml') or relative to the schema file path (e.g.: 'employees.xml' or 'subdir/employees.xml'). Also, you can use the HTTP URL (e.g.: 'http://www.example.com/rss_feed.xml') or FTP URL (e.g.: ftp://user:passw@www.sample.com:21/test/test.xml') --><!-- The attribute 'path' defines that table entries are based on an XPath expression (path) '/employees/employee', i.e. it is based on 'employee' elements--><column name="documentname" type="string" path="/employees/document_name"/> <!-- The attribute 'name' specifies the name of the column--><!-- The attribute 'type' specifies the data type of the column--><!-- The attribute 'path' defines the XPath path to the XML element the column is based on. In this case the XPath expression '/employees/document_name' is an absolute path from the root of the XML document--><column name="firstname" type="string" path="first_name"/> <!--The value 'first_name' of the 'path' attribute defines the XPath expression which is relative to the main specification defined in the 'path' attribute of the 'table' element. I.e. the full path to the column is '/employees/employee/first_name'--><column name="lastname" type="string" path="last_name"/> <column name="title" type="string" path="title"/> <column name="id" type="integer" path="@id"/> <!-- This 'path' attribute defines the XPath path to the attribute of the XML element--><column name="age" type="integer" path="age"/> <column name="hiredate" type="datetime" path="hire_date"/> </table> </schema> |
Now, after the table has been specified, we can access it using an SQL query (see Connection Example for more details).
Additional notes on the schema file:
The table may be specified without the 'name' attribute:
|
<table file ="c:/xmlfiles/employees.xml" path="/employees/employee"> |
Then, as a table, you have to specify the full path to the XML file in the SQL query: SELECT * FROM "c:/xmlfiles/employees.xml"
If some XML documents have the same format you can use the file pattern:
|
<table file ="c:/xmlfiles/??employees*.xml" path="/employees/employee"> |
Where the wildcard '*' denotes any string of zero or more characters and the wildcard '?' denotes any single character. In an SQL query you have to specify the full path to one of these files: SELECT * FROM "c:/xmlfiles/myemployees001.xml"
You can also set the local driver properties for each table directly in the 'table' element (see Driver properties for more details):
|
<table name="employees" file ="employees.xml" path="/employees/employee" dateFormat="dd-MM-yyyy" namespaces="empl:http://www.example.com/employees"> |
See supported XPath API here.
The
driver supports a number of parameters that change default behavior of the driver.
These properties are:
charset is used to set the character encoding for output. To specify the input encoding you need to add the encoding declaration to the XML file: <?xml version="1.0" encoding="some_charset"?> (Default charset is "UTF-8").
dateFormat is used to specify a format for date/time values. A value of this property is a sequence of date/time formats separated by the '|' character, e.g: "dd.MM.yy | dd.MM | dd". (Default is "yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss | yyyy-MM-dd | HH:mm:ss.SSS | HH:mm:ss"). For more details about date/time format used in the driver please see chapter "time format syntax" in the java.util.SimpleDateFormat class documentation.
decimalFormatInput, decimalFormatOutput are used to specify input and output formats for double and float values in a CSV file. For instance, you can use these formats to specify currency values, e.g.: "###,###.##$". For more information about patterns used in "decimalFormat" please see the documentation for java.text.DecimalFormat class.
namespaceAware is used to enable namespace support when parsing XML documents (Default is "true")
namespaces is used to define XML namespaces used in the XML document (see Handling XML Namespaces for more details )
To set the driver properties, you can use one of the following ways:
1) using Properties class:
Properties props = new java.util.Properties();
props.put("dateFormat","MM.dd.yyyy"); props.put("namespaceAware","false");
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0],props); |
2) appending the properties to the URL:
Connection conn =
DriverManager.getConnection(
"jdbc:jstels:xml:schema.xml?dateFormat=MM.dd.yyyy&namespaceAware=false");
|
3) adding the properties as attributes to the <table> tag in the schema file
(only for local table properties such as "charset", "dateFormat", "namespaces", etc):
|
<?xml version="1.0" encoding="UTF-8"?> <schema> <table name="products" file ="products.xml" path="/products/product"
charset="ISO-8859-2" dateFormat="dd-MM-yyyy"> <column name="prodid" type="int" path="prodid"/> <column name="description" type="string" path="description"/> </table> </schema>
|
|
StelsXML data type |
JDBC returned type (java.sql.Types.*) |
Java class used in StelsXML |
|
Integer, INT |
java.sql.Types.INTEGER |
java.lang.Integer |
|
Bigint |
java.sql.Types.BIGINT |
java.lang.Long |
|
FLOAT |
java.sql.Types.FLOAT |
java.lang.Float |
|
Double |
java.sql.Types.DOUBLE |
java.lang.Double |
|
BIGDECIMAL (recommended for storing currency values) |
java.sql.Types.NUMERIC |
java.math.BigDecimal |
|
STRING, VARCHAR |
java.sql.Types.VARCHAR |
java.lang.String |
|
DATETIME |
java.sql.Types.TIMESTAMP |
java.util.Date |
StelsXML supports the most part of ANSI 92 grammar like SELECT, INSERT, UPDATE, DELETE and CREATE statements.
An SQL query must meet the following conditions:
It must corespond to the supported SQL specification.
A column using an SQL reserved word as a name or containing spaces and other delimiters must be quoted in double quotes in a query. For example, SELECT "Date", "My integer-column" FROM "test.xml"
To use single quotes (') and backslashes (\) in a string constant you should use an escape character (\). For example: 'abcd\'ef\\g'
SELECT * FROM employees WHERE title = 'Java programmer' ORDER BY last_name
SELECT name FROM salesreps WHERE ( rep_office IN ( 22, 11, 12 ) ) OR (
manager IS NULL AND hire_date >= to_date ( '01-05-2002','dd-MM-yyyy' ) OR ( sales > quota AND
NOT sales > 600000.0 )
SELECT city, target, sales FROM "offices.xml" WHERE region = 'Eastern' AND sales > target ORDER BY citySELECT * FROM prices ps JOIN regions regs ON ps.regionid = regs.id JOIN products prod ON prod.prodid = ps.prodid |
You can use the standard SQL operations INSERT, UPDATE, DELETE as well to add, update and delete records in tables.
Examples:
INSERT INTO employees (firstname, lastname, title, id, hiredate) VALUES('John', 'Doe', 'Web admin', 3, to_date('07:02:2007','dd:MM:yyyy'))DELETE FROM employees WHERE lastname LIKE 'Henry%'UPDATE "c:/xmlfiles/customers.xml" SET credit_limit = 50000.00 WHERE company = 'Acme Mfg.' |
Also, the driver supports a number of additional commands for processing XML files:
To save a table into an XML file with specific parameters, use the command SAVE TABLE <table_name> [AS <file_name>] [USING SPECIFICATION <table_spec>], where:
file_name - absolute or relative path to the XML file.
table_spec - table specification in the schema.
To save the result of an SQL query execution into an XML file, use the command SAVE ( sql_query ) AS <file_name> USING SPECIFICATION <table_spec>, where:
file_name - absolute or relative path to the XML file.
table_spec - table specification in the schema.
To create a new table (XML file), use the command CREATE TABLE <table_name> VALUES (value [,...]) or CREATE TABLE <table_name> USING SPECIFICATION.
Examples:
SAVE TABLE employees // saves the table "employees" by using default file and columns settings of the corresponding specification "employees"SAVE TABLE employees AS "c:/copy/employees_copy.xml" // saves "employees" to another file "employees_copy.xml" by using column settings of the specification "employees"SAVE TABLE employees AS "employees_html_table.html" USING SPECIFICATION employees_html_table // saves "employees" to another file "employees_html_table.html" by using column settings of the specification "employees_html_table"SAVE (SELECT * employees WHERE title LIKE '%programmer') AS "employees_programmers.xml" USING SPECIFICATION employees // saves the SQL query result to the file "employees_programmers.xml" by using column settings of the specification "employees"CREATE TABLE html_table (td1 string, td2 string, td3 int) // creates the table "html_table" with three columns by using column settings of the corresponding specification "html_table". CREATE TABLE employees USING SPECIFICATION // creates the table "employees" by using column settings of the corresponding specification "employees" |
For more information about the supported SQL syntax, please see the SQL specification and examples of use containing in the StelsXML package.
Connection Example
This complete example shows how the driver is used. You can download this example with the driver package here.
The driver connection code:
|
import java.sql.*;
public class DriverTest {
public static void main(String[] args) { try { // load the driver into memory Class.forName("jstels.jdbc.xml.XMLDriver");
// create a connection. The first command line parameter is assumed to // be the file path to the schema file Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0]);
// create a Statement object to execute the query with Statement stmt = conn.createStatement();
// execute a query ResultSet rs = stmt.executeQuery("SELECT * FROM employees");
// read data and put it to the console for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) { System.out.print(rs.getMetaData().getColumnName(j) + "\t"); } System.out.println();
while (rs.next()) { for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) { System.out.print(rs.getObject(j) + "\t"); } System.out.println(); }
// close the objects rs.close(); stmt.close(); conn.close(); } catch (Exception e) { e.printStackTrace(); } } } |
The specification for the table "employees" used in the SQL query above:
|
<?xml version="1.0" encoding="UTF-8"?> <schema> <table name="employees" file ="employees.xml" path="/employees/employee" dateFormat="dd-MM-yyyy"> <!-- In this example the file "employees.xml" is located in the same directory where the schema is--><!-- The attribute "dateFormat" sets a local date format used in the specified XML file--><column name="documentname" type="string" path="/employees/document_name"/> <column name="firstname" type="string" path="first_name"/> <column name="lastname" type="string" path="last_name"/> <column name="title" type="string" path="title"/> <column name="id" type="integer" path="@id"/> <column name="age" type="integer" path="age"/> <column name="hiredate" type="datetime" path="hire_date"/> </table> </schema> |
The XML file "employees.xml":
|
<?xml version="1.0" encoding="UTF-8"?> <employees> <document_name>Employees doc</document_name> <employee id="1"> <first_name>Bill</first_name> <last_name>Adams</last_name> <age>25</age> <hire_date>12-06-1995</hire_date> <title>Java programmer</title> </employee> <employee id="2"> <first_name>Mary</first_name> <last_name>Jones</last_name> <age>32</age> <hire_date>22-09-2001</hire_date> <title>Sales manager</title> </employee> <employee id="3"> <first_name>Dan</first_name> <last_name>Roberts</last_name> <age>30</age> <hire_date>01-11-1999</hire_date> <title>Project manager</title> </employee> <employee id="4"> <first_name>Paul</first_name> <last_name>Crauz</last_name> <age>24</age> <hire_date>02-05-2002</hire_date> <title>C++ programmer</title> </employee> </employees> |
The result of the test:
|
DOCUMENTNAME FIRSTNAME LASTNAME TITLE ID AGE HIREDATE Employees doc Bill Adams Java programmer 1 25 1995-06-12 00:00:00.0 Employees doc Mary Jones Sales manager 2 32 2001-09-22 00:00:00.0 Employees doc Dan Roberts Project manager 3 30 1999-11-01 00:00:00.0 Employees doc Paul Crauz C++ programmer 4 24 2002-05-02 00:00:00.0 |
By default, XML parsing is namespace aware in the driver. You can change this with the namespaceAware property.
In order to process XML documents that use namespaces, you must specify each namespace using the driver property namespaces. The value format for this property is the following:
prefix1:namespace1|prefix2:namespace2| ...
where namespace is a URI identifier of the namespace used in a XML document and prefix is a prefix identifier for use in XPath expressions in the schema file. The prefix may be arbitrary.
Example:
The XML file "employees.xml" that uses the namespace "http://www.example.com/employees" and prefixed XML elements:
|
<?xml version="1.0" encoding="UTF-8"?>
<empl:employees xmlns:empl="http://www.example.com/employees"> <empl:first_name>Bill</empl:first_name> <empl:last_name>Adams</empl:last_name> <empl:age>25</empl:age> <empl:hire_date>12-06-1995</empl:hire_date> <empl:title>Java programmer</empl:title> </empl:employee> <empl:employee id="2"> <empl:first_name>Mary</empl:first_name> <empl:last_name>Jones</empl:last_name> <empl:age>32</empl:age> <empl:hire_date>22-09-2001</empl:hire_date> <empl:title>Sales manager</empl:title> </empl:employee> </empl:employees> |
The table specification for the "employees.xml":
|
<?xml version="1.0" encoding="UTF-8"?> <schema> <table name="employees" file="employees.xml" path="/empl:employees/empl:employee" dateFormat="dd-MM-yyyy" namespaces="empl:http://www.example.com/employees"> <!-- Note that in this case all XPath expressions must use the prefix 'empl:' --><column name="firstname" type="string" path="empl:first_name"/> <column name="lastname" type="string" path="empl:last_name"/> <column name="title" type="string" path="empl:title"/> <column name="id" type="integer" path="@id"/> <column name="age" type="integer" path="empl:age"/> <column name="hiredate" type="datetime" path="empl:hire_date"/> </table> </schema> |
The following XPath expressions are supported:
self::node(), .
root/child::foo, root/foo
root/descendant::foo
descendant-or-self::node()/foo,
//foo
namespace::foo
attribute::foo, @foo
foo[1=position()], foo[1]
string(foo)
concat(foo,bar)
starts-with('foo','f')
contains('foo','fo')
substring-before('foo','oo')
substring-after('foo','f')
substring('foo',2,1)
string-length(foo)
normalize-space(foo)
translate('foo','f','F')
boolean(foo)
not(true())
true()
false()
number(foo)
floor(1.1)
ceiling(1.9)
round(1.5)
lang('en')
true):
foo/text()[last()]
local-name(.)
namespace-uri(.)
name(.)
foo[single]
foo[mul][tiple]
foo[nes[ted]]
*
text()
node()
/
//
[]
.
@
:
::
* (multiply)
+
-
=
!=
<
<=
>
>=
and
or
mod
div
* (any
node)
$
Not supported:
parent::node(), .. root/ancestor::foo
ancestor-or-self::foo
following::foo
following-sibling::foo
preceding::foo
preceding-sibling::foo</UL<
li>
comment
processing-instruction
() (node
sets grouping)
| (based
on node sets)
There are two main modes the driver can run in: the mode with data caching and the mode with data swapping. Running in the first mode (default mode), the driver caches XML files in the RAM making it possible to achieve maximum performance while processing XML. Keep in mind that Java Virtual Machine must have enough free memory allotted for storing large tables (use -Xms and -Xmx JVM options). You can also use the DROP TABLE <table name> FROM CACHE command to force tables to be removed from the cache.
The second mode is recommended for processing large XML files (>100 MB). To use this mode, set the driver property caching to false. There are also some properties for configuring this mode:
readOnlySubMode is used to set an additional submode. The submode readOnlySubMode = true is optimized for using SELECT queries only, it doesn't support write operations. The submode readOnlySubMode = false supports all SQL operations. (by default this parameter is true).
tempPath - directory where temporary files will be created (by default it is a OS temporary directory, specified by JVM environment variable "java.io.tmpdir").
maxRecordsInMemoryPerTable is used to define how many records should be saved in the RAM for each table (by default - 5000 records).
maxSwapStringSize - default size for string columns in swap files (by default - 50 characters).
Example:
Properties props = new java.util.Properties();
props.setProperty("caching",
"false"); // switch to the
swapping mode
props.setProperty("maxSwapStringSize", "60");
Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0], props); |
To reduce the swap size and increase the overall performance in the swapping mode, we recommend to set the 'size' attribute for each string column in the schema file. If this attribute is not specified, the column size by default is equals to the value of the maxSwapStringSize property.
Example:
|
<?xml version="1.0" encoding="UTF-8"?> <schema> <table name="employees" file="employees.xml" path="/employees/employee" dateFormat="dd-MM-yyyy"> <column name="firstname" type="string" path="first_name" size="10"/> <column name="lastname" type="string" path="last_name" size="15"/> <column name="title" type="string" path="title" size="20"/> <column name="id" type="integer" path="@id"/> <column name="age" type="integer" path="age"/> <column name="hiredate" type="datetime" path="hire_date"/> </table> </schema> |
You can use your own SQL functions in the driver. To use this feature, you should do the following:
1)
Create a
static method that will act as an SQL function
Mind that:
The method must have the public modifier
The number of arguments in the method is arbitrary, but they must belong to one of the java classes specified in the data types table
It is advisable to process the null values for arguments in the method code
The method can return the null value as well
Example:
|
package my_pack; public class MyFuncs{ // user-defined SQL function that formats the given argument into a date/time string with the specified format
public static
String format_date(java.util.Date d, String format) {
if (d ==
null || format == null)
return
sdf.format(d); |
2) Register the function using the static method jstels.jdbc.xml.XMLDriver.registerFunction (String funcName, String handler). Where funcName argument is the name by which this method will be called in SQL queries and handler argument is the fully-qualified method name.
Example:
|
jstels.jdbc.xml.XMLDriver.registerFunction ("format_date", "my_pack.MyFuncs.format_date") |
Also, you can use the driver property function:<my_func>.
Example:
Properties props = new java.util.Properties(); props.put("function:formate_date","my_pack.MyFuncs.format_date"); ... Connection conn = DriverManager.getConnection("jdbc:jstels:xml:" + args[0], props);
// or append this property to the URL
Connection conn2 = DriverManager.getConnection("jdbc:jstels:xml:" + args[0]
+ "?function:formate_date=my_pack.MyFuncs.format_date"); |
3) Call the function in an SQL query
Example:
|
Statement st = connection.createStatement(); st.execute( "SELECT format_date(date_column, 'yyyy-MM-dd') FROM test" ); |