Using hash-indexes for increasing the performance
Add
the file csvdriver.jar to your classpath or extract the jar file in the directory of the
application.
|
Description |
Classes |
|
Driver class (JDBC API v1.0) |
jstels.jdbc.csv.CsvDriver |
|
Data Source class (JDBC API v2.0) |
jstels.jdbc.csv.CsvDataSource |
|
Connection Pool Data Source class (JDBC API v2.0) |
jstels.jdbc.csv.CsvConnectionPoolDataSource |
The connection URL is jdbc:jstels:csv:csvdir, where csvdir may be the following:
absolute or relative path to the local directory containing text files, e.g.:
|
jdbc:jstels:csv:c:/mydir/csvfiles jdbc:jstels:csv:csvfiles |
path to the directory entry within the zip (jar) file, e.g.:
|
jdbc:jstels:csv:zip://c:/myArchive.zip/csvfiles jdbc:jstels:csv:zip://myApp.jar/csvfiles |
FTP URL to the FTP-server directory (syntax: ftp://user:password@hostname[:port]/[dirpath/]), e.g.:
|
jdbc:jstels:csv:ftp://login:password@somesite.com:21/csvfiles |
HTTP URL to the web-server directory, e.g.:
|
jdbc:jstels:csv:http://www.somesite.com/csvfiles |
HTTP URL to the dynamic server page (JSP, ASP, PHP, CGI, etc) which returns a CSV file as output. For more information please see useWebParam driver property.
|
jdbc:jstels:csv:http://www.somesite.com/out.jsp |
The
driver supports a number of parameters that change default behavior of the driver.
These
properties are the following:
charset is used to specify a different than default charset encoding of input file (Default is the JVM default charset)
commentLine is used to specify a string denoting comment line (By default is not-defined)
defaultColumnType is used to specify a default data type for columns in a CSV file (Default is "Varchar").
dateFormat is used to specify a sequence of date/time formats separated by the '|' character, e.g: "dd.MM.yy | dd.MM | dd". (Default is "yyyy-MM-dd HH:mm:ss.SSS | yyyy-MM-dd HH:mm:ss | yyyy-MM-dd | HH:mm:ss.SSS | HH:mm:ss"). For more details about date/time format used in the driver please see chapter "time format syntax" in the SimpleDateFormat class documentation.
fileExtension is used to specify a different file extension (Default is ".txt")
If extension set to ".txt" then both "myTable.txt" and myTable are valid.
Note: You should stick to one of these table naming types in the SQL queries, i.e. either "myTable.txt" or myTable.
rowDelimiter is used to specify a custom row delimiter for a CSV file. (Default is a line separator specified by JVM environment variable "line.separator", e.g. for Windows - "\r\n", for Linux - "\r")
schema is used to specify a path to the schema file. It can be absolute or relative to a CSV directory path, e.g.: "c:/schemas/schema1.xml" or "schemas/schema2.xml". (Default is "schema.xml")
separator
is
used to specify a different column separator (Default is '\t' (tab))
suppressHeaders is used to specify if the first line contains column header information (Default is false; column header are on first line).
Advanced Properties
escapeEOLInQuotes is used to protect default line separators within quotes. This parameter should not be used, when the rowDelimiter parameter is set to a custom value. (Default is false).
logPath is used to set a file path to the log file.
emptyStringAsNull. If 'emptyStringAsNull' is set to 'true', empty strings are treated as Null values. (By default is true).
nullString is a string value that is treated as Null value. Not case sensitive. (By default is "NULL").
paddingChar - padding char for fixed-length files. (By default is space).
trimBlanks. If 'trimBlanks' is set to 'true', the driver trims leading and trailing spaces for string values when reading a text file. (By default is true).
useWebParam is used to specify the name of the web parameter that will be used to transfer a CSV file name in the dynamic server page. For instance, if you specify the following value: 'tablename', the driver will access server page using the following HTTP URL:
http://www.site.com/out.jsp?tablename=sometable
(where 'sometable' is the table specified in the SQL query or the schema file)
Note: If you specify driver properties directly in the driver URL and
server page also has its own parameters, you should separate them with '??':
jdbc:jstels:csv:http://www.site.com/out.jsp?param1=value1??useWebParam=tablename&suppressHeaders=true
quoteString is used to
enable/disable writing of double quotes for string values while inserting and
updating records in a CSV file (Default is true).
This
following example code shows how these properties are used:
Properties props = new java.util.Properties();
props.put("separator","|"); // separator is a bar props.put("suppressHeaders","false"); // column headers are on the first line props.put("fileExtension",".txt"); // file extension is .txt props.put("charset","ISO-8859-2"); // file encoding is "ISO-8859-2" props.put("commentLine","--"); // string denoting comment line is "--" // date/time format props.put("dateFormat","yyyy-MM-dd HH:mm | dd/MM/yyyy");
Connection conn = DriverManager.getConnection("jdbc:jstels:csv:" + args[0],props); |
You can also use jstels.jdbc.csv.CsvDataSource class:
CsvDataSource csvDS = new CsvDataSource();
csvDS.setPath("c:/csvfiles"); // path to the CSV directory csvDS.setSeparator("|"); // separator is a bar csvDS.setSuppressHeaders(false); // column headers are on the first line csvDS.setFileExtension(".txt"); // file extension is .txt
Connection conn = csvDS.getConnection(); |
Besides, the driver allows to append the properties to the URL like this:
Connection conn =
DriverManager.getConnection(
"jdbc:jstels:csv:path?suppressHeaders=true&dateFormat=yyyy-MM-dd HH:mm | dd/MM/yyyy&caching=false"); |
<schema>
<table
name = "my_table1.txt">
<column name = "my_field1"
type = "Integer"/>
<!--
this tag assigns the SQL data type to the column by its name in the text file-->
<column pos = "2" type = "Integer"/>
<!--
this tag assigns the SQL data type to the column by its position in the text
file-->
<column name = "my_field3" pos = "3" type = "Integer"/>
<!--
if you do not use the column header in the text file, you should set column names by using
both the 'pos' and the 'name' attributes-->
</table>
<!--
you
can also use file templates if your files have the same format-->
<table
name = "*.csv">
<!-- all files with the template "*.csv" --><!-- the wildcard '*' denotes any string of zero or more characters -->
<column name = "col1"
type = "Varchar"/>
<column name = "col2"
type = "Integer"/>
</table>
<table
name = "file????.*">
<!-- all files with the template "file????.*" --><!-- the wildcard '?' denotes any single character --> <column name = "id"
type = "Integer"/>
<column name = "descr"
type = "Varchar"/>
</table>
... ... ...
</schema> |
<schema>
<table
name = "my_table1.txt"
charset = "ISO-8859-2"
suppressHeaders =
"true"
commentLine = "--"
separator = ","
dateFormat = "dd/MM/yyyy">
<column name = "my_field1"
pos = "1" type = "Integer"/>
<column name = "my_field2" pos = "2" type = "Integer"/> <column name = "my_field3"
pos = "3" type = "Date"/>
</table>
...
... ...
</schema> |
Column names specified in the schema don't override column names in the header of the text file ('suppressHeaders' is false).
If you do not use the column header in the text file ('suppressHeaders' is true), you should set column names by using both the 'pos' and the 'name' attributes
Local driver properties specified in the schema override global properties.
|
Data Type |
JDBC Corresponding Type (java.sql.Types.*) |
Java class used in StelsCSV |
|
Int, Integer, Tinyint, Smallint, SHORT |
java.sql.Types.INTEGER |
java.lang.Integer |
|
Long, Bigint |
java.sql.Types.BIGINT |
java.lang.Long |
|
Float, Real |
java.sql.Types.FLOAT |
java.lang.Float |
|
Double, Decimal, Numeric |
java.sql.Types.DOUBLE |
java.lang.Double |
|
String, Char, Varchar |
java.sql.Types.VARCHAR |
java.lang.String |
|
Datetime, Date, Time, Timestamp |
java.sql.Types.TIMESTAMP |
java.util.Date |
An SQL query must meet the following conditions:
It must corespond to the SQL syntax supported by the driver. For detailed information about supported SQL syntax please see the specification here.
À column using an SQL reserved word as a name or containing spaces and other delimiters must be quoted in double quotes in a query. For example, SELECT "Date", "My integer-column" FROM "test.txt"
To use single quotes (') and backslashes (\) in a string constant you should use an escape character (\). For example: 'abcd\'ef\\g'
SELECT
SUM(a) AS col1, MAX(b) /
MAX(c) AS col2
FROM "test.txt" GROUP
BY a HAVING AVG(a) >
30
SELECT name FROM "salesreps.txt" WHERE ( rep_office IN ( 22, 11, 12 ) ) OR (
manager IS NULL AND hire_date >= to_date ( '01-05-2002','dd-MM-yyyy' ) OR ( sales > quota AND
NOT sales > 600000.0 )
SELECT city, target, sales FROM "offices.txt" WHERE region = 'Eastern' AND sales > target ORDER BY citySELECT * FROM "prices.txt" ps JOIN regions regs ON ps.regionid = regs.id JOIN "products.txt" prod ON prod.prodid = ps.prodidINSERT INTO "salesreps.txt" (name, age, empl_num, sales, title) VALUES ('Henry Smith', 35, 111, NULL, 'Sales Mgr')DELETE FROM "salesreps.txt" WHERE NAME LIKE 'Henry%'UPDATE "customers.txt" SET credit_limit = 50000.00 WHERE company = 'Acme Mfg.' |
Connection Example
This example code shows how the driver is used. You can download it here.
import java.sql.*;
public class DriverTest { public static void main(String[] args) { try { // load the driver into memory Class.forName("jstels.jdbc.csv.CsvDriver");
// create a connection. The first command line parameter is assumed to // be the directory in which the .csv files are held Connection conn = DriverManager.getConnection("jdbc:jstels:csv:" + args[0] );
// create a Statement object to execute the query with Statement stmt = conn.createStatement();
// execute a query ResultSet rs = stmt.executeQuery("SELECT * FROM \"test.txt\"");
// read the data and put it to the console for (int j = 1; j <= rs.getMetaData().getColumnCount(); j++) { System.out.print(rs.getMetaData().getColumnName(j) + "\t"); } System.out.println();
while (rs.next()) { for(int j=1; j <= rs.getMetaData().getColumnCount(); j++){ System.out.print(rs.getObject(j)+ "\t"); } System.out.println(); }
// close the objects rs.close(); stmt.close(); conn.close(); } catch(Exception e) { |