World: r3wp

[Parse] Discussion of PARSE dialect

BrianH
18-Dec-2011
[6057x4]
Yeah, blocks for cells are so far outside the data model of everything 
else that uses CSV files that TO-CSV was written to assume that you 
forgot to put an explicit translation to a string or binary in there 
(MOLD, FORM, TO-BINARY), or more likely that the block got in there 
by accident. Same goes for functions and a few other types.
As for that TO-ISO-DATE behavior, yes, it's a bug. Surprised I didn't 
know that you can't use /hour, /minute and /second on date! values 
with times in them in R2. It can be fixed by changing the date/hour 
to date/time/hour, etc. I'll update the script on REBOL.org.
Having to put in an explicit conversion from blocks, parens, objects, 
maps, errors, function types, structs, routines and handles reminds 
you that you would need to explicitly convert them back when you 
LOAD-CSV. Or, more often, it triggers valuable errors that tell you that 
unexpected data made it into your output.
TO-ISO-DATE fixed on REBOL.org
Henrik
18-Dec-2011
[6061]
Thanks
GrahamC
18-Dec-2011
[6062x2]
Dunno if it's faster, but to left-pad days and months I add 100 to 
the value and then do a next, followed by a form, i.e. regarding your 
p0 function
eg. next form 100 + date/month
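
A minimal console sketch of the padding trick described above, assuming two-digit values in the range 0 to 99. The helper name pad2 is hypothetical and is not the p0 function from the script on REBOL.org.

```rebol
; pad2 is a hypothetical helper name used only for illustration.
; Adding 100 guarantees a three-digit number, FORM turns it into a
; string, and NEXT skips the leading "1", leaving two digits.
pad2: func [n [integer!]] [
    copy next form 100 + n
]

print pad2 7     ; prints 07
print pad2 12    ; prints 12
```

COPY is used so that the result is a standalone string rather than a position inside the three-character one.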
BrianH
18-Dec-2011
[6064]
It's worth timing. I'll try both, in R2 and R3.
GrahamC
19-Dec-2011
[6065]
and the outcome was?
BrianH
19-Dec-2011
[6066x2]
Twice the speed using your method :)
Updated on REBOL.org to use new method.
GrahamC
20-Dec-2011
[6068]
Yeah, generally math is faster than using logic.  An old Forth trick.
BrianH
20-Dec-2011
[6069]
Added a TO-CSV /with delimiter option, in case commas aren't your 
thing. It only specifies the field delimiter, not the record delimiter, 
since TO-CSV only makes CSV lines, not whole files.
Endo
20-Dec-2011
[6070]
I'm using it to prepare data to bulk insert into a SQL Server table 
using BCP command line tool.

I need to make some changes, like a /no-quote option to not quote string 
values, because BCP has no option to tell it that my data has quoted 
string values.
BrianH
20-Dec-2011
[6071]
Be careful, if you don't quote string values then the character set 
of your values can't include cr, lf or your delimiter. It requires 
so many changes that it would be more efficient to add new formatter 
functions to the associated FUNCT/with object, then duplicate the 
code in TO-CSV that calls the formatter. Like this:

to-csv: funct/with [
	"Convert a block of values to a CSV-formatted line in a string."
	data [block!] "Block of values"
	/with "Specify field delimiter (preferably char, or length of 1)"
	delimiter [char! string! binary!] {Default ","}
	; Empty delimiter, " or CR or LF may lead to corrupt data
	/no-quote "Don't quote values (limits the characters supported)"
] [
	output: make block! 2 * length? data
	delimiter: either with [to-string delimiter] [","]
	either no-quote [
		unless empty? data [append output format-field-nq first+ data]
		foreach x data [append append output delimiter format-field-nq :x]
	] [
		unless empty? data [append output format-field first+ data]
		foreach x data [append append output delimiter format-field :x]
	]
	to-string output
] [
	format-field: func [x [any-type!] /local qr] [
		; Parse rule to put double-quotes around a string, escaping any inside
		qr: [return [insert {"} any [change {"} {""} | skip] insert {"}]]
		case [
			none? :x [""]
			any-string? :x [parse copy x qr]
			:x = #"^(22)" [{""""}]
			char? :x [ajoin [{"} x {"}]]
			money? :x [find/tail form x "$"]
			scalar? :x [form x]
			date? :x [to-iso-date x]
			any [any-word? :x binary? :x any-path? :x] [parse to-string :x qr]
			'else [cause-error 'script 'expect-set reduce [
				[any-string! any-word! any-path! binary! scalar! date!] type? :x
			]]
		]
	]
	format-field-nq: func [x [any-type!]] [
		case [
			none? :x [""]
			any-string? :x [x]
			money? :x [find/tail form x "$"]
			scalar? :x [form x]
			date? :x [to-iso-date x]
			any [any-word? :x binary? :x any-path? :x] [to-string :x]
			'else [cause-error 'script 'expect-set reduce [
				[any-string! any-word! any-path! binary! scalar! date!] type? :x
			]]
		]
	]
]


If you want to add error checking to make sure the data won't be 
corrupted, you'll have to pass in the delimiter to format-field-nq 
and trigger an error if it, cr or lf are found in the field data.
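
The check described above could look roughly like this. It is only a sketch: check-unquoted is a hypothetical helper name, the choice of the 'invalid-arg error id is an assumption, and the delimiter is assumed to have already been converted to a string, as TO-CSV does.

```rebol
; check-unquoted is a hypothetical helper, not part of the posted script.
; It returns the text unchanged, or raises an error when the text contains
; the delimiter, CR or LF, any of which would corrupt an unquoted line.
check-unquoted: func [
    text [string!] "Field text that has already been formatted"
    delimiter [string!] "Field delimiter in use"
] [
    if any [
        find text delimiter
        find text cr
        find text lf
    ] [
        cause-error 'script 'invalid-arg text
    ]
    text
]

print check-unquoted "plain value" ","          ; prints plain value
print error? try [check-unquoted "a,b" ","]     ; prints true
```

To use something like it, format-field-nq would need an extra delimiter argument and would pass each string result through the helper before returning it.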
Henrik
20-Dec-2011
[6072]
Is this related to what you wrote above?

>> to-csv [34]
== {""""}
BrianH
20-Dec-2011
[6073x3]
Nope, that's a bug in the R2 version only. Change this:
			:x = #"^(22)" [{""""}]
to this:
			:x == #"^(22)" [{""""}]

Another incompatibility between R2 and R3 that I forgot :(
I'll update the script on REBOL.org.
Weirdly enough, = and =? return true in that case in R2, but only 
== returns false; false is what I would expect for =? at least.
Updated, Henrik.
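
A small console sketch of the two comparisons involved, for anyone who wants to check their own interpreter. No results are shown because they differ between versions; per the messages above, R2 reports the first as true, which is why the integer 34 matched the double-quote branch.

```rebol
; #"^(22)" is the double-quote character, code point 34.
print 34 = #"^(22)"     ; loose equality, the original test in format-field
print 34 == #"^(22)"    ; strict equality, the corrected test
```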
Henrik
20-Dec-2011
[6076]
Thanks.
Endo
20-Dec-2011
[6077]
Thanks BrianH
BrianH
20-Dec-2011
[6078x2]
Note that that was a first-round mockup of the R3 version, Endo. 
If you want to make an R2 version, download the latest script and 
edit it similarly.
Have you looked into the native type formatting of bcp? It might 
be easier to make a more precise data file that way.
Endo
20-Dec-2011
[6080x2]
It uses a format file, which is very strict, but there is no way to set 
a quote char for fields.
Native format works well if you export from one SQL Server and import 
into another.
BrianH
20-Dec-2011
[6082]
I figure it might be worth it (for me at some point) to do some test 
exports in native format in order to reverse-engineer the format, 
then write some code to generate that format ourselves. I have to 
do a lot of work with SQL Server, so it seems inevitable that such 
a tool will be useful at some point, or at least the knowledge gained 
in the process of writing it.
Endo
20-Dec-2011
[6083x2]
The biggest problem would be the different datatypes for different 
versions of SQL Server, if there is no good documentation for the 
native format. But BCP does the job quite well. I CALL it when necessary 
and use FIND to check the output for errors. 

There are XML format files as well; they are easier to understand, but 
there are no functional differences from non-XML format files.
I've been working with SQL Server for a long time; if there is anything 
I can help with or test for you, feel free to ask.
Endo
5-Jan-2012
[6085]
Does anyone know where I can find Rebolek's R2E2 (REBOL Regular 
Expressions Engine)? 
This link is dead, I think: http://bolek.techno.cz/reb/regex.r

I saw it on http://www.rebol.org/documentation.r?script=regset.r
Rebolek
5-Jan-2012
[6086]
Endo, I will try to find the newest version and let you know. But do 
not expect it to translate every regular expression.
Endo
5-Jan-2012
[6087:last]
Thank you. I don't need an exact regexp library, but it would be nice 
to have some regexp functionality.