possible data format ... Re: Re: dbms3.r 01
[1/24] from: petr:krenzelok:trz:cz at: 15-Jan-2002 6:56
Hi once again,
I would like to suggest a more complex data format, but it really depends upon
what you want to achieve. Let's talk 1 db = 1 file only.
1 ["Petr" "Krenzelok" [petr--krenzelok--trz--cz] 29] "R"
2 ["Someone" "Else" [someone--else--com] 18] "D"
Some rules:
- the file would be organised as 1 rec = 1 line, so read/lines could eventually
be used for in-memory mappings ...
- new records would be write/appended to the end
- each record is provided with a status field - an important one - by default, deletion
of a record doesn't physically delete it from the file = speed = the ability to easily
recover ('recall function in XBase land) deleted data. If you want to
physically delete such records, use a 'patch maintenance function, which could be
performed once per some time period (it would also reassign record numbers, to
remove holes in the rec-no sequence) ... We could also use such a field for a locking
mechanism ....
- using record numbers is good for indices; you just keep record numbers
and once you create some grid view, you filter them out. But I haven't measured yet
how long it would take to do e.g. 1K of REBOL 'select-s. Of course we probably
can't use such an approach until open/seek is available ...
- what we do lack is a proper and real grid. The REBOL list view is just a toy for some
hundreds of records. For a larger number of records to browse we need a grid with
dynamic caching ... (although not so necessary in the beginning of the project)
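A rough sketch of the rules above, in Python rather than REBOL and purely for illustration: one record per line, a rec-no, a data block, and a status field with soft delete and an XBase-style 'recall. The tab-separated layout and the helper names are my own invention, not a proposed format.

```python
# Sketch of the "1 rec = 1 line" layout: rec-no, a data block, and a
# status field ("R" = regular, "D" = deleted-but-recoverable).
import json

def format_record(rec_no, fields, status="R"):
    # one line per record, so a read-lines pass gives an in-memory mapping
    return f"{rec_no}\t{json.dumps(fields)}\t{status}"

def parse_record(line):
    rec_no, fields, status = line.split("\t")
    return int(rec_no), json.loads(fields), status

def recall(lines, rec_no):
    # XBase-style 'recall: flip a soft-deleted record back to regular;
    # nothing is ever physically removed until a separate 'patch pass
    out = []
    for line in lines:
        n, fields, status = parse_record(line)
        if n == rec_no and status == "D":
            status = "R"
        out.append(format_record(n, fields, status))
    return out
```

Deletion here only rewrites the status field, which is what makes recovery (and the locking reuse Petr mentions) cheap.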
Few notes:
- love it or hate it - my description is not demagogy; it can easily be put in
the trashcan :-)
- it is also good for row/column oriented design, although I can imagine that
the record data block could be of variable size. What I do miss (and Carl once said it
would be cool :-) is the ability of the parse function to work on an opened/uncached
file (e.g. look at how Netscape/Mozilla stores mailboxes - it would be handy)
... well, that's few of my ideas in the early morning :-)
-pekr-
[2/24] from: joel:neely:fedex at: 15-Jan-2002 5:56
Hi, Petr, Gabriele, et al,
Petr Krenzelok wrote:
> Hi once again,
> I would like to suggest more complex data format, but then it
<<quoted lines omitted: 9>>
> by default deletion of record doesn't physically delete it from
> the file ...
This certainly fits the KISS mandate.
Let me mention one slight variation that I've used:
- new AND UPDATED AND DELETED records are appended to the end of
the file
which has two consequences,
1) assuming that write/append is significantly faster than reading
and rewriting the entire file, it minimizes the time for any
revisions, and
2) obtaining (the current state of) a record requires either
sweeping the entire file or reading backward for the last
occurrence of the record's unique ID.
I've used this in situations where the normal read is, in fact,
a query looking for all records (or all records meeting some set
of criteria), which means that sweeping the entire file is not
a major penalty.
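Joel's last-occurrence-wins sweep can be sketched in a few lines; this is Python for illustration only, and the pipe-separated `op|id|fields` log layout is invented here, not part of anyone's proposal.

```python
# Sketch of the append-only variation: every add/update/delete is an
# appended line, and the current state of the file is whatever each ID's
# *last* occurrence says. One full sweep recovers the live records.

def current_state(lines):
    latest = {}                      # id -> (op, fields); later lines win
    for line in lines:
        op, rec_id, *fields = line.split("|")
        latest[rec_id] = (op, fields)
    # IDs whose last operation was a delete drop out of the final view
    return {rid: f for rid, (op, f) in latest.items() if op != "DEL"}
```

The trade-off is exactly as stated: writes are a single append, while reads pay for the sweep (or a backward scan for one ID).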
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[3/24] from: rgaither:triad:rr at: 15-Jan-2002 8:43
Hi Joel, Petr, All,
>> I would like to suggest more complex data format, but then it
>> really depends upon what you want to achieve. Let's talk
>> 1 db = 1 file only.
In reviewing some design options I am having to rethink the
desire for a 1 file solution. With a text db it just doesn't seem
the best option.
>> 1 ["Petr" "Krenzelok" [petr--krenzelok--trz--cz] 29] "R"
>> 2 ["Someone" "Else" [someone--else--com] 18] "D"
Nice. I didn't think of the status option.
>> Some rules:
>> - file would be organised in 1 rec = 1 line, so read/lines could
>> be eventually used for in-memory mappings ...
This would be nice but I'm not sure it is reasonable for records with
a large text block. One of the things I'd like to avoid is imposing limits
on record size so something like a webpage could be a record if
desired.
>> - new records would be write/appended to the end
Yes, this seems best in this illustration.
>> - record is provided with the status field - important one -
>> by default deletion of record doesn't physically delete it from
>> the file ...
Sounds good also.
>This certainly fits the KISS mandate.
>
>Let me mention one slight variation that I've used:
>
>- new AND UPDATED AND DELETED records are appended to the end of
> the file
This is something I have considered as well. Another option would be
to have a transaction file or files that either contain whole record changes
or just "individual operations". This audit/log could then be applied via
a utility to update the main data file when desired. Updating the main
file would improve the read operations and bring the db back into an
easy to view form.
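Rod's separate-log variant amounts to folding a transaction file into the main file on demand; a minimal Python sketch of that utility, assuming an invented `id=value` record layout and `set`/`del` operations chosen only for brevity:

```python
# Sketch of applying an audit/log file to the main data file: the main
# file stays read-only while changes accumulate in the log; a utility
# later folds the log in, restoring fast reads and an easy-to-view form.

def apply_log(main, log):
    db = dict(rec.split("=", 1) for rec in main)
    for entry in log:
        op, _, rest = entry.partition(" ")
        if op == "del":
            db.pop(rest, None)
        else:                         # "set id=value"
            rid, _, val = rest.partition("=")
            db[rid] = val
    return [f"{rid}={val}" for rid, val in db.items()]
```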
>which has two consequences,
>1) assuming that write/append is significantly faster than reading
<<quoted lines omitted: 3>>
> sweeping the entire file or reading backward for the last
> occurrence of the record's unique ID.
All of these issues show the trade offs in the different operations
you need to do on the database. Read and search performance
versus update and write operations and so on. They conflict quite
a bit with the organization I would pick to keep the data in a single
file with visually "nice" organization. :-(
I will throw out a summary of format options I worked up last
night in another post.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[4/24] from: rgaither:triad:rr at: 15-Jan-2002 9:22
Hi All,
Here is my summary of file organization options for
a text db.
File Options -
1. Single file
Pro - Very easy to manage and relocate
Pro - Simple read logic
Con - Poor write performance
Con - Hard to read data mixed with schema
Con - Limited to small record counts
2. Schema, Data, and Index file
Pro - Provides clean content by purpose
Pro - Supports larger record counts
Pro - Still reasonable to manage and relocate
Con - Poor write performance
Con - Still somewhat limited in record counts
Con - Multiple file synchronization issues
3. Schema file, Data and Index files for each table
Pro - Provides clean content by purpose
Pro - Supports larger record counts
Pro - Moderate write performance
Pro - Moderate read complexity
Con - DB is made up of many files
Con - Multiple file synchronization issues
Variations -
1. Any of the above, changed records appended at end of file
Same as Joel's suggestion.
Pro - Improves write performance
Con - Complicates read logic and performance
Con - Harder to manually view data
2. Any of the above, changed records in a log file
Like Joel's suggestion but stored in a dedicated file.
Pro - Improves write performance
Pro - Provides place to implement limited transaction support
Con - Complicates read logic and performance
Con - Harder to manually view data
3. Any of the above, individual data operations in a dedicated file
Like the append-changed-records options but just recording
the action and new data. Field-level changes for updates
or record-level changes for adds and deletes.
Pro - Improves write performance even more
Pro - Provides place to implement limited transaction support
Con - Complicates read logic and performance
Con - Harder to manually view data
FWIW, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[5/24] from: cribbsj:oakwood at: 15-Jan-2002 10:06
Rod Gaither wrote:
>Hi All,
>Here is my summary of file organization options for
<<quoted lines omitted: 44>>
>Oak Ridge, NC - USA
>[rgaither--triad--rr--com]
Just FYI, but I contributed a script to the library (I think it is
called db.r) that provides some pretty simplistic db routines for
maintaining a single file db. It addresses some of the wishlist items
posted on this topic. The code is pretty ugly, but it is functional and
fairly speedy as long as the files don't get too big.
Also, during the development of the script, I corresponded with Tim
Johnson a lot. He was working on something similar for a client. He
had some great ideas and code that we were looking to incorporate in my
script, but I got sidetracked with other stuff.
Jamey Cribbs
[6/24] from: rgaither:triad:rr at: 15-Jan-2002 10:27
Hi Jamey,
>Just FYI, but I contributed a script to the library (I think it is
>called db.r) that provides some pretty simplistic db routines for
>maintaining a single file db. It addresses some of the wishlist items
>posted on this topic. The code is pretty ugly, but it is functional and
>fairly speedy as long as the files don't get too big.
I already have your db.r right next to Gabriele's dbms.r and am reviewing
both of them. :-)
>Also, during the development of the script, I corresponded with Tim
>Johnson a lot. He was working on something similar for a client. He
>had some great ideas and code that we were looking to incorporate in my
>script, but I got sidetracked with other stuff.
Thanks for the pointers!
Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[7/24] from: joel:neely:fedex at: 15-Jan-2002 9:37
Hi, Rod,
Nice summaries!
Rod Gaither wrote:
> This would be nice but I'm not sure it is reasonable for records
> with a large text block. One of the things I'd like to avoid is
> imposing limits on record size so something like a webpage could
> be a record if desired.
>
I think the underlying issue is which data types (in the generic
sense of the word) should be supported. Many databases with which
I'm familiar treat "running text" (with large size potential) as a
different type than "character data field" (with some upper limit
on capacity). To avoid dependence on filesystem issues (assuming
we DO want to stay away from entanglement with any specific OS
features/limits) one strategy is to store string data up to a
certain size within the record; above that size, each such value
would be kept in a separate file, with the name of the file as a
string within the original record. Certainly not the fastest
option for some purposes, but it does allow such things as searches
on non-running-text fields (e.g. keys) to be done without the
overhead of reading/skipping the big chunks.
The same approach can be used for BLOb data.
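The overflow strategy Joel describes can be sketched as follows; Python for illustration only, with the size threshold and file-naming scheme being arbitrary choices of mine:

```python
# Sketch of the overflow strategy: string values up to some threshold
# stay inline in the record; larger ones go to a side file whose name
# replaces them, so key searches never read the big chunks.
import os
import tempfile

THRESHOLD = 64                       # arbitrary cutoff for this sketch

def store_value(value, directory):
    if len(value) <= THRESHOLD:
        return ("inline", value)
    # name derived from content hash; any unique naming scheme would do
    path = os.path.join(directory, f"blob-{abs(hash(value))}.txt")
    with open(path, "w") as f:
        f.write(value)
    return ("file", path)

def load_value(tag, payload):
    if tag == "inline":
        return payload
    with open(payload) as f:         # follow the reference to the side file
        return f.read()
```

The same shape works for binary BLOb data with `"wb"`/`"rb"` modes.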
> >
> >- new AND UPDATED AND DELETED records are appended to the end of
<<quoted lines omitted: 5>>
> when desired. Updating the main file would improve the read
> operations and bring the db back into an easy to view form.
Perhaps I should have been more explicit. I assumed the existence
of a utility, also alluded to in Petr's post, which would "pack"
the data back to canonical form (one physical record per logical
record). A trivial variation on that utility also allows for
viewing the data without actually rewriting the packed file to disk.
The "append all stuff" strategy essentially puts the log (audit trail)
within the file itself. I believe it's faster to do that with one
file (scanned as needed when un-packed) than with separate main
and transaction files.
> All of these issues show the trade offs in the different operations
> you need to do on the database. Read and search performance
> versus update and write operations and so on. They conflict quite
> a bit with the organization I would pick to keep the data in a single
> file with visually "nice" organization. :-(
>
Simplicity imposes limits. I don't know how to define it except
with respect to intended uses.
The simplest format I know (from the point of view of writing code
to read the data) is fixed-field layout, where each data element is
right-blank-padded (if it is "string-like") or left-zero-padded (if
it is "number-like") to constant size across all records.
1013John        Doe         1983221
3253Hermione    Fibbershins 1984616
4506ThrockmortonWilberforce 1985703
5323Johannes    Dingsda     1988445
7151Tuxedo      Penguin     1995499
9598Bill        Cat         1997525
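Reading such a fixed-field layout is just slicing at known offsets. A Python sketch, assuming widths of 4 (ID), 12 (first name), 12 (last name), 4 (year), 3 (office) for the sample records; the widths are inferred from the example, not specified anywhere:

```python
# Parse one fixed-field record by slicing at constant offsets.
# Strings are right-blank-padded, numbers left-padded, per the layout.

FIELDS = [("id", 4), ("first", 12), ("last", 12), ("year", 4), ("office", 3)]

def parse_fixed(line):
    rec, pos = {}, 0
    for name, width in FIELDS:
        raw = line[pos:pos + width]
        rec[name] = int(raw) if name in ("id", "year", "office") else raw.rstrip()
        pos += width
    return rec
```

This is why the format is the simplest to *read by program*: no delimiters, no quoting, no state.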
The simplest format I know (from the point of view of a human trying
to look at the raw data) is a forms-like presentation with explicit
labels for all data values:
Employee ID: 1013
First Name: John
Last Name: Doe
Year Hired: 1983
Office: 221
Employee ID: 3253
First Name: Hermione
Last Name: Fibbershins
Year Hired: 1984
Office: 616
Employee ID: 4506
First Name: Throckmorton
Last Name: Wilberforce
Year Hired: 1985
Office: 703
Employee ID: 5323
First Name: Johannes
Last Name: Dingsda
Year Hired: 1988
Office: 445
Employee ID: 7151
First Name: Tuxedo
Last Name: Penguin
Year Hired: 1995
Office: 499
Employee ID: 9598
First Name: Bill
Last Name: Cat
Year Hired: 1997
Office: 525
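The labeled form layout parses almost as easily, at the cost of repeating the labels; a Python sketch for illustration, treating each "Employee ID:" line as the start of a new record (that convention is mine, inferred from the example):

```python
# Parse "Label: value" form records; an "Employee ID" line opens a
# new record, and every subsequent labeled line adds a field to it.

def parse_forms(lines):
    records, rec = [], None
    for line in lines:
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label == "Employee ID":
            rec = {}
            records.append(rec)
        rec[label] = value
    return records
```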
I have a strong motivation to make it *possible* for humans (e.g.,
me!) to read my data files since that's often useful for debugging
and troubleshooting. However most of the access is done by programs,
so I tend to make it "just enough" human readable and prefer to ease
the parsing burden on the program.
-jn-
[8/24] from: rgaither:triad:rr at: 15-Jan-2002 11:31
Hi Joel,
>Nice summaries!
Thanks! :-)
>> This would be nice but I'm not sure it is reasonable for records
>> with a large text block. One of the things I'd like to avoid is
<<quoted lines omitted: 14>>
>on non-running-text fields (e.g. keys) to be done without the
>overhead of reading/skipping the big chunks.
A good point. I might lean towards a single file for each kind of
BLOB (or should that be TLOB :-)). I still have a strong aversion to
creating lots of files to represent the DB. Or perhaps even better
is the option to have both - just a file reference or a large text
block collection field type as well.
>The same approach can be used for BLOb data.
Yes indeed.
>Perhaps I should have been more explicit. I assumed the existence
No, I got those parts as valid assumptions.
>The "append all stuff" strategy essentially puts the log (audit trail)
>within the file itself. I believe it's faster to do that with one
>file (scanned as needed when un-packed) than with a separate main
>and transaction files.
Part of my reason for the separate files was to vary the format as
needed. Also it allows the main file to be read only and assumed
static so it would not have to be altered if a transaction was not
applied. Don't know about the speed impacts though, I believe there
are lots of issues depending on how the db is used.
>> All of these issues show the trade offs in the different operations
>> you need to do on the database. Read and search performance
<<quoted lines omitted: 4>>
>Simplicity imposes limits. I don't know how to define it except
>with respect to intended uses.
That is the problem. The one-size-fits-all kind of solution is very
hard to find, perhaps impossible if you want a "Good" fit. :-)
I keep coming back to wanting this to be binary. :-)
[snip good simple examples]
I do like Pekr's simple example as well and would only want to
include some table and column name information. Sort of a block
oriented REBOL version of a CSV file. I can live with the one line
record approach and related storage for the big text or binary objects
for the benefits it returns.
>I have a strong motivation to make it *possible* for humans (e.g.,
>me!) to read my data files since that's often useful for debugging
>and troubleshooting. However most of the access is done by programs,
>so I tend to make it "just enough" human readable and prefer to ease
>the parsing burden on the program.
I also like to have the format simple enough to read and even
create manually if possible. I know this is implying a limit on the
size and style that does not match Gabriele's requirements though
so some compromise is needed.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[9/24] from: petr:krenzelok:trz:cz at: 15-Jan-2002 19:05
Rod Gaither wrote:
>Hi Joel,
>>Nice summaries!
<<quoted lines omitted: 22>>
>BLOB (or should that be TLOB :-)). I still have a strong aversion to
>creating lots of files to represent the DB.
Why? :-) We really do use a one-file strategy for a database table, one file
for an index bag (multiple orders), one file for memo. I think that we are
talking a lightweight db here anyway, so why are we talking blobs here?
:-) In reality, I prefer each table in a separate file; it also somehow feels
safer to me, and you can copy tables around your harddrive to process
them in external analytic tools. When I use an ODBC driver, for example, it
treats a directory containing multiple db, index, and memo files as one real
database, where you can query its tables.
As for putting changed records at the end of the file - I thought that
constant-size fields/records would be enough? That way you could just
perform 'find on the file (seeking for the record you are about to change),
and simply replace it. Longer texts should go to an external memo
file, but 1) I am duplicating XBase functionality here and 2) I can
feel that is something which you would probably like to avoid
for a "simple" REBOL dbms system :-)
.. so, btw - I am somehow lost - what do we want to achieve? Any
conclusions so far? :-) We can try to prepare description for several
scenarios provided, and continue to post their pros/cons and then decide
upon implementation. ...
-pekr-
[10/24] from: rgaither:triad:rr at: 15-Jan-2002 13:47
Hi Pekr,
>>A good point. I might lean towards a single file for each kind of
>>BLOB (or should that be TLOB :-)). I still have a strong aversion to
>>creating lots of files to represent the DB.
>>
>Why? :-) We really do use one file strategy for database table, one file
>for index bag (multiple orders), one file for memo. I think that we are
It is really only a preference. I am used to systems with 100s of tables
and everything included in 1 db file. :-) I remember the dBase way of
doing things as a pain, but admit in this case it probably is not worth
the effort for a single-file or even reduced-file version.
>talking lightweight db here anyway, so, why are we talking blobs here?
I think we need to talk about large text fields at least. I would love
this little db to manage my saved mail in a more usable format for
example. :-)
>:-) In reality, I prefer each table in separate file, it is also somehow
>safier to me and you can copy tables around your harddrive to process
>them in external analatic tools. Once I use ODBC driver for e.g., it
>treats directory containing multiple db, index, memo files as one real
>database, where you can query its tables.
Good points all.
>As for putting changed records at the end of file - I thought that
>constant size fields/record would be enough? That way you could just
>perform 'find on the file (seeking for record you are about to change),
>and simply replace it. Longer text sizes should come to external memo
I am not suggesting a fixed length record or field where this is possible.
>file, but first - I am duplicating XBase functionality here and 2) I can
>feel that is something which you probably would like to avoid to happen
>for "simple" Rebol dbms system :-)
Yes duplication with any rdbms is happening with this effort. It is part
of the problem though that all those products are external to REBOL
and thus limited in their own ways.
>.. so, btw - I am somehow lost - what do we want to achieve? Any
>conclusions so far? :-) We can try to prepare description for several
>scenarios provided, and continue to post their pros/cons and then decide
>upon implementation. ...
I apologise for this. :-) I want to make an effort to bring focus back to the
project before I distract the discussion any further.
I think the next step is to settle on a "starting" file format so we can
get back to the functional questions Gabriele proposed. :-)
What I (my opinions only) conclude so far, though review of some of
the existing implementations may change this -
1. Text files for persistent storage
2. REBOL Blocks for grouping values
3. Native REBOL representation for values
4. Use single base directory - no directory structure required
5. Use multiple (named) text files to map to tables, indexes and large text blocks
6. Keep at least the table data file simple, readable, and self contained
7. One record per line
8. Use some manner of file append to manage record changes
9. Delete record operation is only a marker until packed
10. Need utilities to pack db and provide canonical form
11. Every row has an automatic internal "row-id"
12. Manipulate records as blocks of values
13. Result sets are built as blocks of blocks of values
14. The in-memory format should not be constrained by the persistent format
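Points 8-10 above imply a pack utility: sweep the appended history, keep only each row-id's last version, drop delete markers, and emit one canonical line per live record. A Python sketch for illustration; the `row-id payload` line format and the `<deleted>` marker are invented here:

```python
# Pack the appended history back to canonical form: one line per live
# record, last version wins, delete-marked rows removed.

def pack(lines):
    latest = {}
    for line in lines:
        row_id, _, payload = line.partition(" ")
        latest[row_id] = payload          # later appends win
    return [f"{rid} {p}" for rid, p in sorted(latest.items())
            if p != "<deleted>"]
```

As Joel noted earlier, a trivial variation of the same sweep gives a read-only "current view" without rewriting the file.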
The rest, and yes there were more good points only not quite as
well defined, can wait for the next round of design theory. :-)
Comments?
Samples?
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[11/24] from: jason:cunliffe:verizon at: 15-Jan-2002 14:52
> >file, but first - I am duplicating XBase functionality here and 2) I can
> >feel that is something which you probably would like to avoid to happen
> >for "simple" Rebol dbms system :-)
fyi: here's another part of the puzzle to add.
I found this Xbase Rebol/View program on
http://rebolfrance.multimania.com/telechargement.html#applications
http://rebolfrance.multimania.com/projets/dbf/dbview.zip
François Jouen a développé un superbe explorateur de fichiers xbase
compatible avec les formats dBase III, FoxPro 2.5, dbase IV et FoxPro 3. La
version 1.3 authorise la visualisation du contenu des champs ainsi que de la
structure de la base de données. L'utilisateur peut également rechercher une
valeur parmi les informations stockées et exporter la table dans le format
de données Rebol.
..for non francophones I think that says:
François Jouen has developed a superb xbase file browser compatible with
dBase III, FoxPro 2.5, dBase IV and FoxPro 3. Version 1.3 allows you to see
the contents of fields as well as the structure of the database. The user can
also search for a value within the stored information and export the table
in a REBOL data format.
./Jason
[12/24] from: rotenca:telvia:it at: 16-Jan-2002 0:47
Hi Rod
> What I (my opinions only) conclude so far, though review of some of
> the existing implementations may change this -
<<quoted lines omitted: 3>>
> 4. Use single base directory - no directory structure required
> 5. Use multiple (named) text files to map to tables, indexes and large text
blocks
> 6. Keep at least the table data file simple, readable, and self contained
> 7. One record per line
<<quoted lines omitted: 8>>
> well defined, can wait for the next round of design theory. :-)
> Comments?
I substantially agree.
> Samples?
Now I feel I'm in a relational maze. But from now, Gabriele has a week of
time to give us a complete bug-free program. The countdown has started :-)
As for all the REBOL db programs, I await a comparative review from you (another
countdown :-)
Can you give us the links to find the dbs which are not in the REBOL library?
> Thanks, Rod.
Thank you.
---
Ciao
Romano
[13/24] from: rgaither:triad:rr at: 15-Jan-2002 22:32
Hi Romano,
>Now i feel myself in a relational maze. But from now, Gabriele has a week of
>time to give us a complete bug-free program. Count down has started :-)
I sure am glad you picked Gabriele for that task! :-)
>About all the rebol db programs, i wait a comparative review from you (another
>count down:-)
Oops, I guess I spoke too soon. :-)
I will try and post a review by this weekend.
>Can't you say us the link to find the db which are not on rebol library?
I will work on this as well but can't give the ones from the books
as they are part of those products. I will check on the others though
and either point to them or post them to the list if short.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[14/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 6:24
> What I (my opinions only) conclude so far, though review of some of
> the existing implementations may change this -
<<quoted lines omitted: 12>>
> 13. Result sets are built as blocks of blocks of values
> 14. The in-memory format should not be constrained by the persistent format
A very good one - at least for me - a starting point we could agree upon :-) A few questions and
comments which are not clear to me yet:
- status markers could be used even for things as locking ....
a) now I am not sure if locking should happen on index level (dangerous if someone
will not use any index), or in database itself ...
b) if we have no db server and our app fails, record would stay locked forever, so we
would have to complicate things with some time-stamps or so ...
- I don't understand how you want to use file appends to manage changes to records.
What rec-number will be used for a changed record? It should be the same one. If so, then
we need to introduce versioning (we use such an approach in one of our tables, although for
a different purpose)
- record-numbers - I hope it is clear enough, that 'pack-ing the database will reassign
record numbers to remove holes caused by record deletion ...
- variable record size - a block. OK - how will we map it to any REBOL
structure/object? I just hope that at least the number of columns is the same through
all records, or I can't imagine it any other way than duplicating column names in each
record, e.g. [name: "pekr" lastname: "krenzelok"], which is not a good approach to
follow, imo
So, will anyone answer my questions? :-)
-pekr-
[15/24] from: greggirwin:mindspring at: 15-Jan-2002 23:59
Hi Pekr,
<< a) now I am not sure if locking should happen on index level (dangerous
if someone will not use any index), or in database itself ...
b) if we have no db server and our app fails, record would stay locked
forever, so we would have to complicate things with some time-stamps or so ... >>
If we don't have a central server script, there has to be some kind of lock
publishing mechanism. The easiest way, perhaps the only feasible way, is to
use a persistent marker as you suggest. In order to unlock a frozen lock you
need a manual unlock feature, or a timeout specified for each lock. If you
go the timeout route, you may want to implement some kind of time-extension
callback feature so you can be notified if a lock you hold is going to
expire, with the option to hold on to it. The manual unlock approach is
probably simpler, but not nearly as slick and not suitable for unattended
use.
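The timeout route Gregg describes can be sketched simply: a lock is a (holder, acquired-at) marker, and a marker older than the timeout is treated as expired rather than frozen forever. Python for illustration; the 30-second timeout and the in-memory `locks` table are stand-ins for whatever persistent marker the db would actually use:

```python
# Sketch of a persistent lock marker with a timeout: a stale lock from a
# crashed app expires instead of staying held forever.
import time

TIMEOUT = 30.0                     # seconds; arbitrary for this sketch
locks = {}                         # rec_id -> (holder, acquired_at)

def try_lock(rec_id, holder, now=None):
    now = time.time() if now is None else now
    held = locks.get(rec_id)
    if held and now - held[1] < TIMEOUT:
        return held[0] == holder   # only the current holder may re-enter
    locks[rec_id] = (holder, now)  # free or expired: take the lock over
    return True
```

A time-extension callback would amount to the holder re-calling `try_lock` before its own lock's deadline.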
<< - record-numbers - I hope it is clear enough, that 'pack-ing the database
will reassign record numbers to remove holes caused by record deletion ... >>
I thought the intent was to assign a UUID to each record, which it would use
forever, so location was not important.
<< - variable record size - a block. OK - how will we map it to any rebol
structure/object? I just hope that at least that number of columns are the
same thru all records or I can't imagine it in other way than duplicating
column names in each record, e.g. [name: "pekr" lastname: "krenzelok"],
which is not good aproach to follow imo? >>
I don't like wasting space any more than the next guy, but I *love*
self-describing data. Since we're targeting small to moderate size data
stores, I wouldn't be opposed to blocks of name/value pairs. If we load a
large block of records which share common words as their column names, REBOL
should handle that very efficiently, correct?
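As a rough illustration of self-describing records (Python dicts standing in for REBOL blocks of name/value pairs; the helper names are invented), variable columns become harmless because each record carries its own labels:

```python
# Self-describing records: each record names its own columns, so no
# separate schema is needed to read one, and column sets may vary.

def make_record(**fields):
    return dict(fields)

def project(records, *columns):
    # a missing column simply comes back as None instead of breaking
    return [[rec.get(c) for c in columns] for rec in records]
```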
--Gregg
[16/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 9:48
Hi,
Gregg Irwin wrote:
> Hi Pekr,
> << a) now I am not sure if locking should happen on index level (dangerous
<<quoted lines omitted: 6>>
> publishing mechanism. The easiest way, perhaps the only feasible way, is to
> use a persistent marker as you suggest.
Maybe I don't think so anymore :-) I discussed the problem with my colleagues here,
and Michael suggested that any kind of locking be done in memory. We found one
interesting document though, which describes several scenarios, even one with a
lock field and time stamp ....
http://www.rebol.cz/~asko/xbase.txt - very cool description ....
> In order to unlock a frozen lock you
> need a manual unlock feature, or a timeout specified for each lock. If you
> go the timeout route, you may want to implement some kind of time-extension
> callback feature so you can be notified if a lock you hold is going to
> expire, with the option to hold on to it.
yes, it could work, although it would require e.g. your form to have some timer
implemented ...
> The manual unlock approach is
> probably simpler, but not nearly as slick and not suitable for unattended
> use.
>
uhmm, not this one probably :-)
> << - record-numbers - I hope it is clear enough, that 'pack-ing the database
> will reassign
> record numbers to remove holes caused by record deletion ... >>
>
> I thought the intent was to assign a UUID to each record, which it would use
> forever, so location was not important.
in XBase, it has its meaning. You can open dbase file without opening any index,
so browser sorts records according to their rec-no.
> << - variable record size - a block. OK - how will we map it to any rebol
> structure/object? I just hope that at least that number of columns are the
<<quoted lines omitted: 9>>
> large block of records which share common words as their column names, REBOL
> should handle that very efficiently, correct?
I think that it depends upon what your intention is - do you want to have
variable-column records? I would not suggest it, as it is a little bit more
difficult for any kind of analysing software, which simply counts something in a
loop ....
I would suggest:
rec-no ["various" 3 'or [more--datatypes]] ["D" time-stamp-of-creation last-changed
[locked-time-stamp user-info whatever]]
The above structure only shows the approach I would go with - first a 'select-able or
'find-able record number, then the record itself, followed by a third block
containing some system info - navigation is pretty straightforward, and we could
use wrapper functions
get-lock-stamp: func [db rec-no][first last last next next find blk 5] , where 5
= rec-no :-)
-pekr-
[17/24] from: rebol665:ifrance at: 16-Jan-2002 10:48
Hi Jason
Your translation is excellent and shows how close English and French really
are. May I ask where you learned your French?
Patrick
ps. For Alexandre Dumas (author of The Three Musketeers), English was just
misspelt and mispronounced French.
----- Original Message -----
From: "Jason Cunliffe" <[jason--cunliffe--verizon--net]>
To: <[rebol-list--rebol--com]>
Sent: Tuesday, January 15, 2002 8:52 PM
Subject: [REBOL] Re: possible data format ... Re: Re: dbms3.r 01
<<quoted lines omitted: 20>>
[18/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 11:37
> I would suggest:
> rec-no ["various" 3 'or [more--datatypes]] ["D" time-stam-of-creation last-changed
<<quoted lines omitted: 5>>
> get-lock-stamp: func [db rec-no][first last last next next find blk 5] , where 5
> = rec-no :-)
uhm, when I think about it - what nonsense :-) record number 5 can be part of a data
block, so we can't use such a simple mechanism. Maybe the best way is to really wait
for the open/direct/seek mode and write a proper driver, skipping here and there in the
opened file ... the question is - wait for how long? :-)
-pekr-
[19/24] from: rgaither:triad:rr at: 16-Jan-2002 8:53
Hi Pekr, Gregg,
>- status markers could be used even for things as locking ....
> a) now I am not sure if locking should happen on index level (dangerous if someone
>will not use any index), or in database itself ...
> b) if we have no db server and our app fails, record would stay locked forever, so we
>would have to complicate things with some time-stamps or so ...
I don't think we are ready yet to worry about locking. I have not
forgotten it but want to see the first round of file format and function
designs before looking at it seriously. Just like I'm not sure what and
how to store the table schema information yet but know it needs to
be considered.
>- I don't understand how you want to use file appends to manage changes to records.
>What rec-number will be used for changed record? It should be the same one. If so, then
>we need to introduce versioning (we use such approach in one of our tables, although
>for different purpose)
Yes, it should use the same row-id. My rough cut at the table file has
outer blocks for some grouping. This makes for simple organization and
basic versioning just by group and position in the file.
>- record-numbers - I hope it is clear enough, that 'pack-ing the database will reassign
>record numbers to remove holes caused by record deletion ...
I was not clear on this and am not sure I agree. It will take some more
consideration working through some options to be sure but I am leaning
towards the row-id being an integer sequence value by table that stays
with the record for life. If we make it match the position in the table and
change with the pack operation then we must change all the references
to that value. It requires some more thought as it seems to me I am mixing
some internal position usage and primary key roles in what I am doing so
far. Perhaps it should only be an automatic primary key value that goes
with the record values not outside the block.
>- variable record size - a block. OK - how will we map it to any rebol
>structure/object? I just hope that at least the number of columns is the same thru
>all records, or I can't imagine it in other way than duplicating column names in each
>record, e.g. [name: "pekr" lastname: "krenzelok"], which is not a good approach to
>follow imo?
By variable record size I just meant the fields are not fixed length. They
are exactly a block of values where each value is whatever size it needs
to be. I am not saying a variable number of columns. The columns should
match a list of column names in order and number. While I like self-describing
data, I agree that I don't want name:value pairs repeating in each record.
Here is a sample file structure to consider and review -
REBOL [
; keep header info like table name, namespace, next row-id value
; better in a named value block like columns?
]
columns [col-name col-name col-name ...]
defaults [
col-name [now/date]
...
]
rows [
1 [col-value col-value col-value ...] "Status"
2 [col-value col-value col-value ...] "Status"
3 [col-value col-value col-value ...] "Status"
]
updates [
2 [col-value col-value col-value ...] "Status"
...
]
Please note - this is just a first cut at how a table data file
might look. I am putting it out to the list so it can be reviewed
and torn apart as needed. :-)
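Rod's layout above can be read back and resolved mechanically: the rows block is the base, the appended updates block overrides it, and "D" rows stay in the file until a pack rewrites it. Here is a minimal sketch, in Python purely for illustration - the data layout and all names are my paraphrase of the rows/updates idea, not a spec from the thread:

```python
# Hypothetical in-memory form of the proposed table file:
#   rows    = {row_id: (values, status), ...}         the base "rows [...]" block
#   updates = [(row_id, values, status), ...]          the appended "updates [...]" block
# The latest appended entry for a row-id wins; "D" marks a logical delete.

def current_view(rows, updates):
    """Apply appended updates over the base rows; later entries win."""
    view = dict(rows)
    for row_id, values, status in updates:
        view[row_id] = (values, status)
    # Logically deleted records are only hidden here; they remain in the
    # file (recoverable) until a 'pack' pass physically rewrites it.
    return {rid: rec for rid, rec in view.items() if rec[1] != "D"}

rows = {
    1: (["Petr", "Krenzelok", "petr@krenzelok.trz.cz"], "R"),
    2: (["Someone", "Else", "someone@else.com"], "R"),
}
updates = [
    (2, ["Someone", "Changed", "someone@else.com"], "U"),
    (1, ["Petr", "Krenzelok", "petr@krenzelok.trz.cz"], "D"),
]
```

With this data, the resolved view contains only row 2 with its updated values, which is the "basic versioning just by group and position" behaviour described above.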
My questions on it so far -
1. How best to provide some header information?
2. I like the extra block grouping for flexibility but would like some feedback.
3. The defaults block is an example of how the grouping supports extensibility.
4. The whole row-id/primary-key or both design issue?
5. What status values do we need?
6. Would an action/operation update style vs full records be better?
>So, anyone answers my questions? :-)
Yes, or at least the start of answers for some. :-)
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[20/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 15:14
Rod Gaither wrote:
> Hi Pekr, Gregg,
> >- status markers could be used even for things as locking ....
<<quoted lines omitted: 7>>
> how to store the table schema information yet but know it needs to
> be considered.
OK
> >- I don't understand how you want to use file appends to manage changes to records.
> >What rec-number will be used for changed record? It should be the same one. If so,
then
<<quoted lines omitted: 11>>
> change with the pack operation then we must change all the references
> to that value.
Record numbers should be in no way related to data content. What relations are you
talking about? We relate tables thru some column or set of columns, not some internal
database stored information. Record numbers should serve for index values, where e.g.
the sorting order of name + last-name would store something like:
1 "Petr Krenzelok"
35 "Petr someone"
3 "Olaf xyz"
so that you can choose your index tag, and set a filter e.g. to see only all Petrs -
your table-grid viewer driver will then fill in your grid, directly selecting only
record numbers 1 and 35 in the above example ...
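The index idea just described can be sketched in a few lines - Python for illustration only, and the pair layout is hypothetical:

```python
# Hypothetical index structure: a list of (record-number, key-string)
# pairs kept in the index's sort order. Filtering it yields the record
# numbers the grid driver would then fetch directly.

index = [
    (1, "Petr Krenzelok"),
    (35, "Petr someone"),
    (3, "Olaf xyz"),
]

def filter_index(index, prefix):
    """Return record numbers whose key starts with the given prefix."""
    return [rec_no for rec_no, key in index if key.startswith(prefix)]
```

Filtering for "Petr" returns record numbers 1 and 35, matching the example above; the grid never touches the data file for rows that fall outside the filter.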
> Here is a sample file structure to consider and review -
> REBOL [
<<quoted lines omitted: 15>>
> ...
> ]
One thing I don't like so far - why add changes to the table/database itself? I think
that more correct is one of the following:
a) replace record ...
b) if you want to track changes, introduce some logging, but don't complicate the
database with some values and their updates/history .... just imo :-)
> Please note - this is just a first cut at how a table data file
> might look. I am putting it out to the list so it can be reviewed
> and torn apart as needed. :-)
>
> My questions on it so far -
>
> 1. How best to provide some header information?
a) as first block/row in a file
b) separate file, e.g. a .def file, specifying index (or more of them), unique ids,
field names = record structure, or other things for maintenance
I prefer b), simply to have a clean, containing-records-only data file ...
> 2. I like the extra block grouping for flexibility but want some feedback?
> 3. The defaults block is an example of how the grouping supports extensibility
> 4. The whole row-id/primary-key or both design issue?
> 5. What status values do we need?
"Deleted", later maybe "locked" etc .... let's start with the "deleted" one, or just
let's preserve such field for future usage :-)
> 6. Would an action/operation update style vs full records be better?
? Could you explain this one, please?
Thanks,
-pekr-
[21/24] from: rgaither:triad:rr at: 16-Jan-2002 10:40
Hi Pekr,
>Record numbers should be in no way related to data content. What relations are you talking
>about? We relate tables thru some column or set of columns, not some internal database
<<quoted lines omitted: 3>>
>35 "Petr someone"
>3 "Olaf xyz"
I am talking about internal db relationships like the indexes. I was starting
to confuse the row-id with my desire to always have a unique ID value that
is auto-generated as a value for each row of a table - to serve as a primary
key. After thinking about it a bit though, they have to be two separate things,
and I'm not sure everyone will go along with a default ID column in a record
anyway. :-)
>so that you can choose your index tag, and set filter for e.g. to see only all Petrs
>- your table-grid viewer driver will then fill in your grid directly selecting only
>record numbers 1 and 35 in above example ...
I don't know if I want to have to update all the index data every time we pack
the DB. That can be looked at later though when starting to work on how
the functions interact with the files.
>> Here is a sample file structure to consider and review -
>>
<<quoted lines omitted: 20>>
>more correct is one of following:
>a) replace record ...
I don't think you can in the text db. How are you planning on doing this
without having to rewrite the entire file for every change?
>b) if you want to track changes, introduce some logging, but don't complicate
>database with some values and their updates/history .... just imo :-)
I agree, but I am not - see above. :-)
I am not sure I want the updates in the same file myself. That was a
point Joel brought up where I was thinking of a separate file. I am in
the middle on where the list of changes goes, I just know it has to be
at the end of some file so append can be used.
>> My questions on it so far -
>>
<<quoted lines omitted: 3>>
>names = record structure, or other things for maintenance
>I prefer b), simply to have clear and containing-records-only data file ...
I like (a) for some parts and (b) for the full schema information. The only
reason for (a) is to give an included mapping between columns and data
and to store real values - not schema - such as the next row-id with the
data.
>> 2. I like the extra block grouping for flexibility but want some feedback?
>> 3. The defaults block is an example of how the grouping supports extensibility
>> 4. The whole row-id/primary-key or both design issue?
>> 5. What status values do we need?
>
>"Deleted", later maybe "locked" etc .... let's start with "deleted" one, or just let's
>preserve such field for future usage :-)
Ok.
>> 6. Would an action/operation update style vs full records be better?
>
>? Could you explain this one, please?
I have a problem with repeating the entire record in the update section
if the only thing that has changed is the value of one column. By having
a list of operations that could be at the field level or record level there is
finer granularity to the process. This might also be considered a low-level
data manipulation language that gets applied to the table data to bring it
back in sync during packing. It complicates things at the engine level
but should give better performance and more flexibility in activities.
Some examples:
Whole record -
2 ["Rod" "Gaither" [rgaither--triad--rr--com] ...] "U"
Action -
Update 2 first-name "Roderic"
Other examples -
Delete 3
Create ["Gabriele" "Santilli" [g--santilli--tiscalinet--it]] "N"
For transaction support we may need to consider putting these update
lists in their own file so we can manage multiple table changes as a
single transaction.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[22/24] from: chalz:earthlink at: 16-Jan-2002 13:16
I know this is a sadistic request *bwahaha*, but is there any chance
someone could collect the data format/dbms/related posts, once the thread
quiets down, and sort of 'compress' it to a list of
desires/wants/standards/protocols/etc? I find this project fascinating, but
right now I don't have the time to wade through all of the mail (the sheer
amount in one day is blowing me away), and the thread keeps diverging ;)
And BTW, it's Petr, not Pekr. *snicker*
--Charles
[23/24] from: greggirwin:mindspring at: 16-Jan-2002 11:50
Hi Pekr,
<< Maybe I don't think so anymore :-) I talked the problem with my colleagues here,
and Michael suggested that any kind of locking is done in memory. We found one
interesting document though, which describes several scenarios, even one with
lock field and time stamp ....>>
But for this to work, you need some kind of file locking mechanism (e.g.
SHARE under DOS), or shared memory, correct? I.e. if you have two separate
dbms scripts running against the same file, how do they communicate lock
information if REBOL doesn't give us the ability to "flag" locks against
specific file offsets?
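One common workaround for exactly this problem (not something proposed in the thread, just a known technique): if the language cannot flag locks against file offsets, two independent processes can still cooperate through an advisory lock file whose content is a time stamp, so a lock left behind by a crashed writer can be detected as stale and broken. A Python sketch of the idea - the file layout and timeout are assumptions:

```python
# Advisory lock-file sketch combining the "lock field" and "time stamp"
# ideas from the thread. All names and the 30s staleness window are
# illustrative assumptions.

import os
import time

MAX_AGE = 30.0  # seconds before a lock is considered abandoned

def try_lock(path):
    """Try to acquire <path>.lock; return True on success, False if held."""
    lock = path + ".lock"
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one process wins.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(time.time()).encode())
        os.close(fd)
        return True
    except FileExistsError:
        with open(lock) as f:
            stamp = float(f.read() or 0)
        if time.time() - stamp > MAX_AGE:
            # Stale lock from a crashed writer: break it and retry.
            # (A second process could race us here; good enough for a sketch.)
            os.remove(lock)
            return try_lock(path)
        return False

def unlock(path):
    os.remove(path + ".lock")
```

The stale-lock check is what answers the "record would stay locked forever" worry: a crashed process never refreshes its time stamp, so the next writer eventually reclaims the lock.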
<< http://www.rebol.cz/~asko/xbase.txt - very cool description .... >>
Great information!
<<RE: rec-num: ...in XBase, it has its meaning. You can open dbase file
without opening any index,
so browser sorts records according to their rec-no. >>
Right, if we have access to a block of records, whether as a complete file
on disk or as the result of a query, there would be an implied order to
them, but I see that as separate from a record identifier (its GUID).
<< I think that it depends upon what your intention is - do you want to have
variable column records? I would not suggest it, as it is a little bit more
difficult for any kind of analysing software, which simply counts something
in a loop .... >>
Agreed. Do we consider a database to contain only homogeneous records, does
it maintain separate internal tables for different record types, or does it
do what some OODBMSs do and store a collection of heterogeneous records
which share a small common core, allowing it to navigate and identify records
of various types? In that scenario, reading all the records in a "table" is
actually more like a query to find records by type (and a record may exist
in multiple "extents" when you consider an inheritance model).
--Gregg
[24/24] from: g:santilli:tiscalinet:it at: 17-Jan-2002 21:27
Hello Petr!
On 15-Gen-02, you wrote:
PK> For larger amount of records to browse we need grid with
PK> dynamic caching ...(although not so necessary in the
PK> beginning of the project)
I've included DB-CACHED-QUERY in mysql-wrapper.r just for that. I
use my own multicolumn list style for display; it is still slow
because of iteration, so I think I will switch to a non-iterated
version in the future.
Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted