possible data format ... Re: Re: dbms3.r 01
[1/24] from: petr:krenzelok:trz:cz at: 15-Jan-2002 6:56
Hi once again,
I would like to suggest a more complex data format, but it really depends upon
what you want to achieve. Let's talk 1 db = 1 file only.
1 ["Petr" "Krenzelok" [petr--krenzelok--trz--cz] 29] "R"
2 ["Someone" "Else" [someone--else--com] 18] "D"
Some rules:
- the file would be organised as 1 rec = 1 line, so read/lines could eventually
be used for in-memory mappings ...
- new records would be write/appended to the end
- each record is provided with a status field - an important one - by default, deletion
of a record doesn't physically delete it from the file = speed = the ability to easily
recover ('recall function in XBase land) deleted data. If you want to
physically delete such records, use a 'patch maintenance function, which could be
performed once per some time period (it would also reassign record numbers, to
remove holes in the rec-no sequence) ... We could also use such a field for a locking
mechanism ....
- using record numbers is good for indices; you just keep record numbers
and once you create some grid view, you filter them out. But I haven't measured yet
how long it would take to do e.g. 1K of REBOL 'select-s. Of course we probably
can't use such an approach until open/seek is available ...
- what we do lack is a proper and real grid. The REBOL list view is just a toy for some
hundreds of records. For a larger number of records to browse we need a grid with
dynamic caching ... (although not so necessary in the beginning of the project)
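A rough sketch of the rules above, in Python rather than REBOL and purely for illustration: one record per line, a rec-no, a data block, and a status field with soft delete and an XBase-style 'recall. The tab-separated layout and the helper names are my own invention, not a proposed format.

```python
# Sketch of the "1 rec = 1 line" layout: rec-no, a data block, and a
# status field ("R" = regular, "D" = deleted-but-recoverable).
import json

def format_record(rec_no, fields, status="R"):
    # one line per record, so a read-lines pass gives an in-memory mapping
    return f"{rec_no}\t{json.dumps(fields)}\t{status}"

def parse_record(line):
    rec_no, fields, status = line.split("\t")
    return int(rec_no), json.loads(fields), status

def recall(lines, rec_no):
    # XBase-style 'recall: flip a soft-deleted record back to regular;
    # nothing is ever physically removed until a separate 'patch pass
    out = []
    for line in lines:
        n, fields, status = parse_record(line)
        if n == rec_no and status == "D":
            status = "R"
        out.append(format_record(n, fields, status))
    return out
```

Deletion here only rewrites the status field, which is what makes recovery (and the locking reuse Petr mentions) cheap.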
Few notes:
- love it or hate it - my description is not demagogy; it can easily be put in
the trashcan :-)
- it is also good for row/column oriented design, although I can imagine that
the record data block could be of variable size. What I do miss (and Carl once said it
would be cool :-) is the ability of the parse function to work on an opened/uncached
file (e.g. look at how Netscape/Mozilla stores mailboxes - it would be handy)
... well, that's few of my ideas in the early morning :-)
-pekr-
[2/24] from: joel:neely:fedex at: 15-Jan-2002 5:56
Hi, Petr, Gabriele, et al,
Petr Krenzelok wrote:
> Hi once again,
> I would like to suggest more complex data format, but then it
<<quoted lines omitted: 9>>
> by default deletion of record doesn't physically delete it from
> the file ...
This certainly fits the KISS mandate.
Let me mention one slight variation that I've used:
- new AND UPDATED AND DELETED records are appended to the end of
the file
which has two consequences,
1) assuming that write/append is significantly faster than reading
and rewriting the entire file, it minimizes the time for any
revisions, and
2) obtaining (the current state of) a record requires either
sweeping the entire file or reading backward for the last
occurrence of the record's unique ID.
I've used this in situations where the normal read is, in fact,
a query looking for all records (or all records meeting some set
of criteria), which means that sweeping the entire file is not
a major penalty.
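Joel's last-occurrence-wins sweep can be sketched in a few lines; this is Python for illustration only, and the pipe-separated `op|id|fields` log layout is invented here, not part of anyone's proposal.

```python
# Sketch of the append-only variation: every add/update/delete is an
# appended line, and the current state of the file is whatever each ID's
# *last* occurrence says. One full sweep recovers the live records.

def current_state(lines):
    latest = {}                      # id -> (op, fields); later lines win
    for line in lines:
        op, rec_id, *fields = line.split("|")
        latest[rec_id] = (op, fields)
    # IDs whose last operation was a delete drop out of the final view
    return {rid: f for rid, (op, f) in latest.items() if op != "DEL"}
```

The trade-off is exactly as stated: writes are a single append, while reads pay for the sweep (or a backward scan for one ID).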
-jn-
--
; sub REBOL {}; sub head ($) {@_[0]}
REBOL []
# despam: func [e] [replace replace/all e ":" "." "#" "@"]
; sub despam {my ($e) = @_; $e =~ tr/:#/.@/; return "\n$e"}
print head reverse despam "moc:xedef#yleen:leoj" ;
[3/24] from: rgaither:triad:rr at: 15-Jan-2002 8:43
Hi Joel, Petr, All,
>> I would like to suggest more complex data format, but then it
>> really depends upon what you want to achieve. Let's talk
>> 1 db = 1 file only.
In reviewing some design options I am having to rethink the
desire for a 1 file solution. With a text db it just doesn't seem
the best option.
>> 1 ["Petr" "Krenzelok" [petr--krenzelok--trz--cz] 29] "R"
>> 2 ["Someone" "Else" [someone--else--com] 18] "D"
Nice. I didn't think of the status option.
>> Some rules:
>> - file would be organised in 1 rec = 1 line, so read/lines could
>> be eventually used for in-memory mappings ...
This would be nice but I'm not sure it is reasonable for records with
a large text block. One of the things I'd like to avoid is imposing limits
on record size so something like a webpage could be a record if
desired.
>> - new records would be write/appended to the end
Yes, this seems best in this illustration.
>> - record is provided with the status field - important one -
>> by default deletion of record doesn't physically delete it from
>> the file ...
Sounds good also.
>This certainly fits the KISS mandate.
>
>Let me mention one slight variation that I've used:
>
>- new AND UPDATED AND DELETED records are appended to the end of
> the file
This is something I have considered as well. Another option would be
to have a transaction file or files that either contain whole record changes
or just "individual operations". This audit/log could then be applied via
a utility to update the main data file when desired. Updating the main
file would improve the read operations and bring the db back into an
easy to view form.
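Rod's separate-log variant amounts to folding a transaction file into the main file on demand; a minimal Python sketch of that utility, assuming an invented `id=value` record layout and `set`/`del` operations chosen only for brevity:

```python
# Sketch of applying an audit/log file to the main data file: the main
# file stays read-only while changes accumulate in the log; a utility
# later folds the log in, restoring fast reads and an easy-to-view form.

def apply_log(main, log):
    db = dict(rec.split("=", 1) for rec in main)
    for entry in log:
        op, _, rest = entry.partition(" ")
        if op == "del":
            db.pop(rest, None)
        else:                         # "set id=value"
            rid, _, val = rest.partition("=")
            db[rid] = val
    return [f"{rid}={val}" for rid, val in db.items()]
```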
>which has two consequences,
>1) assuming that write/append is significantly faster than reading
<<quoted lines omitted: 3>>
> sweeping the entire file or reading backward for the last
> occurrence of the record's unique ID.
All of these issues show the trade offs in the different operations
you need to do on the database. Read and search performance
versus update and write operations and so on. They conflict quite
a bit with the organization I would pick to keep the data in a single
file with visually "nice" organization. :-(
I will throw out a summary of format options I worked up last
night in another post.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[4/24] from: rgaither:triad:rr at: 15-Jan-2002 9:22
Hi All,
Here is my summary of file organization options for
a text db.
File Options -
1. Single file
Pro - Very easy to manage and relocate
Pro - Simple read logic
Con - Poor write performance
Con - Hard to read data mixed with schema
Con - Limited to small record counts
2. Schema, Data, and Index file
Pro - Provides clean content by purpose
Pro - Supports larger record counts
Pro - Still reasonable to manage and relocate
Con - Poor write performance
Con - Still somewhat limited in record counts
Con - Multiple file synchronization issues
3. Schema file, Data and Index files for each table
Pro - Provides clean content by purpose
Pro - Supports larger record counts
Pro - Moderate write performance
Pro - Moderate read complexity
Con - DB is made up of many files
Con - Multiple file synchronization issues
Variations -
1. Any of the above, changed records appended at end of file
Same as Joel's suggestion.
Pro - Improves write performance
Con - Complicates read logic and performance
Con - Harder to manually view data
2. Any of the above, changed records in a log file
Like Joel's suggestion but stored in a dedicated file.
Pro - Improves write performance
Pro - Provides place to implement limited transaction support
Con - Complicates read logic and performance
Con - Harder to manually view data
3. Any of the above, individual data operations in a dedicated file
Like the append-changed-records options but just recording
the action and new data. Field-level changes for updates
or record-level changes for adds and deletes.
Pro - Improves write performance even more
Pro - Provides place to implement limited transaction support
Con - Complicates read logic and performance
Con - Harder to manually view data
FWIW, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[5/24] from: cribbsj:oakwood at: 15-Jan-2002 10:06
Rod Gaither wrote:
>Hi All,
>Here is my summary of file organization options for
<<quoted lines omitted: 44>>
>Oak Ridge, NC - USA
>[rgaither--triad--rr--com]
Just FYI, but I contributed a script to the library (I think it is
called db.r) that provides some pretty simplistic db routines for
maintaining a single file db. It addresses some of the wishlist items
posted on this topic. The code is pretty ugly, but it is functional and
fairly speedy as long as the files don't get too big.
Also, during the development of the script, I corresponded with Tim
Johnson a lot. He was working on something similar for a client. He
had some great ideas and code that we were looking to incorporate in my
script, but I got sidetracked with other stuff.
Jamey Cribbs
[6/24] from: rgaither:triad:rr at: 15-Jan-2002 10:27
Hi Jamey,
>Just FYI, but I contributed a script to the library (I think it is
>called db.r) that provides some pretty simplistic db routines for
>maintaining a single file db. It addresses some of the wishlist items
>posted on this topic. The code is pretty ugly, but it is functional and
>fairly speedy as long as the files don't get too big.
I already have your db.r right next to Gabriele's dbms.r and am reviewing
both of them. :-)
>Also, during the development of the script, I corresponded with Tim
>Johnson a lot. He was working on something similar for a client. He
>had some great ideas and code that we were looking to incorporate in my
>script, but I got sidetracked with other stuff.
Thanks for the pointers!
Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[7/24] from: joel:neely:fedex at: 15-Jan-2002 9:37
Hi, Rod,
Nice summaries!
Rod Gaither wrote:
> This would be nice but I'm not sure it is reasonable for records
> with a large text block. One of the things I'd like to avoid is
> imposing limits on record size so something like a webpage could
> be a record if desired.
>
I think the underlying issue is which data types (in the generic
sense of the word) should be supported. Many databases with which
I'm familiar treat "running text" (with large size potential) as a
different type than "character data field" (with some upper limit
on capacity). To avoid dependence on filesystem issues (assuming
we DO want to stay away from entanglement with any specific OS
features/limits) one strategy is to store string data up to a
certain size within the record; above that size, each such value
would be kept in a separate file, with the name of the file as a
string within the original record. Certainly not the fastest
option for some purposes, but it does allow such things as searches
on non-running-text fields (e.g. keys) to be done without the
overhead of reading/skipping the big chunks.
The same approach can be used for BLOb data.
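The overflow strategy Joel describes can be sketched as follows; Python for illustration only, with the size threshold and file-naming scheme being arbitrary choices of mine:

```python
# Sketch of the overflow strategy: string values up to some threshold
# stay inline in the record; larger ones go to a side file whose name
# replaces them, so key searches never read the big chunks.
import os
import tempfile

THRESHOLD = 64                       # arbitrary cutoff for this sketch

def store_value(value, directory):
    if len(value) <= THRESHOLD:
        return ("inline", value)
    # name derived from content hash; any unique naming scheme would do
    path = os.path.join(directory, f"blob-{abs(hash(value))}.txt")
    with open(path, "w") as f:
        f.write(value)
    return ("file", path)

def load_value(tag, payload):
    if tag == "inline":
        return payload
    with open(payload) as f:         # follow the reference to the side file
        return f.read()
```

The same shape works for binary BLOb data with `"wb"`/`"rb"` modes.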
> >
> >- new AND UPDATED AND DELETED records are appended to the end of
<<quoted lines omitted: 5>>
> when desired. Updating the main file would improve the read
> operations and bring the db back into an easy to view form.
Perhaps I should have been more explicit. I assumed the existence
of a utility, also alluded to in Petr's post, which would "pack"
the data back to canonical form (one physical record per logical
record). A trivial variation on that utility also allows for
viewing the data without actually rewriting the packed file to disk.
The "append all stuff" strategy essentially puts the log (audit trail)
within the file itself. I believe it's faster to do that with one
file (scanned as needed when un-packed) than with separate main
and transaction files.
> All of these issues show the trade offs in the different operations
> you need to do on the database. Read and search performance
> versus update and write operations and so on. They conflict quite
> a bit with the organization I would pick to keep the data in a single
> file with visually "nice" organization. :-(
>
Simplicity imposes limits. I don't know how to define it except
with respect to intended uses.
The simplest format I know (from the point of view of writing code
to read the data) is fixed-field layout, where each data element is
right-blank-padded (if it is "string-like") or left-zero-padded (if
it is "number-like") to constant size across all records.
1013John        Doe         1983221
3253Hermione    Fibbershins 1984616
4506ThrockmortonWilberforce 1985703
5323Johannes    Dingsda     1988445
7151Tuxedo      Penguin     1995499
9598Bill        Cat         1997525
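Reading such a fixed-field layout is just slicing at known offsets. A Python sketch, assuming widths of 4 (ID), 12 (first name), 12 (last name), 4 (year), 3 (office) for the sample records; the widths are inferred from the example, not specified anywhere:

```python
# Parse one fixed-field record by slicing at constant offsets.
# Strings are right-blank-padded, numbers left-padded, per the layout.

FIELDS = [("id", 4), ("first", 12), ("last", 12), ("year", 4), ("office", 3)]

def parse_fixed(line):
    rec, pos = {}, 0
    for name, width in FIELDS:
        raw = line[pos:pos + width]
        rec[name] = int(raw) if name in ("id", "year", "office") else raw.rstrip()
        pos += width
    return rec
```

This is why the format is the simplest to *read by program*: no delimiters, no quoting, no state.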
The simplest format I know (from the point of view of a human trying
to look at the raw data) is a forms-like presentation with explicit
labels for all data values:
Employee ID: 1013
First Name: John
Last Name: Doe
Year Hired: 1983
Office: 221
Employee ID: 3253
First Name: Hermione
Last Name: Fibbershins
Year Hired: 1984
Office: 616
Employee ID: 4506
First Name: Throckmorton
Last Name: Wilberforce
Year Hired: 1985
Office: 703
Employee ID: 5323
First Name: Johannes
Last Name: Dingsda
Year Hired: 1988
Office: 445
Employee ID: 7151
First Name: Tuxedo
Last Name: Penguin
Year Hired: 1995
Office: 499
Employee ID: 9598
First Name: Bill
Last Name: Cat
Year Hired: 1997
Office: 525
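The labeled form layout parses almost as easily, at the cost of repeating the labels; a Python sketch for illustration, treating each "Employee ID:" line as the start of a new record (that convention is mine, inferred from the example):

```python
# Parse "Label: value" form records; an "Employee ID" line opens a
# new record, and every subsequent labeled line adds a field to it.

def parse_forms(lines):
    records, rec = [], None
    for line in lines:
        label, _, value = line.partition(":")
        label, value = label.strip(), value.strip()
        if label == "Employee ID":
            rec = {}
            records.append(rec)
        rec[label] = value
    return records
```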
I have a strong motivation to make it *possible* for humans (e.g.,
me!) to read my data files since that's often useful for debugging
and troubleshooting. However most of the access is done by programs,
so I tend to make it "just enough" human readable and prefer to ease
the parsing burden on the program.
-jn-
[8/24] from: rgaither:triad:rr at: 15-Jan-2002 11:31
Hi Joel,
>Nice summaries!
Thanks! :-)
>> This would be nice but I'm not sure it is reasonable for records
>> with a large text block. One of the things I'd like to avoid is
<<quoted lines omitted: 14>>
>on non-running-text fields (e.g. keys) to be done without the
>overhead of reading/skipping the big chunks.
A good point. I might lean towards a single file for each kind of
BLOB (or should that be TLOB :-)). I still have a strong aversion to
creating lots of files to represent the DB. Or perhaps even better
is the option to have both - just a file reference or a large text
block collection field type as well.
>The same approach can be used for BLOb data.
Yes indeed.
>Perhaps I should have been more explicit. I assumed the existence
No, I got those parts as valid assumptions.
>The "append all stuff" strategy essentially puts the log (audit trail)
>within the file itself. I believe it's faster to do that with one
>file (scanned as needed when un-packed) than with a separate main
>and transaction files.
Part of my reason for the separate files was to vary the format as
needed. Also it allows the main file to be read only and assumed
static so it would not have to be altered if a transaction was not
applied. Don't know about the speed impacts though, I believe there
are lots of issues depending on how the db is used.
>> All of these issues show the trade offs in the different operations
>> you need to do on the database. Read and search performance
<<quoted lines omitted: 4>>
>Simplicity imposes limits. I don't know how to define it except
>with respect to intended uses.
That is the problem. The one-size-fits-all kind of solution is very
hard to find, perhaps impossible if you want a "Good" fit. :-)
I keep coming back to wanting this to be binary. :-)
[snip good simple examples]
I do like Pekr's simple example as well and would only want to
include some table and column name information. Sort of a block
oriented REBOL version of a CSV file. I can live with the one line
record approach and related storage for the big text or binary objects
for the benefits it returns.
>I have a strong motivation to make it *possible* for humans (e.g.,
>me!) to read my data files since that's often useful for debugging
>and troubleshooting. However most of the access is done by programs,
>so I tend to make it "just enough" human readable and prefer to ease
>the parsing burden on the program.
I also like to have the format simple enough to read and even
create manually if possible. I know this is implying a limit on the
size and style that does not match Gabriele's requirements though
so some compromise is needed.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[9/24] from: petr:krenzelok:trz:cz at: 15-Jan-2002 19:05
Rod Gaither wrote:
>Hi Joel,
>>Nice summaries!
<<quoted lines omitted: 22>>
>BLOB (or should that be TLOB :-)). I still have a strong aversion to
>creating lots of files to represent the DB.
Why? :-) We really do use a one-file strategy for a database table, one file
for an index bag (multiple orders), one file for memo. I think that we are
talking a lightweight db here anyway, so why are we talking blobs here?
:-) In reality, I prefer each table in a separate file; it also somehow feels
safer to me, and you can copy tables around your harddrive to process
them in external analytic tools. When I use an ODBC driver, for example, it
treats a directory containing multiple db, index, and memo files as one real
database, where you can query its tables.
As for putting changed records at the end of the file - I thought that
constant-size fields/records would be enough? That way you could just
perform 'find on the file (seeking for the record you are about to change),
and simply replace it. Longer texts should go to an external memo
file, but 1) I am duplicating XBase functionality here and 2) I can
feel that is something which you would probably like to avoid
for a "simple" REBOL dbms system :-)
.. so, btw - I am somehow lost - what do we want to achieve? Any
conclusions so far? :-) We can try to prepare description for several
scenarios provided, and continue to post their pros/cons and then decide
upon implementation. ...
-pekr-
[10/24] from: rgaither:triad:rr at: 15-Jan-2002 13:47
Hi Pekr,
>>A good point. I might lean towards a single file for each kind of
>>BLOB (or should that be TLOB :-)). I still have a strong aversion to
>>creating lots of files to represent the DB.
>>
>Why? :-) We really do use one file strategy for database table, one file
>for index bag (multiple orders), one file for memo. I think that we are
It is really only a preference. I am used to systems with 100s of tables
and everything included in 1 db file. :-) I remember the dBase way of
doing things as a pain, but admit in this case it probably is not worth
the effort for a single-file or even reduced-file version.
>talking lightweight db here anyway, so, why are we talking blobs here?
I think we need to talk about large text fields at least. I would love
this little db to manage my saved mail in a more usable format for
example. :-)
>:-) In reality, I prefer each table in separate file, it is also somehow
>safier to me and you can copy tables around your harddrive to process
>them in external analatic tools. Once I use ODBC driver for e.g., it
>treats directory containing multiple db, index, memo files as one real
>database, where you can query its tables.
Good points all.
>As for putting changed records at the end of file - I thought that
>constant size fields/record would be enough? That way you could just
>perform 'find on the file (seeking for record you are about to change),
>and simply replace it. Longer text sizes should come to external memo
I am not suggesting a fixed length record or field where this is possible.
>file, but first - I am duplicating XBase functionality here and 2) I can
>feel that is something which you probably would like to avoid to happen
>for "simple" Rebol dbms system :-)
Yes duplication with any rdbms is happening with this effort. It is part
of the problem though that all those products are external to REBOL
and thus limited in their own ways.
>.. so, btw - I am somehow lost - what do we want to achieve? Any
>conclusions so far? :-) We can try to prepare description for several
>scenarios provided, and continue to post their pros/cons and then decide
>upon implementation. ...
I apologise for this. :-) I want to make an effort to bring focus back to the
project before I distract the discussion any further.
I think the next step is to settle on a "starting" file format so we can
get back to the functional questions Gabriele proposed. :-)
What I (my opinions only) conclude so far, though review of some of
the existing implementations may change this -
1. Text files for persistent storage
2. REBOL Blocks for grouping values
3. Native REBOL representation for values
4. Use single base directory - no directory structure required
5. Use multiple (named) text files to map to tables, indexes and large text blocks
6. Keep at least the table data file simple, readable, and self contained
7. One record per line
8. Use some manner of file append to manage record changes
9. Delete record operation is only a marker until packed
10. Need utilities to pack db and provide canonical form
11. Every row has an automatic internal "row-id"
12. Manipulate records as blocks of values
13. Result sets are built as blocks of blocks of values
14. The in-memory format should not be constrained by the persistent format
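Points 8-10 above imply a pack utility: sweep the appended history, keep only each row-id's last version, drop delete markers, and emit one canonical line per live record. A Python sketch for illustration; the `row-id payload` line format and the `<deleted>` marker are invented here:

```python
# Pack the appended history back to canonical form: one line per live
# record, last version wins, delete-marked rows removed.

def pack(lines):
    latest = {}
    for line in lines:
        row_id, _, payload = line.partition(" ")
        latest[row_id] = payload          # later appends win
    return [f"{rid} {p}" for rid, p in sorted(latest.items())
            if p != "<deleted>"]
```

As Joel noted earlier, a trivial variation of the same sweep gives a read-only "current view" without rewriting the file.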
The rest, and yes there were more good points only not quite as
well defined, can wait for the next round of design theory. :-)
Comments?
Samples?
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[11/24] from: jason:cunliffe:verizon at: 15-Jan-2002 14:52
> >file, but first - I am duplicating XBase functionality here and 2) I can
> >feel that is something which you probably would like to avoid to happen
> >for "simple" Rebol dbms system :-)
fyi: here's another part of the puzzle to add.
I found this Xbase Rebol/View program on
http://rebolfrance.multimania.com/telechargement.html#applications
http://rebolfrance.multimania.com/projets/dbf/dbview.zip
François Jouen a développé un superbe explorateur de fichiers xbase
compatible avec les formats dBase III, FoxPro 2.5, dbase IV et FoxPro 3. La
version 1.3 authorise la visualisation du contenu des champs ainsi que de la
structure de la base de données. L'utilisateur peut également rechercher une
valeur parmi les informations stockées et exporter la table dans le format
de données Rebol.
..for non francophones I think that says:
François Jouen has developed a superb xbase file browser compatible with
dBase III, FoxPro 2.5, dBase IV and FoxPro 3. Version 1.3 allows you to see
the contents of fields as well as the structure of the database. The user can
also search for a value within the stored information and export the table
in a REBOL data format.
./Jason
[12/24] from: rotenca:telvia:it at: 16-Jan-2002 0:47
Hi Rod
> What I (my opinions only) conclude so far, though review of some of
> the existing implementations may change this -
<<quoted lines omitted: 3>>
> 4. Use single base directory - no directory structure required
> 5. Use multiple (named) text files to map to tables, indexes and large text
blocks
> 6. Keep at least the table data file simple, readable, and self contained
> 7. One record per line
<<quoted lines omitted: 8>>
> well defined, can wait for the next round of design theory. :-)
> Comments?
I substantially agree.
> Samples?
Now I feel I'm in a relational maze. But from now, Gabriele has a week of
time to give us a complete bug-free program. The countdown has started :-)
As for all the REBOL db programs, I await a comparative review from you (another
countdown :-)
Can you give us the links to find the dbs which are not in the REBOL library?
> Thanks, Rod.
Thank you.
---
Ciao
Romano
[13/24] from: rgaither:triad:rr at: 15-Jan-2002 22:32
Hi Romano,
>Now i feel myself in a relational maze. But from now, Gabriele has a week of
>time to give us a complete bug-free program. Count down has started :-)
I sure am glad you picked Gabriele for that task! :-)
>About all the rebol db programs, i wait a comparative review from you (another
>count down:-)
Oops, I guess I spoke too soon. :-)
I will try and post a review by this weekend.
>Can't you say us the link to find the db which are not on rebol library?
I will work on this as well but can't give the ones from the books
as they are part of those products. I will check on the others though
and either point to them or post them to the list if short.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[14/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 6:24
> What I (my opinions only) conclude so far, though review of some of
> the existing implementations may change this -
<<quoted lines omitted: 12>>
> 13. Result sets are built as blocks of blocks of values
> 14. The in-memory format should not be constrained by the persistent format
A very good one - at least for me - a starting point we could agree upon :-) A few questions and
comments which are not clear to me yet:
- status markers could be used even for things as locking ....
a) now I am not sure if locking should happen on index level (dangerous if someone
will not use any index), or in database itself ...
b) if we have no db server and our app fails, record would stay locked forever, so we
would have to complicate things with some time-stamps or so ...
- I don't understand how you want to use file appends to manage changes to records.
What rec-number will be used for a changed record? It should be the same one. If so, then
we need to introduce versioning (we use such an approach in one of our tables, although for
a different purpose)
- record-numbers - I hope it is clear enough, that 'pack-ing the database will reassign
record numbers to remove holes caused by record deletion ...
- variable record size - a block. OK - how will we map it to any REBOL
structure/object? I just hope that at least the number of columns is the same through
all records, or I can't imagine it any other way than duplicating column names in each
record, e.g. [name: "pekr" lastname: "krenzelok"], which is not a good approach to
follow, imo
So, will anyone answer my questions? :-)
-pekr-
[15/24] from: greggirwin:mindspring at: 15-Jan-2002 23:59
Hi Pekr,
<< a) now I am not sure if locking should happen on index level (dangerous
if someone will not use any index), or in database itself ...
b) if we have no db server and our app fails, record would stay locked
forever, so we would have to complicate things with some time-stamps or so ... >>
If we don't have a central server script, there has to be some kind of lock
publishing mechanism. The easiest way, perhaps the only feasible way, is to
use a persistent marker as you suggest. In order to unlock a frozen lock you
need a manual unlock feature, or a timeout specified for each lock. If you
go the timeout route, you may want to implement some kind of time-extension
callback feature so you can be notified if a lock you hold is going to
expire, with the option to hold on to it. The manual unlock approach is
probably simpler, but not nearly as slick and not suitable for unattended
use.
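The timeout route Gregg describes can be sketched simply: a lock is a (holder, acquired-at) marker, and a marker older than the timeout is treated as expired rather than frozen forever. Python for illustration; the 30-second timeout and the in-memory `locks` table are stand-ins for whatever persistent marker the db would actually use:

```python
# Sketch of a persistent lock marker with a timeout: a stale lock from a
# crashed app expires instead of staying held forever.
import time

TIMEOUT = 30.0                     # seconds; arbitrary for this sketch
locks = {}                         # rec_id -> (holder, acquired_at)

def try_lock(rec_id, holder, now=None):
    now = time.time() if now is None else now
    held = locks.get(rec_id)
    if held and now - held[1] < TIMEOUT:
        return held[0] == holder   # only the current holder may re-enter
    locks[rec_id] = (holder, now)  # free or expired: take the lock over
    return True
```

A time-extension callback would amount to the holder re-calling `try_lock` before its own lock's deadline.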
<< - record-numbers - I hope it is clear enough, that 'pack-ing the database
will reassign record numbers to remove holes caused by record deletion ... >>
I thought the intent was to assign a UUID to each record, which it would use
forever, so location was not important.
<< - variable record size - a block. OK - how will we map it to any rebol
structure/object? I just hope that at least that number of columns are the
same thru all records or I can't imagine it in other way than duplicating
column names in each record, e.g. [name: "pekr" lastname: "krenzelok"],
which is not good aproach to follow imo? >>
I don't like wasting space any more than the next guy, but I *love*
self-describing data. Since we're targeting small to moderate size data
stores, I wouldn't be opposed to blocks of name/value pairs. If we load a
large block of records which share common words as their column names, REBOL
should handle that very efficiently, correct?
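As a rough illustration of self-describing records (Python dicts standing in for REBOL blocks of name/value pairs; the helper names are invented), variable columns become harmless because each record carries its own labels:

```python
# Self-describing records: each record names its own columns, so no
# separate schema is needed to read one, and column sets may vary.

def make_record(**fields):
    return dict(fields)

def project(records, *columns):
    # a missing column simply comes back as None instead of breaking
    return [[rec.get(c) for c in columns] for rec in records]
```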
--Gregg
[16/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 9:48
Hi,
Gregg Irwin wrote:
> Hi Pekr,
> << a) now I am not sure if locking should happen on index level (dangerous
<<quoted lines omitted: 6>>
> publishing mechanism. The easiest way, perhaps the only feasible way, is to
> use a persistent marker as you suggest.
Maybe I don't think so anymore :-) I discussed the problem with my colleagues here,
and Michael suggested that any kind of locking be done in memory. We found one
interesting document though, which describes several scenarios, even one with a
lock field and time stamp ....
http://www.rebol.cz/~asko/xbase.txt - very cool description ....
> In order to unlock a frozen lock you
> need a manual unlock feature, or a timeout specified for each lock. If you
> go the timeout route, you may want to implement some kind of time-extension
> callback feature so you can be notified if a lock you hold is going to
> expire, with the option to hold on to it.
yes, it could work, although it would require e.g. your form to have some timer
implemented ...
> The manual unlock approach is
> probably simpler, but not nearly as slick and not suitable for unattended
> use.
>
uhmm, not this one probably :-)
> << - record-numbers - I hope it is clear enough, that 'pack-ing the database
> will reassign
> record numbers to remove holes caused by record deletion ... >>
>
> I thought the intent was to assign a UUID to each record, which it would use
> forever, so location was not important.
in XBase, it has its meaning. You can open dbase file without opening any index,
so browser sorts records according to their rec-no.
> << - variable record size - a block. OK - how will we map it to any rebol
> structure/object? I just hope that at least that number of columns are the
<<quoted lines omitted: 9>>
> large block of records which share common words as their column names, REBOL
> should handle that very efficiently, correct?
I think that it depends upon what your intention is - do you want to have
variable-column records? I would not suggest it, as it is a little bit more
difficult for any kind of analysing software, which simply counts something in a
loop ....
I would suggest:
rec-no ["various" 3 'or [more--datatypes]] ["D" time-stamp-of-creation last-changed
[locked-time-stamp user-info whatever]]
The above structure only shows the approach I would go with - first a 'select-able or
'find-able record number, then the record itself, followed by a third block
containing some system info - navigation is pretty straightforward, and we could
use wrapper functions
get-lock-stamp: func [db rec-no][first last last next next find blk 5] , where 5
= rec-no :-)
-pekr-
[17/24] from: rebol665:ifrance at: 16-Jan-2002 10:48
Hi Jason
Your translation is excellent and shows how close English and French really
are. May I ask where you learned your French?
Patrick
ps. For Alexandre Dumas (author of The Three Musketeers), English was just
misspelt and mispronounced French.
----- Original Message -----
From: "Jason Cunliffe" <[jason--cunliffe--verizon--net]>
To: <[rebol-list--rebol--com]>
Sent: Tuesday, January 15, 2002 8:52 PM
Subject: [REBOL] Re: possible data format ... Re: Re: dbms3.r 01
<<quoted lines omitted: 20>>
[18/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 11:37
> I would suggest:
> rec-no ["various" 3 'or [more--datatypes]] ["D" time-stam-of-creation last-changed
<<quoted lines omitted: 5>>
> get-lock-stamp: func [db rec-no][first last last next next find blk 5] , where 5
> = rec-no :-)
uhm, when I think about it - what nonsense :-) record number 5 can be part of a data
block, so we can't use such a simple mechanism. Maybe the best way is to really wait
for the open/direct/seek mode and write a proper driver, skipping here and there in the
opened file ... the question is - wait for how long? :-)
-pekr-
[19/24] from: rgaither:triad:rr at: 16-Jan-2002 8:53
Hi Pekr, Gregg,
>- status markers could be used even for things as locking ....
> a) now I am not sure if locking should happen on index level (dangerous if someone
>will not use any index), or in database itself ...
> b) if we have no db server and our app fails, record would stay locked forever, so we
>would have to complicate things with some time-stamps or so ...
I don't think we are ready yet to worry about locking. I have not
forgotten it but want to see the first round of file format and function
designs before looking at it seriously. Just like I'm not sure what and
how to store the table schema information yet but know it needs to
be considered.
>- I don't understand how you want to use file appends to manage changes to records.
>What rec-number will be used for changed record? It should be the same one. If so, then
>we need to introduce versioning (we use such approach in one of our tables, although
>for different purpose)
Yes, it should use the same row-id. My rough cut at the table file has
outer blocks for some grouping. This makes for simple organization and
basic versioning just by group and position in the file.
>- record-numbers - I hope it is clear enough, that 'pack-ing the database will reassign
>record numbers to remove holes caused by record deletion ...
I was not clear on this and am not sure I agree. It will take some more
consideration working through some options to be sure but I am leaning
towards the row-id being an integer sequence value by table that stays
with the record for life. If we make it match the position in the table and
change with the pack operation then we must change all the references
to that value. It requires some more thought as it seems to me I am mixing
some internal position usage and primary key roles in what I am doing so
far. Perhaps it should only be an automatic primary key value that goes
with the record values not outside the block.
>- variable record size - a block. OK - how will we map it to any rebol
>structure/object? I just hope that at least the number of columns is the same thru
>all records, or I can't imagine it in other way than duplicating column names in each
>record, e.g. [name: "pekr" lastname: "krenzelok"], which is not a good approach to
>follow imo?
By variable record size I just meant the fields are not fixed length. They
are exactly a block of values where each value is whatever size it needs
to be. I am not saying a variable number of columns. The columns should
match a list of column names in order and number. While I like self-describing
data, I agree that I don't want name:value pairs repeating in each record.
Here is a sample file structure to consider and review -
REBOL [
; keep header info like table name, namespace, next row-id value
; better in a named value block like columns?
]
columns [col-name col-name col-name ...]
defaults [
col-name [now/date]
...
]
rows [
1 [col-value col-value col-value ...] "Status"
2 [col-value col-value col-value ...] "Status"
3 [col-value col-value col-value ...] "Status"
]
updates [
2 [col-value col-value col-value ...] "Status"
...
]
Please note - this is just a first cut at how a table data file
might look. I am putting it out to the list so it can be reviewed
and torn apart as needed. :-)
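Rod's layout above can be read back and resolved mechanically: the rows block is the base, the appended updates block overrides it, and "D" rows stay in the file until a pack rewrites it. Here is a minimal sketch, in Python purely for illustration - the data layout and all names are my paraphrase of the rows/updates idea, not a spec from the thread:

```python
# Hypothetical in-memory form of the proposed table file:
#   rows    = {row_id: (values, status), ...}         the base "rows [...]" block
#   updates = [(row_id, values, status), ...]          the appended "updates [...]" block
# The latest appended entry for a row-id wins; "D" marks a logical delete.

def current_view(rows, updates):
    """Apply appended updates over the base rows; later entries win."""
    view = dict(rows)
    for row_id, values, status in updates:
        view[row_id] = (values, status)
    # Logically deleted records are only hidden here; they remain in the
    # file (recoverable) until a 'pack' pass physically rewrites it.
    return {rid: rec for rid, rec in view.items() if rec[1] != "D"}

rows = {
    1: (["Petr", "Krenzelok", "petr@krenzelok.trz.cz"], "R"),
    2: (["Someone", "Else", "someone@else.com"], "R"),
}
updates = [
    (2, ["Someone", "Changed", "someone@else.com"], "U"),
    (1, ["Petr", "Krenzelok", "petr@krenzelok.trz.cz"], "D"),
]
```

With this data, the resolved view contains only row 2 with its updated values, which is the "basic versioning just by group and position" behaviour described above.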
My questions on it so far -
1. How best to provide some header information?
2. I like the extra block grouping for flexibility but would like some feedback.
3. The defaults block is an example of how the grouping supports extensibility.
4. The whole row-id/primary-key or both design issue?
5. What status values do we need?
6. Would an action/operation update style vs full records be better?
>So, anyone answers my questions? :-)
Yes, or at least the start of answers for some. :-)
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[20/24] from: petr:krenzelok:trz:cz at: 16-Jan-2002 15:14
Rod Gaither wrote:
> Hi Pekr, Gregg,
> >- status markers could be used even for things as locking ....
<<quoted lines omitted: 7>>
> how to store the table schema information yet but know it needs to
> be considered.
OK
> >- I don't understand how you want to use file appends to manage changes to records.
> >What rec-number will be used for changed record? It should be the same one. If so,
then
<<quoted lines omitted: 11>>
> change with the pack operation then we must change all the references
> to that value.
Record numbers should be in no way related to data content. What relations are you
talking about? We relate tables thru some column or set of columns, not some internal
database stored information. Record numbers should serve for index values, where e.g.
the sorting order of name + last-name would store something like:
1 "Petr Krenzelok"
35 "Petr someone"
3 "Olaf xyz"
so that you can choose your index tag, and set a filter e.g. to see only all Petrs -
your table-grid viewer driver will then fill in your grid, directly selecting only
record numbers 1 and 35 in the above example ...
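The index idea just described can be sketched in a few lines - Python for illustration only, and the pair layout is hypothetical:

```python
# Hypothetical index structure: a list of (record-number, key-string)
# pairs kept in the index's sort order. Filtering it yields the record
# numbers the grid driver would then fetch directly.

index = [
    (1, "Petr Krenzelok"),
    (35, "Petr someone"),
    (3, "Olaf xyz"),
]

def filter_index(index, prefix):
    """Return record numbers whose key starts with the given prefix."""
    return [rec_no for rec_no, key in index if key.startswith(prefix)]
```

Filtering for "Petr" returns record numbers 1 and 35, matching the example above; the grid never touches the data file for rows that fall outside the filter.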
> Here is a sample file structure to consider and review -
> REBOL [
<<quoted lines omitted: 15>>
> ...
> ]
One thing I don't like so far - why add changes to the table/database itself? I think
that more correct is one of the following:
a) replace record ...
b) if you want to track changes, introduce some logging, but don't complicate the
database with some values and their updates/history .... just imo :-)
> Please note - this is just a first cut at how a table data file
> might look. I am putting it out to the list so it can be reviewed
> and torn apart as needed. :-)
>
> My questions on it so far -
>
> 1. How best to provide some header information?
a) as first block/row in a file
b) separate file, e.g. a .def file, specifying index (or more of them), unique ids,
field names = record structure, or other things for maintenance
I prefer b), simply to have a clean, containing-records-only data file ...
> 2. I like the extra block grouping for flexibility but want some feedback?
> 3. The defaults block is an example of how the grouping supports extensibility
> 4. The whole row-id/primary-key or both design issue?
> 5. What status values do we need?
"Deleted", later maybe "locked" etc .... let's start with the "deleted" one, or just
let's preserve such field for future usage :-)
> 6. Would an action/operation update style vs full records be better?
? Could you explain this one, please?
Thanks,
-pekr-
[21/24] from: rgaither:triad:rr at: 16-Jan-2002 10:40
Hi Pekr,
>Record numbers should be in no way related to data content. What relations are you talking
>about? We relate tables thru some column or set of columns, not some internal database
<<quoted lines omitted: 3>>
>35 "Petr someone"
>3 "Olaf xyz"
I am talking about internal db relationships like the indexes. I was starting
to confuse the row-id with my desire to always have a unique ID value that
is auto-generated as a value for each row of a table - to serve as a primary
key. After thinking about it a bit though, they have to be two separate things,
and I'm not sure everyone will go along with a default ID column in a record
anyway. :-)
>so that you can choose your index tag, and set filter for e.g. to see only all Petrs
>- your table-grid viewer driver will then fill in your grid directly selecting only
>record numbers 1 and 35 in above example ...
I don't know if I want to have to update all the index data every time we pack
the DB. That can be looked at later though when starting to work on how
the functions interact with the files.
>> Here is a sample file structure to consider and review -
>>
<<quoted lines omitted: 20>>
>more correct is one of following:
>a) replace record ...
I don't think you can in the text db. How are you planning on doing this
without having to rewrite the entire file for every change?
>b) if you want to track changes, introduce some logging, but don't complicate
>database with some values and their updates/history .... just imo :-)
I agree, but I am not - see above. :-)
I am not sure I want the updates in the same file myself. That was a
point Joel brought up where I was thinking of a separate file. I am in
the middle on where the list of changes goes, I just know it has to be
at the end of some file so append can be used.
>> My questions on it so far -
>>
<<quoted lines omitted: 3>>
>names = record structure, or other things for maintenance
>I prefer b), simply to have clear and containing-records-only data file ...
I like (a) for some parts and (b) for the full schema information. The only
reason for (a) is to give an included mapping between columns and data
and to store real values - not schema - such as the next row-id with the
data.
>> 2. I like the extra block grouping for flexibility but want some feedback?
>> 3. The defaults block is an example of how the grouping supports extensibility
>> 4. The whole row-id/primary-key or both design issue?
>> 5. What status values do we need?
>
>"Deleted", later maybe "locked" etc .... let's start with "deleted" one, or just let's
>preserve such field for future usage :-)
Ok.
>> 6. Would an action/operation update style vs full records be better?
>
>? Could you explain this one, please?
I have a problem with repeating the entire record in the update section
if the only thing that has changed is the value of one column. By having
a list of operations that could be at the field level or record level there is
finer granularity to the process. This might also be considered a low-level
data manipulation language that gets applied to the table data to bring it
back in sync during packing. It complicates things at the engine level
but should give better performance and more flexibility in activities.
Some examples:
Whole record -
2 ["Rod" "Gaither" [rgaither--triad--rr--com] ...] "U"
Action -
Update 2 first-name "Roderic"
Other examples -
Delete 3
Create ["Gabriele" "Santilli" [g--santilli--tiscalinet--it]] "N"
For transaction support we may need to consider putting these update
lists in their own file so we can manage multiple table changes as a
single transaction.
Thanks, Rod.
Rod Gaither
Oak Ridge, NC - USA
[rgaither--triad--rr--com]
[22/24] from: chalz:earthlink at: 16-Jan-2002 13:16
I know this is a sadistic request *bwahaha*, but is there any chance
someone could collect the data format/dbms/related posts, once the thread
quiets down, and sort of 'compress' it to a list of
desires/wants/standards/protocols/etc? I find this project fascinating, but
right now I don't have the time to wade through all of the mail (the sheer
amount in one day is blowing me away), and the thread keeps diverging ;)
And BTW, it's Petr, not Pekr. *snicker*
--Charles
[23/24] from: greggirwin:mindspring at: 16-Jan-2002 11:50
Hi Pekr,
<< Maybe I don't think so anymore :-) I talked the problem with my colleagues here,
and Michael suggested that any kind of locking is done in memory. We found one
interesting document though, which describes several scenarios, even one with
lock field and time stamp ....>>
But for this to work, you need some kind of file locking mechanism (e.g.
SHARE under DOS), or shared memory, correct? I.e. if you have two separate
dbms scripts running against the same file, how do they communicate lock
information if REBOL doesn't give us the ability to "flag" locks against
specific file offsets?
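One common workaround for exactly this problem (not something proposed in the thread, just a known technique): if the language cannot flag locks against file offsets, two independent processes can still cooperate through an advisory lock file whose content is a time stamp, so a lock left behind by a crashed writer can be detected as stale and broken. A Python sketch of the idea - the file layout and timeout are assumptions:

```python
# Advisory lock-file sketch combining the "lock field" and "time stamp"
# ideas from the thread. All names and the 30s staleness window are
# illustrative assumptions.

import os
import time

MAX_AGE = 30.0  # seconds before a lock is considered abandoned

def try_lock(path):
    """Try to acquire <path>.lock; return True on success, False if held."""
    lock = path + ".lock"
    try:
        # O_CREAT | O_EXCL makes creation atomic: exactly one process wins.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(time.time()).encode())
        os.close(fd)
        return True
    except FileExistsError:
        with open(lock) as f:
            stamp = float(f.read() or 0)
        if time.time() - stamp > MAX_AGE:
            # Stale lock from a crashed writer: break it and retry.
            # (A second process could race us here; good enough for a sketch.)
            os.remove(lock)
            return try_lock(path)
        return False

def unlock(path):
    os.remove(path + ".lock")
```

The stale-lock check is what answers the "record would stay locked forever" worry: a crashed process never refreshes its time stamp, so the next writer eventually reclaims the lock.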
<< http://www.rebol.cz/~asko/xbase.txt - very cool description .... >>
Great information!
<<RE: rec-num: ...in XBase, it has its meaning. You can open dbase file
without opening any index,
so browser sorts records according to their rec-no. >>
Right, if we have access to a block of records, whether as a complete file
on disk or as the result of a query, there would be an implied order to
them, but I see that as separate from a record identifier (its GUID).
<< I think that it depends upon what your intention is - do you want to have
variable column records? I would not suggest it, as it is a little bit more
difficult for any kind of analysing software, which simply counts something
in a loop .... >>
Agreed. Do we consider a database to contain only homogeneous records, does
it maintain separate internal tables for different record types, or does it
do what some OODBMSs do and store a collection of heterogeneous records
which share a small common core, allowing it to navigate and identify records
of various types? In that scenario, reading all the records in a "table" is
actually more like a query to find records by type (and a record may exist
in multiple "extents" when you consider an inheritance model).
--Gregg
[24/24] from: g:santilli:tiscalinet:it at: 17-Jan-2002 21:27
Hello Petr!
On 15-Gen-02, you wrote:
PK> For larger amount of records to browse we need grid with
PK> dynamic caching ...(although not so necessary in the
PK> beginning of the project)
I've included DB-CACHED-QUERY in mysql-wrapper.r just for that. I
use my own multicolumn list style for display; it is still slow
because of iteration, so I think I will switch to a non-iterated
version in the future.
Regards,
Gabriele.
--
Gabriele Santilli <[giesse--writeme--com]> - Amigan - REBOL programmer
Amiga Group Italia sez. L'Aquila -- http://www.amyresource.it/AGI/
Notes
- Quoted lines have been omitted from some messages.
View the message alone to see the lines that have been omitted