Posts Tagged ‘SQL Server’

MDB.BAR: Making a database out of…a database

Wednesday, November 18th, 2009

Since mid-September, I’ve hosted a limited BAR I.F. of the Microsoft Jet database format, or MDB. This I.F. breaks the database records into appropriately-sized chunks, identifying various fields.

Of course, the first I.F. for MDB only allows one to look at data table format information. It does not support the “magic row-column” formula used to extract the data, which is to say, you can plug in a record number, a column number, and then you’ve got your data for that row-column combination.

I’m working to correct this, which will allow a person to get information out of any field in a data table. But even this seems like it’s too limited. So you get the information out of row 3, column 7. What does this mean? Most people reference records by a primary key, which is one or more of the columns. Oftentimes, non-key fields must be searched, with each matching record returned. And, of course, let’s not forget that columns are usually referred to in queries by name, and not by number.

BARfly would do wonders to extract entire data tables at once, given this modification. But what would it mean to use a BAR I.F. to manage the database file like you would expect an actual database to be managed? What would you have to do?

Well, I’m not going to discuss all the details, because it would take too long. So I’ll limit my focus to just a fundamental operation programmers and DBAs routinely perform on databases: the select query.

Queries can be written in many forms, but I’m just going to focus on MDB’s preferred format, which is called SQL. A SQL query has the following general form:

SELECT alpha,gamma,beta FROM greekdb.alphabetletters WHERE (caps=”true” AND highlite<>“yellow”) ORDER BY delta

Roughly translated, we want to extract a variable number of records, with three columns (“alpha”, “gamma”, and “beta”) per record returned, from the table “alphabetletters,” which belongs to database greekdb. Limit the records returned to those whose column “caps” has a true value, and whose column “highlite” has any value other than yellow. Finally, order the results, ascending, by the contents of the “delta” field in each returned record.

Any decent SQL processing system will take a database query and strip it down to the fundamental inputs that really matter in the query itself. Once all the text is read and understood, the database query looks more like this to the query handling logic:

Search database 2, table 5, for records where (column 10 is true and column 12 has a negative string comparison result with the text yellow). From these records, extract columns 4, 6, and 5 into the result dataset, in that order, ignoring the other columns. Finally, order the records of the result dataset using column 7 (ascending by numeric comparison).

Computers are nothing but numbers, and everything ultimately breaks down into a number. Once all the SQL has been translated, the real work is retrieving (or populating) all the pertinent data in the database for the user.

So, could a BAR I.F. do all that? Translate the query text, AND collect the necessary data, AND package the results in a format the user had requested? Absolutely.

BAR, remember, has decent regular expression processing capabilities, and can translate the SQL query as needed. Offset tweaking and length calculations are a fundamental part of the BAR deserialization procedure, allowing one to perform the “magic row-column” formula as many times as needed. Finally, the Deserialize functionality allows endless possibilities for data translation after nodes have already been characterized, yielding an appropriate dataset in the order the user had requested.

SQL gurus know that many queries are far more complex than the example I’ve just given. One-to-many relationships, multiple keys, internal consistency rules, etc. are not covered by the example. Clearly, I would be reinventing many wheels by making BARfly encompass nearly every possible creative query that a person could ever dream up, with few obvious returns. It’s far more worth it to use a genuine frontend, like Microsoft Access, if you need to use all the features of the database software.

But if you’re needing to perform data recovery, or if you want to perform easy translation of data without regards to internal consistency, or if you want to migrate the data into XML, UTD, or another format in preparation for a big move to a new format, or if you are writing a program that requires a light memory footprint with little need for all the advanced features of a database-access library of functions, consider the I.F. approach.

And don’t forget this–an I.F. is platform-independent.

I want to mention one more point. Microsoft has released 2.x of the Entity framework, which attempts to provide more universal access to different types of databases, with SQL Server being the preferred target. Query format is no longer important, nor is programmatic knowledge of database-specific features (Oracle has different features of SQL Server, for instance). Great stuff. But the framework does not, to my knowledge, allow anyone to customize what, exactly, the backend is. You’re locked into a predefined set of known database formats.

BAR represents the final piece of the puzzle. With BAR, there is the very real possibility of having a unique component that defines the format for a custom database backend. Having one or more implementation files act as “backend definitions” could, theoretically, allow for a framework with unfettered access to any type of data, regardless of whether or not the data accessed is even a database at all!