Product(s) Review: SQLite Maestro

Why
SQL Maestro is a database administration tool for people who work with a lot of different databases.  Wait! Before you say, “I just work with Oracle/MS SQL” or whatever, remember that you can often expect to have to work with MS Excel, MS Access, CSV and other flat-file formats.  Also, your personal website probably runs on MySQL or Postgres.  Most of these platforms either lack a good administration tool (the open source databases in particular), or at best come with one that will be unfamiliar.

What
It has:

- a graphical query designer
- database and table creation
- table data editing
- SQL script running
- DDL and INSERT statement export

How Much
It’s fairly cheap.  The AnySQL version (which targets any database behind OLE DB/ODBC) is very nice, and it is freeware.  I recommend getting that, plus the commercial edition for the database you use day to day.

The SQLite Code Factory appears to be a subset of the features in SQLite Maestro.  It gave me an error message of “can’t open the database, a process already has it open.”  Uh, which one?  I’m not about to reboot my computer, especially when SQLite Maestro (the other tool) can still connect.  I don’t understand why SQLite Code Factory can’t connect; probably a bug, imho.  If there is a feature of SQLite Code Factory that isn’t in SQLite Maestro, I didn’t find it.

I reviewed the SQLite Data Wizard for its ASP.NET data entry page generation.  Two problems: it uses ODBC (which makes it inappropriate for medium trust websites), and it is buggy.  Two of the pages for my tables failed the click test, i.e. just loading the page.  After I finally got a page to compile by replacing the ODBC objects with System.Data.SQLite ones, it raised a lot of warnings about undeclared data types.  Unfortunately, this can’t be fixed: the templates allow for changing many options, but not the source code of the generated page.
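
For anyone attempting the same surgery, the substitution is fairly mechanical. A hedged sketch of the idea only, not the wizard’s actual generated code; the table name and connection string are placeholders:

Imports System.Data
Imports System.Data.SQLite

Public Class DataEntrySketch
    ' Hypothetical helper: SQLiteConnection/SQLiteDataAdapter stand in for the
    ' OdbcConnection/OdbcDataAdapter objects the wizard generates.
    Public Function LoadRows(ByVal connectionString As String) As DataTable
        Dim table As New DataTable()
        Using con As New SQLiteConnection(connectionString)
            Using adapter As New SQLiteDataAdapter("SELECT * FROM SomeTable", con)
                adapter.Fill(table)
            End Using
        End Using
        Return table
    End Function
End Class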

SQLite Data Wizard also has a task manager and an import/export utility that supports a few formats.  Personally, I’d rather use ODBC + Talend Open Studio.

The others
I didn’t review the others.  SQL Server Management Studio Express is pretty usable, so I’m not too excited about MS SQL Maestro.  I don’t have a need for any of the others.

Pervasive Integration Architect: Dealing with workspaces

Workspaces were invented by a misanthrope so that your collection of source code files would never be able to find each other. If you are familiar with Visual Studio, workspaces + repositories roughly correspond to the function served by solution files and project files.

Detecting that something is wrong. The first sign you will get that something is wrong is failing to find the macro files. When a process designer file is open, going to Tools/Macros should bring up the right macro file (i.e. the one that has the right connection strings, etc.). Validation does not check for this. Since macros are buried in property pages, it won’t be visibly obvious if the macro file the IDE is pointed at doesn’t go with the currently open process designer file.

Undoing the damage. There are literally billions of UI dead ends in the IDE, making the proper way to open a process designer file nearly undiscoverable:

- File/Manage Workspaces

- Click the downward-pointing triangle next to the “Workspaces Root Directory” dropdown. The tooltip is “Changes Workspace Root Directory”.

- Find the directory that is one level above “Workspace1”. Workspace1 may be named something else, so you may have to look for the folder that has an “xmldb” folder in it.

- Don’t forget to double-click. The property page is poorly designed, i.e. selections made in the visible tree don’t commit until you double-click. Also, you can’t just type in the path of the “workspace root”.

Verifying the selection in “Workspace Manager”.

Expect the repository to say something like “xmldb01”, “FILESYSTEM”, “./xmldb”.

Expect the workspace to say something like “Workspace1”.

Expect, when you open something, that the open dialog’s “Look in” section says:

xmldb:ref:///{YOUR WORKSPACE ROOT}/Workspace1/xmldb

Open a file and double-check that the macro file has the expected values and appears to be the one at {YOUR WORKSPACE ROOT}\Workspace1\macrodef.xml.

UI Dead Ends. It is possible to have multiple repositories and multiple workspaces open. The semantics are unclear, and it isn’t clear which macro file you will end up using. In any case, you should avoid features that rely on having multiple workspaces or repositories active at one time, if for no other reason than that maintenance developers won’t be able to figure out what the hell is going on.

Observations: Pervasive Integration Architect Process and Map Designer

If you are here, you might rather be at the Pervasive Integration Support Forum.
Unfortunately, no one has answered a question there since 2005! Oh well.

Deadlocks
My first attempt to test some code led to a deadlock. Clicking [Abort] doesn’t successfully abort anything. You will have to kill the process, either in SSMS or Task Manager. Elegant.

Default Transaction
By default, sessions are “Serializable.” Serializable maximizes locking, which minimizes performance and concurrency. The poorly described “Global Transaction” seems to be a way of making tasks that use that session run in a transaction that rolls back if the Process fails. This is different from “Run in transaction” in DTS, which makes the whole package run in a transaction.
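
Integration Architect buries the isolation level in the session properties, but the trade-off is the same one you see in plain ADO.NET. A minimal sketch of that trade-off outside the tool, assuming a SQL Server connection string; this is an analogy, not how the tool configures sessions internally:

Imports System.Data
Imports System.Data.SqlClient

Module IsolationSketch
    Sub RunStepInTransaction(ByVal connectionString As String)
        Using con As New SqlConnection(connectionString)
            con.Open()
            ' Serializable (the session default described above) holds range locks
            ' until commit, which hurts concurrency on busy source tables.
            Using tx As SqlTransaction = con.BeginTransaction(IsolationLevel.Serializable)
                ' ... run the step's statements against tx ...
                tx.Commit()
            End Using
            ' ReadCommitted is the usual lighter-weight choice when the step
            ' doesn't need repeatable reads.
            Using tx As SqlTransaction = con.BeginTransaction(IsolationLevel.ReadCommitted)
                ' ... run the step's statements against tx ...
                tx.Commit()
            End Using
        End Using
    End Sub
End Module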

Integration Querybuilder
This is yet another query designer. For a tool aimed at non-experts (the sort who don’t write their SQL from scratch), this tool is hard to set up. It isn’t smart enough to notice that you’ve already told the Map Designer what connection you are using; instead, you have to create a new connection. Today, for me, [Query/Execute] doesn’t do anything, and the tool refuses to draw the diagram for the sample query I gave it.

Session Proliferation
If a connection inside the map changes, on opening it you’ll be asked to create some new sessions. If you try to change the session back to what it was, it will quietly undo that. This is GUI dishonesty. To get a step to link to the right session, you have to accept the new junk sessions, delete them, reopen the map, and select the pre-existing sessions at that point. If you don’t manage your sessions, the whole idea of sessions breaks down as hundreds of session objects overwhelm the session folder. It will not be obvious which sessions are actually referenced by anything without reading the XML files or a considerable amount of clicking. You can get rid of the extra sessions by going down the list and attempting to delete each one: unused sessions will be deleted, used sessions will raise an error.

Also interesting is session orphaning.  If you rename a session, the steps that referenced that session are orphaned and won’t be fixed until you click on the step, at which point you’ll be prompted to create a session or attach to an existing one.

Connection Proliferation
A connection is a connection string and a table. A source table is different from a destination table. I recommend saving source and destination connections as files. However! These saved connections are templates. The resulting transformation file will not reference the original connections; instead, the connection data is copied into the tf.xml file.

Process Navigator
The process navigator is a series of folders, of which three are interesting: “Process Steps”, “Process Variables” and “SQL Sessions”. Process Steps lists all the steps. Double-clicking a step brings up its property page. However, if you have a large, complicated process, clicking on a step will not help you identify which step it corresponds to on the designer surface. Process Variables are just global variables referenceable in RIFL. SQL Sessions are the active connections. If you are using SQL 2000, don’t forget you can have only one active result set per connection; this means you need two sessions to do a table copy. In SQL 2005, which has MARS, this may be different.
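
The one-active-result-set limitation is easy to demonstrate outside the tool. A hedged ADO.NET sketch (nothing to do with Integration Architect internals), assuming a SQL 2005 server so the MultipleActiveResultSets keyword is honored; the Source and Destination tables are placeholders:

Imports System.Data.SqlClient

Module MarsSketch
    Sub ReadWhileWriting(ByVal server As String, ByVal database As String)
        ' MultipleActiveResultSets=True is only honored by SQL 2005 and later;
        ' on SQL 2000 the same pattern needs two connections (two sessions).
        Dim cs As String = "Server=" & server & ";Database=" & database & _
                           ";Integrated Security=SSPI;MultipleActiveResultSets=True"
        Using con As New SqlConnection(cs)
            con.Open()
            Using readCmd As New SqlCommand("SELECT Id, Name FROM Source", con)
                Using reader As SqlDataReader = readCmd.ExecuteReader()
                    While reader.Read()
                        ' A second command on the same connection; without MARS this
                        ' throws because the reader still holds the connection.
                        Using writeCmd As New SqlCommand( _
                            "INSERT INTO Destination (Id, Name) VALUES (@Id, @Name)", con)
                            writeCmd.Parameters.AddWithValue("@Id", reader("Id"))
                            writeCmd.Parameters.AddWithValue("@Name", reader("Name"))
                            writeCmd.ExecuteNonQuery()
                        End Using
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module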

Queue Sessions, Iterators, Aggregators and Message Objects all appear to be premium-priced features, something to do with EDI.

Evaluating a Software Development (SDLC) Process

Untested Process Guidance
Despite a typical manual’s numerous exhortations to test, test, test, software process mandates are obviously not tested by their writers, or else they would notice some typical defects of SDLC methodologies.

Team Size Approaches World Population
Each time a new activity is imagined, it is imagined with a specific person to do it. After taking into account which roles act as auditors and check the work of other roles, you can calculate the minimum team size, which is often huge.  A throwaway sentence saying this strictly mandated process “can be simplified” isn’t exactly guidance.

Manager Wears all Hats, Fills out all Forms
Most of the roles get assigned to the boss.  Most of the process documents involve deciding what to do, in what order to do it, and so on.  As a result, even if you have a large team of, say, a dozen developers, the vast majority of the paperwork burden falls on the project manager.

What @%@^$ company is this for?
The process messes with existing political hierarchies.  Process guidance assumes that certain committees exist.  Often, because of the “manager wears all hats” problem, the process paperwork gets delegated down the chain of command to someone who doesn’t actually have the power to execute the plans he’s planning.

No auditing
A mandated process without a mandated and funded auditing process is likely to turn into a sham.  A sham process uses the same words and some of the same documents as the process dictates, but doesn’t actually follow the process. 

Metaplanning and Redundancy
A process for creating process documents is a dubious activity.  While software creation is a difficult task that requires more planning and tracking than ditch digging, it’s not clear that applying an SDLC to the process of paperwork generation adds any value.  Ironically, the artifacts most likely to make it into version control repositories are the process documents, not the source code!

Because software process promulgators don’t test their own processes, they don’t realize how redundant it gets.  If there are twelve documents, the templates are likely to ask the manager to summarize the project history twelve times.

Greenfield Bias
Again, because SDLC processes aren’t actually tested before being finalized, their authors don’t notice the massive bias they have towards greenfield development.  Greenfield development is increasingly rare: most IT workers subject to SDLC mandates are maintenance developers.  Process mandates go into hyper-detail about the creation of brand new systems and give short shrift to the more common and more interesting process of upgrading a system while it is running, without downtime.  Worse, organizations tend to have a bias towards living with broken code rather than risking fixing it, and mega-processes exacerbate this organizational bias.  On one hand, it sounds prudent and cautious.  In practice, it is like hiring a plumber to fix the toilet only to hear, “Well, use your neighbor’s; it might be dangerous to fix anything.”

Long Minimum Paths
Federal SDLC processes often specify how many meetings there should be and how long documents should be made available for review by participants.  With a few dozen mentions of minimum review periods, minimum testing periods, and minimum planning periods, you get a process that takes a year or two for a single iteration.

Sham Tailoring
SDLC documents often mention the existence of so-called tailoring.  Tailoring is adapting a process to deal with reality.  Tailoring will happen regardless, whether because the SDLC is being ignored, misunderstood, or tailored without permission.  Formal SDLC documents often use strong language to discourage any tailoring.  This has the effect of keeping the process in the dark ages.  Scrum, Extreme Programming and other innovative process methodologies are popping up all the time.  Using formal, mandated SDLC processes to smother competing methodologies is probably counterproductive.  This is especially ironic when SDLC documents themselves tend to be a mix of all the process fads up to the date the SDLC mandate was promulgated.

No Guidance
There often is no “Supreme Court” for SDLC documents.  Mandated guidance can be self-contradictory, contradict other organizational policies, or just be poorly and incompletely written.  A typical SDLC document will refer to “architectural boards” and “SDLC advisory committees” that are assumed to exist, but don’t actually exist in reality or as a phone number to call.

Apathy
Mandated SDLC processes often ignore the fact that they are mandates, and mandates fail without buy-in from the people who have to follow the process.  Without buy-in from the process participants, the SDLC quickly turns into a process of creating dead documents no one wanted, plans no one will follow, and documents no one wants to read.

Dead Documents
And on the topic of dead documents, the SDLC tends to assume all documents are dead.  Typical SDLC flow charts show one document being created, then the next, then the next, and so on.  In practice, project plans, bug databases and feature wish lists are dynamic lists that are updated daily for the entire life of the project.   That these documents are often portrayed as single deliverables that can be delivered and signed off just goes to show that SDLC promulgators haven’t actually used their own systems, or, if they did, they didn’t bother to watch how documents like “Requirements” were used.  In reality, a requirements document is a point-in-time report from a dynamic bug and feature database.  Any point-in-time document referring to dynamic real-life data will have a usable life of a day or two.


SQLite and ASP.NET Providers

Ok, code exists, but as of writing, it is cobbled together from several sources:

MS Sample Code
The sample code is probably redistributable, although it is hard to say which license governs it.  Here is one MS EULA that mentions sample code: http://msdn.microsoft.com/vstudio/eula.aspx?id=c8bf88e7-841c-43fd-c63d-379943617f36

Role & Membership- open source http://www.eggheadcafe.com/aboutus.aspx

http://www.eggheadcafe.com/articles/20051119.asp

Profiles
http://sqlitenet.googlepages.com/home  — no explicit license

No explicit license, uses parts of above

http://www.codeproject.com/KB/aspnet/SQLiteProviders.aspx

Obviously it would be nice if I could clean up this code and release it under a proper license with better license tracing.

The MED Software Development Life Cycle

The federal government requires its agencies to follow a software process, per the Clinger-Cohen Act (Public Law 104-106; search for Division E).  MED, which stands for Managed Evolutionary Development, is the SDLC process used by various US government agencies and almost no one else.

The methodology appears to have been born around 2002, and shows up in “Managing Software Deliverables” (c) 2004 by John Rittinghouse.

The methodology is a lightly reformed waterfall process.  The NIH version is representative of the MED directives and I’ll direct all my comments toward that one.

Embedded databases for speed and simplicity

(This blog entry is brought to you by TimeSnapper.  It would have been lost to a power outage if not for my screen recorder.)

When I want a relational database, I might want something that can handle a billion transactions per second, never loses data, and can be updated by 1000 users simultaneously without corrupting data.  But if my application just needs a single small table, the overhead of a full relational DB is overkill.

[Likewise for ETL scenarios: writing SQL-based ETL code in a full-blown RDBMS incurs a lot of overhead that just slows data processing down.  On the other hand, dropping back to using text isn't very good either, as you forgo the power of SQL.]

In particular I want an embedded database that:

  • is free
  • works in ASP.NET & ADO.NET
  • supports all the important SQL statements
  • is very fast
  • doesn’t require setting up a secure service (i.e. daemons or Windows services)
  • supports bi-directional databinding in .NET
  • doesn’t require an installer, COM component registration, etc., i.e. nothing that would be a barrier on a hosted account
  • runs in medium trust
  • is secure against executable code (i.e. shouldn’t be able to call OS shell functions the way one can in MS-SQL, MS-Access and the like)
  • can be zipped up with the source code and sent in an email

MS-SQL 2005, MySQL, etc

These require an install, administrator rights to set up, and some non-trivial know-how to configure the database and users.  Nope.  Too much work.

Text and Excel

Database drivers for text files tend to be pretty primitive and low-performance, and often don’t support updating.  Excel has limitations on the number of rows you can deal with.

MS-SQL SQL Compact/Mobile Etc

Doesn’t allow for hosting inside of an ASP.NET worker process. End of story.

XML

Not supported for bidirectional databinding with the default ASP.NET controls.  I’m pretty sure this works in medium trust.

MS-Access/Fox Pro

If MDAC is available, you can probably use the Microsoft.Jet.OLEDB.4.0 driver.
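
A minimal sketch of what that looks like from ADO.NET, assuming an .mdb file; the file path and table name are placeholders:

Imports System.Data.OleDb

Module JetSketch
    ' Sketch: querying an MS-Access .mdb file through the Jet 4.0 OLE DB provider.
    ' Requires MDAC/Jet on the machine, which is why it often fails on hosted accounts.
    Sub DumpRows(ByVal mdbPath As String)
        Dim cs As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & mdbPath
        Using con As New OleDbConnection(cs)
            con.Open()
            Using cmd As New OleDbCommand("SELECT * FROM SomeTable", con)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    While reader.Read()
                        Console.WriteLine(reader(0))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module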

Berkeley DB

Berkeley DB was bought by Oracle.  Oracle’s distribution is slanted towards supporting the Java world, but a .NET driver does exist.

SQLite

See my blog entry on SQLite.  Of all my options, this is the one I liked the most.  It runs well in ASP.NET, can support large numbers of users, and has Visual Studio support (better in the full version than in the Express editions).

Notes: SQLite for ASP.NET

Why. Safer than MS-Access. Doesn’t require your hosting account to have any JET drivers for MS-Access. Simpler than SQL Express user instances. Doesn’t require understanding or doing the MS-SQL administrative steps. Doesn’t even need an ODBC DSN to be set up.

What. The SQLite ADO.NET Provider.

How.
How to configure the Data Provider

From the documentation, add this to web.config:

<configuration>
  <system.data>
    <DbProviderFactories>
      <remove invariant="System.Data.SQLite"/>
      <add name="SQLite Data Provider"
           invariant="System.Data.SQLite"
           description=".Net Framework Data Provider for SQLite"
           type="System.Data.SQLite.SQLiteFactory, System.Data.SQLite" />
    </DbProviderFactories>
  </system.data>
</configuration>
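
Registering the factory is what lets provider-agnostic code ask for SQLite by its invariant name instead of newing up SQLiteConnection directly. A small, hedged sketch of what that enables; the invariant string must match the one registered above:

Imports System.Data.Common

Module FactorySketch
    ' Sketch: resolve the provider through the factory registered in web.config,
    ' so the data access code doesn't reference System.Data.SQLite types directly.
    Function OpenSQLiteConnection(ByVal connectionString As String) As DbConnection
        Dim factory As DbProviderFactory = DbProviderFactories.GetFactory("System.Data.SQLite")
        Dim con As DbConnection = factory.CreateConnection()
        con.ConnectionString = connectionString
        con.Open()
        Return con
    End Function
End Module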

Add System.Data.SQLite.DLL to \bin\

Add your SQLite data file to \App_Data\.   The framework will not serve files in App_Data, so your data is safe from a user downloading it and looking at it directly.

You do not need to add any other files to the target server! Even though SQLite is a C/C++ based data access API, the System.Data.SQLite.DLL file has everything it needs. It does make interop calls, so the website will need to be trusted enough to do interop. It seems the C DLL is compiled into the .NET DLL, something I didn’t know was possible.

To connect, it helps to have the full file path:

<connectionStrings>
  <add name="ConnectionString"
       connectionString="Data Source=d:\hosting\mywebbsite\App_Data\data.s3db;New=False;Version=3"/>
</connectionStrings>
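
Hard-coding the hosting path is brittle if the site ever moves. One hedged alternative is to build the Data Source at runtime from the application root; a sketch assuming the file sits in App_Data as above:

Imports System.Web

Module ConnectionStringSketch
    ' Sketch: MapPath resolves the virtual App_Data path to the physical one at
    ' runtime, so the connection string survives a move to a different host path.
    Function BuildConnectionString() As String
        Dim dbPath As String = HttpContext.Current.Server.MapPath("~/App_Data/data.s3db")
        Return "Data Source=" & dbPath & ";New=False;Version=3"
    End Function
End Module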

The code is straightforward…

Imports System.Data.SQLite

Partial Class _Default
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        ' Open a connection using the connection string from web.config.
        Dim con As New SQLiteConnection()
        con.ConnectionString = ConfigurationManager.ConnectionStrings("ConnectionString").ConnectionString
        Dim com As New SQLiteCommand()
        con.Open()
        com.Connection = con
        ' Wrap the read in a transaction (not strictly required for a SELECT).
        Dim t As SQLiteTransaction = con.BeginTransaction()
        com.CommandText = "SELECT * FROM Test"
        ' Bind the reader to a GridView on the page.
        Me.gvSampleData.DataSource = com.ExecuteReader()
        gvSampleData.DataBind()
        t.Commit()
        con.Close()
    End Sub
End Class
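
For writes, the same objects work. A short hedged addition showing a parameterized INSERT, which avoids concatenating values into the SQL; it assumes the Test table from the SELECT example has a Name column:

Imports System.Data.Common
Imports System.Data.SQLite

Module InsertSketch
    Sub AddRow(ByVal connectionString As String, ByVal name As String)
        Using con As New SQLiteConnection(connectionString)
            con.Open()
            Using com As New SQLiteCommand("INSERT INTO Test (Name) VALUES (@Name)", con)
                ' Parameterize instead of building the SQL string by hand.
                Dim p As DbParameter = com.CreateParameter()
                p.ParameterName = "@Name"
                p.Value = name
                com.Parameters.Add(p)
                com.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Module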

SQLite Membership Provider. One exists; I haven’t used it yet.  It has support for Membership and Roles.  I’m still looking for something with Profiles support.  The other ASP.NET 2.0 database features would be SQL session state and WebParts, neither of which is anything you’d likely do with a small database anyhow.

Review: Talend Open Studio

What. Talend Open Studio (TOS) is an ETL package. It competes with Data Transformation Services (DTS), SQL Server Integration Services (SSIS) and Pervasive Integration Architect (PIA), née Data Junction. It also competes with some products I’m not familiar with, such as Informatica, Oracle Data Integrator and IBM DataStage, née Ardent.

This is a first-impressions review, based on my “hello world” app, a webcast and a WebEx demo done by the friendly folk at Talend. Kudos to Vincent Pineau et al.

What’s to like. Talend as an organization has their act together. They are not anti-developer; I didn’t have to talk to ten people to get a trial copy, not like, um, not to name any names, but ok, PERVASIVE. In fact they offer Talend as FLOSS: it is GPL open source, so you can just download it and start copying tables in pubs or what have you right away. I hope more companies can make the open source model work, because it sure makes it easier to start using a new technology. (And with enough time and gumption I hope to review the other open source ETL packages, since I can actually get my hands on them.)
Cool Features.
Appears to be buffer driven. The evidence is that you can have a transformation feed into a transformation feed into a transformation. For example, you can have a pipeline that filters, applies expressions to columns, aggregates and does lookups, all without having to write to a table between steps. In DTS, you’d have to write to a table or file between each of these. SSIS also uses a buffer architecture, which gives it remarkable performance. I’m guessing Talend would have similar performance, but I haven’t stress tested it yet.

Source control features are built in. You can do side-by-side visual diffs, which is very cool. To do the same in a comparable environment, you’d have to open two IDEs on two snapshots of an ETL package. If you checked the generated Java code into regular source control, you probably would be able to do a readable text diff, unlike SSIS, whose XML format is not entirely human readable. PIA has semi-XML source code and is sort of readable: the RIFL code is stored as plain text inside CDATA blocks, but the rest of the objects are stored as XML, which isn’t as readable as the corresponding generated Java.

Real programming language(s). You can write all your functions and expressions in Java (or, heaven forbid, Perl). This is a leap ahead of SSIS and PIA. SSIS uses its own expression language (a hybrid of C# and VB.NET) and PIA uses RIFL, which is VBScript with glasses and a fake mustache. Also, the IDE is Eclipse, a real IDE, which you’ve probably already encountered if you’ve done any JavaScript development (like Aptana) or Java development with Eclipse itself.

Straightforward configuration. By this I mean it should be very simple to move code from dev to test to production with minimal fuss and minimal likelihood of accidents. [Pending complete review.] I know that SSIS made its configuration too complex, and DTS managed to permanently root itself to the place it was written. PIA uses “Macro” files, but the scheme is overly complex and leaves me worried that I’ll run code against the wrong server by accident.

Same behavior at design time and run time. The code is generated; there isn’t a separate run-time engine.

JavaDoc-like documentation generation. Except it is much better looking. The generated documentation also helps with the “clickety-click” problem of all ETL IDEs, i.e. you have to do a lot of clicking to drill down into each property and find out what the *#$@$# is going on.

[I'll have to finish this later]

What’s confusing: the business model.

Licensing. Jasper and Talend seem to both be marketing service contracts for the same code base.

If your organization won’t let you use software without a support contract behind it, there are Silver, Gold and Platinum support contracts. It’s not clear whether you need to buy a commercial version to be able to buy a support contract. They also sell training courses.

If you want features not in the GPL version, there are three commercial versions: Team, Professional and Enterprise. If I understand correctly, Team, Pro & Enterprise include more source control, documentation generation, deployment, and a type of multi-server load balancing feature.

Licensing/Support is per developer (so no need to count CALs, CPU’s, Servers, etc.)


SQL 2005, Windows 2003, Licensing and Technological Choices

CPU. Licensing gives you a strong incentive to stick to single-socket CPU designs. Multiple-socket motherboards carry hefty penalties for SQL 2005, as you need a license per CPU.
Memory. At the 2GB cutoff you have to move from Windows Server Web Edition to Standard. After the 4GB cutoff you have to move to Windows Server Enterprise. Similar issue with SQL 2005: after 4GB, you need to switch to Enterprise Edition. On the other hand, the free SQL Express edition merely doesn’t use the extra memory.
CAL Counting. Some CAL counting scenarios are easy. Others are mind-bogglingly complicated. So far, most people tend to search for licenses that don’t have legal or technological CAL limits, because it is too hard to count users. Terminal Services and RDP servers are one area where CAL counting is enforced by both license and technology, so CAL counting is important there. Everywhere else, there isn’t a technological CAL constraint. So when faced with counting users in a complex environment, with some users using file shares, some using RDP occasionally, some printer users, some intranet IIS users, some internet IIS users, some people with local accounts, some with domain accounts, some with neither…I suspect most administrators throw up their hands in frustration and stop trying to count unless there is a technological constraint.
Authentication. Your choice of authentication method can trigger CAL counting. This old TechNet page implies that if you use integrated authentication and local users, you will trigger CAL counting. This means it is cheaper to write applications that avoid integrated authentication! This Oct 2000 article agrees:

“One more item of interest: Microsoft requires that you purchase a Client Access License (CAL) to enable a client to log on to a Microsoft server. Microsoft waives this requirement for Anonymous access to IIS servers, but if you authenticate a user against an account in User Manager, you need a CAL. If you plan to have many users authenticate to an IIS server, you can end up making a significant investment in CALs. If you use a non-Microsoft means of authentication, either custom or third party, you bypass this requirement but at the expense of not integrating with NTFS permissions.”

I couldn’t find anything that specifically mentioned domain accounts, but I suppose, by analogy, that if you use integrated authentication for your intranet application, that will trigger CAL counting.