Friday, 3 March 2017

Writing maintainable SQL functions in FileMaker

At work recently, I had to troubleshoot why a particular set of ExecuteSQL() calls was not returning the expected results.  This code was not written by me, so I had to analyse what the developer had done.  I don't know for sure, but I suspect the developer used one of the neat SQL generators for FileMaker, such as SQLExplorer.  Anyway, this is the code.


Substitute(ExecuteSQL ( 
"SELECT \"z_Record ID_n\", \"corroType\", \"Date\", \"Mailing Title\", \"Title\", \"Signatory\", \"Body Result\"  FROM VLCorrespondence WHERE \"zk_CommunityID_t\" = ? OR \"zk_CommunityID Child_t\" = ? OR \"zk_Family Code_t\" = ? ORDER BY \"Date\" DESC" ;  "|"; "|*|"; Student Current::Student ID; Student Current::Student ID; Student Current::Family Code ); ["¶"; "\n"]; ["|*|"; "¶"])

& "¶" &
Substitute( ExecuteSQL ( "SELECT \"z_RecordID_cn\", \"corroType\", \"Date Sent\", \"Surname Preferred Name\", \"eMail Subject\", \"eMailFromName\", \"eMail Body\"  FROM VLEmail WHERE \"zk_eMailToID_f\" = ? ORDER BY \"Date Sent\" DESC" ; "|"; "|*|"; Student Current::Student ID); ["¶"; "\n"]; ["|*|"; "¶"])

& "¶" &
Substitute(ExecuteSQL ( 
"SELECT \"zk_recordID_p\", \"corroType\", \"Date of Note\", \"Surname Preferred Name\", \"Note\", \"Category\", \"zi_CreationUser\"  FROM VLNotesAC WHERE \"zk_CommunityID_f\" = ? OR zk_FamilyCode_f = ? ORDER BY \"Date of Note\" DESC" ;  "|"; "|*|"; Student Current::Student ID; Student Current::Family Code); ["¶"; "\n"]; ["|*|"; "¶"])

& "¶" &
Substitute(ExecuteSQL ( 
"SELECT \"Ω_id\", \"corroType\", \"created_date\", \"mobile_numb\", \"message_string\" , \"Reply Text\", \"message_string\" FROM VLSMS WHERE \"Student ID\" = ? ORDER BY \"created_date\" DESC" ;  "|"; "|*|"; Student Current::Student ID); ["¶"; "\n"]; ["|*|"; "¶"])

Tersely packaged, and at first glance, not all that daunting.  Tearing it apart, though, is another matter: the nested bracketing and many semicolons make it pretty easy to get lost.  The idea here was that a series of SQL queries would be run across a set of tables, one each for the Correspondence database (letters, basically), the Email database, the Notes database, and the SMS Messages database, loading them all into a global variable, and displaying the aggregated data in a virtual table.

While that function call above certainly did the job, it was not as "comprehensible" as I would like. I've been reviewing/maintaining/updating a lot of old FileMaker code lately (some of which goes back to FileMaker 4!) and I have cursed under my breath whenever I find it difficult to work out what the code was supposed to be doing. So writing easily comprehensible code has become one of my passions lately.

The first thing to notice is that we are using the same column and row separators for each separate SQL query, so how about using a Let() function and declaring them once at the start?


fieldSeparator = "|" ;

rowSeparator = "|*|" ;

Next, let's break out the nesting a bit, and make the load easier on the next poor schmuck who has to try to comprehend the code sometime in the future. First, comment on the area being queried; then put the query itself into a variable; then perform the query by passing the variable to the ExecuteSQL() function; and lastly, parse the results.  Oh yes, one other thing, and I cannot stress this enough: use variable names that describe their usage!

/* correspondence */

correspondenceQuery = "SELECT \"z_Record ID_n\", \"corroType\", \"Date\", \"Mailing Title\", \"Title\", \"Signatory\", \"Body Result\"  FROM VLCorrespondence WHERE \"zk_CommunityID_t\" = ? OR \"zk_CommunityID Child_t\" = ? OR \"zk_Family Code_t\" = ? ORDER BY \"Date\" DESC" ; 

correspondenceQueryResults = ExecuteSQL ( correspondenceQuery ; fieldSeparator ; rowSeparator ; Student Current::Student ID; Student Current::Student ID; Student Current::Family Code ) ;

correspondenceData = Substitute ( correspondenceQueryResults ; ["¶"; "\n"]; ["|*|"; "¶"] ) ;
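The two Substitute pairs do the row parsing: any return characters embedded inside field values become the literal text "\n" (FileMaker does not treat backslash-n as an escape), and then the |*| row separator becomes the return that delimits records. A minimal Python sketch of the same two-step remapping, purely as an illustration (it assumes "|" and "|*|" never occur inside the data itself):

```python
def parse_sql_result(raw):
    # Step 1: returns embedded in field values become a visible "\n"
    # (literal backslash-n), so they can't be confused with row breaks.
    # Step 2: the |*| row separator becomes one return per record.
    return raw.replace("¶", "\\n").replace("|*|", "¶")

parse_sql_result("1|Letter|Dear¶Sir|*|2|Email|Hi")
```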

Now each stage of the data gathering process is pretty clear.  Do all the other queries in a similar manner, and then put it all together:

/* now put them together */
assembledResults = correspondenceData & "¶" & emailData & "¶" & notesData & "¶" & smsData ;

/* now clean it up */
finalOutput = cf_cleanAndPlainText ( cf_StripEmptyValues ( assembledResults ) )




cf_cleanAndPlainText() and cf_StripEmptyValues() are custom functions created in the FileMaker database for removing unwanted leading and trailing white space, and empty rows.  Yes, I know about the Trim() function, but that's another essay.

Now, the amount of text has increased, but the individual steps followed to get the aggregate output are much easier to follow.  If you write something like this and come back to it five years later, you will be thanking your favourite deity that you went to the effort of laying it out this way. It really doesn't take much longer to write, either!


fieldSeparator = "|" ;

rowSeparator = "|*|" ;

/* correspondence */

correspondenceQuery = "SELECT \"z_Record ID_n\", \"corroType\", \"Date\", \"Mailing Title\", \"Title\", \"Signatory\", \"Body Result\"  FROM VLCorrespondence WHERE \"zk_CommunityID_t\" = ? OR \"zk_CommunityID Child_t\" = ? OR \"zk_Family Code_t\" = ? ORDER BY \"Date\" DESC" ; 

correspondenceQueryResults = ExecuteSQL ( correspondenceQuery ; fieldSeparator ; rowSeparator ; Student Current::Student ID; Student Current::Student ID; Student Current::Family Code ) ;

correspondenceData = Substitute ( correspondenceQueryResults ; ["¶"; "\n"]; ["|*|"; "¶"] ) ;

/* email */

emailQuery = "SELECT \"z_RecordID_cn\", \"corroType\", \"Date Sent\", \"Surname Preferred Name\", \"eMail Subject\", \"eMailFromName\", \"eMail Body\"  FROM VLEmail WHERE \"zk_eMailToID_f\" = ? ORDER BY \"Date Sent\" DESC" ;

emailQueryResults = ExecuteSQL ( emailQuery ; fieldSeparator ; rowSeparator ; Student Current::Student ID ) ;

emailData = Substitute ( emailQueryResults ; ["¶"; "\n"]; ["|*|"; "¶"] ) ;

/* notes */

notesQuery = "SELECT \"zk_recordID_p\", \"corroType\", \"Date of Note\", \"Surname Preferred Name\", \"Note\", \"Category\", \"zi_CreationUser\"  FROM VLNotesAC WHERE \"zk_CommunityID_f\" = ? OR zk_FamilyCode_f = ? ORDER BY \"Date of Note\" DESC" ;

notesQueryResults = ExecuteSQL ( notesQuery ; fieldSeparator ; rowSeparator  ; Student Current::Student ID; Student Current::Family Code ) ;

notesData = Substitute ( notesQueryResults ; ["¶"; "\n"]; ["|*|"; "¶"] ) ;

/* SMS */

smsQuery = "SELECT \"Ω_id\", \"corroType\", \"created_date\", \"mobile_numb\", \"message_string\" , \"Reply Text\", \"message_string\" FROM VLSMS WHERE \"Student ID\" = ? ORDER BY \"created_date\" DESC" ;

smsQueryResults = ExecuteSQL ( smsQuery ; fieldSeparator ; rowSeparator ; Student Current::Student ID ) ;

smsData = Substitute ( smsQueryResults ; ["¶"; "\n"]; ["|*|"; "¶"] ) ;

/* now put them together */
assembledResults = correspondenceData & "¶" & emailData & "¶" & notesData & "¶" & smsData ;

/* now clean it up */
finalOutput = cf_cleanAndPlainText ( cf_StripEmptyValues ( assembledResults ) )




Saturday, 11 April 2015

A journey through ranking algorithms.

In Education, and particularly in School Reports, one of the things we have to do pretty often is to calculate a “ranking” of Students.  Sometimes it’s in an Assessment Task, sometimes it’s in Subject Exams, and I have no doubt that there are lots of other situations in and outside of School Admin Systems where this functionality is used.

This is a discussion of the various methodologies I went through over the years (and FileMaker versions of course!) in attempting to improve performance of this functionality. To clarify terminology I am going to use:

A “Rank” is an integer given to a student that provides a place in the student grouping, according to a “Mark”.  If two students have the same “Mark”, they are given the same rank, and the next rank number is not used.  For instance, if four students get marks of 99, 98, 98, 97, the rankings would be 1, 2, 2, 4.
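This "ties share a rank, the next number is skipped" scheme is easy to pin down in any language; a quick Python sketch, just to make the definition concrete (not part of the FileMaker solution):

```python
def ranks(marks):
    # Rank = 1 + the number of strictly higher marks; equal marks share
    # a rank and the following number(s) are skipped.
    ordered = sorted(marks, reverse=True)
    return [ordered.index(m) + 1 for m in marks]

ranks([99, 98, 98, 97])  # [1, 2, 2, 4]
```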

A “Course” is a subject that is being undertaken by the students, e.g. Year 10 English.  A “course code” is a unique identifying code, e.g. “10ENG”, that can be used to relate the records to each other.


When I first started at Denbigh in 2002, we were using FileMaker 5 (or 5.5?).  Our “ranking algorithm” was:
1) sort the students by order of “mark”, then
2) post an integer in a “global rank” field, (starting with 1) and post the highest mark in “global mark” field, and then
3) loop through the records. If the mark was the same as the one in the global mark field, assign the global rank value to the student’s “Rank” field; otherwise insert the record number into both the global rank field and the student’s rank field, and the mark into the “Global Mark” field.
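For illustration, the same loop expressed in Python (the names are stand-ins for the global fields described above):

```python
def rank_students(students):
    # students: list of (name, mark) tuples.
    # Mirrors the FileMaker 5 script: sort by mark descending, then walk
    # the records keeping a "global rank" and "global mark"; reuse the
    # rank when the mark matches, otherwise take the record number.
    ordered = sorted(students, key=lambda s: s[1], reverse=True)
    global_rank, global_mark, ranked = 0, None, []
    for record_number, (name, mark) in enumerate(ordered, start=1):
        if mark != global_mark:
            global_rank, global_mark = record_number, mark
        ranked.append((name, global_rank))
    return ranked
```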

That algorithm has the advantage of being pretty linear in its processing time versus the population: if you doubled the number of records, it took about twice as long.  The disadvantages were that it did take some time, a user had to push a button to make it happen, and if any Marks were edited, the whole thing had to be run again.

Over the years that followed, I went through a couple of different algorithms, principally concerned with making it faster, and eventually making it into an unstored calculation instead.  Making it an unstored calculation then had repercussions in complexity, affecting process time severely in larger populations.  So I’m documenting my journey, arriving at the present method of an unstored calculation that evaluates reasonably fast.

The first improvement I tried required a self relationship.  On the records that carried the mark, I created a self relationship joined by the identifying code (e.g. “course code”) and the “Mark” field.  This means that every record with a mark could see at least itself through the self join, along with any other records that had the same mark.  I also added another number field, “rankingSerialNumber”.  The algorithm was:
1) Find the records to be ranked.
2) Sort them by the “Mark” field, descending.
3) run a replace command into the “rankingSerialNumber” field, starting with 1, incrementing by 1.
4) run a replace command into the “Rank” field, = Min ( selfjoin::rankingSerialNumber )
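The same idea in a Python sketch, where a dictionary plays the part of Min ( selfjoin::rankingSerialNumber ); names are illustrative:

```python
def rank_by_serial(marks):
    # Serial-number the marks in sorted order, then each record's rank is
    # the lowest serial number among records sharing that mark.
    first_serial = {}
    for serial, mark in enumerate(sorted(marks, reverse=True), start=1):
        first_serial.setdefault(mark, serial)  # keeps the minimum serial
    return [first_serial[m] for m in marks]

rank_by_serial([99, 98, 98, 97])  # [1, 2, 2, 4]
```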

This worked pretty well: instead of a loop through a found set of records, updating global fields and the record on each iteration, the Rank was derived by two Replace commands.  Performance was improved considerably, especially over larger data sets.  It still required scripted operation, though, where the user clicked the button.

The ideal would be an unstored calculation.  Along came FileMaker 7, with its introduction of “Recursive Custom Functions”, providing a way.  If I retain the self-join relationship, this time joined only on the “Course Code”, I can sort that relationship by “Mark” in descending order. Then I create a recursive custom function that, given a target relationship::field and a “Mark” value, recursively walks down the list of related records. The number of recursive calls needed to reach that “Mark” value is the “Rank”.  In implementation, I made two custom functions, for convenience.

First the recursive one:

GetRankRecursor ( sortedRelationship::searchField ; searchvalue ; n )

If ( 
   GetNthRecord ( sortedRelationship::searchField ; n ) = searchvalue ;
   n ; 
   GetRankRecursor ( sortedRelationship::searchField ; searchvalue ; n + 1 ) 
)

Now the convenience one:

GetRank ( searchField ; searchvalue )

GetRankRecursor ( searchField ; searchvalue ; 1 )

Because the value I am searching for is guaranteed to exist, I don’t have to trap for missing values, and since the relationship is sorted, I’m guaranteed to be accessing them in order.

So to tie it all together the “Rank” field is now an unstored calculation =

GetRank ( CourseRank::Mark ; Mark )



I thought so at the time, anyway.  A customer trying to rank all their Year 12 Maths students brought home the fact that its computational complexity is approximately order n squared.  Twice the population did not take twice the time; it took quite a lot more.  I had tested on a set of data with 50 mark values.  Throw 500 marks at it, and you could go to lunch while it computed.

To resolve this, it’s necessary to look at two things:

1) the relationship had to sort its related records each and every time the recursive custom function called itself.  Fortunately FileMaker seems to do some internal caching, which alleviates this quite a bit, but it’s still there.

2) If I have a test set of 50 marks, then the average number of recursive function calls per student mark was half that, i.e. an average number of recursive calls to rank 50 students could be as high as 25.  Increase the number of students to 500 and the AVERAGE number of recursive custom function calls *per student* could go as high as 250!

No wonder it took a lot longer when a larger data set walked up to it.

Fortunately, FileMaker 8 provided us with variables.

To improve this:
1) The first thing that occurred to me, is that it shouldn’t have to find related records and sort them for each and every student. If I find them once, and then store the ordered set of marks in a variable, then that repetitive part of the algorithm turns into a once-only operation.  If I store them as return delimited values in a variable, I can implement the second improvement.

2) The second improvement that occurred to me was that if I wanted to find my particular value in a set of return-delimited values in a variable, I didn’t necessarily have to recurse through it using a custom function.  I could use FileMaker’s native “Position()” function to tell me where my mark was in that variable’s data, then use PatternCount() to count the number of “returns” in the text up to that point, and I have the rank number.

The sneaky little gotcha with this idea is that the very first value must be a return character. This is because we can't just search for "2"; if we did, it would match at the position of "22" or "2something".  So we have to search for "<return>searchvalue<return>". You also want a trailing return character at the end, for the same reason.

So now, the calculated ranking field becomes slightly more complicated.  It has to look for its value in a value list, and create that value list if it doesn’t exist yet.  The value list has to be in a global variable so that it is visible no matter what script (if any) is running.  Then I find the position of the Mark in that value list, and count how many return characters exist "to the left" of that position.

Let ( [
  $$markValueList = If ( IsEmpty ( $$markValueList ) ; "¶" & List ( CourseRank::Mark ) & "¶" ; $$markValueList ) ;
  offset = Position ( $$markValueList ; "¶" & Mark & "¶" ; 1 ; 1 ) ;
  rankValue = PatternCount ( Left ( $$markValueList ; offset ) ; "¶" )
  /* the leading return of the match is included in the count, so the highest mark gets rank 1 */
] ;
  rankValue
)
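A Python rendering of the same Position/PatternCount idea, purely as an illustration (Python's find is 0-based, so the adjustment for the highest mark shows up as a +1 here):

```python
def rank_from_marks(mark, sorted_marks):
    # Build the value list once, with leading and trailing returns so a
    # search for "2" can never match inside "22".
    value_list = "¶" + "¶".join(str(m) for m in sorted_marks) + "¶"
    offset = value_list.find("¶" + str(mark) + "¶")  # 0-based position
    # Count the returns to the left of the match; the highest mark has
    # none, so add 1.
    return value_list[:offset].count("¶") + 1

rank_from_marks(98, [99, 98, 98, 97])  # 2
```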

So now I have a calculated field for a student’s rank, and the only significant performance hit is in the sorted relationship “CourseRank” that gives me the sorted list of values.  It only has to do that once though.

Now because this is an unstored calc, it will only be evaluated when it’s needed for display, computation, or print purposes.  The catch here is that some teachers like to see the student ranks as they are entering the marks.  So not only do I refrain from calculating a rank until a Mark value exists, but I must somehow make it recalculate whenever a “Mark” is inserted or edited.  The simplest method would be an auto enter calc on the "Mark" field that causes the $$markValueList to be cleared each time a mark is inserted.  As soon as the global variable is emptied, the very next Ranking calculation will cause it to be regenerated automatically with the new data in it.

One more possible improvement: instead of having the sorted relationship “CourseRank”, an unsorted relationship could be used to gather up the list of “Mark” values.  Not having to sort the records first means the values can be gathered much faster, particularly on larger data sets.  This matters little where the table is very narrow, but on a wide table it becomes much more important, because FileMaker cannot retrieve the value of a field without first retrieving the whole record; the wider the table, the more data it has to pull from disk before sorting the Mark values.  Of course I still need to sort those values in descending order, but I can use custom functions for that.  There are plenty of published value-list sorting custom functions; Google is your friend.  A caveat with loading an unsorted set of values into a global and then sorting them is that they will be sorted as text, not numbers, so you need to watch for things like “3 being higher than 22”.

I am somewhat ambivalent about that last improvement: FileMaker sorting records by embedding the sort in the relationship might be just as quick as using custom functions to sort text, depending on the schema.  It's worth testing.  Also, it can get complex, so remember that sometime in the future, someone who has never seen your code before may have to figure out what you were doing.  If you go down that path, make your code as easy to comprehend as possible.

One last thing is the global variable name.  You can use a “hardwired” global variable, e.g. $$courseMarkList, but if so, you must take care to clear it after use, so that if you go on to rank another set of data, your first set of values won’t be there anymore.  I’m not a fan of this approach, it would be very easy to accidentally use the wrong data set.  My preference is to use individual global variables, one for each course.  So if I am ranking the “10ENG” course marks, I would be using a global variable that is something like “$$courseMarkList10ENG”. This is really easy:

Custom Function “Create Global Variable ( variableName ; variableValue )” =

Evaluate ( "Let ( $$" & variableName & " = \"" & variableValue & "\" ; True )" )

Custom Function “Get Variable Value ( variableName )” =

Evaluate ( "$$" & variableName )
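The same dynamic-name trick sketched in Python, with a dict standing in for FileMaker's $$ global namespace (a sketch only; the names are illustrative):

```python
_globals = {}  # stand-in for FileMaker's global variable namespace

def create_global_variable(variable_name, variable_value):
    # Analogue of Evaluate ( "Let ( $$" & name & " = \"" & value & "\" ; True )" )
    _globals["$$" + variable_name] = variable_value

def get_variable_value(variable_name):
    return _globals.get("$$" + variable_name, "")

create_global_variable("courseMarkList10ENG", "¶99¶98¶98¶97¶")
get_variable_value("courseMarkList10ENG")  # "¶99¶98¶98¶97¶"
```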

So now we have a ranking algorithm that gives live ranking calculations across arbitrary data sets pretty quickly and easily.

Thursday, 4 September 2014

Parameter mechanisms in FileMaker coding.

FileMaker only lets the developer pass a single parameter to a script (what we called "procedures" back in Pascal in CompSci 101), so developers wind up using some kind of structuring mechanism to pack data into the parameter, then, inside the script, parse the structure to get the values they want.

The first idea, and in most ways the simplest, is to just delimit each “value” with a carriage return or other delimiter character or characters.  This idea, while it works, is also somewhat limiting in that the developer must always construct their parameter very carefully with all the intended values in the right order.

Going through various trials and errors, I ended up going for “Property Lists”, as implemented by Shawn Flisakowski, here.  You should really download and play with the "Property Lists" to get a handle on how this works.  Using PropertyLists we wind up with parameters that look like this:

Surname=Nurk
Firstname=Fred
This has some nice advantages.  You can construct your parameter with any number of values, and in any order. If you need to extend an existing script to add some extra functionality to it, you can just add more properties, and it won’t disturb your existing code. Also, if you are watching in the debugger, you can see what values have been assigned to which property names.

It also has a couple of relatively minor disadvantages : 
1) you should substitute out any “return”, “equals”, and “semicolon” characters in the values you are putting in, so as not to confuse the mechanism. Fix: modify the “Add Property” and “Get Property” custom functions to handle this. We created two functions, EncodeForProperty(thetext) and DecodeFromProperty(thetext).
2) All values you pass in will be converted to text.  If you pass in a date, you get a silent date-to-text coercion, which has implications because it does not obey the region settings of the computer you are on, but the region settings of the computer the file was originally created on.  Fix: do a coercion of your own, GetAsText ( GetAsNumber ( dateValue ) ) when you “add property”, and then GetAsDate ( GetAsNumber ( dateValue ) ) to get it back inside your script.

This worked very well. Later on though, I had to write a routine that tested for the existence of about 20 possible properties.  The script that was executing was never going to have more than 4 or 5 of them passed in as a parameter, but it could be any combination of those 20 possible values.  Once I had written the code 

If [ not IsEmpty ( GetProperty ( propertyList ; propertyName ) ) ]
  Set Field [ thingummy ; GetProperty ( propertyList ; propertyName ) ]
End If

20 times, I realised I was making life hard for myself.  So I took another look, and if you take the statement 

Surname=Nurk

and put a dollar sign in front of it

$Surname=Nurk

then put a double quote in after the equals sign and at the end

$Surname="Nurk"

then wrap it in a Let statement

Let($Surname="Nurk";Get(LastError))

then wrap the whole lot in an “Evaluate()” function call, escaping the quotes

Evaluate ( "Let($Surname=\"Nurk\";Get(LastError))" )

our parameter value just turned into a script local variable.  Sounds like a lot of work doesn’t it?  However you can do it all with custom functions.

First, the single value case:
Function: ParameterToLocalVariable ( property ) =
If ( not IsEmpty ( property ) ;
  Let ( [
    propWithLeadingDollar = If ( Left ( property ; 1 ) = "$" ; property ; "$" & property ) ;
    propWithQuotes = Substitute ( propWithLeadingDollar ; "=" ; "=\"" ) & "\"" ;
    propWrappedInLetStatement = "Let(" & propWithQuotes & ";\"\")" ;
    propInstantiated = Evaluate ( propWrappedInLetStatement )
  ] ;
    Get ( LastError )
  ) // let
; "0" ) // outside if

Second, the multiple value case:
Function: PropertyListToLocalVariables ( propertyList ) =
If ( not IsEmpty ( propertyList ) ;
  Let ( countProperties = ValueCount ( propertyList ) ;
    ParameterToLocalVariable ( GetValue ( propertyList ; 1 ) ) &
    If ( countProperties > 1 ;
      PropertyListToLocalVariables ( RightValues ( propertyList ; countProperties - 1 ) )
    ) // inside if
  ) // let
) // outside if
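As a cross-check, the whole conversion is easy to express in Python, with a dict standing in for the script's local variables (a sketch of the idea, not the FileMaker mechanism itself):

```python
def property_list_to_local_variables(property_list):
    # Each non-empty line has the shape "name=value"; instantiate each
    # one as a "$name" entry, the analogue of the Evaluate/Let trick.
    variables = {}
    for line in property_list.splitlines():
        if line.strip():
            name, _, value = line.partition("=")
            variables["$" + name.strip().lstrip("$")] = value
    return variables

property_list_to_local_variables("Surname=Nurk\nFirstname=Fred")
# {"$Surname": "Nurk", "$Firstname": "Fred"}
```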

Finally, we want an overall function to call at the start of the script that is going to take care of all of this for us, and return a boolean for whether or not it succeeded:

Function: ParameterConvertedToVariablesOK () =
Let ( t = PropertyListToLocalVariables ( Get ( ScriptParameter ) ) ;
  IsEmpty ( Get ( ScriptParameter ) )
  or ( ( GetAsNumber ( t ) = 0 ) and ( PatternCount ( t ; "?" ) = 0 ) )
)

If there is no script parameter, return true.  If the conversion to script local variables fails, one of the ParameterToLocalVariable(property) calls will return a “?” in the result, causing the function to return false, otherwise all the script local variables get created from the parameter in a single function call.  Your script code looks like this:

Script “Make Person Record ( properties: surname, firstname )” = 
If [ ParameterConvertedToVariablesOK ]
  New Record/Request
  Set Field [ People::Surname ; $surname ]
  Set Field [ People::First Name ; $firstname ]
  Commit Records/Requests [ No dialog ]
End If

Any FileMaker Developer reading this will realise I have left out the EncodeForProperty(thetext) and DecodeFromProperty(thetext) functions that handle the “return”, “equals” and "semi colon" characters in the value, but I leave that as an exercise to the reader. (As Bugs Bunny would say, “ain’t I a stinker?” ;-)

This mechanism has been used by us at Denbigh since soon after FileMaker 7 came out, and it’s been one of the best design decisions we could have made.  As we have extended the functionality and capabilities of Denbigh Admin over the years, this mechanism has made life a lot easier for all of us.

Friday, 29 August 2014

Testing for modifier keys in FileMaker

A couple of days ago we had a developer do something like this

If ( Get ( ActiveModifierKeys ) = 4 ) // Control key
—do stuff
End If

In testing, it worked for him, and didn’t work for me.  Why, when I was holding down the control key, did it not work for me?  Well, Get ( ActiveModifierKeys ) returns an integer, based on the sum of all the modifier keys, and I had my caps lock on accidentally, which caused Get ( ActiveModifierKeys ) to return 6, and therefore not run the intended code.  So a little analysis (and a better test for which keys are down) is necessary here.

From FileMaker’s documentation :
The number returned is calculated by summing numbers representing each modifier key being pressed. The values assigned to the keys are:

Shift = 1
Caps Lock = 2
Ctrl (Windows) and Control (OS X) = 4
Alt (Windows) and Option (OS X) = 8
Command (OS X) = 16

OK, so we have numbers that are powers of two, so obviously FileMaker is assigning each modifier key to a “bit” of the integer value, and we can therefore create a better test by seeing if a specific bit is set.  As so often happens in the FileMaker world, someone else has already done it.  In this case it was Mikhail Edoshin, who wrote some nice custom functions to handle this.  

Custom functions to simplify reading the current state of modifier keys

I highly recommend browsing Mikhail’s stuff in the web archive; he’s written some really cool stuff.

To get back to the topic at hand, Mikhail’s custom function:
Function: Bit Is Set ( number ; bit ) =
Mod( Div( number; 2^( bit - 1 ) ); 2 )

Very elegant!
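Div and Mod are integer division and remainder, so the function shifts the wanted bit down into the ones place and reads it. In Python, where // and % mirror Div and Mod:

```python
def bit_is_set(number, bit):
    # Mod( Div( number; 2^( bit - 1 ) ); 2 ): shift right, keep lowest bit.
    return (number // 2 ** (bit - 1)) % 2

# Caps Lock (2) + Control (4) gives ActiveModifierKeys = 6; the control
# bit (bit 3) is still detectable:
bit_is_set(6, 3)  # 1
bit_is_set(6, 1)  # 0 (shift is not pressed)
```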

From this Mikhail wrote some more custom functions

Function: Shift Is Pressed()=
Bit Is Set( Get( ActiveModifierKeys ); 1 )

Function: Caps Lock Is Pressed()=
Bit Is Set( Get( ActiveModifierKeys ); 2 )

Function: Control Is Pressed()=
Bit Is Set( Get( ActiveModifierKeys ); 3 )

Function: Option Is Pressed()=
Bit Is Set( Get( ActiveModifierKeys ); 4 )

Function: Command Is Pressed()=
Bit Is Set( Get( ActiveModifierKeys ); 5 )

For Windows machines, “ALT” = Mac “Option” Key, and “Windows” Key = Mac “Command” key. 

So, back to the original problem, changing the code to this

If ( Control Is Pressed )
—do stuff
End If

does two things : 
1) reads like English, and 
2) it doesn’t matter what other modifier keys are being pressed, it will correctly determine if the “Control” key is one of them.

It's a subtle thing, but it adds the kind of polish to your code that distinguishes professionals from amateurs.

Wednesday, 9 July 2014

Doing a mass export of all data, using FileMaker’s native ExecuteSQL() function and a virtual table.

Recently at work we had a customer request for a facility to do a mass export of all their data.  The reason why such a facility did not already exist requires a little background.

“Denbigh Admin” is our primary product.  While we do other work as well, this is our bread and butter.  Denbigh Admin first started out around 1998 as “.fp3” files.  Back then, a FileMaker database file had exactly one table.  So in order to add modular functionality, new files were created and added as necessary.  Eventually, we exceeded 100 database files in our file set, which got entertaining when you consider that FileMaker Pro clients could only open a maximum of 50 files at a time back then!  When FileMaker Pro 7 came out, the “.fp7” file format allowed multiple tables in the one file, and we conducted some table consolidation around modular functionality (and user privilege areas), reducing to an eventual set of 47 files.  Over time, that has slowly crept up again, and we are now up to 83 files and around 250 tables or so, depending on which version of Denbigh Admin you have, and what customisations were specified.  I forgot to mention: every copy of Denbigh Admin is customised for each and every customer.  No two are identical.

Anyway, back to the customer request.  Because of the background I mention above, there actually isn’t a single facility in Denbigh Admin to export or import all data at once.  Where we have to upgrade a customer’s files, it’s always in specific modules only, and since no two are exactly the same, we never worried about a generic export & import setup.

So one of our developers started doing it the obvious way, going to each file, and building export routines for the tables in those files.  A little way into the process, the job came up for discussion, and I suggested another way, using FileMaker’s native ExecuteSQL() function to get the data, and a “Virtual Table” to export it.  Next question was where in the file set to do this? Use an existing file, or make a new one?  Some time back, we implemented FMDataGuard auditing, but when it was not compatible with FMP13 (yes I know it is now) we removed it in favour of an entirely native auditing solution. That is the topic of another post…  Anyway here we were with an “AuditLog” file with very little code, and only one small table.  A very nice little utility file that we could add functionality to without adding to the file population.

So I added “external data sources” to all the other files in Denbigh Admin, and then added table occurrences to the relationship graph.  Now the trick at this point is to name the table occurrences according to some sort of pattern. We had only the original AuditLog table in the relationship graph to start with, so I decided to name the table occurrences using “<filename>.<tablename>”.  This took care of a couple of potential problems.  The first was that we had a couple of different Finance-related files that had some identically named tables in them, so using this method prevented any potential ambiguity, and the second aspect was to make it easy to identify which table occurrences were there for export purposes, because they contained a “.” in their name.

Next question: how to dynamically detect all such table occurrences?  Andrew Duncan wrote an article about FileMaker’s own “schema tables”, which can be used to determine the relationship graph contents.  I used it to get the list of table occurrences that have a “.” in their name, like this:

Custom Function
Get_export_table_list =
Let (
  sqlQuery = "SELECT TableName FROM FileMaker_Tables WHERE FileMaker_Tables.TableName LIKE '%.%'" ;
  ExecuteSQL ( sqlQuery ; Char ( 9 ) ; Char ( 13 ) )
)

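FileMaker_Tables is FileMaker-specific, but the pattern is the same as querying any SQL catalogue. Here is the equivalent against SQLite's schema table, purely as an illustration (the table names are made up):

```python
import sqlite3

# SQLite's sqlite_master stands in for FileMaker_Tables here.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "AuditLog" (id INTEGER)')
con.execute('CREATE TABLE "Admissions.Students" (id INTEGER)')
con.execute('CREATE TABLE "Finance.Debtors" (id INTEGER)')

# Only the "<filename>.<tablename>" occurrences are export candidates.
export_tables = [row[0] for row in con.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND name LIKE '%.%' ORDER BY name")]
export_tables  # ['Admissions.Students', 'Finance.Debtors']
```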
So now I have a list of all the table occurrences where they contain a period “.” in their name.  The next thing is to get a list of the stored data fields for a table occurrence.  I made a custom function for this too, so that I can just pass in the table occurrence name, and get out the field list, like so:

Custom Function
Stored_field_names_for_table ( tablename ) =
Let ( [
  tabchar = Char ( 9 ) ;
  returnchar = Char ( 13 ) ;
  sqlQuery = "SELECT a.FieldName FROM FileMaker_Fields a WHERE a.TableName = '" & tablename & "' AND a.FieldClass = 'Normal' AND a.FieldType NOT LIKE 'global%' AND a.FieldType != 'binary' "
] ;
  ExecuteSQL ( sqlQuery ; tabchar ; returnchar )
)

This will return the list of fields classed “normal” so it doesn’t include any calculated fields, and excludes container fields (hence the “!= ‘binary’”), and excludes global fields, since they, by definition, do not contain stored data.

So now we have the tools to discover the tables we want to export, and the fields in those tables that we want to export.  If we want to add another table to the export routine in the future, we only need to add it to the relationship graph, and name it according to the schema above.  No other configuration or scripting necessary.

Halfway there!  Or so I thought….

The initial idea here was to export the data using a virtual table, so I ran through the tables in the graph, and the widest one was the “Current Student” table with 187 stored fields.  I decided to make the virtual table with 200 fields to leave a little leeway.

The virtual table technique is described in lots of places around the Internet, but what it comes down to is making a table where every field is an “unstored calculation” on a variable, and to get data in it, you push that data into the variable. Most techniques use a global variable, and I initially did too, so that the data stayed visible to all scripts, even if I exited the script at any time. Since variables can have any number of “repetitions” we can programmatically use one as a theoretically infinite array.  In practice the limits are set by the machine’s resources, but you’d have to be a very sadistic programmer to overload the machine by populating variables!

Anyway, there are a couple of different ways to go about this: using a global variable “repetition” as the “column”, or using it as the “row”.  Initially I figured the column would be the way to go, because I could load up $$VirtualColumn[n] with a dedicated SQL call for each field.  Time for another custom function:

Custom Function
SQLQuery_For_field ( tableName ; fieldName ) =
Let ( [
  tabchar = Char ( 9 ) ;
  returnchar = Char ( 13 ) ;
  sqlQuery = "SELECT a.\"" & fieldName & "\" FROM \"" & tableName & "\" a"
] ;
  ExecuteSQL ( sqlQuery ; tabchar ; returnchar )
)

Putting it all together:

begin script
  set variable $counter to 1
  set variable $fieldCount to ValueCount ( $fieldList )
  start loop
    exit loop if $counter > $fieldCount
    set variable $thisField to GetValue ( $fieldList ; $counter )
    set variable $$VirtualColumn[$counter] to SQLQuery_For_field ( $tableName ; $thisField )
    set variable $counter to $counter + 1
  end loop
end script

OK!  We’ve got the global variables $$VirtualColumn[n] populated.  Now we go to the virtual table, where the fields are unstored calculations like this:

Field001 = GetValue ( $$VirtualColumn[1] ; Get ( RecordNumber ) )
Field002 = GetValue ( $$VirtualColumn[2] ; Get ( RecordNumber ) )
Field003 = GetValue ( $$VirtualColumn[3] ; Get ( RecordNumber ) )
...
Field200 = GetValue ( $$VirtualColumn[200] ; Get ( RecordNumber ) )

Now how many records do we need in the virtual table?  Even easier: ValueCount ( $$VirtualColumn[1] ) and you’ve got it.  Go to the VirtualTable layout, make sure there are enough records, and export.

I’ve skipped a step here: the customer’s request was to export the data as merge files, “.mer”.  This is easy; it’s just a CSV file with the first row of data being the field names.
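For anyone outside FileMaker, the merge format really is that simple. A minimal sketch with Python’s csv module, using entirely hypothetical field names and rows:

```python
import csv
import io

# Hypothetical field list and data rows, just for the demo.
fields = ["Student ID", "Surname", "Result"]
rows = [["S001", "Smith", "Pass"], ["S002", "Jones", "Credit"]]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(fields)      # first row of a .mer file: the field names
writer.writerows(rows)       # then the data rows, plain CSV
mer_text = buf.getvalue()    # write this out as e.g. "students.mer"
```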

Anyway, I thought I was home free here.  SQL queries populate $$VirtualColumn[n] in a matter of seconds; then to export the data, set a $fileName variable to the table name & “.mer”, add the path to the desktop, and export the whole 200 fields as CSV.  If the table being exported has fewer than 200 fields, the remaining fields are empty, no field name and no data.  No problem!

The more experienced FileMaker Developers out there are going to know what’s coming up next, they’ll have spotted the problem a couple of paragraphs back.

You see, when it exports the first row of data, it grabs the first value from $$VirtualColumn[n] for each field in that row.  When it exports the second row, it goes to the second value in $$VirtualColumn[n]; for the third row, the third value, and so on.  This is no problem if you only have a few thousand records at most.  But with many more you’re in trouble, because by the 10,000th record it is parsing through 10,000 values in each variable to find the one for that field of that record.  The “Attendance Archive” table had 85,000+ records; it took 5 hours to export.  The Student Outcomes table had 430,000+ records; I wasn’t even going to attempt that!
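The cost here is easy to see in a Python sketch standing in for the calc engine: GetValue on a return-delimited list has to scan from the start of the text every time it is called, so reading record k costs O(k) and the whole export costs O(n²). (The `get_value` helper below is my own mimic, not FileMaker itself.)

```python
def get_value(text, n):
    """Mimic FileMaker's GetValue ( list ; n ): re-scan the whole
    return-delimited text on every call, then take the nth value."""
    return text.split("\r")[n - 1]

data = "\r".join(f"row{i}" for i in range(1, 10001))   # 10,000 rows

# Exporting row-by-row repeats the full scan for every record:
# record k costs O(k), so all n records cost O(n^2).  Tolerable for
# a few thousand rows, hopeless for 430,000.
exported = [get_value(data, k) for k in range(1, 10001)]
```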

So, we have to switch to a $$VirtualRow[n] structure instead of $$VirtualColumn[n].  In some ways this is pretty easy. First, yet more custom functions:

Custom Function
Stored_FieldNames_For_Table_As_Field_List ( tableName ) =
Let ( [
  t1 = Stored_field_names_for_table ( tableName ) ;
  t2 = Substitute ( t1 ; "¶" ; "\",a.\"" )
] ;
  "a.\"" & t2 & "\""
)

Custom Function
SQLQuery_For_Table ( tableName ) =
"SELECT " & Stored_FieldNames_For_Table_As_Field_List ( tableName ) & "¶FROM \"" & tableName & "\" a"
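These two custom functions are really just string manipulation: join the field names into a quoted, comma-separated column list and wrap a SELECT around it. The same join-and-quote trick in Python, with hypothetical table and field names:

```python
def sql_query_for_table(tablename, fieldnames):
    """Join field names into a."f1",a."f2",... and wrap a SELECT
    around it -- the same trick as the Substitute-based function."""
    field_list = 'a."' + '",a."'.join(fieldnames) + '"'
    return f'SELECT {field_list}\nFROM "{tablename}" a'

q = sql_query_for_table("Student Outcomes", ["Student ID", "Result"])
# q == 'SELECT a."Student ID",a."Result"\nFROM "Student Outcomes" a'
```

Quoting every identifier matters here because, as in the tables above, FileMaker field names routinely contain spaces.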

Then we get the data and load it into a variable

Set $$TableData to ExecuteSQL ( SQLQuery_For_Table ( $tableName ) ; tabchar ; returnchar )

Then parse it out into the $$VirtualRow[n] variables:

begin script
  set variable $counter to 1
  set variable $recordCount to ValueCount ( $$TableData )
  start loop
    exit loop if $counter > $recordCount
    set variable $$VirtualRow[$counter] to Substitute ( GetValue ( $$TableData ; 1 ) ; tabchar ; returnchar )
    /* so each field in the source data becomes a separate value in the virtual row */
    set variable $counter to $counter + 1
    set variable $$TableData to RightValues ( $$TableData ; ValueCount ( $$TableData ) - 1 )
    /* so that it’s always taking the first row of data */
  end loop
end script

At this point, the virtual table has to have a change in its calculation for each field, to:

Field001 = GetValue ( $$VirtualRow[Get ( RecordNumber )] ; 1 )
Field002 = GetValue ( $$VirtualRow[Get ( RecordNumber )] ; 2 )
Field003 = GetValue ( $$VirtualRow[Get ( RecordNumber )] ; 3 )
Field004 = GetValue ( $$VirtualRow[Get ( RecordNumber )] ; 4 )
...
Field200 = GetValue ( $$VirtualRow[Get ( RecordNumber )] ; 200 )

Now the maximum number of values the calc engine needs to parse when exporting is 200.  However, getting the data from $$TableData into $$VirtualRow[n] takes a really long time.  When attempting it on the Student Outcomes table, with 430,000+ records, I timed the first few thousand, and it was going to take WAY too long to parse it all.

So thinking about it some more, I realised that if I parsed the whole pile of data in one big loop, each iteration would have to parse an average of ( ValueCount ( all the data ) / 2 ) values.  Moreover, the time it took to get to a value did not seem to increase linearly: if fetching the 50th value took time (t), fetching the 100th value seemed to take MORE than (t × 2).  If that theory was correct, then breaking the data down into smaller chunks should improve things considerably.  So I tried this instead.

Set $$AllTableData to ExecuteSQL ( SQLQuery_For_Table ( $tableName ) ; tabchar ; returnchar )

begin script
  set variable $counter to 1
  start loop
    exit loop if ValueCount ( $$AllTableData ) = 0
    set variable $$TableData to LeftValues ( $$AllTableData ; 1000 )
    set variable $$AllTableData to If ( ValueCount ( $$AllTableData ) > 1000 ; RightValues ( $$AllTableData ; ValueCount ( $$AllTableData ) - 1000 ) ; "" )
    start loop
      set variable $recordCount to ValueCount ( $$TableData )
      exit loop if $recordCount = 0
      set variable $thisOne to Substitute ( GetValue ( $$TableData ; 1 ) ; tabchar ; returnchar )
      /* so each field in the source data becomes a separate value in the variable */
      set variable $$TableData to RightValues ( $$TableData ; $recordCount - 1 )
      /* so that it’s always taking the first row of data */
      set variable $$VirtualRow[$counter] to $thisOne
      set variable $counter to $counter + 1
    end loop
  end loop
end script

So what is this doing?  It’s taking that big whopping pile of $$AllTableData, breaking it into 1,000 row chunks, and processing each chunk.
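The same chunked structure can be sketched in Python. One caveat: in Python a single split is already linear, so this only illustrates the shape of the algorithm; in FileMaker the actual speed-up comes from LeftValues/RightValues operating on a 1,000-row $$TableData instead of the whole 430,000-row blob.

```python
def parse_rows_chunked(all_data, chunk_size=1000):
    """Mirror the chunked script: carve the return-delimited blob into
    chunk_size-row slices, then pop rows off each small slice."""
    virtual_rows = []
    remaining = all_data.split("\r")
    while remaining:
        chunk = remaining[:chunk_size]       # LeftValues ( data ; 1000 )
        remaining = remaining[chunk_size:]   # RightValues ( data ; the rest )
        while chunk:
            row = chunk.pop(0)               # always take the first row
            # tab-separated fields become return-separated values
            virtual_rows.append(row.replace("\t", "\r"))
    return virtual_rows

rows = parse_rows_chunked("a\tb\rc\td\re\tf", chunk_size=2)
# rows == ["a\rb", "c\rd", "e\rf"]
```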

This architecture parsed all the data of the 430,000 records into the $$VirtualRow[n] global variables in well under an hour, and exported it all in the time it took me to go get another coffee.  I wish I’d thought of it sooner!

So what other catches are there?  Well, you may want to clear all the $$VirtualRow variables between each table export.  What’s worse is that you can’t do this with a recursive custom function: the maximum number of times a recursive custom function can call itself is 10,000.  So you have to make a looping script.  Pass in a parameter of how many rows the last export had, and loop through setting $$VirtualRow[$counter] to NULL.  (I forgot to mention that I have a custom function called NULL that returns nothing at all; a surprisingly useful function, actually.)

There is another way around this, and that would be to write the parsing and exporting logic in one big script, and use a script variable $VirtualRow[n] instead of a global variable $$VirtualRow[n].  Then when you exit the script, the $VirtualRow[n] variables all get destroyed for you by the system.  The logic would look like this:

begin script
  set $tableList to Get_export_table_list
  set $counter to 1
  set $tableCount to ValueCount ( $tableList )
  start loop
    exit loop if $counter > $tableCount
    set $thisTable to GetValue ( $tableList ; $counter )
    Export_data_from_table ( $thisTable )
    set $counter to $counter + 1
  end loop
end script

The routine “Export_data_from_table” would then do the actual work of determining field names, loading and parsing the data, and exporting, all in one script using script variables $VirtualRow[n].


So what to take away from this?  A technique for handling mass exports of data using ExecuteSQL and a virtual table that is very generic in its operation, requires very little configuration change to suit almost any system, and automatically adjusts for any fields you add to or remove from a table.

However, while FileMaker makes working with return delimited values really easy, it is correspondingly easy to program in a very inefficient manner.  If you have to process really large quantities of text values, break it up into smaller pieces first!

Friday, 4 July 2014

Who I am

My name is Peter Gort, and I'm a Database Programmer with Denbigh International.  I've been working for Denbigh since 2002, mostly with FileMaker Pro databases, with an occasional smattering of SQL systems integration.  The vast majority of our work is custom administration and reporting systems for educational institutions.  Given how long I have been working for Denbigh now, I must really love the job!

Before that I was Tier 2.5 Helpdesk with Apple Australia.  I worked for Apple for 4 years, starting just as Steve Jobs began turning the company around, and brother, it was a roller coaster ride.

Before that I wasn't in I.T. at all; I was a Diesel Mechanic, Allison Guild Craftsman, and truck and bus driver.  Something most people did not realise when I applied for jobs with them is that Tech Support and Diesel Mechanic are almost the same job: the skill set is the same, just the machines and tools are different.

In 2004, I moved my family from Western Sydney to Gunnedah in New South Wales.  Non-Australians will not comprehend the enormity of the move; in fact a surprising number of Australians don't know where Gunnedah is either, in spite of one of our most famous national poems, Dorothea Mackellar's "My Country", being about the place.

While the job opportunities for the rest of my family are not that great out in the country, the neighbourhood is friendly, the scenery magnificent, the air is clear, and the climate easy to cope with. Not to mention that buying a home is 1/5th the price of Sydney!  I get to work from home most of the time, but I get some good travelling in when customers want me to show up and prove I'm a real person.

This blog is mostly going to deal with technical stuff, usually experiences and techniques working with FileMaker Pro databases, but occasionally I'll diverge a bit.