A Django site.
August 10, 2008

Hans Fugal
no nic
The Fugue :
» New FamilySearch

So I finally got around to trying out the "New FamilySearch" today. I am both impressed and disappointed.

First the good parts. NFS (you didn't think I was going to type "New FamilySearch" over and over, did you?) has an impressive goal and paradigm. The goal is to create one hugemongous centralized database for all church members. The idea is to get away from the half dozen church databases (Ancestral File, IGI, etc.), and the half gazillion individual databases. A noble goal but a very scary one. It would be easy to screw this up and make a bigger mess than that with which we started. In fact this is why I have been reluctant to check it out—I didn't want to be disappointed and I wholly expected to be.

Well, they actually pull it off quite well. The new paradigm is to keep everything and to promote recording evidence. In short, genealogy done right. When you merge a person, that is recorded and available for others to see. When you want to change information, you don't change it directly (as you would in a conclusion-based program like PAF), but instead you "add an opinion" complete with sources and/or notes. If you think that a piece of information is wrong and you have evidence against it, you can dispute it (again, giving source and notes). The old "wrong" information isn't eliminated, but it is marked as disputed. The changes and choices you make about people show up in the pedigree chart etc. This is multi-user genealogy done well (I might call it "distributed genealogy", but I'll reserve that term for something better, as you'll see later).

From the perspective of an LDS member this is a fantastic system. When ordinances are performed in the temple they are immediately reflected in the database. When you want to do temple work for so-and-so, you state your intentions in the system and print out a page to take to the temple with you. If anyone else tried to do the same work, they'd see it was in process. This will drastically reduce—perhaps even essentially eliminate—duplicated effort in the temple. I have to say it's about time. It would have been cool 10 years ago. It was expected 7 years ago. Now it's finally here.

There are some other cool tidbits, too, like the pedigree view which combines couples to make better use of space (are they the first to think of it? Probably not, though I haven't personally seen this approach before):

NFS pedigree

There's an info box at the bottom with different tabs, one of which is "possible duplicates". I much much much prefer working with duplicates in this manner, rather than a global "match and merge". Very nice. There are also time lines and Google Maps integration (see where your ancestor was born, married, died, etc.). And those little temple icons unobtrusively notify you of potential temple work to do (or that has already been done). Overall they make nice use of AJAX, too.

But there's problems. Big problems.

It's slow. Painfully slow. It's slow enough to be a real pain for doing actual genealogical work. Maybe people with limited computer skills wouldn't find it slow, because it moves at about the pace they can keep up with. But for those of us in the computer age (read: almost everyone in my generation or younger) it is painful and restrictive. Why is it slow? Because it's a web app. News flash! Even AJAX web apps are slow.

Ok, it's slow. No big deal, right? Just download the GEDCOM, do your research, and upload the changes. Right? I have news for you. There's no exporting data from NFS. The help center has this to say:

Exporting Information from FamilySearch for Use in Your Personal Computer

This topic describes how to get information from FamilySearch into your family history computer program.

If you find information in FamilySearch that you do not have, you will need to either use the cut and paste features of your operating system or retype it into your computer program.

Currently, FamilySearch does not support downloading information for use with Personal Ancestral File or similar computer programs. Family history computer programs may choose to support this feature when it becomes available from FamilySearch.

Really. Cut and paste! It is a big black hole waiting to consume your information and display it to you on its terms only. Its slow terms. You want to make a family pedigree website? Write a script to spit out all the place names of your ancestors so you can put blue dots on a map? Make a Google Maps mashup? Do any number of other useful things with a GEDCOM export, including actually be able to work with it at a reasonable speed, put it on your handheld for reference at the family history library? Print out reports? No way. Uh-uh. Remember how I avoided using the term "distributed genealogy"? It's like having your genealogy in a distributed revision control system like mercurial or git, but you can only access the one single repository with a web interface. You can't check out the code. You can't work offline. You can't use your own tools. You can't write emergent scripts. You're screwed.

For understandable reasons, you can't see information on living people, and they don't show up as search results. You do get access to your own ancestors and descendents and your spouse, but apparently not your spouse's family, your siblings, or any information on living people (like your parents' birthdays, etc.). You can enter this information in, or upload it in a GEDCOM. But the first rule of genealogy is start with your 4 generations. If everyone starts with their 4 generations, but most of those people are still alive, then how much effort is duplicated? How many duplicate versions of my dad will there be? Well let's see, he has 11 siblings, various aunts and uncles who are into genealogy, 7 children (who should all see the same record, but might conceivably enter conflicting information). Not a huge problem, but an annoyance. Once you fill out the tree to the dead people (hint: upload a GEDCOM of what you already have here, but only those first couple generations), then you find and link the dead people into the tree, then you have a nice resource. So far, it's just a research resource—I wouldn't trust a lot of things further than I can throw them, but they make good research jumping-off points. Maybe eventually through the hard work of thousands it will converge to a respectable database, in the spirit of a wiki.

Also, it's presently restricted to LDS members (you need your membership number and confirmation date to register). The best genealogists I know aren't LDS. Certainly the bulk of decent genealogists I know aren't LDS. Most of the lousy genealogists I know are LDS. (Of course, that doesn't mean we have a monopoly on lousy genealogists, I just haven't had reason or opportunity to mingle with lousy non-LDS genealogists much). So this seems like a drawback across the board.

Maybe down the road (I think it's still beta, though they never use that word) it will allow GEDCOM export and be available to all genealogists. Maybe the speed issue will be addressed, or they'll come up with a desktop client. Maybe this will be the rockingest genealogy database ever. Or maybe it will be of marginal interest—a great way to prepare names for the temple and avoid duplicate temple work, but not a good tool for daily genealogical work. Time will tell.

I am impressed by the no-information-loss implementation. I'd like to propose taking it a step further. What if we could publish genealogical repositories on websites like we do with mercurial or git? What if we had the genealogical equivalent of github? What if you and all the other genealogists out there could, without information loss, match and merge and add information and correct information and unmerge faulty merges and… all without loss of information, the ability to go back in time (like you can with a revision control system), etc. A global genealogical database, a global record of genealogical discovery. Now, one huge database doesn't make a lot of sense. It'd be a pain to push and pull. So you'd have to be able to push and pull only pieces of the tree. And of course the merging, confidence, dispute, etc. aspects would have to be dealt with well (as they mostly are in NFS, though there would be unique challenges for it in a truly distributed genealogical system). Just imagine the potential. And feel free to expound on your imaginings in the comments.

July 31, 2008

Hans Fugal
no nic
The Fugue :
» gedtag

Have you ever tried to import aunt Millie's n-thousand-person GEDCOM into your database? You either ended up with a reeking mess of a database, gave up and restored from a backup, or went insane trying to clean up the mess. Believe me, I know. And my family GEDCOMs are fairly well-behaved. But then there's always Ancestral File or online generated GEDCOMs.

This is no laughing matter. In fact, it has been the single most debilitating roadblock to me doing any real genealogy since I got the bug as a teenager.

I think I finally have a way to tame the beast. It's not a magic bullet—there will still be a lot of mind-numbing match/merge. But it will maintain order and the integrity of the database.

First, start with a clean slate. If you have an existing database, export it to GEDCOM and make a new database. This step isn't strictly necessary but keeps things ultra clean. If you're afraid you'll lose information in the export/import, you need a different genealogy program.

Now, sort your GEDCOMs to import by their importance and reliability. Your original database export probably comes near the top of this list, although not necessarily. Write this down. In fact, write everything down when doing anything in genealogy.

Now, take that first GEDCOM and run it through my magic filter. This will add REFN tags to your GEDCOM that look something like this: hans@fugal.net,2008-07-30:foo.ged/INDI/I1. This tag tells you the submitter's email (or name), the date in the GEDCOM file (or today's date), the name of the original GEDCOM file, and the identifying information for this particular record. In short, it keeps track of where that record came from. It will show up in PAF as the custom ID, and likely in other software in a similar manner.

Now import the GEDCOM. In PAF, there is an option on import to reuse RINs. Uncheck this option. The import screen tells you that highest RIN currently used. Take note of this RIN. Now every record in the import will have a RIN above this RIN. The RIN is easier to use in match and merge (it's right there, you don't have to dig for it), so the tags we added are for posterity's sake.

Now, do the match and merge. Did you know that PAF has the ability to match and merge based on the _UID tags it spits out in GEDCOMs? That means if this GEDCOM and the GEDCOM(s) you've already imported have a common ancestor, the universally unique IDs will match, and you know without a doubt that someone thought they were the same person already. You can breeze through these merges with confidence that you won't merge people you shouldn't. Likewise there is an AFN match and merge, which is almost as trustworthy. (I'm a bit paranoid so I always double-check anything coming from Ancestral File. Maybe it's because there are about 5 versions of me in Ancestral File, most of which can't even spell my name right.) Finally, go through the other options (name, soundex) and do a thorough match/merge.

Now, go through all the remaining RINs greater than the RIN you noted earlier. These are the new people in your database. Get to know them. See where they sit in the pedigree. Read the notes. Make sure they meet your quality standards. Add sources if you know of them. Make notes of missing information, questionable stuff to research, etc. You should have a whole truckload of research tasks to do after this import—and some of them you should do before the next import (you'll recognize these if you take the time to think of them and write them down). Actually you should do that with every person you merge in the previous step as well, since they will merge into lower RINs. Don't hit that merge button until you've done the quality check!

After weeks, months, or years of doing this on Sunday afternoon, you will have a meticulous database that works for you. You will have laid a solid foundation which will impower your future research efforts. You will not be sorry.