November 1, 2008

Jesse Stay
obfuscated, Uncle_Jesse
Stay N' Alive » OSS
» 01-1.jpg

This is a picture of my Great-Grandfather, Joseph Stay. With a son named after him, I’ve spent some time reading about him and learning about the experiences of his life that I can pass down to my son. One of my favorite things to do in my spare time (when I get any) is to read about the lives of my ancestors. My faith teaches about life both before and after this life, and as such, it’s important for me to know who came before me and how I came to be. Besides that, it’s just plain fun.

Some of my ancestors were very good at tracking their lives and what they did. Some of them kept journals and records, so that their descendants could learn about them after they passed away. I have a journal like this, as do my parents and grandparents. These journals show a glimpse into our successes, trials, and failures, and what we did to overcome them, in hopes that our children and those that come after us can learn from our mistakes and make their lives better.

This concept is great, except it only applies to those after this life - only they can learn from us because we often keep these details secret. What if we could share the skills we have, let others try them out, play with them, learn from them, just as we’re able to do with the experience we’ve learned from our ancestors, but in this life?

This is the reason I like the concept of “Open Source”, which started with software but really could be applied to any field of expertise. The concept of “Open Source” is all about sharing the experiences we have in this life and allowing others, still in this life, to try those experiences out, apply their own experience, and continue to share with others. It’s just what our ancestors did for us, but applied to this life.

What if we all, in everything we did, shared what we did with those in this life, instead of planning for the next, so that we could start that legacy of learning right here and right now? What if we as a society were working together instead of just us and those that follow us after this life? Why do we have to wait until we’re dead to let others learn about what we’ve done?

August 10, 2008

Hans Fugal
no nic
The Fugue :
» New FamilySearch

So I finally got around to trying out the "New FamilySearch" today. I am both impressed and disappointed.

First the good parts. NFS (you didn't think I was going to type "New FamilySearch" over and over, did you?) has an impressive goal and paradigm. The goal is to create one hugemongous centralized database for all church members. The idea is to get away from the half dozen church databases (Ancestral File, IGI, etc.), and the half gazillion individual databases. A noble goal but a very scary one. It would be easy to screw this up and make a bigger mess than that with which we started. In fact this is why I have been reluctant to check it out—I didn't want to be disappointed and I wholly expected to be.

Well, they actually pull it off quite well. The new paradigm is to keep everything and to promote recording evidence. In short, genealogy done right. When you merge a person, that is recorded and available for others to see. When you want to change information, you don't change it directly (as you would in a conclusion-based program like PAF), but instead you "add an opinion" complete with sources and/or notes. If you think that a piece of information is wrong and you have evidence against it, you can dispute it (again, giving source and notes). The old "wrong" information isn't eliminated, but it is marked as disputed. The changes and choices you make about people show up in the pedigree chart etc. This is multi-user genealogy done well (I might call it "distributed genealogy", but I'll reserve that term for something better, as you'll see later).
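To make the "keep everything" idea concrete, here is a minimal sketch in Python of how opinions and disputes could sit side by side without anything being overwritten. This is not how NFS is implemented (I have no idea what its internals look like); the class and method names (`Fact`, `Opinion`, `add_opinion`, `dispute`) are hypothetical and only illustrate the paradigm described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Opinion:
    """One contributor's claim about a fact, with its supporting source."""
    contributor: str
    value: str
    source: str = ""
    # Disputes are attached to the opinion; the opinion itself is never deleted.
    disputes: List[str] = field(default_factory=list)


@dataclass
class Fact:
    """A single piece of information about a person, e.g. a birth date."""
    name: str
    opinions: List[Opinion] = field(default_factory=list)

    def add_opinion(self, contributor: str, value: str, source: str = "") -> None:
        # Adding an opinion appends; earlier opinions stay in place.
        self.opinions.append(Opinion(contributor, value, source))

    def dispute(self, value: str, contributor: str, reason: str) -> None:
        # Mark every opinion holding this value as disputed, keeping the record.
        for opinion in self.opinions:
            if opinion.value == value:
                opinion.disputes.append(f"{contributor}: {reason}")

    def undisputed(self) -> List[str]:
        # Values nobody has disputed yet, in the order they were added.
        return [o.value for o in self.opinions if not o.disputes]
```

With something like this, a disputed birth date is still there for the next researcher to see, along with who disputed it and why, which is the whole point.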

From the perspective of an LDS member this is a fantastic system. When ordinances are performed in the temple they are immediately reflected in the database. When you want to do temple work for so-and-so, you state your intentions in the system and print out a page to take to the temple with you. If anyone else tried to do the same work, they'd see it was in process. This will drastically reduce—perhaps even essentially eliminate—duplicated effort in the temple. I have to say it's about time. It would have been cool 10 years ago. It was expected 7 years ago. Now it's finally here.

There are some other cool tidbits, too, like the pedigree view which combines couples to make better use of space (are they the first to think of it? Probably not, though I haven't personally seen this approach before):

[Image: NFS pedigree]

There's an info box at the bottom with different tabs, one of which is "possible duplicates". I much much much prefer working with duplicates in this manner, rather than a global "match and merge". Very nice. There are also time lines and Google Maps integration (see where your ancestor was born, married, died, etc.). And those little temple icons unobtrusively notify you of potential temple work to do (or that has already been done). Overall they make nice use of AJAX, too.

But there are problems. Big problems.

It's slow. Painfully slow. It's slow enough to be a real pain for doing actual genealogical work. Maybe people with limited computer skills wouldn't find it slow, because it moves at about the pace they can keep up with. But for those of us in the computer age (read: almost everyone in my generation or younger) it is painful and restrictive. Why is it slow? Because it's a web app. News flash! Even AJAX web apps are slow.

Ok, it's slow. No big deal, right? Just download the GEDCOM, do your research, and upload the changes. Right? I have news for you. There's no exporting data from NFS. The help center has this to say:

Exporting Information from FamilySearch for Use in Your Personal Computer

This topic describes how to get information from FamilySearch into your family history computer program.

If you find information in FamilySearch that you do not have, you will need to either use the cut and paste features of your operating system or retype it into your computer program.

Currently, FamilySearch does not support downloading information for use with Personal Ancestral File or similar computer programs. Family history computer programs may choose to support this feature when it becomes available from FamilySearch.

Really. Cut and paste! It is a big black hole waiting to consume your information and display it to you on its terms only. Its slow terms. You want to make a family pedigree website? Write a script to spit out all the place names of your ancestors so you can put blue dots on a map? Make a Google Maps mashup? Do any number of other useful things with a GEDCOM export, including actually being able to work with it at a reasonable speed, or putting it on your handheld for reference at the family history library? Print out reports? No way. Uh-uh. Remember how I avoided using the term "distributed genealogy"? It's like having your genealogy in a distributed revision control system like mercurial or git, but you can only access the one single repository with a web interface. You can't check out the code. You can't work offline. You can't use your own tools. You can't write emergent scripts. You're screwed.
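For the record, the place-names script is the kind of thing that takes a few minutes once you have a GEDCOM in hand. Here is a rough Python sketch. It assumes the usual GEDCOM line layout of level, tag, value, with places stored under `PLAC` tags; the function name `place_names` is made up for this example, and it does no character-set handling or continuation-line handling.

```python
def place_names(gedcom_text):
    """Return the sorted, de-duplicated PLAC values found in a GEDCOM string."""
    places = set()
    for line in gedcom_text.splitlines():
        # A GEDCOM line looks like "2 PLAC Bedrock, Cobblestone County":
        # level, tag, then the rest of the line as the value.
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[1] == "PLAC":
            places.add(parts[2].strip())
    return sorted(places)
```

Feed the result to a geocoder and you have your blue dots. None of this is possible when the only way out is cut and paste.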

For understandable reasons, you can't see information on living people, and they don't show up as search results. You do get access to your own ancestors and descendants and your spouse, but apparently not your spouse's family, your siblings, or any information on living people (like your parents' birthdays, etc.). You can enter this information in, or upload it in a GEDCOM. But the first rule of genealogy is to start with your 4 generations. If everyone starts with their 4 generations, but most of those people are still alive, then how much effort is duplicated? How many duplicate versions of my dad will there be? Well let's see, he has 11 siblings, various aunts and uncles who are into genealogy, 7 children (who should all see the same record, but might conceivably enter conflicting information). Not a huge problem, but an annoyance. Once you fill out the tree to the dead people (hint: upload a GEDCOM of what you already have here, but only those first couple generations), then you find and link the dead people into the tree, then you have a nice resource. So far, it's just a research resource: I wouldn't trust a lot of things further than I can throw them, but they make good research jumping-off points. Maybe eventually through the hard work of thousands it will converge to a respectable database, in the spirit of a wiki.

Also, it's presently restricted to LDS members (you need your membership number and confirmation date to register). The best genealogists I know aren't LDS. Certainly the bulk of decent genealogists I know aren't LDS. Most of the lousy genealogists I know are LDS. (Of course, that doesn't mean we have a monopoly on lousy genealogists, I just haven't had reason or opportunity to mingle with lousy non-LDS genealogists much). So this seems like a drawback across the board.

Maybe down the road (I think it's still beta, though they never use that word) it will allow GEDCOM export and be available to all genealogists. Maybe the speed issue will be addressed, or they'll come up with a desktop client. Maybe this will be the rockingest genealogy database ever. Or maybe it will be of marginal interest—a great way to prepare names for the temple and avoid duplicate temple work, but not a good tool for daily genealogical work. Time will tell.

I am impressed by the no-information-loss implementation. I'd like to propose taking it a step further. What if we could publish genealogical repositories on websites like we do with mercurial or git? What if we had the genealogical equivalent of github? What if you and all the other genealogists out there could, without information loss, match and merge and add information and correct information and unmerge faulty merges and… all without loss of information, the ability to go back in time (like you can with a revision control system), etc. A global genealogical database, a global record of genealogical discovery. Now, one huge database doesn't make a lot of sense. It'd be a pain to push and pull. So you'd have to be able to push and pull only pieces of the tree. And of course the merging, confidence, dispute, etc. aspects would have to be dealt with well (as they mostly are in NFS, though there would be unique challenges for it in a truly distributed genealogical system). Just imagine the potential. And feel free to expound on your imaginings in the comments.

July 31, 2008

Hans Fugal
no nic
The Fugue :
» gedtag

Have you ever tried to import aunt Millie's n-thousand-person GEDCOM into your database? You either ended up with a reeking mess of a database, gave up and restored from a backup, or went insane trying to clean up the mess. Believe me, I know. And my family GEDCOMs are fairly well-behaved. But then there's always Ancestral File or online generated GEDCOMs.

This is no laughing matter. In fact, it has been the single most debilitating roadblock to me doing any real genealogy since I got the bug as a teenager.

I think I finally have a way to tame the beast. It's not a magic bullet—there will still be a lot of mind-numbing match/merge. But it will maintain order and the integrity of the database.

First, start with a clean slate. If you have an existing database, export it to GEDCOM and make a new database. This step isn't strictly necessary but keeps things ultra clean. If you're afraid you'll lose information in the export/import, you need a different genealogy program.

Now, sort your GEDCOMs to import by their importance and reliability. Your original database export probably comes near the top of this list, although not necessarily. Write this down. In fact, write everything down when doing anything in genealogy.

Now, take that first GEDCOM and run it through my magic filter. This will add REFN tags to your GEDCOM that look something like this: hans@fugal.net,2008-07-30:foo.ged/INDI/I1. This tag tells you the submitter's email (or name), the date in the GEDCOM file (or today's date), the name of the original GEDCOM file, and the identifying information for this particular record. In short, it keeps track of where that record came from. It will show up in PAF as the custom ID, and likely in other software in a similar manner.
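If you want a feel for what such a filter does, here is a simplified Python sketch. It is an illustration and not the actual gedtag code: the function name `add_refn_tags` and its parameters are invented for this example, and it assumes you pass in the submitter and date yourself rather than digging them out of the GEDCOM header. It also assumes only INDI and FAM records should be tagged.

```python
import re

# A level-0 record line with a cross-reference ID, e.g. "0 @I1@ INDI".
RECORD = re.compile(r"^0 @([^@]+)@ (\w+)")


def add_refn_tags(gedcom_text, submitter, date, filename,
                  record_types=("INDI", "FAM")):
    """Insert a provenance REFN line right after each matching record line."""
    out = []
    for line in gedcom_text.splitlines():
        out.append(line)
        match = RECORD.match(line)
        if match and match.group(2) in record_types:
            xref, tag = match.group(1), match.group(2)
            # Tag format: submitter,date:filename/TAG/ID
            out.append(f"1 REFN {submitter},{date}:{filename}/{tag}/{xref}")
    return "\n".join(out) + "\n"
```

Calling `add_refn_tags(text, "hans@fugal.net", "2008-07-30", "foo.ged")` on a file containing `0 @I1@ INDI` adds the line `1 REFN hans@fugal.net,2008-07-30:foo.ged/INDI/I1` directly beneath it.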

Now import the GEDCOM. In PAF, there is an option on import to reuse RINs. Uncheck this option. The import screen tells you the highest RIN currently used. Take note of this RIN. Now every record in the import will have a RIN above this RIN. The RIN is easier to use in match and merge (it's right there, you don't have to dig for it), so the tags we added are for posterity's sake.

Now, do the match and merge. Did you know that PAF has the ability to match and merge based on the _UID tags it spits out in GEDCOMs? That means if this GEDCOM and the GEDCOM(s) you've already imported have a common ancestor, the universally unique IDs will match, and you know without a doubt that someone thought they were the same person already. You can breeze through these merges with confidence that you won't merge people you shouldn't. Likewise there is an AFN match and merge, which is almost as trustworthy. (I'm a bit paranoid so I always double-check anything coming from Ancestral File. Maybe it's because there are about 5 versions of me in Ancestral File, most of which can't even spell my name right.) Finally, go through the other options (name, soundex) and do a thorough match/merge.
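The idea behind a _UID match is simple enough to sketch in Python. To be clear, this is not PAF's implementation, which I haven't seen. It assumes each individual carries its ID on a level-1 `_UID` line, and the function names `uids_by_record` and `uid_matches` are hypothetical.

```python
def uids_by_record(gedcom_text):
    """Map each _UID value to the INDI record ID that carries it."""
    uids = {}
    current = None  # ID of the INDI record we are inside, if any
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        if parts[0] == "0":
            # A new record starts; remember it only if it is an individual.
            is_indi = len(parts) == 3 and parts[2] == "INDI"
            current = parts[1] if is_indi else None
        elif current and parts[0] == "1" and parts[1] == "_UID" and len(parts) == 3:
            uids[parts[2].strip()] = current
    return uids


def uid_matches(existing_text, incoming_text):
    """Return (existing ID, incoming ID) pairs that share a _UID."""
    existing = uids_by_record(existing_text)
    incoming = uids_by_record(incoming_text)
    shared = existing.keys() & incoming.keys()
    return sorted((existing[uid], incoming[uid]) for uid in shared)
```

Every pair this returns is a merge candidate somebody already vouched for, which is why you can move through them quickly.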

Now, go through all the remaining RINs greater than the RIN you noted earlier. These are the new people in your database. Get to know them. See where they sit in the pedigree. Read the notes. Make sure they meet your quality standards. Add sources if you know of them. Make notes of missing information, questionable stuff to research, etc. You should have a whole truckload of research tasks to do after this import—and some of them you should do before the next import (you'll recognize these if you take the time to think of them and write them down). Actually you should do that with every person you merge in the previous step as well, since they will merge into lower RINs. Don't hit that merge button until you've done the quality check!

After weeks, months, or years of doing this on Sunday afternoons, you will have a meticulous database that works for you. You will have laid a solid foundation which will empower your future research efforts. You will not be sorry.

June 22, 2008

Hans Fugal
no nic
The Fugue :
» Genealogy: Induction or Deduction?

From time to time I think about evidence-based genealogy. All good genealogy is evidence-based, i.e. you have evidence to support all of your conclusions, and a complete stranger would agree with your conclusions because of your evidence. But most amateur genealogists, and computer software, treat evidence as a secondary concern at best. To them, it's the conclusions that are important, and documenting the evidence is an afterthought and a bother and usually is not done at all. After all, it's obvious at the moment that you're recording the marriage of Fred and Wilma that it's true. Of course later we find that Fred and Wilma never even knew each other, and we've forgotten why we thought they got married. Oops.

The problem is exacerbated by the fact that most amateur genealogists (including genealogy software developers) start out by recording the family history stored in their heads. This is information that they are as sure about as they are of gravity. Recording evidence of these "givens" is tedious and ridiculous. And there's enough of it that by the time you've entered it all into the computer (or onto family group sheets) you have developed a solid bad habit of not entering sources.

This is compounded even further by genealogical databases. Go to a family history center or genealogy website and download a few thousand or tens of thousands of names. Who would turn down such a tremendous head start? Who would meticulously verify and document the evidence of every one of those names and the dozen or more conclusions associated with each one?

But I'm getting sidetracked. The question of the day is whether genealogy is an inductive or deductive sport. Let's review the definitions.

induction |inˈdək sh ən|, noun. The inference of a general law from particular instances.

deduction |diˈdək sh ən|, noun. The inference of particular instances by reference to a general law or principle.

So induction is going from specific to general, i.e. making conclusions based on evidence. Sounds like genealogy, right? But if we replace "general law or principle" with the word "premise," then deduction also looks a lot like genealogy. The problem is, neither evidence nor genealogical conclusions look an awful lot like "general laws."

Let me take another crack at defining the terms. Induction is when you take a bunch of observations and induce a probable generality from them. Deduction is when you take premises and deduce a conclusion that is certain, given that the premises are true.

If I have a birth certificate for one Fred Flintstone, then I can deduce that some Fred Flintstone was born on such and such date. The only way to question that conclusion is to question the veracity of the birth certificate. Note that I said some Fred Flintstone. A common pitfall in genealogy is the leap from evidence for someone of the same name to evidence for the particular person being researched.

If I have the birth certificate, and a bunch of other documents, and they all support the notion that there was one Fred Flintstone in Bedrock during this period of time, and all the evidence fits together well, I can construct a probable picture of the person Fred Flintstone. This seems to be induction. Even though my premises are true, I may be taking a leap of faith to conclude that the Fred Flintstone from the birth certificate is the Fred Flintstone that married Wilma (and therefore my ancestor). It's not deduction because it doesn't follow directly from the premises.

Well, so it seems like genealogy is both inductive and deductive, and that's before you even consider the fallibility of evidence. No wonder it can be such a mess. This underlines the need for tools that help us dwell in the realm of evidence, which is relatively stable compared to the realm of conclusions. Very rarely indeed will a primary source be completely false (though it is more common to find inaccurate sources, with bad spelling or slightly-off dates). More often, our conclusions based on the primary sources are completely false. Yet, in the end, it's the conclusions that we care about. So the software needs to allow us to dwell in the evidence world while providing the context of our current set of conclusions.

Software developers would be tempted to treat evidence-based genealogical software as deductive reasoning. They'd program in all kinds of ways for the computer to do the thinking for you. Fuzzy probable conclusions have no place in this vision. I think that's the point of this post. We mustn't fall into that trap or we'll have another dark age like the conclusion-based age we're still struggling to get out of. Except this one will be worse because it doesn't even match the amateur genealogist's first way of thinking of things.

While I believe there is room for computers to automatically infer things based on evidence, and direct researchers to areas of the family tree that may be influenced by this new bit of information, I think it is vital that we not lose sight of the fact that this is a human enterprise. In the end, a person must interpret the evidence, and she must be able to easily change her mind later. As such, the software must first and foremost be an organizational tool. It must help us make sense of the mass of evidence and conclusions. It must free us from the shackles of disorganization without binding us with the shackles of inflexible deductive logic. And yet, at best it will encourage the infallibility of deductive reasoning where appropriate.

So what do you think? I'm a computer scientist, not a logician, and I have been known to confuse inductive and deductive reasoning. Is genealogy inductive, deductive, or both?