A Django site.
January 16, 2008

Kevin Kubasik
nonic
For Once I Oneder
» Follow Up: Beagle 0.3.2 Gutsy Packages

It appears that I spoke too soon: some packages had not finished their builds. Everything should be done now.
Kevin Kubasik's PPA


January 10, 2008

Kevin Kubasik
nonic
For Once I Oneder
» Beagle 0.3.2 Packages Backported To Gutsy

So, I have completed the simple task of backporting the Beagle 0.3.2 package to Gutsy. While Beagle itself runs just fine (gotta love the magic of runtimes!), there are some known issues with libbeagle. Namely, since the small API change, the Nautilus that ships with Gutsy has a few issues. To try to solve this, I have made a poorly backported Nautilus with the needed fixes available as well. However, I must warn you: I recently made the switch to Gutsy and haven't given these packages much testing. Please report bugs with the packages to Launchpad (and assign them to kkubasik) or leave a comment on my blog. The other devs really aren't all that interested in my mistakes ;)

So, without further ado, I give you the coolest PPA on earth.

December 7, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Tomboy Hackfest Tonight at the Novell OSTC

We'll be hacking it up tonight at 6:00PM MST at the Novell Open Source Technology Center. The rough TODO for the night seems to be Tags, Tasks and maybe even a backend to query Beagle. ;) Anyways, if you're in the greater Salt Lake City area, come on down! If you're a little further away but want to join in anyways, join in on #tomboy!

See you tonight!

November 16, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Mono 1.2.6 Memory Usage

So, I’ve heard a lot of hype about the upcoming 1.2.6 release of Mono being faster, leaner, and more stable than ever before (due largely to Novell’s acquisition of a QA team dedicated to Mono). Beagle has always gotten flak over memory use, and as a result, we are relentless in our hunt for abused memory. And while it is wonderfully satisfying to reduce memory usage, it’s really hard to beat dropping megabytes of resident memory for free :). I’m running Ubuntu Gutsy and its 1.2.4 release of Mono, but in my quest for some real numbers to back up all this talk I built the current SVN trunk of Mono.

Even my most optimistic expectations put our potential benefit at maybe 2 or 3 MB less resident memory than Beagle running under Mono 1.2.4. On my test setup, Beagle 0.3pre consumed (after my recent Opera backend fix) around 110 MB of VM and 36 MB of RSS (averaged over a 2 hour run). After building and installing Mono 1.2.6, the same 2 hour run was averaging 72 MB of VM and 27 MB of RSS! It’s still far from perfect, but free memory reduction is just plain cool :).
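For anyone who wants to repeat this kind of comparison, here is a minimal sketch of one way to do it. It is not the tooling used for the numbers above; it assumes a Linux system where the process exposes `VmSize` and `VmRSS` in `/proc/<pid>/status`, and the function names are made up for illustration.

```python
import re


def parse_status_kb(status_text):
    """Extract VmSize and VmRSS (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for name in ("VmSize", "VmRSS"):
        match = re.search(rf"^{name}:\s+(\d+)\s+kB", status_text, re.MULTILINE)
        if match:
            fields[name] = int(match.group(1))
    return fields


def average_mb(samples, field):
    """Average one field over a list of parsed samples, converting kB to MB."""
    values = [sample[field] for sample in samples if field in sample]
    return sum(values) / len(values) / 1024


def percent_saved(before_mb, after_mb):
    """Relative reduction between two averages, as a percentage."""
    return (before_mb - after_mb) / before_mb * 100
```

Sampling the status file every few seconds during a run and feeding the parsed samples to `average_mb` gives averages comparable to the ones quoted. Plugging those in, `percent_saved(110, 72)` works out to roughly 35% less VM and `percent_saved(36, 27)` to 25% less RSS.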

Some observations about the general pattern of allocation and collection under 1.2.6: it ‘idles’ much lower than 1.2.4. While some actions always push the memory usage up, 1.2.6 *appeared* to return to its lower memory point much faster, and more regularly.

Anyways, I just wanted to say, props to everyone on the Mono team for rocking my socks.

November 15, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Google Docs Presentations: A Major Disappointment

Google Docs has revolutionized the office suite, namely the word processor. Collaboration is easy, smooth, integrated, and automatic. What’s more, all your documents are accessible from anywhere, and all the common features I need are present. While I don’t really use spreadsheets very often, my few simple instances of using Google Docs for spreadsheets were easy enough. Needless to say, when I heard that a Presentation component was to be added I was excited.

Now, I’m a far cry from a PowerPoint guru (I’ve used it maybe 2 times), but with an upcoming presentation at the Ubuntu Utah user group, I figured I should probably slap a few slides together. Since I want to have a sane contingency for the exploding laptop, forgotten laptop, or limited presentation machine, I figured this would be a great chance to stretch my presenting legs and give it a try. Too bad I’m not that lucky. I was unable to save _any_ changes: I could create a new presentation and modify it as much as I wanted, but any close would lose all my work… lovely (this is in Firefox 2.0, 3.0 trunk, IE7, and Opera 8.24). I’m willing to give Google a few hours and then try again (it’s possible it’s some small downtime, it’s still beta after all ;) ).

Another gripe (as I read the Help to try and find similar reports of such bugs) is that there is a very limited selection of templates. While I might be able to upload new templates authored in PowerPoint or OpenOffice, couldn’t they at least let me change the color schemes? (If I can and I’m just missing how to do it, please share!)

I’ll post an update soon, and let you all know if I had any luck saving anything….

Update: This is fixed, and now it’s kinda cool! I’m going to try it this Saturday at the Ubuntu Utah users group and see how it goes.

November 2, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Google: How do you do it?

So it’s not a big surprise that an oft-requested feature for Beagle is the ability to index a user’s Gmail messages (like Google Desktop Search). Today we (the Beagle developers) started to investigate just how this is done. While POP3 (and now IMAP) are available, using them means downloading all of a user’s mail, indexing it, and then caching the text so we can display it. Now, my initial investigation into GDS for Linux revealed that it was calling home via POP3S and downloading lots of data. I have assumed that it was simply iterating over all messages (via POP3), downloading them, indexing them, and caching the compressed content somewhere in Google’s custom indexes.
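To make the "iterate, download, index" idea concrete, here is a rough sketch of what such a crawler could look like using Python's standard `poplib` and `email` modules. This is only my guess at the general shape, not how GDS or Beagle actually does it; the host, user name and password are placeholders you would have to supply, and the "indexing" step is left to the caller.

```python
import email
import poplib


def extract_text(message):
    """Concatenate the text/plain parts of a parsed message."""
    parts = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def crawl_mailbox(conn):
    """Yield (number, subject, body_text) for every message on a POP3 connection."""
    _response, listings, _octets = conn.list()
    for listing in listings:
        number = int(listing.split()[0])  # each listing looks like b"1 1234"
        _response, lines, _octets = conn.retr(number)
        message = email.message_from_bytes(b"\r\n".join(lines))
        yield number, message.get("Subject", ""), extract_text(message)


def open_connection(host, user, password):
    """Open an authenticated POP3-over-SSL connection (all arguments are placeholders)."""
    conn = poplib.POP3_SSL(host)
    conn.user(user)
    conn.pass_(password)
    return conn
```

Every message comes down in full on each crawl, which is exactly the cost that makes a plain POP3 approach unattractive for large mailboxes.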

Now, I had originally planned on this post being an open plea to anyone and everyone at Google asking them to open up the Gmail access API, but seeing as it’s just the plain old ugly POP3 (maybe with a cool extension), we’re stuck biting the bullet and implementing a remote mail access layer.

Anyways, given how incredible Google has been in a million other situations, I thought I would throw out 2 wildly out-of-this-world questions. I wouldn’t expect to get a response, but before I spend the time figuring it all out, I felt like I should at least ask.

  • Are there some special POP Extensions available in Gmail? Is there some helper web api? Or does GDS really just have a POP3 crawler?
  • Is your compression/text storage library open source? (or documented in some research paper at all?) Beagle has always struggled with how to best handle storing copies of a document’s text so that it might be made available in interfaces. While we do have a new hybrid text cache (text over 4k on the filesystem, under in a sqlite db, all compressed) we are still nowhere near as small as the GDS indexes. A cursory examination reveals that the GDS indexes are some form of b-tree on disk, but how are you compressing all that text so small? Is there some substitution/reconstruction algorithm? (It seems like that would be wildly expensive, but who knows).
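For readers who have not seen a hybrid text cache, here is a small sketch of the idea (text over 4k on the filesystem, smaller text in a sqlite database, everything compressed). It is an illustration only, not Beagle's actual code: the class name, the table layout, the use of zlib and the choice to apply the threshold to the uncompressed size are all assumptions.

```python
import hashlib
import os
import sqlite3
import zlib

THRESHOLD = 4096  # bytes of uncompressed text; larger entries go to the filesystem


class TextCache:
    def __init__(self, directory):
        os.makedirs(directory, exist_ok=True)
        self.directory = directory
        self.db = sqlite3.connect(os.path.join(directory, "textcache.db"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache "
            "(uri TEXT PRIMARY KEY, filename TEXT, data BLOB)"
        )

    def put(self, uri, text):
        raw = text.encode("utf-8")
        blob = zlib.compress(raw)
        if len(raw) > THRESHOLD:
            # Large text: write the compressed bytes to a file, keep only its name.
            filename = hashlib.sha1(uri.encode("utf-8")).hexdigest() + ".z"
            with open(os.path.join(self.directory, filename), "wb") as handle:
                handle.write(blob)
            row = (uri, filename, None)
        else:
            # Small text: store the compressed bytes directly in the database.
            row = (uri, None, blob)
        with self.db:
            self.db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?, ?)", row)

    def get(self, uri):
        row = self.db.execute(
            "SELECT filename, data FROM cache WHERE uri = ?", (uri,)
        ).fetchone()
        if row is None:
            return None
        filename, data = row
        if filename is not None:
            with open(os.path.join(self.directory, filename), "rb") as handle:
                data = handle.read()
        return zlib.decompress(data).decode("utf-8")
```

The split keeps thousands of tiny entries out of the filesystem while sparing the database from very large blobs. Per-document compression like this cannot exploit redundancy across documents, which may be part of why it ends up bigger than a purpose-built index.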

Anyways, it’s a long shot, and it’s pretty far out there, but for the sake of not passing up answers that I can’t seem to find elsewhere on the net, I have asked.

October 24, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Waking Up In the Middle of the Night

It happens... Even running on so little sleep, I still find myself waking.

Fortunately, this time I awoke with an awesome realization. I’ve been pounding my brain against the wall for a week now on how to further refine/increase the accuracy of my original relation-based ranking system. My initial results had been less than stellar when unleashed upon the desktop as a whole. In controlled situations (where my defined relationships’ weights were proportionate and scaled) the results were excellent, but I was hoping this ‘lowest common denominator’ of sorts would be the answer. I was mistaken. After being more or less tossed back to square one, I was less than optimistic to say the least.

However, at 2:30 this morning all that seems irrelevant, as I believe I have determined the key to blazingly accurate desktop search results (specifically over large search sets, on the order of shared drives with thousands of documents, images, e-mails and other media files without any real semantic system to start). In my original design I made the mistake of utilizing fixed-proportion weights for my relationships, a mistake also seen in many ObjectRank-based systems (PDF alert!). By fixed proportion, I mean that an astronomical amount of time has gone into determining how important an ‘author’ relationship is when compared to a ‘creation date’ relationship. I (like many before me) was using a weight × term similarity index type system for each relationship. As a result I was spending tons of time and effort trying to strike the proper balance, and in most cases when I got one situation to work, I completely destroyed another.
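To show what I mean by a fixed-proportion system, here is a toy sketch. The weights and relationship names are invented for illustration and this is not Beagle code; the only thing taken from the design described above is the "weight times term similarity, per relationship" shape.

```python
# Hypothetical hand-tuned weights: the fixed proportions that are so hard to balance.
FIXED_WEIGHTS = {"author": 0.6, "creation_date": 0.3, "parent": 0.1}


def fixed_score(related):
    """Score one hit from (relationship_type, term_similarity) pairs.

    Each related item contributes its term similarity scaled by a weight
    that is the same for every user and every data set.
    """
    return sum(FIXED_WEIGHTS.get(rel, 0.0) * sim for rel, sim in related)
```

On a single-author desktop every document shares the same author, so the 0.6 on `author` inflates every score equally and tells you nothing, while the relationship that actually discriminates (`creation_date`) is stuck at 0.3.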

I think my < sarcasm > brilliant </sarcasm > revelation is becoming obvious, but bear with me.

We cannot pretend that authorship means the same thing to all users. A simple example is the large number of users who still operate relatively isolated desktops, where they are the only author for most of the content. If someone emails them a document, it will have a hard time weighing up. However, creation date/modification date would probably serve as a solid indicator of relationship, as one person can really only work on one thing at a time.

I wish I had something better to show than just this (I’m mostly writing this down so I don’t forget it in the morning :) ), but I’ve determined that we need a deeper dimension to relationship weighting (when scoring). While one possibility is to just add another variable to our existing weight-determination system, I am leaning towards something more broad. What if the programmer only had to specify a relationship, and the system built an individualized weight for each relationship from a combination of its occurrence, how closely it paralleled term-based similarity, and how often that relationship type was used to rank a selected result (this would require GUI integration, but for this proof of concept that’s OK in my head)?
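Since nothing is codified yet, the following is purely a thought experiment in code. It tracks the three signals mentioned above per relationship type and combines them into a per-user weight; the class, the method names and especially the plain average used to combine the signals are placeholders, not a worked-out design.

```python
from collections import defaultdict


class RelationshipWeights:
    """Learn a per-user weight for each declared relationship type."""

    def __init__(self):
        self.seen = defaultdict(int)              # how often each relationship occurs
        self.similarity_sum = defaultdict(float)  # summed term similarity of linked items
        self.clicked = defaultdict(int)           # selected results it helped rank
        self.total_clicks = 0

    def observe(self, rel, term_similarity):
        """Record one occurrence of a relationship between two indexed items."""
        self.seen[rel] += 1
        self.similarity_sum[rel] += term_similarity

    def record_click(self, rels_behind_result):
        """Record that the user selected a result ranked via these relationships."""
        self.total_clicks += 1
        for rel in set(rels_behind_result):
            self.clicked[rel] += 1

    def weight(self, rel):
        count = self.seen.get(rel, 0)
        if count == 0:
            return 0.0
        occurrence = count / sum(self.seen.values())
        agreement = self.similarity_sum[rel] / count
        feedback = self.clicked.get(rel, 0) / self.total_clicks if self.total_clicks else 0.0
        # Placeholder combination: an unweighted mean of the three signals.
        return (occurrence + agreement + feedback) / 3
```

The point is that a backend author would only call `observe` with a relationship name; the balance between `author` and `creation_date` would then fall out of each user's own data rather than out of a hand-tuned table.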

All of a sudden, the massive programmer burden of a relational ranking system is removed! (It takes a lot of specific code to handle each relationship and its weights/different characteristics properly.) While there would be a massive up-front cost to tweaking and tuning the system which determines those individual relationship weights, it would be time well spent: as new data types/sources are added, there is no additional work beyond declaring/mapping the relevant relationships.

Once the sun has actually risen, I’ll try to start the process of actually codifying what I’m trying to say. If I’ve actually made enough sense that anyone understands what I’m getting at and has any thoughts/comments/criticisms, please share!

October 3, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Building More Relationships in Beagle

Today I checked in a few fun changes to Beagle focused on the idea of emphasizing relationships between entities. It doesn’t sound like a whole lot of fun, but it’s kinda nifty.

New Query Context Options

  1. Find Documents by same author.
  2. Find E-mails from same contact.
  3. Find Pages from same site.

In addition (building upon Beagle’s new External Metadata system) I have added support for the tracking of Firefox downloads to files. A file downloaded with Firefox has an extra property (beagle:Origin) which denotes the URL it was downloaded from. I haven’t started to integrate anything on the UI side with this new information, as I want to add support for Epiphany, Opera, and Konqueror first. Eventually, I would love to see this kind of mapping for downloaded mail attachments, but that’s a little more difficult.
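As a rough illustration of what the origin property makes possible, here is a sketch of a "same site" lookup over plain dictionaries. It does not use Beagle's API or query language; the hit records and the function are hypothetical, and only the `beagle:Origin` property name comes from the change described above.

```python
from urllib.parse import urlparse


def same_site(hit, candidates, prop="beagle:Origin"):
    """Return the candidates whose origin URL has the same host as the hit's."""
    site = urlparse(hit.get(prop, "")).netloc
    if not site:
        return []
    return [
        other
        for other in candidates
        if other is not hit and urlparse(other.get(prop, "")).netloc == site
    ]
```

The same pattern would cover the other context options: swap the property for an author or sender field and compare the values directly instead of the host name.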

Anyways, this is more work towards my eventual goal of a ranking system based upon relationships (among desktop data). I know that no feature-centric blog post is complete without screenshots, so I present:

[Screenshot: Original Query]

[Screenshot: The Resulting Query]

Beagle’s powerful and simple query language makes stuff like this really easy; it’s just a matter of knowing what properties warrant special treatment like this. I’m open to ideas, what

September 25, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Relationships in the Desktop - Relational Desktop Search and Beagle

I’ve been working on and off on a writeup concerning the use of Beagle to build an intelligent ‘rank’ for desktop entities. Or, in short, a ranking system (not unlike PageRank or the like) to organize desktop search results by far more than just keyword/date. I know the writing sucks, and it’s not 100% complete yet. In addition, I don’t have much in terms of code to share (yet).

To summarize (for those lazybones out there), I’m thinking of utilizing fairly universal and constant relationships (Creator, Creation Date, Modification Date(s), Parent/Source, and maybe others) to recurse deep into desktop relationships. By adding relevancy to the root hit for every child it has (logarithmically decreased with each level of recursion) we can have far more accurate desktop search results when querying a simple keyword/phrase. In addition, the children of a hit could often be considered hits themselves, if found in enough ‘root’ hits.
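Here is a tiny sketch of the recursion idea, since it is easier to read as code. The graph representation, the depth limit and the exact decay (each child found at depth d adds 1 / log2(d + 1)) are my own placeholder choices for illustration; the writeup does not pin any of them down.

```python
import math


def relevance(root, children_of, base_score, max_depth=3):
    """Boost a root hit's score by the items related to it.

    children_of maps an item to the items it is related to. Each newly
    reached item at depth d adds 1 / log2(d + 1), so direct children count
    fully and deeper descendants count progressively less.
    """
    score = base_score
    seen = {root}
    frontier = [root]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for node in frontier:
            for child in children_of.get(node, ()):
                if child not in seen:  # guard against cycles in the relationships
                    seen.add(child)
                    next_frontier.append(child)
        score += len(next_frontier) / math.log2(depth + 1)
        frontier = next_frontier
    return score
```

Counting how many different roots reach the same child during these walks would be one way to promote children to hits in their own right.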

It’s a loose and patchy idea, and miles from a realistic implementation. However, thanks to the awesomeness of Lucene, comparing 2 in-index documents for textual relevancy (based on term frequency) is not impossible. (I have not considered the performance elements of these comparisons yet; they may be too slow to be realistic without serious optimization.)
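To give a feel for what a term-frequency comparison between two documents involves, here is a standalone sketch using cosine similarity over raw term counts. It does not use Lucene or reproduce its scoring; the tokenizer and the choice of cosine similarity are assumptions made for the example.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphanumeric terms in a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def tf_cosine(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    a, b = term_frequencies(text_a), term_frequencies(text_b)
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

Doing this pairwise across every related document is where the performance worry comes from: the cost grows with both the number of relationships followed and the size of each document's term vector.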

Anyways, I’m working on it in Google Docs, so you can check out the full document here. I’ll post once I’ve finished my research/planning etc.

Please, share your thoughts! This is in the ‘major brainfart’ stage, so it’s open to whatever from anyone. I want to hear ideas!

September 6, 2007

Kevin Kubasik
nonic
For Once I Oneder
» Updated Beagle Packages for Gutsy Available

Beagle support in Ubuntu has been less than stellar up until this point (across all releases), and unfortunately, the best that we can really hope for in the immediate future is acceptable. This is mostly because only a few of Beagle’s developers are running Ubuntu, and accurately reproducing common errors is difficult. To top this all off, the de facto Ubuntu contact at this point is me, and I haven’t had the available time to really track down some of the more difficult bugs.

However, this problem reached an all-time low when the beagle source package stopped building in Gutsy. This spurred us into action (our urgency increasing as we realized how close Gutsy was to shipping) and as a result there are updated Ubuntu Gutsy packages (based upon the new 0.2.18 bugfix release of Beagle) available for testing. Thanks to Launchpad’s new super-awesome Personal Package Archive system, you only need to add the following sources, or download from the corresponding link. (NOTE! The versioning of these debs will not force an update if they are accepted into main; you will need to reinstall should they be accepted at their current version number!)

deb     http://ppa.launchpad.net/kkubasik/ubuntu gutsy main 
deb-src http://ppa.launchpad.net/kkubasik/ubuntu gutsy main 

Please report bugs with these packages either to Beagle in launchpad or the dashboard-hackers mailing list. The more feedback we get in the next few days the better the chance that Ubuntu Gutsy will ship a solid Beagle.  

 

October 31, 2007
» Gnome devs too lazy for python?

Some interesting thoughts about testing and python found among gnome developers. Joe Shaw was explaining how most new gnome apps these days are written in either python or C#. They choose C# for their app (a href="http://www.beagle-project.org/"