A Django site.
August 6, 2008

Steve Dibb
beandog
wonkablog
» postgres and mysql comparison paper

I’ve been job hunting, and while my dream job would be somewhere that uses PostgreSQL, I am having an extremely hard time finding anyone that uses it. So, I think my chances might be better actually getting a company to convert to using it instead. In doing that, I’ve started outlining a draft of a paper that I can present to both lead programmers, database administrators, and management on the pros of using PostgreSQL over MySQL. If anyone has some ideas that I could add in, I would appreciate it.

Here’s the general principles I already plan on covering: foreign key support, data types, transactions, shell interface, ANSI SQL support, table types, general features, history, licensing, abstraction layers (using PHP).

Also, and I don’t mean to sound like I’m spreading FUD, but it occurred to me this morning that I’ve never heard anyone say that MySQL is better than PostgreSQL.

Anyway, ideas welcome. I’ll post my progress as I get the paper put together. This is something I’ve been meaning to do for a long time.

April 17, 2008

John Anderson
sontek
sontek ( John M. Anderson )
» Printing in GTK#

I just finished porting Tomboy’s print code to GTK so that we would be more portable and I thought other Mono apps looking to move from libgnomeprint or wanting to add printing support might find a code example helpful.

First, To start printing you do something like:

private void PrintButtonClicked (object sender, EventArgs args)
{
Gtk.PrintOperation op = new PrintOperation ();
op.BeginPrint += new BeginPrintHandler (OnBeginPrint);
op.DrawPage += new DrawPageHandler(OnDrawPage);

op.Run (Gtk.PrintOperationAction.PrintDialog, this.Window);
}

after the PrintOperation is set off you need to handle the BeginPrint event. The main things that need to be done are finding out how many lines will fit on a page and how many lines you have:

public void OnBeginPrint(object sender, Gtk.BeginPrintArgs args)
{
PrintOperation op = (PrintOperation)sender;
lines_per_page = (int)Math.Floor ((double)args.Context.Height / (double)font_size);
Gtk.TextIter start_iter, end_iter;
this.Buffer.GetBounds (out start_iter, out end_iter);
lines = this.Buffer.GetText (start_iter, end_iter, false).Split ('\n');
op.NPages = (int)Math.Ceiling ((double)lines.Length / (double)lines_per_page);
}

Finally, now that you have the printing setup, you need to actually render the data to be printed:

public void OnDrawPage(object sender, Gtk.DrawPageArgs args)
{
PrintOperation op = (PrintOperation)sender;
Cairo.Context cr = args.Context.CairoContext;

int line = args.PageNr * lines_per_page;
int num_lines = 0;
if (args.PageNr+1 != op.NPages)
num_lines = line + lines_per_page;
else
num_lines = lines.Length;

cr.MoveTo (0, 0);

for (int i = 0; i < lines_per_page && line < num_lines; i++)
{
Pango.Layout layout = args.Context.CreatePangoLayout ();
Pango.FontDescription desc = Pango.FontDescription.FromString (”sans ” + font_size);
desc.Size = (int)(font_size * Pango.Scale.PangoScale);
layout.FontDescription = desc;

layout.SetText (lines[line]);
Pango.CairoHelper.ShowLayout (cr, layout);
cr.RelMoveTo (0, font_size);
line++;
}
}

This does not take into account styles but will give you the basic idea of what needs to be done.

March 23, 2008

Steve Dibb
beandog
wonkablog
» prepared statements and stored procedures

I’m still working on cleaning up the import scripts for GPNL, and I’m going to have to start using PHP’s PDO database layer to connect to an SQLite3 database at one point.

I haven’t used it yet, but I had heard it was coming in PHP 5 for a while. Personally, I’ve always used PEAR::DB and was quite happy with that.

I’m still not sold on using the new layer anyway, but I figured I’d do some reading while I am getting ready to use it in this very small instance that I’m implementing.

On the docs page, I found a great summary of why prepared statements and stored procedures are handy and helpful. In short: they save you time for queries you have to repeat a lot, by pre-compiling the preparation that is common to all the queries, so that the database is really only processing the new data, and thus using less resources.

Prepared statements I haven’t played with much before until a few weeks ago, but I’ve slowly started using them in my import scripts. Performance-wise, I’ve only seen about a 15 to 20 percent speed increase. The thing I like the most about them, though, is that I don’t have to escape my strings anymore. That’s a nice little advantage I can live with.

Anyway, php.net’s PDO documentation page has a nice writeup as well, and instead of trying to summarize it myself any more, I’ll just quote it verbatim:

Many of the more mature databases support the concept of prepared statements. What are they? You can think of them as a kind of compiled template for the SQL that you want to run, that can be customized using variable parameters. Prepared statements offer two major benefits:

  • The query only needs to be parsed (or prepared) once, but can be executed multiple times with the same or different parameters. When the query is prepared, the database will analyze, compile and optimize it’s plan for executing the query. For complex queries this process can take up enough time that it will noticeably slow down your application if you need to repeat the same query many times with different parameters. By using a prepared statement you avoid repeating the analyze/compile/optimize cycle. In short, prepared statements use fewer resources and thus run faster.
  • The parameters to prepared statements don’t need to be quoted; the driver handles it for you. If your application exclusively uses prepared statements, you can be sure that no SQL injection will occur. (However, if you’re still building up other parts of the query based on untrusted input, you’re still at risk).

» nice mysql vs postgres summary

I was googling for a postgresql image I could use when I found this page, a nice short summary on the differences between MySQL and PostgreSQL with an emphasis on development policy.

I should mention that I’m linking to it because I agree with the author and also because I’m biased towards PostgreSQL. I prefer postgres not because of fanboyism, but because of experience and years of using both databases.

I was actually lucky enough to be trained to use PostgreSQL as the first database I ever used, and everything after that has never been able to duplicate its feature set. Since my first tech job, I’ve worked with Access, MySQL, SQL Server 2000 and SQLite.

Anyway, I love postgres. If you’ve never given it a chance, and you are looking for more advanced features, check it out. It’s all that and a box of girl scout cookies. I tell you what.

March 6, 2008

Phil Windley
pjw
Phil Windley's Technometria
» CouchDB from 10,000 Feet

Jan Lehnardt and Damien
Katz
Jan Lehnardt and Damien Katz
(click to enlarge)

Damien Katz and Jan Lehnardt are talking about CouchDB. My students have mentioned it several times and we've had brief discussions about it, but I've never spent much time on it. This seemed like my chance. CouchDB's goal is a simple, non-relational database.

Damien started the CouchDB project after working for a number of years on the Lotus Notes project. He loved the document model of the data store (as did a lot of other people). He wanted an open source version of that model and CouchDB was born.

In real life, most data is document centric--not relational. A business card has all the data on it. The downside is that of your job title changes, a self-contained document model doesn't update that (it's not a separate table). On the other hand, more and more documents are starting to contain references to other data (URLs) which makes up for this in some cases.

CouchDB documents are in a JSON format. If you're not familiar with JSON, it's an XML-like format for storing data, but without the angle brackets. It's easier for people to read and write. It's not a substitute for XML, but it's great when just just need simple structured data. JSON is widely supported.

CouchDB uses an HTTP API. This allows CouchDB to make use of existing caches, load balancers, and analyzers. You can use curl to drive CouchDB from the command line or HTTP libraries for various languages to use it.

CouchDB views allow you to filter, collate, and aggregate data. Views are powered by Map/Reduce. The map stage processes key/value pairs to produce intermediate values and reduce then combines intermediate values for particular key. Map/Reduce is inherently parallelizable making it useful on clusters of machines.

CouchDB is designed to be easily replicated and supports synchronizing machines.

Disks are getting cheaper and machines are being built with more and more cores. That makes a model like CouchDB uses very appealing. CouchDB is written in Erlang and provides a non-locking MVCC and ACID compliant data store.

There are some bonus features: Lucene is integrated for fulltext search and CouchDB also provide JSON searching using JSearch, a wrapper on Lucene for JSON structures.

CouchDB has been accepted for incubation as an Apache project and uses the Apache license.

Tags: etech etech08 databases

March 4, 2008

Phil Windley
pjw
Phil Windley's Technometria
» Amazon's SimpleDB

Jay Ridgeway
Jay Ridgeway from Nextumi
(click to enlarge)

This afternoon, I was torn between the session on botnets and one on Amazon's SimpleDB by Mike Culver and Jay Ridgeway. I chose the latter.

The goal is a durable, flexible datastore at a cheap price: $0.14 per machine house, $0.10/Gb into the cloud and $0.18/Gb out.

The API call list is short. Domains are used to partition data. You can think of them as tables, that helps. To add something to a domain you use this syntax:

PUT (item, 123), (description, Sweater), (color, Red), (color, Blue)

The first name-value tuple is the name of the row and needs to be unique. The remaining tuples are attributes and names can be repeated to represent a attribute with multiple values. There are no datatypes. Everything is a string.

A query looks like:

Domain = MyStore
['description' = 'Sweater']

Note that this isn't SQL. :-)

There's a Javascript application called SimpleDB Scratchpad that can be used to play with SimpleDB. All you need is your AWS key.

Jay Ridgeway from Nextumi took the mic to talk about their experience using SimpleDB to implement ShareThis. They've made heavy use of SimpleDB. He concluded with the following list of downsides and upsides. On the downside:

  • Limited features
  • minimal toolset and documentation
  • no experience in house
  • high switching cost

On the upside:

  • zero software cost
  • minimal staff costs
  • low barrier to development
  • responsive and reliable
  • simple, pragmatic solution for a complex problem.

Nextumi does maintain a copy of the raw data in case Amazon ceased to exist for some reason, but using it would obviously require some redesign of their site. I wonder if anyone has created the SimpleDB API on top of BerkeleyDB or MySQL? That would be handy.

SimpleDB doesn't handle binary data well. The best thing is to put binary data in S3 and put a reference to it in SimpleDB.

Tags: etech etech08 aws amazon databases

December 14, 2007

Phil Windley
pjw
Phil Windley's Technometria
» Amazon's SimpleDB

I just posted piece at Between the Lines on Amazon's latest announcement: SimpleDB, a database service in the cloud. I gave it the title "Economics that are impossible to stop" because that what I think Amazon's doing: changing the whole economic model of how people build large scale distributed applications.

Tags: amazon web+services databases cs462

» Amazon's SimpleDB

I just posted piece at Between the Lines on Amazon's latest announcement: SimpleDB, a database service in the cloud. I gave it the title "Economics that are impossible to stop" because that what I think Amazon's doing: changing the whole economic model of how people build large scale distributed applications.

Tags: amazon web+services databases cs462

September 30, 2007

John Anderson
sontek
sontek ( John M. Anderson )
» Setting up Postgresql on SUSE

To install postgresql you just need to use the SUSE package manager, either from yast or from zypper:

zypper in postgresql-server
or
yast -i postgresql-server

By default Postgresql on SUSE is setup in ‘ident’ mode, which matches your SQL user with your unix user, so only the unix user ’sontek’ can login to the database with the username ’sontek’. This is great security but is confusing the first time you are setting up the server because you wont be able to login with your user or root.

What you have to do is switch your user to postgres :
su - postgres

and then you will be able to setup your own user account: 
createuser ’sontek’ 

and now your user will be able to login to the postgresql server with your account.

The other issue with ident is now only unix users will be able to connect to your database server, so if you are a programmer and need to write an application that connects to the database you will need to setup a non-ident user. To do this you first need to create a user with a password:

createuser ‘appuser’ –pwprompt -E

This will create a user named ‘appuser’ and will prompt you for a password that will be encrypted.

and then add this line to your /var/lib/pgsql/data/pg_hba.conf file:

host   all        appuser         127.0.0.1/32          md5

This will setup your server to  allow ‘appuser’ to be authenticated via an md5 password. After you modify the pg_hba.conf file you will need to restart postgresql and then you are all set to start working with postgresql on SUSE!

/etc/init.d/postgresql restart