Washington Apple Pi

A Community of Apple iPad, iPhone and Mac Users

Using PostgreSQL on the Mac

© 2003 by Rick Rodman

Washington Apple Pi Journal, reprint information

In business environments, Macs tend to be used in creative departments, for writing, graphics or publishing. Business applications, such as accounting and databases, tend to be done on PC platforms. With the introduction of Mac OS X, this is changing; the Mac is becoming a suitable platform for serious business computing. One database management system (DBMS) that has recently become available on the Mac is PostgreSQL.

PostgreSQL is not a standard Mac application. It comes from the Unix world, and does not have a graphical user interface; you communicate with it mostly through Terminal. Once you've installed it, you are likely to ask: Where is it? How do I run it?

This article will try to show how PostgreSQL can be used on a Mac. The process will require a generous dollop of geekspeak, so put on your lab coat, pocket protector and horn-rim glasses. You won't need your hex calculator, but you can have it nearby for moral support.

PostgreSQL is, like most modern DBMSs, a "client-server" system, even when running on a single Mac. There is a server task called “postmaster” which does all the heavy lifting. Client programs supply requests to it, and the server returns data.

Installing PostgreSQL on Mac OS X is very much like installing it on any other Unix platform. One minor pain is that Mac OS X is a little non-standard from a Unix standpoint. User directories are under /users instead of /usr, /tmp is a link to /private/tmp, etc. An already-compiled, ready-to-install version of PostgreSQL is available (see references at end of article).

Starting the Server

The server is a program called "postmaster" which runs as user "postgres". Here's how it gets started up:

[number26:/users/postgres] rickrodm# su - postgres [number26:~] postgres% [number26:~] postgres% /usr/local/bin/postmaster -D /usr/local/pgsql/data & [1] 520 [number26:~] postgres% LOG: database system was interrupted at 2003-12-25 10:17:15 EST LOG: checkpoint record is at 0/8341AC LOG: redo record is at 0/8341AC; undo record is at 0/0; shutdown TRUE LOG: next transaction id: 510; next oid: 16980 LOG: database system was not properly shut down; automatic recovery in progress LOG: ReadRecord: record with zero length at 0/8341EC LOG: redo is not required LOG: database system is ready

Queries

Now that the database server is running, how does one do anything with it? With a program called “psql”. Still logged in as postgres…

[number26:~] postgres% createdb postgres CREATE DATABASE [number26:~] postgres% psql Welcome to psql 7.3.3, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit postgres=#

PostgreSQL is a Structured Query Language (SQL) database. SQL was designed and standardized as a query language, intended to be simple enough that managers could write their own simple reports (not all managers, of course, only those with functioning cerebral cortices). Just so we don't lose anybody, a very brief SQL introduction will follow; SQL gurus may skim.

For the demonstration purposes, we will create a simple author/story database. The database will consist of four tables, Authors, Stories, Characters, and Characters_in_Stories. The Data Definition Language (DDL), which defines the structures, and the Data Modification Language (DML), which inserts the data, are given in sidebars.

The Authors table has a Name field and an Author_ID field. The Author_ID is a "serial" field that sequentially generates values as they are inserted into the table. The serial field uses an implied sequence, generating unique key information in a way similar to that used in Oracle, but with a PostgreSQL-unique syntax: you don't specify anything for the serial field.

postgres=# insert into authors ( name ) values ( 'Rex Stout' ); INSERT 17038 1

Here are the Authors:

postgres=# select * from authors; author_id | name -----------+----------------- 1 | Agatha Christie 2 | Rex Stout (2 rows)

The Stories table has a Title field, a Length field, a Story_ID field, and an Author_ID field. The Story_ID field is a serial field, but the Author_ID field contains the Author_ID from the Authors table. In this way we can "join" Authors to Stories. The sample data is a small group of Agatha Christie stories. Here are the Stories:

Here's how we join Authors to Stories:

postgres=# select stories.title, authors.name postgres-# from stories, authors postgres-# where stories.author_id = authors.author_id; title | name ------------------------+----------------- A Pocket Full of Rye | Agatha Christie Funerals are Fatal | Agatha Christie The Tuesday Night Club | Agatha Christie (3 rows)

Similarly, there have a Characters table. It has a Name field and a Character_ID field just like Authors.

We could have put a Character_ID field on the Stories table, but then we could have had only one value there - only one character per story. Of course we're not going to list all the characters, but there are Agatha Christie stories in which both Hercule Poirot and Ariadne Oliver appear. I would like as many characters to be joinable to a story as possible. The solution is an intermediate table, Characters_in_Stories, which contains a Story_ID and a Character_ID. To connect any character to any story, you insert a record in Characters_in_Stories.

To get a list of stories in which Hercule Poirot appears, we (a) join Stories to Characters_in_Stories, (b) join Characters_in_Stories to Characters, (c) look for a match in Characters to "Hercule Poirot". Expressed in SQL, it looks like this:

postgres=# select title, length from stories, characters_in_stories, postgres-# characters where stories.story_id = characters_in_stories.story_id postgres-# and characters_in_stories.character_id = characters.character_id postgres-# and characters.name = 'Hercule Poirot'; title | length --------------------+-------- Funerals are Fatal | 186

Simple, eh? But, if you've looked at the DML sidebar, you’ve probably noticed that entering data with insert statements is pretty cumbersome, especially since you have to type each of the names perfectly each time. You could look up the IDs, sure – but then you might insert incorrect data if the IDs should be different.

Stored functions

PostgreSQL implements a built-in programming capability, PL/pgSQL, which is very similar to Oracle's PL/SQL. This can simplify data entry and automate functions. I wrote a function, add_story, which adds a story and its main character, listed in one of the sidebars.

select add_story( 'Agatha Christie', 'The Tuesday Night Club', 'Jane Marple', 12 ) from dual;

With additional work, this function could add multiple characters.

Programming APIs

All of the above commands were executed in the Terminal, using the psql client program. For an end-user application, Terminal is probably not an acceptable user interface. There are some folks who have written user-friendly front-ends for PostgreSQL for the Mac; a search on versiontracker.com should turn up some. For simple applications, these may do the trick; but, as I mentioned, PostgreSQL is not really an engine for simple applications.

A C API, called "libpq", is available for programming applications which access PostgreSQL databases. It should be possible to use this API from Cocoa, since Cocoa allows use of straight C; however, I haven't tried this as yet.

PostgreSQL also supports use of Java Database Connectivity (JDBC), for cross-platform applications written in Java. Java works well on the Mac, and applications written in Java can be moved in binary ("jar file") form and run on Windows, Solaris and other systems. Java's GUI, "Swing", supports most "widgets" and can produce programs with acceptably attractive user interfaces.

Compared to Oracle and others?

PostgreSQL is sometimes called the poor man’s Oracle. A lot of the features that Oracle programmers use regularly have been implemented in irritatingly different ways, though, or are still missing from PostgreSQL. Another concern, when considering large databases, is the "object ID", which is currently a 32-bit integer which must uniquely identify all objects which exist and have ever existed in the database. Four billion objects may seem like a lot, but large databases often have a billion rows or more in a single table. (A 64-bit G5 version may eliminate this concern.) For an enterprise-level database, PostgreSQL's tools are weak, too.

MySQL is often mentioned in comparison with PostgreSQL; but MySQL is targeted at a different type of application. MySQL is very fast, but only allows very simple SQL, no subqueries. It is often used in web sites where data is actually maintained and updated by other means. MySQL does its particular job very well, and it's often the best tool for the job.

PostgreSQL is growing into a more capable, universal DBMS. It will eventually become comparable to Oracle – quite possibly, much better. It's usable today, but watch out for its quirks, and don't plan on tables becoming very large.

Conclusion

Mac OS X's Unix base means that Macs can now become solid players in enterprise business systems, and its Mac heritage gives us powerful tools for graphics and publishing. PostgreSQL may become a big part of the Mac's future.

Where to get PostgreSQL?

You can find PostgreSQL at Sourceforge or at Versiontracker. I downloaded the software from:
http://www2.entropy.ch/download/pgsql-7.3.3.pkg.tar.gz

Detailed instructions are at:
http://techdocs.postgresql.org/installguides.php#macosx

Sidebar – Why a relational database?

With all the “geekspeak” and complexity in this article, you must wonder, why not just use FileMaker or HyperCard? They have built-in forms and reports capability, they’re graphical and easy to use. However, such flat-file databases fall down in multiuser environments. Further, they lack the relational capability.

A relational database allows joins, or relations, between different tables. This allows the tables to be normalized. “Normalization” means that each given piece of data exists in only one place.

In the database I set up in the main text, notice that each author’s name exists in one place only. If I wish to change “Agatha Christie” to “Agatha Christie Mallowan”, I need to edit one record only, and all of my queries and reports will automatically get the updated data. In a FileMaker or HyperCard database, I’d have to select all the records with “Agatha Christie” on them, and update all of them - and if I make a typo, a record might become forever invisible.

Relational databases usually also separate the user interface from the storing of data. The DBMS (database management system) concentrates solely on effective storage and retrieval of data. Big Oracle and DB2 databases often grow into billions of records.

Sidebar - Data Definition Language (DDL) for Authors database

create table authors ( author_id serial, name varchar( 40 ) ); create table stories ( story_id serial, author_id int not null, title varchar( 40 ), length int, review int, comments varchar( 40 ) ); create table characters ( character_id serial, author_id int not null, name varchar( 40 ) ); create table characters_in_stories ( story_id int not null, character_id int not null );

Sidebar - Data Modification Language (DML) for Authors database

-- insert data for authors insert into authors ( name ) values( 'Agatha Christie' ); insert into authors ( name ) values( 'Rex Stout' ); -- insert data for stories insert into stories ( author_id, title, length, review, comments ) select author_id, 'A Pocket Full of Rye', 186, 5, 'Very good' from authors where name = 'Agatha Christie'; insert into stories ( author_id, title, length, review, comments ) select author_id, 'Funerals are Fatal', 186, 5, 'Very good' from authors where name = 'Agatha Christie'; -- insert data for characters insert into characters ( author_id, name ) select author_id, 'Hercule Poirot' from authors where name = 'Agatha Christie'; insert into characters ( author_id, name ) select author_id, 'Jane Marple' from authors where name = 'Agatha Christie'; -- insert data for characters_in_stories insert into characters_in_stories ( story_id, character_id ) select story_id, character_id from stories, characters where stories.title = 'A Pocket Full of Rye' and stories.author_id = characters.author_id and characters.name = 'Jane Marple'; insert into characters_in_stories ( story_id, character_id ) select story_id, character_id from stories, characters where stories.title = 'Funerals are Fatal' and stories.author_id = characters.author_id and characters.name = 'Hercule Poirot'; Sidebar - PL/pgSQL function for adding a story [Courier] create or replace function add_story( varchar, varchar, varchar, int ) returns int as ' declare v_author alias for $1; v_title alias for $2; v_character alias for $3; v_length alias for $4; begin insert into stories ( author_id, title, length, review, comments ) select author_id, v_title, v_length, null, null from authors where name = v_author; insert into characters_in_stories ( story_id, character_id ) select story_id, character_id from stories, characters where stories.title = v_title and stories.author_id = characters.author_id and characters.name = v_character; return 1; end; ' language 'plpgsql'; Sidebar - Java code snippet for accessing PostgreSQL [Courier] try { DriverManager.registerDriver( new org.postgresql.Driver()); Connection db = DriverManager.getConnection( "jdbc:postgresql:postgres", "postgres", "" ); Statement stmt = db.createStatement(); ResultSet rset = stmt.executeQuery( s ); if( rset == null ) { System.out.println( "rset is null" ); } else while( rset.next() ) { String s2 = rset.getString( 1 ); System.out.println( s2 ); } rset.close(); stmt.close(); } catch( SQLException e ) { System.out.println( "exception: " + e.getMessage() ); }