Washington Apple Pi



From HTML to PDF

Adobe Acrobat Captures the Web

By Dennis Dimick

Washington Apple Pi Journal, November/December 1999, pp. 23-25

Imagine having a web browser that captures into a single file on your Mac every web page you view, or if you choose, every page of a website. Also imagine opening that captured file in Adobe Acrobat and seeing something that looks and acts just like a website. The links work, and the text, images, and graphics are all rendered properly.

That's what Adobe Systems has finally come up with in a promised set of enhancements for Acrobat 4.0. This new set of free plug-ins for Adobe's electronic document system may change the way you think about and work with the World Wide Web. Further, Adobe has just released a "Create Adobe PDF" desktop printer utility that now allows drag-and-drop creation of PDF from most any program via Acrobat 4.0.

As of this late September writing, a pre-release set of plug-ins available at Adobe's website, http://www.adobe.com, now allows capture of whole or partial web sites to Acrobat's increasingly ubiquitous PDF (Portable Document Format) files. Of course there are limitations, and these beta plug-ins aren't totally stable. But they do work, and work fairly well.

The new plug-ins also enable you to add "digitally-verified" signatures to Acrobat files, to compare versions of PDF documents side-by-side, and to automatically attach PDF files to e-mail. The digital signature verification feature opens the way to creating PDF files on the Mac that can substitute for paper-based legal and commercial documents. All these features have been available on Windows for several months.

Acrobat's "Web Capture" feature is the most intriguing of these, and that's what I'll detail here. I'll also describe the "Create Adobe PDF" utility that was just released, and look briefly at its usefulness. If you're interested in the other plug-in features, Adobe's web site has lots of information you can download to a PDF file on your Mac using Acrobat 4.0 and web capture.

Admission Price: OS 8.6, Fast Hardware

Adobe lists the minimum requirement for the web capture feature as a Power Macintosh running Mac OS 8.6 (Acrobat 4.0 itself requires only Mac OS 7.5.3). I've successfully run web capture on a PowerMac 7100/80 with a dial-up Internet connection, but faster Macs and a speedy Internet connection make a huge difference.

More RAM is better, as captured web pages apparently are held as temporary files in RAM until you save a PDF file to disk. As with other demanding tasks, giving more RAM to the Acrobat application itself speeds the process and allows larger site captures.

Acrobat's initial capture of pages from remote web servers is only one aspect of this process. It's also possible to capture web sites from local drives or CD-ROMs. You can select how many "layers down" you want the web capture feature to go into a target site, and you can also choose to capture a whole web site. Several layers of big web sites can mean thousands of pages captured to a single PDF file that exceeds 50 MB.

Capture to Fit: When you set up Acrobat's Web Capture Plug-In, you can configure margins and page orientations. It's also possible to scale contents of the captured pages so everything will fit into the page dimensions you set.

To understand "layers" of a web site, envision a pyramid. The home page or opening page of a site equates to the single pointed stone at the top of a pyramid. Each layer of stones down from the point stone equates to each layer of a web site behind the home page. The deeper you go, the more information is stored in each descending web layer, just as the lower layers of a pyramid contain more stones, and area, than the layer just above.

If you choose to capture just one layer of a web site, typically all you get is one page, the home page. A two-layer capture will also get all other pages that can be linked to directly from the home page. A three-layer capture will also bring in pages directly linked to from pages on the second layer, and so on. The lower into a site you dig, the more pages you typically find in each layer.
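The layer idea described above amounts to a breadth-first walk of a site's links with a depth limit. What follows is a minimal sketch in Python, not a description of how Acrobat itself is implemented. The `SITE` link map and the `capture_layers` function are hypothetical names made up for illustration, and the walk runs over an in-memory table rather than a real web server.

```python
# Hypothetical site map: each page lists the pages it links to.
SITE = {
    "home": ["news", "about"],
    "news": ["story1", "story2", "home"],
    "about": ["staff"],
    "story1": [],
    "story2": ["archive"],
    "staff": [],
    "archive": [],
}


def capture_layers(site, start, max_layers):
    """Return the pages reachable within max_layers, grouped by layer.

    Layer 1 is the start page alone, layer 2 is every page linked
    directly from it, and so on. Each page is captured once, in the
    first layer where it is found.
    """
    layers = []
    seen = {start}
    current = [start]
    while current and len(layers) < max_layers:
        layers.append(current)
        next_layer = []
        for page in current:
            for link in site.get(page, []):
                if link not in seen:
                    seen.add(link)
                    next_layer.append(link)
        current = next_layer
    return layers


if __name__ == "__main__":
    for depth, pages in enumerate(capture_layers(SITE, "home", 3), start=1):
        print(f"Layer {depth}: {pages}")
```

With this made-up site, a one-layer capture returns only the home page, and a three-layer capture stops short of the "archive" page, which sits on a fourth layer.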

As a web capture test, I set up a PowerMac G3/333 to grab a five-layer-deep piece of the site of The Sierra Club, a San Francisco-based environmental advocacy group (http://www.sierraclub.org). Acrobat had reeled in 6883 pages when I stopped the capture after two hours. It then took this G3/333 four hours to save the PDF to disk. The Mac had 90 megabytes of RAM allocated to Acrobat 4.0 and the resulting PDF file was 56 MB in size.
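To put those test figures in perspective, the short Python calculation below turns them into rough averages. It uses only the numbers reported above. The one assumption is that a megabyte is counted as 1024 KB, and the results are averages over a capture that was stopped early, not benchmarks.

```python
# Figures reported for the Sierra Club test capture.
pages = 6883
capture_hours = 2
save_hours = 4
pdf_megabytes = 56

pages_per_minute = pages / (capture_hours * 60)
save_seconds_per_page = (save_hours * 3600) / pages
kilobytes_per_page = (pdf_megabytes * 1024) / pages  # assumes 1 MB = 1024 KB

print(f"Capture rate: {pages_per_minute:.1f} pages per minute")
print(f"Save time: {save_seconds_per_page:.1f} seconds per page")
print(f"Average size: {kilobytes_per_page:.1f} KB per page")
```

That works out to roughly 57 pages captured per minute, about 2 seconds per page to save, and a little over 8 KB of PDF per page.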

Despite the time involved, the links on the PDF-based pages do work, and the pages look just like Sierra Club's real web site. Apparently the long save time is needed so that Acrobat can compress the captured data and weave thousands of hypertext links from the captured pages into one file.

Over a couple of weeks, I subjected part or all of at least 10 other sites to this Acrobat web capture treatment. In general, smaller sites converted quickly, while bigger captures took longer, often much longer once the number of captured pages rose into the thousands.

If you have an existing web capture PDF file and click on a link for a page not already captured, web capture will automatically add the new page or pages to the existing PDF file if you have an open Internet connection. Web capture can also be set up to go back to the site and update in your PDF file all pages that have changed since the original web capture.
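Adobe does not say how web capture decides that a page has changed. One plausible approach, assumed here purely for illustration, is to keep a fingerprint of each page at capture time and compare it with a fingerprint of the page as it is now. The Python sketch below shows that idea. The function names and example.org addresses are hypothetical, and nothing is actually fetched from the web.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return a SHA-256 fingerprint of a page's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def pages_to_refresh(captured, current):
    """Compare stored fingerprints with freshly fetched page content.

    captured: dict mapping URL -> fingerprint saved at capture time
    current:  dict mapping URL -> page bytes as fetched now
    Returns the sorted URLs that are new or whose content has changed.
    """
    stale = []
    for url, content in current.items():
        if captured.get(url) != digest(content):
            stale.append(url)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical URLs and contents, for illustration only.
    old = {
        "http://example.org/": digest(b"<h1>Welcome</h1>"),
        "http://example.org/news": digest(b"<p>September news</p>"),
    }
    now = {
        "http://example.org/": b"<h1>Welcome</h1>",
        "http://example.org/news": b"<p>October news</p>",
        "http://example.org/about": b"<p>About us</p>",
    }
    print(pages_to_refresh(old, now))
```

In this example the unchanged home page is skipped, while the edited news page and the newly added about page are both flagged for capture.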

WAP Site Snagged: Here's the Washington Apple Pi web site being captured into a single PDF file by Acrobat 4.0 and the new Web Capture Plug-In. At right is the Pi's home page (www.wap.org); at left are Acrobat bookmarks with the titles of every web page already captured. The smaller window at center left indicates which page is being captured, and how many KB of data have been taken from the web site so far.

Not a Total Save

That said, Acrobat web capture has limitations. It can easily handle HTML pages with embedded graphics like JPEG and GIF files, and pages with PDF files attached. Only one frame of an animated GIF will be brought in, and web capture appears incapable of handling pages with Macromedia Shockwave files. If a QuickTime movie is embedded in a web page, the page will capture but the movie will not. Perhaps there is a way to configure Acrobat web capture to recognize these file types, but I could not find one.

I've found two problems, no doubt due to web capture's "pre-release" status. After a site capture is complete, all menu bar titles in the Acrobat program may gray out, and the only solution is to quit and restart Acrobat. Also, if you click on a link in a captured web file that leads to a web page outside the PDF file, your Mac may crash if it's not hooked to an open Internet connection.

One-Stop PDF Creation

Until now it's been a several-step process to create PDF files from existing documents on the Mac. For example, if you have a Quark layout or Word document, the process typically has been to "print" to a PostScript file and then manually take it to Acrobat Distiller for final creation of a PDF file.

The new Create Adobe PDF utility simplifies the process. It's an extension to the Adobe PS 8.6 printer driver and Acrobat 4.0 package, and if you have both of these installed, all you need to do is drag and drop an existing document onto the "Create Adobe PDF" desktop printer icon. The next thing you know there is a perfectly rendered PDF file on your desktop.

Adobe's PS 8.6 printer driver comes with the Acrobat 4.0 package and must be used in preference to the Apple LaserWriter printer driver if you plan to use the Create Adobe PDF software.

Web Capture Shows Promise

Why write about beta software? In this case the web capture plug-ins may change how you use and save web resources, and my experience so far has shown them to work well, despite an occasional crash. My anecdotal experience shows that these plug-ins are more stable than some "release" software I've owned. For example, did you ever try Mac OS 7.5.2, Adobe Premiere 5.0 or Bryce 3.0?

Of all features in Mac Acrobat 4.0, web capture is the one I was most disappointed about not having in the original release last spring. Many Macintosh users, including me, wrote Adobe Systems to complain about the lack of this new feature. It is heartening to know they listen, and are trying to fix Mac Acrobat's shortcomings.

As long as you keep in mind the Mac OS 8.6, hardware, and RAM demands of Acrobat 4.0 web capture, and can be patient when the time comes to save potentially huge, complex captured files, you will find this new addition to Adobe Acrobat a useful trick indeed.

Acrobat 4.0
Adobe Systems, Inc.
San Jose, CA

Street Price: $230
Upgrade: $90
Web Capture Plug-Ins: Free Download
Create Adobe PDF software: Free Download
Power Macintosh with Mac OS 7.5.3 for Acrobat 4.0
Power Macintosh with Mac OS 8.6 for Web Capture
12 MB application RAM recommended, 60 MB hard disk space
CD-ROM required.

WAP member Dennis Dimick wrote about the limited feature set of Acrobat 4.0 in the July-August issue of the Journal. He also has written on QuickTime and imaging-related topics. He can be reached via email at ddimick@aol.com.


Revised November 24, 1999 Lawrence I. Charters
Washington Apple Pi
URL: http://www.wap.org/journal/