From HTML to PDF
Adobe Acrobat Captures the Web
By Dennis Dimick
Washington Apple Pi Journal, November/December 1999,
pp. 23-25
Imagine having a web browser that captures into a single
file on your Mac every web page you view, or if you choose,
every page of a website. Also imagine opening that captured
file in Adobe Acrobat and seeing something that looks and
acts just like a website. The links work, and the text,
images, and graphics are all rendered properly.
That's what Adobe Systems has finally come up with in a
promised set of enhancements for Acrobat 4.0. This new set
of free plug-ins for Adobe's electronic document system may
change the way you think about and work with the World Wide
Web. Further, Adobe has just released a "Create Adobe PDF"
desktop printer utility that now allows drag-and-drop
creation of PDF from most any program via Acrobat 4.0.
As of this late September writing, a pre-release set of
plug-ins available at Adobe's website, http://www.adobe.com,
now allows capture of whole or partial web sites to
Acrobat's increasingly ubiquitous PDF (Portable Document
Format) files. Of course there are limitations, and these
beta plug-ins aren't totally stable. But they do work, and
work fairly well.
The new plug-ins also enable you to add
"digitally-verified" signatures to Acrobat files, to compare
versions of PDF documents side-by-side, and to automatically
attach PDF files to e-mail. The digital signature
verification feature opens the way to creating PDF files on
the Mac that can substitute for paper-based legal and
commercial documents. All these features have been available
on Windows for several months.
Acrobat's "Web Capture" feature intrigues me most, and
that's what I'll detail here. I'll also describe the "Create
Adobe PDF" utility that was just released, and look briefly at
its usefulness. If you're interested in the other plug-in
features, Adobe's web site has lots of information you can
download to a PDF file on your Mac using Acrobat 4.0 and web
capture.
Admission Price: OS 8.6, Fast Hardware
Adobe lists the minimum requirements for running the web
capture feature in Acrobat 4.0 as a PowerPC Macintosh running
Mac OS 8.6. I've successfully run web capture on a PowerMac
7100/80 with a dial-up Internet connection, but faster Macs and a
speedy Internet connection make a huge difference.
More RAM is better, as captured web pages apparently are
held as temporary files in RAM until you save a PDF file to
disk. As with other demanding tasks, giving more RAM to the
Acrobat application itself speeds the process and allows
larger site captures.
Acrobat's initial capture of pages from remote web
servers is only one aspect of this process. It's also
possible to capture web sites from local drives or CD-ROMs.
You can select how many "layers down" you want the web
capture feature to go into a target site, and you can also
choose to capture a whole web site. Several layers of big
web sites can mean thousands of pages captured to a single
PDF file that exceeds 50 MB.
[Figure: Capture to Fit. When you set up Acrobat's Web
Capture Plug-In, you can configure margins and page
orientations. It's also possible to scale contents
of the captured pages so everything will fit into
the page dimensions you set.]
To understand "layers" of a web site, envision a pyramid.
The home page or opening page of a site equates to the
single pointed stone at the top of a pyramid. Each layer of
stones down from the point stone equates to each layer of a
web site behind the home page. The deeper you go, the more
information is stored in each descending web layer, just as
the lower layers of a pyramid contain more stones, and area,
than the layer just above.
If you choose to capture just one layer of a web site,
typically all you get is one page, the home page. A
two-layer capture will also get all other pages that can be
linked to directly from the home page. A three-layer capture
will also bring in pages directly linked to from pages on
the second layer, and so on. The lower into a site you dig,
the more pages you typically find in each layer.
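The layer idea can be made concrete with a small sketch. This is only an illustration of a depth-limited walk over links, not a description of how Acrobat works internally. The `SITE` link map, the page names, and the function `capture_layers` are all hypothetical, invented for this example.

```python
def capture_layers(site_links, home, layers):
    """Group the pages reachable from `home` by layer.

    Layer 1 is the home page alone, layer 2 is every page linked
    directly from it, and so on. `site_links` is a hypothetical
    map of page name -> list of pages it links to.
    """
    if layers < 1:
        return []
    seen = {home}          # pages already captured
    result = [[home]]      # result[0] is layer 1
    frontier = [home]      # pages in the most recent layer
    for _ in range(layers - 1):
        next_frontier = []
        for page in frontier:
            for target in site_links.get(page, []):
                if target not in seen:   # skip links back to captured pages
                    seen.add(target)
                    next_frontier.append(target)
        if not next_frontier:            # ran out of new pages early
            break
        result.append(next_frontier)
        frontier = next_frontier
    return result


# A tiny made-up site: the home page links to two pages, and so on.
SITE = {
    "home": ["news", "about"],
    "news": ["story1", "story2", "home"],
    "about": ["staff"],
    "story1": ["archive"],
}

for depth in (1, 2, 3):
    captured = capture_layers(SITE, "home", depth)
    total = sum(len(layer) for layer in captured)
    print(depth, "layer(s):", total, "page(s)")
```

On this made-up site, one layer yields 1 page, two layers yield 3, and three layers yield 6, which mirrors the widening pyramid described above.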
As a web capture test, I set up a PowerMac G3/333 to grab
a five-layer deep piece of the site of The Sierra Club, a
San Francisco-based environmental advocacy group
(http://www.sierraclub.org).
Acrobat had reeled in 6883 pages when I stopped the capture
after two hours. It then took this G3/333 four hours to save
the PDF to disk. The Mac had 90 megabytes of RAM allocated
to Acrobat 4.0 and the resulting PDF file was 56 MB in size.
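To put those figures in perspective, here is a short back-of-the-envelope calculation using only the numbers reported above. It assumes 1 MB equals 1024 KB and treats the capture and save rates as simple averages; the function name `capture_stats` is made up for this sketch.

```python
# Figures reported for the Sierra Club test capture.
PAGES = 6883          # pages captured before stopping
CAPTURE_HOURS = 2     # time spent capturing
SAVE_HOURS = 4        # time spent saving the PDF to disk
PDF_MB = 56           # size of the resulting PDF file


def capture_stats(pages, capture_hours, save_hours, pdf_mb):
    """Return simple averages derived from one capture run."""
    return {
        "pages_per_minute": pages / (capture_hours * 60),
        "kb_per_page": pdf_mb * 1024 / pages,   # assumes 1 MB = 1024 KB
        "save_seconds_per_page": save_hours * 3600 / pages,
    }


stats = capture_stats(PAGES, CAPTURE_HOURS, SAVE_HOURS, PDF_MB)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

That works out to roughly 57 pages captured per minute, a little over 8 KB of PDF per page, and about 2 seconds of save time per page.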
Despite the time involved, the links on the PDF-based
pages do work, and the pages look just like Sierra Club's
real web site. Apparently saving these files takes so long
because Acrobat must compress the captured data and weave
together thousands of hypertext links from captured pages
into one file.
Over a couple of weeks, part or all of at least 10 other
sites were subjected to this Acrobat web capture treatment.
In general, smaller sites converted quickly, but bigger
captures took longer, often much longer, as the number of
pages captured from a site rose into the thousands.
If you have an existing web capture PDF file and click on
a link for a page not already captured, web capture will
automatically add the new page or pages to the existing PDF
file if you have an open Internet connection. Web capture
can also be set up to go back to the site and update in your
PDF file all pages that have changed since the original web
capture.
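One way to picture this kind of update is to compare a fingerprint of each page as it was captured with a fingerprint of the page as it exists now. The sketch below is purely illustrative: how Acrobat actually detects changed pages is not documented here, and the function names, URLs, and page contents are all hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a short, stable digest of a page's content."""
    return hashlib.sha256(content).hexdigest()


def pages_to_refresh(captured, current):
    """List already-captured pages whose content has since changed.

    `captured` maps URL -> fingerprint stored at capture time.
    `current` maps URL -> the page's content as it exists now.
    Both structures are assumptions made for this example.
    """
    changed = []
    for url, content in current.items():
        if url in captured and captured[url] != fingerprint(content):
            changed.append(url)
    return sorted(changed)


# Made-up example: one page changed after the original capture.
captured = {
    "http://example.org/": fingerprint(b"<h1>Welcome</h1>"),
    "http://example.org/news": fingerprint(b"<p>Old story</p>"),
}
current = {
    "http://example.org/": b"<h1>Welcome</h1>",
    "http://example.org/news": b"<p>New story</p>",
}
print(pages_to_refresh(captured, current))
```

Only the pages reported as changed would need to be fetched and converted again, which is what makes an update cheaper than a fresh capture.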
[Figure: WAP Site Snagged. Here's the Washington Apple Pi
web site being captured into a single PDF file by
Acrobat 4.0 and the new Web Capture Plug-In. At
right is the Pi's home page (www.wap.org), at left
are Acrobat bookmarks with the title names of every
web page already captured. The smaller window at
center left indicates which page is being captured,
and how many KB of data have been taken from the
web site so far.]
Not a Total Save
That said, Acrobat web capture has limitations. It can
easily handle HTML pages with embedded graphics like JPEG
and GIF files, and pages with PDF files attached. Only one
frame of animated GIFs will be brought in, and web capture
appears incapable of handling pages with Macromedia
Shockwave files. If a QuickTime movie is embedded in a web
page, the page will capture but the movie will not. Perhaps
there is a way to configure Acrobat web capture to recognize
these file types, but I could not find a way to set this up.
I've found two problems, no doubt due to web capture's
"pre-release" status. After a site capture is complete, all
menu bar titles in the Acrobat program may gray out, and the
only solution is to quit and restart Acrobat. Also, if you
click on a link in a captured web file that links to a web
page outside the PDF file, your Mac may crash if it's not
hooked to an open Internet connection.
One-Stop PDF Creation
Until now it's been a several-step process to create PDF
files from existing documents on the Mac. For example, if
you have a Quark layout or Word document, the process
typically has been to "print" to a PostScript file and then
manually take it to Acrobat Distiller for final creation of
a PDF file.
The new Create Adobe PDF utility simplifies the process. It's an
extension to the Adobe PS 8.6 printer driver and Acrobat 4.0
package, and if you have both of these installed, all you
need to do is drag-and-drop an existing document on the
"Create Adobe PDF" desktop printer icon. The next thing you
know there is a perfectly rendered PDF file on your desktop.
Adobe's PS 8.6 printer driver comes with the Acrobat 4.0
package and must be used in preference to the Apple
LaserWriter printer driver if you plan to use the Create Adobe
PDF software.
Web Capture Shows Promise
Why write about beta software? In this case the web
capture plug-ins may change how you use and save web
resources, and my experience so far has shown them to work
well, despite an occasional crash. My anecdotal experience
shows that these plug-ins are more stable than "release"
software I've owned. For example, did you ever try Mac OS
7.5.2, Adobe Premiere 5.0 or Bryce 3.0?
Of all features in Mac Acrobat 4.0, web capture is the
one I was most disappointed about not having in the original
release last spring. Many Macintosh users, including me,
wrote Adobe Systems to complain about the lack of this new
feature. It is heartening to know they listen, and are
trying to fix Mac Acrobat's shortcomings.
As long as you remember that Acrobat 4.0 web capture demands
Mac OS 8.6, fast hardware, and plenty of RAM, and you can be
patient when the time comes to save potentially huge, complex
captured files, you will find this new addition to Adobe
Acrobat a useful trick indeed.
Acrobat 4.0
Adobe Systems, Inc.
San Jose, CA
http://www.adobe.com
Street Price $230
Upgrade $90
Web Capture Plug-Ins: Free Download
Create Adobe PDF software: Free Download
Power Macintosh with Mac OS 7.5.3 for Acrobat 4.0
Power Macintosh with Mac OS 8.6 for Web Capture
12 MB application RAM recommended, 60 MB hard disk space
CD-ROM required.
WAP member Dennis Dimick wrote about the limited feature
set of Acrobat 4.0 in the July-August issue of the Journal.
He also has written on QuickTime and imaging-related topics.
He can be reached via email at ddimick@aol.com.