Historic Pittsburgh (upitt)

Description

Historic Pittsburgh is a collection of historical texts about the city of Pittsburgh, published in the 19th and 20th centuries. The collection is owned by the University of Pittsburgh.

The copyright page contains the following text:

Users of the Historic Pittsburgh website do not need to seek permission for downloading images for private or educational use. However, the University of Pittsburgh does retain the rights to the digital images available on this website.

Formats

The upitt module supports the following download formats (pdf is the default):

-f value    Description
--------    -----------
pdf         Pages are returned as one PDF file per page.
image       Pages are returned as one GIF image per page.
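
As a rough illustration of how a -f value could be turned into a file extension, the sketch below reuses the format mapping from the module printout. The helper name resolve_extension is hypothetical; the module itself delegates this step to module_check_format, whose implementation is not shown on this page.

```perl
use strict;
use warnings;

# Mapping of -f values to file extensions, copied from the module printout.
my %formats = (
  'image' => 'gif',
  'gif'   => 'gif',
  'pdf'   => 'pdf',
);

# Hypothetical helper (not part of the module): return the extension for a
# requested -f value, falling back to the default when none is given.
sub resolve_extension {
  my ($requested, $default) = @_;
  my $format = (defined($requested) && length($requested))
             ? lc($requested)
             : $default;
  die "Unsupported format: $format\n" unless exists $formats{$format};
  return $formats{$format};
}

print resolve_extension('image', 'pdf'), "\n";   # prints gif
print resolve_extension(undef,   'pdf'), "\n";   # prints pdf
```

Note that "image" and "gif" both resolve to the gif extension in this mapping, so either value should select GIF output.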

ID

The ID for each work can be found by navigating to the item and locating the following portion of its URL:

id=AXX-XXX-XXXXXXXX

Here "A" is a letter and each "X" is a digit.
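
The ID shape described above can be checked with a regular expression. The following is a minimal sketch, not part of the module: the helper extract_id and the example URL are made up for illustration, and the assumption that the id= parameter follows a "?", ";" or "&" separator is mine.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the id= value out of an item URL and check it
# against the documented shape (one letter, then groups of 2, 3 and 8 digits).
sub extract_id {
  my ($url) = @_;
  return unless $url =~ /[?;&]id=([A-Za-z]\d{2}-\d{3}-\d{8})(?![0-9A-Za-z-])/;
  return lc($1);   # the module lowercases the ID before using it
}

# Made-up URL, for illustration only.
my $url = 'http://example.org/viewer?c=demo;id=A12-345-67890123;view=toc';
my $id  = extract_id($url);
print defined($id) ? "ID: $id\n" : "No ID found\n";   # prints ID: a12-345-67890123
```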

History

Module Printout

# upitt : Historic Pittsburgh
#
# This module will allow you to download page images from
# the Historic Pittsburgh site. See the documentation
# for additional details


$module_c = "pitttext";

$idno = lc($idno);

$module_default_format = "pdf";

%module_formats = (
  'image'  => 'gif',
  'gif'    => 'gif',
  'pdf'    => 'pdf',
);


$module_format = module_check_format($module_default_format,keys(%module_formats));
$config{'ext'} = $module_formats{$module_format};


if ( $module_format eq "pdf" ) {
  print_v("Renumbering pages!");
  $config{'renumber'} = 1;
}

$lc1 = substr($idno,0,1);
$lc2 = substr($idno,1,1);
$lc3 = substr($idno,2,1);

$module_url{'sitebase'}  = "http://digital.library.pitt.edu/";
$module_url{'plistbase'} = $module_url{'sitebase'}."cgi-bin/t/text/pageviewer-idx?c=".$module_c.";idno=".$idno.";view=image;seq=0001";
$module_url{'touchbase'} = $module_url{'sitebase'}."cgi-bin/t/text/pageviewer-idx?c=".$module_c.";idno=".$idno.";size=l;view=$module_format;seq=";
$module_url{'imagebase'} = $module_url{'sitebase'}."cache/$lc1/$lc2/$lc3/$idno/";


print_v("Getting page listing... (".$module_url{'plistbase'}.")");

$res = $ua->get($module_url{'plistbase'});


if ( $res->content =~ m!name="nav" src="/(.*?)"!s ) {
   $module_url{'plist'} = $module_url{'sitebase'}.$1;
} else {
  print STDERR "Could not get page listing (1)\n".$res->status_line."\n";
  exit(1);  # non-zero status signals failure
}


$res = $ua->get($module_url{'plist'});
print_v($module_url{'plist'});

if ( $res->is_error ) {
  print STDERR "Could not get page listing (2)\n".$res->status_line."\n";
  exit(1);  # non-zero status signals failure
}

@lines = split(m!</option>!, $res->content);
pop(@lines);

my $module_available_high = $#lines;

my @pages;

foreach (@lines) {
  if ( m!value="(\d+)".*?>([A-Za-z0-9]+)! ) {
    $pages[$1] = lc($2);
  }
}

print_v("This volume has $module_available_high pages...");

$module_actual_high = module_set_limit($module_available_high);


foreach $module_i ( $config{'start'} .. $module_actual_high ) {
  $module_touch_url = $module_url{'touchbase'}.$module_i;

  $module_j = sprintf("%04d",$module_i);
  $module_k = "0" x (4-length($pages[$module_i])) . $pages[$module_i];

  if ( $config{'ext'} eq "pdf" ) {
    $module_image_url = $module_url{'imagebase'}.$module_j.$module_k.".tif.1.pdf";
  } else {
    $module_image_url = $module_url{'imagebase'}.$module_j.$module_k.".tifl.gif";
  }

  print_v("Touch URL: $module_touch_url");
  print_v("Image URL: $module_image_url");

  push(@touchurls,$module_touch_url);
  push(@urls,$module_image_url);
}
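
The page-list parsing and image URL construction in the printout can be tried in isolation. The sketch below is a standalone approximation under stated assumptions: the helper names parse_page_labels and build_image_url, the sample HTML and the sample ID are all hypothetical, and the URL layout is simply the one the printout builds. Whether the site still serves these paths is not confirmed here.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "<option value="SEQ">LABEL</option>" entries into
# an array indexed by sequence number, holding the lowercased page labels.
sub parse_page_labels {
  my ($html) = @_;
  my @pages;
  foreach my $chunk (split m!</option>!, $html) {
    if ($chunk =~ m!value="(\d+)".*?>([A-Za-z0-9]+)!) {
      $pages[$1] = lc($2);
    }
  }
  return @pages;
}

# Hypothetical helper: build the cached image URL the same way the module
# does, from the first three ID characters, the sequence number and the label.
sub build_image_url {
  my ($sitebase, $idno, $seq, $label, $ext) = @_;
  my ($d1, $d2, $d3) = map { substr($idno, $_, 1) } 0 .. 2;
  my $seq4   = sprintf("%04d", $seq);
  my $label4 = ("0" x (4 - length($label))) . $label;   # left-pad to 4 chars
  my $suffix = $ext eq 'pdf' ? '.tif.1.pdf' : '.tifl.gif';
  return $sitebase . "cache/$d1/$d2/$d3/$idno/" . $seq4 . $label4 . $suffix;
}

# Made-up page list and ID, for illustration only.
my $html  = '<select name="seq"><option value="1">i</option>'
          . '<option value="2">ii</option><option value="3">1</option></select>';
my @pages = parse_page_labels($html);

print build_image_url('http://digital.library.pitt.edu/',
                      'a12-345-67890123', 3, $pages[3], 'pdf'), "\n";
# prints http://digital.library.pitt.edu/cache/a/1/2/a12-345-67890123/00030001.tif.1.pdf
```

Each file name is the zero-padded sequence number followed by the zero-padded page label, so a front-matter page labelled "ii" at sequence 2 would end in 000200ii.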
