CAM::PDFTaxforms - CAM::PDF wrapper to also allow editing of checkboxes (i.e. for IRS Tax forms).

INSTALLATION

Install via one of the following:
  perl Makefile.PL
  make
  make test
  make install

or

  perl Build.PL
  perl Build
  perl Build test
  perl Build install

DEPENDENCIES
    Perl 5, CAM::PDF, Text::PDF, Crypt::RC4, Digest::MD5

NAME
    CAM::PDFTaxforms - CAM::PDF wrapper to also allow editing of checkboxes
    (i.e. for IRS Tax forms).

AUTHOR
    Jim Turner "<https://metacpan.org/author/TURNERJW>".

    This module is a wrapper around and a drop-in replacement for CAM::PDF,
    by Chris Dolan.

ACKNOWLEDGMENTS
    Thanks to Chris Dolan and everyone involved in developing and supporting
    CAM::PDF, on which this module is based and relies.

LICENSE AND COPYRIGHT
    Copyright (c) 2010-2019 Jim Turner "<mailto:turnerjw784@yahoo.com>"

    This library is free software; you can redistribute it and/or modify it
    under the same terms as CAM::PDF and Perl itself.

    CAM::PDF:

    Copyright (c) 2002-2006 Clotho Advanced Media, Inc.,
    <http://www.clotho.com/>

    Copyright (c) 2007-2008 Chris Dolan

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

SYNOPSIS
        #!/usr/bin/perl -w

        use strict;
        use CAM::PDFTaxforms;
        my $pdf = CAM::PDFTaxforms->new('f1040.pdf') or die "Could not open PDF ($!)!";    
        my $page1 = $pdf->getPageContent(1);

        #DISPLAY THE LIST OF NAMES OF EDITABLE FIELDS:
        my @fieldnames = $pdf->getFormFieldList();
        print "--fields=".join('|',@fieldnames)."=\n";

        #UPDATE THE VALUES OF A COUPLE OF THE FIELDS (CHECKBOXES ARE SET THE SAME WAY):
        $pdf->fillFormFields('fieldname1' => 'value1', 'fieldname2' => 'value2');

        #WRITE THE UPDATED PDF FORM TO A NEW FILE NAME:
        $pdf->cleanoutput('f1040_completed.pdf');

    Many example programs are included in this distribution to do useful
    tasks. See the "bin" subdirectory.

DESCRIPTION
    This package is a wrapper for and creates a CAM::PDF object. The
    difference is that some method functions are overridden to fix some
    issues and add some new features, namely to better handle IRS tax forms,
    many of which have checkboxes, in addition to numeric and text fields.
    Several other patches have also been applied, particularly those
    provided by CAM::PDF bugs #58144, #122890 and #125299. Otherwise, it
    should work well as a full drop-in replacement for CAM::PDF in the API.

    CAM::PDF description:

    This package reads and writes any document that conforms to the PDF
    specification generously provided by Adobe at
    <http://partners.adobe.com/public/developer/pdf/index_reference.html>
    (link last checked Oct 2005).

    The file format through PDF 1.5 is well-supported, with the exception of
    the "linearized" or "optimized" output format, which this module can
    read but not write. Many specific aspects of the document model are not
    manipulable with this package (like fonts), but if the input document is
    correctly written, then this module will preserve the model integrity.

    The PDF writing feature saves as PDF 1.4-compatible. That means that we
    cannot write compressed object streams. The consequence is that reading
    and then writing a PDF 1.5+ document may enlarge the resulting file by a
    fair margin.

    This library grants you some power over the PDF security model. Note
    that applications editing PDF documents via this library MUST respect
    the security preferences of the document. Any violation of this respect
    is contrary to Adobe's intellectual property position, as stated in the
    reference manual at the above URL.

    Technical detail regarding corrupt PDFs: This library adheres strictly
    to the PDF specification. Adobe's Acrobat Reader is more lenient,
    allowing some corrupted PDFs to be viewable. Therefore, it is possible
    that some PDFs may be readable by Acrobat that are illegible to this
    library. In particular, files which have had line endings converted to
    or from DOS/Windows style (i.e. CR-LF) may be rendered unusable even
    though Acrobat does not complain. Future library versions may relax the
    parser, but not yet.

    This version is HACKED by Jim Turner 09/2010 to enable the
    fillFormFields() function to also modify checkboxes (primarily on IRS
    Tax forms).

EXAMPLES
    See the *example/* subdirectory in the source tree. It contains a blank
    official 2018 IRS Schedule B tax form and two programs. The first,
    *dof1040sb.pl*, fills in the form using the sample input data text file
    *f1040sb_inputs.txt* and creates a filled-in version of the form called
    *f1040sb_out.pdf*. The second, *test1040sb.pl*, reads the data from the
    filled-in form created by the first program and displays it as output.

    To run the programs, switch to the *example/* subdirectory in the source
    tree and run them without arguments (e.g. ./dof1040sb.pl).

    To see the names of the fields and their current values in a PDF form,
    such as the aforementioned tax form, run the included program, e.g.:
    *listpdffields2.pl -d f1040sb_out.pdf*.

API
  Functions intended to be used externally
     $self = CAM::PDFTaxforms->new(content | filename | '-')
     $self->toPDF()
     $self->needsSave()
     $self->save()
     $self->cleansave()
     $self->output(filename | '-')
     $self->cleanoutput(filename | '-')
     $self->previousRevision()
     $self->allRevisions()
     $self->preserveOrder()
     $self->appendObject(olddoc, oldnum, [follow=(1|0)])
     $self->replaceObject(newnum, olddoc, oldnum, [follow=(1|0)])
        (olddoc can be undef in the above for adding new objects)
     $self->numPages()
     $self->getPageText(pagenum)
     $self->getPageDimensions(pagenum)
     $self->getPageContent(pagenum)
     $self->setPageContent(pagenum, content)
     $self->appendPageContent(pagenum, content)
     $self->deletePage(pagenum)
     $self->deletePages(pagenum, pagenum, ...)
     $self->extractPages(pagenum, pagenum, ...)
     $self->appendPDF(CAM::PDF object)
     $self->prependPDF(CAM::PDF object)
     $self->wrapString(string, width, fontsize, page, fontlabel)
     $self->getFontNames(pagenum)
     $self->addFont(page, fontname, fontlabel, [fontmetrics])
     $self->deEmbedFont(page, fontname, [newfontname])
     $self->deEmbedFontByBaseName(page, basename, [newfont])
     $self->getPrefs()
     $self->setPrefs()
     $self->canPrint()
     $self->canModify()
     $self->canCopy()
     $self->canAdd()
     $self->getFormFieldList()
     $self->fillFormFields(fieldname, value, [fieldname, value, ...])
       or $self->fillFormFields(%values)
     $self->clearFormFieldTriggers(fieldname, fieldname, ...)

    Note: 'clean' as in cleansave() and cleanoutput() means write a fresh
    PDF document. The alternative (e.g. save()) reuses the existing doc and
    just appends to it. Also note that 'clean' functions sort the objects
    numerically. If you prefer that the new PDF docs more closely resemble
    the old ones, call preserveOrder() before cleansave() or cleanoutput().
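
    The note above can be sketched as follows. This is a minimal
    illustration, not code from the distribution: rewrite_pdf is a
    hypothetical helper name, and the module is loaded at run time only so
    that the file compiles on machines where it is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CAM::PDFTaxforms: write a fresh ("clean")
# copy of $in to $out, optionally keeping the original object order.
sub rewrite_pdf {
    my ($in, $out, $keep_order) = @_;

    # Loaded at run time so this file still compiles where the module is absent.
    require CAM::PDFTaxforms;

    my $pdf = CAM::PDFTaxforms->new($in)
        or die "Could not open '$in'\n";

    # Without this call, the 'clean' writers sort the objects numerically.
    $pdf->preserveOrder() if $keep_order;

    # cleanoutput() writes a new document instead of appending to the old one.
    $pdf->cleanoutput($out);
    return $out;
}

# Usage: perl rewrite_pdf.pl in.pdf out.pdf [keep_order]
rewrite_pdf(@ARGV) if @ARGV >= 2;
```

    Calling preserveOrder() is only worthwhile when the output should stay
    close to the input, for example when comparing the two files.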

  For additional methods and functions, see the CAM::PDF documentation.

METHODS
    $doc = CAM::PDFTaxforms->new($content)
    $doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass)
    $doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $prompt)
    $doc = CAM::PDFTaxforms->new($content, $ownerpass, $userpass, $options)
        Instantiate a new CAM::PDFTaxforms object. $content can be a
        document in a string, a filename, or '-'. The latter indicates that
        the document should be read from standard input. If the document is
        password protected, the passwords should be passed as additional
        arguments. If they are not known, a boolean $prompt argument allows
        the programmer to suggest that the constructor prompt the user for a
        password. This is rudimentary prompting: passwords are in the clear
        on the console.

        This constructor takes an optional final argument which is a hash
        reference. This hash can contain any of the following optional
        parameters:

        prompt_for_password => $boolean
            This is the same as the $prompt argument described above.

        fault_tolerant => $boolean
            This flag causes the instance to be more lenient when reading
            the input PDF. Currently, this only affects PDFs which cannot be
            successfully decrypted.
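
        A hedged sketch of the options-hash form of the constructor
        follows. The helper name open_pdf is hypothetical, and the sketch
        assumes that undefined passwords are acceptable for documents that
        are not password protected.

```perl
use strict;
use warnings;

# Hypothetical helper: open a PDF leniently, passing the options described
# above as the final hash-reference argument.
sub open_pdf {
    my ($file, $ownerpass, $userpass) = @_;

    # Loaded at run time so this file still compiles where the module is absent.
    require CAM::PDFTaxforms;

    my %options = (
        prompt_for_password => 0,    # never prompt on the console
        fault_tolerant      => 1,    # be lenient when reading the input
    );

    my $doc = CAM::PDFTaxforms->new($file, $ownerpass, $userpass, \%options)
        or die "Could not open '$file'\n";
    return $doc;
}

# Usage: perl open_pdf.pl file.pdf [ownerpass [userpass]]
if (@ARGV) {
    my $doc = open_pdf(@ARGV);
    printf "%d page(s)\n", $doc->numPages();
}
```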

    $hashref = $doc->getFieldValue('fieldname1' [, 'fieldname2', ...
    'fieldnameN' ])
        (CAM::PDFTaxforms only, not available in CAM::PDF)

        Fetches the corresponding current values for each field name in the
        argument list. Returns a reference to a hash containing the field
        names as keys and the corresponding values. If a field does not
        exist or does not contain a value, an empty string is returned in
        the hash as its value. If called in list context, then a list of
        field names and values in the order (fieldname1, value1,
        fieldname2, value2, ... fieldnameN, valueN) is returned.
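
        The two calling contexts described above might be used like this.
        It is a sketch under the behavior stated in this section;
        show_fields is a hypothetical helper name and the field names come
        from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: print the current values of the named fields.
sub show_fields {
    my ($file, @names) = @_;

    # Loaded at run time so this file still compiles where the module is absent.
    require CAM::PDFTaxforms;

    my $doc = CAM::PDFTaxforms->new($file)
        or die "Could not open '$file'\n";

    # Scalar context: a reference to a hash of field name => value.
    my $values = $doc->getFieldValue(@names);
    for my $name (@names) {
        printf "%s = '%s'\n", $name, $values->{$name} // '';
    }

    # List context: a flat (name, value, ...) list, loaded here into a hash.
    my %copy = $doc->getFieldValue(@names);
    return \%copy;
}

# Usage: perl show_fields.pl form.pdf fieldname1 [fieldname2 ...]
show_fields(@ARGV) if @ARGV >= 2;
```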

    $doc->fillFormFields($name => $value, ...)
    $doc->fillFormFields($opts_hash, $name => $value, ...)
        Set the default values of PDF form fields. The name should be the
        full hierarchical name of the field as output by the
        getFormFieldList() function. The argument list can be a hash if you
        like. A simple way to use this function is something like this:

            my %fields = (fname => 'John', lname => 'Smith', state => 'WI');
            $fields{zip} = 53703;
            $doc->fillFormFields(%fields);

        NOTE: For checkbox fields, specify any value that is *false* in Perl
        (i.e. 0, '', or *undef*), or any of the strings 'Off', 'No', or
        'Unchecked' (case insensitive) to un-check a checkbox, or any other
        value that is *true* in Perl to check it. Checkbox support is
        specific to CAM::PDFTaxforms and was the original reason for
        creating it.

        If the first argument is a hash reference, it is interpreted as
        options for how to render the filled data:

        background_color => 'none' | $gray | [$r, $g, $b]
            Specify the background color for the text field.
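
        The checkbox rule in the NOTE above can be restated as a small
        pure-Perl predicate. This only illustrates the documented rule; it
        is not the module's own code, and checkbox_is_checked is a
        hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper that restates the documented checkbox rule.
# Returns 1 if the value would check a box and 0 if it would un-check it.
sub checkbox_is_checked {
    my ($value) = @_;
    return 0 if !$value;                                # undef, 0, '0', ''
    return 0 if $value =~ /\A(?:off|no|unchecked)\z/i;  # documented "off" words
    return 1;                                           # any other true value
}

# Show how a few sample values are classified.
for my $value (1, 'Yes', 'X', 0, '', undef, 'off', 'No', 'UNCHECKED') {
    my $shown = defined $value ? "'$value'" : 'undef';
    printf "%-12s => %s\n", $shown,
        checkbox_is_checked($value) ? 'checked' : 'unchecked';
}
```

        In practice this means a call such as fillFormFields('somebox' =>
        'Off') un-checks the box, while fillFormFields('somebox' => 1)
        checks it ('somebox' is a placeholder field name).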

    $doc->getFormFieldList()
        Return an array of the names of all of the PDF form fields. The
        names are the full hierarchical names constructed as explained in
        the PDF reference manual. These names are useful for the
        fillFormFields() function.

    $doc->getFormField($name)
        *For INTERNAL use*

        Return the object containing the form field definition for the
        specified field name. $name can be either the full name or the
        "short/alternate" name.

    $doc->writeAny($node)
        Returns the serialization of the specified node. This handles all
        Node types, including object Nodes.

SCRIPTS
    CAM::PDF includes a number of handy utility scripts, installed in the
    user's local/bin path. This distribution adds a modified version of its
    *listpdffields.pl* utility, called *listpdffields2.pl*, which adds a
    -d (--data) option for displaying the names of all the fields found in a
    PDF form, along with their corresponding current values (if any).

    listpdffields2.pl [-dhsvV] *pdfformfile.pdf*
        The general format is:

        listpdffields2.pl -d *pdfformfile.pdf*

COMPATIBILITY
    This library was primarily developed against the 3rd edition of the
    reference (PDF v1.4) with several important updates from 4th edition
    (PDF v1.5). This library focuses most deeply on PDF v1.2 features.
    Nonetheless, it should be forward and backward compatible in the
    majority of cases.

PERFORMANCE
    This module is written with good speed and flexibility in mind, often at
    the expense of memory consumption. Entire PDF documents are typically
    slurped into RAM. As an example, simply calling
    "new('PDFReference15_v15.pdf')" (the 13.5 MB Adobe PDF Reference V1.5
    document) pushes Perl to consume 89 MB of RAM on my development machine.

DEPENDS
    CAM::PDF, Text::PDF, Crypt::RC4, Digest::MD5

KEYWORDS
    pdf taxforms

KNOWN BUGS / TODO
    1) Checkboxes / radio buttons set programmatically to "CHECKED" by
    CAM::PDFTaxforms ARE checked, and shown as such in the form, but evince,
    and perhaps the Acrobat(tm) form editor, don't seem to consider them
    checked the first time a user clicks on them to uncheck them, requiring
    a second click. This can be especially disconcerting to the user for
    radio buttons, as it is possible to click a second button in the group,
    checking it, while the originally-checked button is NOT automatically
    unchecked. I need to somehow FIX this, but have so far been unable to do
    so (as of v1.1 - sorry!), so please don't file a bug on this UNLESS you
    have a PATCH for either me OR CAM::PDF itself!

    2) CAM::PDF is used under the hood for most of the actual work, and has
    many open bugs / issues (see:
    <https://rt.cpan.org/Public/Dist/Display.html?Name=CAM-PDF>), so, except
    for the patched ones mentioned in the DESCRIPTION section above, those
    issues remain unfixed here as well! Therefore, before filing a new bug
    here, check whether your issue also occurs with standard CAM::PDF
    (unless it involves a specific CAM::PDFTaxforms feature, or you have a
    patch, in which case you're likely to get it merged here sooner!).

SEE ALSO
    CAM::PDF (Obviously) as this module is a wrapper around it (and requires
    it as a prerequisite). Also see the docs there for all the other methods
    and features available to CAM::PDFTaxforms (it's NOT just for IRS tax
    forms)!

    There are several other PDF modules on CPAN. Below is a brief
    description of a few of them. If these comments are out of date, please
    inform me.

    PDF::API2
        As of v0.46.003, LGPL license.

        This is the leading PDF library, in my opinion.

        Excellent text and font support. This is the highest level library
        of the bunch, and is the most complete implementation of the Adobe
        PDF spec. The author is amazingly responsive and patient.

    Text::PDF
        As of v0.25, Artistic license.

        Excellent compression support (CAM::PDF cribs off this Text::PDF
        feature). This has not been developed since 2003.

    PDF::Reuse
        As of v0.32, Artistic/GPL license, like Perl itself.

        This library is not object oriented, so it can only process one PDF
        at a time, while storing all data in global variables. I'm not fond
        of it, but it's quite popular, so don't take my word for it!

    Additionally, PDFLib is a commercial package not on CPAN
    (www.pdflib.com). It is a C-based library with a Perl interface. It is
    designed for PDF creation, not for reuse.