jugad2 - Vasudev Ram on software innovation: PDF-libraries

Thursday, December 25, 2014

Create tabular PDF reports with Python, xtopdf and tablib

Tablib is a Python library that allows you to import, export and manipulate tabular data.

I had come across tablib a while ago. Today I thought of using it with xtopdf, my Python library for PDF creation, to generate PDF output from tabular data, so I wrote a program, TablibToPDF.py, to do that. It generates dummy data for student grades (for an examination), puts that data into a tablib Dataset, and then exports the contents of that Dataset to PDF using xtopdf. Given the comments in the code, it is mostly self-explanatory. I first wrote the program in an obvious/naive way, then improved it a little by removing some intermediate variables and converting some for loops to list comprehensions, which shortened the code by a few lines. Here is the code for TablibToPDF.py:

"""
TablibToPDF.py
Author: Vasudev Ram
Copyright 2014 Vasudev Ram - www.dancingbison.com
This program is a demo of how to use the tablib and xtopdf Python libraries 
to generate tabular data reports as PDF output.
Tablib is at: https://tablib.readthedocs.org/en/latest/
xtopdf is at: https://bitbucket.org/vasudevram/xtopdf
and info about xtopdf is at: http://slides.com/vasudevram/xtopdf or 
at: http://slid.es/vasudevram/xtopdf
"""

import random
import tablib
from PDFWriter import PDFWriter

# Helper function to output a string to both screen and PDF.
def print_and_write(pw, strng):
    print(strng)
    pw.writeLine(strng)

# Set up grade and result names and mappings.
grade_letters = ['F', 'E', 'D', 'C', 'B', 'A']
results = {'A': 'Pass', 'B': 'Pass', 'C': 'Pass', 
    'D': 'Pass', 'E': 'Pass', 'F': 'Fail'}

# Create an empty Dataset and set its headers.
data = tablib.Dataset()
data.headers = ['ID', 'Name', 'Marks', 'Grade', 'Result']
widths = [5, 12, 8, 8, 12] # Display widths for columns.

# Create some rows of student data and use it to populate the Dataset.
# Columns for each student row correspond to the header columns 
# shown above.

for i in range(20):
    id = str(i).zfill(2)
    name = 'Student-' + id
    # Let's grade them on the curve [1].
    # This examiner doesn't give anyone 100 marks :)
    marks = random.randint(40, 99)
    # Compute grade from marks.
    grade = grade_letters[(marks - 40) // 10]
    result = results[grade]
    columns = [id, name, marks, grade, result]
    row = [ str(col).center(widths[idx]) for idx, col in enumerate(columns) ]
    data.append(row)

# Set up the PDFWriter.
pw = PDFWriter('student_grades.pdf')
pw.setFont('Courier', 10)
pw.setHeader('Student Grades Report - generated by xtopdf')
pw.setFooter('xtopdf: http://slides.com/vasudevram/xtopdf')

# Generate header and data rows as strings; output them to screen and PDF.

separator = '-' * sum(widths)
print_and_write(pw, separator)

# Output headers
header_strs = [ header.center(widths[idx]) for idx, header in enumerate(data.headers) ]
print_and_write(pw, ''.join(header_strs))
print_and_write(pw, separator)

# Output data
for row in data:
    print_and_write(pw, ''.join(row))

print_and_write(pw, separator)
pw.close()

# [1] http://en.wikipedia.org/wiki/Grading_on_a_curve
# I'm not endorsing the idea of grading on a curve; I only used it as a 
# simple algorithm to generate the marks and grades for this example.

You can run it with:

$ python TablibToPDF.py

It sends the tabular output that it generates to both the screen and a PDF file named student_grades.pdf.
Here is a screenshot of the generated PDF file, opened in Foxit PDF Reader:

The program that I wrote could actually have been written without using tablib, just with plain Python lists and/or dictionaries. But tablib has some additional features, such as dynamic columns, export to various formats (but not PDF), and more - see its documentation, linked near the top of this post. I may write another blog post later that explores the use of some of those tablib features.
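
As a rough sketch of that point (not part of the original program), the same report lines can be built with plain Python lists and no tablib. The helper names make_row, format_row and build_report are made up for this illustration, and the Pass/Fail rule is written as "F fails, everything else passes", which matches the results mapping in the program above.

```python
import random

# Same grade letters, headers and display widths as in TablibToPDF.py.
GRADE_LETTERS = ['F', 'E', 'D', 'C', 'B', 'A']
HEADERS = ['ID', 'Name', 'Marks', 'Grade', 'Result']
WIDTHS = [5, 12, 8, 8, 12]

def make_row(i, marks):
    """Build one student row as a plain list (hypothetical helper)."""
    student_id = str(i).zfill(2)
    grade = GRADE_LETTERS[(marks - 40) // 10]
    result = 'Fail' if grade == 'F' else 'Pass'
    return [student_id, 'Student-' + student_id, marks, grade, result]

def format_row(columns):
    """Center each column in its display width and join into one line."""
    return ''.join(str(col).center(w) for col, w in zip(columns, WIDTHS))

def build_report(rows):
    """Return the full report as a list of strings: headers, rows, separators."""
    separator = '-' * sum(WIDTHS)
    lines = [separator, format_row(HEADERS), separator]
    lines.extend(format_row(row) for row in rows)
    lines.append(separator)
    return lines

rows = [make_row(i, random.randint(40, 99)) for i in range(20)]
report_lines = build_report(rows)
for line in report_lines:
    print(line)
```

Each string in report_lines could then be passed to pw.writeLine(), the same way print_and_write() does in the program above.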

- Enjoy.

Vasudev Ram - Python consulting and training - Dancing Bison Enterprises

Monday, October 13, 2014

Hacker News thread on PDF reporting tools

By Vasudev Ram

I saw this thread about PDF reporting tools on Hacker News (HN) today:

Ask HN: What do you use for PDF reports these days?

It was interesting to see that multiple HN users commented that they use ReportLab for PDF report creation in Python and like it a lot. I also commented, mentioning my xtopdf PDF generation library, which is also written in Python, builds on top of ReportLab, and provides a subset of ReportLab's functionality with a somewhat easier interface / API for that subset.

PrinceXML (*), Jasper (Java), JagPDF (C++, Python, Java, C), Flying Saucer (Java), PDFBox (Java), prawn (Ruby), wkhtmltopdf, FPDF/TCPDF (PHP) were some of the other interesting PDF creation tools or libraries mentioned. I have come across many of these tools in my explorations of the PDF creation field (which have been going on for some years, as it is a personal interest of mine, and I've also done some consulting projects that involved PDF generation and PDF text extraction), but I still found some tools in the HN thread that were new to me.

(*) A possibly lesser-known fact is that Håkon Wium Lie, one of the board members of YesLogic, the company behind PrinceXML, is also the original proposer of CSS and the CTO of Opera Software (yes, the company behind the Opera browser).

Wikipedia page about PDF - the Portable Document Format.

PDF became an ISO standard, ISO 32000-1, in 2008.

- Vasudev Ram - Dancing Bison Enterprises

Wednesday, August 22, 2012

Try OCaml in the browser - with a guided tutorial

OCaml is a programming language developed at INRIA, a French national research institute for computer science and allied areas.

Here is the Wikipedia page about OCaml.

Try OCaml is a site that lets you try out OCaml in the browser, using a step-by-step guided tutorial. The Try OCaml site is by OCamlPro.com, a company that supports OCaml and has ties to the OCaml group at INRIA.

Their site has a page with a section titled Why OCaml, which gives some reasons for using OCaml.

OCaml is used by a UK-based company called Coherent Graphics. I had come across them a while ago when checking out various PDF processing libraries. Their CamlPDF is a free PDF processing library written in OCaml. But it also is the basis for PDF library support in multiple other languages. From their site:

[ This is CamlPDF, an OCaml library for reading, writing and manipulating Adobe portable document files.

CamlPDF consists of a set of low level modules for representing, reading and writing the basic structure of PDF, together with an initial attempt at a higher level API.

CamlPDF is released under a BSD licence with special exceptions. See the LICENCE file in the source for details.

CamlPDF forms the basis of our PDF Command-line Toolkit and .NET PDF Toolkit, our PDF Editor for Mac OS X and the PDF import for a major commercial vector graphics package. ]

Inspired by nature.
- dancingbison.com | @vasudevram | jugad2.blogspot.com

Thursday, August 9, 2012

clj-pdf, Clojure PDF library and instant-pdf, a RESTful PDF generation service

By Vasudev Ram

[ UPDATE: I emailed the author of clj-pdf and instant-pdf, and he told me two things: 1) the instant-pdf bug mentioned below seems to be due to a Clojure JSON library (see the comments on this post), and does not occur when clj-pdf is used directly, and 2) clj-pdf uses the iText PDF library for generating PDF. ]

Saw this via proggit (programming.reddit.com):

clj-pdf is a Clojure library for PDF generation.

instant-pdf is a RESTful web service for PDF generation, built using clj-pdf. It supports JSON for markup. Interesting approach.
It has a fairly large JSON syntax for many elements of PDF, like metadata, text, paragraphs, chapters, colors, tables, etc.

I tried it out a bit. What I tried mostly worked, except for one issue: pasting a DOS directory listing into the text box (for the content section) resulted in PDF output that contained the string "null" instead of backslashes, e.g. for a path like C:\abc\def\some-file.
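
This is not a diagnosis of what instant-pdf or its JSON library actually did. It is only a small Python sketch of why backslashes need care in JSON: a backslash starts an escape sequence, so a Windows-style path has to be escaped before it goes into a JSON string. The "content" key below is a made-up name for illustration, not instant-pdf's actual markup.

```python
import json

path = r'C:\abc\def\some-file'

# json.dumps escapes each backslash, so the path survives a round trip.
encoded = json.dumps({'content': path})
decoded = json.loads(encoded)['content']

# Hand-built JSON with raw backslashes is invalid, because \a, \d and \s
# are not legal JSON escape sequences.
raw = '{"content": "' + path + '"}'
try:
    json.loads(raw)
    raw_is_valid = True
except ValueError:
    raw_is_valid = False

print(encoded)
print(decoded == path)
print(raw_is_valid)
```

A parser that is more lenient than Python's json module might silently drop or replace such sequences instead of raising an error, which is one way that text containing backslashes could come out changed.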

clj-pdf uses JFreeChart.

P.S. I liked this blog post: WHY ALL THE PARENS, by yogthos, the author of clj-pdf.

- Vasudev Ram - Dancing Bison Enterprises

Friday, August 3, 2012

The POCO project - C++ with batteries included

By Vasudev Ram

I had blogged about the POCO Project some years ago, either on this blog, or my earlier blog, jugad's Journal.

The POCO Project is a set of C++ libraries for doing common things that many applications need to do. Though I got to know about it indirectly via searching for PDF libraries (it had a thin wrapper over libharu, a C PDF creation library), that is not the main thing about it. It has a lot more generally useful functionality, including: Compression, Database, Filesystem, Logging, Multithreading, Processes, and XML.

Happened to see a tweet about it today, about a talk they are having. Then checked the project site again and found that it now has lots of users, many contributors, and has got a couple of corporate sponsors. (When I first came across it, I don't think they had any sponsors). There are also some testimonials from users, on their site.

POCO project users (from their site):

[ Companies like 454 Life Sciences, Appcelerator, CACE Technologies, CodeLathe, Schneider Electric and Voltwerk Electronics; open source projects like GLUEscript, MITK, openFrameworks, Open Game Engine and Ogre. ]

POCO project contributors.

- Vasudev Ram - Dancing Bison Enterprises

Friday, September 2, 2011

Started CreatingPDF, a list of PDF creation libraries on Wikia

By Vasudev Ram - dancingbison.com | @vasudevram | jugad2.blogspot.com

Hi readers,

I started a list of libraries that help you create PDF programmatically, here on Wikia:

http://creatingpdf.wikia.com

The list will be across languages, i.e., will not be restricted to just one or a few programming languages.

Will update it over time with more PDF creation libs that I know of.

ReportLab (Python), FPDF (PHP), Haru / libharu (C), POCO PDF (C++, thin wrapper over Haru), PyFPDF (Python port of FPDF), iText (Java), iTextSharp (C#), PDF::Writer and Prawn (both Ruby), xtopdf (Python, mine, uses ReportLab, easier interface for plain text to PDF), and some Perl PDF libraries are to be added. There are lots more. Anyone with suggestions for libraries to add, feel free to email me; see my Contact page at:

http://www.dancingbison.com/contact.html

Posted via email.
