Showing posts with label ebook-creation. Show all posts

Thursday, December 5, 2013

Added a Table of Contents feature to XMLtoPDFBook

By Vasudev Ram

XMLtoPDFBook is a publishing tool I created that lets you create simple PDF ebooks from text content in XML files.

I had blogged about XMLtoPDFBook earlier, here:

Create PDF books with XMLtoPDFBook

and here:

XMLtoPDFBook now supports chapter numbers and names

Today I added some support for a Table of Contents feature to XMLtoPDFBook. Here is the updated program:
# XMLtoPDFBook2.py

# A program to convert a book in XML text format to a PDF book.
# Uses xtopdf and ReportLab.

# Author: Vasudev Ram - http://www.dancingbison.com
# Version: v0.2

#--------------------------------------------------------------------

# imports

import sys
import os
import string
import time

from PDFWriter import PDFWriter

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

#--------------------------------------------------------------------

# global variables

sysargv = None

#--------------------------------------------------------------------

def debug(message):
    sys.stderr.write(message + "\n")

#--------------------------------------------------------------------

def get_xml_filename(sysargv):
    return sysargv[1]

#--------------------------------------------------------------------

def get_pdf_filename(sysargv):
    return sysargv[2]

#--------------------------------------------------------------------

def XMLtoPDFBook():

    debug("Entered XMLtoPDFBook()")

    global sysargv

    # Get command-line arguments.
    xml_filename = get_xml_filename(sysargv)
    debug("xml_filename: " + xml_filename)
    pdf_filename = get_pdf_filename(sysargv)
    debug("pdf_filename: " + pdf_filename)

    # Parse the XML file.
    try:
        tree = ET.ElementTree(file=xml_filename)
        debug("tree = " + repr(tree))
    except Exception:
        sys.stderr.write("Error: caught exception in ET.ElementTree(file)\n")
        sys.exit(1)

    # Get the tree root.
    root = tree.getroot()
    debug("root.tag = " + root.tag)
    if root.tag != "book":
        debug("Error: Root tag is not 'book'")
        sys.exit(1)

    # Initialize the table of contents list.
    toc = []
    # Initialize the chapters list.
    chapters = []

    # Traverse the tree, extracting needed data into variables.
    debug("-" * 60)
    for root_child in root:
        if root_child.tag != "chapter":
            debug("Error: root_child tag is not 'chapter'")
            sys.exit(1)
        chapter = root_child
        #debug(chapter.text)
            # An empty chapter element has text None; store "" so split() works later.
            chapters.append(chapter.text or "")
        try:
            chapter_name = chapter.attrib['name']
        except KeyError:
            chapter_name = ""
        toc.append(chapter_name)
        debug("-" * 60)

    # Create and set some fields of a PDFWriter.
    pw = PDFWriter(pdf_filename)
    pw.setFont("Courier", 12)
    pw.setFooter("Generated by XMLtoPDFBook. Copyright 2013 Vasudev Ram")

    # Write the TOC.
    pw.setHeader("Table of Contents")
    chapter_num = 0
    debug("Chapter names")
    for chapter_name in toc:
        debug(chapter_name)
        chapter_num += 1
        pw.writeLine(str(chapter_num) + ": " + chapter_name)
    pw.savePage()

    # Write the chapters.
    chapter_num = 0
    for chapter in chapters:
        chapter_num += 1
        pw.setHeader("Chapter " + str(chapter_num) + ": " + toc[chapter_num - 1])
        lines = chapter.split("\n")
        for line in lines:
            pw.writeLine(line)
        pw.savePage()

    pw.close()

    debug("Exiting XMLtoPDFBook()")

def main():

    debug("Entered main()")

    global sysargv
    sysargv = sys.argv

    # Check for right number of arguments.
    if len(sysargv) != 3:
        debug("Usage: python " + sysargv[0] + " xml_file pdf_file")
        sys.exit(1)

    XMLtoPDFBook()

    debug("Exiting main()")

#--------------------------------------------------------------------

if __name__ == "__main__":
    try:
        main()
    except Exception as e:
        sys.stderr.write("Error: caught Exception: " + str(e) + "\n")
        sys.exit(1)

#--------------------------------------------------------------------

You can run it as follows:
python XMLtoPDFBook2.py vi_quickstart2.xml vi_quickstart2.pdf 
where I've used my vi quickstart tutorial, first written for Linux For You magazine, as the input XML file.

Here is a screenshot of the first page of the resulting PDF ebook - the Table of Contents:


And here is a screenshot of Chapter 3 of the book:


I've pushed the code (as file XMLtoPDFBook2.py) to my xtopdf project on Bitbucket.
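The Table of Contents step can be tried out without xtopdf or ReportLab installed. Here is a minimal sketch (not part of XMLtoPDFBook2.py) that builds the same "number: name" TOC lines from an XML string, using only ElementTree. The sample XML and the function name build_toc_lines are made up for illustration, and the sketch is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Made-up sample book; the third chapter has no name attribute.
SAMPLE_BOOK = """<?xml version="1.0"?>
<book>
    <chapter name="Introduction">Text of chapter 1.</chapter>
    <chapter name="Basic commands">Text of chapter 2.</chapter>
    <chapter>Text of an unnamed chapter.</chapter>
</book>"""

def build_toc_lines(xml_text):
    """Return one 'number: name' line per chapter element."""
    root = ET.fromstring(xml_text)
    if root.tag != "book":
        raise ValueError("Root tag is not 'book'")
    lines = []
    for chapter_num, chapter in enumerate(root, 1):
        if chapter.tag != "chapter":
            raise ValueError("Child tag is not 'chapter'")
        # A missing name attribute becomes an empty name.
        chapter_name = chapter.attrib.get("name", "")
        lines.append(str(chapter_num) + ": " + chapter_name)
    return lines

for line in build_toc_lines(SAMPLE_BOOK):
    print(line)
```

In the actual program, each of these lines is passed to pw.writeLine() before the chapters themselves are written.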

Enjoy.

- Vasudev Ram - Dancing Bison Enterprises

Contact Page




Friday, October 18, 2013

xtopdf - an online presentation


By Vasudev Ram

(Updated the post with more links and a few edits; apologies to readers seeing it twice as a result, via feed readers.)

While doing some minor work on my PDF creation toolkit, xtopdf (which is written in Python), I realized that there was no central place where all or most of its features and uses were described, although I have written various articles about xtopdf on this blog and elsewhere on the Internet. I also realized that over a period of time, I have been adding various features or improvements to xtopdf, which some users may not be aware of.

And coincidentally, while browsing the site of a Python Quant, Dr. Yves J. Hilpisch, I saw a presentation there, done using Reveal.js, a JavaScript tool for creating online presentations, written by Hakim El Hattab. Reveal.js looks very good, BTW. It can be used by creating your slides manually in HTML/JavaScript/CSS, or via an online tool:

There is a companion site, slid.es, where you can create a presentation using Reveal.js. The slid.es site has both free and paid accounts.

So I created a presentation about xtopdf on the slid.es site.

You can view it either here: slid.es/vasudevram/xtopdf, or embedded below:



I hope the xtopdf presentation helps readers and users to get to know xtopdf better, and use it to the fullest. Note: the presentation has hyperlinks; make sure to (right-)click on them to see all the information about xtopdf.

- Vasudev Ram - Dancing Bison Enterprises

Make a Python training or consulting inquiry

Saturday, August 31, 2013

50% off on all O'Reilly books in Back to (Tech) School Sale


By Vasudev Ram

O'Reilly Media is conducting a sale - 50% off on the price of all O'Reilly books. The sale is on until 10 September 2013.

They are calling it the "Back to (Tech) School Sale".


Back to (Tech) School Sale

You can click on the banner above to go to the site for the O'Reilly books sale.

Disclosure: it is an affiliate link, so I will get a small percentage of the sale value, since I recently became an O'Reilly Media affiliate under their Affiliates program for bloggers. More on that in a follow-up post, but let me say for now to my readers, that I'll use the affiliate feature judiciously, so as not to clutter up my blog with too many ads.

I also took a look at the sale site myself. It was interesting to see the variety of books (*) on display (many of which I have bought and read in the past), and also the fact that the O'Reilly book "Learning Python" was among the bestsellers shown.

(*) They do have a large variety and number of books. I have been buying and reading O'Reilly books for many years now, almost from the start of my career, so I've seen that they have books for many of the popular programming languages, such as C, C++, Python, Java, Scala, Perl, Ruby, JavaScript, etc., as well as books on many other programming, system administration, web design and other computer topics.




There are 7000 titles on sale.

This is a good opportunity to pick up some good O'Reilly books at half the cost.

And if you're an author or aspiring author yourself, you may wish to check out my xtopdf toolkit for PDF creation from other file formats. xtopdf includes a few tools for creating PDF ebooks, such as PDFBook.py, which lets you create a PDF ebook from a set of chapters stored in text files, and XMLtoPDFBook.py, which lets you create a PDF ebook from a set of chapters stored in XML format. xtopdf is released as open source software under the BSD License, so it is free for any use, commercial or non-commercial.

Packt Publishing of the UK/India uses xtopdf in their book production workflow, and the Software Freedom Law Center (SFLC) of the USA uses xtopdf for their e-discovery work (as I've been told by people from Packt and SFLC respectively). xtopdf is written in Python and uses the open source version of the ReportLab toolkit.

Here is a guide to installing and using xtopdf, which can help you get started with creating PDF books using it.

Here are two posts about XMLtoPDFBook:

Create PDF books with XMLtoPDFBook.

XMLtoPDFBook now supports chapter numbers and names.


Read all xtopdf posts on jugad2.

Read all Python posts on jugad2.



- Vasudev Ram - Dancing Bison Enterprises


Contact me

Monday, June 17, 2013

XMLtoPDFBook now supports chapter numbers and names


By Vasudev Ram

I've added support for chapter numbers and names to XMLtoPDFBook, which I blogged about recently. XMLtoPDFBook enables you to create simple PDF ebooks from chapters stored as text in an XML file.

The chapter numbers and names are printed in the header of the PDF file created. Chapter numbers are added automatically, starting from 1, and incremented by 1 for each chapter. For chapter names, you have to change the chapter elements in the XML file from the earlier format, which had no attributes for the chapter element, to add an attribute called 'name', with its value being the chapter name.

Earlier format for the chapter element:

<chapter>

New format for the chapter element:

<chapter name="chapter_name">

where you replace "chapter_name" with the name of each chapter, as desired.

That is the only change needed. The (updated) XMLtoPDFBook program takes care of the rest.

Chapter names, though supported, are optional. If a chapter element has no name attribute, it is not an error. No chapter name will be printed in the header for that chapter.
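As an illustration of the optional name attribute, here is a small sketch of how a header could be built for named and unnamed chapters. It is not the code from XMLtoPDFBook itself: the function name chapter_header is hypothetical, and leaving out the ": " separator for unnamed chapters is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

def chapter_header(chapter, chapter_num):
    """Build a header like 'Chapter 1: name', or 'Chapter 2' if unnamed."""
    # attrib.get() returns "" when the name attribute is absent,
    # so a missing name is not an error.
    name = chapter.attrib.get("name", "")
    header = "Chapter " + str(chapter_num)
    if name:
        header += ": " + name
    return header

named = ET.fromstring('<chapter name="Moving around">Some text.</chapter>')
unnamed = ET.fromstring('<chapter>Some text.</chapter>')
print(chapter_header(named, 1))
print(chapter_header(unnamed, 2))
```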

You can run XMLtoPDFBook the same way as I said in my first post about it:

python XMLtoPDFBook.py vi_quickstart.xml vi_quickstart.pdf

For viewing the PDF file, you may want to try using either Foxit PDF Reader or NitroReader. I've used Foxit Reader a lot, and it is fairly good. Just started trying NitroReader (*).

Here is a screenshot of page 1 of the generated PDF file, vi_quickstart.pdf, in NitroReader (right-click to open in a new tab and view larger size):


And here is a screenshot of page 5 of the same PDF file, vi_quickstart.pdf, in Foxit PDF Reader (right-click to open in a new tab and view larger size):


I also added some more error handling to the program.

I've uploaded XMLtoPDFBook to my Bitbucket repository for xtopdf, since it is now a part of my xtopdf toolkit. You can download it from here.

Incidentally, I saw on the NitroReader site that it was PDF's birthday this month; the PDF format is now 20 years old.

(*) And finally, it was a bit interesting to me to remember that NitroPDF (from the same company as NitroReader) was one of the topics of my very second blog post on my earlier blog, jugad's Journal :-). I ran that blog for about 3 years before moving to this one (which you are reading now), on Blogger, due to the takeover of LiveJournal by some other company.

- Vasudev Ram - Dancing Bison Enterprises

Contact me

Saturday, June 15, 2013

Create PDF books with XMLtoPDFBook

By Vasudev Ram


XMLtoPDFBook is a program that lets you create simple PDF books from XML text content. It requires Python, ReportLab and my xtopdf toolkit for PDF creation.

(Use ReportLab v1.21, not the 2.x series; though 2.x has more features, xtopdf has not been tested with it; also, those additional features are not required for xtopdf.)

XMLtoPDFBook.py is released as open source software under the BSD license, and I'll be adding it to the tools in my xtopdf toolkit.

Here's how to use XMLtoPDFBook:

In a text editor, create a simple XML template for the book, like this:
<?xml version="1.0"?>
<book>
        <chapter>
        Chapter 1 content here.
        </chapter>

        <chapter>
        Chapter 2 content here.
        </chapter>
</book>
Add as many chapter elements as you need.

Then write or paste the text of one chapter inside each chapter element, in sequence.
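If the chapters already exist as plain text, the XML file can also be generated instead of typed by hand. The following is a rough sketch, assuming Python 3 and only the standard library; the function name make_book_xml and the sample texts are made up for illustration. One benefit of building the file with ElementTree is that characters such as & and < in the chapter text get escaped automatically.

```python
import xml.etree.ElementTree as ET

def make_book_xml(chapter_texts):
    """Return book XML (as a string) with one chapter element per text."""
    book = ET.Element("book")
    for text in chapter_texts:
        chapter = ET.SubElement(book, "chapter")
        # Put the chapter text on its own lines inside the element.
        chapter.text = "\n" + text + "\n"
    # encoding="unicode" makes tostring() return str rather than bytes.
    return ET.tostring(book, encoding="unicode")

chapter_texts = [
    "Chapter 1 content here.",
    "Use :w & :q when lines < 80.",
]
book_xml = make_book_xml(chapter_texts)
print(book_xml)
```

The resulting string can be written to a file such as book.xml and passed to the program as its first argument.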

Now you can convert the book content to PDF using this program, XMLtoPDFBook:
#--------------------------------------------------
# XMLtoPDFBook.py

# A program to convert a book in XML text format to a PDF book.
# Uses xtopdf and ReportLab.

# Author: Vasudev Ram - http://www.dancingbison.com
# Version: v0.1

#--------------------------------------------------

# imports

import sys
import os
import string
import time

from PDFWriter import PDFWriter

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

#--------------------------------------------------

# global variables

sysargv = None

#--------------------------------------------------

def debug(message):
    sys.stderr.write(message + "\n")

#--------------------------------------------------

def get_xml_filename(sysargv):
    return sysargv[1]

#--------------------------------------------------

def get_pdf_filename(sysargv):
    return sysargv[2]

#--------------------------------------------------

def XMLtoPDFBook():

    debug("Entered XMLtoPDFBook()")

    global sysargv

    xml_filename = get_xml_filename(sysargv)
    debug("xml_filename: " + xml_filename)
    pdf_filename = get_pdf_filename(sysargv)
    debug("pdf_filename: " + pdf_filename)

    pw = PDFWriter(pdf_filename)
    pw.setFont("Courier", 12)
    pw.setHeader(xml_filename + " to " + pdf_filename)
    pw.setFooter("Generated by ElementTree and xtopdf")

    tree = ET.ElementTree(file=xml_filename)
    debug("tree = " + repr(tree))

    root = tree.getroot()
    debug("root.tag = " + root.tag)
    if root.tag != "book":
        debug("Error: Root tag is not 'book'")
        sys.exit(2)

    debug("=" * 60)
    for root_child in root:
        if root_child.tag != "chapter":
            debug("Error: root_child tag is not 'chapter'")
            sys.exit(3)
        # An empty chapter element has text None; treat it as empty text.
        text = root_child.text or ""
        debug(text)
        lines = text.split("\n")
        for line in lines:
            pw.writeLine(line)
        pw.savePage()
        debug("-" * 60)
    debug("=" * 60)
    pw.close()

    debug("Exiting XMLtoPDFBook()")

#--------------------------------------------------

def main():

    debug("Entered main()")

    global sysargv
    sysargv = sys.argv

    # Check for right number of arguments.
    if len(sysargv) != 3:
        sys.exit(1)

    XMLtoPDFBook()

    debug("Exiting main()")

#--------------------------------------------------

if __name__ == "__main__":
    main()

#--------------------------------------------------

Here is an example run of XMLtoPDFBook, using my vi quickstart article earlier published in Linux For You magazine:

python XMLtoPDFBook.py vi_quickstart.xml vi_quickstart.pdf

This results in the contents of the article being published to PDF in the file vi_quickstart.pdf.

- Vasudev Ram - Dancing Bison Enterprises

Contact me

Saturday, September 22, 2012

How to create an ebook with Pandoc, the swiss-army-knife conversion tool

By Vasudev Ram


Pandoc is a tool that lets you convert many document formats to many other document formats.

I had blogged or tweeted about Pandoc some time ago, but saw this feature only today:

Pandoc can be used to easily create EPUB format ebooks.

I was interested to see that the process of creating an EPUB ebook using Pandoc is similar (in one respect only, that is one book chapter per file or directory *) to the process of creating PDF ebooks with xtopdf, my PDF creation toolkit.

Of course, the Pandoc method supports many more features (including markup, metadata, etc.) than xtopdf does. But xtopdf is very easy if you just want a simple set of text chapters converted to a single PDF ebook.

* One chapter per file for xtopdf, one chapter per directory for Pandoc.

- Vasudev Ram - Dancing Bison Enterprises



Wednesday, July 11, 2012

Guide to installing and using xtopdf, including creating simple PDF e-books


By Vasudev Ram


This is a guide to using my open source xtopdf toolkit to create PDF from text and DBF files (including creating simple PDF e-books).

This guide was initially posted on the original iText site http://itext.ugent.be, Bruno Lowagie's site for his product iText, a Java PDF creation library, in a section about other PDF tools. That site is gone now, so I'm re-posting the guide here (with some edits/updates).

xtopdf is both a set of end-user tools and a library for use by developers, to create PDF from various input formats. This post is for end-users.

The steps are for the Windows platform. The steps for UNIX / Linux platforms are similar in principle but differ in the details.

xtopdf should work with any version of Python 2.x which is >= 2.2. Python 2.7.x is the current version of Python 2.x. I have not tested it yet on Python 3.x. I have tested xtopdf with at least versions 2.2.x through to 2.7.x (for some values of x) and did not come across any issues.

1. Get Python from here:
http://www.python.org/ftp/

2. Get ReportLab open source version 1.21 here:

http://www.reportlab.com/ftp/

(Don't use ReportLab 2.0 although it is available. I've not tested xtopdf with it.
ReportLab 1.21 is the latest stable version in the version 1 series.)

Install it following the instructions in the README file.
It should be straightforward. The main points to take care of are:

2.1 First, before installing ReportLab, run Python once (you may have to add the directory/folder where Python was installed, say C:\Python27, to your PATH variable first). Once that directory/folder is added to your PATH (preferably via Control Panel), open a DOS prompt.

At this prompt, type:

python

This should start the Python interpreter. You will get a one or two line message with the Python version, and then the Python interpreter prompt.

2.2. At this prompt, type the following two lines:

import sys
print sys.path

This should display the list of directories/folders that Python searches when importing modules. This list, sys.path, is an internal Python variable that gets set automatically, upon startup of the interpreter, to a set of default directories plus any directories named in the environment variable PYTHONPATH. It is analogous to, but different from, the DOS PATH variable. In this list of directories, look for "C:\Python27\lib\site-packages" as one of the directories. It should be there by default.

If it is there, then exit the Python interpreter by typing Ctrl-Z and Enter.
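The check in step 2.2 can also be done with a few lines of code instead of reading the printed list by eye. This is only a sketch: the function name site_packages_dirs is made up, and it simply looks for entries whose last component is "site-packages".

```python
import sys

def site_packages_dirs(paths=None):
    """Return the entries of paths (default: sys.path) ending in site-packages."""
    if paths is None:
        paths = sys.path
    return [p for p in paths
            if p.rstrip("\\/").lower().endswith("site-packages")]

# An empty list here would mean no site-packages folder is on the search path.
print(site_packages_dirs())
```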

3. Now install ReportLab:

Unzip the ReportLab_1_21.tgz file with WinZip, into some folder, say c:\reportlab.
This will create a folder called either:

a) reportlab_1.21 with a folder called reportlab under it

or

b) just a folder called reportlab.

If a), then move (or copy) the reportlab folder (which is under reportlab_1.21) to under C:\Python27\Lib\site-packages .

If b), then move the reportlab folder to under C:\Python27\Lib\site-packages.

The above steps should make ReportLab work.

An alternative way is to just unzip the ReportLab .tgz file into some folder, say, C:\RL, and then create a file called, say, reportlab.pth, which contains just one line: the path to the folder where the extracted contents are stored, e.g. C:\RL\reportlab. Please check the README file in the ReportLab .tgz file for the exact details of that step.
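To see how a .pth file affects Python's search path, here is a small experiment that can be run safely. It is a sketch written for Python 3, and it uses two temporary folders as stand-ins for site-packages and for C:\RL, so nothing in the real installation is changed. The file name demo.pth and the function name are made up. As a general point, for "import reportlab" to work, the folder listed in the .pth file needs to be the one that contains the reportlab package folder.

```python
import os
import site
import sys
import tempfile

def add_path_via_pth(site_dir, target_dir, pth_name="demo.pth"):
    """Write a one-line .pth file in site_dir and let Python process it."""
    pth_path = os.path.join(site_dir, pth_name)
    with open(pth_path, "w") as f:
        f.write(target_dir + "\n")
    # site.addsitedir() reads the .pth files in site_dir and appends
    # each listed directory that exists to sys.path.
    site.addsitedir(site_dir)
    return pth_path

# Temporary stand-ins, so the real site-packages is left untouched.
demo_site_dir = os.path.abspath(tempfile.mkdtemp())
demo_target_dir = os.path.abspath(tempfile.mkdtemp())
add_path_via_pth(demo_site_dir, demo_target_dir)
print(demo_target_dir in sys.path)
```

In a real installation, Python does this step by itself at startup for .pth files found in site-packages, so no call to site.addsitedir() is needed there.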

4. After the above steps, to check that ReportLab works, go to a DOS prompt again, run python again as before, and then at the Python prompt, enter either or both of the following commands (on separate lines):

import reportlab

from reportlab import pdfgen

If the command(s) you entered run without any error message, it means that ReportLab is properly installed.
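The same check can be put in a small script, which is handy if you want to test several machines. This is a minimal sketch; the helper name is_importable is made up, and it only reports whether the import succeeds, not which version is installed.

```python
def is_importable(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True

for name in ("reportlab", "reportlab.pdfgen"):
    status = "OK" if is_importable(name) else "not found"
    print(name + ": " + status)
```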

5. Now you can install xtopdf.

Get xtopdf here: http://sourceforge.net/projects/xtopdf

After downloading the file, unzip it into a folder, say c:\xtopdf. This will create a folder called xtopdf-1.0 under C:\xtopdf. Go to that folder.

There are many Python programs here with the filename extension ".py".

To run, e.g., WritePDF.py, do this:

python WritePDF.py some_file.txt

where some_file.txt is a text file that you want to convert to PDF.

This will run WritePDF.py and the output will be a PDF file called some_file.pdf.
Try opening it in Adobe Reader.

Similarly try running some more programs:

python DBFReader.py test1.dbf (or test2.dbf or test3.dbf or test4.dbf - all of which are in the package)

This should read the DBF file and display its metadata (file header and field headers) and data records to standard output - the screen.
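To give an idea of what the file header metadata consists of, here is a sketch that decodes the first 12 bytes of a DBF header with the struct module. It is not the code of DBFReader.py; the function name and the sample values are made up, and the layout used is that of the classic dBASE III format (version byte, last-update date, record count, header length, record length).

```python
import struct

def parse_dbf_header(data):
    """Decode the fixed fields in the first 12 bytes of a DBF file."""
    (version, yy, mm, dd,
     num_records, header_len, record_len) = struct.unpack("<BBBBIHH", data[:12])
    return {
        "version": version,
        # Classic dBASE III stores the year as an offset from 1900;
        # other DBF writers may differ.
        "last_update": (1900 + yy, mm, dd),
        "num_records": num_records,
        "header_length": header_len,
        "record_length": record_len,
    }

# A made-up 12-byte header: version 3, updated 2013-12-05,
# 42 records, 97-byte header, 25-byte records.
sample = struct.pack("<BBBBIHH", 3, 113, 12, 5, 42, 97, 25)
header = parse_dbf_header(sample)
for key in sorted(header):
    print(key + ": " + str(header[key]))
```

For a real file, the 12 bytes can be obtained with open("test1.dbf", "rb").read(12).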

python DBFToPDF.py test1.dbf test1.pdf

This should do the same as the above (DBFReader.py), except that instead of the output going to the screen, it will go to a file called test1.pdf.

And similarly, try out a few other programs. Most or all of the programs can be run as "python prog_name.py" followed by their arguments. All of them require at least one command-line argument (an input file), and some require more.

Be sure to try running this one also, to create PDF e-books from text files:

python PDFBook.py book1.pdf book1.txt

This one reads a list of chapter file names (where each chapter is one .txt file) and corresponding chapter titles, from the 2nd argument book1.txt, and creates a PDF e-book out of all the chapters combined, using the chapter title as the heading for each page.

This is a very quick and simple way of creating simple PDF e-books from a set of chapters, one chapter per text file.

- Vasudev Ram - Dancing Bison Enterprises