Publican is a single-source publishing tool. It takes DocBook XML files as input and can generate output in a variety of formats, such as HTML, PDF, ePub, Man, and WebHelp. A number of open source projects, such as Debian, Fedora, and GIMP, use Publican to generate their documentation.
There is built-in support in Publican for many languages including Traditional and Simplified Chinese. However, some preparations are required before Publican can generate satisfactory Chinese PDF documents.
Install Default Chinese Fonts
When the language code for generating a document is set to zh-TW for Traditional Chinese or zh-CN for Simplified Chinese, Publican will use AR PL UMing TW (AR PL ShanHeiSun Uni) and AR PL UMing CN (ZYSong18030), respectively, to render the text in the document. If Publican can't find the proper font, it falls back to the default Liberation font, which doesn't contain any Chinese glyphs.
So, the first step to have Chinese text properly rendered in a PDF is to install the required Chinese font. On a Debian-based Linux platform such as Ubuntu, you can use the following command to install the AR PL UMing font, which includes both Traditional and Simplified Chinese glyphs.
$ sudo apt-get install fonts-arphic-uming
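Before building, it may help to confirm that the two families are actually visible on the system. The following is a minimal sketch, assuming the `fc-list` tool from fontconfig is installed; `font_listed` is a hypothetical helper name, not part of Publican or FOP. FOP performs its own font detection, so this is only a sanity check that the package put the fonts in place.

```shell
# Succeed if the font list on stdin mentions the family name given as $1.
# (font_listed is a hypothetical helper, not a Publican or FOP command.)
font_listed() {
  grep -iF -- "$1" >/dev/null
}

# Only run the check when fontconfig's fc-list is available.
if command -v fc-list >/dev/null 2>&1; then
  for family in "AR PL UMing TW" "AR PL UMing CN"; do
    if fc-list : family | font_listed "$family"; then
      echo "found: $family"
    else
      echo "missing: $family"
    fi
  done
fi
```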
Now you can try to generate a PDF with Chinese text using Publican. Assuming you have already set up a Publican document folder and a language translation sub-folder, say zh-TW, you can generate the PDF by running this command from the document folder:
$ publican build --formats=pdf --langs=zh-TW
You can check the generated PDF in the sub-folder tmp/zh-TW/pdf using a PDF viewer. Most Chinese characters should be rendered as expected, but some text is shown as a string of '#' characters.
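Spotting the '#' runs by eye can be tedious in a long document. As a rough sketch, assuming `pdftotext` from poppler-utils is installed, the extracted text can be scanned for them; `count_hash_runs` and the threshold of 3 are illustrative choices of mine. A document that legitimately contains runs of '#' (for example in code listings) will produce false positives.

```shell
# Count input lines that contain a run of at least $1 consecutive '#' characters.
# (count_hash_runs is a hypothetical helper name.)
count_hash_runs() {
  grep -cE "#{$1,}" || true   # grep -c exits non-zero when the count is 0
}

# Scan every PDF in the build folder, if pdftotext is available.
if command -v pdftotext >/dev/null 2>&1; then
  for pdf in tmp/zh-TW/pdf/*.pdf; do
    [ -e "$pdf" ] || continue
    echo "$pdf: $(pdftotext "$pdf" - | count_hash_runs 3) line(s) with '#' runs"
  done
fi
```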
Install Extra Chinese Fonts
It turns out that Publican, or rather FOP (the underlying PDF rendering engine on most installations), expects in some places that a font also comes with italic and/or bold variants. Unfortunately, Chinese fonts usually come in only one style. When FOP generates PDF code for a block of text but can't find the specified font or style, it uses the default font, and if the default font doesn't have the required glyphs, it replaces that block of text with a string of '#' characters.
What we have to do is use different Chinese fonts to fake the italic, bold and italic-bold styles of the AR PL UMing TW font. First, we have to install more Chinese fonts.
$ sudo apt-get install fonts-arphic-ukai
$ sudo apt-get install fonts-wqy-microhei
$ sudo apt-get install fonts-cwtex-fs
The first two contain both Traditional and Simplified Chinese glyphs. The last one supports only Traditional Chinese.
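A .ttc file bundles several faces, and the exact family names inside it matter when a single face has to be picked out by name. The sketch below lists the names stored in each installed file, assuming fontconfig's `fc-scan` is available and that the packages install to the font paths used later in this article; `family_names` is a hypothetical helper.

```shell
# Split fc-scan's comma-separated family lists into one unique name per line.
# (family_names is a hypothetical helper name.)
family_names() {
  tr ',' '\n' | LC_ALL=C sort -u
}

# Print the family names stored in each font file that is present.
if command -v fc-scan >/dev/null 2>&1; then
  for file in /usr/share/fonts/truetype/arphic/ukai.ttc \
              /usr/share/fonts/truetype/wqy/wqy-microhei.ttc \
              /usr/share/fonts/truetype/cwtex/cwfs.ttf; do
    [ -e "$file" ] || continue
    echo "$file:"
    fc-scan --format '%{family}\n' "$file" | family_names
  done
fi
```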
Fake Italic, Bold and Italic-Bold Styles
With the extra Chinese fonts installed, we can proceed to modify the file /usr/share/publican/fop/fop.xconf to fake the italic, bold and italic-bold styles of the AR PL UMing TW font (or AR PL UMing CN if you use the language code zh-CN).
As you can see in the file /usr/share/publican/fop/fop.xconf, Publican by default asks FOP to automatically detect the fonts installed on the system. We will manually create font metrics for the three extra Chinese fonts and add entries that register them as the missing styles of the AR PL UMing TW font.
To create font metrics for those fonts, run the following commands. The generated font metrics will be placed in the folder /usr/share/publican/fop/font-metrics.
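Because the steps in this section change files under /usr/share/publican/fop, it may be worth keeping a copy of the original fop.xconf first. This is only a sketch: `backup_once` and the `.orig` suffix are my own choices, and writing to that folder normally requires root privileges.

```shell
# Copy "$1" to "$1.orig" unless a backup already exists; print the backup path.
# (backup_once and the .orig suffix are illustrative choices.)
backup_once() {
  if [ ! -e "$1.orig" ]; then
    cp -p -- "$1" "$1.orig" || return 1
  fi
  printf '%s\n' "$1.orig"
}

conf=/usr/share/publican/fop/fop.xconf
# Writing under /usr/share normally needs root, so this only runs
# when the current user may write to the folder (e.g. in a root shell).
if [ -w "$(dirname "$conf")" ] && [ -r "$conf" ]; then
  backup_once "$conf"
fi
```

Calling the function again later leaves the first backup untouched, so the pristine file can always be restored.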
$ cd /usr/share/publican/fop
$ mkdir font-metrics
$ fop-ttfreader -ttcname 'AR PL UKai TW' /usr/share/fonts/truetype/arphic/ukai.ttc font-metrics/ukai-tw.xml
$ fop-ttfreader -ttcname 'WenQuanYi Micro Hei' /usr/share/fonts/truetype/wqy/wqy-microhei.ttc font-metrics/wqy-microhei.xml
$ fop-ttfreader /usr/share/fonts/truetype/cwtex/cwfs.ttf font-metrics/cwfs.xml
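A quick way to confirm that the metrics were written is to look for missing or empty files. The sketch below does that with a hypothetical helper, `missing_metrics`; it prints nothing when all three files are present and non-empty.

```shell
# Print the name of each file in directory $1 that is missing or empty.
# (missing_metrics is a hypothetical helper name.)
missing_metrics() {
  dir=$1
  shift
  for name in "$@"; do
    [ -s "$dir/$name" ] || echo "$name"
  done
}

missing_metrics /usr/share/publican/fop/font-metrics ukai-tw.xml wqy-microhei.xml cwfs.xml
```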
You should now have 3 files, ukai-tw.xml, wqy-microhei.xml and cwfs.xml, in the sub-folder font-metrics. Next, open the file /usr/share/publican/fop/fop.xconf and insert these lines between <fonts> and <auto-detect/>:
<!-- The following font tags were added to fake styles that do not exist in the AR PL UMing font. - Kochin -->
<!-- Use WenQuanYi Micro Hei to fake AR PL UMing TW Bold -->
<font metrics-url="/usr/share/publican/fop/font-metrics/wqy-microhei.xml" kerning="yes" embed-url="/usr/share/fonts/truetype/wqy/wqy-microhei.ttc">
<font-triplet name="AR PL UMing TW" style="normal" weight="bold"/>
</font>
<!-- Use cwTeXFangSong to fake AR PL UMing TW Italic -->
<font metrics-url="/usr/share/publican/fop/font-metrics/cwfs.xml" kerning="yes" embed-url="/usr/share/fonts/truetype/cwtex/cwfs.ttf">
<font-triplet name="AR PL UMing TW" style="italic" weight="normal"/>
</font>
<!-- Use AR PL UKai TW to fake AR PL UMing TW Italic Bold -->
<font metrics-url="/usr/share/publican/fop/font-metrics/ukai-tw.xml" kerning="yes" embed-url="/usr/share/fonts/truetype/arphic/ukai.ttc">
<font-triplet name="AR PL UMing TW" style="italic" weight="bold"/>
</font>
Basically, those lines tell FOP that when it needs a different style of the AR PL UMing TW font, it should use WenQuanYi Micro Hei for bold, cwTeXFangSong for italic, and AR PL UKai TW for italic-bold.
Again, build your Chinese PDF by running the Publican command shown previously. This time, the output PDF should show correct Chinese text everywhere.
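A mistyped path in one of these entries is easy to miss. As a rough check, the sketch below extracts every path named by a metrics-url or embed-url attribute in fop.xconf and reports the ones that do not exist. `referenced_paths` is a hypothetical helper; it works on plain text, so it will also pick up entries inside XML comments, and it assumes the attribute values are plain file paths rather than URLs.

```shell
# Print every path named by a metrics-url or embed-url attribute (input on stdin).
# (referenced_paths is a hypothetical helper name.)
referenced_paths() {
  grep -oE '(metrics-url|embed-url)="[^"]*"' | sed -e 's/^[^"]*"//' -e 's/"$//'
}

conf=/usr/share/publican/fop/fop.xconf
if [ -r "$conf" ]; then
  referenced_paths < "$conf" | while IFS= read -r path; do
    [ -e "$path" ] || echo "missing: $path"
  done || true
fi
```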
Optional: Font with Missing NBSP
(This part applies only when you want to use a Chinese font that lacks a No-Break Space glyph.)
Before I settled on WenQuanYi Micro Hei for the bold style, I had tried another cwTeX font, cwTeXHeiBold. It mostly worked, but it lacked a glyph that FOP uses when generating the document's contents. Once again, the missing glyph showed up in the PDF as '#' characters.
After inspecting the generated PDF code, I realized the missing glyph is the No-Break Space (NBSP), Unicode code point U+00A0. A look into the font's metrics file confirmed that U+00A0 is indeed missing from its CMAP. By the way, cwTeXHeiBold's font metrics file can be created with these commands:
$ cd /usr/share/publican/fop
$ fop-ttfreader /usr/share/fonts/truetype/cwtex/cwheib.ttf font-metrics/cwheib.xml
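Rather than reading the metrics file by eye, the check for U+00A0 (decimal 160) can be scripted. The sketch below assumes the file stores its character map as `<bf gi="…" ue="…" us="…"/>` elements with decimal values, where `us` and `ue` are the first and last code points of a range; `covers_codepoint` is a hypothetical helper.

```shell
# Succeed if decimal code point $1 lies inside any <bf .../> range read from stdin.
# (covers_codepoint is a hypothetical helper name.)
covers_codepoint() {
  grep -o '<bf [^>]*>' | awk -v cp="$1" '
    {
      us = ue = ""
      # Pull the numeric values out of us="..." and ue="...", in any attribute order.
      if (match($0, /us="[0-9]+"/)) us = substr($0, RSTART + 4, RLENGTH - 5)
      if (match($0, /ue="[0-9]+"/)) ue = substr($0, RSTART + 4, RLENGTH - 5)
      if (us != "" && ue != "" && cp + 0 >= us + 0 && cp + 0 <= ue + 0) found = 1
    }
    END { exit (found ? 0 : 1) }'
}

metrics=/usr/share/publican/fop/font-metrics/cwheib.xml
if [ -r "$metrics" ]; then
  if covers_codepoint 160 < "$metrics"; then
    echo "U+00A0 is mapped"
  else
    echo "U+00A0 is missing"
  fi
fi
```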
Open the font metrics file, /usr/share/publican/fop/font-metrics/cwheib.xml, with a text editor. Inside the <bfranges> element, insert the following line:
<bf gi="3" ue="160" us="160"/>
This line tells FOP to use glyph 3 to render Unicode code point 160 (0x00A0). Glyph 3 is the regular space character. Since cwTeXHeiBold doesn't include a glyph for the No-Break Space, I use the regular space glyph for it.
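If the metrics file is regenerated often, the same edit can be scripted instead of made by hand. This is a sketch under assumptions: `add_nbsp_mapping` is a hypothetical helper, placing the new entry right after the opening `<bfranges>` tag is an arbitrary choice, and glyph index 3 is the value given above for cwTeXHeiBold, so another font may need a different index.

```shell
# Read a metrics file on stdin and write it to stdout with a one-code-point
# range for U+00A0 (decimal 160) added right after the opening <bfranges> tag.
# $1 is the glyph index to reuse. Input that already has a range starting
# at 160 is passed through unchanged.
add_nbsp_mapping() {
  awk -v gi="$1" '
    { lines[NR] = $0; if ($0 ~ /us="160"/) present = 1 }
    END {
      for (i = 1; i <= NR; i++) {
        line = lines[i]
        if (!present && sub(/<bfranges>/, "<bfranges><bf gi=\"" gi "\" ue=\"160\" us=\"160\"/>", line))
          present = 1
        print line
      }
    }'
}

metrics=/usr/share/publican/fop/font-metrics/cwheib.xml
if [ -w "$metrics" ]; then
  # Keep the first untouched copy and always patch from it.
  [ -e "$metrics.orig" ] || cp -p -- "$metrics" "$metrics.orig"
  add_nbsp_mapping 3 < "$metrics.orig" > "$metrics"
fi
```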
Modify the file /usr/share/publican/fop/fop.xconf so that the code between <fonts> and <auto-detect/> becomes:
<!-- The following font tags were added to fake styles that do not exist in the AR PL UMing font. - Kochin -->
<!-- Use cwTeXHeiBold to fake AR PL UMing TW Bold -->
<font metrics-url="/usr/share/publican/fop/font-metrics/cwheib.xml" kerning="yes" embed-url="/usr/share/fonts/truetype/cwtex/cwheib.ttf">
<font-triplet name="AR PL UMing TW" style="normal" weight="bold"/>
</font>
<!-- Use cwTeXFangSong to fake AR PL UMing TW Italic -->
<font metrics-url="/usr/share/publican/fop/font-metrics/cwfs.xml" kerning="yes" embed-url="/usr/share/fonts/truetype/cwtex/cwfs.ttf">
<font-triplet name="AR PL UMing TW" style="italic" weight="normal"/>
</font>
<!-- Use AR PL UKai TW to fake AR PL UMing TW Italic Bold -->
<font metrics-url="/usr/share/publican/fop/font-metrics/ukai-tw.xml" kerning="yes" embed-url="/usr/share/fonts/truetype/arphic/ukai.ttc">
<font-triplet name="AR PL UMing TW" style="italic" weight="bold"/>
</font>
Rebuild the document with Publican. The newly generated PDF should now have correctly rendered contents.