How to copy Tibetan from PDF?

Post by dharmabum9 » Thu Dec 29, 2016 2:26 am

I have a PDF file with English and Tibetan. When I try to copy the text to Pages (Mac), convert the file with Calibre, or convert it using PDF Reader, instead of Tibetan text I'm getting just a bunch of symbols, e.g.: ",$?- eJ- =%- gR$?- 3*3- 0:A- 3,- 3%:- ;%- , " .
All the Tibetan fonts that I found on the internet are installed, but it doesn't help :(
What magic should i perform to get Tibetan from PDF to a word processor?
Thank you!

Re: How to copy Tibetan from PDF?

Post by Wayfarer » Thu Dec 29, 2016 3:21 am

I don't know any Tibetan, but I am a 'documentation guy'. I work on corporate documentation systems, and often get asked about converting PDF back to other formats, like Word. Copying text out of PDF is difficult, because it's not intended to produce word-processing files. You can copy and paste English characters OK, but I suspect that the basic algorithm by which Acrobat recognises characters is not programmed to recognise exotic scripts such as Tibetan, and as a result it is just throwing out basically random symbols. I would be surprised if you could overcome that issue. Someone else might know better, but I don't think there will be an easy answer to it.

Re: How to copy Tibetan from PDF?

Post by Palzang Jangchub » Thu Dec 29, 2016 5:06 am

There are some posts on the Adobe forums about similar problems. See if this helps, as it's been marked as correctly answered (which means it solved the issue for the OP).

Also occurs to me that despite having several Tibetan fonts installed, you might still need a Tibetan keyboard program in order for them to be recognized with the copy/paste feature. The most intuitive and easy to use one that I've found is Denjong. There's documentation with it, but if you know Wylie that's what it's mapped to (the PDF is still useful for special characters until you get used to them).

Re: How to copy Tibetan from PDF?

Post by dharmabum9 » Fri Dec 30, 2016 12:05 am

Thank you, people! Checking out these options...

Re: How to copy Tibetan from PDF?

Post by mikenz66 » Fri Dec 30, 2016 1:52 am

Hi dharmabum9,

This is a perennial problem with special characters. Many Pali or Sanskrit PDFs also give gibberish when copied. This problem is solved with newer encoding schemes, but files generated with older software will continue to be problematical. Older software mapped character codes above 132 to particular embedded characters and when you cut and paste, your word processor is using some different character set, so it's gibberish...
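
A minimal Python sketch of that mismatch, under stated assumptions: the byte value 0xAA and the `LEGACY_FONT_GLYPHS` table are made up for illustration and are not taken from any real font. It only shows that one stored code can mean three different things depending on who interprets it.

```python
# Hypothetical illustration: one stored byte value, three interpretations.
raw = bytes([0xAA])

# Two standard character sets already disagree about what 0xAA means.
as_cp1252 = raw.decode("cp1252")        # feminine ordinal indicator
as_mac_roman = raw.decode("mac_roman")  # trade mark sign

# A legacy custom font (assumed here) just draws its own glyph in that slot.
LEGACY_FONT_GLYPHS = {0xAA: "ṃ"}  # assumption, not a real font table
as_legacy_font = LEGACY_FONT_GLYPHS[raw[0]]

print(as_cp1252, as_mac_roman, as_legacy_font)
```

The PDF looks right on screen because the embedded font supplies the glyph, but the clipboard only carries the code, which the word processor then reads with its own character set.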

For example, if you cut and paste the first verse from this chanting manual ... anting.pdf (Page 1), you'll get:
Morning Chanting
Arahaª samm›-sambuddho bhagav›.
The Blessed One is Worthy & Rightly Self-awakened.
Buddhaª bhagavantaª abhiv›demi.
I bow down before the Awakened, Blessed One.
This one gets the Pali characters right but has some other problems with the up-down indications: ... -1-Web.pdf
Arahaṃ sammāsambuddho bha꜕gavā 3
The Lord, the Perfectly Enlightened and Blessed One —
Buddhaṃ bha꜕gavantaṃ a꜕bhivādemi
I render homage to꜕ the Bu꜓ddha, the Blessed One. [ bow ]
If I copy and paste text from a modern Wisdom Publications PDF, the Pali comes out fine:
Ye puratthimāya disāya pāṇā paraṃ yojanasataṃ tesu daṇḍaṃ nikkhipāhi.
Nāhaṃ kvacana, kassaci kiñcanatasmiṃ, na ca mama kvacana, katthaci kiñcanatātthi.
It's possible to remap the characters if there are not too many of them. This was discussed over here: ... 24#p378601
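
As a rough sketch of what such a remapping could look like in Python: the two entries in `REMAP` are inferred only by comparing the garbled and correct versions of the verse quoted above, and the function name `remap_legacy_text` is made up for this example. A real legacy font would need a fuller table, checked character by character against the original PDF.

```python
# Mapping inferred from the garbled vs. correct verse quoted earlier in this
# post; it covers only two characters and is an assumption, not a full table.
REMAP = {
    "ª": "ṃ",
    "›": "ā",
}


def remap_legacy_text(text: str, table: dict[str, str] = REMAP) -> str:
    """Replace each legacy stand-in character with its intended letter."""
    return text.translate(str.maketrans(table))


garbled = "Arahaª samm›-sambuddho bhagav›."
print(remap_legacy_text(garbled))
```

This only works when each garbled symbol always stands for the same letter. If the same symbol also appears as legitimate punctuation in the English text, a blind replacement will corrupt it, so it is safer to run the remapping on the Pali lines only.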


