How to copy Tibetan from PDF?

dharmabum9
Posts: 3
Joined: Thu Dec 29, 2016 2:09 am

How to copy Tibetan from PDF?

Post by dharmabum9 » Thu Dec 29, 2016 2:26 am

I have a PDF file with English and Tibetan. When I try to copy the text to Pages (Mac), convert the file with Calibre, or convert it using PDF Reader, instead of Tibetan text I just get a bunch of symbols, e.g.: ",$?- eJ- =%- gR$?- 3*3- 0:A- 3,- 3%:- ;%- , " .
I've installed every Tibetan font I could find on the internet, but it doesn't help :(
What magic should I perform to get Tibetan from a PDF into a word processor?
Thank you!

Wayfarer
Posts: 3761
Joined: Sun May 27, 2012 8:31 am
Location: Sydney AU

Re: How to copy Tibetan from PDF?

Post by Wayfarer » Thu Dec 29, 2016 3:21 am

I don't know any Tibetan, but I am a 'documentation guy': I work on corporate documentation systems and often get asked about converting PDF back to other formats, like Word. Copying text out of a PDF is difficult, because the format isn't intended to produce word-processing files. You can copy and paste English characters OK, but I suspect that the basic algorithm by which Acrobat recognises characters is not programmed to recognise exotic scripts such as Tibetan, and as a result it is just throwing out basically random symbols. I would be surprised if you could overcome that issue. Someone else might know better, but I don't think there will be an easy answer to it.
Only practice with no gaining idea ~ Suzuki-roshi

Palzang Jangchub
Posts: 858
Joined: Wed Dec 12, 2012 10:19 pm
Contact:

Re: How to copy Tibetan from PDF?

Post by Palzang Jangchub » Thu Dec 29, 2016 5:06 am

There are some posts on the Adobe forums about similar problems. See if this helps, as it's been marked as the correct answer (which means it solved the issue for the OP).

Also occurs to me that despite having several Tibetan fonts installed, you might still need a Tibetan keyboard program in order for them to be recognized with the copy/paste feature. The most intuitive and easy to use one that I've found is Denjong. There's documentation with it, but if you know Wylie that's what it's mapped to (the PDF is still useful for special characters until you get used to them).

"The Sutras, Tantras, and Philosophical Scriptures are great in number. However life is short, and intelligence is limited, so it's hard to cover them completely. You may know a lot, but if you don't put it into practice, it's like dying of thirst on the shore of a great lake. Likewise, a common corpse is found in the bed of a great scholar." ~ Karma Chagme

དྲིན་ཆེན་རྩ་བའི་བླ་མ་སྐྱབས་རྗེ་མགར་ཆེན་ཁྲི་སྤྲུལ་རིན་པོ་ཆེ་ཁྱེད་མཁྱེན་ནོ།།
རྗེ་བཙུན་བླ་མ་མཁས་གྲུབ་ཀརྨ་ཆགས་མེད་མཁྱེན་ནོ། ཀརྨ་པ་མཁྱེན་ནོཿ

dharmabum9
Posts: 3
Joined: Thu Dec 29, 2016 2:09 am

Re: How to copy Tibetan from PDF?

Post by dharmabum9 » Fri Dec 30, 2016 12:05 am

Thank you, people! Checking out these options...

mikenz66
Posts: 112
Joined: Mon Apr 06, 2009 1:10 am
Location: New Zealand

Re: How to copy Tibetan from PDF?

Post by mikenz66 » Fri Dec 30, 2016 1:52 am

Hi dharmabum9,

This is a perennial problem with special characters. Many Pali or Sanskrit PDFs also give gibberish when copied. This problem is solved with newer encoding schemes (Unicode), but files generated with older software will continue to be problematic. Older software mapped character codes (typically those above 127, i.e. outside plain ASCII) to particular glyphs in an embedded font, and when you cut and paste, your word processor interprets those codes using some different character set, so it's gibberish...
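
A quick way to confirm this is what's happening is to look at the code points of whatever you pasted. The snippet below is only a rough sketch: the helper names `describe_chars` and `has_tibetan` are made up for illustration, and the only fact it relies on is that Unicode Tibetan lives in the block U+0F00 to U+0FFF. If the pasted text contains nothing from that block, the PDF is carrying legacy font codes rather than real Tibetan characters, and installing more fonts won't change that.

```python
import unicodedata

def describe_chars(text):
    """Return (char, code point, Unicode name) for each non-space character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isspace()
    ]

def has_tibetan(text):
    """True if any character lies in the Tibetan Unicode block (U+0F00-U+0FFF)."""
    return any(0x0F00 <= ord(ch) <= 0x0FFF for ch in text)

# Fragment of the gibberish quoted in the first post of this thread.
pasted = ",$?- eJ- =%- gR$?-"
# Real Unicode Tibetan, taken from a signature in this thread.
real = "བླ་མ་"

print(has_tibetan(pasted))  # False: plain ASCII punctuation and letters
print(has_tibetan(real))    # True
for row in describe_chars(pasted[:3]):
    print(row)
```

Running `describe_chars` on a few characters of the pasted text shows ordinary names like COMMA and DOLLAR SIGN, which is the tell-tale sign of a legacy-encoded font.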

For example, if you cut and paste the first verse from this chanting manual http://www.accesstoinsight.org/lib/auth ... anting.pdf (Page 1), you'll get:
Morning Chanting
Arahaª samm›-sambuddho bhagav›.
The Blessed One is Worthy & Rightly Self-awakened.
Buddhaª bhagavantaª abhiv›demi.
I bow down before the Awakened, Blessed One.
(BOW DOWN)
This one gets the Pali characters right but has some other problems with the up-down indications:
http://cdn.amaravati.org/wp-content/upl ... -1-Web.pdf
Arahaṃ sammāsambuddho bha꜕gavā 3
The Lord, the Perfectly Enlightened and Blessed One —
Buddhaṃ bha꜕gavantaṃ a꜕bhivādemi
I render homage to꜕ the Bu꜓ddha, the Blessed One. [ bow ]
If I copy and paste text from a modern Wisdom Publications PDF, the Pali comes out fine:
Ye puratthimāya disāya pāṇā paraṃ yojanasataṃ tesu daṇḍaṃ nikkhipāhi.
Nāhaṃ kvacana, kassaci kiñcanatasmiṃ, na ca mama kvacana, katthaci kiñcanatātthi.
...
It's possible to remap the characters if there are not too many of them. This was discussed over here:
http://www.dhammawheel.com/viewtopic.ph ... 24#p378601
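
As a rough illustration of that kind of remapping, here is a small Python sketch. The two-entry table is an assumption on my part, inferred only from comparing the two chanting samples above ("Arahaª" vs "Arahaṃ", "samm›" vs "sammā"); a real legacy font will need a longer table that you would have to work out yourself, and `remap` is just a name chosen for this example.

```python
# Assumed mapping, inferred from the two chanting samples in this post.
# It is NOT a complete table for any particular legacy font.
LEGACY_TO_UNICODE = {
    "ª": "ṃ",  # U+00AA -> m with dot below
    "›": "ā",  # U+203A -> a with macron
}

def remap(text, table=LEGACY_TO_UNICODE):
    """Replace each legacy character with its Unicode equivalent."""
    return text.translate(str.maketrans(table))

print(remap("Arahaª samm›-sambuddho bhagav›."))
# Arahaṃ sammā-sambuddho bhagavā.
```

The same approach should work for Tibetan legacy fonts in principle, but the table is much bigger, and stacked letters mean one legacy code can stand for several Unicode characters, so a simple one-to-one table may not be enough.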

:anjali:
Mike
