I have a PDF file with English and Tibetan. When I try to copy the text to Pages (Mac), convert the file with Calibre, or convert it using PDF Reader, instead of Tibetan text I just get a bunch of symbols, e.g.: ",$?- eJ- =%- gR$?- 3*3- 0:A- 3,- 3%:- ;%- , " .
All the Tibetan fonts that I found on the internet are installed, but it doesn't help :(
What magic should I perform to get Tibetan from a PDF into a word processor?
Thank you!
How to copy Tibetan from PDF?
Re: How to copy Tibetan from PDF?
I don't know any Tibetan, but I am a 'documentation guy': I work on corporate documentation systems, and I often get asked about converting PDF back to other formats, like Word. Copying text out of a PDF is difficult, because the format is not intended to produce word-processing files. You can copy and paste English characters OK, but I suspect that the basic algorithm by which Acrobat recognises characters is not programmed to recognise scripts such as Tibetan, and as a result it is just throwing out basically random symbols. I would be surprised if you could overcome that issue. Someone else might know better, but I don't think there will be an easy answer to it.
'Only practice with no gaining idea' ~ Suzuki Roshi
- Palzang Jangchub
Re: How to copy Tibetan from PDF?
There are some posts on the Adobe forums about similar problems. See if this helps, as it's been marked as the correct answer (which means it solved the issue for the OP).
Also occurs to me that despite having several Tibetan fonts installed, you might still need a Tibetan keyboard program in order for them to be recognized with the copy/paste feature. The most intuitive and easy to use one that I've found is Denjong. There's documentation with it, but if you know Wylie that's what it's mapped to (the PDF is still useful for special characters until you get used to them).
"The Sutras, Tantras, and Philosophical Scriptures are great in number. However life is short, and intelligence is limited, so it's hard to cover them completely. You may know a lot, but if you don't put it into practice, it's like dying of thirst on the shore of a great lake. Likewise, a common corpse is found in the bed of a great scholar." ~ Karma Chagme
དྲིན་ཆེན་རྩ་བའི་བླ་མ་སྐྱབས་རྗེ་མགར་ཆེན་ཁྲི་སྤྲུལ་རིན་པོ་ཆེ་ཁྱེད་མཁྱེན་ནོ།།
རྗེ་བཙུན་བླ་མ་མཁས་གྲུབ་ཀརྨ་ཆགས་མེད་མཁྱེན་ནོ། ཀརྨ་པ་མཁྱེན་ནོཿ
Re: How to copy Tibetan from PDF?
Thank you, people! Checking out these options...
Re: How to copy Tibetan from PDF?
Hi dharmabum9,
This is a perennial problem with special characters. Many Pali or Sanskrit PDFs also give gibberish when copied. This problem is solved with newer encoding schemes, but files generated with older software will continue to be problematic. Older software mapped character codes above 132 to particular embedded characters, and when you copy and paste, your word processor is using some different character set, so it's gibberish...
For example, if you copy and paste the first verse from this chanting manual http://www.accesstoinsight.org/lib/auth ... anting.pdf (Page 1), you'll get:
Morning Chanting
Arahaª samm›-sambuddho bhagav›.
The Blessed One is Worthy & Rightly Self-awakened.
Buddhaª bhagavantaª abhiv›demi.
I bow down before the Awakened, Blessed One.
(BOW DOWN)
This one gets the Pali characters right but has some other problems with the up-down indications: http://cdn.amaravati.org/wp-content/upl ... -1-Web.pdf
Arahaṃ sammāsambuddho bha꜕gavā 3
The Lord, the Perfectly Enlightened and Blessed One —
Buddhaṃ bha꜕gavantaṃ a꜕bhivādemi
I render homage to꜕ the Bu꜓ddha, the Blessed One. [ bow ]
If I copy and paste text from a modern Wisdom Publications PDF, the Pali comes out fine:
Ye puratthimāya disāya pāṇā paraṃ yojanasataṃ tesu daṇḍaṃ nikkhipāhi.
Nāhaṃ kvacana, kassaci kiñcanatasmiṃ, na ca mama kvacana, katthaci kiñcanatātthi.
...
It's possible to remap the characters if there are not too many of them. This was discussed over here: http://www.dhammawheel.com/viewtopic.ph ... 24#p378601
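To give a rough idea of what remapping looks like, here is a minimal Python sketch. The two-entry table is only my guess, inferred by comparing the garbled verse above with the clean version of the same verse. The name `remap_legacy_pali` is made up for this example, and a real legacy font would need its full table. Tibetan legacy fonts will probably need a much larger table and may not map one-to-one at all, so treat this as an illustration of the idea, not a ready-made converter.

```python
# Hypothetical mapping, inferred only from the verses quoted in this thread.
# A real legacy font would need its complete, verified table.
LEGACY_TO_UNICODE = {
    "\u00aa": "\u1e43",  # ª (feminine ordinal) was standing in for ṃ
    "\u203a": "\u0101",  # › (angle quote) was standing in for ā
}


def remap_legacy_pali(text: str) -> str:
    """Replace each known legacy character with its Unicode equivalent.

    Characters that are not in the table are passed through unchanged.
    """
    return text.translate(str.maketrans(LEGACY_TO_UNICODE))


if __name__ == "__main__":
    garbled = "Araha\u00aa samm\u203a-sambuddho bhagav\u203a."
    print(remap_legacy_pali(garbled))
```

Anything the table doesn't know about is left alone, so you can run it, look for the symbols that are still wrong, and add them to the table one at a time.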
Mike