Why is text selection from PDF in Preview not working properly on Mac mini?

Sometimes I like to copy text out of a PDF file into TextEdit for various purposes, such as transferring it to a text-to-speech reader so I can listen to it on the go. One of the periodicals I get in PDF format seems to have changed in a way that makes selecting its text difficult.


I know the trick of holding down Option while you select so that you select only one column, since selecting across multiple columns usually leaves the copied text in a jumbled order. However, when I select a column of text in this PDF, it's highlighted with a strange, blue overlapping gradient. This must have something to do with the way the PDF was created.


When transferred to a text file, some of the lines of text may be in the wrong order and some of the lines have not copied at all. Needless to say, 80 or 90% accuracy when copying text out of a PDF is not good enough. I have tried the same technique with older PDFs I have and it works normally.


If Preview is not the best app to extract text from a PDF, is there another one that's better?



[Re-Titled by Moderator]

Mac mini, macOS 13.3

Posted on Apr 12, 2025 11:52 AM

Question marked as Top-ranking reply

Posted on Apr 13, 2025 5:54 AM

One possibility is that the PDF author scanned the original content to PDF, deliberately wrapping an image of the text in a PDF to prevent copying. That would seem a waste of their time because, depending on the application used for the document layout, they may have been able to place user restrictions on copying from the PDF instead. If the PDF is a scan, the text would need to go through optical character recognition (OCR) before you could select it properly.¹


In the Finder, consider clicking once on that PDF and then pressing option+cmd+i to open an Information panel that should tell you the content creator and encoding software.


I have a two-column, unscanned PDF created by Pages. Using option+drag, I can select the first column of that PDF page and copy/paste it into TextEdit (or Pages). That selection highlights in blue only the actual lines of text, not the spacing between them. It is blue because the Highlight color in my System Settings > Appearance is set to Blue. The paste operation does not preserve the paragraph separators, and every line of the pasted text ends in a newline. That is normal for text copied/pasted from a PDF, since PDFs are not word-processing documents.
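If those per-line newlines get in the way of a text-to-speech reader, the pasted text can be rejoined afterwards. Here is a minimal Python sketch of one way to do that. The function name reflow_pdf_text and the short_ratio parameter are my own inventions for illustration, and the paragraph detection is only a heuristic I am assuming: a line that ends a sentence and is noticeably shorter than the widest line is treated as the end of a paragraph. Nothing in the PDF itself guarantees that.

```python
def reflow_pdf_text(text, short_ratio=0.7):
    """Rejoin hard-wrapped lines pasted from a PDF into paragraphs.

    Assumed heuristic (hypothetical, tune for your documents): a line that
    ends with sentence punctuation and is shorter than short_ratio times
    the widest line is taken to close a paragraph.
    """
    lines = [ln.strip() for ln in text.splitlines()]
    widest = max((len(ln) for ln in lines), default=0)
    paragraphs = []
    current = ""
    for ln in lines:
        if not ln:
            # A blank line is kept as an explicit paragraph break.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        hyphen_wrap = (
            len(current) >= 2
            and current[-1] == "-"
            and current[-2].isalpha()
            and ln[0].islower()
        )
        if hyphen_wrap:
            # Treat "new-" + "line" as one word split across lines.
            # Caveat: this also merges genuine compounds broken at the hyphen.
            current = current[:-1] + ln
        elif current:
            current += " " + ln
        else:
            current = ln
        # Ignore closing quotes and parentheses when looking for the final mark.
        ends_sentence = ln.rstrip("\"')\u201d\u2019").endswith((".", "!", "?"))
        if ends_sentence and len(ln) < short_ratio * widest:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    pasted = (
        "Sometimes I copy text out of a PDF\n"
        "and every line ends in a new-\n"
        "line character.\n"
        "The next paragraph starts here and\n"
        "runs on."
    )
    print(reflow_pdf_text(pasted))
```

This only tidies text that copied correctly in the first place. It cannot restore lines that Preview skipped or pasted in the wrong order.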


¹ Apple introduced the ability to OCR text images in a PDF with Preview beginning with Sonoma (14.*). This places a text layer over the image text, allowing one to perform a text search or selection. Choose File menu > Export and, on that panel, set the export type to PDF. If the PDF is the result of a scanning operation, you will see an Embed Text option on that Export panel; Embed Text is Apple-speak for optical character recognition. The Embed Text option will not be present if the PDF is not scanned. If it is scanned, export the PDF with a different name, perhaps as foo_ocr.pdf.



Apr 12, 2025 11:27 PM in response to Timothy Arends1

Timothy ~ An AI app can extract text from an uploaded PDF. For example, ChatGPT (macOS 14+, M1+ chip)...


"With multiple columns, accuracy can depend on how the text is formatted inside the file. If it’s a true digital PDF (not a scan), I can usually extract the columns cleanly. But if it’s a scanned image of text, layout recognition might be trickier, especially if columns aren’t clearly separated."

