Compressing PDF documents to oblivion

  • Thread starter: Wrichik Basu
  • Tags: PDF
In summary, the thread starter is applying to several universities for a Master's degree, and the universities require all supporting documents to be submitted as a single PDF of at most 5 MB. The documents total around 15 MB. Adobe Acrobat's "Optimize" feature with default settings reduced the size by only about 100 KB, and downscaling all images to 100 ppi made the scanned documents illegible. The admissions team, when contacted, replied that other students manage it and that the applicant should try harder. The text in the documents is selectable, because OCR was redone after distilling. The problem was eventually solved by converting each PDF to low-quality PNG, converting the PNG files back to PDF, and combining them, which gave a final file of 4116 KB.
  • #1
Wrichik Basu
I am applying to several universities for Masters. The universities are asking me to compress all the required documents into a single 5 MB file.

I have to submit my 10th and 12th standard marksheets and gradesheets, along with their reverse sides. These documents are around 5 MB each because they are digitally signed and downloaded from the official Govt. website. I distilled these using Adobe Acrobat's Print to PDF (I have the paid version of Adobe), and the total size decreased to 12 MB. Add to these the scans of my UG semester transcripts, which are 1.3 MB in total, plus a photo ID card, for a total of around 15 MB.

After combining all these documents into a single file, I tried compressing using the "Optimize" feature in Adobe. With the default settings, the size decreased by just 100 KB. If I force further compression by downscaling all images to 100 ppi, the scanned documents become illegible. I also tried to compress each file separately, but without success.
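
For readers wondering why the ppi setting matters so much, here is a rough back-of-the-envelope sketch of how raster size scales with resolution. It is only an illustration: the A4 page size, the 24-bit colour depth, and the page count of 10 are assumptions, not figures from this thread, and real PDF image streams are compressed, so actual sizes will be much smaller than these uncompressed numbers.

```python
# Rough raster-size arithmetic for a scanned page.
# Assumptions (not from the thread): A4 pages, 24-bit colour, 10 pages.
A4_INCHES = (8.27, 11.69)


def raster_pixels(ppi, page_inches=A4_INCHES):
    """Pixel dimensions of one page scanned at the given resolution."""
    width = round(page_inches[0] * ppi)
    height = round(page_inches[1] * ppi)
    return width, height


def uncompressed_bytes(ppi, bits_per_pixel=24, page_inches=A4_INCHES):
    """Size of one page as raw, uncompressed pixel data."""
    width, height = raster_pixels(ppi, page_inches)
    return width * height * bits_per_pixel // 8


def per_page_budget_kb(limit_mb, pages):
    """How many KB each page may use if the limit is shared equally."""
    return limit_mb * 1024 / pages


for ppi in (100, 150, 300):
    print(ppi, "ppi:", raster_pixels(ppi), uncompressed_bytes(ppi), "bytes raw")

# Hypothetical page count, only to show the arithmetic.
print("budget per page:", per_page_budget_kb(5, 10), "KB")
```

The pixel count grows with the square of the resolution, so tripling the ppi multiplies the raw data by nine. That is why downsampling is such a blunt but effective lever, and also why legibility drops quickly once the resolution gets too low.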

I tried a few online services as well, but in vain.

I contacted the admissions team regarding the issue. The reply was pretty straightforward: "You say you can't do it, but a lot of students are doing it. We are confident you can do it if you try hard enough. If you still can't, please don't apply."

Any assistance is appreciated.
 
  • #2
Provide a link to documents that are available on the web.
You may have to OCR some of the documents.
 
  • #3
Are you using ZIP tools like 7-Zip or WinZip?
 
  • #4
Baluncore said:
Provide a link to documents that are available on the web.
You may have to OCR some of the documents.
No links accepted. Even if they did, the documents have to be downloaded by logging in and are inaccessible to the public.

All documents have already been OCR'ed.
Borg said:
Are you using ZIP tools like 7-Zip or WinZip?
Zip files not accepted. Only PDF, max 5 MB.
 
  • #5
Wrichik Basu said:
If I force further compression by downscaling all images to 100 ppi, the scanned documents become illegible.
What types of images are these? If they are .bmp images that were generated from a Microsoft copy command, they could be 2 MB each (it drives me crazy when people send me massive emails like that). Changing the resolution won't affect the size of those very much because they have a bunch of OLE code associated with them. You might try capturing a screenshot of the images and saving them as jpeg or png files which would be much smaller. Just be sure to delete the old MS code.
 
  • #7
Borg said:
What types of images are these?
No idea. I downloaded the files as PDF; I did not create them. I verified the signature on the documents, and then distilled them. Regarding downscaling images, I am talking about this option in Adobe: Optimize > Advanced Optimization.

[Attached screenshot: 1680539683773.png]
 
  • #8
Wrichik Basu said:
All documents have already been OCR'ed.
I don't think that's true. I think when you print to PDF you rasterize the entire document, including the text. This makes a massive difference in file size. Like, an order of magnitude or more. You need the text stored as text. The easy way to check is whether the text in the docs is selectable.

Some of what you scanned, too: is it available as a digital-to-digital PDF from the original source? OCR is limited, and graphics in original docs are often vector as well (so, much smaller).
Wrichik Basu said:
Zip files not accepted. Only PDF, max 5 MB.
Zip files don't help here anyway - they do very little on pictures because they have to be lossless.
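
A small illustration of this point, as a sketch rather than a measurement of the actual files: the usual explanation is that image streams such as JPEG are already compressed, so a general-purpose lossless compressor finds little redundancy left to remove. The example uses Python's zlib (DEFLATE, the algorithm commonly used inside ZIP archives). The repeated text line and the seeded pseudo-random bytes are stand-ins chosen for illustration; they are not taken from the thread.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Highly redundant input, standing in for plain text.
redundant = b"Marksheet, Class XII, Physics, Chemistry, Mathematics\n" * 2000

# Seeded pseudo-random bytes, standing in for image data that has
# already been compressed and so has almost no redundancy left.
rng = random.Random(0)
already_compressed = bytes(rng.getrandbits(8) for _ in range(len(redundant)))

print(f"redundant text ratio: {compression_ratio(redundant):.3f}")
print(f"random bytes ratio:   {compression_ratio(already_compressed):.3f}")
```

The redundant input shrinks to a small fraction of its size, while the random-looking input stays essentially the same size, which is roughly what happens when a PDF full of scanned images is zipped.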
 
  • #9
russ_watters said:
I don't think that's true. When you print to PDF you rasterize the entire document, including the text. This makes a massive difference in file size. Like, an order of magnitude or more. You need the text stored as text. The easy way to check is whether the text in the docs is selectable.
The text is selectable. I re-did the OCR after distilling.
 
  • #10
Can you indicate roughly how many pages of each kind of material the total package represents? PDF is usually pretty good... something seems amiss. I would also look at which docs are least compressed by the advanced compression routines and concentrate on those first.
Of course on a more humorous note it would seem prudent to compress the life out of any parts of your record you would like to de-emphasize. Maybe they will only look at the really good parts if the rest is nearly illegible!!
 
  • #11
Did it. Finally.

Converted each downloaded PDF to PNG with a low-quality setting. Converted those PNG files to PDF using an online service (https://png2pdf.com) and then combined those PDF files in Adobe. The final file size is 4116 KB.

And now, the university website is down.

Anyway, a pretty bad method, but the documents are still legible.
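
For anyone who wants a scriptable alternative to the route above, one commonly used option is Ghostscript's pdfwrite device, which re-encodes a PDF and downsamples its images according to a preset. This is not what was done in this thread, only a sketch of a different technique. The code below just builds the command line; it assumes Ghostscript is installed and callable as `gs` (on Windows the executable is typically named differently), and the file names and the choice of the `ebook` preset are hypothetical starting points.

```python
import shlex

# Presets accepted by Ghostscript's -dPDFSETTINGS option, roughly from
# smallest output to largest.
PRESETS = ("screen", "ebook", "printer", "prepress")


def build_gs_command(src, dst, preset="ebook"):
    """Return a Ghostscript argument list that rewrites src into dst."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    return [
        "gs",                        # assumed executable name
        "-sDEVICE=pdfwrite",         # write a new PDF
        f"-dPDFSETTINGS=/{preset}",  # image downsampling/quality preset
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


# Hypothetical file names, for illustration only.
cmd = build_gs_command("combined.pdf", "combined_small.pdf")
print(shlex.join(cmd))

# To actually run it (requires Ghostscript to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

As with the PNG route, it is worth opening the result and checking that the smallest print is still legible, and trying a less aggressive preset if it is not. Rewriting the file this way would also be expected to invalidate any digital signature, just as distilling does.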
 

FAQ: Compressing PDF documents to oblivion

How does compressing a PDF document affect its quality?

Compressing a PDF document can significantly reduce its file size, but it may also lead to a loss in quality. The extent of this loss depends on the compression method used and the settings chosen. It is important to strike a balance between file size reduction and maintaining an acceptable level of quality.

What are the benefits of compressing a PDF document?

Compressing a PDF document can make it easier to share and upload, as it reduces the file size. This can result in faster upload and download times, as well as reduced storage space requirements. Compressing PDFs can also help improve overall document management efficiency.

Are there different methods for compressing PDF documents?

Yes, there are various methods for compressing PDF documents, including lossy and lossless compression techniques. Lossy compression reduces file size by removing some data and can result in a loss of quality. Lossless compression, on the other hand, reduces file size without sacrificing quality by eliminating redundant data.
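
The difference can be seen in a minimal sketch using only Python's standard library. The "pixels" here are seeded pseudo-random bytes, a synthetic stand-in for image data rather than a real scan, and dropping the low four bits of each value is just one simple, assumed example of a lossy step; real PDF tools use techniques such as JPEG encoding and downsampling instead.

```python
import random
import zlib

rng = random.Random(1)

# Synthetic 8-bit "grayscale pixels"; a stand-in for real image data.
pixels = bytes(rng.getrandbits(8) for _ in range(50_000))

# Lossless: compress, decompress, and get the exact original back.
lossless = zlib.compress(pixels, 9)
restored = zlib.decompress(lossless)

# Lossy step: keep only the top 4 bits of each pixel (16 gray levels),
# then apply the same lossless compressor to the simplified data.
quantized = bytes(p & 0xF0 for p in pixels)
lossy = zlib.compress(quantized, 9)

print("original bytes:  ", len(pixels))
print("lossless bytes:  ", len(lossless))
print("lossy bytes:     ", len(lossy))
print("lossless round trip exact:", restored == pixels)
print("lossy round trip exact:   ", zlib.decompress(lossy) == pixels)
```

The lossless path returns the data unchanged, while the lossy path is smaller but can no longer reproduce the original values. The discarded bits are gone for good.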

Can compressed PDF documents be easily restored to their original quality?

Once a PDF document has been compressed, it may be challenging to restore it to its original quality. While some compression methods allow for minimal loss of quality, fully restoring a document to its original state may not be possible. It is important to carefully consider the level of compression needed before applying it to a PDF document.

What are some tips for effectively compressing PDF documents?

Some tips for effectively compressing PDF documents include adjusting the resolution of images, removing unnecessary elements such as annotations and bookmarks, and using the appropriate compression settings. It is also recommended to test the compressed document to ensure that the quality meets your requirements before sharing or storing it.
