doc_curation.pdf

Curate and process PDF files.

doc_curation.pdf.compress_with_gs(input_file_path, output_file_path, power=3)

Compress a PDF and remove its text using the Ghostscript command-line interface.

Parameters: power – one of 0, 1, 2, 3, 4 (default 3)
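For orientation, here is a minimal sketch of how a Ghostscript invocation of this kind might be assembled. It is not taken from the doc_curation source: the helper name build_gs_command, the remove_text argument, and the mapping from power to Ghostscript's -dPDFSETTINGS presets are all assumptions made for illustration. The -dFILTERTEXT switch is a Ghostscript option (available in recent releases) that drops text from the output; whether compress_with_gs uses it is not confirmed here.

```python
from typing import List

# ASSUMPTION: this mapping from `power` to a -dPDFSETTINGS preset is
# illustrative; doc_curation may map the levels differently.
PDF_SETTINGS = {
    0: "/default",
    1: "/prepress",
    2: "/printer",
    3: "/ebook",
    4: "/screen",
}


def build_gs_command(
    input_file_path: str,
    output_file_path: str,
    power: int = 3,
    remove_text: bool = True,
) -> List[str]:
    """Build (but do not run) a Ghostscript argument list.

    Hypothetical helper, not part of doc_curation.
    """
    if power not in PDF_SETTINGS:
        raise ValueError("power must be one of 0, 1, 2, 3, 4")
    command = [
        "gs",
        "-sDEVICE=pdfwrite",           # write a new PDF
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={PDF_SETTINGS[power]}",
        "-dNOPAUSE",
        "-dQUIET",
        "-dBATCH",
    ]
    if remove_text:
        # ASSUMPTION: text removal is done with -dFILTERTEXT.
        command.append("-dFILTERTEXT")
    command.append(f"-sOutputFile={output_file_path}")
    command.append(input_file_path)
    return command
```

The returned list can be passed to subprocess.run(command, check=True) on a machine where the gs executable is installed. Building the list separately from running it keeps the argument logic easy to inspect and test.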
doc_curation.pdf.detext_via_jpg(input_file_path, output_file_path)
doc_curation.pdf.detext_via_ps(input_file_path, output_file_path)
doc_curation.pdf.detext_with_pdfimages(input_file_path, output_file_path)

Sometimes does not work satisfactorily: it may output only 2 pages of a many-page document.

doc_curation.pdf.dump_images(input_file_path, output_path)
doc_curation.pdf.images_to_pdf(image_dir, output_path)
doc_curation.pdf.split_into_small_pdfs(pdf_path, output_directory=None, start_page=1, end_page=None, small_pdf_pages=25)
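
The signature of split_into_small_pdfs suggests that a page range is cut into consecutive chunks of small_pdf_pages pages each. The sketch below shows one way such chunk boundaries could be computed. It assumes 1-based, inclusive page numbers and that end_page=None means "up to the last page"; the helper page_ranges and its total_pages argument are hypothetical and not part of doc_curation.

```python
from typing import List, Optional, Tuple


def page_ranges(
    total_pages: int,
    start_page: int = 1,
    end_page: Optional[int] = None,
    small_pdf_pages: int = 25,
) -> List[Tuple[int, int]]:
    """Return (first, last) page pairs, 1-based and inclusive.

    Hypothetical helper; the library's actual chunking may differ.
    """
    if small_pdf_pages < 1:
        raise ValueError("small_pdf_pages must be at least 1")
    # ASSUMPTION: a missing or too-large end_page is clamped to the last page.
    if end_page is None or end_page > total_pages:
        end_page = total_pages
    ranges = []
    first = start_page
    while first <= end_page:
        last = min(first + small_pdf_pages - 1, end_page)
        ranges.append((first, last))
        first = last + 1
    return ranges


print(page_ranges(60))  # [(1, 25), (26, 50), (51, 60)]
```

Under these assumptions, a 60-page document with the default chunk size yields two full 25-page chunks and one final 10-page chunk.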