This project contains the companion code for GPT-4o versus Azure Document Intelligence and Azure Computer Vision OCR
Notebook Name | Description |
---|---|
PdfToTextPages.ipynb | Baseline, makes a text file for each page using pypdf to extract the text |
PdfToPageImages.ipynb | Given a folder of PDF files, converts each page of each PDF file into JPEG images using a resolution of 300 dpi, and saves them in a structured directory format. |
DocIntelligencePipeline.ipynb | C# Polyglot notebook with functions to convert an entire PDF to markdown and another that creates an OCR markdown file from each image created using PdfToPageImages.ipynb |
turbo-2024-04-09.ipynb | Azure Open AI using GPT-4 with vision to create a markdown file for each image created using PdfToPageImages.ipynb |
v4omni.ipynb | OpenAI using GPT-4o to create a markdown file for each image created using PdfToPageImages.ipynb |
v4omni-image-plus-docIntelOcr.ipynb | OpenAI using GPT-4o to create a markdown file for each image created using PdfToPageImages.ipynb grounded with OCR text created using DocIntelligencePipeline.ipynb |
visionWithOcr.ipynb | Azure Computer Vision GPT4-Vision OCR |
visionWithOcrAndGrounding.ipynb | Azure Computer Vision GPT4-Vision OCR and grounding |