Extracting OCR Datasets For ML Models