Creates an hOCR file from Document AI output.
make_hocr(type, output, outfile_name = "out.hocr", dir = getwd())
type: one of "sync" or "async", depending on the function used to process the original document.
output: either an HTTP response object (from dai_sync()) or the path to a JSON file (from dai_async()).
outfile_name: a string with the desired filename. Must end with .hocr, .html, or .xml.
dir: a string with the path to the desired output directory.
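The filename rule above can be checked before calling the function. The following is a minimal sketch in base R; `build_hocr_path()` is a hypothetical helper written for illustration, not part of the package, and it assumes the extension check is case-sensitive, which the documentation does not state.

```r
# Hypothetical helper (not part of the package): checks that the filename
# ends with one of the documented extensions and builds the full output path.
build_hocr_path <- function(outfile_name = "out.hocr", dir = getwd()) {
  stopifnot(is.character(outfile_name), length(outfile_name) == 1)
  # Assumption: the match is case-sensitive.
  if (!grepl("\\.(hocr|html|xml)$", outfile_name)) {
    stop("outfile_name must end with .hocr, .html, or .xml")
  }
  file.path(dir, outfile_name)
}

# A valid name returns the combined path; an invalid one raises an error.
out_path <- build_hocr_path("myfile.xml", dir = tempdir())
```

How make_hocr() itself validates its arguments may differ from this sketch.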
no return value; called for its side effect of writing the hOCR file to the output directory.
hOCR is an open standard for representing formatted text obtained from optical character recognition (OCR). It can be used, among other things, to generate searchable PDFs. This function generates a file compliant with the official hOCR specification (https://github.com/kba/hocr-spec), complete with token-level confidence scores. It also works with non-Latin scripts and right-to-left languages.
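To illustrate what token-level confidence scores look like in hOCR, here is a minimal sketch that reads them back with the xml2 package. The fragment is hand-written for illustration and is not actual make_hocr() output. It assumes words are marked up as `ocrx_word` spans with an `x_wconf` property in the `title` attribute, as the hOCR specification describes; the exact markup this function emits may differ.

```r
library(xml2)

# A tiny hand-written hOCR fragment, for illustration only.
hocr <- '<html><body>
<div class="ocr_page" title="bbox 0 0 600 800">
<span class="ocr_line" title="bbox 10 20 300 40">
<span class="ocrx_word" title="bbox 10 20 90 40; x_wconf 97">Hello</span>
<span class="ocrx_word" title="bbox 100 20 180 40; x_wconf 82">world</span>
</span>
</div>
</body></html>'

# Parse the literal string and select every word-level span.
doc <- read_html(hocr)
words <- xml_find_all(doc, "//span[@class='ocrx_word']")
titles <- xml_attr(words, "title")

# Pull out the number that follows "x_wconf" in each title attribute.
conf <- as.numeric(sub(".*x_wconf[[:space:]]+([0-9.]+).*", "\\1", titles))

# One row per word, with its confidence score.
word_conf <- data.frame(word = xml_text(words), confidence = conf)
```

To apply the same steps to a real file, pass its path to read_html() instead of the literal string.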
if (FALSE) { # \dontrun{
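# Convert the JSON output of an asynchronous request
make_hocr(type = "async", output = "output.json")
# Convert the response object from a synchronous request
resp <- dai_sync("file.pdf")
make_hocr(type = "sync", output = resp)
# Same conversion, with a custom output filename
make_hocr(type = "sync", output = resp, outfile_name = "myfile.xml")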
} # }