Builds a token data frame from the text OCRed by Document AI (DAI) in an asynchronous request. Each row is a token, in the order in which DAI proposes to read them. The columns are location variables such as page coordinates and block bounding box numbers.

build_token_df(json)

Arguments

json

file path of a JSON file obtained using dai_async()

Value

a token data frame

Details

The location variables are: start index, end index, left boundary, right boundary, top boundary, bottom boundary, page number, and block number. The start and end indices refer to character positions in the string containing the full text.
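
To illustrate how the start and end indices relate to the full text, here is a minimal sketch on a hand-made data frame. The column names (`start_ind`, `end_ind`) and the indexing convention (zero-based start, exclusive end, as in the text segments of the Document AI JSON) are assumptions made for illustration and are not confirmed by this page; check them against your own output before relying on them.

```r
# Toy stand-in for the full-text string in the DAI response (hypothetical data).
full_text <- "Hello world again"

# Toy token data frame; column names and index convention are assumptions.
token_df <- data.frame(
  start_ind = c(0, 6, 12),
  end_ind   = c(5, 11, 17)
)

# Assuming zero-based, end-exclusive indices, shift the start by one
# to match R's one-based, end-inclusive substring().
token_df$token <- substring(full_text, token_df$start_ind + 1, token_df$end_ind)

print(token_df$token)
```

If the indices in your data frame turn out to be one-based already, drop the `+ 1` shift.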

Examples

if (FALSE) {
  token_df <- build_token_df("pdf_output.json")
}
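
Once built, the data frame can be handled with ordinary base R tools. The sketch below uses a hand-made data frame in place of real output, so it runs without a DAI request; the column names (`token`, `page`, `block`) and the values are hypothetical and should be adapted to the columns actually returned.

```r
# Hand-made stand-in for build_token_df() output (hypothetical columns and values).
token_df <- data.frame(
  token = c("Annual ", "Report\n", "Revenue ", "rose.\n"),
  page  = c(1, 1, 1, 1),
  block = c(1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Keep the tokens of block 2 on page 1; rows are assumed to be in reading order.
block_tokens <- token_df[token_df$page == 1 & token_df$block == 2, ]

# Collapse the tokens back into a single string.
block_text <- paste(block_tokens$token, collapse = "")
print(trimws(block_text))
```

Because the rows follow the proposed reading order, filtering by page and block and then pasting the tokens together should give the text of that block without any re-sorting.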