How to Build an Android Document Scanner App with OCR

Document scanner apps are among the most popular tools that put mobile device features, such as the built-in camera and touchscreen, to smart use, making scanning convenient and practical for virtually anyone.

For example, Google Drive’s document scanning feature lets you take pictures of items such as receipts, letters, and billing statements and save them as PDFs in your Drive. However, the resulting PDF contains only static images, with no interactive text.

In this post, we recreate this feature and take it a step further, teaching you how to create a document scanner app for Android using Apryse’s OCR module. This makes text in your scanned documents searchable and selectable. And since we’re using Apryse to view the resulting PDF file, we can also annotate and edit the document!

Sample code for this post and an Android document scanner example can be found on GitHub, and you can try our sample by installing the APK.

To keep things simple, the OCR portion uses Google's ML Kit Text Recognition APIs, while the client scanner app is based on our fork of a third-party Android document scanner library, AndroidScannerDemo.

Client Setup for Android Document Scanner with Apryse SDK

  1. Create a new Android project using Android Studio.
  2. Add Google's ML Kit Text Recognition Android libraries as described in the ML Kit guide.
  3. Download the following AAR file and add the AAR as a new module dependency in your project.
  4. Integrate the Apryse library via Gradle, as described here.
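Steps 2 through 4 all come down to Gradle dependencies in the app module. The linked ML Kit and Apryse guides are the authoritative references for repository setup and current versions; as a rough sketch only (Kotlin DSL, with illustrative version numbers and an assumed module name for the scanner AAR), the module-level build file might include:

// app/build.gradle.kts -- illustrative only; follow the ML Kit and Apryse guides
// linked above for the authoritative repository setup and current versions.
dependencies {
    // Step 2: ML Kit on-device text recognition
    implementation("com.google.mlkit:text-recognition:16.0.0")
    // Step 3: the downloaded scanner AAR, added here as a local module (module name assumed)
    implementation(project(":documentscanner"))
    // Step 4: Apryse (PDFTron) SDK
    implementation("com.pdftron:pdftron:10.9.0")
}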
5. Next, as mentioned previously, the Android app will use our fork of a third-party Android document scanner library, found here. We'll use it to capture, crop, and filter images using the built-in camera.

You can launch the scanner and handle the returned image by calling the following in your MainActivity. (Note: The processOCR method will be implemented later in the guide.)

// Add callback to handle returned image from scanner
val scannerLauncher = registerForActivityResult(ScannerContract()) { uri ->
    if (uri != null) {
        // Obtain the bitmap and save as a local image file
        var bitmap: Bitmap? = null
        bitmap = MediaStore.Images.Media.getBitmap(contentResolver, uri)
        contentResolver.delete(uri, null, null)
        // Save bitmap to local cache as image then upload for processing
        val localJpeg = Utils.saveBitmapAsJpeg(bitmap)
        // Process image using ML Kit
        processOCR(imgWidth, imgHeight, image, localJpeg)
    }
}

// Launch the scanner activity
scannerLauncher.launch(ScanConstants.OPEN_CAMERA)
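Note that the callback above passes imgWidth, imgHeight, and image (an ML Kit InputImage) into processOCR; these must be derived from the captured bitmap. One way to produce them, assuming the scanner library returns an already-cropped, upright bitmap, is:

// Assumed glue code (not part of the snippet above): derive the ML Kit input and
// image dimensions from the captured bitmap before calling processOCR.
val image = InputImage.fromBitmap(bitmap, 0) // 0 = no additional rotation
val imgWidth = bitmap.width.toDouble()
val imgHeight = bitmap.height.toDouble()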

6. Now let's add the code for the OCR portion, which turns the static image into searchable and selectable text. There are two steps: process the image using ML Kit, then create a PDF from the scanned image and the recognized text.

In your MainActivity, add the following methods:

private fun processOCR(
    imgWidth: Double,
    imgHeight: Double,
    image: InputImage,
    localJpeg: File
) {
    val result = TextRecognition.getClient().process(image)
        .addOnSuccessListener { visionText ->
            // Create the PDF containing the recognized text
            val outputPath = createPDF(imgWidth, imgHeight, localJpeg, visionText)

            // Open the document in the viewer
            val config = ViewerConfig.Builder()
                .openUrlCachePath(cacheDir.absolutePath)
                .build()
            DocumentActivity.openDocument(
                this@MainActivity,
                Uri.fromFile(outputPath),
                config
            )
        }
}

private fun createPDF(
    imgWidth: Double,
    imgHeight: Double,
    localJpeg: File,
    visionText: com.google.mlkit.vision.text.Text
): File {
    val doc = PDFDoc()
    val outputFile = File(
        this.filesDir,
        com.pdftron.pdf.utils.Utils.getFileNameNotInUse("scanned_doc_output.pdf")
    )

    // First convert the image to a PDF Doc
    Convert.toPdf(doc, localJpeg.absolutePath)
    val page = doc.getPage(1) // currently this sample only supports 1 page
    val ratio = page.pageWidth / imgWidth

    // We will need to generate a JSON containing the text data, which will be used
    // to insert the text information into the PDF document
    val jsonWords = JSONArray()
    for (block in visionText.textBlocks) {
        for (line in block.lines) {
            for (element in line.elements) {
                val elementText = element.text
                val elementFrame = element.boundingBox
                val pdfRect = androidRectToPdfRect(elementFrame, ratio, imgHeight)
                pdfRect.normalize()
                val word = JSONObject()
                word.put("font-size", (pdfRect.y2 - pdfRect.y1).toInt())
                word.put("length", (pdfRect.x2 - pdfRect.x1).toInt())
                word.put("text", elementText)
                word.put("orientation", "U")
                word.put("x", pdfRect.x1.toInt())
                word.put("y", pdfRect.y1.toInt())
                jsonWords.put(word)
            }
        }
    }

    val jsonObj = JSONObject()
    val jsonPages = JSONArray()
    val jsonPage = JSONObject()
    jsonPage.put("Word", jsonWords)
    jsonPage.put("num", 1) // Only supports one page
    jsonPage.put("dpi", 96)
    jsonPage.put("origin", "BottomLeft")
    jsonPages.put(jsonPage)
    jsonObj.put("Page", jsonPages)

    OCRModule.applyOCRJsonToPDF(doc, jsonObj.toString())
    doc.save(outputFile.absolutePath, SDFDoc.SaveMode.LINEARIZED, null)
    return outputFile
}
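The createPDF method relies on a helper, androidRectToPdfRect, to map each ML Kit bounding box (an android.graphics.Rect in image pixels, with a top-left origin) into a PDF-space com.pdftron.pdf.Rect. The exact implementation is part of the GitHub sample; a minimal sketch, assuming the conversion is just a uniform scale by ratio plus a vertical flip using imgHeight, could look like this:

// Hypothetical sketch -- the real helper lives in the sample project on GitHub.
// ML Kit boxes use a top-left origin in image pixels; the PDF page uses a
// bottom-left origin, so we scale by `ratio` and flip the y-axis with imgHeight.
private fun androidRectToPdfRect(
    frame: android.graphics.Rect?,
    ratio: Double,
    imgHeight: Double
): Rect {
    requireNotNull(frame) { "ML Kit element is missing a bounding box" }
    return Rect(
        frame.left * ratio,
        (imgHeight - frame.bottom) * ratio,
        frame.right * ratio,
        (imgHeight - frame.top) * ratio
    )
}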

Now you can capture a physical document, run it through ML Kit text recognition, and open the resulting searchable, selectable PDF in the Apryse viewer.