Should I switch? Let’s compare Google Document AI and Azure Document AI

Didik Mulyadi
6 min read · Jul 20, 2024


Photo by Piotr Makowski on Unsplash

Recognizing both the strengths and weaknesses of Google’s solution, the team is looking for competitors or alternative tools that can overcome its limitations.

Why Google Document AI falls short for us:

  1. It does not support table labeling, so labeling a transaction list takes more effort, and getting accurate results that way is challenging.
  2. Extraction is very slow without tuning: a 1-page e-statement (~167 KB) takes 22–24 seconds.
  3. There is no option to upgrade the server to increase extraction speed.
  4. A deployed model version costs $0.05 per hour, which comes to $36/month, or IDR 576,000/month ($1 = IDR 16,000). That is quite expensive.
  5. To see the JSON result, we have to call the endpoint of the deployed version.
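As a quick check of point 4, here is a minimal Python sketch of the hosting arithmetic. It assumes a 30-day month, a version that stays deployed around the clock, and the exchange rate used above ($1 = IDR 16,000); the function name is illustrative.

```python
def monthly_hosting_cost(rate_per_hour: float, hours_per_day: int = 24, days: int = 30) -> float:
    """Cost in USD of keeping one deployed version up for a month, billed per hour."""
    return rate_per_hour * hours_per_day * days


USD_TO_IDR = 16_000  # exchange rate assumed in this article

usd = monthly_hosting_cost(0.05)
idr = usd * USD_TO_IDR
print(f"${usd:.2f}/month = IDR {idr:,.0f}/month")  # prints: $36.00/month = IDR 576,000/month
```

Because billing is per hour, the only way to lower this number is to undeploy the version when it is not needed.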

After researching several big platforms (e.g., ChatGPT, AWS, Huawei, and Azure), I found a Document AI-style service with a custom extractor only on Azure Cloud.

So why am I interested in Azure AI Document Intelligence? Before creating an account on Azure, I found the following:

  1. It supports table labeling, which makes it easier for the team to label many documents with many transactions.
  2. I can’t yet tell which service performs better, but Azure provides a custom container for deploying the model. If the default hosted service is not fast enough, we have the option to try the custom container.
  3. I did not find any cost for a deployed version. If it’s free, that’s very good; we will know after I deploy a version.
  4. We can see the JSON result when we test the deployed version.

The points to compare:

  1. The Performance (Extraction Speed and Accuracy)
  2. The Cost

Base Model (Production-Ready only)

Neither Google nor Azure uses Gemini or ChatGPT as the base model; both use a deep learning model as the foundation model.

In Google, Gemini helps the trainer label documents and is involved in the fine-tuning process.

Google also has a release candidate model that uses the Gemini 1.0/1.5 Pro LLM as its foundation model.

Fine-Tuning

Fine-tuning is a process of adjusting the parameters of the underlying AI model based on the labeled data.

Labeling is writing the recipe, and fine-tuning is cooking it.

Google Cloud

Fine-tuning uses generative AI to identify patterns in the labeled data and then adjusts the weights and biases of the foundation model to improve its ability to recognize those patterns. That is why fine-tuning needs more labeled documents: it relies on labeled data.

Azure Cloud

We can’t see a fine-tuning step in the Azure portal, but Azure still fine-tunes, just differently. In Google, you fine-tune the model manually; in Azure, training the model also fine-tunes it to become more accurate.

Fine-Tuning Cost

Fine-tuning appears to be free on both cloud providers; I found no article that mentions a fine-tuning cost.

Storage Cost

Google Cloud

Documents uploaded for the dataset are not directly connected to Cloud Storage, so there is no storage cost.

Azure Cloud

When we label a document, it is stored in cloud storage as three files: original.pdf, original.pdf.labels.json (the labels), and original.pdf.ocr.json (the OCR result). We are charged for this storage.
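Because every labeled document becomes several blobs, it can help to audit the training container before training. Below is a minimal sketch, assuming the label and OCR files are named by appending `.labels.json` and `.ocr.json` to the original file name; the function name and sample blob names are hypothetical, and listing the blobs themselves (for example with the Azure Storage SDK) is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

LABEL_SUFFIX = ".labels.json"  # assumed naming convention
OCR_SUFFIX = ".ocr.json"       # assumed naming convention


def audit_labeled_dataset(blob_names: Iterable[str]) -> Dict[str, List[str]]:
    """Map each source document to the companion files it is missing."""
    present = defaultdict(set)
    for name in blob_names:
        if name.endswith(LABEL_SUFFIX):
            present[name[: -len(LABEL_SUFFIX)]].add("labels")
        elif name.endswith(OCR_SUFFIX):
            present[name[: -len(OCR_SUFFIX)]].add("ocr")
        elif name.lower().endswith(".pdf"):
            present[name].add("source")
        # Other files in the container (e.g. project metadata) are ignored.
    missing = {}
    for doc, kinds in present.items():
        gaps = sorted({"source", "labels", "ocr"} - kinds)
        if gaps:
            missing[doc] = gaps
    return missing


blobs = [
    "statement1.pdf", "statement1.pdf.labels.json", "statement1.pdf.ocr.json",
    "statement2.pdf", "statement2.pdf.ocr.json",
]
print(audit_labeled_dataset(blobs))  # {'statement2.pdf': ['labels']}
```

Every file this reports as present is also a file we pay storage for, so the same listing is a rough way to see what the labeling project keeps in the container.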

Data Transfer Cost

Google Cloud

It’s free

Azure Cloud

Because I set the region to East US, I am charged for “Bandwidth Inter-Region — Intra Continent — North America”. This probably comes from the labeling process, either because we upload documents from a local computer or because Azure Document Intelligence transfers data between services behind the scenes.

https://azure.microsoft.com/en-us/pricing/details/bandwidth/

Table Labeling

Google Cloud

Google does not provide table labeling, so there are two options for labeling list data:

  1. Draw a polygon around each transaction row. This is easy to label, but the backend has to split each row into columns. With many document types, the backend needs a separate splitter for each one, and documents with many transactions take a long time to label.
  2. Create a labeling group. This takes more time than option 1 and makes labeling challenging, but the backend does not need a custom splitter.
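Option 1 moves the column split into the backend. Here is a minimal sketch of what one such splitter could look like, assuming a hypothetical e-statement layout where each labeled row reads `date description amount DB/CR`; the regex, the `SPLITTERS` registry, and the `bank_a` key are illustrative, and every new layout would need its own entry.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical row layout: "20/07/2024 TRANSFER TO JOHN 1,500,000.00 DB"
BANK_A_ROW = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>[\d,]+\.\d{2})\s*"
    r"(?P<direction>DB|CR)?$"
)


def split_bank_a(row_text: str) -> Optional[Dict[str, str]]:
    """Split one labeled row into columns, or return None if it doesn't match."""
    match = BANK_A_ROW.match(row_text.strip())
    if match is None:
        return None
    columns = match.groupdict()
    columns["direction"] = columns["direction"] or ""  # no DB/CR marker on the row
    return columns


# One splitter per document type: this registry grows with every new layout.
SPLITTERS: Dict[str, Callable[[str], Optional[Dict[str, str]]]] = {
    "bank_a": split_bank_a,
}

print(SPLITTERS["bank_a"]("20/07/2024 TRANSFER TO JOHN 1,500,000.00 DB"))
# {'date': '20/07/2024', 'description': 'TRANSFER TO JOHN', 'amount': '1,500,000.00', 'direction': 'DB'}
```

The registry makes the maintenance cost of option 1 visible: each new bank or statement format means another regex to write and keep in sync with the document.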

Azure Cloud

Azure supports table labeling, so the team can easily label the list of transactions in a document. After layout analysis runs, a table icon appears on the document.
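Once Azure detects a table, its JSON result describes it as a flat list of cells with row and column indices. The sketch below rebuilds the grid and turns it into records. It assumes the v3 layout JSON shape (`rowCount`, `columnCount`, and `cells[]` with `rowIndex`, `columnIndex`, `content`, and an optional `kind` of `columnHeader`); spanning cells are ignored for simplicity, and the sample table is made up.

```python
from typing import Any, Dict, List


def table_to_records(table: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn one layout table into a list of dicts keyed by the header row."""
    grid = [["" for _ in range(table["columnCount"])] for _ in range(table["rowCount"])]
    header_rows = set()
    for cell in table["cells"]:
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell.get("content", "")
        if cell.get("kind") == "columnHeader":
            header_rows.add(cell["rowIndex"])
    # Use the last header row as column names; fall back to the first row.
    header_index = max(header_rows) if header_rows else 0
    headers = grid[header_index]
    return [dict(zip(headers, row)) for row in grid[header_index + 1 :]]


sample = {
    "rowCount": 3,
    "columnCount": 3,
    "cells": [
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 0, "content": "Date"},
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 1, "content": "Description"},
        {"kind": "columnHeader", "rowIndex": 0, "columnIndex": 2, "content": "Amount"},
        {"rowIndex": 1, "columnIndex": 0, "content": "20/07"},
        {"rowIndex": 1, "columnIndex": 1, "content": "TRANSFER"},
        {"rowIndex": 1, "columnIndex": 2, "content": "1,500,000.00"},
        {"rowIndex": 2, "columnIndex": 0, "content": "21/07"},
        {"rowIndex": 2, "columnIndex": 1, "content": "ATM WITHDRAWAL"},
        {"rowIndex": 2, "columnIndex": 2, "content": "500,000.00"},
    ],
}
for record in table_to_records(sample):
    print(record)
```

Compared with the per-layout splitter approach on Google, the column boundaries here come from the service, so the backend code does not change when a new statement layout is added.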

Deployed Version Cost

This is a separate cost on top of document extraction.

Google Cloud

As mentioned before, Google charges $0.05 per hour for each deployed version. Billing is hourly, regardless of how many requests the version serves or how long it sits idle.

Azure Cloud

After I deployed a version, hosting it was free, so we can keep the version active 24 hours a day.

Extraction Cost by API

The documentation says Google Cloud and Azure charge the same for extraction: $30 per 1,000 pages, or $0.03/page. For Google Cloud, I will compare this with what I was actually charged.

Google Cloud does not offer free extraction, but Azure includes 500 free pages per month.

Google Cloud

The documented price differs from the actual charge: the documentation says $0.03/page, but I was actually billed $0.045/page.

Azure Cloud

The documentation says $0.03/page. I can’t see the actual charge yet because my usage is covered by the free tier of 500 pages/month for 12 months.
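To compare monthly extraction bills, the per-page rates and Azure’s free tier can be folded into one small function. This is a sketch that assumes the rates quoted above ($0.045/page actually billed by Google, $0.03/page documented by Azure) and that Azure’s 500 free pages are used up before any charge applies; hosting, storage, and bandwidth are left out.

```python
def extraction_cost(pages: int, rate_per_page: float, free_pages: int = 0) -> float:
    """Monthly extraction cost in USD after subtracting any free pages."""
    billable = max(0, pages - free_pages)
    return round(billable * rate_per_page, 2)


GOOGLE_RATE = 0.045     # observed on the author's Google bill
AZURE_RATE = 0.03       # documented Azure price
AZURE_FREE_PAGES = 500  # free pages per month during the first 12 months

for pages in (300, 1_000, 5_000):
    google = extraction_cost(pages, GOOGLE_RATE)
    azure = extraction_cost(pages, AZURE_RATE, AZURE_FREE_PAGES)
    print(f"{pages:>5} pages: Google ${google:.2f}, Azure ${azure:.2f}")
```

With these assumptions, 1,000 pages a month costs $45.00 on Google and $15.00 on Azure; once the free tier expires, the Azure figure would rise to the full $0.03/page.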

Extraction Performance

For transaction lists, I think Azure is more accurate than Google because its base model detects tables, and table labeling improves both accuracy and extraction speed.

Google Cloud

We already measured Google Document AI: it takes 22–24 seconds.
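For reference, a call to a deployed Google processor version could look like the sketch below. It assumes the `google-cloud-documentai` Python client and application-default credentials; the project, location, processor, and version IDs are placeholders. Entities are returned as a list rather than a dict because a custom extractor can emit the same entity type many times, for example once per labeled transaction row.

```python
from typing import Any, Iterable, List, Tuple


def entities_to_records(entities: Iterable[Any]) -> List[Tuple[str, str, float]]:
    """Flatten top-level Document AI entities into (type, text, confidence) tuples."""
    # Child entities of a labeling group live in `e.properties` and are not flattened here.
    return [(e.type_, e.mention_text, e.confidence) for e in entities]


def process_with_google(project_id: str, location: str, processor_id: str,
                        version_id: str, pdf_path: str) -> List[Tuple[str, str, float]]:
    """Send one PDF to a specific deployed processor version."""
    # Imported here so entities_to_records can be used without the SDK installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    )
    name = client.processor_version_path(project_id, location, processor_id, version_id)
    with open(pdf_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type="application/pdf")
    result = client.process_document(
        request=documentai.ProcessRequest(name=name, raw_document=raw_document)
    )
    return entities_to_records(result.document.entities)
```

Only a deployed version can serve this request, which is why the hourly hosting charge applies even when we only want to inspect the JSON.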

Azure Cloud

Using the same e-statement file, I called the deployed model version in East US from my local computer. It took 9–11 seconds, with a confidence of 0.987.
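A measurement like this can be reproduced with a small timing harness. This is a sketch that assumes the `azure-ai-formrecognizer` package (v3.2 or later) and key-based authentication; the endpoint, key, model ID, and file name are placeholders. Timing from the client includes the upload and the network round trip to East US, so it reflects what a caller actually waits for.

```python
import time
from typing import Callable, List, Optional


def measure_latency(call: Callable[[], object], runs: int = 3) -> List[float]:
    """Run `call` several times and return each run's wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        durations.append(time.perf_counter() - start)
    return durations


def analyze_with_azure(endpoint: str, key: str, model_id: str, pdf_path: str) -> Optional[float]:
    """Send one PDF to a custom model and return the document-level confidence."""
    # Imported here so measure_latency works without the Azure SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    with open(pdf_path, "rb") as f:
        poller = client.begin_analyze_document(model_id, document=f)
    result = poller.result()  # blocks until the analysis finishes
    return result.documents[0].confidence if result.documents else None


# Usage (placeholders):
# times = measure_latency(lambda: analyze_with_azure(ENDPOINT, KEY, "my-model", "statement.pdf"))
# print([f"{t:.1f}s" for t in times])
```

Running several times and looking at the spread is what gives a range like 9–11 seconds rather than a single number.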

Conclusion

Accuracy to Extract Simple Data (WIN: Google and Azure)

For example: bank name, account name, and account number.
Both Google and Azure are accurate here.

Accuracy to Extract Table (WIN: Azure)

For example: the history of transactions
Labeling tables in Google is hard because every transaction has to be labeled manually. In Azure, extracting a table is easy.

Extraction Performance (WIN: Azure)

Google takes over 20 seconds; Azure takes about 9–11 seconds.

Cost (WIN: Azure)

Google charges for 2 things: Extraction ($0.045/page) and Hosting ($0.075/hour)

Azure charges for 3 things: Extraction ($0.03/page), Cloud Storage, and Bandwidth.

On cost, I choose Azure.

Needs (WIN: Azure)

We need to handle many types and variations of documents and forms, and Azure can “compose” several custom models into one. This reduces complexity and improves efficiency and flexibility: when a new variant appears, we just develop a new model and add it to the composition.
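Composing can also be scripted. The sketch below assumes the `azure-ai-formrecognizer` SDK, whose `DocumentModelAdministrationClient` exposes `begin_compose_document_model`; the client is passed in so the function can be exercised without Azure, and the model IDs are hypothetical. In this sketch, adding a new variant means appending its model ID to the list and composing again under a new composed-model ID.

```python
from typing import Any, List


def compose_models(admin_client: Any, component_model_ids: List[str],
                   composed_model_id: str, description: str = "") -> Any:
    """Compose several custom models into one and wait for the operation to finish."""
    if not component_model_ids:
        raise ValueError("no component models given")
    poller = admin_client.begin_compose_document_model(
        component_model_ids, model_id=composed_model_id, description=description
    )
    return poller.result()  # details of the new composed model


# Usage with the real SDK (placeholders):
# from azure.ai.formrecognizer import DocumentModelAdministrationClient
# from azure.core.credentials import AzureKeyCredential
# admin = DocumentModelAdministrationClient(ENDPOINT, AzureKeyCredential(KEY))
# compose_models(admin, ["bank-a-model", "bank-b-model", "bank-c-model"], "e-statements-v2")
```

The backend then calls a single composed model ID for every e-statement instead of choosing a model per bank itself.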

Winner

I’m going with Azure AI Document Intelligence.
