Cost and Performance Comparison of Data Extraction with Google Document AI and Vertex AI Studio (Gemini): Do the Docs Match the Real Charges?
We need to decide which service to use for our business process at an acceptable cost.
I updated this article to add the real charges, because they differ from the prices listed on the Google Document AI pricing page.
Cost per Page
The assumption is that a PDF file with 10 pages results in 10 requests, one per page.
So the calculation uses these assumptions:
1. 100 pages/day
2. 250 input characters per request
3. 1500 output characters per request
Extraction Cost
1. Google Document AI
$30 / 1000 pages = $0.03/page * 100 pages = $3
Total = $3/day
On top of that, if you deploy a custom processor, you are charged $0.05 per hour per deployed version.
That is based on the documentation. What is the real charge? Here is my billing:
The extraction cost is about $0.045/page, not $0.03, and the hosting cost is about $0.075/hour, not $0.05/hour.
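To see what this gap means per day, here is a minimal sketch that projects the daily bill under both sets of rates. It uses the 100 pages/day volume from the assumptions above and additionally assumes one processor version stays deployed for 24 hours a day; the function name `daily_document_ai_cost` is only illustrative.

```python
def daily_document_ai_cost(pages, price_per_page, hosting_per_hour, hosted_hours=24):
    """Return (extraction, hosting, total) in USD for one day."""
    extraction = pages * price_per_page
    # Assumption: a single deployed version, hosted for `hosted_hours` per day.
    hosting = hosted_hours * hosting_per_hour
    return extraction, hosting, extraction + hosting


documented = daily_document_ai_cost(100, 0.03, 0.05)
billed = daily_document_ai_cost(100, 0.045, 0.075)
print("documented: extraction=$%.2f hosting=$%.2f total=$%.2f" % documented)
print("billed:     extraction=$%.2f hosting=$%.2f total=$%.2f" % billed)
```

Change `hosted_hours` if the processor version is undeployed outside working hours.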
2. Vertex Gemini AI
The doc: https://cloud.google.com/vertex-ai/generative-ai/pricing
2.1 Vertex Gemini AI (1.5 Flash model)
Image: $0.0001315 / page = $0.0001315 * 100 = $0.01315
Input: $0.000125 / 1k char = $0.00003125/250 char * 100 = $0.003125
Output: $0.000375 / 1k char = $0.0005625/1.5k char * 100 = $0.05625
Total: $0.072525 / 100 pages / day
That is based on the documentation. What is the real charge? Here is my billing:
The real charges are:
Image: $0.23 / 1,179 pages = about $0.00019508/page, not $0.0001315
Input: $0.04 / 214,741 characters = about $0.00018627/1k char, not $0.000125
Output: $0.17 / 299,220 characters = about $0.00056814/1k char, not $0.000375
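The effective rates can be derived from the billing lines with a few lines of Python. This is only a sketch: the billed amounts appear to be rounded to the cent, so the results are approximate, and the names `BILLING_LINES` and `effective_rates` are made up for the example.

```python
# (billed USD, billed quantity, documented unit price).
# Quantity is pages for images and thousands of characters for input/output.
BILLING_LINES = {
    "image": (0.23, 1179, 0.0001315),
    "input": (0.04, 214.741, 0.000125),
    "output": (0.17, 299.22, 0.000375),
}


def effective_rates(lines):
    """Map each billing line to (effective unit price, ratio to documented price)."""
    result = {}
    for name, (billed_usd, quantity, documented) in lines.items():
        effective = billed_usd / quantity
        result[name] = (effective, effective / documented)
    return result


for name, (effective, ratio) in effective_rates(BILLING_LINES).items():
    print(f"{name}: ${effective:.8f} per unit, {ratio:.2f}x the documented price")
```

The ratio column makes it easy to check whether every line item is off by a similar factor.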
2.2 Vertex Gemini AI (1.5 Pro model)
Image: $0.001315 / page = $0.001315 * 100 = $0.1315
Input: $0.00125 / 1k char = $0.0003125/250 char * 100 = $0.03125
Output: $0.00375 / 1k char = $0.005625/1.5k char * 100 = $0.5625
Total = $0.72525 / 100 pages / day
That is based on the documentation. What is the real charge? Here is my billing:
2.3 Vertex Gemini AI (1.0 Pro model)
Image: $0.0025 / page = $0.0025 * 100 = $0.25
Input: $0.000125 / 1k char = $0.00003125/250 char * 100 = $0.003125
Output: $0.000375 / 1k char = $0.0005625/1.5k char * 100 = $0.05625
Total = $0.309375 / 100 pages / day
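The estimates for the three Gemini models all follow the same formula, which can be sketched as below. The prices are the documented ones quoted in this section, the defaults are the assumptions listed at the top (100 pages/day, 250 input and 1500 output characters per request), and the dictionary keys are just labels chosen for this example.

```python
# Documented prices in USD: per page for images, per 1k characters for text.
DOCUMENTED_PRICES = {
    "gemini-1.5-flash": {"image": 0.0001315, "input_1k": 0.000125, "output_1k": 0.000375},
    "gemini-1.5-pro": {"image": 0.001315, "input_1k": 0.00125, "output_1k": 0.00375},
    "gemini-1.0-pro": {"image": 0.0025, "input_1k": 0.000125, "output_1k": 0.000375},
}


def daily_gemini_cost(prices, pages=100, input_chars=250, output_chars=1500):
    """Estimate the daily cost when every page is sent as its own request."""
    per_request = (
        prices["image"]
        + prices["input_1k"] * input_chars / 1000
        + prices["output_1k"] * output_chars / 1000
    )
    return per_request * pages


for model, prices in DOCUMENTED_PRICES.items():
    print(f"{model}: ${daily_gemini_cost(prices):.6f} per 100 pages per day")
```

Adjust `input_chars` and `output_chars` to match your own prompt and the size of the extracted object.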
That is based on the documentation. What is the real charge? Here is my billing:
Conclusion
Based on the documented prices, Google Document AI ($3/day) is more expensive than Vertex Gemini AI (less than $1/day for any of the three models), because Google Document AI has a labeling process that makes the extraction result more accurate.
Extraction Performance
The performance is measured by calling each model through its API from a server in the US (Council Bluffs, Iowa, North America) to rule out local internet issues.
Before the request is sent, the document is converted to base64. For this measurement, I used a 1-page document whose base64 size is 167 KB.
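The conversion itself is small. Here is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the size helper assumes 1 KB = 1024 bytes.

```python
import base64
from pathlib import Path


def encode_file_base64(path):
    """Read a file and return its content as a base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def base64_size_kb(encoded):
    """Size of the encoded payload in kilobytes (assuming 1 KB = 1024 bytes)."""
    return len(encoded) / 1024
```

For example, `encode_file_base64("statement.pdf")` (a placeholder file name) returns the string that goes into the `content` or `data` field of the request bodies in this section.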
Document AI
Here is the request body that I used; the content field should be the base64 of the file.
{
  "fieldMask": "entities",
  "skipHumanReview": true,
  "rawDocument": {
    "mimeType": "application/pdf",
    "content": ""
  }
}
The request time is 22 ~ 23 seconds.
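A timing like this can be taken with a short script. The sketch below assumes the Document AI REST `process` endpoint and an OAuth access token passed in by the caller (for example from `gcloud auth print-access-token`); the project, location, and processor IDs are placeholders, and the helper names are made up for this example.

```python
import json
import time
import urllib.request


def document_ai_process_url(project_id, location, processor_id):
    """Build the REST URL for a processor's `process` method."""
    return (
        f"https://{location}-documentai.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/processors/{processor_id}:process"
    )


def document_ai_request_body(encoded_pdf):
    """Build the request body; `entities` asks only for the extracted entities."""
    return {
        "fieldMask": "entities",
        "skipHumanReview": True,
        "rawDocument": {"mimeType": "application/pdf", "content": encoded_pdf},
    }


def timed_post(url, body, access_token):
    """POST the JSON body and return (parsed response, elapsed seconds)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload, time.perf_counter() - start
```

The elapsed time includes the upload and the download of the response, which matches how a caller experiences the request.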
Vertex AI Studio (Gemini Generative AI)
Here is the request body that I used; the data field should be the base64 of the file.
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "inlineData": {
            "mimeType": "application/pdf",
            "data": ""
          }
        },
        {
          "text": "Extract the data from that file into an object like this {bank,account_name, account_number,estatement_date, transactions: [{ date, detail, amount, balance, category}]}, for the category field, please categorize the transaction based on the detail's field with this options: groceries, transfer IN, top-up e-money, transfer OUT, investment, withdraw, transaction fee, utilities. please consider this sample to determine the category:\nassume it to be top-up e-money if the detail field value is related to an e-money platform e.g. ShopeePay, OVO, Gopay, Dana, etc.\nassume it to be groceries if the detail field value is related to an e-commerce brand e.g. Blibli, TikTok, etc.\nassume it to be an investment if the detail field value is related to an investment context e.g. gold, stocks, etc.\nassume it to utilities if the detail field value is related to the laundry, PLN.\nassume it to transfer OUT if the detail field value context is transfer amount value to the person"
        }
      ]
    }
  ],
  "generationConfig": {
    "maxOutputTokens": 8192,
    "topP": 0.95,
    "temperature": 0
  },
  "safetySettings": [
    {
      "category": "HARM_CATEGORY_HATE_SPEECH",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    },
    {
      "category": "HARM_CATEGORY_HARASSMENT",
      "threshold": "BLOCK_MEDIUM_AND_ABOVE"
    }
  ]
}
Model “gemini-1.5-flash-001”, the request time is 11 ~ 14 seconds.
Model “gemini-1.5-pro-001”, the request time is 20 ~ 28 seconds.
Model “gemini-1.0-pro-vision-001”, the request time is 11 ~ 16 seconds.
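Ranges like these come from repeating the same request and keeping the fastest and slowest run. Here is a minimal sketch of that idea; it assumes the Vertex AI REST `generateContent` endpoint, the default of 5 runs is an arbitrary choice for illustration, and `send_request` stands for any function that performs one request.

```python
import time

MODELS = ["gemini-1.5-flash-001", "gemini-1.5-pro-001", "gemini-1.0-pro-vision-001"]


def gemini_generate_content_url(project_id, location, model):
    """Build the Vertex AI REST URL for a model's `generateContent` method."""
    return (
        f"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{location}/publishers/google/models/{model}:generateContent"
    )


def measure_range(send_request, runs=5):
    """Call `send_request` several times and return (fastest, slowest) in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return min(durations), max(durations)
```

Passing a function that posts the request body to the URL of each entry in `MODELS` gives one range per model.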
Conclusion
- Gemini AI “gemini-1.5-flash-001”: 11 ~ 14 seconds
- Gemini AI “gemini-1.0-pro-vision-001”: 11 ~ 16 seconds
- Gemini AI “gemini-1.5-pro-001”: 20 ~ 28 seconds
- Document AI: 22 ~ 23 seconds
Conclusion
Which one fits you? Use the points below as a guide.
Document AI
- We understand how to handle the fields and the labeling process, which is required to run the AI
- We accept a slower response than Gemini AI
- We only focus on one document format
- We handle large-scale and complex files
- We process the documents in the background
- We want to connect it with Google Cloud services (Workflows, etc.)
- It only supports English documents
Gemini Generative AI
- We want an easy implementation
- We need a faster response
- We have many variant documents e.g. BCA e-statement, GoPay e-statement, etc.
- It is for user-facing use, because the response is faster than Document AI and it is easy to get specific fields from the document through the prompt.
- Gemini Pro can extract a file in many languages.
Thank you for reading my article!
Reach me on LinkedIn: https://www.linkedin.com/in/didikmulyadi