PDF Configuration Guide
This guide explains how to configure the PDF Import process using JSON configuration files. The configuration structure allows you to define various aspects of PDF processing, including logging and page data extraction.
Processing PDF documents usually means importing competitive data, specifically a competitor's PROMOFILE.
To keep things simple, all PROMOFILES are saved in the PROMOTION FILES VIEW on the dashboard. Once a PDF has been processed into a PROMOFILE, you can perform the same operations on it as you could on your own PROMOFILE.
The conversion process uses two external processes to extract the information. One uses a third-party AI model to extract features from each page. The other is a DecaSIM vector search API that returns UPCs for a product description.
Configuration Structure
Users can specify configs for multiple PDF convertors. When a PDF is dropped into the convertor dropzone and multiple PDF configurations are present, you will need to select which config to use.
General Structure
The general structure for this config file is a list of PDF CONVERTOR configs.
{
  "pdf_convertors": [
    {config #1},
    {config #2},
    {config #3}
  ]
}
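As a minimal sketch of how a caller might read this structure, the Python below parses a config document and picks one convertor by name, which is the choice you make when several configs are present. The function names `list_convertor_names` and `select_convertor` are hypothetical, and the `NAME` and `CONFIG` keys are taken from the complete example at the end of this guide. The actual import code may work differently.

```python
import json


def list_convertor_names(config: dict) -> list[str]:
    """Return the NAME of every entry under "pdf_convertors"."""
    return [entry["NAME"] for entry in config.get("pdf_convertors", [])]


def select_convertor(config: dict, name: str) -> dict:
    """Return the CONFIG block of the convertor whose NAME matches."""
    for entry in config.get("pdf_convertors", []):
        if entry["NAME"] == name:
            return entry["CONFIG"]
    raise KeyError(f"No PDF convertor config named {name!r}")


# Hypothetical two-convertor document, trimmed to a single setting each.
raw = """
{
  "pdf_convertors": [
    {"NAME": "WeeklyAd", "CONFIG": {"DPI": 100}},
    {"NAME": "Test", "CONFIG": {"DPI": 100}}
  ]
}
"""

config = json.loads(raw)
print(list_convertor_names(config))
print(select_convertor(config, "WeeklyAd"))
```

Raising `KeyError` for an unknown name is a design choice of this sketch: it surfaces a mistyped config name immediately instead of silently falling back to another convertor.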
Each PDF CONVERTOR config contains user and system configurations for processing PDF documents to extract different types of data and outputs. Currently, we have only created the code to process competitive promotional advertising and flyers. The PDF can be a document downloaded from a competitor's website, or an export of the promotions featured on a competitor's website or email campaign.
Example: PDF Config For A Weekly Ad
{
  "NAME": *String* [Name of config],
  "CONFIG":
  {
    "DPI": *Integer* [The dots-per-inch resolution at which pages are exported from the PDF document. A higher DPI takes longer and may not give better results. We use 100 for this config.],
    "MAX_RETRIES": *Integer* [Number between 1 and 3. We use 3 for this config.],
    "GOOGLE_GEN_AI_KEY": *String* [The secret key issued by the third-party API we use.],
    "DEFAULT_PROMO_PRICE": *Float* [When we extract the promotion price, our logic sometimes fails because the promoted price is expressed in a new or complex way. In these cases we return a default value. 3.99 seems to work pretty well.],
    "UPC_EMBEDDINGS_DATA_SET": *String* [We have different endpoints trained on different input data. Currently, the "brand" dataset gives the best results.],
    "UPC_RETURN_COUNT": *Integer* [The larger this number, the more UPC results are returned, generating a larger and more random promofile.],
    "UPC_EMBEDDINGS_URL": *String* [Use this config value: "http://kld200.squirrel-bellatrix.ts.net:8000/v1/similar"],
    "UPC_SCORE_THRESHOLD": *Float* [This can be a number between 0 and 5. It represents the cosine similarity between the search term and the retrieved term. High numbers let more spurious matches through. We use 0.5 and get good results.]
  }
}
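The types and ranges described above could be checked before a PDF is processed. The following Python is a sketch under stated assumptions, not the validation the importer actually performs: the function name `validate_convertor_config` is hypothetical, every key is treated as required, and the ranges are copied from the descriptions (1 to 3 for MAX_RETRIES, 0 to 5 for UPC_SCORE_THRESHOLD).

```python
# Documented type for each setting in a convertor's CONFIG block.
EXPECTED_TYPES = {
    "DPI": int,
    "MAX_RETRIES": int,
    "GOOGLE_GEN_AI_KEY": str,
    "DEFAULT_PROMO_PRICE": float,
    "UPC_EMBEDDINGS_DATA_SET": str,
    "UPC_RETURN_COUNT": int,
    "UPC_EMBEDDINGS_URL": str,
    "UPC_SCORE_THRESHOLD": float,
}


def validate_convertor_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in cfg:
            # Assumption: every documented key is required.
            problems.append(f"missing key: {key}")
            continue
        value = cfg[key]
        # JSON floats may be written without a decimal point (4 for 4.0),
        # so an int is accepted where a float is documented. bool is a
        # subclass of int in Python, so it is rejected explicitly.
        allowed = (int, float) if expected is float else (expected,)
        if isinstance(value, bool) or not isinstance(value, allowed):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )

    # Range checks copied from the setting descriptions.
    retries = cfg.get("MAX_RETRIES")
    if type(retries) is int and not 1 <= retries <= 3:
        problems.append("MAX_RETRIES: must be between 1 and 3")
    threshold = cfg.get("UPC_SCORE_THRESHOLD")
    if type(threshold) in (int, float) and not 0 <= threshold <= 5:
        problems.append("UPC_SCORE_THRESHOLD: must be between 0 and 5")
    return problems


# Values documented for the weekly ad config; the key is a placeholder.
weekly_ad = {
    "DPI": 100,
    "MAX_RETRIES": 3,
    "GOOGLE_GEN_AI_KEY": "abc1234",
    "DEFAULT_PROMO_PRICE": 3.99,
    "UPC_EMBEDDINGS_DATA_SET": "brand",
    "UPC_RETURN_COUNT": 1,
    "UPC_EMBEDDINGS_URL": "http://kld200.squirrel-bellatrix.ts.net:8000/v1/similar",
    "UPC_SCORE_THRESHOLD": 0.5,
}
print(validate_convertor_config(weekly_ad))
```

Returning a list of problems instead of raising on the first one lets a user fix every issue in the file in a single pass.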
Banner List
A list of different competitors that you will be importing data for.
Price Zones
A list of price zones for your own stores / customer representatives.
Private Label Brands
A list of private label brands that might appear in competitive promotion documents.
{
  "pdf_convertors": [
    {
      "NAME": "WeeklyAd",
      "CONFIG": {
        "DPI": 100,
        "MAX_RETRIES": 3,
        "GOOGLE_GEN_AI_KEY": "abc1234",
        "DEFAULT_PROMO_PRICE": 3.99,
        "UPC_EMBEDDINGS_DATA_SET": "brand",
        "UPC_RETURN_COUNT": 1,
        "UPC_EMBEDDINGS_URL": "http://kld200.squirrel-bellatrix.ts.net:8000/v1/similar",
        "UPC_SCORE_THRESHOLD": 0.5
      }
    },
    {
      "NAME": "Test",
      "CONFIG": {
        "DPI": 100,
        "MAX_RETRIES": 3,
        "GOOGLE_GEN_AI_KEY": "abc1234",
        "DEFAULT_PROMO_PRICE": 3.99,
        "UPC_EMBEDDINGS_DATA_SET": "brand",
        "UPC_RETURN_COUNT": 1,
        "UPC_EMBEDDINGS_URL": "http://kld200.squirrel-bellatrix.ts.net:8000/v1/similar",
        "UPC_SCORE_THRESHOLD": 0.5
      }
    }
  ],
  "BANNER_LIST": [
    "ACME MARKETS", "KROGER", "QFC"
  ],
  "PRIVATE_LABEL_BRANDS": [
    "ACME MARKETS", "KROGER", "QFC"
  ],
  "PRICE_ZONES": [
    "Generic", "West", "North", "South", "East", "Hispanic"
  ],
  "CAMPAIGN_ID_LIST": [
    "1", "2", "3", "50", "51", "52"
  ]
}
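The top-level lists in the example above all hold strings, including the campaign IDs. As a rough sketch, the Python below collects those lists and rejects any that contain non-string values. The function name `read_string_lists` is hypothetical, and treating a missing list as empty is an assumption, since this guide does not say whether each list is required.

```python
import json

# Top-level list keys that appear in the complete example.
LIST_KEYS = (
    "BANNER_LIST",
    "PRIVATE_LABEL_BRANDS",
    "PRICE_ZONES",
    "CAMPAIGN_ID_LIST",
)


def read_string_lists(config: dict) -> dict[str, list[str]]:
    """Collect the top-level lists, checking that each is a list of strings."""
    result = {}
    for key in LIST_KEYS:
        # Assumption: a missing list is treated as empty, not as an error.
        values = config.get(key, [])
        if not isinstance(values, list) or not all(
            isinstance(v, str) for v in values
        ):
            raise ValueError(f"{key} must be a list of strings")
        result[key] = values
    return result


# Trimmed, hypothetical document using values from the example.
raw = """
{
  "pdf_convertors": [],
  "BANNER_LIST": ["ACME MARKETS", "KROGER", "QFC"],
  "PRICE_ZONES": ["Generic", "West"],
  "CAMPAIGN_ID_LIST": ["1", "2", "3"]
}
"""

lists = read_string_lists(json.loads(raw))
for key, values in lists.items():
    print(key, values)
```

Note that a campaign ID written as a bare number (`1` instead of `"1"`) would be rejected by this check, matching the quoted form used in the example.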