IQ Bot Custom Python Logic Feature Documentation
Introduction
The custom logic feature enables users to use Python scripts to make precise changes to IQ Bot
extraction results.
The ability to make automatic, fine-tuned, and flexible adjustments to extracted data streamlines
data integration into target systems and further reduces the need for human action in data
extraction.
Using custom logic, a user can modify extraction results in numerous ways such as:
- Remove a specific word, number, symbol, or phrase
- Extract only numbers from a sentence
- Apply a regex filter
- Delete rows that contain a specific value
- Add values to a new column if another column contains a specific value
- Query a database and return data related to the extraction value
- Call an external machine learning system to analyze text
The custom logic feature leverages the simplicity and power of Python code to provide nearly
endless possibilities for refining extracted document data.
This document explains how to use the custom logic feature, details specifics of its capabilities
and performance, and provides a multitude of Python code examples demonstrating how to
implement common use cases.
Getting Started
The custom logic feature is not always enabled by default. To enable the feature, edit the
features.json file in […\Automation Anywhere IQ Bot 6.5\Configurations] such that the
attributes “fieldLogic”, “tableLogic”, and "logicEditor:fullscreen" are all
set to true.
To apply custom logic to an extracted field or table, you can click the “Logic” section found
under either (A) the “Field options” section of an extracted field or (B) under the “Table-
section/settings” of an extracted table.
This will display a code box. In the code box, use Python to manipulate the extraction value to
achieve the desired results. The extraction value is automatically saved to a Python variable
called either field_value or table_values, depending on whether a field or table is selected.
The code entered will apply only to the field or table that is currently selected.
Here is a simple example of custom logic in action:
This single line of Python code will convert all lowercase letters to uppercase letters.
Example input:
Julia McDaniel
Resulting output:
JULIA MCDANIEL
A user might do this to eliminate potential inconsistencies in capitalization of names.
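A minimal sketch of such a one-liner, assuming it relies on Python's built-in str.upper() method. The sample value is assigned here only so the snippet runs on its own; in IQ Bot, field_value is already populated.

```python
# Sample value for illustration only; IQ Bot supplies field_value.
field_value = "Julia McDaniel"

# Convert every lowercase letter to uppercase.
field_value = field_value.upper()
print(field_value)
```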
Other practical use cases include converting a date from a word format to a numerical format (e.g.
“December 7, 2018” to “12/7/2018”) and stripping an extraction of its symbols (e.g. converting
“$537.14” to “537.14”). These kinds of changes are extremely useful when sending extraction
data to systems that require data to be in specific formats.
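The two conversions mentioned above could be sketched as follows. The function names to_numeric_date and strip_currency_symbol are hypothetical helpers introduced for illustration, and the date parsing assumes English month names.

```python
from datetime import datetime

def to_numeric_date(text):
    """Convert a date such as 'December 7, 2018' to '12/7/2018'."""
    parsed = datetime.strptime(text, "%B %d, %Y")
    # Build the result from the integer parts to avoid zero padding.
    return "{}/{}/{}".format(parsed.month, parsed.day, parsed.year)

def strip_currency_symbol(text):
    """Remove dollar signs, e.g. '$537.14' becomes '537.14'."""
    return text.replace("$", "").strip()

print(to_numeric_date("December 7, 2018"))
print(strip_currency_symbol("$537.14"))
```

The .format() call is used instead of an f-string so that the sketch also runs on Python 3.5.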
Feature Details
IQ Bot comes installed with Python version 3.5.4. Custom logic will execute the code using the
Python version currently installed on the IQ Bot host system.
Pre-installed packages:
arabic-reshaper (2.0.15) - reconstruct Arabic sentences to be used in applications that
don't support Arabic script
certifi (2019.6.16) - collection of Root Certificates for validating the trustworthiness of SSL
certificates while verifying the identity of TLS hosts
chardet (3.0.4) - detects the character encoding of text whose encoding is unknown
cx-Oracle (7.1.3) - used to access and interact with Oracle databases
DateTime (4.3) - provides classes for manipulating dates and times
dateutils (0.6.6) - provides methods for manipulating dates and times
future (0.17.1) - allows cross-compatibility of code use between Python 2 and Python 3
idna (2.8) - support for the Internationalized Domain Names in Applications (IDNA)
protocol
inflection (0.3.1) - allows for specific string manipulations such as singularizing and
pluralizing words and converting CamelCase to underscored string.
Jinja2 (2.10.1) - modern templating language for Python
MarkupSafe (1.1.1) - implements a text object that escapes characters for safe use of
HTML and XML
numpy (1.16.4) - used for advanced scientific and mathematical operations
opencv-python (4.1.0.25) - contains open source computer vision algorithms
pandas (0.24.2) - provides flexible data structures for easier use and manipulation for
structured data
Pillow (6.0.0) - provides extended image processing capabilities
pip (9.0.1) - allows for easy Python package installation and management
pymongo (3.8.0) - used to access and interact with MongoDB databases
pyodbc (4.0.26) - used to access and interact with ODBC databases
python-dateutil (2.8.0) - provides additional functionality for manipulating dates and times
beyond the “DateTime” library
pytz (2019.1) - used for working with time zones
requests (2.22.0) - allows user to send HTTP requests
setuptools (28.8.0) - for facilitating packaging of Python projects
six (1.12.0) - provides functions for creating Python code compatible with both Python 2
and 3
urllib3 (1.25.3) - alternate library for allowing user to send HTTP requests
zope.interface (4.6.0) - assists with labeling objects as conforming to a given API or
contract
New packages can be installed from the command line with pip install [package-name]
Any new packages installed will be accessible by IQ Bot custom logic.
Each custom logic code block for each field or table runs in a sequential order during extraction.
Depending on the nature of the code, custom logic may affect the rate at which documents are
processed by IQ Bot.
Python code runs for 4 minutes maximum before timeout, to prevent hanging responses and
slow performance.
Example Use cases with Fields
This section will cover how to perform various string operations in Python to obtain a desired
result from an extracted value.
Replacing/deleting substrings
To replace substrings, use the .replace(a,b) function, which replaces every
instance of a with b in a given string. This example will replace string “USD ” with “$”. This is
useful for correcting OCR errors or converting between data formats.
field_value = field_value.replace("USD ", "$")
Example input:
USD 537.14
Resulting output:
$537.14
This example will replace string “USD ” with an empty string (“”), in other words, delete
“USD ”. This is useful for removing unwanted data.
field_value = field_value.replace("USD ", "")
Example input:
USD 537.14
Resulting output:
537.14
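When several substrings need replacing, the same .replace() call can be applied in a loop. The following is a sketch only; the REPLACEMENTS table and the apply_replacements helper are hypothetical names chosen for illustration.

```python
# Hypothetical correction table: each key is replaced by its value.
REPLACEMENTS = {
    "USD ": "$",
    "EUR ": "€",
}

def apply_replacements(text, replacements):
    """Apply each old -> new replacement to the text in turn."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

# Sample value for illustration only; IQ Bot supplies field_value.
field_value = "USD 537.14"
field_value = apply_replacements(field_value, REPLACEMENTS)
print(field_value)
```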
Keep only a specific part of a string
A user can select a specific part of the string by specifying the indices at which the desired
substring begins and ends. The first character of a string has index 0, the second has index 1,
and so on. Such as:
Index number: 0123456789
Field value: USD 537.14
1) A user can specify the substring of a field value with
field_value[beginningIndex:endingIndex]
The character at endingIndex itself is not included in the result.
Example:
field_value = field_value[4:10]
Example input:
USD 537.14 is the price
Resulting output:
537.14
2) A user can also specify an index of a string all the way until the end, regardless of the length
of the string
field_value = field_value[4:]
Example input:
USD 537.14 is the price
Resulting output:
537.14 is the price
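The slicing rules above can be checked with a short sketch. The variable names amount, rest, currency, and last_word are illustrative only.

```python
# Sample value for illustration only; IQ Bot supplies field_value.
field_value = "USD 537.14 is the price"

amount = field_value[4:10]     # characters at indices 4 through 9
rest = field_value[4:]         # index 4 through the end of the string
currency = field_value[:3]     # the first three characters
last_word = field_value[-5:]   # negative indices count from the end

print(amount, rest, currency, last_word, sep=" | ")
```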
Splitting strings
The .split() function can be used to split the field value at any character, symbol, or
number of choice, placing the resulting segments into separate array values. The character
used as the splitting indicator is called a delimiter.
1) In the example below, we will use the space (or “ “) as the delimiter. The line of code will
split the string at every space and place the split segments into separate elements in an array in
sequential order.
field_value = field_value.split(" ")
Example input:
USD 537.14 is the price
Resulting output:
(Index number: 0 1 2 3 4)
field_value: [USD, 537.14, is, the, price]
2) In order to reference a specific item in the split array, one can place the index number in
brackets after the split method
field_value = field_value.split(" ")[1]
Example input:
USD 537.14 is the price
Resulting output:
(Index number: 1)
field_value: 537.14
3) A user can also specify a number to limit how many times the string should be split. The
code below splits only at the first 3 spaces and specifies element with index number 3 in the
brackets.
field_value = field_value.split(" ",3)[3]
Example input:
The price is USD 537.14
Resulting output:
(Index number: 3)
field_value: USD 537.14
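A few variations on splitting are sketched below for comparison. The sample string deliberately contains a double space to show how an explicit delimiter differs from the default whitespace split.

```python
# Sample value with a double space after "The".
field_value = "The  price is USD 537.14"

# An explicit delimiter yields an empty string where delimiters repeat.
by_space = field_value.split(" ")

# With no argument, runs of whitespace are treated as one separator.
by_whitespace = field_value.split()

# rsplit with a limit of 1 splits once, starting from the right.
head, tail = field_value.rsplit(" ", 1)

print(by_space)
print(by_whitespace)
print(tail)
```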
Joining strings
The .join() function can be used to combine two or more strings together. An array of
strings is passed as a parameter, and these strings are joined by the string that the function
operates on. This is most useful when breaking up a string, removing one or more sections, and
stitching it back together.
1) This code joins “The price is “ and field_value with “USD “
field_value = "USD ".join(["The price is ", field_value])
Example input:
537.14
Resulting output:
The price is USD 537.14
2) Alternately, this code does the same thing:
field_value = "The price is USD " + field_value
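The break up, remove, and stitch together pattern described above could look like the following sketch, which drops the "USD" segment and rejoins the rest with spaces.

```python
# Sample value for illustration only; IQ Bot supplies field_value.
field_value = "USD 537.14 is the price"

# Break the string into segments at each space.
parts = field_value.split(" ")

# Remove the unwanted segment.
parts = [part for part in parts if part != "USD"]

# Stitch the remaining segments back together.
field_value = " ".join(parts)
print(field_value)
```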
Replace value if string contains certain value
Here we are using a conditional statement to check if the field value contains a certain
substring. If that substring is contained within the field_value, we replace it with
another value.
if "dropped" in field_value:
    field_value = field_value.replace("The price dropped by USD ", "-")
Example input:
The price dropped by USD 537.14
Resulting output:
-537.14
Example Use cases with Tables
This section will cover common ways you can make changes to your table, including
adding/removing columns, applying a string operation on a specific cell in the table, and
applying regex operations to the table.
Manipulating table values is much easier using the pre-installed pandas Python library. It is
recommended to perform table operations using this library due to its flexibility.
In your code, you can use the code below to import the pandas library:
import pandas as pd
Using this code, you can convert the table_values variable into a pandas data frame object
which will be called df :
df = pd.DataFrame.from_dict(table_values)
Once the table_values variable is converted into a data frame object, we can begin to
apply changes to the values in the table.
After the values of df have been modified in the desired manner, it must be saved back to the
table_values variable:
table_values = df.to_dict()
For the sake of simplicity, the above lines of code will appear in the first example and will be
assumed to be included in all the table use case examples after the first one.
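The round trip can be sketched end to end as follows. The shape of table_values shown here, a dictionary of column names mapping to {row index: cell value} dictionaries, is an assumption based on what to_dict() produces; the sample data is made up for illustration.

```python
import pandas as pd

# Assumed shape: {column name: {row index: cell value}}
table_values = {
    "item_description": {0: "Apple ", 1: "Orange"},
    "quantity": {0: "40", 1: "20"},
}

# Convert to a data frame, modify it, then convert back.
df = pd.DataFrame.from_dict(table_values)
df["item_description"] = df["item_description"].str.strip()
table_values = df.to_dict()
print(table_values)
```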
Extract only specific columns
This Python code will convert table_values into a pandas data frame object, or df, change
the data frame object to a table which contains only the specified columns, and save the new
table back into table_values.
import pandas as pd
df = pd.DataFrame.from_dict(table_values)
df = df[["item_description","quantity"]]
table_values = df.to_dict()
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
Resulting output:
item_description quantity
0 Apple 40
1 Orange 20
2 Banana 30
3 Peach 50
Extract only specific rows
This Python code will change the data frame object, or df, to a table which contains only rows in
the specified row index range.
import pandas as pd
df = pd.DataFrame.from_dict(table_values)
df = df[1:3]
table_values = df.to_dict()
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
Resulting output:
item_description unit_price quantity
1 Orange $1.20 20
2 Banana $0.80 30
Extract only a specific cell
This code selects the cell in the row with index 2 under column item_description. The result
is a single value rather than a data frame.
df = df.loc[2,"item_description"]
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
Resulting output:
Banana
Extract only a specific group of cells
This code will assign the rows indexed 1 and 2 under columns item_description and
quantity as the new data frame value of df. Note that .loc slices by label and includes the
end label.
df = df.loc[1:2,["item_description","quantity"]]
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
Resulting output:
item_description quantity
1 Orange 20
2 Banana 30
Extract only a specific group of cells using indices
This code will assign the rows indexed 1 and 2 under columns indexed 0 and 2 as the new data
frame value of df. Note that .iloc slices by position and excludes the end position.
df = df.iloc[1:3,[0,2]]
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
Resulting output:
item_description quantity
1 Orange 20
2 Banana 30
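The difference between label-based and position-based selection can be seen in a small sketch. The data frame below is built directly from sample data rather than from table_values so that the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "item_description": ["Apple", "Orange", "Banana", "Peach"],
    "unit_price": ["$0.60", "$1.20", "$0.80", "$1.00"],
    "quantity": ["40", "20", "30", "50"],
})

# .loc selects by label; the end label 2 is included.
by_label = df.loc[1:2, ["item_description", "quantity"]]

# .iloc selects by position; the end position 3 is excluded.
by_position = df.iloc[1:3, [0, 2]]

print(by_label.equals(by_position))
```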
Add a row to a table
This code adds a row in which all the current column names are specified as keys and new row
entries are specified as key values in a Python dictionary object. This object is added to the
table with the .append() method.
import pandas as pd
df = pd.DataFrame.from_dict(table_values)
df = df.append({'item_description': 'Watermelon',
'unit_price': '$5.00',
'quantity': '4',}, ignore_index=True)
table_values = df.to_dict()
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
Resulting output:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
4 Watermelon $5.00 4
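The .append() method works with the pandas version listed in this document, but it was removed in pandas 2.0. On a host where a newer pandas has been installed, pd.concat() can achieve the same result. A sketch with sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "item_description": ["Apple", "Orange"],
    "unit_price": ["$0.60", "$1.20"],
    "quantity": ["40", "20"],
})

# Wrap the new row in a one-row data frame, then concatenate.
new_row = pd.DataFrame([{
    "item_description": "Watermelon",
    "unit_price": "$5.00",
    "quantity": "4",
}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```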
Delete rows with missing column values
This Python code will convert table_values into a pandas data frame object, or df, remove
any row in which column unit_price is empty, and save the new table back into
table_values.
import pandas as pd
df = pd.DataFrame.from_dict(table_values)
df = df[(df["unit_price"] != "")]
table_values = df.to_dict()
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana 30
3 Peach $1.00 50
Resulting output:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
3 Peach $1.00 50
Delete rows matching a regular expression
This code will match a regular expression with each value under column
item_description. If the regular expression matches, the row containing that value is deleted.
import pandas as pd
import re
df = pd.DataFrame.from_dict(table_values)
def is_found(string):
    # Return False for rows to delete (pattern found) and True for rows to keep
    a = re.findall('Item [0-9]{4}',string)
    if a:
        return False
    else:
        return True
df = df[(df["item_description"].apply(is_found))]
table_values = df.to_dict()
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Item 0034 $0.80 30
3 Peach $1.00 50
Resulting output:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
3 Peach $1.00 50
Replace values of a specific column
This code replaces all dollar signs in column unit_price with Euro symbols. Because the
replacement is performed as a regular expression, the dollar sign must be escaped.
import pandas as pd
df = pd.DataFrame.from_dict(table_values)
df["unit_price"] = df["unit_price"].replace({r'\$':'€'}, regex=True)
table_values = df.to_dict()
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
Resulting output:
item_description unit_price quantity
0 Apple €0.60 40
1 Orange €1.20 20
2 Banana €0.80 30
3 Peach €1.00 50
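The dollar sign is a special character in regular expressions (it matches the end of the string), which matters whenever a replacement is done with regex enabled. The sketch below uses Python's re module on a single sample value to show the difference.

```python
import re

price = "$0.60"

# Unescaped, '$' matches the end of the string, so the symbol is appended.
unescaped = re.sub('$', '€', price)

# Escaped, '\$' matches the literal dollar sign.
escaped = re.sub(r'\$', '€', price)

# Plain str.replace does not use regular expressions at all.
plain = price.replace('$', '€')

print(unescaped, escaped, plain)
```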
Replace values anywhere in the table
This code will replace every instance of 0range with Orange within the data frame, regardless
of row or column. One can use the .applymap() method to apply any function to all the
individual cell values of the data frame.
import pandas as pd
df = pd.DataFrame.from_dict(table_values)
def find_and_replace(value):
    return value.replace("0range","Orange")
df = df.applymap(find_and_replace)
table_values = df.to_dict()
Example input:
item_description unit_price quantity
0 Apple $0.60 40
1 0range $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
4 0range Juice $3.00 8
Resulting output:
item_description unit_price quantity
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana $0.80 30
3 Peach $1.00 50
4 Orange Juice $3.00 8
Add values to a column if other column matches a regular expression
This code will match a regular expression with each value under column unit_price. If the
value contains a dollar sign, it will populate the corresponding origin_country value as USA.
import pandas as pd
import re
df = pd.DataFrame.from_dict(table_values)
def is_USA(string):
    # The dollar sign is escaped because $ is a special character in regex
    a = re.findall(r'\$',string)
    if a:
        return True
    else:
        return False
df.loc[(df["unit_price"].apply(is_USA)),"origin_country"] = 'USA'
table_values = df.to_dict()
Example input:
item_description unit_price quantity origin_country
0 Apple $0.60 40
1 Orange $1.20 20
2 Banana €0.80 30
3 Peach $1.00 50
4 Watermelon $5.00 4
Resulting output:
item_description unit_price quantity origin_country
0 Apple $0.60 40 USA
1 Orange $1.20 20 USA
2 Banana €0.80 30
3 Peach $1.00 50 USA
4 Watermelon $5.00 4 USA
Using Python Libraries with custom logic
There are a multitude of other things you can do using Python’s extensive library collection.
You can import a library with the code import [library-name]. Here are a couple of
ways that a user can utilize them in manipulating fields.
Applying Regex using Regex library
A user can search for a string of a specific kind of format using a regular expression, or Regex.
The Regex is a string of characters that represents the format of the desired search string.
Import the Regex library or “re” to apply regex with Python code.
These lines of code search for any numerical strings in an XXX-XXX-XXXX format, such as phone
numbers or social security numbers. If no matches are found, the original string is returned.
import re
def find_numbers(string):
    match = re.findall('([0-9]{3}-[0-9]{3}-[0-9]{4})',string)
    if match:
        return match
    else:
        return string
field_value = find_numbers(field_value)
Example input:
His phone number is 222-444-8888, SSN is 123-456-7890, and DL#
is 1234567890
Resulting output:
['222-444-8888', '123-456-7890']
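The example above returns a list of every match. If a single string is preferred as the field value, a variant using re.search() could return only the first match. This is a sketch; the first_number helper is a hypothetical name.

```python
import re

# Same XXX-XXX-XXXX pattern as above, compiled once for reuse.
PATTERN = re.compile(r'[0-9]{3}-[0-9]{3}-[0-9]{4}')

def first_number(text):
    """Return the first XXX-XXX-XXXX match, or the original text if none."""
    match = PATTERN.search(text)
    return match.group(0) if match else text

sample = "His phone number is 222-444-8888, SSN is 123-456-7890"
print(first_number(sample))
```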
Reformat dates with datetime library
The datetime library makes it easy to change the formats of dates.
from datetime import datetime
field_value = datetime.strptime(field_value, '%d %b %Y').strftime("%Y/%m/%d")
Example input:
22 Aug 2016
Resulting output:
2016/08/22
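Extracted dates do not always arrive in a single format. One possible approach, sketched below, is to try several candidate formats in turn and leave the value unchanged if none fits. The CANDIDATE_FORMATS list and the normalize_date helper are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical list of formats expected in the source documents.
CANDIDATE_FORMATS = ["%d %b %Y", "%B %d, %Y", "%m/%d/%Y"]

def normalize_date(text, formats=CANDIDATE_FORMATS):
    """Return the date as YYYY/MM/DD, or the original text if unparseable."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y/%m/%d")
        except ValueError:
            # This format did not fit; try the next one.
            continue
    return text

print(normalize_date("22 Aug 2016"))
print(normalize_date("December 7, 2018"))
```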
Using external calls / APIs with custom logic
There are various external systems custom logic can utilize to apply modifications to data, such
as REST services and databases.
Making an HTTP Request
An HTTP request is made to an external address parser application. The address parser will
return a parsed address for use. We will capture only the road value of the response.
import json
import requests
# Sample value for illustration; in IQ Bot, field_value is already populated
field_value = "818 Lexington Ave, #6, PO Box 1234, Brooklyn 11221 NY, USA"
url = "http://ec2-54-86-166-132.compute-1.amazonaws.com:4123/parser"
# json.dumps escapes quotes and other special characters in the field value
payload = json.dumps({"query": field_value})
headers = {
    'Content-Type': "application/json",
    'Accept': "*/*",
}
response = requests.request("POST", url, data=payload, headers=headers)
# Parse the response as JSON rather than calling eval() on untrusted text
resp = response.json()
Adr = {}
for idic in resp:
    Adr[idic['label']] = idic['value']
field_value = Adr['road']
Example input:
818 Lexington Ave, #6, PO Box 1234, Brooklyn 11221 NY, USA
Resulting output:
Lexington Ave
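The response handling can be made more defensive, since an external service may return malformed data or omit the expected label. The sketch below works on a sample response string so that it runs without network access; the response shape (a JSON list of label/value objects) is assumed from the example above, and road_from_response is a hypothetical helper.

```python
import json

def road_from_response(response_text, default=""):
    """Pick the 'road' value out of a parser response.

    Assumes the response is a JSON list of {"label": ..., "value": ...} objects.
    """
    try:
        items = json.loads(response_text)
    except ValueError:
        # The body was not valid JSON; fall back to the default.
        return default
    labels = {item["label"]: item["value"] for item in items}
    return labels.get("road", default)

sample = ('[{"label": "house_number", "value": "818"}, '
          '{"label": "road", "value": "Lexington Ave"}]')
print(road_from_response(sample))
```

Because custom logic is subject to a timeout, it may also be worth passing the timeout argument that requests accepts (for example timeout=10) so that a slow service fails quickly.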
Query database to get corresponding info
This code will take a vendor name and return its corresponding vendor ID. Note: the following
example’s database info is not representative of a real external environment.
import pyodbc
conn = pyodbc.connect('Driver={SQL Server};'
                      r'Server=localhost\sqlexpress;'
                      'Database=TestDB;'
                      'Trusted_Connection=no;'
                      'uid=user123;'
                      'pwd=pass123')
cursor = conn.cursor()
# The ? placeholder passes field_value as a parameter instead of
# concatenating it into the SQL text
cursor.execute('SELECT vendor_id FROM [TestDB].[dbo].[VENDOR_INFO_1] '
               'WHERE vendor_name_short = ?', field_value)
for row in cursor:
    field_value = row[0]
Example input:
Adego Industries
Resulting output:
XIS77823964
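The lookup pattern can be tried without a database server by using Python's built-in sqlite3 module as a stand-in. This is only a sketch: the in-memory table, its contents, and the lookup_vendor_id helper are made up for illustration, and sqlite3 is swapped in for pyodbc purely so the snippet is self-contained. It also shows a value being passed as a query parameter rather than concatenated into the SQL text.

```python
import sqlite3

# In-memory stand-in for the vendor table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor_info (vendor_name_short TEXT, vendor_id TEXT)")
conn.execute("INSERT INTO vendor_info VALUES (?, ?)",
             ("Adego Industries", "XIS77823964"))

def lookup_vendor_id(connection, vendor_name, default=""):
    """Return the vendor ID for a name, or a default if there is no match."""
    row = connection.execute(
        "SELECT vendor_id FROM vendor_info WHERE vendor_name_short = ?",
        (vendor_name,),
    ).fetchone()
    return row[0] if row else default

field_value = lookup_vendor_id(conn, "Adego Industries")
print(field_value)
```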
Query database to get fuzzy match
This example will take a vendor name and find its closest match in a list of vendors stored in a
separate text file. This helps correct OCR inaccuracies in vendor names. The fuzzywuzzy package
is not among the pre-installed packages listed above and must be installed first.
from fuzzywuzzy import process
# The file is expected to contain a comma-separated list of vendor names
with open("c:\\vendorlist.txt", "r") as text_file:
    options = text_file.read().split(',')
# extractOne returns the best match together with its score
highest = process.extractOne(field_value,options)
field_value = highest[0]
Example input:
Adeg0 Industries
Resulting output:
Adego Industries
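If installing an extra package is not an option, Python's standard difflib module offers a comparable closest-match lookup. The sketch below is an alternative technique, not the fuzzywuzzy approach shown above; the vendor list, the closest_vendor helper, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib

# Hypothetical vendor list; in practice this could be read from a file.
vendors = ["Adego Industries", "Acme Corporation", "Globex Inc"]

def closest_vendor(name, options, cutoff=0.8):
    """Return the closest option above the cutoff, or the name unchanged."""
    matches = difflib.get_close_matches(name, options, n=1, cutoff=cutoff)
    return matches[0] if matches else name

print(closest_vendor("Adeg0 Industries", vendors))
```

Keeping the original value when nothing clears the cutoff avoids replacing a correct but unlisted vendor name with an unrelated one.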
Call external machine learning systems to intelligently identify text
Calling external machine learning systems is another way a user can utilize custom logic. With
an external machine learning system, a user can apply a variety of complex actions on
unstructured text, such as recognizing client names, classifying the intention of a human
message, and automatically identifying and translating foreign languages.
The code below is an example that uses an API call to send text to a machine learning system
that identifies the serial number in the text, regardless of the position, format, or length of the
serial number. The system extracts data at a level of intelligence beyond standard Regex or
string manipulation.
Using external services such as machine learning unlocks a new dimension of capability in
custom logic to perform advanced operations on extracted document data.
More information about using machine learning in custom logic can be found in this video:
Note: Setting up and running the machine learning system is not included in this code.
import json
import pandas as pd
import requests
df = pd.DataFrame.from_dict(table_values)
RawBody = df.loc[0, "Raw_Body"]
url = "http://localhost:5000/models"
model = "ML-model-01"
# json.dumps escapes quotes and other special characters in the text
payload = json.dumps({"text": RawBody, "model": model})
headers = {
    'Content-Type': "application/json",
    'Accept': "*/*",
}
response = requests.request("POST", url, data=payload, headers=headers)
# Parse the response as JSON rather than calling eval() on untrusted text
resp = response.json()
for ent in resp['entities']:
    ENT_TYPE = ent['label']
    ENT_SCORE = ent['score']
    ENT_TEXT = ent['text']
    # Each entity overwrites the previous one, so the last entity returned is kept
    df.loc[0, "Order_List"] = ENT_TEXT
table_values = df.to_dict()
Example input:
I am looking for the corresponding product name for product
number ZLS-539AJ297. Can you help me?
Resulting output:
ZLS-539AJ297