AI-Assisted Tax Authorities
Article in Jurnal BPPK Badan Pendidikan dan Pelatihan Keuangan · December 2024
DOI: 10.48108/jurnalbppk.v17i3
All content following this page was uploaded by Muhammad Sukri bin Ramli on 02 January 2025.
1. INTRODUCTION
1.1. Background of Study
Tax audits are a critical component of a well-functioning tax system: they ensure compliance with regulations, deter tax evasion, and secure revenue for essential government services (International Monetary Fund, 2024). However, the traditional audit process is often plagued by inefficiencies and a heavy dependency on human input. Fatigue, lack of motivation, and other work-related stress undermine auditors' productivity. Manually reviewing vast amounts of financial records, contracts, and communication records is laborious, which leads to lengthy audit completion times (Chowdhry et al., 2023). Furthermore, the traditional approach struggles to handle the ever-increasing volume of data associated with complex financial transactions, potentially limiting audit coverage (Adams & Weaver, 2018). Human error is also a constant concern in manual data analysis and can lead to missed anomalies or inconsistencies in financial records (Elliott, 2018).
Emerging Artificial Intelligence (AI) technologies offer promising solutions to these challenges. Large Language Models (LLMs), Natural Language Processing (NLP), and Machine Learning (ML) have considerable potential to transform tax compliance reporting by streamlining data analysis, enhancing anomaly detection, and improving process optimization and scalability. This paper discusses how these AI advancements can be leveraged throughout the various stages of the tax audit process. The goal is to demonstrate how AI can improve the efficiency, accuracy, and scalability of tax audit reporting, ultimately leading to a more efficient, robust, and fair tax system.
Figure 1: Architecture of AI Ecosystem
AI-ASSISTED TAX AUTHORITIES: LEVERAGING LLM, NLP, AND ML FOR EFFICIENT TAX AUDIT REPORTING
Muhammad Sukri Bin Ramli
Feature | Static LLM | Dynamic LLM
Data access | Fixed dataset | Real-time Internet access
Knowledge base | Limited to training data | Continuously updated with new information
Strengths | Established knowledge, historical data analysis | Latest information, real-time trends
Applications (tax auditing) | Translation, word prediction, grammar checking | Summarising, information gathering, reporting
Examples | Grammarly, spam filtering, basic machine translation, HMRC Connect system | ChatGPT, Gemini
Figure 3: Primary Categories of LLMs Distinguished by Their Information Access Methods: Static and Dynamic
Large Language Models (LLMs) are changing the way tax auditors prepare reports. These AI models offer a variety of functions that can streamline the process and improve report accuracy (Mikolov et al., 2013; Sutskever et al., 2014). One key contribution of LLMs lies in text generation. When trained on tax audit report templates and historical data, LLMs can generate standardized sections such as the company background profile, introduction, methodology, and conclusions (Fan et al., 2018). This saves auditors time and ensures consistent formatting across reports. LLMs can also adapt their writing style to the target audience. For technical reports directed at tax authorities, an LLM can use formal language and precise terminology (Li et al., 2023). For client summaries, it can generate clearer and more concise explanations of key findings.
Figure 4: AI-Assisted Tax Audit Report Generation
LLMs also excel at text summarization. They can process voluminous financial records and generate summaries highlighting relevant information such as income sources, claimed deductions, and potential discrepancies (Fan et al., 2018). This allows auditors to grasp the financial picture quickly and identify areas requiring further investigation.
Furthermore, LLMs with access to real-time information can help auditors stay up to date on the latest tax code changes. Acting as a research assistant, an LLM can answer questions about recent modifications to tax regulations (Sutskever et al., 2014). For instance, an auditor can ask the LLM, "What are the recent changes to the zero-rated supply list in VAT?" The LLM can then search relevant legal databases and provide a comprehensive response. LLMs can also be trained on frequently asked questions (FAQs) on relevant topics, allowing them to generate clear and concise answers to client inquiries about specific findings in the audit report (Li et al., 2023).
Beyond these core functions, there are additional considerations. Machine translation can be particularly useful for international compliance audits involving foreign companies or documents: LLMs can translate financial statements and other relevant documents into the auditor's language, facilitating efficient review and analysis. Sentiment analysis is another possible benefit. Although tax reports are typically objective, identifying areas of potential concern or disagreement is crucial. LLMs with sentiment analysis capabilities can process interview transcripts or client communications to identify signs of defensiveness or obfuscation, prompting auditors to examine those areas more closely (Fan et al., 2018).
It is important to remember that LLMs are tools, not replacements for human expertise (Mikolov et al., 2013). Auditors must exercise judgment and critical thinking when using LLM outputs. In addition, LLMs require high-quality training data specific to tax regulations and terminology to function optimally (Li et al., 2023). By leveraging the capabilities of LLMs, tax auditors can improve the productivity and accuracy of report preparation, allowing them to dedicate more time to complex analysis and client communication.
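The static/dynamic distinction in Figure 3 and the VAT question above can be illustrated with a retrieval step. Before a model answers, the most relevant regulation snippets are selected and placed in the prompt, so the answer rests on current text rather than on training data alone. The sketch below is an illustrative assumption, not the author's method or HMRC's implementation. It uses simple keyword overlap in place of a real search index. The snippet identifiers and texts are placeholders, and `retrieve` and `build_prompt` are hypothetical helpers. No LLM or legal-database API is called: the output is only the prompt text that might be sent to a model.

```python
# Minimal sketch (hypothetical): retrieval-grounded prompting for a "dynamic" LLM.
# Keyword overlap stands in for a real search index; snippet IDs and texts are
# placeholders, and no actual LLM or legal-database API is called.
import re
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str  # hypothetical identifier of a regulation document
    text: str


STOPWORDS = {"what", "are", "the", "to", "in", "of", "a", "an", "is", "and", "for", "on"}


def tokenize(text: str) -> set[str]:
    """Return lower-case word tokens with common stopwords removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def retrieve(query: str, store: list[Snippet], k: int = 2) -> list[Snippet]:
    """Rank snippets by the number of query terms they share; keep the top k with any overlap."""
    terms = tokenize(query)
    scored = [(len(terms & tokenize(s.text)), s) for s in store]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep store order
    return [s for _, s in matches[:k]]


def build_prompt(query: str, store: list[Snippet], audience: str = "tax authority") -> str:
    """Compose a prompt that asks the model to answer only from the retrieved snippets."""
    context = "\n".join(f"[{s.source}] {s.text}" for s in retrieve(query, store))
    style = ("formal language and precise statutory references"
             if audience == "tax authority" else "plain, concise language")
    return (f"Answer using only the sources below, in {style}. Cite source IDs.\n\n"
            f"Sources:\n{context}\n\nQuestion: {query}")


if __name__ == "__main__":
    store = [
        Snippet("NOTICE-A", "Placeholder notice: revisions to the zero rated supply list."),
        Snippet("GUIDE-B", "Placeholder guidance: corporate income tax filing deadlines."),
    ]
    print(build_prompt("What are the recent changes to the zero rated supply list in VAT?", store))
```

Keeping retrieval separate from generation lets auditors see which sources an answer was based on. A production system would presumably replace keyword overlap with a proper search index over an authoritative, regularly updated legal database. The `audience` switch mirrors the point above about tailoring style for tax authorities versus clients.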
4.3. Discussion - Natural Language Processing (NLP)
Natural Language Processing (NLP), a subfield of Artificial Intelligence (AI) focused on enabling computers to understand the nuances of human language (Jurafsky & Martin, 2020), is proving to be a valuable tool in the tax audit process. NLP techniques play a crucial role in extracting key information from various documents and identifying inconsistencies that might warrant further investigation (Liao et al., 2020).
Tax auditors can leverage NLP capabilities in several ways to streamline and enhance their audit reports. Entity recognition allows NLP to identify specific entities within tax documents, such as company names, social security numbers, financial amounts, and tax codes (Wang et al., 2020). This facilitates the extraction of relevant data from invoices, receipts, contracts, and other financial records, saving auditors time and reducing manual data entry errors. NLP can also categorize documents based on their content, automatically classifying them as income statements, balance sheets, invoices, or other relevant tax document types (Document Classification). This promotes efficient document organization and retrieval during the audit process.
Furthermore, NLP goes beyond simply identifying entities. It can determine the relationships between them, helping to uncover inconsistencies (Relationship Extraction). For instance, NLP can identify discrepancies between a company's reported figures and the amounts stated in invoices from its suppliers (Mintz et al., 2009). This can be particularly useful when a company reports high expenses for a specific category but has minimal supporting invoices.
Pattern recognition is another capability offered by NLP. It allows large volumes of text data to be analysed for recurring patterns and keywords associated with an increased risk of tax evasion (Li et al., 2023). By analysing historical audit reports and identifying language patterns indicative of fraudulent activity, NLP can help auditors prioritize documents that require closer scrutiny.
Although tax reports are typically objective, NLP can be used to analyse the sentiment of communication records such as emails or internal company memos. Identifying negative sentiment towards tax liabilities or mentions of aggressive tax strategies can flag these documents for further review (Liu, 2012). This might uncover attempts to manipulate tax filings or latent areas requiring deeper investigation.
In addition to enhancing data extraction and analysis, NLP can improve the effectiveness of report generation. NLP models can be trained on tax audit report templates and historical reports. Based on the specific findings of an audit, the model can generate boilerplate sections such as introductions, methodologies, and conclusions (Standardized Report Sections). This saves auditors time and ensures consistency in report formatting.
Finally, NLP can analyse interview transcripts and generate summaries capturing the key points, allowing auditors to grasp the information quickly (Summarizing Interview Notes; Mikolov et al., 2013). Compliance audits often involve interviews and extensive document review, so NLP's ability to analyse transcripts and generate summaries can significantly reduce the time auditors spend reviewing lengthy documents.
Figure 5: NLP summarization capability
4.4. Discussion - Machine Learning (ML)
Tax authorities are increasingly using Machine Learning (ML) to improve their ability to identify cases with a high probability of tax risk. Collaboration between data scientists and the tax authority can produce purpose-built ML models that learn from historical audit data to identify patterns and anomalies that might signal potential tax risks. These algorithms can be trained on large datasets of past audits, tax filings, and financial information (Chen et al., 2020).
Two main types of ML algorithms are used in tax auditing: supervised learning and unsupervised learning. Supervised learning uses historical data in which tax discrepancies have already been flagged. By training on these labelled examples, algorithms can learn to identify patterns associated with past tax evasion attempts in new filings (Huang et al., 2020; Singh et al., 2023). Unsupervised learning, on the other hand, works with unlabelled data. In the context of tax audits, it allows similar tax returns to be grouped together based on shared characteristics. Auditors can then prioritize clusters with a higher likelihood of containing irregularities for further investigation (Aggarwal et al., 2016).
An ML model can be trained on historical data to analyse financial ratios, expense categories, tax deductions claimed, and industry benchmarks (Li et al., 2021). The model can then identify companies with significant deviations from these benchmarks, potentially indicating under-reported income or
fraudulent tax practices (Nguyen et al., 2020). By leveraging the combined capabilities of LLMs, NLP, and ML, tax auditors can gain valuable insights from vast amounts of text data, identify hidden patterns, and ultimately boost the productivity and effectiveness of the tax audit reporting process.
See Appendix I, Figure 6: Tax gap and behaviour patterns recognized by the HMRC Connect ML algorithm, 2019 – 2023.
4.5. Applications in Tax Audit Reporting
Large Language Models (LLMs), Natural Language Processing (NLP), and Machine Learning (ML) offer significant potential to improve the quality and efficiency of tax audit reporting.
See Appendix II, Figure 7: Typical Tax Audit Process and Potential for Artificial Intelligence Assistance.
In the initial data analysis and anomaly detection phase, LLMs can act as intelligent assistants, pinpointing unusual entities within financial documents (e.g., the sudden appearance of new shell companies) (Li et al., 2023). They can also detect inconsistencies between reported figures and the accompanying language (e.g., high expenses with vague descriptions) (Huang et al., 2020). In addition, they can flag documents containing suspicious keywords or phrasing indicative of tax evasion schemes by learning from past audit cases (Sutskever et al., 2014). Meanwhile, ML algorithms can use supervised learning to identify patterns associated with past tax discrepancies in historical audit data. These patterns can then be applied to new tax filings to flag potential areas of concern (Singh et al., 2023). Unsupervised learning algorithms can further contribute by clustering similar tax returns, allowing auditors to focus on outliers or companies with high-risk profiles that warrant further scrutiny (Aggarwal et al., 2016).
Tax compliance processes often involve reviewing a multitude of documents, a task that LLMs can significantly streamline. LLMs can automatically generate concise summaries of key points from various documents, allowing auditors to grasp the overall content more efficiently and prioritize their review efforts (Liu & Lapata, 2019). They can also categorize documents by type and tax relevance (e.g., invoices, contracts, emails), facilitating organization and retrieval during the audit process (Fan et al., 2018). Even with redacted documents, LLMs may still be able to examine the context and surrounding language of the remaining content to identify suspicious patterns or anomalies, potentially uncovering redacted information or inconsistencies (Li et al., 2023). The initial stage of a tax audit involves analysing vast quantities of financial records, contracts, and communication records, and both LLMs and ML can contribute to this critical process (Chen et al., 2020).
Machine learning plays a vital role in risk assessment and audit selection, a crucial step for maximizing audit effectiveness. ML algorithms can analyse a company's financial statements, industry data, and historical filing patterns to generate a risk score. This score helps auditors prioritize companies with a higher likelihood of tax discrepancies for further scrutiny (Singh et al., 2023). Predictive analytics takes this a step further by allowing ML models to predict the likelihood of tax errors in new filings based on historical audit data. This enables auditors to allocate resources more effectively by focusing on companies with a higher probability of under-reporting taxable income (Nguyen et al., 2020).
Generating comprehensive audit reports has traditionally been time-consuming. LLMs can significantly expedite this process by automating several key functions. They can populate sections of the report with data extracted from the reviewed documents, including identified discrepancies, relevant tax codes, and other pertinent information (Li et al., 2023). LLMs can also generate initial drafts of the report based on the extracted information and identified anomalies. This saves auditors significant time in structuring the report and allows them to focus on analysing and interpreting the data (Huang et al., 2020). NLP techniques can ensure consistent language and terminology throughout the report, enhancing its professionalism and readability (Jurafsky & Martin, 2020). Furthermore, LLMs can tailor the writing style and level of technical detail to the intended audience. For instance, reports for tax authorities can use formal language and precise tax code references, while reports for company management can be presented in a clearer and more concise manner (Li et al., 2023).
LLMs can also bridge the communication gap between auditors and taxpayers, promoting transparency and a smoother audit process. LLMs can translate complex tax regulations and jargon into plain language that taxpayers understand more easily. This can improve communication during the audit and reduce taxpayer anxiety (Akerlof & Shiller, 2015). Additionally, LLMs can generate standardized interview guides tailored to the specific tax concerns identified during the initial data analysis. This ensures consistent information gathering across audits and minimizes the risk of overlooking crucial details (Dyche et al., 2018). Finally, LLMs can generate initial drafts of emails or letters to taxpayers that summarize key findings and next steps clearly and concisely, saving auditors time and promoting efficient communication (Li et al., 2023).
4.6. Case study
HMRC's Connect system is a sophisticated data analytics platform designed to uncover tax evasion and fraud. By cross-referencing billions of data points from various sources, Connect identifies hidden patterns and relationships between individuals, organizations, and financial transactions. This enables HMRC to detect anomalies in areas such as bank interest,
property income, and lifestyle indicators compared to declared tax liabilities.
Connect comprises two main components:
• Analytical Compliance Environment (ACE): allows analysts to manipulate and analyse data in depth, focusing on tasks such as identifying undeclared income sources.
• Integrated Compliance Environment (ICE): provides a visual interface for presenting complex data, aiding further investigation and risk assessment.
With a user base of approximately 3,000 staff, Connect plays a crucial role in HMRC's efforts to enhance audit efficiency, improve case selection, and combat tax fraud. Developed by BAE Systems at a cost of £100 million, the system has generated over £3 billion in additional tax revenue. HMRC has successfully harnessed AI technologies, particularly through the Connect platform, to transform tax administration. By integrating financial transactions, property records, and tax returns, the Connect platform uses advanced algorithms to detect anomalies, assess taxpayer risk, and predict high levels of non-compliance.
Beyond fraud detection, HMRC uses AI for tasks such as predictive modelling, natural language processing, and chatbot interactions. These technologies have significantly streamlined processes, improved compliance rates, and strengthened HMRC's ability to make data-driven decisions. While AI offers substantial benefits, challenges such as data quality, privacy, and algorithmic bias require careful management. As AI technology continues to evolve, HMRC and other tax authorities must adapt their strategies to maximize the advantages while mitigating the risks.
Figure 8: Anomaly Detection by HMRC With Connect System (vehicle registration, property ownership, investment)
5. CONCLUSIONS
5.1. Conclusion
Large Language Models (LLMs), Natural Language Processing (NLP), and Machine Learning (ML) hold the key to a new era of tax audit reporting. These AI advancements automate repetitive tasks and streamline data analysis. This allows auditors to focus on high-risk areas and complex decision-making, ultimately enhancing the efficiency and precision of the audit process. As these technologies mature, their integration into tax auditing practices can significantly enhance the overall effectiveness and fairness of the tax system.
The most immediate benefit of AI lies in its ability to streamline the audit process. Repetitive tasks such as data extraction and report generation can be automated, freeing valuable auditor time for complex analysis and critical decision-making. Additionally, ML models trained on historical data and industry benchmarks can provide data-driven risk assessments. This allows a more targeted approach to audits that concentrates effort on areas with a higher likelihood of discrepancies.
Furthermore, AI-powered tools can analyse far larger volumes of data than manual review, potentially uncovering hidden patterns and anomalies that might escape human auditors. This can improve the accuracy of tax audits and support a more robust and reliable tax collection system. Additionally, LLMs can ensure consistent formatting and terminology within audit reports, producing professional, clear documents that are easy for all parties to understand. This fosters better communication between auditors and taxpayers, potentially reducing confusion and improving cooperation.
However, several key considerations must be addressed for AI integration in tax auditing to be responsible and sustainable. First, transparency in ML algorithms is crucial, because it allows potential biases that could lead to unfair audit outcomes to be identified and mitigated. Second, the effectiveness of AI hinges on high-quality, secure data, so measures to ensure data accuracy, completeness, and protection from unauthorized access are paramount. Finally, while AI tools significantly enhance the audit process, they are designed to augment, not replace, the critical thinking and expertise of human auditors.
By acknowledging these considerations and fostering ongoing research on ethical AI development, tax authorities can harness AI to create a more efficient, accurate, and fair tax system. This future relies on a strong partnership between human expertise and the transformative capability of AI.
5.2. Recommendation
Building on the potential of AI in tax auditing, several promising areas warrant further exploration to enhance its effectiveness and impact. One crucial area of research is the development of more sophisticated anomaly detection algorithms. These algorithms should be designed to identify complex and nuanced patterns that might indicate tax evasion schemes (Singh et al., 2023). This could involve detecting subtle inconsistencies or hidden relationships within financial data. By focusing on such algorithms, we can significantly improve the accuracy and effectiveness of identifying
138 Jurnal BPPK Volume 17 Nomor 3, 2024
Huang, Z., Li, J., Sun, L., & Wu, D. (2020). Leveraging attention mechanism and hierarchical structure for tax audit report generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 2445-2455). Association for Computational Linguistics.
International Monetary Fund. (2024, July 14). Tax administration. [Link]-policies/Revenue-Portal/Tax-and-Customs-Administration
International Public Sector Fraud Forum. (2023, February 13). The use of Artificial Intelligence to Combat Public Sector Fraud: Professional Guidance. [UK Government]. Retrieved from [Link] [Link]/government/uploads/system/uploads/attachment_data/file/865721/Artificial_intelligence_13_Feb.pdf
Jurafsky, D., & Martin, J. H. (2020). Speech and language processing (3rd ed.). Pearson Education Limited.
Kencana, K. C., & Widhiastuti, S. (2018). The effect of over workload and role conflicts on behavior of tax auditor dysfunction with working stress as mediation factor and moral competence as moderation factor (Empirical study at Indonesian Directorate General of Taxes). Indonesian Journal of Accounting and Governance (IJAG), 2(1), 43-54.
Li, J., Huang, Z., Sun, L., & Wu, D. (2023). A survey of artificial intelligence in tax auditing. Artificial Intelligence, 322, 149-177. [Link]48128101_The_effect_of_artificial_intelligence_technologies_on_audit_evidence
Li, J., Langsgaard, M., Yao, A., Xiao, L., & Guo, J. (2021). Evaluating document-level machine translation with contextual embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 8784-8795).
Li, O., Liu, H., Liu, L., Qin, J., & Chen, Z. (2023, May). Learning to detect tax evasion attempts with large language models. [Link]
Li, Y., Mao, C., Li, Z., Ren, Y., Li, P., & Li, S. (2021). Exploring the effectiveness of legal language models in detecting tax evasion.
Liao, S., Li, J., Xu, Y., & Sun, L. (2020). A review of natural language processing in tax audit. ACM Computing Surveys (CSUR), 53(6), 1-38. [Link] [Link]/Zampolli-1994-Sparck- [Link]
Liao, S., Zhang, H., Song, Y., Bao, Y., & Liu, B. (2020). NLP for financial analysis: A survey.
Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
Liu, B., & Lapata, M. (2019). Text summarization with neural attention. [Link]
Liu, Y., & Lapata, M. (2019). Text summarization with pretrained encoders. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (Vol. 1, pp. 682-691).
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2 (pp. 3111-3119).
Mintz, M., Billsus, D., & McCallum, A. (2009). Document embeddings for clustering and classification. In Proceedings of the 2009 ACL-IJCNLP Conference on Human Language Technologies: Volume 1 (pp. 211-219). Association for Computational Linguistics. [Link]
Mintz, M., Billsus, D., & Snow, R. (2009). The role of annotation in named entity extraction from biomedical text. In Proceedings of the NAACL HLT 2009 Workshop on BioNLP: Exploring Biological Language Processing (Vol. 2, pp. 98-106).
Mintz, R., Billsus, D., & Christopoulou, E. (2009). Automated fraud detection in tax return data. In Proceedings of the 12th International Conference on Discovery Science (pp. 403-414). Springer. [[Link]080/08839514.2022.2086354]
Nguyen, T. T., Cao, T. T., & Hoang, N. D. (2020). A machine learning approach for tax evasion detection using financial ratios and
Slemrod, J. (2007). Tax compliance and enforcement. Handbook of Public Economics, 3, 1155-1204.
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27 (pp. 3104-3112).
Wang, X., Jiang, J., Xu, Z., Deng, Y., Li, Y., & Zhao, H. (2020). A survey on named entity recognition for financial big data. ACM Computing Surveys (CSUR), 53(3), 1-36.
Wang, Y., Zhang, Y., Su, Z., & Liu, Y. (2020). A survey on the application of natural language processing techniques in tax auditing. Journal of Information Processing & Management, 57(2), 102228. [Link]cle/abs/pii/S0939362523000286
APPENDIX I
Figure 6: Tax gap and behaviour patterns recognized by the HMRC Connect ML algorithm, 2019 – 2023. Retrieved from [Link]illustrative-tax-gap-by-behaviour
APPENDIX II
Figure 7: Typical Tax Audit Process and potential for Artificial Intelligence Assistance
This article is licensed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0
International License ([Link]), which permits any noncommercial
use, sharing, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if
changes were made. Any derivative works must be distributed under the same license as the original.