Submission information
Submission Number: 110
Submission ID: 207
Submission UUID: 355a10da-2019-44cb-a2bc-ff935c8657b8
Submission URI: /form/project
Created: Fri, 09/17/2021 - 10:16
Completed: Fri, 09/17/2021 - 10:17
Changed: Tue, 08/30/2022 - 15:22
Remote IP address: 73.89.101.1
Submitted by: Katherine Nelson
Language: English
Is draft: No
Webform: Project
US Tax Code to Natural Language Parsable Data for Programming Languages

Complete
Project Leader
Project Personnel
Project Information
This project will prepare a subsection of the US Tax code for natural-language translation into either Catala-Lang (https://catala-lang.org/) or ErgoAI from Coherent Knowledge (http://coherentknowledge.com).
We hope to get initial basic translations from the US Tax code into either Catala-Lang or ErgoAI within some threshold of acceptability. Once the basic translations meet that threshold, we want to see if a local startup can complete the translations by engaging people with expertise in the US Tax code.
The startup (Neutral Tax Networks, Greenwich, CT) is at a very early, formative stage; it holds a patent in this area and is developing other intellectual property.
Link to the House of Representatives site where the US Code is located. The Internal Revenue Code is Title 26 (part way down on this list): https://uscode.house.gov/browse/prelim@title26/subtitleA/chapter1/subchapterA/part1&edition=prelim
Here is the IRS website page that contains the links to the Internal Revenue Code and regulations that are provided as a public service by Cornell Law School's Legal Information Institute: https://www.irs.gov/privacy-disclosure/tax-code-regulations-and-official-guidance#irc
Link to the Internal Revenue Code sections provided by Cornell's Legal Information Institute (accessible by clicking on one of the links on the IRS website): https://www.law.cornell.edu/uscode/text
Project Information Subsection
- All legal tax code XML from https://www.irs.gov/…, https://uscode.house.gov or https://www.law.cornell.edu/uscode/text transformed into English tax code.
- A validated algorithmic mapping from the English legal tax code to a format for storing in a relational database.
- All of the legal tax code stored in a relational database (MySQL), using a mapping that is suitable for translation to ErgoAI or Catala-lang.
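As a rough illustration of the storage deliverable, the sketch below keeps each unit of the code (section, subsection, and so on) as one row that points to its parent row. Everything here is an assumption made for illustration: the table name `code_unit`, its columns, the identifiers, and the placeholder text are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: one row per unit of the code, linked to its parent.
# sqlite3 is used here only as a stand-in for MySQL.
SCHEMA = """
CREATE TABLE code_unit (
    id         INTEGER PRIMARY KEY,
    parent_id  INTEGER REFERENCES code_unit(id),
    identifier TEXT NOT NULL UNIQUE,
    level      TEXT NOT NULL,
    heading    TEXT,
    body       TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        # (id, parent_id, identifier, level, heading, body) -- placeholder data
        (1, None, "s1", "section", "Placeholder section heading", ""),
        (2, 1, "s1/a", "subsection", "Placeholder heading A", "Placeholder text A."),
        (3, 1, "s1/b", "subsection", "Placeholder heading B", "Placeholder text B."),
    ]
    conn.executemany("INSERT INTO code_unit VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def children_of(conn, identifier):
    """Return the identifiers of the direct children of one unit."""
    cur = conn.execute(
        "SELECT c.identifier FROM code_unit AS c "
        "JOIN code_unit AS p ON c.parent_id = p.id "
        "WHERE p.identifier = ? ORDER BY c.id",
        (identifier,),
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    demo = build_demo_db()
    print(children_of(demo, "s1"))
```

A parent pointer is only one possible design; it keeps the hierarchy of the legal text explicit, which may help when the rows are later translated into ErgoAI or Catala-lang.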
{Empty}
They should be able to learn Python, or already know how to code in Python or a similar language.
They should be able to learn to parse XML with Python.
They should also be able to learn to work with one of several Python NLP libraries such as NLTK ( https://realpython.com/nltk-nlp-python/ )
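For a sense of the XML parsing skill mentioned above, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The sample document is a simplified, hypothetical stand-in: its element names, attributes, and placeholder text are assumptions, and the XML actually published for the US Code uses a richer schema with namespaces that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified sample; not the real published schema or text.
SAMPLE_XML = """
<section identifier="s1">
  <num>1.</num>
  <heading>Placeholder section heading</heading>
  <subsection identifier="s1/a">
    <num>(a)</num>
    <heading>Placeholder heading A</heading>
    <content>Placeholder text A.</content>
  </subsection>
  <subsection identifier="s1/b">
    <num>(b)</num>
    <heading>Placeholder heading B</heading>
    <content>Placeholder text B.</content>
  </subsection>
</section>
"""


def extract_units(xml_text):
    """Flatten sections and subsections into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    units = []
    # iter() walks the root element and all of its descendants in document order.
    for elem in root.iter():
        if elem.tag in ("section", "subsection"):
            units.append({
                "identifier": elem.get("identifier"),
                # findtext() looks at direct children only and returns None if absent.
                "num": (elem.findtext("num") or "").strip(),
                "heading": (elem.findtext("heading") or "").strip(),
                "text": (elem.findtext("content") or "").strip(),
            })
    return units


if __name__ == "__main__":
    for unit in extract_units(SAMPLE_XML):
        print(unit["identifier"], unit["num"], unit["heading"])
```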
{Empty}
Some hands-on experience
{Empty}
University of Connecticut - Stamford
Stamford, Connecticut
CR-Yale
{Empty}
Yes
Already behind
3
Start date is flexible
6
{Empty}
12/08/2021
{Empty}
06/08/2022
- Milestone Title: Capture tax code
Milestone Description: Pull all legal tax code XML from https://www.irs.gov/…, https://uscode.house.gov or https://www.law.cornell.edu/uscode/text
- Milestone Title: Transform tax code to English
Milestone Description: Transform all of the IRS XML tax code into English legal tax code text.
- Milestone Title: Design organizational mapping
Milestone Description: Validate a useful organizational mapping so the English legal tax code text can be stored in a relational database (MySQL). This will likely require NLP processing of the tax code to make it suitable for ErgoAI or Catala-lang. This is the first part of the threshold of acceptability.
- Milestone Title: Apply organizational mapping
Milestone Description: Apply the organizational mapping to all of the tax code.
- Milestone Title: Store mapped tax code
Milestone Description: Store the mapped legal tax code in a relational database (MySQL).
- Milestone Title: Leveraging mapped tax code for deduction
Milestone Description: Validate that the mapped legal tax code can be expressed as basic terms in ErgoAI or Catala-lang. This is the final part of the threshold of acceptability for the tax code translation.
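To make the last milestone more concrete, the sketch below turns one mapped record into a generic Prolog-style fact string. This is only an illustration under stated assumptions: the predicate name `code_unit`, the record fields, and the output notation are hypothetical, and the output has not been checked against actual ErgoAI or Catala-lang syntax.

```python
def quote_atom(value):
    """Wrap a string in single quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def to_fact(unit):
    """Render one mapped record as a generic Prolog-style fact.

    The predicate name and argument order are hypothetical choices.
    """
    parent = quote_atom(unit["parent"]) if unit["parent"] else "none"
    return "code_unit({}, {}, {}).".format(
        quote_atom(unit["identifier"]), parent, quote_atom(unit["heading"])
    )


if __name__ == "__main__":
    record = {"identifier": "s1/a", "parent": "s1", "heading": "Placeholder heading A"}
    print(to_fact(record))
```

Facts of this kind only capture the structure of the code; expressing the legal rules themselves would need the NLP and human review steps described in the milestones.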
{Empty}
{Empty}
{Empty}
The student will learn how to parse XML, transform it (using XSLT), and store the transformed data.
The student will learn to parse the English tax code using a Python NLP library such as NLTK.
The student will learn to organize the legal text for storage so that retrieval is easy and so that it maps easily to either ErgoAI or Catala-lang.
The student will learn some data architecture.
This transformed/organized tax code will be stored in a relational database such as MySQL.
The student will learn SQL and how to interact with a relational database through a database workbench.
The student will learn how to work with a relational database from a language like Python.
If there is time, the student will learn about deduction in ErgoAI or Catala-lang.
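One small text-processing step that could support organizing the legal text is finding cross-references such as "section 151(d)" or "subsection (a)". The sketch below is a simplified stand-in that uses only Python's standard `re` module rather than NLTK. The pattern, the function name, and the sample sentence are all assumptions; the sentence is invented for illustration and is not quoted from the tax code.

```python
import re

# Hypothetical pattern: a unit keyword followed by either a number with
# optional parenthesized parts, e.g. "151(d)", or parenthesized parts
# alone, e.g. "(2)(A)".
CROSS_REF = re.compile(
    r"\b(section|subsection|paragraph)\s+"
    r"(\d+[A-Za-z]?(?:\([0-9A-Za-z]+\))*|(?:\([0-9A-Za-z]+\))+)",
    re.IGNORECASE,
)


def find_cross_references(text):
    """Return (kind, target) pairs for each cross-reference found."""
    return [(m.group(1).lower(), m.group(2)) for m in CROSS_REF.finditer(text)]


if __name__ == "__main__":
    # Invented sentence, not actual statutory text.
    sample = (
        "For purposes of subsection (a), the amount determined under "
        "section 151(d) shall be reduced as provided in paragraph (2)(A)."
    )
    for kind, target in find_cross_references(sample):
        print(kind, target)
```

A regular expression like this will miss many real citation forms; it is meant only to show the kind of structure that could be stored alongside each unit of text.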
{Empty}
{Empty}
No clear need for HPC.
However, the tax code is substantial, so the NLP application may require a good deal of CPU cycles.
{Empty}
Final Report
This project had a solid impact on understanding automated knowledge authoring for legal reasoning. We explored a number of ways to simply transform legal text into logical reasoning in ErgoAI (a variation of Prolog).
{Empty}
{Empty}
Yes - both positive impact on our student, Krutika Patel, as well as positive impact on managing student research.
{Empty}
{Empty}
{Empty}
Yes - there is an impact on technology transfer. In addition to leadership by Phil Bradford, this project involved a Connecticut entrepreneur (Henry Orphys) and a faculty member (Paul Fodor) from Stony Brook University. Henry has a distinguished law and tax accounting background, and he is focused on launching a startup using the technology we explored. Paul is both a faculty member and an entrepreneur. We isolated several challenges and now better understand the resources necessary for launching a product in this space.
{Empty}
{Empty}