Blockchain technology, the backbone of smart contracts and of cryptocurrencies such as Bitcoin, Ethereum, Doge and more, is a method of recording and storing electronic information, such as financial transactions, in a way that makes it difficult or impossible to alter. In short, a permanent digital record.
But if a smart contract has mistakes or bugs in its code that can be exploited, it can result in huge financial losses.
Dr. Lingxiao Jiang, an associate professor of Computer Science at Singapore Management University, researches the challenges, and opportunities, for the growing use of smart contracts in financial services.
Read the original article: https://doi.org/10.1109/TSE.2020.2971482
Hello and welcome to Research Pod. Thank you for listening and joining us today. In this episode, we will be looking at the research of Dr. Lingxiao Jiang, an associate professor of Computer Science at Singapore Management University. Dr Jiang is also one of the directors at the university’s Research Lab for Intelligent Software Engineering, where he researches the challenges, and opportunities, for the growing use of smart contracts in financial services.
Innovations in automatic bug detection and program analysis using AI and machine learning are vital for ensuring quality and security in the modern digital world. Bugs can lead to software vulnerabilities that can be uncovered and exploited by attackers. Thus, bug detection enables businesses and organizations to secure their networks and data from unauthorized access.
To quote Dr Jiang, “with increasingly complex software developed with diverse and heterogenous technologies and programming languages, new code analysis techniques that can work across layers and components are needed to facilitate scalable and accurate bug detection.”
His research involves using machine learning techniques that can understand and analyze code across various programming languages and software packages for multiple industries, and therefore provides bug detection and quality assurance capabilities for various kinds of modern software packages.
More specifically, the research focuses on the use of machine learning techniques for detecting bugs in smart contracts and blockchain systems in the financial technology (or fintech) industry, where smart contracts and blockchains are gaining popularity and acceptance and their usage is growing rapidly.
Blockchain technology is a method of recording and storing electronic information, such as financial transactions, in a way that makes it difficult or impossible to alter. In short, it is a permanent digital record. It can be used to host smart contracts that perform various financial transactions automatically according to agreed-upon terms and conditions.
Machine learning is the process of using statistical models and algorithms to develop computer systems and programs that can learn from input data and provide the correct output without explicit human intervention.
This seemingly unrelated technology is actually useful for ensuring the quality and security of smart contracts and blockchains. Machine learning can be tailored to take code (including smart contracts) as input, learn its patterns and characteristics, and help to automate code analysis and bug detection.
What are smart contracts?
Smart contracts were first introduced in 1994 by Nick Szabo, a computer scientist, cryptographer, and legal scholar known for his research on digital currencies and digital contracts. They are a relatively new way of creating and enforcing financial transactions, such as stock purchases, divestments, life insurance, inventory management, and supply chain payment automation. They are written as software programs and stored on the blockchain, and they execute automatically when certain agreed conditions have been met.
In such contracts, the terms and conditions are agreed upon prior to the deal, and cannot be changed because they are stored on the blockchain as programs. This feature means that all participants in the contract are certain of the outcome, and this outcome will occur with minimal human involvement or time delays.
Dr Jiang’s 2020 paper, “Checking smart contracts with structural code embedding,” adds that the irreversible and trackable nature of smart contracts helps reduce accidental or malicious actions in business transactions. Such contracts are therefore considered trustworthy, which has led to significant growth in their adoption and usage.
The growth in smart contracts is also partly due to the rise of cryptocurrencies. Bitcoin, Ethereum, Doge, and many others operate on blockchain technology. Dr Jiang states that smart contracts and cryptocurrencies are often intertwined because “many financial transactions involve the transfer of cryptocurrencies and are performed via various kinds of smart contracts, and a smart contract in the blockchains often involves cryptocurrencies worth of millions of USD.”
The challenges for smart contracts
Although smart contracts now have numerous benefits and applications, they also present new challenges for software developers, security professionals, and businesses using such contracts.
The first challenge is that, as mentioned, smart contracts are irreversible programs that are created by software developers. However, these developers are human, and they can make mistakes in the program code. These mistakes are usually known as bugs or vulnerabilities. If a contract has mistakes, it can result in huge financial losses for businesses because it will be executed regardless of these mistakes. Given that these contracts can involve millions of USD, it is imperative that such mistakes are avoided as far as possible.
The second challenge is related to the first: Bugs or vulnerabilities in the smart contract code may attract malicious actors such as hackers. Such hackers are usually sophisticated and work fast. Hence, they can identify and target these bugs faster than the developers can fix them, and hijack the contract. The outcome here will also be huge financial losses for the businesses involved in the transaction.
The third challenge is related to the many programming languages available and the large number of smart contracts on the blockchain. Software developers use their language of choice to develop these contracts, be it Python, C++, or any other language. In some cases, it is necessary to translate the source code from one language to another to increase compatibility and usage, and to ensure the quality of smart contracts written in different languages.
In their 2018 paper, “Hierarchical learning of cross-language mappings through distributed vector representations of code,” Dr Jiang and coauthors write that, just like translating human languages, current program translation techniques involve grammar checks. Although this method is highly accurate, it is inflexible when dealing with different languages as well as evolving languages. Accurate program translation and bug detection across languages are useful to ensure good functionality, reduce code clones, and prevent errors, especially in smart contract programs.
What are the available solutions?
Dr Jiang and his team provide various solutions to the challenges facing smart contracts. These solutions are based on automation through AI and machine learning. First, they use deep learning techniques to check smart contracts for any evolution in language rules and/or bug development. Their approach is based on the fact that “code and bug patterns, including their lexical, syntactical, and even some semantic information, can be automatically encoded into numerical vectors via techniques adapted from word embedding.”
When combined with similarity checking, this method of “code embedding” can be applied widely, from debugging and maintaining source code through to translation and analysis. For example, it can detect cloned contracts, check a contract against a set of known vulnerabilities, and detect specific bugs in a large group of smart contracts.
The researchers have created an AI and machine learning program known as SmartEmbed to accomplish this goal. This program works with smart contracts that are written in the Solidity programming language and stored on the Ethereum blockchain. The team checked for bugs, clones, and contract validation using a sample of more than 22,000 verified Solidity smart contracts, learning bug patterns from more than 50 buggy contracts of 10 kinds from online sources. The results show comparable performance with other currently available verification tools such as SmartCheck and Deckard.
More specifically, the clone detection results show that SmartEmbed can identify clones with a similarity ratio of 90 percent, revealing the existence of much duplicate and heavily reused smart contract code, and it “can detect more semantic clones accurately than other commonly used clone detection tools.” For bug detection, the program can identify more than 1000 bugs related to clones. Lastly, SmartEmbed can help to validate contracts by checking whether they contain bugs similar to known patterns, with low rates of false positives.
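To make the idea of embedding plus similarity checking concrete, here is a minimal Python sketch. It is not the team's tool: the "embedding" is just a count of code tokens, where the research uses learned vectors, and the function names, the 0.8 threshold, and the Solidity-style snippets are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

def embed(code: str) -> Counter:
    """Toy embedding (an assumption): a bag-of-tokens count vector."""
    # Identifiers become one token each; every other non-space character is its own token.
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", code))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def flag_similar(contract: str, known_patterns: dict, threshold: float = 0.8) -> list:
    """Return the names of known patterns whose vector is close to the contract's."""
    vec = embed(contract)
    return [name for name, pattern in known_patterns.items()
            if cosine(vec, embed(pattern)) >= threshold]

# Illustrative snippets only: an external call made before the balance is updated.
KNOWN_PATTERNS = {
    "call-before-update": "msg.sender.call.value(amount)(); balances[msg.sender] -= amount;",
}
RENAMED_COPY = "msg.sender.call.value(bal)(); balances[msg.sender] -= bal;"
UNRELATED = "uint total = a + b;"

print(flag_similar(RENAMED_COPY, KNOWN_PATTERNS))
print(flag_similar(UNRELATED, KNOWN_PATTERNS))
```

The renamed copy still lands close to the known pattern because most of its tokens are shared, while the unrelated line does not. A learned embedding aims to keep that property even when the code is restructured rather than just renamed.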
It is also important to deal with the challenges in manual or hybrid program translation and analysis. After all, any human involvement means there’s a chance of bugs and mistakes in smart contracts. In their 2019 paper, “Learning cross-language API mappings with little knowledge,” Jiang and coauthors state that current automated translation techniques still require large amounts of parallel corpora, ranging from pairs of application programming interface (API) methods or code fragments that are functionally equivalent to similar code comments. Building such corpora typically takes a lot of prior knowledge and manual effort. Instead, they propose to conduct API and code mapping through a new domain adaptation method that learns to align the vector spaces that embed APIs and code of different programming languages.
After the embedding vectors for code in different languages (such as C# and Java) are aligned, various code analysis and bug detection techniques can be transferred and reused across the languages. The new approach reduces the need for human input as well as the necessary knowledge for a new programming language (such as Solidity, Vyper and others) used for developing smart contracts. In turn, that helps to reduce bugs and clones in smart contracts.
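As a rough illustration of what aligning two embedding spaces means, the Python sketch below fits a single rotation from two known API pairs and then uses it to map a third API across languages. This is a deliberately simplified stand-in, not the method from the paper: the two-dimensional vectors are hand-made, and the seed pairs and function names are assumptions for the example.

```python
import math

# Hand-made 2-D "embeddings" (assumptions): the Java space is the C# space rotated by 90 degrees.
JAVA = {"System.out.println": (0.0, -1.0), "Files.readString": (1.0, 0.0), "List.add": (0.0, 1.0)}
CSHARP = {"Console.WriteLine": (1.0, 0.0), "File.ReadAllText": (0.0, 1.0), "List.Add": (-1.0, 0.0)}

# A small amount of prior knowledge: two API pairs assumed to be equivalent.
SEEDS = [("System.out.println", "Console.WriteLine"), ("Files.readString", "File.ReadAllText")]

def fit_rotation(seeds):
    """Least-squares rotation angle that maps the Java seed vectors onto their C# partners."""
    cross = sum(JAVA[j][0] * CSHARP[c][1] - JAVA[j][1] * CSHARP[c][0] for j, c in seeds)
    dot = sum(JAVA[j][0] * CSHARP[c][0] + JAVA[j][1] * CSHARP[c][1] for j, c in seeds)
    return math.atan2(cross, dot)

def rotate(vec, theta):
    """Rotate a 2-D vector counter-clockwise by theta radians."""
    x, y = vec
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def translate(java_api, theta):
    """Map a Java API into the C# space and return the nearest C# API name."""
    mapped = rotate(JAVA[java_api], theta)
    return min(CSHARP, key=lambda name: math.dist(mapped, CSHARP[name]))

THETA = fit_rotation(SEEDS)
print(translate("List.add", THETA))
```

Once the two spaces line up, anything that works by comparing vectors in one language, such as a similarity check against known bug patterns, can in principle be reused for the other.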
Smart contracts are gaining more traction as tools in financial transactions. As such, they are likely to be adopted by industries outside the financial sector. For example, they can be used to automate workflows in businesses and manufacturing industries, triggering actions for machines or managers as soon as conditions are satisfied.
For Dr Jiang and his team, the next stage is to develop AI and machine learning algorithms and programs that provide better program analysis and translation and other general software quality assurance measures. These can include malware detection, vulnerability detection, automated code generation, and software repair for various programming languages and use cases. These developments will make smart contracts safer and more trustworthy as they continue to become an efficient and effective method of financial transactions in the digital age.
That’s all for this episode – thanks for listening and stay subscribed to Research Pod for more of the latest science. See you again soon.