Today there are a number of publicly available EVM bytecode decompilers. However, many of them only work on toy examples and fail on real-world programs.
This is one of the problems of Ethereum smart contract reverse engineering: the toolset for both development and reverse engineering is immature and often buggy.
The guys from Trail of Bits provided an interesting figure during their talk “Blackhat Ethereum” (CanSecWest’18):
There are a lot of Ethereum contracts deployed without source code. Well, the source may be published somewhere off-chain, but that makes the contract somewhat “private”, or at least not public in the blockchain sense.
Furthermore, many contract developers don’t even provide an ABI, which makes using their contracts “private” as well. Yes, you can call the functions by their signatures, but you have to reverse engineer the bytecode to make sense of them.
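This is where the EVM function signature generation algorithm comes in handy. EVM heavily relies on hashing, particularly Keccak (namely Keccak-256, the original Keccak design that was later standardized as SHA-3 with different padding, so the two produce different digests). Hashing is used to spread values such as the storage addresses of mapping and dynamic-array data, as well as function signatures, uniformly over a large space.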
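As an illustration of the storage case, the Solidity storage layout places mapping values and dynamic-array elements at hashed slots. Here p is the slot number assigned to the declared variable, k a value-type mapping key, i an array index (assuming 32-byte elements), pad₃₂ pads a value to a full 32-byte word, and ∥ is concatenation:

$$\mathrm{slot}(m[k]) = \mathrm{keccak256}\big(\mathrm{pad}_{32}(k) \,\Vert\, \mathrm{pad}_{32}(p)\big), \qquad \mathrm{slot}(a[i]) = \mathrm{keccak256}\big(\mathrm{pad}_{32}(p)\big) + i$$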
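In EVM bytecode, a function signature (often called a function selector) is a 4-byte value used to call a function in place of its real name and argument types; it is passed as the first 4 bytes of the call data. The compiler simply generates a switch statement: a dispatcher that compares the selector against each known value and jumps (JUMPI) to the matching function body.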
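Since the selectors sit in the dispatcher as constants, they can be pulled out of the bytecode before any brute forcing. The sketch below is a heuristic that assumes the common Solidity pattern `PUSH4 <selector> EQ`; other compilers or dispatch layouts (for example, range checks with GT) would need extra patterns. `find_selectors` is a hypothetical helper, not part of ABI Decompiler.

```python
from typing import List

PUSH1, PUSH4, PUSH32 = 0x60, 0x63, 0x7F
EQ = 0x14


def find_selectors(code: bytes) -> List[str]:
    """Heuristically collect 4-byte selectors from a Solidity-style dispatcher.

    Looks for PUSH4 <selector> immediately followed by EQ, and skips the
    immediate data of every PUSH1..PUSH32 so bytes inside constants are not
    mistaken for opcodes.
    """
    selectors: List[str] = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if PUSH1 <= op <= PUSH32:
            size = op - PUSH1 + 1  # number of immediate bytes after the opcode
            if op == PUSH4 and pc + 5 < len(code) and code[pc + 5] == EQ:
                sel = "0x" + code[pc + 1:pc + 5].hex()
                if sel not in selectors:
                    selectors.append(sel)
            pc += 1 + size
        else:
            pc += 1
    return selectors


# Hand-written dispatcher fragment:
# PUSH1 0 CALLDATALOAD PUSH1 0xe0 SHR
# DUP1 PUSH4 0xa9059cbb EQ PUSH2 0x0046 JUMPI
# DUP1 PUSH4 0x70a08231 EQ PUSH2 0x005b JUMPI
dispatcher = bytes.fromhex(
    "600035 60e0 1c"
    "80 63a9059cbb 14 610046 57"
    "80 6370a08231 14 61005b 57"
)
print(find_selectors(dispatcher))  # ['0xa9059cbb', '0x70a08231']
```

Skipping push data matters: a PUSH32 constant that happens to contain the bytes `63 .. .. .. .. 14` is not reported as a selector.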
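The signature algorithm itself is as follows: take the first 4 bytes of Keccak-256 of the canonical function prototype, i.e. the function name followed by the parenthesized, comma-separated argument types with no spaces or parameter names. For example, transfer(address,uint256) yields 0xa9059cbb. This leads to an amusing way to recover the actual function prototypes from the bytecode: brute force!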
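A minimal sketch of the selector computation in pure Python follows. Note that `hashlib.sha3_256` cannot be used here, because it implements the standardized SHA-3 padding rather than the original Keccak padding that Ethereum uses. The function names `keccak256` and `selector` are illustrative; this is not ABI Decompiler’s code.

```python
from typing import List

MASK64 = (1 << 64) - 1

# Rotation offsets for the rho step, indexed as RHO[x][y].
RHO = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rc_bit(t: int) -> int:
    """Output bit t of the round-constant LFSR x^8 + x^6 + x^5 + x^4 + 1."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1


# The 24 round constants of the iota step, derived from the LFSR.
ROUND_CONSTANTS = [
    sum(_rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7))
    for i in range(24)
]


def _rol(lane: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    return ((lane << n) | (lane >> (64 - n))) & MASK64


def keccak_f(a: List[int]) -> None:
    """Keccak-f[1600] permutation, in place; lane (x, y) is a[x + 5 * y]."""
    for rc in ROUND_CONSTANTS:
        # theta: mix in the parity of two neighbouring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        for x in range(5):
            d = c[(x - 1) % 5] ^ _rol(c[(x + 1) % 5], 1)
            for y in range(5):
                a[x + 5 * y] ^= d
        # rho + pi: rotate every lane and move (x, y) to (y, 2x + 3y)
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rol(a[x + 5 * y], RHO[x][y])
        # chi: non-linear step along each row
        for y in range(5):
            for x in range(5):
                a[x + 5 * y] = b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
        # iota: XOR the round constant into lane (0, 0)
        a[0] ^= rc


def keccak256(data: bytes) -> bytes:
    """Keccak-256 as used by Ethereum: rate 136 bytes, padding 0x01 ... 0x80."""
    rate = 136
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = [0] * 25
    for off in range(0, len(padded), rate):
        # absorb: XOR the block into the first 17 lanes, little-endian
        for i in range(rate // 8):
            lane = padded[off + 8 * i:off + 8 * i + 8]
            state[i] ^= int.from_bytes(lane, "little")
        keccak_f(state)
    # squeeze: the 32-byte digest is the first four lanes, little-endian
    return b"".join(lane.to_bytes(8, "little") for lane in state[:4])


def selector(prototype: str) -> str:
    """EVM function selector: first 4 bytes of Keccak-256 of the prototype."""
    return "0x" + keccak256(prototype.encode("ascii")).hex()[:8]
```

Usage: `selector("balanceOf(address)")` computes the selector of the ERC-20 `balanceOf` function. The prototype string must be exactly canonical; `transfer(address, uint256)` with a space hashes to a completely different value.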
I implemented a small tool for that purpose: ABI Decompiler.
Possible use cases of this tool when analysing a contract without an ABI or source code:
- Reverse engineering (having function prototypes may help you make sense of what the functions do without looking at the disassembly)
- Debugging (prototypes and disassembly combined may give you debuggable Solidity code with inline assembly, which you can paste into Remix to get things going)
ABI Decompiler has several brute-force strategies for function prototypes built in (such as trying the most common argument types first).
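One way to sketch such a strategy, assuming a wordlist of names and a list of argument types ordered by how common they are: enumerate prototypes with the fewest arguments and the most common types first, hash each one, and stop once every target selector has a match. The type ordering in `COMMON_TYPES` and the function names are hypothetical, not ABI Decompiler’s actual settings. `selector_fn` is any function that maps a prototype to its 0x-prefixed 4-byte selector (for example, one built on a Keccak-256 implementation); passing it in keeps the search logic independent of the hashing library.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, Sequence

# Hypothetical priority order: the most common ABI argument types first.
COMMON_TYPES = ["address", "uint256", "bool", "bytes32", "uint8", "string", "bytes"]


def candidate_prototypes(names: Sequence[str], types: Sequence[str], max_args: int) -> Iterator[str]:
    """Yield canonical prototypes, fewest arguments and most common types first."""
    for arity in range(max_args + 1):
        for args in product(types, repeat=arity):
            for name in names:
                yield f"{name}({','.join(args)})"


def brute_force(
    targets: Iterable[str],
    names: Sequence[str],
    types: Sequence[str],
    max_args: int,
    selector_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Map each target selector to the first candidate prototype that hashes to it."""
    remaining = set(targets)
    found: Dict[str, str] = {}
    for proto in candidate_prototypes(names, types, max_args):
        sel = selector_fn(proto)
        if sel in remaining:
            found[sel] = proto
            remaining.discard(sel)
            if not remaining:
                break  # every selector has a candidate; stop early
    return found


# Demo with a stub lookup table standing in for real Keccak-256 hashing.
known = {"transfer(address,uint256)": "0xa9059cbb", "balanceOf(address)": "0x70a08231"}
print(brute_force({"0xa9059cbb", "0x70a08231"}, ["balanceOf", "transfer"],
                  COMMON_TYPES, 2, lambda p: known.get(p, "")))
```

The search space grows as (number of names) × (number of types)^(number of arguments), which is why ordering the types by popularity pays off. A match only proves that a candidate hashes to the same 4 bytes; with just 32 bits, unrelated prototypes can collide, so a recovered name is best treated as a strong hint rather than a certainty.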
After implementing this tool, I stumbled upon several other helpful resources for the same purpose:
- Ethereum Function Signature Database is a public database containing more than 90,000 signatures
- Ethersplay also contains a database of over 43,000 function signatures (see known_hashes.py)
ABI Decompiler doesn’t rely on a signature database: the signatures are generated from a list of function names (several wordlists are provided) and argument types. One can also use it to generate a huge signature database.
P.S. ICOs are becoming less lucrative, and the hype curve seems to be descending, but the demand for security audits of blockchain projects and dApps is not.