llvm-journey

LLVM Journey
git clone git://0xff.ir/g/llvm-journey.git
Log | Files | Refs | README | LICENSE

LLVM journey

My journey through "My First Language Frontend with LLVM Tutorial".

Chapter 1: Kaleidoscope language and Lexer

The lexer has been implemented as a STL-like algorithm. It accepts a pair of ForwardIterators and a function object; for each token, the function object will be called as follow: funcobj(kal::TkToken, FwdIt tkBegin, FwdIt tkEnd).

kal::lex function template

Tokenize the input sequence.

template<typename FwdIt, typename TokenOp>
FwdIt
lex(FwdIt f, FwdIt l, TokenOp op);

The function template tokenize the range [f, l), and will call op with the following arguments:

Parameters

Return value

Returns the end of input range (l).

Example of usage

File: examples/ex_kaleidoscope_lexer.cpp

#include <iostream>
#include <string>

#include "kaleidoscope_lexer.hpp"

int
main()
{
  std::string source{
    R"(
extern sin(arg); # C standard library function

sin(3.14159265/2)
)"
  };
  auto sBegin = source.cbegin();
  auto sEnd = source.cend();

  kal::lex(
    sBegin, sEnd, [&](kal::TkType type, const auto& tkBegin, const auto& tkEnd) {
      auto b = std::distance(sBegin, tkBegin);
      auto e = std::distance(sBegin, tkEnd);

      // clang-format off

      std::cout << "Token{"
                << "\n  /*type*/  kal::TkType::"
                << kal::TKTYPE_STR[static_cast<int>(type)]
                << ",\n  /*begin*/ " << b
                << ",\n  /*end*/   " << e
                << ",\n  /*str*/   \"" << source.substr(b, e - b) << '"';

      // clang-format on

      if (type == kal::TkType::Num)
        std::cout << ",\n  /*num*/   " << source.substr(b, e - b);

      std::cout << ",\n}\n";
    });

  return 0;
}

Output of the program:

Token{
  /*type*/  kal::TkType::Extern,
  /*begin*/ 1,
  /*end*/   7,
  /*str*/   "extern",
}
Token{
  /*type*/  kal::TkType::Id,
  /*begin*/ 8,
  /*end*/   11,
  /*str*/   "sin",
}
Token{
  /*type*/  kal::TkType::Etc,
  /*begin*/ 11,
  /*end*/   12,
  /*str*/   "(",
}
Token{
  /*type*/  kal::TkType::Id,
  /*begin*/ 12,
  /*end*/   15,
  /*str*/   "arg",
}
Token{
  /*type*/  kal::TkType::Etc,
  /*begin*/ 15,
  /*end*/   16,
  /*str*/   ")",
}
Token{
  /*type*/  kal::TkType::Etc,
  /*begin*/ 16,
  /*end*/   17,
  /*str*/   ";",
}
Token{
  /*type*/  kal::TkType::Comment,
  /*begin*/ 18,
  /*end*/   47,
  /*str*/   "# C standard library function",
}
Token{
  /*type*/  kal::TkType::Id,
  /*begin*/ 49,
  /*end*/   52,
  /*str*/   "sin",
}
Token{
  /*type*/  kal::TkType::Etc,
  /*begin*/ 52,
  /*end*/   53,
  /*str*/   "(",
}
Token{
  /*type*/  kal::TkType::Num,
  /*begin*/ 53,
  /*end*/   63,
  /*str*/   "3.14159265",
  /*num*/   3.14159265,
}
Token{
  /*type*/  kal::TkType::Etc,
  /*begin*/ 63,
  /*end*/   64,
  /*str*/   "/",
}
Token{
  /*type*/  kal::TkType::Num,
  /*begin*/ 64,
  /*end*/   65,
  /*str*/   "2",
  /*num*/   2,
}
Token{
  /*type*/  kal::TkType::Etc,
  /*begin*/ 65,
  /*end*/   66,
  /*str*/   ")",
}
Token{
  /*type*/  kal::TkType::END,
  /*begin*/ 67,
  /*end*/   67,
  /*str*/   "",
}

Chapter 2: Kaleidoscope: Implementing a Parser and AST

Abstract Syntax Tree (AST)

ASTNode class

Each node of AST (Abstract Syntax Tree) can be saved in ASTNode class. It's a concepts-based polymorphic type that accepts an instance of type T that satisfies the following requirements:

kal::NodeType is an enum class:

enum class NodeType
{
  None = -1,
  Number,
  Variable,
  BinaryOp,
  Call,
  Prototype,
  Function,
};

For more information about implementation details of concepts-based polymorphic types and their benefits, you can watch this great talk by Sean Parent. If you're impatient you can go directly to time 49:13.

kal::cast function template

template<typename T>
bool
cast(const ASTNode& n, T& val);

This function will copy the value type inside the polymorphic type n to val. On successful copy, this will return true.

Parameters
Return value

Returns true on success (when dynamic_cast can casts to T).

Concrete types for AST nodes

Current implementation provides the following regular types for AST nodes:

All of these types are implemented as struct (all members are public). Because there's no invariant to establish. See C++ Core Guidelines C.2.

Example of usage

#include "kaleidoscope_ast.hpp"

using namespace kal;

int
main()
{
  Number n{ 3.14 };
  Variable v{ "var1" };
  BinaryOp b{ '+', n, v }; // 3.14 + var1

  Variable x{ "x" };
  Variable y{ "y" };
  Call c{ "tan2", { x, y } }; // tan2(x, y)

  Prototype p{ "add3", { "x", "y", "z" } }; // extern add3(x, y, z)

  Variable z{ "z" };
  Function f{ p, BinaryOp{ '+', BinaryOp{ '+', x, y }, z } };

  return 0;
}

Parser

Program Analysis

Algorithms to make sure validity of input program.

Chapter 3: Kaleidoscope: Code generation to LLVM IR

Code generator assumes input program is valid; feeding code generator with invalid program leads to undefined behavior (UB).

Code generation for kal::Number, kal::Variable, and kal::BinaryOp

kal::codegen function overload for const kal::Number&

Generates code for kal::Number.

llvm::Value* kal::codegen(const kal::Number&);

kal::codegen function overload for const kal::Variable&

Generates code for kal::Variable.

llvm::Value* kal::codegen(const kal::Variable&);

kal::codegen function overload for const kal::BinaryOp&

Generates code for binary operation kal::BinaryOp.

llvm::Value* kal::codegen(const kal::BinaryOp&);

kal::codegen function overload for const kal::ASTNode&

Generates code for the concrete type inside the node.

llvm::Value* kal::codegen(const kal::ASTNode& node);
Pre-condition:

The pre-condition is that node must not be empty (default constructed), or contain Prototype or Function nodes.

Chapter 4: Kaleidoscope: Adding JIT and Optimizer Support

Optimizer