LLVM:

A collection of modular, reusable compiler and toolchain technologies.

Founder: Chris Lattner

LLVM is not an acronym for Low Level Virtual Machine (even though that is where the name came from); "LLVM" is simply the full name of the project.

Traditional compilers:

GCC
Clang

Traditional compiler architecture:

[Figure: traditional compiler architecture]

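  • Frontend

Lexical analysis, syntax analysis, semantic analysis, and intermediate code generation

  • Optimizer

Optimization of the intermediate code

  • Backend

Machine code generation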

LLVM architecture

[Figure: LLVM architecture]

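  • Every front end and back end shares one intermediate code, the LLVM Intermediate Representation (LLVM IR).
  • Supporting a new programming language only requires a new front end.
  • Supporting a new hardware target only requires a new back end.
  • Optimization is a common phase that works on the unified LLVM IR, so neither a new language nor a new hardware target requires changes to it.
  • By comparison, GCC's front end and back end are not cleanly separated; they are coupled together, so supporting a new language or a new hardware target in GCC is considerably harder.
  • LLVM is now used as a common infrastructure for implementing all kinds of statically compiled and runtime-compiled languages (the GCC family of languages, Java, .NET, Python, and so on).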
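
To make the "many front ends, one IR, many back ends" idea concrete, here is a toy sketch in C. Nothing in it is real LLVM API: the ToyIR struct, the two input syntaxes, and the two pseudo-assembly dialects are all made up for illustration. The point is only that both front ends produce the same IR, so a single optimizer and either back end work for both.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy IR: a single addition of two integer constants. */
typedef struct {
    int lhs;
    int rhs;
    int folded;  /* set to 1 once the optimizer has computed `value` */
    int value;
} ToyIR;

/* Front end 1: infix syntax such as "10 + 20". Returns 1 on success. */
int frontend_infix(const char *src, ToyIR *ir) {
    ir->folded = 0;
    ir->value = 0;
    return sscanf(src, "%d + %d", &ir->lhs, &ir->rhs) == 2;
}

/* Front end 2: prefix syntax such as "(add 10 20)". Returns 1 on success. */
int frontend_prefix(const char *src, ToyIR *ir) {
    ir->folded = 0;
    ir->value = 0;
    return sscanf(src, "(add %d %d)", &ir->lhs, &ir->rhs) == 2;
}

/* Shared optimizer: constant folding. It only sees ToyIR, so it does not
   care which front end produced the IR or which back end will consume it. */
void optimize(ToyIR *ir) {
    ir->value = ir->lhs + ir->rhs;
    ir->folded = 1;
}

/* Back end 1: made-up stack machine. */
void backend_stack(const ToyIR *ir, char *out, size_t n) {
    if (ir->folded)
        snprintf(out, n, "push %d", ir->value);
    else
        snprintf(out, n, "push %d; push %d; add", ir->lhs, ir->rhs);
}

/* Back end 2: made-up register machine. */
void backend_register(const ToyIR *ir, char *out, size_t n) {
    if (ir->folded)
        snprintf(out, n, "mov r0, %d", ir->value);
    else
        snprintf(out, n, "mov r0, %d; add r0, %d", ir->lhs, ir->rhs);
}
```

Adding a third input syntax here would mean writing one more frontend_* function and nothing else, which is the property the list above describes.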

Clang

  • A subproject of LLVM
  • A C/C++/Objective-C compiler front end built on the LLVM architecture

Advantages:

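  • Fast compilation: on some platforms Clang compiles significantly faster than GCC
  • Low memory footprint: the AST that Clang generates takes up roughly one fifth of the memory GCC uses
  • Modular, library-based design that is easy to integrate into IDEs and to reuse for other purposes
  • Readable diagnostics: during compilation Clang creates and retains a large amount of detailed metadata, which helps with debugging and with interpreting errors
  • Clean, simple design that is easy to understand, extend, and enhance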

Clang and LLVM

[Figure: Clang and LLVM]

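  • LLVM in the broad sense

The entire LLVM architecture

  • LLVM in the narrow sense

The LLVM back end (code optimization, target code generation, and so on)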


Compilation process of an OC (Objective-C) source file

Viewing the compilation phases from the command line

clang -ccc-print-phases main.m

➜  TestSwift clang -ccc-print-phases main.swift
0: input, "main.swift", object
1: linker, {0}, image
2: bind-arch, "x86_64", {1}, image
➜  TestOC clang -ccc-print-phases main.m
0: input, "main.m", objective-c
1: preprocessor, {0}, objective-c-cpp-output
2: compiler, {1}, ir
3: backend, {2}, assembler
4: assembler, {3}, object
5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image

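The Swift run shows four fewer phases than the OC one, but only because clang does not compile Swift at all: it does not recognize the .swift extension and treats main.swift as an object file to pass straight to the linker. Swift sources are compiled by swiftc, not clang.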

Viewing the preprocessor output

clang -E main.m

// Source file
print("Hello World")
// Preprocessor output
➜  TestSwift clang -E main.swift
clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
// Source file
#define AGE 10

int main(int argc, const char * argv[]) {
    
    int a = 10;
    int b = 20;
    int c = a + b + AGE;
    
    return 0;
}
// Preprocessor output
➜  TestOC clang -E main.m
# 1 "main.m"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 373 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.m" 2
# 11 "main.m"
int main(int argc, const char * argv[]) {

    int a = 10;
    int b = 20;
    int c = a + b + 10;

    return 0;
}
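
The substitution of AGE by 10 seen above can be imitated with a few lines of C. This is only a rough sketch of object-like macro expansion, not how Clang's preprocessor is implemented: the function name expand_macro is made up, and it ignores string literals, comments, function-like macros, and nested expansion.

```c
#include <ctype.h>
#include <string.h>

/* Copy `src` into `out`, replacing every whole identifier equal to `name`
   with `body`. Only whole identifiers match, so AGE is replaced but AGES
   is left alone. Output is truncated to fit `out_size`. */
void expand_macro(const char *src, const char *name, const char *body,
                  char *out, size_t out_size) {
    size_t used = 0;
    size_t name_len = strlen(name);
    if (out_size == 0) return;

    while (*src != '\0' && used + 1 < out_size) {
        if (isalpha((unsigned char)*src) || *src == '_') {
            /* Measure the identifier that starts here. */
            size_t len = 1;
            while (isalnum((unsigned char)src[len]) || src[len] == '_') len++;

            /* Emit either the identifier itself or the macro body. */
            const char *text = src;
            size_t text_len = len;
            if (len == name_len && strncmp(src, name, len) == 0) {
                text = body;
                text_len = strlen(body);
            }
            for (size_t i = 0; i < text_len && used + 1 < out_size; i++)
                out[used++] = text[i];
            src += len;
        } else {
            /* Anything that is not an identifier is copied unchanged. */
            out[used++] = *src++;
        }
    }
    out[used] = '\0';
}
```

Calling expand_macro("int c = a + b + AGE;", "AGE", "10", buf, sizeof buf) leaves the same line in buf that clang -E printed for the assignment to c.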

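Lexical analysis

  • Lexical analysis splits the source into tokens (loosely comparable to the subject, predicate, object, and complement of an English sentence...)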

clang -fmodules -E -Xclang -dump-tokens main.m

➜  TestSwift clang -fmodules -E -Xclang -dump-tokens main.swift
clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-Xclang -dump-tokens' [-Wunused-command-line-argument]
➜  TestOC clang -fmodules -E -Xclang -dump-tokens main.m
int 'int'	 [StartOfLine]	Loc=<main.m:11:1>
identifier 'main'	 [LeadingSpace]	Loc=<main.m:11:5>
l_paren '('		Loc=<main.m:11:9>
int 'int'		Loc=<main.m:11:10>
identifier 'argc'	 [LeadingSpace]	Loc=<main.m:11:14>
comma ','		Loc=<main.m:11:18>
const 'const'	 [LeadingSpace]	Loc=<main.m:11:20>
char 'char'	 [LeadingSpace]	Loc=<main.m:11:26>
star '*'	 [LeadingSpace]	Loc=<main.m:11:31>
identifier 'argv'	 [LeadingSpace]	Loc=<main.m:11:33>
l_square '['		Loc=<main.m:11:37>
r_square ']'		Loc=<main.m:11:38>
r_paren ')'		Loc=<main.m:11:39>
l_brace '{'	 [LeadingSpace]	Loc=<main.m:11:41>
int 'int'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:13:5>
identifier 'a'	 [LeadingSpace]	Loc=<main.m:13:9>
equal '='	 [LeadingSpace]	Loc=<main.m:13:11>
numeric_constant '10'	 [LeadingSpace]	Loc=<main.m:13:13>
semi ';'		Loc=<main.m:13:15>
int 'int'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:14:5>
identifier 'b'	 [LeadingSpace]	Loc=<main.m:14:9>
equal '='	 [LeadingSpace]	Loc=<main.m:14:11>
numeric_constant '20'	 [LeadingSpace]	Loc=<main.m:14:13>
semi ';'		Loc=<main.m:14:15>
int 'int'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:15:5>
identifier 'c'	 [LeadingSpace]	Loc=<main.m:15:9>
equal '='	 [LeadingSpace]	Loc=<main.m:15:11>
identifier 'a'	 [LeadingSpace]	Loc=<main.m:15:13>
plus '+'	 [LeadingSpace]	Loc=<main.m:15:15>
identifier 'b'	 [LeadingSpace]	Loc=<main.m:15:17>
plus '+'	 [LeadingSpace]	Loc=<main.m:15:19>
numeric_constant '10'	 [LeadingSpace]	Loc=<main.m:15:21 <Spelling=main.m:9:13>>
semi ';'		Loc=<main.m:15:24>
return 'return'	 [StartOfLine] [LeadingSpace]	Loc=<main.m:17:5>
numeric_constant '0'	 [LeadingSpace]	Loc=<main.m:17:12>
semi ';'		Loc=<main.m:17:13>
r_brace '}'	 [StartOfLine]	Loc=<main.m:18:1>
eof ''		Loc=<main.m:18:2>
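
For a feel of what the lexer is doing, here is a minimal tokenizer sketch in C. It is an illustration only and has nothing to do with Clang's actual lexer: the Token struct and the tokenize function are made up, only the kind names (identifier, numeric_constant, plus, equal, semi, int) are borrowed from the dump above, and it handles just enough syntax for a line like the assignment to c.

```c
#include <ctype.h>
#include <string.h>

typedef struct {
    const char *kind;  /* kind name, borrowed from the -dump-tokens output */
    char text[32];     /* spelling of the token */
} Token;

/* Split `src` into tokens. Returns the number of tokens written to `out`. */
size_t tokenize(const char *src, Token *out, size_t max_tokens) {
    size_t count = 0;
    while (*src != '\0' && count < max_tokens) {
        /* Whitespace separates tokens but is not a token itself. */
        if (isspace((unsigned char)*src)) { src++; continue; }

        Token *tok = &out[count++];
        size_t len = 1;
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)src[len]) || src[len] == '_') len++;
            tok->kind = "identifier";
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)src[len])) len++;
            tok->kind = "numeric_constant";
        } else if (*src == '+') {
            tok->kind = "plus";
        } else if (*src == '=') {
            tok->kind = "equal";
        } else if (*src == ';') {
            tok->kind = "semi";
        } else {
            tok->kind = "unknown";
        }

        /* Store the spelling, truncating anything longer than the buffer. */
        size_t copy = len < sizeof tok->text ? len : sizeof tok->text - 1;
        memcpy(tok->text, src, copy);
        tok->text[copy] = '\0';

        /* Keywords look like identifiers until they are checked by name. */
        if (strcmp(tok->text, "int") == 0) tok->kind = "int";
        src += len;
    }
    return count;
}
```

Feeding it "int c = a + b + 10;" yields nine tokens in the same order and with the same kinds as the corresponding lines of the dump, minus the location information.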

Syntax analysis

  • Syntax analysis builds the abstract syntax tree (AST)

clang -fmodules -fsyntax-only -Xclang -ast-dump main.m

➜  Test clang -fmodules -fsyntax-only -Xclang -ast-dump main.swift
clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-Xclang -ast-dump' [-Wunused-command-line-argument]
➜  TestOC clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
TranslationUnitDecl 0x7ff3730298e8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x7ff373029e60 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7ff373029b80 '__int128'
|-TypedefDecl 0x7ff373029ed0 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7ff373029ba0 'unsigned __int128'
|-TypedefDecl 0x7ff373029f70 <<invalid sloc>> <invalid sloc> implicit SEL 'SEL *'
| `-PointerType 0x7ff373029f30 'SEL *'
|   `-BuiltinType 0x7ff373029dc0 'SEL'
|-TypedefDecl 0x7ff37302a058 <<invalid sloc>> <invalid sloc> implicit id 'id'
| `-ObjCObjectPointerType 0x7ff37302a000 'id'
|   `-ObjCObjectType 0x7ff373029fd0 'id'
|-TypedefDecl 0x7ff37302a138 <<invalid sloc>> <invalid sloc> implicit Class 'Class'
| `-ObjCObjectPointerType 0x7ff37302a0e0 'Class'
|   `-ObjCObjectType 0x7ff37302a0b0 'Class'
|-ObjCInterfaceDecl 0x7ff37302a190 <<invalid sloc>> <invalid sloc> implicit Protocol
|-TypedefDecl 0x7ff37302a4f8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7ff37302a300 'struct __NSConstantString_tag'
|   `-Record 0x7ff37302a260 '__NSConstantString_tag'
|-TypedefDecl 0x7ff37302a590 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7ff37302a550 'char *'
|   `-BuiltinType 0x7ff373029980 'char'
|-TypedefDecl 0x7ff373062488 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x7ff373062430 'struct __va_list_tag [1]' 1
|   `-RecordType 0x7ff3730622a0 'struct __va_list_tag'
|     `-Record 0x7ff373062200 '__va_list_tag'
`-FunctionDecl 0x7ff373062758 <main.m:11:1, line:18:1> line:11:5 main 'int (int, const char **)'
  |-ParmVarDecl 0x7ff3730624f8 <col:10, col:14> col:14 argc 'int'
  |-ParmVarDecl 0x7ff373062610 <col:20, col:38> col:33 argv 'const char **':'const char **'
  `-CompoundStmt 0x7ff373062bd8 <col:41, line:18:1>
    |-DeclStmt 0x7ff373062928 <line:13:5, col:15>
    | `-VarDecl 0x7ff3730628a8 <col:5, col:13> col:9 used a 'int' cinit
    |   `-IntegerLiteral 0x7ff373062908 <col:13> 'int' 10
    |-DeclStmt 0x7ff3730629d8 <line:14:5, col:15>
    | `-VarDecl 0x7ff373062958 <col:5, col:13> col:9 used b 'int' cinit
    |   `-IntegerLiteral 0x7ff3730629b8 <col:13> 'int' 20
    |-DeclStmt 0x7ff373062b88 <line:15:5, col:24>
    | `-VarDecl 0x7ff373062a08 <col:5, line:9:13> line:15:9 c 'int' cinit
    |   `-BinaryOperator 0x7ff373062b60 <col:13, line:9:13> 'int' '+'
    |     |-BinaryOperator 0x7ff373062b18 <line:15:13, col:17> 'int' '+'
    |     | |-ImplicitCastExpr 0x7ff373062ae8 <col:13> 'int' <LValueToRValue>
    |     | | `-DeclRefExpr 0x7ff373062a68 <col:13> 'int' lvalue Var 0x7ff3730628a8 'a' 'int'
    |     | `-ImplicitCastExpr 0x7ff373062b00 <col:17> 'int' <LValueToRValue>
    |     |   `-DeclRefExpr 0x7ff373062aa8 <col:17> 'int' lvalue Var 0x7ff373062958 'b' 'int'
    |     `-IntegerLiteral 0x7ff373062b40 <line:9:13> 'int' 10
    `-ReturnStmt 0x7ff373062bc0 <line:17:5, col:12>
      `-IntegerLiteral 0x7ff373062ba0 <col:12> 'int' 0
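
The subtree for a + b + 10 in the dump is a BinaryOperator whose left child is another BinaryOperator (a + b) and whose right child is the IntegerLiteral 10. A hand-rolled C sketch of that shape follows. The Node type and the function names are hypothetical and far simpler than Clang's real AST classes; only the node kind names echo the dump.

```c
#include <stddef.h>

typedef enum { INTEGER_LITERAL, DECL_REF_EXPR, BINARY_OPERATOR_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                /* used by INTEGER_LITERAL */
    const int *variable;      /* used by DECL_REF_EXPR: the variable referred to */
    const struct Node *lhs;   /* used by BINARY_OPERATOR_ADD */
    const struct Node *rhs;
} Node;

/* Walk the tree bottom-up and compute the value of the expression. */
int evaluate(const Node *node) {
    switch (node->kind) {
    case INTEGER_LITERAL:     return node->value;
    case DECL_REF_EXPR:       return *node->variable;
    case BINARY_OPERATOR_ADD: return evaluate(node->lhs) + evaluate(node->rhs);
    }
    return 0; /* not reached for well-formed trees */
}

/* Build the tree for `a + b + 10` with a = 10 and b = 20, then evaluate it. */
int evaluate_example(void) {
    int a = 10;
    int b = 20;
    Node ref_a    = { DECL_REF_EXPR, 0, &a, NULL, NULL };
    Node ref_b    = { DECL_REF_EXPR, 0, &b, NULL, NULL };
    Node ten      = { INTEGER_LITERAL, 10, NULL, NULL, NULL };
    Node a_plus_b = { BINARY_OPERATOR_ADD, 0, NULL, &ref_a, &ref_b };
    Node root     = { BINARY_OPERATOR_ADD, 0, NULL, &a_plus_b, &ten };
    return evaluate(&root);
}
```

The nesting also records that + is left-associative: a + b is evaluated first, then 10 is added, exactly as the indentation in the dump suggests.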

LLVM IR

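LLVM IR has three representations (equivalent in essence, much like the gas, liquid, and solid states of water)

1.text: a human-readable text format that resembles assembly language, file extension .ll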

> clang -S -emit-llvm main.m
; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0"

; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @main(i32, i8**) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  %6 = alloca i32, align 4
  %7 = alloca i32, align 4
  %8 = alloca i32, align 4
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  store i32 10, i32* %6, align 4
  store i32 20, i32* %7, align 4
  %9 = load i32, i32* %6, align 4
  %10 = load i32, i32* %7, align 4
  %11 = add nsw i32 %9, %10
  %12 = add nsw i32 %11, 10
  store i32 %12, i32* %8, align 4
  ret i32 0
}

attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
!llvm.ident = !{!7}

!0 = !{i32 1, !"Objective-C Version", i32 2}
!1 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!3 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
!4 = !{i32 1, !"Objective-C Class Properties", i32 64}
!5 = !{i32 1, !"wchar_size", i32 4}
!6 = !{i32 7, !"PIC Level", i32 2}
!7 = !{!"Apple LLVM version 10.0.0 (clang-1000.11.45.2)"}
// What on earth is all this?

2.memory: the in-memory format
3.bitcode: a binary format, file extension .bc

clang -c -emit-llvm main.m

Basic IR syntax

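  • Comments begin with a semicolon ;
  • Global identifiers begin with @, local identifiers begin with %
  • alloca allocates memory in the current function's stack frame
  • i32 is a 32-bit integer, that is, 4 bytes
  • align specifies memory alignment
  • store writes data to memory
  • load reads data from memory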
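
With those rules in hand, the .ll listing for main can be read back almost line by line. The C sketch below mirrors it one instruction at a time: each local variable stands for one alloca, each assignment to it for a store, and each read for a load. The names slot3, v9, and so on are made up to echo %3, %9, and so on, and the function returns c instead of 0 purely so that the result can be observed. That return value is an assumption for illustration and is not what the IR does.

```c
#include <stddef.h>

/* Hand translation of the unoptimized IR for main(), for reading practice. */
int ir_walkthrough(int arg0, const char **arg1) {
    int slot3;           /* %3 = alloca i32, align 4  (presumably main's return slot) */
    int slot4;           /* %4 = alloca i32, align 4  (argc) */
    const char **slot5;  /* %5 = alloca i8**, align 8 (argv) */
    int slot6;           /* %6 = alloca i32, align 4  (a) */
    int slot7;           /* %7 = alloca i32, align 4  (b) */
    int slot8;           /* %8 = alloca i32, align 4  (c) */

    slot3 = 0;           /* store i32 0, i32* %3 */
    slot4 = arg0;        /* store i32 %0, i32* %4 */
    slot5 = arg1;        /* store i8** %1, i8*** %5 */
    slot6 = 10;          /* store i32 10, i32* %6 */
    slot7 = 20;          /* store i32 20, i32* %7 */

    int v9 = slot6;      /* %9 = load i32, i32* %6 */
    int v10 = slot7;     /* %10 = load i32, i32* %7 */
    int v11 = v9 + v10;  /* %11 = add nsw i32 %9, %10 (nsw: no signed wrap) */
    int v12 = v11 + 10;  /* %12 = add nsw i32 %11, 10 (AGE already expanded) */
    slot8 = v12;         /* store i32 %12, i32* %8 */

    /* These slots are written but never read, just as in the IR. */
    (void)slot3;
    (void)slot4;
    (void)slot5;

    return slot8;        /* the IR itself ends with: ret i32 0 */
}
```

Every variable takes a round trip through memory because the listing was produced without optimization (note the optnone attribute on the function); an optimizing build would be expected to remove most of these loads and stores.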

--EOF--