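LLVM:
A collection of modular and reusable compiler and toolchain technologies.
Founder: Chris Lattner
LLVM is no longer an acronym for Low Level Virtual Machine; "LLVM" itself is the full name of the project.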
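Traditional compilers:
GCC
Clang
Traditional compiler architecture:
- Frontend
Lexical analysis, syntax analysis, semantic analysis, intermediate code generation
- Optimizer
Intermediate code optimization
- Backend
Machine code generation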
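LLVM architecture
- Different frontends and backends share one intermediate code, the LLVM Intermediate Representation (LLVM IR).
- Supporting a new programming language only requires implementing a new frontend.
- Supporting a new hardware target only requires adding a new backend.
- Optimization is a common stage that works on the unified LLVM IR, so it needs no changes whether a new language or a new hardware target is added.
- By contrast, GCC's frontend and backend are not cleanly separated and are tightly coupled, which makes it hard for GCC to support a new language or a new hardware target.
- LLVM is now used as common infrastructure for implementing a wide range of statically compiled and runtime-compiled languages (the GCC family, Java, .NET, Python, and others).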
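The shared-IR idea in the list above can be sketched in a few lines of C. This is only a toy model under stated assumptions: ToyIR, the two made-up source syntaxes, and the pseudo-assembly strings are all hypothetical and have nothing to do with real LLVM IR or the LLVM API. It only shows that any frontend can be paired with any backend while the optimizer stays untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy IR: one addition that the optimizer folds to a constant. */
typedef struct {
    int lhs, rhs; /* operands produced by a frontend */
    int value;    /* filled in by the optimizer */
} ToyIR;

typedef int (*Frontend)(const char *src, ToyIR *ir);           /* source -> IR */
typedef void (*Backend)(const ToyIR *ir, char *out, size_t n); /* IR -> text */

/* Two frontends, one per made-up source syntax. Each returns 1 on success. */
int infix_frontend(const char *src, ToyIR *ir) {
    return sscanf(src, "%d + %d", &ir->lhs, &ir->rhs) == 2;
}

int prefix_frontend(const char *src, ToyIR *ir) {
    return sscanf(src, "add %d %d", &ir->lhs, &ir->rhs) == 2;
}

/* One optimizer shared by every frontend/backend pair: constant folding. */
void optimize(ToyIR *ir) {
    ir->value = ir->lhs + ir->rhs;
}

/* Two backends emitting made-up assembly text for two targets. */
void x86_like_backend(const ToyIR *ir, char *out, size_t n) {
    snprintf(out, n, "movl $%d, %%eax", ir->value);
}

void arm_like_backend(const ToyIR *ir, char *out, size_t n) {
    snprintf(out, n, "mov w0, #%d", ir->value);
}

/* Any frontend can be paired with any backend; the optimizer never changes. */
int compile(Frontend fe, Backend be, const char *src, char *out, size_t n) {
    ToyIR ir;
    assert(fe != NULL && be != NULL && out != NULL && n > 0);
    if (!fe(src, &ir)) {
        return 0; /* the frontend rejected the source */
    }
    optimize(&ir);
    be(&ir, out, n);
    return 1;
}
```

With a 64-byte buffer buf, compile(prefix_frontend, x86_like_backend, "add 1 2", buf, sizeof buf) fills buf with movl $3, %eax. Supporting a third syntax would mean writing one more Frontend function and nothing else.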
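Clang
- A subproject of LLVM
- A C/C++/Objective-C compiler frontend built on the LLVM architecture
Advantages:
- Fast compilation: on some platforms Clang compiles significantly faster than GCC
- Low memory usage: the AST generated by Clang takes about one fifth of the memory used by GCC's
- Modular design: the library-based design makes it easy to integrate with IDEs and to reuse for other purposes
- Readable diagnostics: during compilation Clang creates and retains a large amount of detailed metadata, which helps with debugging and interpreting errors
- Clean, simple design that is easy to understand and to extend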
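Clang and LLVM
- LLVM in the broad sense
The entire LLVM architecture
- LLVM in the narrow sense
The LLVM backend (code optimization, target code generation, etc.)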
Compiling an Objective-C source file
Viewing the compilation phases from the command line
clang -ccc-print-phases main.m
➜ TestSwift clang -ccc-print-phases main.swift
0: input, "main.swift", object
1: linker, {0}, image
2: bind-arch, "x86_64", {1}, image
➜ TestOC clang -ccc-print-phases main.m
0: input, "main.m", objective-c
1: preprocessor, {0}, objective-c-cpp-output
2: compiler, {1}, ir
3: backend, {2}, assembler
4: assembler, {3}, object
5: linker, {4}, image
6: bind-arch, "x86_64", {5}, image
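Swift appears to go through four fewer phases than Objective-C, but the comparison is misleading: clang does not compile Swift. It does not recognize the .swift extension, so it treats main.swift as an object file and hands it straight to the linker.
Viewing the preprocessor output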
clang -E main.m
// Source file
print("Hello World")
// Preprocessed output
➜ TestSwift clang -E main.swift
clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
// Source file
#define AGE 10
int main(int argc, const char * argv[]) {
    int a = 10;
    int b = 20;
    int c = a + b + AGE;
    return 0;
}
// Preprocessed output
➜ TestOC clang -E main.m
# 1 "main.m"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 373 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "main.m" 2
# 11 "main.m"
int main(int argc, const char * argv[]) {
    int a = 10;
    int b = 20;
    int c = a + b + 10;
    return 0;
}
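The only visible change in the output above is that AGE became 10. A minimal sketch of that kind of substitution follows. It assumes a single object-like macro and ignores strings, comments, function-like macros and nested expansion; expand_macro is a hypothetical helper, not how Clang's preprocessor is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy src to out, replacing every whole word equal to name with body.
   Returns the number of replacements. Output is truncated if out is too small.
   Toy assumptions: no strings, comments, function-like or nested macros. */
int expand_macro(const char *src, const char *name, const char *body,
                 char *out, size_t n) {
    size_t name_len, body_len, used = 0;
    int count = 0;
    assert(src != NULL && name != NULL && body != NULL && out != NULL && n > 0);
    name_len = strlen(name);
    body_len = strlen(body);
    while (*src != '\0') {
        const char *text = src;
        size_t len = 1;
        if (isalnum((unsigned char)*src) || *src == '_') {
            /* Measure one whole word so that AGE does not match inside AGED. */
            while (isalnum((unsigned char)src[len]) || src[len] == '_') {
                len++;
            }
            src += len;
            if (len == name_len && strncmp(text, name, len) == 0) {
                text = body;
                len = body_len;
                count++;
            }
        } else {
            src++; /* punctuation and whitespace are copied unchanged */
        }
        if (used + len >= n) {
            break; /* out of room: stop and keep what fits */
        }
        memcpy(out + used, text, len);
        used += len;
    }
    out[used] = '\0';
    return count;
}
```

Matching whole words rather than raw substrings is the design choice that matters here: a name such as AGED or AGE_ has to be left alone.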
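Lexical analysis
- Lexical analysis produces tokens (much like splitting an English sentence into subject, predicate, object, object complement, ...)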
clang -fmodules -E -Xclang -dump-tokens main.m
➜ TestSwift clang -fmodules -E -Xclang -dump-tokens main.swift
clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-Xclang -dump-tokens' [-Wunused-command-line-argument]
➜ TestOC clang -fmodules -E -Xclang -dump-tokens main.m
int 'int' [StartOfLine] Loc=<main.m:11:1>
identifier 'main' [LeadingSpace] Loc=<main.m:11:5>
l_paren '(' Loc=<main.m:11:9>
int 'int' Loc=<main.m:11:10>
identifier 'argc' [LeadingSpace] Loc=<main.m:11:14>
comma ',' Loc=<main.m:11:18>
const 'const' [LeadingSpace] Loc=<main.m:11:20>
char 'char' [LeadingSpace] Loc=<main.m:11:26>
star '*' [LeadingSpace] Loc=<main.m:11:31>
identifier 'argv' [LeadingSpace] Loc=<main.m:11:33>
l_square '[' Loc=<main.m:11:37>
r_square ']' Loc=<main.m:11:38>
r_paren ')' Loc=<main.m:11:39>
l_brace '{' [LeadingSpace] Loc=<main.m:11:41>
int 'int' [StartOfLine] [LeadingSpace] Loc=<main.m:13:5>
identifier 'a' [LeadingSpace] Loc=<main.m:13:9>
equal '=' [LeadingSpace] Loc=<main.m:13:11>
numeric_constant '10' [LeadingSpace] Loc=<main.m:13:13>
semi ';' Loc=<main.m:13:15>
int 'int' [StartOfLine] [LeadingSpace] Loc=<main.m:14:5>
identifier 'b' [LeadingSpace] Loc=<main.m:14:9>
equal '=' [LeadingSpace] Loc=<main.m:14:11>
numeric_constant '20' [LeadingSpace] Loc=<main.m:14:13>
semi ';' Loc=<main.m:14:15>
int 'int' [StartOfLine] [LeadingSpace] Loc=<main.m:15:5>
identifier 'c' [LeadingSpace] Loc=<main.m:15:9>
equal '=' [LeadingSpace] Loc=<main.m:15:11>
identifier 'a' [LeadingSpace] Loc=<main.m:15:13>
plus '+' [LeadingSpace] Loc=<main.m:15:15>
identifier 'b' [LeadingSpace] Loc=<main.m:15:17>
plus '+' [LeadingSpace] Loc=<main.m:15:19>
numeric_constant '10' [LeadingSpace] Loc=<main.m:15:21 <Spelling=main.m:9:13>>
semi ';' Loc=<main.m:15:24>
return 'return' [StartOfLine] [LeadingSpace] Loc=<main.m:17:5>
numeric_constant '0' [LeadingSpace] Loc=<main.m:17:12>
semi ';' Loc=<main.m:17:13>
r_brace '}' [StartOfLine] Loc=<main.m:18:1>
eof '' Loc=<main.m:18:2>
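To make the dump above more concrete, here is a minimal tokenizer sketch. The kind names (identifier, numeric_constant, l_paren, plus, semi and so on) are copied from the dump. The Token struct, the lex function, and the tiny keyword list are assumptions made for illustration and are not Clang's Lexer API. Source locations and flags such as StartOfLine are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *kind; /* e.g. "identifier", "numeric_constant", "plus" */
    char text[32];    /* the token's spelling */
} Token;

/* Kind name for one punctuation character, or NULL if unsupported. */
static const char *punct_kind(char c) {
    switch (c) {
    case '(': return "l_paren";
    case ')': return "r_paren";
    case '{': return "l_brace";
    case '}': return "r_brace";
    case '[': return "l_square";
    case ']': return "r_square";
    case ',': return "comma";
    case ';': return "semi";
    case '=': return "equal";
    case '+': return "plus";
    case '*': return "star";
    default:  return NULL;
    }
}

/* Split src into at most max_tokens tokens.
   Returns the token count, or -1 on an unsupported character or overlong token. */
int lex(const char *src, Token *out, int max_tokens) {
    static const char *const keywords[] = { "int", "char", "const", "return" };
    int count = 0;
    assert(src != NULL && out != NULL && max_tokens >= 0);
    while (*src != '\0' && count < max_tokens) {
        const char *kind;
        size_t len = 1, i;
        if (isspace((unsigned char)*src)) {
            src++;
            continue;
        }
        if (isalpha((unsigned char)*src) || *src == '_') {
            while (isalnum((unsigned char)src[len]) || src[len] == '_') len++;
            kind = "identifier";
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)src[len])) len++;
            kind = "numeric_constant";
        } else {
            kind = punct_kind(*src);
        }
        if (kind == NULL || len >= sizeof out[count].text) {
            return -1;
        }
        memcpy(out[count].text, src, len);
        out[count].text[len] = '\0';
        /* Keywords are scanned like identifiers; their kind is the keyword itself. */
        for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
            if (strcmp(out[count].text, keywords[i]) == 0) {
                kind = keywords[i];
            }
        }
        out[count].kind = kind;
        src += len;
        count++;
    }
    return count;
}
```

Calling lex on the line int c = a + b + 10; yields nine tokens, starting with the keyword int and ending with semi.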
Syntax analysis
- Syntax analysis produces the abstract syntax tree (AST)
clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
➜ Test clang -fmodules -fsyntax-only -Xclang -ast-dump main.swift
clang: warning: main.swift: 'linker' input unused [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-fmodules' [-Wunused-command-line-argument]
clang: warning: argument unused during compilation: '-Xclang -ast-dump' [-Wunused-command-line-argument]
➜ TestOC clang -fmodules -fsyntax-only -Xclang -ast-dump main.m
TranslationUnitDecl 0x7ff3730298e8 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x7ff373029e60 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7ff373029b80 '__int128'
|-TypedefDecl 0x7ff373029ed0 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7ff373029ba0 'unsigned __int128'
|-TypedefDecl 0x7ff373029f70 <<invalid sloc>> <invalid sloc> implicit SEL 'SEL *'
| `-PointerType 0x7ff373029f30 'SEL *'
|   `-BuiltinType 0x7ff373029dc0 'SEL'
|-TypedefDecl 0x7ff37302a058 <<invalid sloc>> <invalid sloc> implicit id 'id'
| `-ObjCObjectPointerType 0x7ff37302a000 'id'
|   `-ObjCObjectType 0x7ff373029fd0 'id'
|-TypedefDecl 0x7ff37302a138 <<invalid sloc>> <invalid sloc> implicit Class 'Class'
| `-ObjCObjectPointerType 0x7ff37302a0e0 'Class'
|   `-ObjCObjectType 0x7ff37302a0b0 'Class'
|-ObjCInterfaceDecl 0x7ff37302a190 <<invalid sloc>> <invalid sloc> implicit Protocol
|-TypedefDecl 0x7ff37302a4f8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7ff37302a300 'struct __NSConstantString_tag'
|   `-Record 0x7ff37302a260 '__NSConstantString_tag'
|-TypedefDecl 0x7ff37302a590 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7ff37302a550 'char *'
|   `-BuiltinType 0x7ff373029980 'char'
|-TypedefDecl 0x7ff373062488 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x7ff373062430 'struct __va_list_tag [1]' 1
|   `-RecordType 0x7ff3730622a0 'struct __va_list_tag'
|     `-Record 0x7ff373062200 '__va_list_tag'
`-FunctionDecl 0x7ff373062758 <main.m:11:1, line:18:1> line:11:5 main 'int (int, const char **)'
  |-ParmVarDecl 0x7ff3730624f8 <col:10, col:14> col:14 argc 'int'
  |-ParmVarDecl 0x7ff373062610 <col:20, col:38> col:33 argv 'const char **':'const char **'
  `-CompoundStmt 0x7ff373062bd8 <col:41, line:18:1>
    |-DeclStmt 0x7ff373062928 <line:13:5, col:15>
    | `-VarDecl 0x7ff3730628a8 <col:5, col:13> col:9 used a 'int' cinit
    |   `-IntegerLiteral 0x7ff373062908 <col:13> 'int' 10
    |-DeclStmt 0x7ff3730629d8 <line:14:5, col:15>
    | `-VarDecl 0x7ff373062958 <col:5, col:13> col:9 used b 'int' cinit
    |   `-IntegerLiteral 0x7ff3730629b8 <col:13> 'int' 20
    |-DeclStmt 0x7ff373062b88 <line:15:5, col:24>
    | `-VarDecl 0x7ff373062a08 <col:5, line:9:13> line:15:9 c 'int' cinit
    |   `-BinaryOperator 0x7ff373062b60 <col:13, line:9:13> 'int' '+'
    |     |-BinaryOperator 0x7ff373062b18 <line:15:13, col:17> 'int' '+'
    |     | |-ImplicitCastExpr 0x7ff373062ae8 <col:13> 'int' <LValueToRValue>
    |     | | `-DeclRefExpr 0x7ff373062a68 <col:13> 'int' lvalue Var 0x7ff3730628a8 'a' 'int'
    |     | `-ImplicitCastExpr 0x7ff373062b00 <col:17> 'int' <LValueToRValue>
    |     |   `-DeclRefExpr 0x7ff373062aa8 <col:17> 'int' lvalue Var 0x7ff373062958 'b' 'int'
    |     `-IntegerLiteral 0x7ff373062b40 <line:9:13> 'int' 10
    `-ReturnStmt 0x7ff373062bc0 <line:17:5, col:12>
      `-IntegerLiteral 0x7ff373062ba0 <col:12> 'int' 0
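In the dump above, the initializer of c is a BinaryOperator whose left child is another BinaryOperator (a + b) and whose right child is the literal 10. The sketch below rebuilds that shape with a hypothetical Node type and evaluates it bottom-up. It is a simplified model only: the ImplicitCastExpr nodes are omitted, and none of these names belong to Clang's AST classes.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_INT_LITERAL, NODE_VAR_REF, NODE_BINARY_ADD } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;              /* NODE_INT_LITERAL: the constant */
    const int *var;         /* NODE_VAR_REF: the variable being read */
    const struct Node *lhs; /* NODE_BINARY_ADD: left operand */
    const struct Node *rhs; /* NODE_BINARY_ADD: right operand */
} Node;

Node int_literal(int value) {
    Node n = { NODE_INT_LITERAL, 0, NULL, NULL, NULL };
    n.value = value;
    return n;
}

Node var_ref(const int *var) {
    Node n = { NODE_VAR_REF, 0, NULL, NULL, NULL };
    n.var = var;
    return n;
}

Node binary_add(const Node *lhs, const Node *rhs) {
    Node n = { NODE_BINARY_ADD, 0, NULL, NULL, NULL };
    n.lhs = lhs;
    n.rhs = rhs;
    return n;
}

/* Evaluate the tree bottom-up: children first, then the operator. */
int eval(const Node *n) {
    assert(n != NULL);
    switch (n->kind) {
    case NODE_INT_LITERAL: return n->value;
    case NODE_VAR_REF:     return *n->var;
    case NODE_BINARY_ADD:  return eval(n->lhs) + eval(n->rhs);
    }
    return 0; /* not reached for well-formed trees */
}

/* Count every node in the tree, including the root. */
int count_nodes(const Node *n) {
    assert(n != NULL);
    if (n->kind == NODE_BINARY_ADD) {
        return 1 + count_nodes(n->lhs) + count_nodes(n->rhs);
    }
    return 1;
}
```

Building binary_add(&inner, &ten), where inner is binary_add over references to a and b, reproduces the left-nested shape of a + b + 10. With a = 10 and b = 20, eval returns 40.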
LLVM IR
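LLVM IR has three representations (equivalent in essence, like water as gas, liquid, and solid)
1. text: a human-readable text format, similar to assembly language, with the .ll extension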
> clang -S -emit-llvm main.m
; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.14.0"
; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @main(i32, i8**) #0 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  %6 = alloca i32, align 4
  %7 = alloca i32, align 4
  %8 = alloca i32, align 4
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  store i32 10, i32* %6, align 4
  store i32 20, i32* %7, align 4
  %9 = load i32, i32* %6, align 4
  %10 = load i32, i32* %7, align 4
  %11 = add nsw i32 %9, %10
  %12 = add nsw i32 %11, 10
  store i32 %12, i32* %8, align 4
  ret i32 0
}
attributes #0 = { noinline nounwind optnone ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6}
!llvm.ident = !{!7}
!0 = !{i32 1, !"Objective-C Version", i32 2}
!1 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!2 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!3 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
!4 = !{i32 1, !"Objective-C Class Properties", i32 64}
!5 = !{i32 1, !"wchar_size", i32 4}
!6 = !{i32 7, !"PIC Level", i32 2}
!7 = !{!"Apple LLVM version 10.0.0 (clang-1000.11.45.2)"}
// Quite a lot to take in at first sight
2. memory: the in-memory format
3. bitcode: a binary format, with the .bc extension
clang -c -emit-llvm main.m
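Basic IR syntax
- Comments start with a semicolon ;
- Global identifiers start with @, local identifiers start with %
- alloca: allocates memory in the current function's stack frame
- i32: a 32-bit integer, i.e. 4 bytes
- align: memory alignment
- store: writes data
- load: reads data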
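One way to read the instructions listed above is to replay the body of @main from the .ll listing by hand. The sketch below does that with a hypothetical Frame type whose slots stand in for stack memory. It is an illustration of what alloca, store and load mean, not an IR interpreter, and it ignores alignment, argc/argv and the return-value slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_SLOTS 16

/* Hypothetical model of one function's stack frame. */
typedef struct {
    int32_t slots[FRAME_SLOTS];
    int used;
} Frame;

/* alloca i32: reserve one 4-byte slot in the frame and return its index. */
int frame_alloca(Frame *f) {
    assert(f != NULL && f->used < FRAME_SLOTS);
    return f->used++;
}

/* store i32 v, i32* slot: write a value into a slot. */
void frame_store(Frame *f, int slot, int32_t v) {
    assert(f != NULL && slot >= 0 && slot < f->used);
    f->slots[slot] = v;
}

/* load i32, i32* slot: read a value back out of a slot. */
int32_t frame_load(const Frame *f, int slot) {
    assert(f != NULL && slot >= 0 && slot < f->used);
    return f->slots[slot];
}

/* Replays the body of @main and returns the final value of c.
   The real function returns 0; returning c only makes the result observable. */
int32_t run_main_body(void) {
    Frame f = { {0}, 0 };
    int a = frame_alloca(&f);  /* %6 = alloca i32 */
    int b = frame_alloca(&f);  /* %7 = alloca i32 */
    int c = frame_alloca(&f);  /* %8 = alloca i32 */
    int32_t t9, t10, t11, t12;
    frame_store(&f, a, 10);    /* store i32 10, i32* %6 */
    frame_store(&f, b, 20);    /* store i32 20, i32* %7 */
    t9 = frame_load(&f, a);    /* %9 = load i32, i32* %6 */
    t10 = frame_load(&f, b);   /* %10 = load i32, i32* %7 */
    t11 = t9 + t10;            /* %11 = add nsw i32 %9, %10 */
    t12 = t11 + 10;            /* %12 = add nsw i32 %11, 10 */
    frame_store(&f, c, t12);   /* store i32 %12, i32* %8 */
    return frame_load(&f, c);
}
```

The unoptimized IR keeps every local variable in memory and loads it again before each use, which is why a + b + 10 turns into two loads, two adds and a store.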