
Commit bef187c

Implement -fsanitize-coverage-whitelist and -fsanitize-coverage-blacklist for clang
Summary:
This commit adds two command-line options to clang. These options let the user decide which functions will receive SanitizerCoverage instrumentation. This is most useful in the libFuzzer use case, where it enables targeted coverage-guided fuzzing.

Patch by Yannis Juglaret of DGA-MI, Rennes, France

libFuzzer tests its target against an evolving corpus, and relies on SanitizerCoverage instrumentation to collect the code coverage information that drives corpus evolution. Currently, libFuzzer collects such information for all functions of the target under test, and adds to the corpus every mutated sample that finds a new code coverage path in any function of the target. We propose instead to let the user specify which functions' code coverage information is relevant for building the upcoming fuzzing campaign's corpus. To this end, we add two new command-line options for clang, enabling targeted coverage-guided fuzzing with libFuzzer. We see targeted coverage-guided fuzzing as a simple way to leverage libFuzzer for big targets with thousands of functions or multiple dependencies. We publish this patch as work from DGA-MI of Rennes, France, with proper authorization from the hierarchy.

Targeted coverage-guided fuzzing can accelerate bug finding for two reasons. First, the compiler will avoid costly instrumentation for non-relevant functions, accelerating fuzzer execution for each call to any of these functions. Second, the built fuzzer will produce and use a more accurate corpus, because it will not keep the samples that find new coverage paths in non-relevant functions.

The two new command-line options are `-fsanitize-coverage-whitelist` and `-fsanitize-coverage-blacklist`. They accept files in the same format as the existing `-fsanitize-blacklist` option <https://clang.llvm.org/docs/SanitizerSpecialCaseList.html#format>. The new options influence SanitizerCoverage so that it will only instrument a subset of the functions in the target.
We explain these options in detail in `clang/docs/SanitizerCoverage.rst`.

Consider now the woff2 fuzzing example from the libFuzzer tutorial <https://github.com/google/fuzzer-test-suite/blob/master/tutorial/libFuzzerTutorial.md>. We are aware that we cannot conclude much from this example because mutating compressed data is generally a bad idea, but let us use it anyway as an illustration, for its simplicity. Let us use an empty blacklist together with one of the three following whitelists:

```
# (a)
src:*
fun:*

# (b)
src:SRC/*
fun:*

# (c)
src:SRC/src/woff2_dec.cc
fun:*
```

Running the built fuzzers shows how many instrumentation points the compiler adds: the fuzzer outputs `XXX PCs`, where XXX is the number of instrumentation points. Whitelist (a) is the instrument-everything whitelist; it produces 11912 instrumentation points. Whitelist (b) restricts instrumentation to the woff2 source code only, ignoring the dependency code for brotli (de)compression; it produces 3984 instrumentation points. Whitelist (c) restricts instrumentation to the functions in the main file that deals with WOFF2 to TTF conversion, resulting in 1056 instrumentation points.

For experimentation purposes, we ran each fuzzer approximately 100 times, single process, with the initial corpus provided in the tutorial. We let the fuzzer run until it either found the heap buffer overflow or went out of memory. On this simple example, whitelists (b) and (c) found the heap buffer overflow more reliably and 5x faster than whitelist (a). The average execution times when finding the heap buffer overflow were as follows: (a) 904 s, (b) 156 s, and (c) 176 s.

We explain these results by the fact that WOFF2 to TTF conversion calls the brotli decompression algorithm's functions, which are mostly irrelevant for finding bugs in WOFF2 font reconstruction but are nevertheless instrumented and used by whitelist (a) to guide fuzzing. This results in longer execution time for these functions and a partially irrelevant corpus.
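A quick check of the "5x faster" figure against the reported numbers, writing $\bar t_x$ for the average time to find the heap buffer overflow with whitelist (x) and $N_x$ for its number of instrumentation points (this notation is introduced here only for convenience):

```latex
\begin{aligned}
\frac{\bar t_a}{\bar t_b} &= \frac{904\,\mathrm{s}}{156\,\mathrm{s}} \approx 5.8,
& \frac{\bar t_a}{\bar t_c} &= \frac{904\,\mathrm{s}}{176\,\mathrm{s}} \approx 5.1,\\
\frac{N_b}{N_a} &= \frac{3984}{11912} \approx 0.33,
& \frac{N_c}{N_a} &= \frac{1056}{11912} \approx 0.089.
\end{aligned}
```

Whitelist (b) keeps about a third of the instrumentation points of whitelist (a) and whitelist (c) under a tenth, while both find the bug about five to six times faster on average.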
Contrary to whitelist (a), whitelists (b) and (c) will execute brotli-related functions without instrumentation overhead, and ignore new code paths found in them. This results in faster bug finding for WOFF2 font reconstruction.

The results for whitelist (b) are similar to the ones for whitelist (c). Indeed, WOFF2 to TTF conversion calls functions that are mostly located in SRC/src/woff2_dec.cc. The roughly 2900 extra instrumentation points allowed by whitelist (b) do not hinder bug finding, even though they are mostly irrelevant, simply because most of these functions do not get called. We get a slightly faster average time for bug finding with whitelist (b), which might indicate that some of the extra instrumentation points are actually relevant, or might just be random noise.

Reviewers: kcc, morehouse, vitalybuka

Reviewed By: morehouse, vitalybuka

Subscribers: pratyai, vitalybuka, eternalsakura, xwlin222, dende, srhines, kubamracek, #sanitizers, lebedev.ri, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #sanitizers, #llvm

Differential Revision: https://reviews.llvm.org/D63616
1 parent a7aaaf7 commit bef187c

11 files changed: 329 lines added, 49 lines removed


clang/docs/SanitizerCoverage.rst

Lines changed: 52 additions & 0 deletions
@@ -312,6 +312,58 @@ will not be instrumented.
   // for every non-constant array index.
   void __sanitizer_cov_trace_gep(uintptr_t Idx);
 
+Partially disabling instrumentation
+===================================
+
+It is sometimes useful to tell SanitizerCoverage to instrument only a subset of the
+functions in your target.
+With ``-fsanitize-coverage-whitelist=whitelist.txt``
+and ``-fsanitize-coverage-blacklist=blacklist.txt``,
+you can specify such a subset through the combination of a whitelist and a blacklist.
+
+SanitizerCoverage will only instrument functions that satisfy two conditions.
+First, the function should belong to a source file with a path that is both whitelisted
+and not blacklisted.
+Second, the function should have a mangled name that is both whitelisted and not blacklisted.
+
+The whitelist and blacklist format is similar to that of the sanitizer blacklist format.
+The default whitelist will match every source file and every function.
+The default blacklist will match no source file and no function.
+
+A common use case is to have the whitelist list folders or source files for which you want
+instrumentation and allow all function names, while the blacklist will opt out some specific
+files or functions that the whitelist loosely allowed.
+
+Here is an example whitelist:
+
+.. code-block:: none
+
+  # Enable instrumentation for a whole folder
+  src:bar/*
+  # Enable instrumentation for a specific source file
+  src:foo/a.cpp
+  # Enable instrumentation for all functions in those files
+  fun:*
+
+And an example blacklist:
+
+.. code-block:: none
+
+  # Disable instrumentation for a specific source file that the whitelist allowed
+  src:bar/b.cpp
+  # Disable instrumentation for a specific function that the whitelist allowed
+  fun:*myFunc*
+
+The use of ``*`` wildcards above is required because function names are matched after mangling.
+Without the wildcards, one would have to write the whole mangled name.
+
+Be careful that the paths of source files are matched exactly as they are provided on the clang
+command line.
+For example, the whitelist above would include file ``bar/b.cpp`` if the path was provided
+exactly like this, but it would fail to include it with other ways to refer to the same
+file such as ``./bar/b.cpp``, or ``bar\b.cpp`` on Windows.
+So, please make sure to always double check that your lists are correctly applied.
+
 Default implementation
 ======================
 
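To make the selection rule in the documentation above concrete, here is a minimal C++ sketch of it. It is a hypothetical model, not clang's implementation: `globMatch`, `SpecialCaseListModel`, `parseList`, and `shouldInstrument` are illustrative names, the glob matcher supports only the `*` wildcard, and the parser understands only `src:`/`fun:` entries and `#` comment lines, not the full special case list format. It follows the documented defaults: an absent whitelist matches everything and an absent blacklist matches nothing.

```cpp
// Hypothetical model of the documented whitelist/blacklist rule.
// Not clang's implementation: only '*' globs, "src:"/"fun:" entries, '#' comments.
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Glob match where '*' matches any (possibly empty) run of characters.
bool globMatch(const std::string &pat, const std::string &s) {
  std::size_t p = 0, i = 0;
  std::size_t starP = std::string::npos, starI = 0;
  while (i < s.size()) {
    if (p < pat.size() && pat[p] == '*') {
      starP = p++; // remember the star; first try matching it to nothing
      starI = i;
    } else if (p < pat.size() && pat[p] == s[i]) {
      ++p;
      ++i;
    } else if (starP != std::string::npos) {
      p = starP + 1; // backtrack: let the last star absorb one more character
      i = ++starI;
    } else {
      return false;
    }
  }
  while (p < pat.size() && pat[p] == '*')
    ++p; // trailing stars can match the empty string
  return p == pat.size();
}

struct SpecialCaseListModel {
  std::vector<std::string> src; // patterns from "src:" lines (file paths)
  std::vector<std::string> fun; // patterns from "fun:" lines (mangled names)

  static bool any(const std::vector<std::string> &pats, const std::string &s) {
    for (const auto &p : pats)
      if (globMatch(p, s))
        return true;
    return false;
  }
  bool matchesSrc(const std::string &path) const { return any(src, path); }
  bool matchesFun(const std::string &name) const { return any(fun, name); }
};

// Parse "prefix:pattern" lines. Lines without ':' are skipped here, whereas
// clang reports a malformed list as an error.
SpecialCaseListModel parseList(const std::string &text) {
  SpecialCaseListModel list;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#')
      continue;
    std::size_t colon = line.find(':');
    if (colon == std::string::npos)
      continue;
    std::string prefix = line.substr(0, colon);
    std::string pattern = line.substr(colon + 1);
    if (prefix == "src")
      list.src.push_back(pattern);
    else if (prefix == "fun")
      list.fun.push_back(pattern);
  }
  return list;
}

// Instrument only if both the path and the mangled name are whitelisted and
// neither is blacklisted. No whitelist: match all. No blacklist: match none.
bool shouldInstrument(const std::optional<SpecialCaseListModel> &whitelist,
                      const std::optional<SpecialCaseListModel> &blacklist,
                      const std::string &path, const std::string &mangledName) {
  bool allowed = !whitelist || (whitelist->matchesSrc(path) &&
                                whitelist->matchesFun(mangledName));
  bool denied = blacklist && (blacklist->matchesSrc(path) ||
                              blacklist->matchesFun(mangledName));
  return allowed && !denied;
}
```

With the example lists from the documentation, `bar/c.cpp` with mangled name `_Z3bazv` is instrumented, while the blacklisted file `bar/b.cpp`, a mangled name containing `myFunc`, or the same file spelled `./bar/c.cpp` are not; the last case illustrates the warning about exact path matching.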
clang/include/clang/Basic/CodeGenOptions.h

Lines changed: 10 additions & 0 deletions
@@ -306,6 +306,16 @@ class CodeGenOptions : public CodeGenOptionsBase {
   /// List of dynamic shared object files to be loaded as pass plugins.
   std::vector<std::string> PassPlugins;
 
+  /// Path to whitelist file specifying which objects
+  /// (files, functions) should exclusively be instrumented
+  /// by sanitizer coverage pass.
+  std::vector<std::string> SanitizeCoverageWhitelistFiles;
+
+  /// Path to blacklist file specifying which objects
+  /// (files, functions) listed for instrumentation by sanitizer
+  /// coverage pass should actually not be instrumented.
+  std::vector<std::string> SanitizeCoverageBlacklistFiles;
+
 public:
   // Define accessors/mutators for code generation options of enumeration type.
 #define CODEGENOPT(Name, Bits, Default)

clang/include/clang/Basic/DiagnosticDriverKinds.td

Lines changed: 4 additions & 0 deletions
@@ -158,6 +158,10 @@ def err_drv_invalid_argument_to_option : Error<
   "invalid argument '%0' to -%1">;
 def err_drv_malformed_sanitizer_blacklist : Error<
   "malformed sanitizer blacklist: '%0'">;
+def err_drv_malformed_sanitizer_coverage_whitelist : Error<
+  "malformed sanitizer coverage whitelist: '%0'">;
+def err_drv_malformed_sanitizer_coverage_blacklist : Error<
+  "malformed sanitizer coverage blacklist: '%0'">;
 def err_drv_duplicate_config : Error<
   "no more than one option '--config' is allowed">;
 def err_drv_config_file_not_exist : Error<

clang/include/clang/Driver/Options.td

Lines changed: 6 additions & 0 deletions
@@ -1022,6 +1022,12 @@ def fno_sanitize_coverage
     Group<f_clang_Group>, Flags<[CoreOption, DriverOption]>,
     HelpText<"Disable specified features of coverage instrumentation for "
              "Sanitizers">, Values<"func,bb,edge,indirect-calls,trace-bb,trace-cmp,trace-div,trace-gep,8bit-counters,trace-pc,trace-pc-guard,no-prune,inline-8bit-counters,inline-bool-flag">;
+def fsanitize_coverage_whitelist : Joined<["-"], "fsanitize-coverage-whitelist=">,
+    Group<f_clang_Group>, Flags<[CoreOption, DriverOption]>,
+    HelpText<"Restrict sanitizer coverage instrumentation exclusively to modules and functions that match the provided special case list, except the blacklisted ones">;
+def fsanitize_coverage_blacklist : Joined<["-"], "fsanitize-coverage-blacklist=">,
+    Group<f_clang_Group>, Flags<[CoreOption, DriverOption]>,
+    HelpText<"Disable sanitizer coverage instrumentation for modules and functions that match the provided special case list, even the whitelisted ones">;
 def fsanitize_memory_track_origins_EQ : Joined<["-"], "fsanitize-memory-track-origins=">,
     Group<f_clang_Group>,
     HelpText<"Enable origins tracking in MemorySanitizer">;

clang/include/clang/Driver/SanitizerArgs.h

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,8 @@ class SanitizerArgs {
 
   std::vector<std::string> UserBlacklistFiles;
   std::vector<std::string> SystemBlacklistFiles;
+  std::vector<std::string> CoverageWhitelistFiles;
+  std::vector<std::string> CoverageBlacklistFiles;
   int CoverageFeatures = 0;
   int MsanTrackOrigins = 0;
   bool MsanUseAfterDtor = true;

clang/lib/CodeGen/BackendUtil.cpp

Lines changed: 3 additions & 1 deletion
@@ -234,7 +234,9 @@ static void addSanitizerCoveragePass(const PassManagerBuilder &Builder,
       static_cast<const PassManagerBuilderWrapper &>(Builder);
   const CodeGenOptions &CGOpts = BuilderWrapper.getCGOpts();
   auto Opts = getSancovOptsFromCGOpts(CGOpts);
-  PM.add(createModuleSanitizerCoverageLegacyPassPass(Opts));
+  PM.add(createModuleSanitizerCoverageLegacyPassPass(
+      Opts, CGOpts.SanitizeCoverageWhitelistFiles,
+      CGOpts.SanitizeCoverageBlacklistFiles));
 }
 
 // Check if ASan should use GC-friendly instrumentation for globals.

clang/lib/Driver/SanitizerArgs.cpp

Lines changed: 80 additions & 41 deletions
@@ -151,6 +151,40 @@ static void addDefaultBlacklists(const Driver &D, SanitizerMask Kinds,
   }
 }
 
+/// Parse -f(no-)?sanitize-(coverage-)?(white|black)list argument's values,
+/// diagnosing any invalid file paths and validating special case list format.
+static void parseSpecialCaseListArg(const Driver &D,
+                                    const llvm::opt::ArgList &Args,
+                                    std::vector<std::string> &SCLFiles,
+                                    llvm::opt::OptSpecifier SCLOptionID,
+                                    llvm::opt::OptSpecifier NoSCLOptionID,
+                                    unsigned MalformedSCLErrorDiagID) {
+  for (const auto *Arg : Args) {
+    // Match -fsanitize-(coverage-)?(white|black)list.
+    if (Arg->getOption().matches(SCLOptionID)) {
+      Arg->claim();
+      std::string SCLPath = Arg->getValue();
+      if (D.getVFS().exists(SCLPath)) {
+        SCLFiles.push_back(SCLPath);
+      } else {
+        D.Diag(clang::diag::err_drv_no_such_file) << SCLPath;
+      }
+      // Match -fno-sanitize-blacklist.
+    } else if (Arg->getOption().matches(NoSCLOptionID)) {
+      Arg->claim();
+      SCLFiles.clear();
+    }
+  }
+  // Validate special case list format.
+  {
+    std::string BLError;
+    std::unique_ptr<llvm::SpecialCaseList> SCL(
+        llvm::SpecialCaseList::create(SCLFiles, D.getVFS(), BLError));
+    if (!SCL.get())
+      D.Diag(MalformedSCLErrorDiagID) << BLError;
+  }
+}
+
 /// Sets group bits for every group that has at least one representative already
 /// enabled in \p Kinds.
 static SanitizerMask setGroupBits(SanitizerMask Kinds) {
@@ -561,37 +595,18 @@ SanitizerArgs::SanitizerArgs(const ToolChain &TC,
   // Setup blacklist files.
   // Add default blacklist from resource directory.
   addDefaultBlacklists(D, Kinds, SystemBlacklistFiles);
-  // Parse -f(no-)sanitize-blacklist options.
-  for (const auto *Arg : Args) {
-    if (Arg->getOption().matches(options::OPT_fsanitize_blacklist)) {
-      Arg->claim();
-      std::string BLPath = Arg->getValue();
-      if (D.getVFS().exists(BLPath)) {
-        UserBlacklistFiles.push_back(BLPath);
-      } else {
-        D.Diag(clang::diag::err_drv_no_such_file) << BLPath;
-      }
-    } else if (Arg->getOption().matches(options::OPT_fno_sanitize_blacklist)) {
-      Arg->claim();
-      UserBlacklistFiles.clear();
-      SystemBlacklistFiles.clear();
-    }
-  }
-  // Validate blacklists format.
-  {
-    std::string BLError;
-    std::unique_ptr<llvm::SpecialCaseList> SCL(
-        llvm::SpecialCaseList::create(UserBlacklistFiles, D.getVFS(), BLError));
-    if (!SCL.get())
-      D.Diag(clang::diag::err_drv_malformed_sanitizer_blacklist) << BLError;
-  }
-  {
-    std::string BLError;
-    std::unique_ptr<llvm::SpecialCaseList> SCL(llvm::SpecialCaseList::create(
-        SystemBlacklistFiles, D.getVFS(), BLError));
-    if (!SCL.get())
-      D.Diag(clang::diag::err_drv_malformed_sanitizer_blacklist) << BLError;
-  }
+
+  // Parse -f(no-)?sanitize-blacklist options.
+  // This also validates special case lists format.
+  // Here, OptSpecifier() acts as a never-matching command-line argument.
+  // So, there is no way to append to system blacklist but it can be cleared.
+  parseSpecialCaseListArg(D, Args, SystemBlacklistFiles, OptSpecifier(),
+                          options::OPT_fno_sanitize_blacklist,
+                          clang::diag::err_drv_malformed_sanitizer_blacklist);
+  parseSpecialCaseListArg(D, Args, UserBlacklistFiles,
+                          options::OPT_fsanitize_blacklist,
+                          options::OPT_fno_sanitize_blacklist,
+                          clang::diag::err_drv_malformed_sanitizer_blacklist);
 
   // Parse -f[no-]sanitize-memory-track-origins[=level] options.
   if (AllAddedKinds & SanitizerKind::Memory) {
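The interplay of the "append" and "clear" options in `parseSpecialCaseListArg`, including the never-matching `OptSpecifier()` trick, can be modeled in a few lines. This is a simplified, hypothetical sketch: arguments are plain (option, value) string pairs, `std::nullopt` stands in for `OptSpecifier()`, `parseListArgs` is an illustrative name, and file-existence checks, argument claiming, and format validation are left out.

```cpp
// Simplified, hypothetical model of parseSpecialCaseListArg's argument loop.
// Omitted: file-existence checks, Arg->claim(), and list format validation.
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// One command-line argument: the option spelling and its value ("" for flags).
using Arg = std::pair<std::string, std::string>;

// addOpt appends its value to files; clearOpt empties files. std::nullopt plays
// the role of OptSpecifier(), an option that never matches any argument.
void parseListArgs(const std::vector<Arg> &args,
                   std::vector<std::string> &files,
                   const std::optional<std::string> &addOpt,
                   const std::optional<std::string> &clearOpt) {
  for (const auto &[option, value] : args) {
    if (addOpt && option == *addOpt)
      files.push_back(value);
    else if (clearOpt && option == *clearOpt)
      files.clear(); // later add options still append after this point
  }
}
```

Replaying `-fsanitize-blacklist=a.txt -fno-sanitize-blacklist -fsanitize-blacklist=b.txt` through it leaves the user list as `{"b.txt"}` and empties a pre-filled system list (the default file name used in the test is only illustrative), mirroring the two calls in the hunk above. With `std::nullopt` as the clear option, as for the coverage lists, files can only be appended.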
@@ -745,6 +760,21 @@ SanitizerArgs::SanitizerArgs(const ToolChain &TC,
       CoverageFeatures |= CoverageFunc;
   }
 
+  // Parse -fsanitize-coverage-(black|white)list options if coverage enabled.
+  // This also validates special case lists format.
+  // Here, OptSpecifier() acts as a never-matching command-line argument.
+  // So, there is no way to clear coverage lists but you can append to them.
+  if (CoverageFeatures) {
+    parseSpecialCaseListArg(
+        D, Args, CoverageWhitelistFiles,
+        options::OPT_fsanitize_coverage_whitelist, OptSpecifier(),
+        clang::diag::err_drv_malformed_sanitizer_coverage_whitelist);
+    parseSpecialCaseListArg(
+        D, Args, CoverageBlacklistFiles,
+        options::OPT_fsanitize_coverage_blacklist, OptSpecifier(),
+        clang::diag::err_drv_malformed_sanitizer_coverage_blacklist);
+  }
+
   SharedRuntime =
       Args.hasFlag(options::OPT_shared_libsan, options::OPT_static_libsan,
                    TC.getTriple().isAndroid() || TC.getTriple().isOSFuchsia() ||
@@ -871,6 +901,17 @@ static std::string toString(const clang::SanitizerSet &Sanitizers) {
   return Res;
 }
 
+static void addSpecialCaseListOpt(const llvm::opt::ArgList &Args,
+                                  llvm::opt::ArgStringList &CmdArgs,
+                                  const char *SCLOptFlag,
+                                  const std::vector<std::string> &SCLFiles) {
+  for (const auto &SCLPath : SCLFiles) {
+    SmallString<64> SCLOpt(SCLOptFlag);
+    SCLOpt += SCLPath;
+    CmdArgs.push_back(Args.MakeArgString(SCLOpt));
+  }
+}
+
 static void addIncludeLinkerOption(const ToolChain &TC,
                                    const llvm::opt::ArgList &Args,
                                    llvm::opt::ArgStringList &CmdArgs,
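On the driver side, `addSpecialCaseListOpt` turns each collected path back into one joined `<flag><path>` argument for the frontend. A minimal standalone equivalent, assuming `std::string` in place of `SmallString` and `ArgList::MakeArgString` (the name `forwardListFiles` is hypothetical):

```cpp
// Hypothetical standalone equivalent of addSpecialCaseListOpt, using
// std::string instead of SmallString and ArgList::MakeArgString.
#include <cassert>
#include <string>
#include <vector>

// Build one "<flag><path>" argument per list file, preserving the file order.
std::vector<std::string> forwardListFiles(const std::string &flag,
                                          const std::vector<std::string> &files) {
  std::vector<std::string> cmdArgs;
  cmdArgs.reserve(files.size());
  for (const auto &path : files)
    cmdArgs.push_back(flag + path); // e.g. "-fsanitize-coverage-whitelist=" + path
  return cmdArgs;
}
```

Because every file collected by the driver becomes its own argument, the frontend receives all of them, in the same order, rather than only the last one.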
@@ -933,6 +974,10 @@ void SanitizerArgs::addArgs(const ToolChain &TC, const llvm::opt::ArgList &Args,
     if (CoverageFeatures & F.first)
       CmdArgs.push_back(F.second);
   }
+  addSpecialCaseListOpt(
+      Args, CmdArgs, "-fsanitize-coverage-whitelist=", CoverageWhitelistFiles);
+  addSpecialCaseListOpt(
+      Args, CmdArgs, "-fsanitize-coverage-blacklist=", CoverageBlacklistFiles);
 
   if (TC.getTriple().isOSWindows() && needsUbsanRt()) {
     // Instruct the code generator to embed linker directives in the object file
@@ -968,16 +1013,10 @@ void SanitizerArgs::addArgs(const ToolChain &TC, const llvm::opt::ArgList &Args,
     CmdArgs.push_back(
         Args.MakeArgString("-fsanitize-trap=" + toString(TrapSanitizers)));
 
-  for (const auto &BLPath : UserBlacklistFiles) {
-    SmallString<64> BlacklistOpt("-fsanitize-blacklist=");
-    BlacklistOpt += BLPath;
-    CmdArgs.push_back(Args.MakeArgString(BlacklistOpt));
-  }
-  for (const auto &BLPath : SystemBlacklistFiles) {
-    SmallString<64> BlacklistOpt("-fsanitize-system-blacklist=");
-    BlacklistOpt += BLPath;
-    CmdArgs.push_back(Args.MakeArgString(BlacklistOpt));
-  }
+  addSpecialCaseListOpt(Args, CmdArgs,
+                        "-fsanitize-blacklist=", UserBlacklistFiles);
+  addSpecialCaseListOpt(Args, CmdArgs,
+                        "-fsanitize-system-blacklist=", SystemBlacklistFiles);
 
   if (MsanTrackOrigins)
     CmdArgs.push_back(Args.MakeArgString("-fsanitize-memory-track-origins=" +

clang/lib/Frontend/CompilerInvocation.cpp

Lines changed: 4 additions & 0 deletions
@@ -1181,6 +1181,10 @@ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args, InputKind IK,
   Opts.SanitizeCoveragePCTable = Args.hasArg(OPT_fsanitize_coverage_pc_table);
   Opts.SanitizeCoverageStackDepth =
       Args.hasArg(OPT_fsanitize_coverage_stack_depth);
+  Opts.SanitizeCoverageWhitelistFiles =
+      Args.getAllArgValues(OPT_fsanitize_coverage_whitelist);
+  Opts.SanitizeCoverageBlacklistFiles =
+      Args.getAllArgValues(OPT_fsanitize_coverage_blacklist);
   Opts.SanitizeMemoryTrackOrigins =
       getLastArgIntValue(Args, OPT_fsanitize_memory_track_origins_EQ, 0, Diags);
   Opts.SanitizeMemoryUseAfterDtor =
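In the frontend, `Args.getAllArgValues` collects the value of every occurrence of an option, in command-line order, so repeating `-fsanitize-coverage-whitelist=` accumulates files instead of overriding them. The sketch below mimics that with a plain prefix comparison over argument strings. This is an assumption-laden stand-in, since clang's option table performs the real matching, and `collectJoinedValues` is a hypothetical name.

```cpp
// Hypothetical stand-in for ArgList::getAllArgValues on a Joined option:
// a raw prefix match over argument strings, not clang's option parsing.
#include <cassert>
#include <string>
#include <vector>

// Return the value of every argument spelled "<prefix><value>", in order.
std::vector<std::string> collectJoinedValues(const std::vector<std::string> &argv,
                                             const std::string &prefix) {
  std::vector<std::string> values;
  for (const auto &arg : argv)
    if (arg.size() >= prefix.size() &&
        arg.compare(0, prefix.size(), prefix) == 0)
      values.push_back(arg.substr(prefix.size()));
  return values;
}
```

Fed the arguments produced on the driver side, it returns one entry per list file, which is the shape `SanitizeCoverageWhitelistFiles` and `SanitizeCoverageBlacklistFiles` expect.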
