Changes from 8 commits
Commits
35 commits
ecf591c
Update CurLineNum anc CurColNum in sync with movement in text
Bertik23 Jul 29, 2025
06926e9
Remove remains from cherry pick from LSP branch
Bertik23 Aug 4, 2025
1fdf13c
Make isLabelTail more safe and rename it to better show what it does
Bertik23 Aug 4, 2025
2772cd8
Remove dangling comment
Bertik23 Aug 4, 2025
b05d11a
Fix typo
Bertik23 Aug 12, 2025
458599b
Add location tracking to IR parser
Bertik23 Aug 28, 2025
ee39ed1
Fix some OOB posibilities
Bertik23 Aug 28, 2025
b0c5318
Fix clang format
Bertik23 Aug 28, 2025
416514e
Move private members to top of class definition
Bertik23 Aug 29, 2025
35ca1a5
Use SourceMgr to resolve Line:Column position
Bertik23 Sep 2, 2025
b3d8254
Fix zeroindexing on token positions
Bertik23 Sep 2, 2025
23dcc6b
Replace Line:Column storage with Poiters and on demand conversion
Bertik23 Sep 3, 2025
06d5265
Use nullptr as missing value
Bertik23 Sep 4, 2025
4e08921
Enclose debug prints of tests in LLVM_DEBUG
Bertik23 Sep 4, 2025
3da9e9d
Decapitalize DEBUG_TYPE
Bertik23 Sep 15, 2025
4b3bc0e
Move FileLoc from Value.h to FileLoc.h
Bertik23 Sep 26, 2025
ed7a04a
Rename include guard defines to reflext filename
Bertik23 Sep 26, 2025
e6142b5
include in namespace llvm
Bertik23 Oct 1, 2025
f5da73c
Fix typo in comment
Bertik23 Oct 6, 2025
10a2b75
Path to llvm/AsmParser/FileLoc.h
Bertik23 Oct 6, 2025
17b5753
assert.h -> cassert
Bertik23 Oct 6, 2025
737c5e0
Remove filename and emacs marker
Bertik23 Oct 8, 2025
72b89e5
optimize lookup
Bertik23 Oct 8, 2025
41284df
FileLoc docs and fix reange
Bertik23 Oct 8, 2025
ff9a33d
full path to includes
Bertik23 Oct 8, 2025
008ae63
Apply suggestion from @nikic
Bertik23 Oct 8, 2025
a44ef20
Typo add period
Bertik23 Oct 8, 2025
f201d1f
actually fix filelocrange openness
Bertik23 Oct 8, 2025
1de2447
remove old irrelevant comment
Bertik23 Oct 8, 2025
4d51839
Doc coments with ///
Bertik23 Oct 8, 2025
77385c0
Doc coments with ///
Bertik23 Oct 8, 2025
f58e0ad
Merge remote-tracking branch 'upstream/main' into parser-location-info
Bertik23 Oct 8, 2025
75e5b57
Revert changes irrelevant in LLLexer
Bertik23 Oct 8, 2025
07689fc
revert formating
Bertik23 Oct 8, 2025
66ce6b6
make clang-format happy
Bertik23 Oct 8, 2025
53 changes: 53 additions & 0 deletions llvm/include/llvm/AsmParser/AsmParserContext.h
@@ -0,0 +1,53 @@
//===-- AsmParserContext.h --------------------------------------*- C++ -*-===//
Contributor

Omit file name and emacs marker in new files.

//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_ASMPARSER_ASMPARSER_STATE_H
#define LLVM_ASMPARSER_ASMPARSER_STATE_H

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Value.h"
#include <optional>

namespace llvm {

/// Registry of file location information for LLVM IR constructs
///
/// This class provides access to the file location information
/// for various LLVM IR constructs. Currently, it supports Function,
/// BasicBlock and Instruction locations.
///
/// When available, it can answer queries about what is at a given
/// file location, as well as where in a file a given IR construct
/// is.
///
/// This information is optionally emitted by the LLParser while
/// it reads LLVM textual IR.
class AsmParserContext {
Member

The name of this class makes it sound like something that should be owned by the LLParser. Maybe something like LLParserLocationInfo?

Contributor Author

I think AsmParserContext is better, as it's more future-proof: in the future we might want to add more info than just location. As for LLParser vs. AsmParser, I think AsmParser is better, since it refers to what is parsed rather than how it's parsed.

Contributor

I tend to agree with @arichardson -- I'd expect AsmParserContext to be something internally used by AsmParser. What kind of future extensions do you have in mind here?

Contributor Author

It is supposed to be used similarly to AsmParserState in MLIR: in tools that need to know some things from the parser. In this case, we want to use it to access the locations of instructions, functions, ... in IR files, for the purpose of navigating the file via an LSP server (such as go-to definition/references).

public:
std::optional<FileLocRange> getFunctionLocation(const Function *) const;
std::optional<FileLocRange> getBlockLocation(const BasicBlock *) const;
std::optional<FileLocRange> getInstructionLocation(const Instruction *) const;
std::optional<Function *> getFunctionAtLocation(const FileLocRange &) const;
std::optional<Function *> getFunctionAtLocation(const FileLoc &) const;
std::optional<BasicBlock *> getBlockAtLocation(const FileLocRange &) const;
std::optional<BasicBlock *> getBlockAtLocation(const FileLoc &) const;
std::optional<Instruction *>
getInstructionAtLocation(const FileLocRange &) const;
std::optional<Instruction *> getInstructionAtLocation(const FileLoc &) const;
Member

could we use nullptr here to indicate absence?

Contributor Author

Done
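
For readers following this thread, here is a minimal sketch of what the nullptr suggestion amounts to: since the location queries already return pointers, a null pointer can signal "nothing found" without wrapping the result in std::optional. This is not the PR's code. `Function`, `FileLoc`, `FileLocRange`, and `LocationRegistry` below are simplified, hypothetical stand-ins, and `std::unordered_map` is used in place of `DenseMap` so the sketch builds without LLVM.

```cpp
#include <unordered_map>

// Hypothetical stand-ins for the PR's types (assumptions, not LLVM's classes).
struct Function { int Id; };
struct FileLoc { unsigned Line, Col; };
struct FileLocRange { FileLoc Start, End; };

// Lexicographic (Line, Col) comparison.
inline bool lessOrEqual(const FileLoc &A, const FileLoc &B) {
  return A.Line < B.Line || (A.Line == B.Line && A.Col <= B.Col);
}

// Closed-interval containment, matching the FileLocRange in this revision.
inline bool contains(const FileLocRange &R, const FileLoc &L) {
  return lessOrEqual(R.Start, L) && lessOrEqual(L, R.End);
}

class LocationRegistry {
  std::unordered_map<Function *, FileLocRange> Functions;

public:
  // Returns false if F was already registered.
  bool addFunctionLocation(Function *F, const FileLocRange &Loc) {
    return Functions.insert({F, Loc}).second;
  }

  // Returns the function whose range contains Query, or nullptr if none does.
  Function *getFunctionAtLocation(const FileLoc &Query) const {
    for (const auto &Entry : Functions)
      if (contains(Entry.second, Query))
        return Entry.first;
    return nullptr;
  }
};
```

A caller would then write `if (Function *F = Registry.getFunctionAtLocation(Loc))` instead of unwrapping an optional first.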

bool addFunctionLocation(Function *, const FileLocRange &);
bool addBlockLocation(BasicBlock *, const FileLocRange &);
bool addInstructionLocation(Instruction *, const FileLocRange &);

private:
DenseMap<Function *, FileLocRange> Functions;
DenseMap<BasicBlock *, FileLocRange> Blocks;
DenseMap<Instruction *, FileLocRange> Instructions;
};
} // namespace llvm

#endif
38 changes: 35 additions & 3 deletions llvm/include/llvm/AsmParser/LLLexer.h
@@ -29,6 +29,20 @@ namespace llvm {
const char *CurPtr;
StringRef CurBuf;

// The line number at `CurPtr-1`, zero-indexed
unsigned CurLineNum = 0;
// The column number at `CurPtr-1`, zero-indexed
unsigned CurColNum = -1;
// The line number of the start of the current token, zero-indexed
unsigned CurTokLineNum = 0;
// The column number of the start of the current token, zero-indexed
unsigned CurTokColNum = 0;
// The line number of the end of the current token, zero-indexed
unsigned PrevTokEndLineNum = -1;
// The column number of the end (exclusive) of the current token,
// zero-indexed
unsigned PrevTokEndColNum = -1;

enum class ErrorPriority {
None, // No error message present.
Parser, // Errors issued by parser.
@@ -62,9 +76,7 @@ namespace llvm {
explicit LLLexer(StringRef StartBuf, SourceMgr &SM, SMDiagnostic &,
LLVMContext &C);

- lltok::Kind Lex() {
- return CurKind = LexToken();
- }
+ lltok::Kind Lex() { return CurKind = LexToken(); }

typedef SMLoc LocTy;
LocTy getLoc() const { return SMLoc::getFromPointer(TokStart); }
@@ -79,6 +91,21 @@ namespace llvm {
IgnoreColonInIdentifiers = val;
}

// Get the current line number, zero-indexed
unsigned getLineNum() { return CurLineNum; }
// Get the current column number, zero-indexed
unsigned getColNum() { return CurColNum; }
// Get the line number of the start of the current token, zero-indexed
unsigned getTokLineNum() { return CurTokLineNum; }
// Get the column number of the start of the current token, zero-indexed
unsigned getTokColNum() { return CurTokColNum; }
// Get the line number of the end of the previous token, zero-indexed,
// exclusive
unsigned getPrevTokEndLineNum() { return PrevTokEndLineNum; }
// Get the column number of the end of the previous token, zero-indexed,
// exclusive
unsigned getPrevTokEndColNum() { return PrevTokEndColNum; }

// This returns true as a convenience for the parser functions that return
// true on error.
bool ParseError(LocTy ErrorLoc, const Twine &Msg) {
@@ -93,7 +120,12 @@ namespace llvm {
private:
lltok::Kind LexToken();

// Return closest pointer after `Ptr` that is an end of a label.
// Returns nullptr if `Ptr` doesn't point into a label.
const char *getLabelTail(const char *Ptr);
int getNextChar();
const char *skipNChars(unsigned N);
void advancePositionTo(const char *Ptr);
void SkipLineComment();
bool SkipCComment();
lltok::Kind ReadString(lltok::Kind kind);
9 changes: 7 additions & 2 deletions llvm/include/llvm/AsmParser/LLParser.h
@@ -13,6 +13,7 @@
#ifndef LLVM_ASMPARSER_LLPARSER_H
#define LLVM_ASMPARSER_LLPARSER_H

#include "AsmParserContext.h"
Contributor

Suggested change:
- #include "AsmParserContext.h"
+ #include "llvm/AsmParser/AsmParserContext.h"

Also incorrect in the line below, probably this wasn't adjusted when the header was exported...

#include "LLLexer.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/AsmParser/NumberedValues.h"
@@ -177,6 +178,9 @@ namespace llvm {
// Map of module ID to path.
std::map<unsigned, StringRef> ModuleIdMap;

/// Keeps track of source locations for Values, BasicBlocks, and Functions
AsmParserContext *ParserContext;

/// Only the llvm-as tool may set this to false to bypass
/// UpgradeDebuginfo so it can generate broken bitcode.
bool UpgradeDebugInfo;
@@ -189,10 +193,11 @@ namespace llvm {
public:
LLParser(StringRef F, SourceMgr &SM, SMDiagnostic &Err, Module *M,
ModuleSummaryIndex *Index, LLVMContext &Context,
- SlotMapping *Slots = nullptr)
+ SlotMapping *Slots = nullptr,
+ AsmParserContext *ParserContext = nullptr)
: Context(Context), OPLex(F, SM, Err, Context),
Lex(F, SM, Err, Context), M(M), Index(Index), Slots(Slots),
- BlockAddressPFS(nullptr) {}
+ BlockAddressPFS(nullptr), ParserContext(ParserContext) {}
bool Run(
bool UpgradeDebugInfo,
DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
16 changes: 9 additions & 7 deletions llvm/include/llvm/AsmParser/Parser.h
@@ -15,6 +15,7 @@

#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/AsmParser/AsmParserContext.h"
#include "llvm/Support/Compiler.h"
#include <memory>
#include <optional>
@@ -62,7 +63,8 @@ parseAssemblyFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
/// parsing.
LLVM_ABI std::unique_ptr<Module>
parseAssemblyString(StringRef AsmString, SMDiagnostic &Err,
- LLVMContext &Context, SlotMapping *Slots = nullptr);
+ LLVMContext &Context, SlotMapping *Slots = nullptr,
+ AsmParserContext *ParserContext = nullptr);

/// Holds the Module and ModuleSummaryIndex returned by the interfaces
/// that parse both.
@@ -128,9 +130,9 @@ parseSummaryIndexAssemblyString(StringRef AsmString, SMDiagnostic &Err);
LLVM_ABI std::unique_ptr<Module> parseAssembly(
MemoryBufferRef F, SMDiagnostic &Err, LLVMContext &Context,
SlotMapping *Slots = nullptr,
- DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
- return std::nullopt;
- });
+ DataLayoutCallbackTy DataLayoutCallback =
+ [](StringRef, StringRef) { return std::nullopt; },
+ AsmParserContext *ParserContext = nullptr);

/// Parse LLVM Assembly including the summary index from a MemoryBuffer.
///
@@ -169,9 +171,9 @@ parseSummaryIndexAssembly(MemoryBufferRef F, SMDiagnostic &Err);
LLVM_ABI bool parseAssemblyInto(
MemoryBufferRef F, Module *M, ModuleSummaryIndex *Index, SMDiagnostic &Err,
SlotMapping *Slots = nullptr,
- DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
- return std::nullopt;
- });
+ DataLayoutCallbackTy DataLayoutCallback =
+ [](StringRef, StringRef) { return std::nullopt; },
+ AsmParserContext *ParserContext = nullptr);

/// Parse a type and a constant value in the given string.
///
32 changes: 32 additions & 0 deletions llvm/include/llvm/IR/Value.h
@@ -55,6 +55,38 @@ class User;

using ValueName = StringMapEntry<Value *>;

struct FileLoc {
unsigned Line;
unsigned Col;

bool operator<=(const FileLoc &RHS) const {
return Line < RHS.Line || (Line == RHS.Line && Col <= RHS.Col);
}

bool operator<(const FileLoc &RHS) const {
return Line < RHS.Line || (Line == RHS.Line && Col < RHS.Col);
}

FileLoc(unsigned L, unsigned C) : Line(L), Col(C) {}
};

struct FileLocRange {
FileLoc Start;
FileLoc End;

FileLocRange() : Start(0, 0), End(0, 0) {}

FileLocRange(FileLoc S, FileLoc E) : Start(S), End(E) {
assert(Start <= End);
}

bool contains(FileLoc L) const { return Start <= L && L <= End; }

bool contains(FileLocRange LR) const {
return contains(LR.Start) && contains(LR.End);
}
};

//===----------------------------------------------------------------------===//
// Value Class
//===----------------------------------------------------------------------===//
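
As a reading aid for the `FileLoc` / `FileLocRange` hunk above, the sketch below copies the two structs as they appear in this revision and adds a small helper to show the containment semantics: both ends of the range are inclusive, and a point query is expressed as the degenerate range `FileLocRange(Q, Q)`, which is how the `get*AtLocation(const FileLoc &)` overloads in this PR forward to the range overloads. The helper name `containsPoint` is hypothetical and not part of the PR.

```cpp
#include <cassert>

// Copied from the Value.h hunk in this revision of the PR.
struct FileLoc {
  unsigned Line;
  unsigned Col;

  bool operator<=(const FileLoc &RHS) const {
    return Line < RHS.Line || (Line == RHS.Line && Col <= RHS.Col);
  }

  bool operator<(const FileLoc &RHS) const {
    return Line < RHS.Line || (Line == RHS.Line && Col < RHS.Col);
  }

  FileLoc(unsigned L, unsigned C) : Line(L), Col(C) {}
};

struct FileLocRange {
  FileLoc Start;
  FileLoc End;

  FileLocRange() : Start(0, 0), End(0, 0) {}

  FileLocRange(FileLoc S, FileLoc E) : Start(S), End(E) {
    assert(Start <= End);
  }

  // Closed interval: both Start and End count as inside the range.
  bool contains(FileLoc L) const { return Start <= L && L <= End; }

  bool contains(FileLocRange LR) const {
    return contains(LR.Start) && contains(LR.End);
  }
};

// Hypothetical helper: a point query written as a degenerate range.
inline bool containsPoint(const FileLocRange &R, unsigned Line, unsigned Col) {
  FileLoc Q(Line, Col);
  return R.contains(FileLocRange(Q, Q));
}
```

Because the comparison is lexicographic on (Line, Col), a location on an interior line is inside the range regardless of its column.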
17 changes: 9 additions & 8 deletions llvm/include/llvm/IRReader/IRReader.h
@@ -15,6 +15,7 @@
#define LLVM_IRREADER_IRREADER_H

#include "llvm/ADT/StringRef.h"
#include "llvm/AsmParser/AsmParserContext.h"
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/Support/Compiler.h"
#include <memory>
@@ -50,19 +51,19 @@ getLazyIRFileModule(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
/// for it. Otherwise, attempt to parse it as LLVM Assembly and return
/// a Module for it.
/// \param DataLayoutCallback Override datalayout in the llvm assembly.
- LLVM_ABI std::unique_ptr<Module> parseIR(MemoryBufferRef Buffer,
- SMDiagnostic &Err,
- LLVMContext &Context,
- ParserCallbacks Callbacks = {});
+ LLVM_ABI std::unique_ptr<Module>
+ parseIR(MemoryBufferRef Buffer, SMDiagnostic &Err, LLVMContext &Context,
+ ParserCallbacks Callbacks = {},
+ AsmParserContext *ParserContext = nullptr);

/// If the given file holds a bitcode image, return a Module for it.
/// Otherwise, attempt to parse it as LLVM Assembly and return a Module
/// for it.
/// \param DataLayoutCallback Override datalayout in the llvm assembly.
- LLVM_ABI std::unique_ptr<Module> parseIRFile(StringRef Filename,
- SMDiagnostic &Err,
- LLVMContext &Context,
- ParserCallbacks Callbacks = {});
+ LLVM_ABI std::unique_ptr<Module>
+ parseIRFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
+ ParserCallbacks Callbacks = {},
+ AsmParserContext *ParserContext = nullptr);
}

#endif
91 changes: 91 additions & 0 deletions llvm/lib/AsmParser/AsmParserContext.cpp
@@ -0,0 +1,91 @@
//===-- AsmParserContext.cpp ------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "llvm/AsmParser/AsmParserContext.h"

namespace llvm {

std::optional<FileLocRange>
AsmParserContext::getFunctionLocation(const Function *F) const {
if (!Functions.contains(F))
Contributor

Use find() to avoid duplicate lookup.
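
To illustrate the suggestion, here is a minimal sketch of a single-lookup version: `find()` returns an iterator that serves both as the presence check and as the accessor, so the key is hashed once rather than once in `contains()` and again in `at()`. It also keeps the `std::optional` return type. The types are simplified, hypothetical stand-ins, and `std::unordered_map` replaces `DenseMap` so the sketch builds without LLVM; `DenseMap::find` follows the same iterator pattern.

```cpp
#include <optional>
#include <unordered_map>

// Hypothetical stand-ins for the PR's types (assumptions, not LLVM's classes).
struct Function { int Id; };
struct FileLocRange { unsigned StartLine, StartCol, EndLine, EndCol; };

class LocationRegistry {
  std::unordered_map<const Function *, FileLocRange> Functions;

public:
  // Returns false if F was already registered.
  bool addFunctionLocation(const Function *F, const FileLocRange &Loc) {
    return Functions.insert({F, Loc}).second;
  }

  // One lookup: the iterator answers "is it there?" and yields the value.
  std::optional<FileLocRange> getFunctionLocation(const Function *F) const {
    auto It = Functions.find(F);
    if (It == Functions.end())
      return std::nullopt;
    return It->second;
  }
};
```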

return std::nullopt;
return Functions.at(F);
}

std::optional<FileLocRange>
AsmParserContext::getBlockLocation(const BasicBlock *BB) const {
if (!Blocks.contains(BB))
return std::nullopt;
return Blocks.at(BB);
Member

nit: perhaps we can use DenseMap::lookup_or here

Contributor Author

Thanks for that suggestion, makes the code a bit cleaner.

Contributor Author

But sadly it is not possible: the return type of lookup_or is the value type of the map, so it can't return nullopt.

}

std::optional<FileLocRange>
AsmParserContext::getInstructionLocation(const Instruction *I) const {
if (!Instructions.contains(I))
return std::nullopt;
return Instructions.at(I);
}

std::optional<Function *>
AsmParserContext::getFunctionAtLocation(const FileLocRange &Query) const {
for (auto &[F, Loc] : Functions) {
if (Loc.contains(Query))
Contributor

This looks very inefficient...

Contributor Author

It's actually not that bad. When testing this, even on large files, the LSP was pretty responsive, so I wouldn't consider this an issue for now.

return F;
}
return std::nullopt;
}

std::optional<Function *>
AsmParserContext::getFunctionAtLocation(const FileLoc &Query) const {
return getFunctionAtLocation(FileLocRange(Query, Query));
}

std::optional<BasicBlock *>
AsmParserContext::getBlockAtLocation(const FileLocRange &Query) const {
for (auto &[BB, Loc] : Blocks) {
if (Loc.contains(Query))
return BB;
}
return std::nullopt;
}

std::optional<BasicBlock *>
AsmParserContext::getBlockAtLocation(const FileLoc &Query) const {
return getBlockAtLocation(FileLocRange(Query, Query));
}

std::optional<Instruction *>
AsmParserContext::getInstructionAtLocation(const FileLocRange &Query) const {
for (auto &[I, Loc] : Instructions) {
Member

Iterating over all instructions seems rather inefficient, e.g. if you're trying to get info towards the end of a large file. Can't we have a list sorted by file location and do a binary search?

Member

Maybe it doesn't matter if the lookup by function is more common. Do you have a link to code using this new class?

I'm not worried about performance for the LSP, optimizations can always be done later. Please don't interpret my review comments as negative towards this change, I am very excited to see this support land eventually.
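
The sorted-list idea raised in this thread could look roughly like the sketch below. It is not part of the PR: `SortedLocationIndex`, `Entry`, and `finalize()` are hypothetical names, and the sketch assumes instruction ranges do not overlap, so sorting by start position is enough. The lookup uses `std::upper_bound` to find the first entry that starts after the query, steps back one entry, and checks whether the query falls before that entry's (inclusive) end.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for the PR's types (assumptions, not LLVM's classes).
struct Instruction { int Id; };
struct FileLoc { unsigned Line, Col; };

inline bool operator<(const FileLoc &A, const FileLoc &B) {
  return A.Line < B.Line || (A.Line == B.Line && A.Col < B.Col);
}
inline bool operator<=(const FileLoc &A, const FileLoc &B) {
  return A.Line < B.Line || (A.Line == B.Line && A.Col <= B.Col);
}

struct Entry {
  FileLoc Start, End; // closed interval, as in this revision's FileLocRange
  Instruction *I;
};

class SortedLocationIndex {
  std::vector<Entry> Entries;

public:
  void add(Instruction *I, FileLoc Start, FileLoc End) {
    Entries.push_back({Start, End, I});
  }

  // Sort once after parsing; lookups assume this has been called.
  void finalize() {
    std::sort(Entries.begin(), Entries.end(),
              [](const Entry &A, const Entry &B) { return A.Start < B.Start; });
  }

  // O(log n) lookup; returns nullptr when no range contains Query.
  Instruction *getInstructionAtLocation(FileLoc Query) const {
    // First entry whose Start lies strictly after Query.
    auto It = std::upper_bound(
        Entries.begin(), Entries.end(), Query,
        [](const FileLoc &Q, const Entry &E) { return Q < E.Start; });
    if (It == Entries.begin())
      return nullptr; // Query precedes every recorded range.
    --It;             // Last entry starting at or before Query.
    return Query <= It->End ? It->I : nullptr;
  }
};
```

Since the parser visits instructions in file order, the entries would likely already be sorted when added, in which case the sort is cheap. Nested ranges (a function containing blocks containing instructions) would need a separate index per kind or a different structure.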

if (Loc.contains(Query))
return I;
}
return std::nullopt;
}

std::optional<Instruction *>
AsmParserContext::getInstructionAtLocation(const FileLoc &Query) const {
return getInstructionAtLocation(FileLocRange(Query, Query));
}

bool AsmParserContext::addFunctionLocation(Function *F,
const FileLocRange &Loc) {
return Functions.insert({F, Loc}).second;
}

bool AsmParserContext::addBlockLocation(BasicBlock *BB,
const FileLocRange &Loc) {
return Blocks.insert({BB, Loc}).second;
}

bool AsmParserContext::addInstructionLocation(Instruction *I,
const FileLocRange &Loc) {
return Instructions.insert({I, Loc}).second;
}

} // namespace llvm
1 change: 1 addition & 0 deletions llvm/lib/AsmParser/CMakeLists.txt
@@ -1,5 +1,6 @@
# AsmParser
add_llvm_component_library(LLVMAsmParser
AsmParserContext.cpp
LLLexer.cpp
LLParser.cpp
Parser.cpp