Open
35 commits
ecf591c
Update CurLineNum anc CurColNum in sync with movement in text
Bertik23 Jul 29, 2025
06926e9
Remove remains from cherry pick from LSP branch
Bertik23 Aug 4, 2025
1fdf13c
Make isLabelTail more safe and rename it to better show what it does
Bertik23 Aug 4, 2025
2772cd8
Remove dangling comment
Bertik23 Aug 4, 2025
b05d11a
Fix typo
Bertik23 Aug 12, 2025
458599b
Add location tracking to IR parser
Bertik23 Aug 28, 2025
ee39ed1
Fix some OOB posibilities
Bertik23 Aug 28, 2025
b0c5318
Fix clang format
Bertik23 Aug 28, 2025
416514e
Move private members to top of class definition
Bertik23 Aug 29, 2025
35ca1a5
Use SourceMgr to resolve Line:Column position
Bertik23 Sep 2, 2025
b3d8254
Fix zeroindexing on token positions
Bertik23 Sep 2, 2025
23dcc6b
Replace Line:Column storage with Poiters and on demand conversion
Bertik23 Sep 3, 2025
06d5265
Use nullptr as missing value
Bertik23 Sep 4, 2025
4e08921
Enclose debug prints of tests in LLVM_DEBUG
Bertik23 Sep 4, 2025
3da9e9d
Decapitalize DEBUG_TYPE
Bertik23 Sep 15, 2025
4b3bc0e
Move FileLoc from Value.h to FileLoc.h
Bertik23 Sep 26, 2025
ed7a04a
Rename include guard defines to reflext filename
Bertik23 Sep 26, 2025
e6142b5
include in namespace llvm
Bertik23 Oct 1, 2025
f5da73c
Fix typo in comment
Bertik23 Oct 6, 2025
10a2b75
Path to llvm/AsmParser/FileLoc.h
Bertik23 Oct 6, 2025
17b5753
assert.h -> cassert
Bertik23 Oct 6, 2025
737c5e0
Remove filename and emacs marker
Bertik23 Oct 8, 2025
72b89e5
optimize lookup
Bertik23 Oct 8, 2025
41284df
FileLoc docs and fix reange
Bertik23 Oct 8, 2025
ff9a33d
full path to includes
Bertik23 Oct 8, 2025
008ae63
Apply suggestion from @nikic
Bertik23 Oct 8, 2025
a44ef20
Typo add period
Bertik23 Oct 8, 2025
f201d1f
actually fix filelocrange openness
Bertik23 Oct 8, 2025
1de2447
remove old irrelevant comment
Bertik23 Oct 8, 2025
4d51839
Doc coments with ///
Bertik23 Oct 8, 2025
77385c0
Doc coments with ///
Bertik23 Oct 8, 2025
f58e0ad
Merge remote-tracking branch 'upstream/main' into parser-location-info
Bertik23 Oct 8, 2025
75e5b57
Revert changes irrelevant in LLLexer
Bertik23 Oct 8, 2025
07689fc
revert formating
Bertik23 Oct 8, 2025
66ce6b6
make clang-format happy
Bertik23 Oct 8, 2025
70 changes: 70 additions & 0 deletions llvm/include/llvm/AsmParser/AsmParserContext.h
@@ -0,0 +1,70 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_ASMPARSER_ASMPARSERCONTEXT_H
#define LLVM_ASMPARSER_ASMPARSERCONTEXT_H

#include "llvm/ADT/DenseMap.h"
#include "llvm/AsmParser/FileLoc.h"
#include "llvm/IR/Value.h"
#include <optional>

namespace llvm {

/// Registry of file location information for LLVM IR constructs.
///
/// This class provides access to the file location information
/// for various LLVM IR constructs. Currently, it supports Function,
/// BasicBlock and Instruction locations.
///
/// When available, it can answer queries about what is at a given
/// file location, as well as where in a file a given IR construct
/// is.
///
/// This information is optionally emitted by the LLParser while
/// it reads LLVM textual IR.
class AsmParserContext {
Member

The name of this class makes it sound like something that should be owned by the LLParser. Maybe something like LLParserLocationInfo?

Contributor Author

I think AsmParserContext is better, as it's more future-proof. In the future we might want to add more info than just location. As for LLParser vs. AsmParser, I think AsmParser is better since it names what is parsed rather than how it's parsed.

Contributor

I tend to agree with @arichardson -- I'd expect AsmParserContext to be something internally used by AsmParser. What kind of future extensions do you have in mind here?

Contributor Author

It is supposed to be used similarly to AsmParserState in MLIR: in tools that need to know some things from the parser. In this case, we want to use it to access the locations of instructions, functions, ... in IR files for the purpose of navigating the file via an LSP server (such as go to definition/references).

DenseMap<Function *, FileLocRange> Functions;
DenseMap<BasicBlock *, FileLocRange> Blocks;
DenseMap<Instruction *, FileLocRange> Instructions;

public:
std::optional<FileLocRange> getFunctionLocation(const Function *) const;
std::optional<FileLocRange> getBlockLocation(const BasicBlock *) const;
std::optional<FileLocRange> getInstructionLocation(const Instruction *) const;
/// Get the function at the requested location range.
/// If no single function occupies the queried range, or the record is
/// missing, a nullptr is returned.
Function *getFunctionAtLocation(const FileLocRange &) const;
/// Get the function at the requested location.
/// If no function occupies the queried location, or the record is missing, a
/// nullptr is returned.
Function *getFunctionAtLocation(const FileLoc &) const;
/// Get the block at the requested location range.
/// If no single block occupies the queried range, or the record is missing, a
/// nullptr is returned.
BasicBlock *getBlockAtLocation(const FileLocRange &) const;
/// Get the block at the requested location.
/// If no block occupies the queried location, or the record is missing, a
/// nullptr is returned.
BasicBlock *getBlockAtLocation(const FileLoc &) const;
/// Get the instruction at the requested location range.
/// If no single instruction occupies the queried range, or the record is
/// missing, a nullptr is returned.
Instruction *getInstructionAtLocation(const FileLocRange &) const;
/// Get the instruction at the requested location.
/// If no instruction occupies the queried location, or the record is missing,
/// a nullptr is returned.
Instruction *getInstructionAtLocation(const FileLoc &) const;
bool addFunctionLocation(Function *, const FileLocRange &);
bool addBlockLocation(BasicBlock *, const FileLocRange &);
bool addInstructionLocation(Instruction *, const FileLocRange &);
};
} // namespace llvm

#endif
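To make the contract of these declarations concrete, here is a minimal stand-alone sketch of the same three behaviors: `add*Location` reports whether a new record was inserted, `get*Location` returns an empty optional for an unknown construct, and `get*AtLocation` returns a null pointer when nothing matches. `Entity`, `Loc`, `LocRange`, and `LocationRegistry` are hypothetical stand-ins built on `std::unordered_map`; they are not the LLVM types, and the class in this PR uses `DenseMap` keyed by `Function *`, `BasicBlock *`, and `Instruction *`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for FileLoc / FileLocRange (0-based line and column).
struct Loc {
  unsigned Line, Col;
};
inline bool lessEq(Loc A, Loc B) {
  return A.Line < B.Line || (A.Line == B.Line && A.Col <= B.Col);
}
struct LocRange {
  Loc Start, End;
  // Same comparison shape as FileLocRange::contains(FileLocRange) in the diff.
  bool contains(const LocRange &R) const {
    return lessEq(Start, R.Start) && lessEq(R.End, End);
  }
};

// Hypothetical stand-in for an IR construct (Function, BasicBlock, ...).
struct Entity {
  std::string Name;
};

class LocationRegistry {
  std::unordered_map<const Entity *, LocRange> Locations;

public:
  // Returns false if E already has a record; the first record is kept.
  bool add(const Entity *E, const LocRange &R) {
    assert(E && "cannot record a location for a null entity");
    return Locations.insert({E, R}).second;
  }

  // Forward lookup: empty optional when E was never recorded.
  std::optional<LocRange> get(const Entity *E) const {
    if (auto It = Locations.find(E); It != Locations.end())
      return It->second;
    return std::nullopt;
  }

  // Reverse lookup by linear scan: nullptr when no record contains Query.
  const Entity *at(const LocRange &Query) const {
    for (const auto &[E, R] : Locations)
      if (R.contains(Query))
        return E;
    return nullptr;
  }
};
```

A caller would check the `bool` from `add` only if duplicate registration matters, and must handle both the empty optional and the null pointer, since location tracking is optional in the parser.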
56 changes: 56 additions & 0 deletions llvm/include/llvm/AsmParser/FileLoc.h
@@ -0,0 +1,56 @@
//===-- FileLoc.h ---------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_ASMPARSER_FILELOC_H
#define LLVM_ASMPARSER_FILELOC_H

#include <cassert>
#include <utility>

namespace llvm {

/// Struct holding Line:Column location
struct FileLoc {
Contributor

Add a comment on whether these are 0-based or 1-based values.

/// 0-based line number
unsigned Line;
/// 0-based column number
unsigned Col;

bool operator<=(const FileLoc &RHS) const {
return Line < RHS.Line || (Line == RHS.Line && Col <= RHS.Col);
}

bool operator<(const FileLoc &RHS) const {
return Line < RHS.Line || (Line == RHS.Line && Col < RHS.Col);
}

FileLoc(unsigned L, unsigned C) : Line(L), Col(C) {}
FileLoc(std::pair<unsigned, unsigned> LC) : Line(LC.first), Col(LC.second) {}
};

/// Struct holding a semiopen range [Start; End)
struct FileLocRange {
FileLoc Start;
Contributor

Comment on what kind of range this is. Based on the contains() implementation, it's inclusive.

Contributor Author

Thanks for catching that, it should be a semiopen range

FileLoc End;

FileLocRange() : Start(0, 0), End(0, 0) {}

FileLocRange(FileLoc S, FileLoc E) : Start(S), End(E) {
assert(Start <= End);
}

bool contains(FileLoc L) const { return Start <= L && L < End; }

bool contains(FileLocRange LR) const {
return Start <= LR.Start && LR.End <= End;
}
};

} // namespace llvm

#endif
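The review thread in this file turns on whether `FileLocRange` is inclusive or semi-open. The sketch below copies the two structs as they appear in this diff (minus the default and pair constructors) so the boundary cases can be checked in isolation. `pointQueryMatches` is a hypothetical helper that mirrors how `AsmParserContext` turns a `FileLoc` query into `FileLocRange(Query, Query)`.

```cpp
#include <cassert>

// Trimmed copies of the structs from this diff, for experimentation only.
struct FileLoc {
  unsigned Line; // 0-based line number
  unsigned Col;  // 0-based column number

  FileLoc(unsigned L, unsigned C) : Line(L), Col(C) {}

  bool operator<=(const FileLoc &RHS) const {
    return Line < RHS.Line || (Line == RHS.Line && Col <= RHS.Col);
  }
  bool operator<(const FileLoc &RHS) const {
    return Line < RHS.Line || (Line == RHS.Line && Col < RHS.Col);
  }
};

// Semi-open range [Start; End).
struct FileLocRange {
  FileLoc Start;
  FileLoc End;

  FileLocRange(FileLoc S, FileLoc E) : Start(S), End(E) {
    assert(Start <= End);
  }

  // End itself is excluded here.
  bool contains(FileLoc L) const { return Start <= L && L < End; }

  // A sub-range may end exactly at End.
  bool contains(FileLocRange LR) const {
    return Start <= LR.Start && LR.End <= End;
  }
};

// Hypothetical helper: a point query expressed as the empty range
// [Query; Query), the way the *AtLocation(const FileLoc &) overloads do it.
inline bool pointQueryMatches(const FileLocRange &R, FileLoc Query) {
  return R.contains(FileLocRange(Query, Query));
}
```

With these definitions, `contains(FileLoc)` excludes `End`, while an empty range positioned exactly at `End` still satisfies `contains(FileLocRange)`. A point query routed through `FileLocRange(Query, Query)` therefore appears to treat `End` as inside the range; the PR discussion does not say whether that is intended.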
26 changes: 21 additions & 5 deletions llvm/include/llvm/AsmParser/LLLexer.h
@@ -13,22 +13,25 @@
#ifndef LLVM_ASMPARSER_LLLEXER_H
#define LLVM_ASMPARSER_LLLEXER_H

-#include "LLToken.h"
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APSInt.h"
+#include "llvm/AsmParser/LLToken.h"
#include "llvm/Support/SMLoc.h"
#include "llvm/Support/SourceMgr.h"
#include <string>

namespace llvm {
class Type;
class SMDiagnostic;
class SourceMgr;
class LLVMContext;

class LLLexer {
const char *CurPtr;
StringRef CurBuf;

/// The end (exclusive) of the previous token.
const char *PrevTokEnd = nullptr;

enum class ErrorPriority {
None, // No error message present.
Parser, // Errors issued by parser.
@@ -62,9 +65,7 @@ namespace llvm {
explicit LLLexer(StringRef StartBuf, SourceMgr &SM, SMDiagnostic &,
LLVMContext &C);

-lltok::Kind Lex() {
-return CurKind = LexToken();
-}
+lltok::Kind Lex() { return CurKind = LexToken(); }

typedef SMLoc LocTy;
LocTy getLoc() const { return SMLoc::getFromPointer(TokStart); }
@@ -79,6 +80,21 @@ namespace llvm {
IgnoreColonInIdentifiers = val;
}

/// Get the line, column position of the start of the current token,
/// zero-indexed
std::pair<unsigned, unsigned> getTokLineColumnPos() {
auto LC = SM.getLineAndColumn(SMLoc::getFromPointer(TokStart));
return {LC.first - 1, LC.second - 1};
}
/// Get the line, column position of the end of the previous token,
/// zero-indexed exclusive
std::pair<unsigned, unsigned> getPrevTokEndLineColumnPos() {
auto LC = SM.getLineAndColumn(SMLoc::getFromPointer(PrevTokEnd));
--LC.first;
--LC.second;
return LC;
}

// This returns true as a convenience for the parser functions that return
// true on error.
bool ParseError(LocTy ErrorLoc, const Twine &Msg) {
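`SourceMgr::getLineAndColumn` reports 1-based line and column numbers, which is why both new accessors in this hunk subtract one. The stand-alone sketch below reproduces that conversion on a plain buffer. `lineAndColumn1Based` and `lineAndColumn0Based` are hypothetical helpers written for illustration; they scan linearly and say nothing about how `SourceMgr` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical helper: 1-based (line, column) of the byte at Offset,
// i.e. the convention that the accessors in the diff convert away from.
inline std::pair<unsigned, unsigned>
lineAndColumn1Based(std::string_view Buf, std::size_t Offset) {
  assert(Offset <= Buf.size() && "offset must point into the buffer");
  unsigned Line = 1, Col = 1;
  for (std::size_t I = 0; I < Offset; ++I) {
    if (Buf[I] == '\n') {
      ++Line; // a newline starts the next line...
      Col = 1; // ...and resets the column
    } else {
      ++Col;
    }
  }
  return {Line, Col};
}

// Hypothetical helper: the same position, zero-indexed, matching the
// "LC.first - 1, LC.second - 1" adjustment in getTokLineColumnPos().
inline std::pair<unsigned, unsigned>
lineAndColumn0Based(std::string_view Buf, std::size_t Offset) {
  auto LC = lineAndColumn1Based(Buf, Offset);
  return {LC.first - 1, LC.second - 1};
}
```

Zero-based positions match what the Language Server Protocol expects, which fits the LSP use case described in the review discussion.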
11 changes: 8 additions & 3 deletions llvm/include/llvm/AsmParser/LLParser.h
@@ -13,8 +13,9 @@
#ifndef LLVM_ASMPARSER_LLPARSER_H
#define LLVM_ASMPARSER_LLPARSER_H

-#include "LLLexer.h"
#include "llvm/ADT/StringMap.h"
+#include "llvm/AsmParser/AsmParserContext.h"
+#include "llvm/AsmParser/LLLexer.h"
#include "llvm/AsmParser/NumberedValues.h"
#include "llvm/AsmParser/Parser.h"
#include "llvm/IR/Attributes.h"
@@ -177,6 +178,9 @@ namespace llvm {
// Map of module ID to path.
std::map<unsigned, StringRef> ModuleIdMap;

/// Keeps track of source locations for Values, BasicBlocks, and Functions.
AsmParserContext *ParserContext;

/// Only the llvm-as tool may set this to false to bypass
/// UpgradeDebuginfo so it can generate broken bitcode.
bool UpgradeDebugInfo;
@@ -189,10 +193,11 @@ namespace llvm {
public:
LLParser(StringRef F, SourceMgr &SM, SMDiagnostic &Err, Module *M,
ModuleSummaryIndex *Index, LLVMContext &Context,
-SlotMapping *Slots = nullptr)
+SlotMapping *Slots = nullptr,
+AsmParserContext *ParserContext = nullptr)
: Context(Context), OPLex(F, SM, Err, Context),
Lex(F, SM, Err, Context), M(M), Index(Index), Slots(Slots),
-BlockAddressPFS(nullptr) {}
+BlockAddressPFS(nullptr), ParserContext(ParserContext) {}
bool Run(
bool UpgradeDebugInfo,
DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
16 changes: 9 additions & 7 deletions llvm/include/llvm/AsmParser/Parser.h
@@ -15,6 +15,7 @@

#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/AsmParser/AsmParserContext.h"
#include "llvm/Support/Compiler.h"
#include <memory>
#include <optional>
@@ -62,7 +63,8 @@ parseAssemblyFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
/// parsing.
LLVM_ABI std::unique_ptr<Module>
parseAssemblyString(StringRef AsmString, SMDiagnostic &Err,
-LLVMContext &Context, SlotMapping *Slots = nullptr);
+LLVMContext &Context, SlotMapping *Slots = nullptr,
+AsmParserContext *ParserContext = nullptr);

/// Holds the Module and ModuleSummaryIndex returned by the interfaces
/// that parse both.
@@ -128,9 +130,9 @@ parseSummaryIndexAssemblyString(StringRef AsmString, SMDiagnostic &Err);
LLVM_ABI std::unique_ptr<Module> parseAssembly(
MemoryBufferRef F, SMDiagnostic &Err, LLVMContext &Context,
SlotMapping *Slots = nullptr,
-DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
-return std::nullopt;
-});
+DataLayoutCallbackTy DataLayoutCallback =
+[](StringRef, StringRef) { return std::nullopt; },
+AsmParserContext *ParserContext = nullptr);

/// Parse LLVM Assembly including the summary index from a MemoryBuffer.
///
@@ -169,9 +171,9 @@ parseSummaryIndexAssembly(MemoryBufferRef F, SMDiagnostic &Err);
LLVM_ABI bool parseAssemblyInto(
MemoryBufferRef F, Module *M, ModuleSummaryIndex *Index, SMDiagnostic &Err,
SlotMapping *Slots = nullptr,
-DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
-return std::nullopt;
-});
+DataLayoutCallbackTy DataLayoutCallback =
+[](StringRef, StringRef) { return std::nullopt; },
+AsmParserContext *ParserContext = nullptr);

/// Parse a type and a constant value in the given string.
///
17 changes: 9 additions & 8 deletions llvm/include/llvm/IRReader/IRReader.h
@@ -15,6 +15,7 @@
#define LLVM_IRREADER_IRREADER_H

#include "llvm/ADT/StringRef.h"
#include "llvm/AsmParser/AsmParserContext.h"
#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/Support/Compiler.h"
#include <memory>
@@ -50,19 +51,19 @@ getLazyIRFileModule(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
/// for it. Otherwise, attempt to parse it as LLVM Assembly and return
/// a Module for it.
/// \param DataLayoutCallback Override datalayout in the llvm assembly.
-LLVM_ABI std::unique_ptr<Module> parseIR(MemoryBufferRef Buffer,
-SMDiagnostic &Err,
-LLVMContext &Context,
-ParserCallbacks Callbacks = {});
+LLVM_ABI std::unique_ptr<Module>
+parseIR(MemoryBufferRef Buffer, SMDiagnostic &Err, LLVMContext &Context,
+ParserCallbacks Callbacks = {},
+AsmParserContext *ParserContext = nullptr);

/// If the given file holds a bitcode image, return a Module for it.
/// Otherwise, attempt to parse it as LLVM Assembly and return a Module
/// for it.
/// \param DataLayoutCallback Override datalayout in the llvm assembly.
-LLVM_ABI std::unique_ptr<Module> parseIRFile(StringRef Filename,
-SMDiagnostic &Err,
-LLVMContext &Context,
-ParserCallbacks Callbacks = {});
+LLVM_ABI std::unique_ptr<Module>
+parseIRFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
+ParserCallbacks Callbacks = {},
+AsmParserContext *ParserContext = nullptr);
}

#endif
89 changes: 89 additions & 0 deletions llvm/lib/AsmParser/AsmParserContext.cpp
@@ -0,0 +1,89 @@
//===----------------------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "llvm/AsmParser/AsmParserContext.h"

namespace llvm {

std::optional<FileLocRange>
AsmParserContext::getFunctionLocation(const Function *F) const {
if (auto FIt = Functions.find(F); FIt != Functions.end())
return FIt->second;
return std::nullopt;
}

std::optional<FileLocRange>
AsmParserContext::getBlockLocation(const BasicBlock *BB) const {
if (auto BBIt = Blocks.find(BB); BBIt != Blocks.end())
return BBIt->second;
return std::nullopt;
}

std::optional<FileLocRange>
AsmParserContext::getInstructionLocation(const Instruction *I) const {
if (auto IIt = Instructions.find(I); IIt != Instructions.end())
return IIt->second;
return std::nullopt;
}

Function *
AsmParserContext::getFunctionAtLocation(const FileLocRange &Query) const {
for (auto &[F, Loc] : Functions) {
if (Loc.contains(Query))
Contributor

This looks very inefficient...

Contributor Author

It's actually not that bad. When testing this even on large files the LSP was pretty responsive. So I wouldn't consider this an issue for now.

return F;
}
return nullptr;
}

Function *AsmParserContext::getFunctionAtLocation(const FileLoc &Query) const {
return getFunctionAtLocation(FileLocRange(Query, Query));
}

BasicBlock *
AsmParserContext::getBlockAtLocation(const FileLocRange &Query) const {
for (auto &[BB, Loc] : Blocks) {
if (Loc.contains(Query))
return BB;
}
return nullptr;
}

BasicBlock *AsmParserContext::getBlockAtLocation(const FileLoc &Query) const {
return getBlockAtLocation(FileLocRange(Query, Query));
}

Instruction *
AsmParserContext::getInstructionAtLocation(const FileLocRange &Query) const {
for (auto &[I, Loc] : Instructions) {
Member

Iterating over all instructions seems rather inefficient, e.g. if you're trying to get info towards the end of a large file. Can't we have a list sorted by file location and do a binary search?

Member

Maybe it doesn't matter if the lookup by function is more common. Do you have a link to code using this new class?

I'm not worried about performance for the LSP, optimizations can always be done later. Please don't interpret my review comments as negative towards this change, I am very excited to see this support land eventually.

if (Loc.contains(Query))
return I;
}
return nullptr;
}

Instruction *
AsmParserContext::getInstructionAtLocation(const FileLoc &Query) const {
return getInstructionAtLocation(FileLocRange(Query, Query));
}

bool AsmParserContext::addFunctionLocation(Function *F,
const FileLocRange &Loc) {
return Functions.insert({F, Loc}).second;
}

bool AsmParserContext::addBlockLocation(BasicBlock *BB,
const FileLocRange &Loc) {
return Blocks.insert({BB, Loc}).second;
}

bool AsmParserContext::addInstructionLocation(Instruction *I,
const FileLocRange &Loc) {
return Instructions.insert({I, Loc}).second;
}

} // namespace llvm
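The review thread in this file suggests keeping a list sorted by file location and using binary search instead of scanning every map entry. A minimal sketch of that idea follows, assuming the recorded ranges of one kind (for example, all instructions) do not overlap. `SortedLocIndex`, `Entry`, `Loc`, and the integer `Id` payload are hypothetical; the code shown in this diff uses the linear scan.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for FileLoc (0-based line and column).
struct Loc {
  unsigned Line, Col;
};
inline bool operator<(Loc A, Loc B) {
  return A.Line < B.Line || (A.Line == B.Line && A.Col < B.Col);
}

// One recorded semi-open range [Start; End) plus an illustrative payload.
struct Entry {
  Loc Start, End;
  int Id;
};

class SortedLocIndex {
  std::vector<Entry> Entries; // sorted by Start; ranges assumed disjoint

public:
  // Build once after parsing; sorting costs O(n log n).
  explicit SortedLocIndex(std::vector<Entry> E) : Entries(std::move(E)) {
    std::sort(Entries.begin(), Entries.end(),
              [](const Entry &A, const Entry &B) { return A.Start < B.Start; });
  }

  // O(log n) point lookup: the entry whose [Start; End) holds Query,
  // or nullptr when Query falls outside every recorded range.
  const Entry *find(Loc Query) const {
    // First entry that starts strictly after Query.
    auto It = std::upper_bound(
        Entries.begin(), Entries.end(), Query,
        [](Loc Q, const Entry &E) { return Q < E.Start; });
    if (It == Entries.begin())
      return nullptr; // Query precedes every range
    --It;             // last entry with Start <= Query
    return Query < It->End ? &*It : nullptr;
  }
};
```

Because blocks nest inside functions and instructions nest inside blocks, the disjointness assumption only holds per kind, so this approach would need one index each for functions, blocks, and instructions.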