proposal: go/scanner: add (*Scanner).Pos() #74958

@mateusz834

Proposal Details

Currently there is no easy and reliable way to access the End position of a token through go/scanner.

It can be done to some extent with:

pos, tok, lit := s.Scan()
tokLength := len(lit)
if !tok.IsLiteral() && tok != token.COMMENT {
	tokLength = len(tok.String())
}
tokEnd := pos + token.Pos(tokLength)

It looks correct, but it is not. There are a few issues:

  • len(lit) (line 2) is wrong for comments and raw string literals: carriage returns ('\r') are not included in the literal, so in such cases tokEnd is already wrong.
  • When a file ends (just before the EOF token) and impliedSemi==true, an artificial SEMICOLON token is emitted. Say you want to inspect the whitespace between tokens:
     import (
     	"go/scanner"
     	"go/token"
     	"strconv"
     	"testing"
     )
    
     func TestScanner(t *testing.T) {
     	const src = "package a; var a int"
    
     	file := token.NewFileSet().AddFile("", -1, len(src))
     	var s scanner.Scanner
     	s.Init(file, []byte(src), func(pos token.Position, msg string) {
     		panic("unreachable: " + msg)
     	}, scanner.ScanComments)
    
     	prevEndOff := 0
     	for {
     		pos, tok, lit := s.Scan()
     		t.Logf("%v %v %v", pos, tok, lit)
     		off := file.Offset(pos)
    
     		white := src[prevEndOff:off] // panics when tok == EOF
     		for _, c := range white {
     			switch c {
     			case ' ', '\t', '\n', '\r', '\ufeff':
     			default:
     				panic("unreachable: " + strconv.QuoteRune(c))
     			}
     		}
     		t.Logf("%q", white)
    
     		tokLength := len(lit)
     		if !tok.IsLiteral() && tok != token.COMMENT {
     			tokLength = len(tok.String())
     		}
     		prevEndOff = off + tokLength
    
     		if tok == token.EOF {
     			break
     		}
     	}
     }
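    This code panics because of the artificial SEMICOLON token: it is reported at offset len(src), so after adding len(";") prevEndOff becomes len(src)+1, and src[prevEndOff:off] is out of range once the EOF token arrives at offset len(src).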

To solve such problems and to simplify the logic, I propose adding the following new API to go/scanner:

package scanner // go/scanner

// Pos returns the current position in the source where the next Scan call
// will begin tokenizing.
// It also represents the end position of the previous token.
func (s *Scanner) Pos() token.Pos {
	return s.file.Pos(s.offset)
}
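
For illustration, here is how the whitespace loop from the test above might look with such a method. This is a sketch under assumptions, not part of the proposal: posScanner is a hypothetical interface (Scan plus the proposed Pos), and fakeScanner is a made-up stand-in that replays fixed token offsets so the code compiles and runs without the new method.

```go
package main

import (
	"fmt"
	"go/token"
)

// posScanner is a hypothetical interface: Scan as in go/scanner plus the
// proposed Pos method.
type posScanner interface {
	Scan() (pos token.Pos, tok token.Token, lit string)
	Pos() token.Pos
}

// gaps returns the source text between consecutive tokens. The end of each
// token comes from Pos, not from len(lit) or len(tok.String()).
func gaps(file *token.File, src string, s posScanner) []string {
	var out []string
	prevEnd := 0
	for {
		pos, tok, _ := s.Scan()
		out = append(out, src[prevEnd:file.Offset(pos)])
		if tok == token.EOF {
			return out
		}
		prevEnd = file.Offset(s.Pos())
	}
}

// fakeTok and fakeScanner are stand-ins for illustration only.
type fakeTok struct {
	start, end int
	tok        token.Token
}

type fakeScanner struct {
	file *token.File
	toks []fakeTok
	next int // index of the token the next Scan call returns
}

func (f *fakeScanner) Scan() (token.Pos, token.Token, string) {
	t := f.toks[f.next]
	f.next++
	return f.file.Pos(t.start), t.tok, ""
}

func (f *fakeScanner) Pos() token.Pos {
	if f.next == 0 {
		return f.file.Pos(0)
	}
	return f.file.Pos(f.toks[f.next-1].end)
}

// demoGaps runs gaps over "var a int" with hand-written token offsets.
func demoGaps() []string {
	const src = "var a int"
	file := token.NewFileSet().AddFile("", -1, len(src))
	s := &fakeScanner{file: file, toks: []fakeTok{
		{0, 3, token.VAR},
		{4, 5, token.IDENT},
		{6, 9, token.IDENT},
		{9, 9, token.SEMICOLON}, // artificial semicolon at EOF: zero width
		{9, 9, token.EOF},
	}}
	return gaps(file, src, s)
}

func main() {
	fmt.Printf("%q\n", demoGaps())
}
```

Because the artificial SEMICOLON ends where it starts, the slice taken for the EOF token stays in range and no special case is needed.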

CC @adonovan @findleyr

Labels: LibraryProposal, Proposal
