Semantic Highlight Guide

Semantic highlighting is an addition to syntax highlighting as described in the Syntax Highlight Guide. Visual Studio Code uses TextMate grammars as the main tokenization engine. TextMate grammars work on a single file as input and break it up based on lexical rules expressed in regular expressions.

Semantic tokenization allows language servers to provide additional token information based on the language server's knowledge of how to resolve symbols in the context of a project. Themes can opt in to using semantic tokens to improve and refine the syntax highlighting from grammars. The editor applies the highlighting from semantic tokens on top of the highlighting from grammars.

Here's an example of what semantic highlighting can add:

Without semantic highlighting:

[Screenshot: without semantic highlighting]

With semantic highlighting:

[Screenshot: with semantic highlighting]

Notice the color differences that come from the language service's understanding of symbols:

  • line 10: languageMode is colored as a parameter.
  • line 11: Range and Position are colored as classes and document as a parameter.
  • line 13: getFoldingRanges is colored as a function.

Semantic token provider

To implement semantic highlighting, language extensions can register a semantic token provider by document language and/or file name. The editor will make requests to the providers when semantic tokens are needed.

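import * as vscode from 'vscode';

const tokenTypes = ['class', 'interface', 'enum', 'function', 'variable'];
const tokenModifiers = ['declaration', 'documentation'];
const legend = new vscode.SemanticTokensLegend(tokenTypes, tokenModifiers);

const provider: vscode.DocumentSemanticTokensProvider = {
  provideDocumentSemanticTokens(
    document: vscode.TextDocument
  ): vscode.ProviderResult<vscode.SemanticTokens> {
    // analyze the document and return semantic tokens
    const tokensBuilder = new vscode.SemanticTokensBuilder(legend);
    // placeholder token for illustration: on line 1, characters 1-5 are a class declaration
    tokensBuilder.push(
      new vscode.Range(new vscode.Position(1, 1), new vscode.Position(1, 5)),
      'class',
      ['declaration']
    );
    return tokensBuilder.build();
  }
};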

const selector = { language: 'java', scheme: 'file' }; // register for all Java documents from the local file system

vscode.languages.registerDocumentSemanticTokensProvider(selector, provider, legend);

The semantic token provider API comes in two flavors to accommodate a language server's capabilities:

  • DocumentSemanticTokensProvider - Always takes a full document as input.

    • provideDocumentSemanticTokens - Provides all tokens of a document.
    • provideDocumentSemanticTokensEdits - Provides all tokens of a document as a delta to the previous response.
  • DocumentRangeSemanticTokensProvider - Works only on a range.

    • provideDocumentRangeSemanticTokens - Provides all tokens of a document range.
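The delta variant is easiest to picture on the flat integer array that carries a semantic tokens result. The sketch below assumes each edit has the fields start, deleteCount, and data (the fields of vscode.SemanticTokensEdit) and that every edit refers to offsets in the previous array. TokenEdit and applyTokenEdits are hypothetical names used only for illustration; the editor performs this step itself, so an extension would not normally need to.

```typescript
// Illustrative shape of one edit, mirroring the fields of vscode.SemanticTokensEdit.
interface TokenEdit {
  start: number;        // offset into the previous integer array
  deleteCount: number;  // number of integers to remove at that offset
  data?: number[];      // integers to insert at that offset, if any
}

// Apply a set of edits to the previous result and return the new array.
// Assumption: edits do not overlap and all offsets refer to `previous`.
function applyTokenEdits(previous: number[], edits: TokenEdit[]): number[] {
  const result = previous.slice();
  // Work from the highest offset down so earlier offsets stay valid.
  const ordered = edits.slice().sort((a, b) => b.start - a.start);
  for (const edit of ordered) {
    const inserted = edit.data ? edit.data : [];
    result.splice(edit.start, edit.deleteCount, ...inserted);
  }
  return result;
}

// Two tokens on line 2; a new line inserted above shifts the first token to line 3.
const previousData = [2, 5, 3, 0, 1, 0, 5, 4, 3, 0];
const nextData = applyTokenEdits(previousData, [{ start: 0, deleteCount: 1, data: [3] }]);
console.log(nextData.join(','));
```

Because token positions are stored relative to the preceding token, a one-line shift changes a single integer here, which is what makes the delta response compact.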

Each token returned by the provider comes with a classification that consists of a token type, any number of token modifiers, and a token language. This information is similar to the TextMate scopes described in the Syntax Highlight Guide, but has its own dedicated, cleaner classification system.

As seen in the example above, the provider names the types and modifiers it is going to use in a SemanticTokensLegend. That allows the provide APIs to return token types and modifiers as indices into the legend.
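To make the role of the legend concrete, here is a minimal sketch of how tokens can be turned into legend indices. It assumes the usual wire format of five integers per token (line delta, start character delta, length, token type index, modifier bit set). RawToken, encodeType, encodeModifiers, and encodeTokens are hypothetical names for this example only; in an extension, vscode.SemanticTokensBuilder takes care of this encoding.

```typescript
// Hypothetical legend, mirroring the one in the provider example.
const legendTypes = ['class', 'interface', 'enum', 'function', 'variable'];
const legendModifiers = ['declaration', 'documentation'];

// A token as a language analysis might report it (illustrative shape, not a VS Code type).
interface RawToken {
  line: number;       // zero-based line
  startChar: number;  // zero-based start character
  length: number;
  tokenType: string;
  modifiers: string[];
}

// A token type is encoded as its index in the legend.
function encodeType(tokenType: string): number {
  const index = legendTypes.indexOf(tokenType);
  if (index < 0) {
    throw new Error(`Token type not in legend: ${tokenType}`);
  }
  return index;
}

// Modifiers are encoded as a bit set: bit i is set when legendModifiers[i] applies.
function encodeModifiers(modifiers: string[]): number {
  let bits = 0;
  for (const modifier of modifiers) {
    const index = legendModifiers.indexOf(modifier);
    if (index < 0) {
      throw new Error(`Token modifier not in legend: ${modifier}`);
    }
    bits |= 1 << index;
  }
  return bits;
}

// Emit five integers per token, with positions relative to the previous token.
// Assumption: `tokens` is already sorted by position.
function encodeTokens(tokens: RawToken[]): number[] {
  const data: number[] = [];
  let prevLine = 0;
  let prevChar = 0;
  for (const token of tokens) {
    const deltaLine = token.line - prevLine;
    const deltaChar = deltaLine === 0 ? token.startChar - prevChar : token.startChar;
    data.push(deltaLine, deltaChar, token.length,
      encodeType(token.tokenType), encodeModifiers(token.modifiers));
    prevLine = token.line;
    prevChar = token.startChar;
  }
  return data;
}

const encoded = encodeTokens([
  { line: 2, startChar: 5, length: 3, tokenType: 'class', modifiers: ['declaration'] },
  { line: 2, startChar: 10, length: 4, tokenType: 'function', modifiers: [] },
  { line: 5, startChar: 2, length: 7, tokenType: 'variable', modifiers: ['declaration', 'documentation'] }
]);
console.log(encoded.join(','));
// prints 2,5,3,0,1,0,5,4,3,0,3,2,7,4,3
```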

Semantic token classification

These are the standard semantic token types and semantic token modifiers predefined by VS Code.

The standard types and modifiers cover common concepts used by many languages. While each language might use different terminology for some types and modifiers, adhering to the standard classifications makes it possible for theme authors to define theming rules that work across languages.

Standard semantic token types:

  • namespace
  • type, class, enum, interface, struct, typeParameter
  • parameter, variable, property, enumMember, event
  • function, member, macro
  • label
  • comment, string, keyword, number, regexp, operator

Standard semantic token modifiers:

  • declaration
  • readonly, static, deprecated, abstract
  • async, modification, documentation, defaultLibrary

If necessary, extensions can define new types and modifiers or create subtypes of existing types through the semanticTokenTypes and semanticTokenModifiers contribution points.

{
  "contributes": {
    "semanticTokenTypes": [
      {
        "id": "templateType",
        "superType": "type",
        "description": "A template type."
      }
    ],
    "semanticTokenModifiers": [
      {
        "id": "native",
        "description": "Annotates a symbol that is implemented natively"
      }
    ]
  }
}

A contributed type can name a super type from which it will inherit all styling rules.
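The inheritance can be pictured as a walk up the super type chain until a styling rule is found. This is only a sketch of the idea, not VS Code's implementation: superTypeOf, typeRules, resolveTypeColor, and the color value are all hypothetical and exist only for this example.

```typescript
// Hypothetical data: one contributed type with a super type, and one theming rule.
const superTypeOf: Record<string, string | undefined> = {
  templateType: 'type'
};
const typeRules: Record<string, string | undefined> = {
  type: '#4EC9B0' // illustrative color, not taken from any particular theme
};

// Look for a rule on the token type itself, then on each super type in turn.
function resolveTypeColor(tokenType: string): string | undefined {
  let current: string | undefined = tokenType;
  const visited: string[] = []; // guards against accidental cycles
  while (current !== undefined && visited.indexOf(current) < 0) {
    const color = typeRules[current];
    if (color !== undefined) {
      return color;
    }
    visited.push(current);
    current = superTypeOf[current];
  }
  return undefined;
}

console.log(resolveTypeColor('templateType')); // prints #4EC9B0 (inherited from type)
```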

Theming

Theming is about assigning colors and styles to tokens. Theming rules are specified in color themes, but users can customize the theming rules in the user settings.

Using the semanticHighlighting setting, a color theme can tell the editor whether semantic tokens should be shown or not.

If enabled, semantic tokens are first matched against the semantic token rules defined in semanticTokenColors:

{
  "semanticTokenColors": {
    "variable.readonly:java": "#ff0000"
  }
}

variable.readonly:java is called a selector and has the form (*|tokenType)(.tokenModifier)*(:tokenLanguage)?.
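The selector form can be read as a small grammar. The following sketch parses a selector and checks it against a token, under simplifying assumptions: identifiers consist of word characters and hyphens, and super types and rule ranking are ignored. ParsedSelector, TokenInfo, parseSelector, and selectorMatches are hypothetical names, not part of the VS Code API.

```typescript
interface ParsedSelector {
  tokenType: string;     // '*' matches any token type
  modifiers: string[];   // all of these must be present on the token
  language?: string;     // when set, must equal the token's language
}

// Illustrative token shape for this example.
interface TokenInfo {
  tokenType: string;
  modifiers: string[];
  language: string;
}

// Parse (*|tokenType)(.tokenModifier)*(:tokenLanguage)?
function parseSelector(text: string): ParsedSelector | undefined {
  const match = /^(\*|[\w-]+)((?:\.[\w-]+)*)(?::([\w-]+))?$/.exec(text);
  if (!match) {
    return undefined;
  }
  // ".a.b" splits into ['', 'a', 'b'], so drop the leading empty entry.
  const modifiers = match[2] ? match[2].split('.').slice(1) : [];
  return { tokenType: match[1], modifiers: modifiers, language: match[3] };
}

function selectorMatches(parsed: ParsedSelector, token: TokenInfo): boolean {
  if (parsed.tokenType !== '*' && parsed.tokenType !== token.tokenType) {
    return false;
  }
  if (parsed.language !== undefined && parsed.language !== token.language) {
    return false;
  }
  for (const modifier of parsed.modifiers) {
    if (token.modifiers.indexOf(modifier) < 0) {
      return false;
    }
  }
  return true;
}

const readonlyJava = parseSelector('variable.readonly:java');
const javaConstant: TokenInfo = {
  tokenType: 'variable',
  modifiers: ['readonly', 'declaration'],
  language: 'java'
};
console.log(readonlyJava !== undefined && selectorMatches(readonlyJava, javaConstant)); // prints true
```

A token may carry more modifiers than the selector names; the selector only requires that the ones it lists are present.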

Here are other examples of selectors and styles:

  • "*.declaration": { "fontStyle": "bold" } // all declarations are bold
  • "class:java": { "foreground": "#00ff00", "fontStyle": "bold" } // classes in Java

If no rule matches, VS Code uses the Semantic Token Scope Map to evaluate a TextMate scope for the given semantic token. That scope is matched against the TextMate theming rules in tokenColors.

Semantic token scope map

To make semantic highlighting also work for themes that have not defined any specific semantic rules, and to serve as a fallback for custom token types and modifiers, VS Code maintains a map from semantic token selectors to TextMate scopes.

If a theme has semantic highlighting enabled, but does not contain a rule for the given semantic token, these TextMate scopes are used to find a TextMate theming rule instead.

The following table shows the predefined mappings.

Semantic Token Selector           Fallback TextMate Scope
namespace                         entity.name.namespace
type                              entity.name.type
type.defaultLibrary               support.type
struct                            storage.type.struct
class                             entity.name.type.class
class.defaultLibrary              support.class
interface                         entity.name.type.interface
enum                              entity.name.type.enum
function                          entity.name.function
function.defaultLibrary           support.function
member                            entity.name.function.member
macro                             entity.name.other.preprocessor.macro
variable                          variable.other.readwrite, entity.name.variable
variable.readonly                 variable.other.constant
variable.readonly.defaultLibrary  support.constant
parameter                         variable.parameter
property                          variable.other.property
property.readonly                 variable.other.constant.property
enumMember                        variable.other.enummember
event                             variable.other.event
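As a rough illustration of how such a map could be consulted, the sketch below looks up fallback scopes for a token using a few rows from the table. The rule that the entry with the most matching modifiers wins is an assumption made for this example, not a description of VS Code's actual algorithm. ScopeMapping, scopeMappings, and findFallbackScopes are hypothetical names.

```typescript
interface ScopeMapping {
  tokenType: string;
  modifiers: string[];
  scopes: string[];
}

// A few rows from the predefined mappings (not the full map).
const scopeMappings: ScopeMapping[] = [
  { tokenType: 'variable', modifiers: [], scopes: ['variable.other.readwrite', 'entity.name.variable'] },
  { tokenType: 'variable', modifiers: ['readonly'], scopes: ['variable.other.constant'] },
  { tokenType: 'variable', modifiers: ['readonly', 'defaultLibrary'], scopes: ['support.constant'] },
  { tokenType: 'function', modifiers: [], scopes: ['entity.name.function'] },
  { tokenType: 'function', modifiers: ['defaultLibrary'], scopes: ['support.function'] }
];

// Assumed ranking: among entries whose type matches and whose modifiers are
// all present on the token, prefer the entry that names the most modifiers.
function findFallbackScopes(tokenType: string, modifiers: string[]): string[] | undefined {
  let best: ScopeMapping | undefined;
  for (const mapping of scopeMappings) {
    if (mapping.tokenType !== tokenType) {
      continue;
    }
    const applies = mapping.modifiers.every(m => modifiers.indexOf(m) >= 0);
    if (!applies) {
      continue;
    }
    if (best === undefined || mapping.modifiers.length > best.modifiers.length) {
      best = mapping;
    }
  }
  return best ? best.scopes : undefined;
}

console.log(findFallbackScopes('variable', ['readonly', 'declaration']));
// prints [ 'variable.other.constant' ]
```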

This map can be extended by new rules through the semanticTokenScopes contribution point.

There are two use cases for extensions to do that:

  • The extension that defines custom token types and token modifiers provides TextMate scopes as a fallback for when a theme does not define a theming rule for the added semantic token types or modifiers:

    {
      "contributes": {
        "semanticTokenScopes": [
          {
            "scopes": {
              "templateType": ["entity.name.type.template"]
            }
          }
        ]
      }
    }
  • The provider of a TextMate grammar can describe the language-specific scopes. That helps with themes that contain language-specific theming rules.

    {
      "contributes": {
        "semanticTokenScopes": [
          {
            "language": "typescript",
            "scopes": {
              "property.readonly": ["variable.other.constant.property.ts"]
            }
          }
        ]
      }
    }

Try it out

We have a Semantic Tokens sample that illustrates how to create a semantic token provider.

The scope inspector tool allows you to explore which semantic tokens are present in a source file and which theme rules they match. To see semantic tokens, use a built-in theme (for example, Dark+) on a TypeScript file.