Enabling LSP to work with cpp2/cppfront #762

Oct 17, 2023

APokorny
Oct 17, 2023

Is there already a way to get a working LSP server that provides at least minimal completion hints for a cpp2 file?
I thought of maybe running cppfront on the buffer content and then sending the result to clangd.

Or, even better, is someone already cooking up a cppfront in clang?

MaxSagebaum · Oct 5, 2024

vanceism7
Oct 5, 2024

This would be really cool. To me, having an LSP server instantly legitimizes a language, +10!

I've never implemented an LSP server before, but I'm interested in trying to tackle something like this. I've started researching to see what it takes.

14 replies

MaxSagebaum Oct 13, 2024

I had a look at the specification. It seems simple enough that a self-implemented solution could be feasible. One question would be which language to choose for the server: Cpp2 or Cpp1? You proposed one other interesting idea: that cppfront should output all information about the source file in a JSON file. The server could then use this information. A short search did not provide any insight into whether such a framework exists. I also did not find a general framework for implementing a language server.

Herb specifically did not want to use any frameworks when implementing cppfront. Maybe this is also a way for the language server?

So what would your favorite way be?

cppfront -> details in json file -> server that gets information from the json file?
cppfront -> internal data structure -> server that works on the internal data structure?

vanceism7 Oct 13, 2024

Hmm, the choice of which language to use was one I was thinking about too. I think using c++ is the fun option, and cpp2 is the even funner option, but I don't know that it's really necessary. The easiest option would be to just use typescript - then it's just a matter of implementing the couple of LSP feature functions we need and we're done. All of this is already stubbed out in that link I shared/the example repo they provide; I copied that example and began modifying it to implement the cpp2 functionality I shared in my comment below.

Not that I'm necessarily attached to the idea of using typescript; perhaps there are reasons why c++ is the better option that I just haven't thought about.

But I think regarding your question of how we work with cppfront: whether to use json or the internal structure depends on which language we pick for the language server. I think we can only use the internal structure if we choose cpp/cpp2; otherwise we must use json (or some other serialized format) to interface with the data. Using json gives a nice interface that allows us more flexibility in what language we want to implement in.

So yea, those are my initial thoughts on it. What do you think?

MaxSagebaum Oct 14, 2024

From what Herb wrote below, I think the least invasive approach for now would be to have a to_json.h file, so that this can actually be merged into cppfront. Thinking about it, it has 3 advantages:

Writing out the first simple json items should be straightforward.
Writing the server in a separate project removes cppfront constraints, and libraries can more easily be used, e.g. a json parser.
It lets you (us) gather experience on what kind of data is actually required.

Migrating the server to cpp1/cpp2 should then be simpler, since we know what kind of data we need. The to_json.h could then be changed into a to_lsp.h, which produces the required data structure directly.

Would this be feasible?

vanceism7 Oct 15, 2024

Yea, if we think implementing the language server in c++ is a goal worth pursuing, then this is probably the way to go. Really I think we need to aggregate this data regardless of what path we choose. As I've been playing with the cppfront code, I looked at the debug_print code that gets triggered when you use the -debug flag, and you can see in that function that you basically have access to all of the compiler's diagnostics right there (namely source, tokens, parser, and sema).

cppfront/source/to_cpp1.h

Lines 7176 to 7195 in 68b716a

    
    auto debug_print() const
        -> void
    {
        //  Only create debug output files if we managed to load the source file.
        //
        if (source_loaded)
        {
            auto out_source  = std::ofstream{ sourcefile+"-source"  };
            source.debug_print( out_source );
            auto out_tokens  = std::ofstream{ sourcefile+"-tokens"  };
            tokens.debug_print( out_tokens );
            auto out_parse   = std::ofstream{ sourcefile+"-parse"   };
            parser.debug_print( out_parse );
            auto out_symbols = std::ofstream{ sourcefile+"-symbols" };
            sema.debug_print  ( out_symbols );
        }
    }

So I think the first step would just be to aggregate the data from these fields into something that is more easily consumable from another program. In particular, I think we need the names of all symbols defined in the file and all of the error definitions (which can be found in sema.errors). I've been poking at this a bit but haven't made too much progress yet.
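As a rough illustration of what "more easily consumable" could look like on the server side, here is a minimal TypeScript sketch. The JSON shape (`symbols`, `errors`, and every field inside them) is hypothetical and is not something cppfront emits. The sketch also assumes the compiler reports 1-based lines and columns, which have to be shifted because LSP positions are 0-based.

```typescript
// Hypothetical shape of a JSON file cppfront could emit. None of these
// field names come from cppfront itself.
interface Cpp2Symbol {
  name: string;
  kind: "variable" | "function" | "type" | "namespace";
  line: number;   // assumed 1-based
  column: number; // assumed 1-based
}

interface Cpp2Error {
  message: string;
  line: number;   // assumed 1-based
  column: number; // assumed 1-based
}

interface Cpp2Info {
  symbols: Cpp2Symbol[];
  errors: Cpp2Error[];
}

// Shape of an LSP diagnostic, reduced to the fields used here.
interface LspDiagnostic {
  message: string;
  severity: number; // 1 = Error in the LSP specification
  range: {
    start: { line: number; character: number };
    end: { line: number; character: number };
  };
}

// Parse the JSON text and convert each error to a 0-based LSP diagnostic
// that covers a single character at the reported position.
function toDiagnostics(jsonText: string): LspDiagnostic[] {
  const info = JSON.parse(jsonText) as Cpp2Info;
  return info.errors.map((e) => {
    const line = Math.max(0, e.line - 1);
    const character = Math.max(0, e.column - 1);
    return {
      message: e.message,
      severity: 1,
      range: {
        start: { line, character },
        end: { line, character: character + 1 },
      },
    };
  });
}
```

The symbol list is carried along unchanged here; it would feed completion and goto definition rather than diagnostics.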

MaxSagebaum Oct 15, 2024

This sounds like a good first step. If you need some help, just contact me.

Oct 7, 2024

hsutter
Oct 7, 2024
Maintainer

Thanks for the idea!

Info in case it helps... cppfront's linear structure is this, where each file depends only on the one immediately above it:

common.h
io.h                    Cpp1/Cpp2 source file i/o
lex.h                   Cpp2 tokenizer
parse.h                 Cpp2 parser
reflect.h2              Cpp2 reflection API (with cpp2regex.h)
sema.h                  Cpp2 semantic analysis
to_cpp1.h               Cpp2->Cpp1 lowering
cppfront.cpp            Driver

Each subset is intended to be standalone and reusable, so:

If you want a Cpp2 parser (only), you could try reusing parse.h and ignore what's below it.
If you want the reflection API too, you could go one step down and try reflect.h2 (use the generated reflect.h if you're using it from Cpp1 code).
If you want to actually generate parts of the Cpp1 code to show/analyze/pretty-format/etc., you could go down to to_cpp1.h (check out print_to_string which lowers to a std::string).
If you want the whole Cpp1 code, you could run the whole compiler and analyze/pretty-format/etc. the entire generated output .cpp file.

If you see that it "almost" works and you need something more that's not there, and it's super easy to add (especially if it's a lightweight PR, since code is appreciated) it could fit in #1287's Priority 1. Disclaimer: I don't currently have capacity to do larger-scale refactoring/reorganization or invasive changes though.

HTH!

0 replies

hsutter · Oct 11, 2024

vanceism7
Oct 11, 2024

I cobbled this together today

I had to (or rather, wanted to) add a flag to cppfront to output its error results as json, just so it was easier to parse the error reports without too much fuss. It's probably bad... but perhaps I'll open a PR

4 replies

hsutter Oct 11, 2024
Maintainer

Nice!

Re JSON: I don't have an issue with that, but just curious what other compilers do -- do they have switches for emitting errors as JSON, or do they do something else?

vanceism7 Oct 11, 2024

My experience with compilers is super limited, but IIRC, at least the PureScript guys did this with their compiler to make it easier to integrate tooling.

Actually, as I'm thinking about it more, I think they didn't just emit errors as json; a whole slew of compiler information was written as json. If that's the case, the PR I just opened may be too limited in scope, but maybe it's still a good starting point.

zaucy Oct 15, 2024

While not all of these are compilers, here are a few 'compiler-adjacent' tools that output a json stream (json objects separated by newlines). I'm sure there are plenty more, but these are the ones off the top of my head:

bazel query - --output=streamed_jsonproto (json to stdout)
bazel build - --build_event_json_file=/path/to/json (json to file)
rustc (compiler!) - --error-format=json (json to stderr)
fastbuild - -report=json (json to stdout/stderr)
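For anyone consuming such a stream, a minimal TypeScript sketch of splitting newline-separated json into parsed values is below. It assumes one complete json object per line; the function name and the choice to collect unparseable lines separately are illustrative, not taken from any of the tools listed.

```typescript
// Split a newline-delimited JSON stream into parsed values.
// Blank lines are skipped. Lines that are not valid JSON (for example plain
// text that a tool mixes into the same stream) are returned separately.
function parseJsonLines(text: string): { values: unknown[]; rejected: string[] } {
  const values: unknown[] = [];
  const rejected: string[] = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    try {
      values.push(JSON.parse(line));
    } catch {
      rejected.push(line);
    }
  }
  return { values, rejected };
}
```

Keeping the rejected lines around is a design choice: a tool that writes json to stderr may also print ordinary text there, and dropping it silently would hide useful output.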

JVApen Nov 13, 2024

Nice!

Re JSON: I don't have an issue with that, but just curious what other compilers do -- do they have switches for emitting errors as JSON, or do they do something else?

Clang has https://clang.llvm.org/docs/UsersManual.html#formatting-of-diagnostics
I believe they also support outputting SARIF, which is a JSON-based format.

MaxSagebaum · Oct 18, 2024

vanceism7
Oct 18, 2024

I've been working on this language server for a little bit now, and have managed to implement goto definition and autocompletion.

Figured I'd paste it here in case anyone wants to check it out. It's all currently in typescript; I've been using it as a test bed to work out what kinds of info I need out of cppfront.

https://github.com/vanceism7/ls-cpp2

10 replies

MaxSagebaum Jan 6, 2025

Ok, sounds good. I have never programmed in TypeScript, so you would probably be faster at doing it. Or should I have a look?

vanceism7 Jan 6, 2025

It's probably faster for me to do it. I'll see if I can give it a look tomorrow. But feel free to check it out if you want. Typescript is super easy, I'm sure you'd get the hang of it quickly haha

vanceism7 Jan 6, 2025

Ok - I pushed an update that implements the merging logic, so it should now do a better job of not losing symbols in the symbol table when errors occur in the document

MaxSagebaum Jan 8, 2025

It works quite well now, even on reflect.h2. There seems to be a subtle bug where the server loses the definitions again and you get only the text completion from VS Code. I do not know if this is because of the server or VS Code. I could not find a procedure to reproduce it consistently.

What is your plan for next steps? From my point of view two things are missing:

hints/completion for function arguments. Like name of the argument, different versions
completion for members of structures

This probably only concerns the generated json file, or are these special commands to the server? I will have some time in February when I could have a look.

vanceism7 Jan 8, 2025

What is your plan for next steps? From my point of view two things are missing:

hints/completion for function arguments. Like name of the argument, different versions

...
This probably only concerns the generated json file, or are these special commands to the server.

I think it's both for this one. When the type of symbol is something more sophisticated than a variable, e.g. a function or class, we need to augment the symbols in the json diagnostics with that information. During the check for the symbol type, I think sema may have more info available, such as function argument type info.

Once we have this info, we need to augment the server to read it in and include it in the language server's completion hint details. This is likely trivial compared to collecting the info.
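A minimal TypeScript sketch of that server-side step, assuming the json diagnostics were extended with parameter and return type info. The `SymbolInfo` fields (`params`, `returnType`, `type`) are hypothetical; only the numeric `CompletionItemKind` values (Function = 3, Variable = 6, Class = 7) come from the LSP specification.

```typescript
// Hypothetical symbol record as it might appear in the JSON diagnostics.
interface SymbolInfo {
  name: string;
  kind: "variable" | "function" | "type";
  type?: string;                             // for variables
  params?: { name: string; type: string }[]; // for functions
  returnType?: string;                       // for functions
}

// Subset of an LSP CompletionItem.
interface CompletionItem {
  label: string;
  kind: number;    // CompletionItemKind from the LSP specification
  detail?: string; // shown next to the label in the completion list
}

// Build the completion entry, using a Cpp2-style signature as the detail.
function toCompletionItem(sym: SymbolInfo): CompletionItem {
  switch (sym.kind) {
    case "function": {
      const params = (sym.params ?? [])
        .map((p) => `${p.name}: ${p.type}`)
        .join(", ");
      const ret = sym.returnType ? ` -> ${sym.returnType}` : "";
      return { label: sym.name, kind: 3, detail: `${sym.name}: (${params})${ret}` };
    }
    case "type":
      return { label: sym.name, kind: 7, detail: `${sym.name}: type` };
    default: {
      const item: CompletionItem = { label: sym.name, kind: 6 };
      if (sym.type) item.detail = `${sym.name}: ${sym.type}`;
      return item;
    }
  }
}
```

Overloads ("different versions" of a function) would simply appear as several `SymbolInfo` records with the same name, each producing its own completion item.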

completion for members of structures

Specifically regarding this one, I was looking into it, but from what I was seeing, cppfront doesn't seem to capture any symbol information for class/struct member variables. I opened an issue for that:
#1323

I don't know if it was a bug or intentional - it could be that the intention was just to defer those symbols to the cpp compiler.

There seems to be a subtle bug where the server loses the definitions again and you get only the text completion from VS Code.

I noticed a similar weird bug but, as you said, it wasn't consistent. It may have to do with the fact that cppfront is generating a new diagnostic file on every single key press; it could be overworking the file system with too much I/O. Adding a debouncer to the compile step that waits for 1-2 seconds of inactivity before recompiling the file might help here.

The other big thing is I wanted to start capturing symbols from the cpp compiler, but this info isn't published in the SARIF output of the compilers. I think it's output to the .obj files or something of that sort, so that part will take a bit more time to implement.
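The debouncer idea can be sketched in a few lines of TypeScript. This is a generic debounce, not code from ls-cpp2; the `Timers` interface is an addition made here so the behaviour can be checked without real delays.

```typescript
// Minimal timer interface so the debouncer can be exercised with fake timers.
interface Timers {
  set(callback: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (callback, ms) => setTimeout(callback, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a function that postpones `action` until `waitMs` milliseconds
// have passed without another call. Only the last call's arguments are used.
function debounce<A extends unknown[]>(
  action: (...args: A) => void,
  waitMs: number,
  timers: Timers = realTimers,
): (...args: A) => void {
  let pending: unknown = undefined;
  let hasPending = false;
  return (...args: A) => {
    // A newer call supersedes the one that is still waiting.
    if (hasPending) timers.clear(pending);
    hasPending = true;
    pending = timers.set(() => {
      hasPending = false;
      action(...args);
    }, waitMs);
  };
}
```

In the server this might be used as `const recompile = debounce((uri: string) => runCppfront(uri), 1000);`, where `runCppfront` is a hypothetical function that invokes the compiler, and `recompile` is called from the document-change handler.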

vanceism7 · Nov 2, 2024

JVApen
Nov 2, 2024

How do you see interactions between Cpp1 LSP (MS C/C++ or clangd) and Cpp2 LSP?
I was under the impression that a Cpp1 LSP could perfectly work with the converted sources. At least for things like auto-complete, that would be sufficient.

However, if I start a rename operation in my Cpp1 LSP, it would suggest renaming the callers in the preprocessed code, not the Cpp2 code. As such, after the rename, everything seems to be working until the generated file gets regenerated. At which point, your build is broken.

Vice versa, I see auto-complete in Cpp2 only usable if it can give suggestions of functions defined in pure Cpp1. A rename started from Cpp2 code that is used in Cpp1 code has the exact same problem.

Note: this problem holds for all C++ successors, though Cpp2 seems to be the first to reach this point. Actually, it holds for all C++ code generators, for example lex/yacc. For any successor, you expect that over some time it covers sufficient code that this problem can no longer be ignored.

In order to solve this, you either need 1 LSP that understands both Cpp1 and Cpp2 or you should find a good way to have them collaborate.

Semantic code rewrites (like clang-tidy) might be an even harder problem to solve. Having multiple successors in a single codebase will be even more challenging.

5 replies

vanceism7 Nov 2, 2024

How do you see interactions between Cpp1 LSP (MS C/C++ or clangd) and Cpp2 LSP?
...
In order to solve this, you either need 1 LSP that understands both Cpp1 and Cpp2 ...

This is basically the answer, at least in my mind. From what I've learned so far working on this language server, there's really no way to make the server fully functional without incorporating c++ diagnostics. Trying to interface two language servers together also seems like tackling the problem at the wrong abstraction level.

I did initially explore this problem from the angle of trying to extend something like clangd to understand cpp2. It intuitively feels like it wouldn't be that hard, but after some discussion with the llvm folks, it appeared that this approach requires non-trivial work going directly into clang itself. That's bad because it's hard and because a cpp2 language server shouldn't be dependent on a specific compiler.

So in the end, I think the right approach is to pull diagnostics from cppfront, and then compile the file and pull diagnostics from the user's chosen c++ compiler. As far as I've read, all of the major c++ compilers do have the ability to output diagnostics in a way that can be consumed by external tools - I assume this is how the other c++ language servers work. The cpp2 server will also need to manage mappings from cpp1 code to cpp2 so we can consume the cpp1 diagnostics correctly. Generating and consuming the cpp1 data is the next major piece I'll be looking at soon.

JVApen Nov 3, 2024

I saw that discussion and responded in it, though it felt like it stopped without a clear track forward.

Consuming the Cpp1 diagnostic and mapping it back to Cpp2 seems indeed like the most reasonable approach. I'm wondering if this is something you can do for most of the LSP calls.

For example: syntax highlighting of Cpp1 code can stay unmodified, except for the location; as soon as it reaches Cpp2, you include your own code. Fixing the offsets seems quite possible.

For refactorings (rename, tidy, formatting), you most likely need to translate changes to the generated code and map it back to Cpp2. Especially decomposing the changes will be challenging here.

For auto-complete you might be able to first map your cursor onto the generated code and pass it along after which you can convert back. The only annoying thing here (and any LSP) is that you should be able to deal with incomplete code. In case of Cpp2, that would imply generating broken Cpp1 code first.
The same will most likely hold for hover.

As far as I'm aware, there are only 1-3 calls where the full source code is sent to the server; all others use a location in the previously known code. So if you want to do preprocessing there, it should be possible. For all other requests, you should mainly be able to map the location for the Cpp1 LSP, and do the reverse mapping and code rewrite on responses.

The background indexing is most likely going to be the most challenging as it will find preprocessed files without going through your wrapper.

The more I think about this, the more feasible it seems to write a wrapper around the whole LSP and let the Cpp1 LSP handle the majority of the work, even for Cpp2 syntax. Although getting all details right is going to be challenging. This should in theory be LSP unaware, though every LSP has its own extensions.
It still sounds more complex than teaching Clang to parse Cpp2, though even in that case you need to somehow give responses in Cpp2 syntax.

If this works out, a more integrated plugin approach can be introduced in clangd and other LSPs, which might also work for lex/yacc and other code generators. Maybe some generic metadata can simplify it even more if we have more of them. Though that's something to only worry about once something works.

JVApen Nov 3, 2024

I'm not convinced that you should be using every compiler as a backend for your LSP. Even Microsoft uses a different parser for IntelliSense than for the compiler. It also doesn't change when you compile with GCC for Linux or Clang for any platform.

So it seems reasonable to focus on a single LSP first. If you don't want to wrap completely, you can hack into your own clone of clangd. All requests come in here: https://github.com/llvm/llvm-project/blob/229abcd459dc365201185aa3988b9f8ae455de76/clang-tools-extra/clangd/ClangdLSPServer.cpp#L1659

I wish I had the time to actually assist in coding.

vanceism7 Nov 13, 2024

I've been thinking about what you said here. I think you're right. I just realized that going down the path of integrating diagnostics from c++ compilers is essentially writing a language server for c++. It seems like the easier approach would be to clone a c++ language server and augment it with cpp2, rather than the reverse, otherwise we're duplicating a lot of effort!

Thanks for the reference to the clangd code. I was poking around on that project a bit but hadn't really made sense of it yet. The link you gave provides a nice starting point.

vanceism7 Nov 19, 2024

I have some good news to report regarding the concern of building for multiple compilers - it actually looks like Microsoft has been working on a diagnostics standard that will allow for easier consumption from multiple compilers (as well as other dev tooling). The search took me a while, but whilst trying to figure out how to get compiler results in json from the major compilers (msvc, clang, and gcc), I discovered that the json-like format they all share in common is something called SARIF.

Even though it's duplicate work, creating a language server that pulls its diagnostic results directly from the compiler could be beneficial for giving more accurate error messages. (I've already noticed that clangd and microsoft's c++ ls both have issues where they report errors that aren't really there.) As a bonus, it also makes the language server more flexible, since it can work with whatever compiler the user chooses to use. So yea, perhaps even though it's duplicating effort in some sense, it could be worth it since SARIF standardizes results (hopefully meaning we only need to duplicate the effort once, and then it works for all SARIF-compliant compilers afterwards). Gonna experiment with stuff more.

I just noticed you mentioned SARIF in your other response up above. Just for general info for anyone reading, each compiler can output SARIF as follows:

clang++ -fdiagnostics-format=sarif file.cpp
cl.exe /EHsc /experimental:log .\sarif-output-name .\file.cpp
g++ -fdiagnostics-format=sarif-file file.cpp
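Once a SARIF log is on disk, reading it is mostly a matter of walking `runs[].results[]`. The TypeScript sketch below covers only the small subset of SARIF 2.1.0 needed for basic error reporting; the `FlatDiagnostic` type and the choice to use only the first location of each result are simplifications made here, and real compiler output may need more handling.

```typescript
// The subset of SARIF 2.1.0 used below. Real logs carry many more fields.
interface SarifLog {
  runs?: {
    results?: {
      level?: string;
      message?: { text?: string };
      locations?: {
        physicalLocation?: {
          artifactLocation?: { uri?: string };
          region?: { startLine?: number; startColumn?: number };
        };
      }[];
    }[];
  }[];
}

interface FlatDiagnostic {
  file: string;
  line: number;   // 1-based, as in SARIF
  column: number; // 1-based, as in SARIF
  level: string;
  message: string;
}

// Flatten every result of every run into a simple list of diagnostics.
function flattenSarif(sarifText: string): FlatDiagnostic[] {
  const log = JSON.parse(sarifText) as SarifLog;
  const out: FlatDiagnostic[] = [];
  for (const run of log.runs ?? []) {
    for (const result of run.results ?? []) {
      // Simplification: only the first reported location is used.
      const loc = result.locations?.[0]?.physicalLocation;
      out.push({
        file: loc?.artifactLocation?.uri ?? "",
        line: loc?.region?.startLine ?? 1,
        column: loc?.region?.startColumn ?? 1,
        level: result.level ?? "warning", // SARIF's default when level is absent
        message: result.message?.text ?? "",
      });
    }
  }
  return out;
}
```

The resulting list still refers to the generated c++ file, so it needs to be mapped back to the cpp2 source before being shown in the editor.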

Nov 21, 2024

vanceism7
Nov 21, 2024

A new update. All of the basic cpp2 diagnostics themselves are working, and I've just managed today to augment them with the standard c++ compiler errors now too. Although the diagnostics from the c++ compiler are a little off... At this point, we need a better mapping between the generated c++ code and the cpp2 code in order to show more accurate error locations/reports.

The bulk of the work is taken care of automatically thanks to the #line pragmas, but the difference in syntax causes the column number to be off a bit. I'm not sure if we're capturing line/column numbers for symbols in cpp2 code and c++ code (or at least just the columns), but I think if cppfront was capturing this information, it'd enable us to get a pretty flawless translation between c++ and cpp2 code for error reporting.
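To show what the #line pragmas already provide, here is a minimal TypeScript sketch that reconstructs, for each line of a generated file, the source location implied by the preceding `#line` directive. It handles lines only; the column drift described above would need extra information from cppfront. The function and type names are illustrative, and the sketch assumes directives of the form `#line N "file"` or `#line N`.

```typescript
interface SourceLocation {
  file: string;
  line: number; // 1-based
}

// For each line of the generated text (index 0 = line 1), the location
// implied by #line directives, or null when no directive applies yet or
// when the line is itself a directive.
function mapGeneratedLines(generated: string): (SourceLocation | null)[] {
  const directive = /^\s*#\s*line\s+(\d+)(?:\s+"([^"]*)")?/;
  const result: (SourceLocation | null)[] = [];
  let current: SourceLocation | null = null;
  for (const text of generated.split(/\r?\n/)) {
    const m = directive.exec(text);
    if (m) {
      // The line after the directive counts as line N of the named file.
      // Without a file name, the previous file stays in effect.
      result.push(null);
      const file: string = m[2] ?? (current ? current.file : "");
      current = { file, line: Number(m[1]) };
    } else {
      result.push(current ? { file: current.file, line: current.line } : null);
      if (current) current = { file: current.file, line: current.line + 1 };
    }
  }
  return result;
}
```

A compiler that honours `#line` already reports the remapped line, so a table like this is mainly useful for going the other way, or for tools that report positions in the generated file.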

0 replies
