-
-
Notifications
You must be signed in to change notification settings - Fork 58
Add WithEnvironment mixin for visitors #157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add WithEnvironment mixin for visitors #157
Conversation
a1d5501
to
dca699a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll also need to handle regex captures. Pattern matching breaks down to var_field
so we already have that.
On top of that, we'll need tons of testing.
f6e0c2e
to
c273f33
Compare
I added a few test scenarios. Also, the previous style using I tried adding support for the named captures in a regex, but we only get a regexp_literal node with a string node inside. Do you think we should try to extract the names and build a location for the captures? |
c273f33
to
c9dca3a
Compare
attr_reader :type | ||
|
||
# [Array[Location]] The locations of all occurrences of this local, | ||
# including its definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to track the definition of a local separately, so that tools can use that information? For example, that would allow Ruby LSP to implement Go to Definition
for local variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense. It would always be the first location in the array. Do you think we should store it separately or just provide a method that returns the first location? Something like
def definition_location
@locations.first
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it wouldn't always be the first necessarily.
a = 1 if a
In this case the var_ref is found before the var_field.
I think I'd definitely like to track the definition and usages separately. They're different types usually anyway, since all of the definitions will be *Field
nodes, and the usages will be VarRef
nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. What would you expect for locals that are assigned multiple times or conditionally? E.g.:
# Which one of these are stored as the definition location? Both?
if something
a = 1
else
a = 2
end
# ...
# How about this one? Do we also include it as a definition location?
a = 5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof that's hard. I would say it would be an array of the first and second, but that's going to get into some pretty in-depth flow analysis that maybe we shouldn't do. How about just separating the assignments from the references?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that makes sense, especially as a first iteration. Maybe in the future we can take a stab at more complex analysis.
The one thing that can be slightly odd is for arguments. Should we include their definitions together with re-assignments? It might be fine as long as we document it and name the array something like definitions
and not assignments
.
def foo(a) # This would be a definition
something(a) # This would be a usage
a = 123 # This would be another "definition"
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah definitions
makes a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I split usages and definitions in the latest commit.
I also refactored the tests a little bit to check what is saved in the environment, rather than saving the nodes. This uncovered a couple of bugs that I fixed.
The one thing that it uncovered and I'm not sure what we could do is that rest variables in pinning are considered VCall
, not a VarRef
.
I think we had spoken about this a while back and you mentioned this comes from Ripper itself, so we may just have to leave it as is for now.
def foo
case [1, 2]
in ^a, *rest # => this is a var_field and gets put in definitions
puts rest # => this is considered a vcall and is ignored
end
end
c9dca3a
to
2eccaec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation looks good to me, but at the expense of coming in last minute with a design decision, I would like to ask: Is this best exposed as an optional visitor level feature?
Here is my thinking: If this is a feature that will collect locals per visitor, then for an LSP implementation, each visitor for each request would collect the locals all over again separately. This means wasted work for visitors that keep working on the same tree.
Instead if it was possible to decorate the syntax tree returned from the parse operation (either via calling a parse_with_locals
method or by passing the parse
result through another operation), then each visitor can operate over the collected locals without doing extra work. The LSP server could even cache that tree with locals between changes.
@vinistock and @kddnewton: Do think this this makes sense and is feasible?
We explored subclassing the parser to create a
class MyLspRequest < BaseRequest
def visit_var_ref(node)
# With the current implementation, the current_environment is always the relevant one
local = current_environment.find_local(node.value)
# If the parser builds the environment structure, how do you access it from here?
# We thought about keeping the information inside the node classes. Something like `node.occurrences`
# Alex pointed out that this may be brittle, especially if the AST is transformed in any way
end
end It would be sweet to do this in the parser, but we couldn't figure out a nice way of accessing the right environment. |
Yeah it would definitely be hard. Since the SAX-style ripper is bottom-up, you'd probably have to keep all assignments and references around while the nodes are being built until you get to various breakpoints. So for example, if you had: class Foo
def foo
a = 1
a
end
end you'd need to: init --> create an empty array of assignments and references Note that the pushing and popping from the stack is already done, since that's how location is determined. Also, referencing a list of things is also done, since that's how non-inline comments are attached to blocks of statements. So it's not impossible, but it's also not necessarily easy. Let me know what you'd like to do. I'd be okay keeping it in its current form, but if you'd like to attempt doing it with the parser, I do think that'd be nice. |
Also, regardless of the way we go, can you include some documentation for this in the README? |
f6d02c6
to
7030af0
Compare
Co-authored-by: Alexandre Terrasa <Morriar@users.noreply.github.com> Co-authored-by: Stan Lo <st0012@users.noreply.github.com> Co-authored-by: Emily Samp <egiurleo@users.noreply.github.com> Co-authored-by: Kaan Ozkan <KaanOzkan@users.noreply.github.com> Co-authored-by: Adison Lampert <adisonlampert@users.noreply.github.com> Co-authored-by: Dirceu Tiegs <dirceu@users.noreply.github.com>
7030af0
to
7003178
Compare
Added some documentation to the README. We attempted to get the implementation of this working in the parser, but it was a bit messier. I'd say, we go with this implementation for now and if we manage to get it working nicely with the parser, we can switch. |
Add a mixin to automatically keep track of local variables and arguments defined in the current environment for visitors. This module is very useful when trying to determine all occurrences of a particular local or its type, given a specific scope in the code.
Implementation
The implementation has two parts: the
Environment
and theWithEnvironment
module.Environment
class keeps track of what locals exist in a given environment. It also keeps a reference to an optional parent environment, since we need to go up the chain to determine whether a local is defined or notWithEnvironment
module can be included in classes inheriting fromVisitor
to override some visits methods and automatically keep track of variablesExample usage
Questions
Did we cover all possible local variable occurrences? Maybe we're missing pattern matching with variable pinning.
Notes
We explored populating the environment during parsing, which is definitely possible. However, it makes accessing the current environment quite tricky, as parsing is completely decoupled from visits.