Skip to content

Conversation

@CelebrateVC
Copy link
Contributor

Thank you for submitting this pull request! We appreciate you spending the time to work on these changes.

What is the motivation?

In #662 Charlie was asking if globset could speed up runtimes of file matching. After discussing a test strategy for the use case was decided.

What does this change do?

swaps out glob for globset entirely, Strips code of FilePattern as those are now compiled Globs which instead of being stored in Vec<FilePattern>s are grouped together in a GlobSet. is_excluded is modified to reflect this change and now takes in GlobSets.

What is your testing strategy?

Compiled the base image using glob and compiled altered code, per Contributing guide, benchmarked using hyperfine. Based on the discussion in #662 kneecapped the linter.rs#lint_path function to immediately return default Diagnostic, in order to exclusively test path discovery. below are current code state benchmarks.

image

Is this related to any issues?

#662 - as above mentioned

Have you read the Contributing Guidelines?

Signed-off-by: Simon (Desktop) <[email protected]>
Moving forward Vec<FilePattern> or similar should be compiled into
a GlobSet, still need builder function `from_user` to create patterns
for both filenames and fully specified directories

Signed-off-by: Simon (Desktop) <[email protected]>
everything is now considered complex,
so `exclude_simple` would never trigger, ditto ditto for extend_exclude

Signed-off-by: Simon (Desktop) <[email protected]>
Signed-off-by: Simon (Desktop) <[email protected]>
Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It seems like the code both got simpler and there are meaningful performance gains here (I'm seeing a 10ms difference on the benchmark on my machine, which is a > 3% speed-up).

.map(|per_file_ignore| {
(
Exclusion::from_file_pattern(per_file_ignore.pattern),
Exclusion::from_file_pattern(&per_file_ignore.pattern),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove Exclusion and just inline the string, now that basename and absolute are identical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc32661 re-split basename and absolute, we should continue to use Exclusions

.map(Exclusion::from_file_pattern)
.collect(),
exclude: configuration.exclude,
extend_exclude: configuration.extend_exclude,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to map these to strings somehow -- the current output of cargo run resources/test/fixtures/ --show-settings isn't very user-friendly:

Screen Shot 2022-11-22 at 10 10 44 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points all around, I'll go over everything you have shared and take your advice on improvements, this especially should absolutely be cleaned up to something readable ( all those numbers are character codes so it shouldn't be too hard to get strings back out).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! Thank you for your contribution :)

If at any point you feel like you won't have time to get to the comments, just let me know and we can figure out a plan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e4d537d and 9e59346 - Had to keep an uncompiled version of the globs to get the strings needed but settings now look much cleaner

src/fs.rs Outdated
for x in ex {
build.add(x);
}
build.build().expect("bad")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we add more expansive expectations here? Like .expect("Failed to build GlobSet")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the changes in f4660bd, this .expect is no longer needed. (or really it's now a ? to waterfall errors up)


for x in ex {
exclude_extend.add(x?);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a little tighter as:

let extend_exclude = {
    let mut set = globset::GlobSetBuilder::new();
    for path in options.extend_exclude.unwrap_or_default() {
        set.add(from_user(&path, project_root)?);
    }
    set.build()?
};

Copy link
Contributor Author

@CelebrateVC CelebrateVC Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f4660bd - great suggestion

FilePattern::Simple("venv"),
]
});
static DEFAULT_EXCLUDE: Lazy<Vec<std::result::Result<globset::Glob, globset::Error>>> =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should just be a GlobSet, rather than a vector of Glob?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a great idea, except that we need to use the globs to make settings look nice, but I did clean this up to not include Result anymore just a Lazy<Vec<Glob>>>

src/main.rs Outdated
for x in cli
.exclude
.iter()
.map(|path| from_user(path, project_root.as_ref()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're going to for-loop here, maybe just make this part of the body, rather than doing a comprehension with the loop too. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, the map was just an artifact of old code, fixed in 708d408 and further improved by cc32661

let absolute = Pattern::new(&absolute_path.to_string_lossy())?;
let basename = if pattern.contains(std::path::MAIN_SEPARATOR) {
None
pub fn from_user(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name made more sense when it was part of FilePattern, since FilePattern::from_user had clear semantics. Maybe we rename to create_glob or glob_from_user_pattern?

}
std::borrow::Cow::Owned(
"{".to_owned() + &absolute_path.to_string_lossy() + "," + pattern + "}",
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this merits a comment. Is this creating a pattern that's effectively a union of the absolute and base patterns?

Would it be clearer to instead return two Globs, one for each of those two patterns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 Globs made much more sense to me, so I basically reimplemented FilePattern::Complex in cc32661 🤣

@charliermarsh
Copy link
Member

Thank you for this PR! Let me know if you're up for tackling the comments.

Signed-off-by: Simon (Desktop) <[email protected]>
Signed-off-by: Simon (Desktop) <[email protected]>
add flexibility to Exclusion to break up Multi-part globs
change created for settings display purposes

Signed-off-by: Simon (Desktop) <[email protected]>
these can be used to turn items into strings in UserConfigurations

Signed-off-by: Simon (Desktop) <[email protected]>
Basically reimplement Complex, but keep GlobSet  for faster checks

Signed-off-by: Simon (Desktop) <[email protected]>
@CelebrateVC
Copy link
Contributor Author

I think that's all your reviews addressed.

Benchmarks for me are still showing ~10 ms improvement even after changes. Do LMK if there are any other changes you'd like to see. I won't be on here over Thanksgiving weekend, but we can continue going back and forth on changes after that if any are needed.

@charliermarsh
Copy link
Member

Great, thanks! I'll try to get this merged in the next few days.

@charliermarsh charliermarsh merged commit a3af6c1 into astral-sh:main Nov 25, 2022
@charliermarsh
Copy link
Member

@CelebrateVC - Thank you for this! I ended up making some tweaks to restore FilePattern (rather than using a tuple with Option), and moved the GlobSet stuff out of Configuration and into Settings which (I think) simplified a few things.

@CelebrateVC CelebrateVC deleted the issue-662-explore-globset branch December 11, 2022 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants