1

I have spent far too long on this and have tried many things with no luck; I think I am just bad at regex. I am trying to strip ALL non-alphanumeric characters from a string while keeping spaces. I DO NOT WANT TO USE [^A-Za-z0-9 ]+ due to language concerns.

Here are a few things I have tried:

cleaned_string = Regex.Replace(input_string, @"[^\w ]+[_]+);

cleaned_string = Regex.Replace(input_string, ([^\w ]+)([_]+));

cleaned_string = Regex.Replace(input_string, [^ \w?<!_]+);

Edit: Solved thanks to a very helpful person below.

My final product ended up being this: [_]+|[^\w\s]+

Thanks for all the help!

5
  • I think you just need to escape the underscore [^\w\_]
    – steve v
    Commented Oct 5, 2017 at 20:00
  • None of your examples compile. What are "language concerns"? What is your example input/output?
    – Blorgbeard
    Commented Oct 5, 2017 at 20:00
  • @stephen.vakil \_ was one of the first things I tried and it caused an exception.
    – David Bentley
    Commented Oct 5, 2017 at 20:12
  • @Blorgbeard I prefer \w in case the input is not just English
    – David Bentley
    Commented Oct 5, 2017 at 20:13
  • Ah, doesn't need to be escaped I guess. "[\\W_]" seems to work locally for me.
    – steve v
    Commented Oct 5, 2017 at 20:20

2 Answers 2

3

This should work for you

// Expression: _|[^\w\d ]
cleaned_string = Regex.Replace(input_string, @"_|[^\w\d ]", "");
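
To illustrate why the separate `_` alternative matters, here is a minimal sketch of the alternation approach on non-English input. The class name `AlternationDemo`, the helper `Clean`, and the sample string are made up for illustration, and `\d` is left out on the assumption that `\w` already covers digits.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // '_' counts as a word character, so [^\w ] alone would keep it;
    // the first alternative removes it explicitly.
    public static string Clean(string input) =>
        Regex.Replace(input, @"_|[^\w ]", "");

    static void Main()
    {
        // Hypothetical sample mixing accented Latin and Cyrillic letters.
        string input = "Héllo_wörld! Привет, мир #42";
        string cleaned = Clean(input);
        // cleaned == "Héllowörld Привет мир 42"
        Console.WriteLine(cleaned);
    }
}
```

Note that only the literal space survives here; tabs and newlines are removed because they are neither word characters nor the space listed in the class.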

3 Comments

This works perfectly. I was just missing | in the expression, it seems. Please vote this up since some tool likes to downvote.
No problem, regex can give me a headache sometimes as well!
I also noticed that I do not need \d as it works with just \w.
2

You may use

var res = Regex.Replace(s, @"[\W_-[\s]]+", string.Empty);

See the regex demo.

Look at the \W pattern: it matches any non-word char. To exclude whitespace from what \W matches, use character class subtraction: [\W-[\s]]. This matches any char \W matches except what \s matches. To also match _ (which is a word char, so \W skips it), just add it to the character class. Add the + quantifier to remove whole consecutive chunks of matching chars in one go.

Details

  • [ - start of a character class
    • \W_ - any non-word or _ chars
    • -[\s] - except for chars matched with \s (whitespace) pattern
  • ] - end of the character class
  • + - one or more times.
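
The breakdown above can be tried out with a small sketch like the following. The class name `SubtractionDemo`, the helper `Clean`, and the sample input are hypothetical; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

static class SubtractionDemo
{
    // [\W_-[\s]]+ : non-word chars or '_', minus anything \s matches,
    // repeated so each run of unwanted chars is removed in one match.
    public static string Clean(string s) =>
        Regex.Replace(s, @"[\W_-[\s]]+", string.Empty);

    static void Main()
    {
        // Hypothetical input with a tab and runs of punctuation.
        string input = "snake_case\tnaïve -- café!!!";
        string result = Clean(input);
        // result == "snakecase\tnaïve  café" (the tab and both spaces are kept)
        Console.WriteLine(result);
    }
}
```

Because the subtraction uses \s rather than a literal space, tabs and newlines are preserved as well, which may or may not be what you want.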

1 Comment

Nice, I swapped to the \s in my code but kept it mostly the same. Nice info!!
