Google Sheets REGEXREPLACE: clean messy text columns

Open the messy CSV your sales team just dropped in your lap and the same problem shows up in every column. Phone numbers in five formats. Names with stray asterisks. Dates that snuck in as text. Find & replace handles the easy wins, but anything that varies row-to-row needs pattern matching, and that is what REGEXREPLACE exists to do.

This article covers the patterns you’ll reach for every week, the capture-group tricks that reformat instead of just strip, an ARRAYFORMULA wrapper for whole-column cleanups, and the RE2-engine quirks that quietly break formulas that look fine.

The REGEXREPLACE contract: three arguments, RE2 under the hood

The signature is short. =REGEXREPLACE(text, regular_expression, replacement) takes a string, a pattern, and what to swap each match with. Every match in the string is replaced, not just the first — that is the difference between this function and SUBSTITUTE with an instance number, where you control exactly which occurrence to touch.

The pattern follows RE2, the regex engine Google built for safety and predictable performance, not the PCRE flavor most online cheat sheets default to. According to the official Google Sheets REGEXREPLACE reference, RE2 supports the everyday metacharacters — \d, \s, \w, character classes, anchors, quantifiers — but drops a few features regex veterans assume are always there.

What RE2 does not support

Lookbehind assertions like (?<=) are out. So are backreferences inside the pattern itself (\1 mid-match) — backreferences in the replacement string work fine, that is a separate feature handled below. Unicode character classes like \p{L} are unsupported too. If a Stack Overflow answer for a different regex tool fails when you paste it into Sheets, it is almost always one of these three.

Note. RE2 also requires text input. =REGEXREPLACE(123, "1", "9") returns #VALUE!. Wrap the input in TEXT(A2,"@") when the column might hold numbers.

Strip junk: the cheat-sheet patterns you’ll use weekly

Most messy-text problems collapse into a handful of patterns. Each formula below keeps the meaningful content and drops the noise.

=REGEXREPLACE(A2, "[^0-9]", "")            // digits only, e.g. for phone numbers
=REGEXREPLACE(A2, "\s+", " ")               // collapse runs of whitespace
=REGEXREPLACE(A2, "^\s+|\s+$", "")          // trim leading and trailing whitespace
=REGEXREPLACE(A2, "[^\w\s]", "")            // strip punctuation, keep words and spaces
=REGEXREPLACE(A2, "\b[A-Z]{3,}\b", "")      // remove ALL-CAPS tokens like SKU tags

Read each pattern as: replace anything that matches. [^0-9] says “any character that is not a digit”; \s+ says “one or more whitespace characters.” The replacement is what each match becomes — an empty "" deletes the match, a single space collapses runs into one. The \b word boundary in the last example is the one worth memorizing: it stops a pattern from eating the middle of a longer word it shouldn’t.

A worked example

Column A holds raw user-entered phone strings. The formula in B2, copied down, produces the clean version. Row 4 highlights a real risk worth seeing before you ship.

	A	B
1	Phone (raw)	Phone (clean)
2	(415) 555-0148	=REGEXREPLACE(A2,”[^0-9]”,””)
3	415.555.0149	4155550149
4	+1 415 555 0150 ext. 3	141555501503

Row 4 shows what happens when a blunt strip meets edge data: the extension digits merged into the number. If extensions matter, split them off before the strip with a separate REGEXEXTRACT pass.

Capture groups: reformat instead of strip

Stripping junk is the common use, but the more useful trick is rearranging what is already there. Capture groups let you save fragments of the match and put them back in a new order. In Google Sheets you reference captured groups in the replacement string with $1, $2, and so on — not \1, which is a common copy-paste trap from other regex flavors.

The classic case: the sheet has dates in MM-DD-YYYY, your downstream tool wants YYYY-MM-DD. Two formulas show the difference between deleting and reshuffling.

Before.

=REGEXREPLACE(A2,"-","")

Returns 05292026 — usable but loses the order signal.

After.

=REGEXREPLACE(A2,"(\d{2})-(\d{2})-(\d{4})","$3-$1-$2")

Returns 2026-05-29 — ISO-sortable.

The pattern (\d{2})-(\d{2})-(\d{4}) binds three groups: month, day, year. The replacement string writes them back in the order you want with a literal dash between. Anything outside the captured groups is dropped, which is handy when you also want to discard surrounding noise.

Swapping a “Lastname, Firstname” column to “Firstname Lastname” works the same way: =REGEXREPLACE(A2, "^(\w+),\s*(\w+)$", "$2 $1"). The anchors ^ and $ are not optional — without them the formula would also rewrite names like “Smith, John Q.” in ways you did not want.

Clean a whole column with ARRAYFORMULA

Most tutorials show REGEXREPLACE one cell at a time. That is fine for ten rows; for ten thousand, you want one formula in the header row that fills the whole column automatically. Wrapping in ARRAYFORMULA lifts the function across a range.

=ARRAYFORMULA(IF(A2:A="","",REGEXREPLACE(A2:A,"[^0-9]","")))

Three pieces matter. A2:A is an open-ended range, so the formula keeps working when new rows arrive. The IF(...="","") guard suppresses empty rows so the column doesn’t render a wall of blanks. REGEXREPLACE itself runs unchanged — it just receives a range instead of a single cell, and the array engine fans it out row by row.

That pattern is the spine of nearly every batch cleanup in a live sheet. For the broader array story — wrapping conditional logic, joining columns, lifting other text functions — our array formula techniques walkthrough has the rest.

Warning. An ARRAYFORMULA-wrapped REGEXREPLACE recalculates whenever anything in its source range changes. On 50,000+ rows you will feel it — freeze the result to static values once the cleanup is settled (covered below).

Troubleshooting when REGEXREPLACE silently does nothing

The most frustrating REGEXREPLACE bug is the one where the formula runs, returns text, and changes nothing. No error, no warning; the output just equals the input. Three usual suspects, in the order you should check them.

Smart quotes and hidden Unicode. Pasting from Word, Slack, Notion, or a PDF brings in curly quotes, non-breaking spaces (U+00A0), and zero-width joiners. A pattern looking for a straight apostrophe will match nothing if the source cell has the curly version. Either expand the character class to cover both, or normalize the column first with a pass that strips everything outside printable ASCII.

Numeric inputs masquerading as text. If column A holds numbers and you point REGEXREPLACE at it, you get #VALUE!. Wrap the input in TEXT(A2,"@") to force a text cast. The reverse problem also bites: REGEXREPLACE always returns text, so a phone number you cleaned will sort as a string until you wrap the formula with VALUE().

Escaping the wrong characters. RE2 treats ., ?, +, *, (, ), [, ], {, }, |, ^, $, and \ as special. A literal dot in a domain name needs a backslash: \.. Inside character classes most of those lose their special status — [.+*] is three literals, not three quantifiers. When in doubt, test the pattern on a one-cell scratchpad before pasting it across a column.

✓ Source range is text (or wrapped in TEXT())
✓ Pattern uses \., not bare ., for literal dots
✓ Lookbehind assertions (?<=) have been removed — RE2 rejects them
✓ Replacement uses $1, $2 for capture groups, not \1
✓ Curly quotes and non-breaking spaces have been normalized first

Freeze the result, and know when REGEXREPLACE is wrong

A live REGEXREPLACE column is useful while you iterate the pattern. Once the column is clean, freeze it. Select the range, Ctrl+C, then Ctrl+Shift+V to paste values over the formulas. The result is static text — no more recalc cost, and a downstream QUERY or pivot table treats the values as data, not derived strings.

This matters when the clean column will feed a QUERY-driven analysis. QUERY can read across an ARRAYFORMULA, but it runs faster and more predictably against literal values, and it survives the source column being archived or deleted.

Tip. Paste-values right after a pattern stabilizes. A formula cannot outlive the source column it references, and a future you will not remember which regex you used six months from now.

When REGEXREPLACE is the wrong tool

A regex is overkill when the change is a single literal swap. =SUBSTITUTE(A2,"Mr. ","") is shorter and clearer than wrapping the same idea in a regex. Find & replace with the “Match using regular expressions” checkbox is even better for one-time fixes, because the result lives on the sheet directly with no helper column.

Reach for REGEXREPLACE when the transformation is conditional on a pattern, when it should follow new rows automatically, or when the cleanup is part of a pipeline you want a future-you to be able to read. And for input prevention — stopping mess from entering the sheet in the first place — pair it with data validation rules on the source column. Cleaning is cheaper after the fact; preventing is cheaper still.

Pick the column that wastes the most of your week — phone numbers, addresses, product IDs, whatever it is — and write one REGEXREPLACE formula for it. Wrap it in ARRAYFORMULA so new rows join the cleanup automatically. Watch it for a week, then paste-values to lock the result in. Add the next column once the first one feels solved.