
Posted by User Bot


25 Mar, 2025

Updated on 20 May, 2025

Issue with Filtering and Displaying Valid Words in React-Based Word Unscrambler

I’m building a Word Unscrambler website using ReactJS with Vite. The project takes a scrambled word as input and displays all possible valid words, sorted by length (longest first).
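For reference, the matching step is roughly the following (a simplified sketch of my approach; the function names are just illustrative):

```javascript
// Count each letter's occurrences so multiplicities can be compared.
function letterCounts(word) {
  const counts = {};
  for (const ch of word.toLowerCase()) {
    counts[ch] = (counts[ch] || 0) + 1;
  }
  return counts;
}

// A dictionary word matches if every letter it needs is available
// in the scrambled input at least as many times as it needs it.
function findWords(scrambled, dictionary) {
  const available = letterCounts(scrambled);
  return dictionary
    .filter((word) =>
      Object.entries(letterCounts(word)).every(
        ([ch, n]) => (available[ch] || 0) >= n
      )
    )
    .sort((a, b) => b.length - a.length); // longest first
}
```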

What’s Ready So Far: I have a words.json file containing 400,000+ words. However, it contains only the words themselves; there are no meanings, definitions, phonetics, examples, or synonyms.

The app correctly loads and processes this JSON file to generate possible words.
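Loading is the easy part; with Vite, a JSON import works directly. Assuming words.json is a flat array of strings, it looks roughly like this:

```javascript
// Vite can import JSON directly; words.json is assumed to be
// a flat array of strings, e.g. ["aardvark", "abacus", ...].
import words from "./words.json";

// A Set gives O(1) membership checks when generating candidates.
const wordSet = new Set(words.map((w) => w.toLowerCase()));
```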

The Issue I’m Facing: I need to filter out invalid or nonsensical words from the dataset. Some words appearing in the output are either:

- obscure/uncommon words that are not typically used in English, or
- random letter combinations that don't form meaningful words.

What I’ve Tried So Far:

Basic Filtering (Manual Cleanup): Manually removing invalid words from words.json works, but it isn't scalable for 400,000+ words.

Regex or Length-Based Filtering: I tried filtering words by length and letter patterns, but this removed valid words as well; an example is sketched below.
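An illustrative version of this kind of filter (not my exact patterns):

```javascript
// Pattern-based cleanup: drop words shorter than 3 letters, words
// with no vowels, and words with the same letter three times in a row.
// This also discards valid words like "nth" or "cwm".
const filtered = words.filter(
  (w) => w.length >= 3 && /[aeiouy]/.test(w) && !/(.)\1\1/.test(w)
);
```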

Using a Verified Word List: I checked words against a smaller set of known valid words, but this removed too many uncommon (but valid) words.
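Concretely, the check looked something like this (a sketch; common.json stands in for the smaller verified list):

```javascript
// common.json is a smaller list of known-valid words (hypothetical file).
import common from "./common.json";

const verified = new Set(common.map((w) => w.toLowerCase()));

// Keep only words present in the verified list; this is fast, but it
// discards valid words the smaller list simply doesn't know about.
const filtered = words.filter((w) => verified.has(w.toLowerCase()));
```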

API-Based Validation: I explored API-based dictionary validation, but most dictionary APIs do not support batch processing, which makes real-time validation inefficient.

I plan to use the Wordnik API in the future, but it does not support checking multiple words at once (see the sketch below).
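To illustrate why this is slow: validation has to go one word at a time. The URL shape below is Wordnik's v4 per-word definitions lookup, to the best of my knowledge; YOUR_API_KEY and the throttle delay are placeholders.

```javascript
const API = "https://api.wordnik.com/v4/word.json";

// One-word-at-a-time validation: a word counts as valid if the
// definitions lookup succeeds (a 404 means no definitions found).
async function isValidWord(word) {
  const res = await fetch(
    `${API}/${encodeURIComponent(word)}/definitions?limit=1&api_key=YOUR_API_KEY`
  );
  return res.ok;
}

// 400,000 words at one request per 100 ms is roughly 11 hours,
// which is why the lack of batch validation is the blocker here.
async function validateAll(words) {
  const valid = [];
  for (const w of words) {
    if (await isValidWord(w)) valid.push(w);
    await new Promise((r) => setTimeout(r, 100)); // crude rate limit
  }
  return valid;
}
```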

What’s Not Working:

- Regex and pattern-based filtering removed too many valid words.
- A predefined “verified” list also removed many words that should have been considered valid.
- Manual cleanup isn’t practical due to the size of the dataset.
- API validation isn’t feasible yet, since batch processing isn’t supported.

My Question: What’s the best way to efficiently filter nonsensical words out of a large dataset like mine? Are there any local validation techniques I can use until I implement an API-based solution?

Any suggestions or alternative approaches would be greatly appreciated!