codebook, https://github.com/blopker/codebook.

A spellchecker that works in code.

I replaced `harper` with `codebook` just today, and it works quite well.

It's based on `spellbook`, https://github.com/helix-editor/spellbook, a library made by the Helix editor team.

#SpellChecker #code #RustLang

GitHub - blopker/codebook: An unholy spell checker for code


Another week, another discussion. Maybe let's talk about using LLMs for general blog text? Or how to use a tool as a tool? Or try to expose some more contradictions?

https://jeferson.me/blog/2026/05/15/agents-for-text

#Agents #AI #Automation #Blog #CLI #Coding #Contradictions #Efficiency #JevonsParadox #KnowledgeWork #Layoffs #LLM #Productivity #Rails #Ruby #SoftwareEngineering #Spellchecker #Writing

Agents for text

Will agents replace us? Not sure, but maybe you want to read some discussion on the topic.

Building Open Source Tamil Spellchecker – Released Iyal Spellchecker

Iyal Tamil Spellchecker

I am working on a Free/Open Source Tamil Spellchecker. Released it as Iyal at https://iyal.kaniyam.ca

Iyal means Prose/Text in Tamil. (It is my daughter's name too.)

Sharing a few notes here.

A good Free/Open Source Tamil Spellchecker has been a dream of mine for many decades. I explored this around 2020.

I realized that we need a huge word bank as a base. For a year, I collected words from blogs and websites, 600+ websites in all.

Collecting a huge word list

Wrote a Python script to do the following:
  • Download recent articles
  • Extract the text
  • Clean the text – remove HTML keywords/symbols, English words, numbers
  • Add the unique words to a text file, with a word count
  • Increment the count if the word already exists
  • Create a frequently used word list from the words with 50+ usages
  • Create a Bloom filter model file from this frequently used word list
  • These files then become the base asset for the spellchecker.
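The counting steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script from the repository: the function names are hypothetical, the HTML stripping is a crude regex, and a "word" is assumed to be any run of characters from the Tamil Unicode block (U+0B80 to U+0BFF), which drops English words and ASCII digits but keeps Tamil numerals.

```python
import re
from collections import Counter

# Assumption: a Tamil word is a run of characters from the Tamil Unicode block.
TAMIL_WORD = re.compile(r"[\u0B80-\u0BFF]+")
# Crude tag stripper, good enough for a sketch; a real scraper would use an HTML parser.
HTML_TAG = re.compile(r"<[^>]+>")


def extract_tamil_words(raw_html):
    """Strip HTML tags, then keep only runs of Tamil characters."""
    text = HTML_TAG.sub(" ", raw_html)
    return TAMIL_WORD.findall(text)


def update_counts(counts, raw_html):
    """Add the words of one article to the running counts."""
    counts.update(extract_tamil_words(raw_html))
    return counts


def frequent_words(counts, min_count=50):
    """Return the words seen at least min_count times, sorted."""
    return sorted(word for word, count in counts.items() if count >= min_count)
```

Calling `update_counts` once per downloaded article and `frequent_words(counts)` at the end gives the list that feeds the Bloom filter.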

    We now have 150819 unique, frequently used words in our collection. See the word list here – https://github.com/KaniyamFoundation/iyal-tamil-spellchecker/tree/main/collect_words

    Backend

    A Bloom filter can quickly tell whether a word is probably in a huge word list or definitely not in it. A BK-Tree can suggest near-similar words.

    With these two, built a small spellchecker in Python.
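To show the idea behind the dictionary check, here is a small Bloom filter sketch using only the standard library. It is not the implementation used in Iyal; the class name, bit-array size and number of hashes are assumptions chosen for illustration. The key property is that a Bloom filter never misses a word that was added, but it can occasionally report a word as present when it is not.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; size_bits and num_hashes are illustrative defaults."""

    def __init__(self, size_bits=1 << 20, num_hashes=7):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, word):
        # Double hashing: derive num_hashes bit positions from one SHA-256 digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word)
        )
```

After adding every word from the frequent word list, a lookup is just `word in bloom`, which costs a handful of bit checks regardless of how large the list is.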

    Added Flask to expose it as an API and make a web version.

    Front End

    Used simple JavaScript and HTML to build a web interface. Added a word counter and “In-Progress/DONE” messages. Huge content is processed in small chunks to make sure the site does not break.

    Pre-filters

    There are three files.

    rightwordlist.txt – stores words that are always right
    wrongwordlist.txt – stores words that are always wrong
    replacements.txt – custom dictionary-based replacements, such as suggesting good Tamil words for English loanwords. Example – பஸ் | பேருந்து (the loanword for “bus” | the Tamil word for bus)

    These three files are processed as the first filter.
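A rough sketch of how such a first filter could work is below. The function names and the order of precedence (replacements, then right words, then wrong words) are assumptions for illustration; only the `source | replacement` line format comes from the example above.

```python
def load_word_set(lines):
    """Build a set from lines of text, one word per line, skipping blanks."""
    return {line.strip() for line in lines if line.strip()}


def load_replacements(lines):
    """Parse 'source | replacement' lines into a dict."""
    table = {}
    for line in lines:
        if "|" not in line:
            continue
        source, target = (part.strip() for part in line.split("|", 1))
        if source and target:
            table[source] = target
    return table


def prefilter(word, right_words, wrong_words, replacements):
    """Return (verdict, suggestion); verdict is 'unknown' if no list decides."""
    if word in replacements:
        return ("replace", replacements[word])
    if word in right_words:
        return ("right", None)
    if word in wrong_words:
        return ("wrong", None)
    return ("unknown", None)
```

With real files, the `lines` argument would be an open file handle, for example `load_word_set(open("rightwordlist.txt", encoding="utf-8"))`. Only words that come back as `unknown` need to go on to the later layers.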

    Adding LanguageTool.org for simple grammar checking

    The spellchecker is working well, powered by a big word list. Still, there are many things to improve. We need to add a Sandhi checker and a basic grammar checker.

    Some 15 years ago, Elanchezhiyan and Prof Ilantamil from Malaysia, Thamizha Community, added many basic Tamil grammar rules to LanguageTool.org, which is a generic grammar engine for all languages.

    Fortunately, it is still a maintained project and works well when hosted locally.

    Installed LanguageTool and integrated its checks/suggestions into the current Iyal Spellchecker.
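For readers who want to try the same thing, a locally hosted LanguageTool server can be queried over HTTP through its `/v2/check` endpoint. The sketch below is not the Iyal code. The base URL (port 8081) and the `ta-IN` language code are assumptions to verify against your own installation, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def parse_matches(response):
    """Reduce a /v2/check JSON response to offset, length, message, suggestions."""
    results = []
    for match in response.get("matches", []):
        results.append({
            "offset": match["offset"],
            "length": match["length"],
            "message": match.get("message", ""),
            "suggestions": [r["value"] for r in match.get("replacements", [])],
        })
    return results


def check_text(text, base_url="http://localhost:8081", language="ta-IN"):
    """POST text to a LanguageTool server (assumed to be running at base_url)."""
    data = urllib.parse.urlencode({"text": text, "language": language}).encode("utf-8")
    with urllib.request.urlopen(base_url + "/v2/check", data=data) as resp:
        return parse_matches(json.load(resp))
```

Each returned item carries the character offset and length of the flagged span, so the front end can highlight it and list the suggestions.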

    Adding Tamilinaiya Vaani Spellchecker

    Around 2020, Mr. Neechalkaran, creator of Vaani Spellchecker, released a mini version of his spellchecker as open source, as part of a grant program by Tamil Virtual Academy. Code is here – https://github.com/Tamil-Virtual-Academy/Tamilinaiya-Spellchecker

    It is a rule-based spellchecker. All the rules are in the DB.json file. The code is in C#. My friends Manik and Ashok Ramachandran helped to port it to Python. The port is here – https://github.com/tshrinivasan/Tamilinaiya-Spellchecker/tree/master/PythonPort

    At that time, we made it a command-line spellchecker in Python. Thanks to Neechalkaran and Tamil Virtual Academy for the nice work and for releasing it as Free/Open Source Software. It would be so good if all government and university sponsored projects, all over the world, were released as Free/Open Source software.

    The word-bank-based Iyal spellchecker is good. But as Tamil is an ever-growing language, collecting/adding all the words in Tamil takes a lot of time. So I thought of integrating Tamilinaiya Vaani into Iyal as a preprocessor. It works well, as expected.

    Suggestions

    BK-Tree is a simple data structure for suggesting near-similar words. When a word is not found in the word bank, the BK-Tree suggests near-similar words. When Tamilinaiya Vaani has a suggestion, it is added. LanguageTool suggestions are also added to the suggestion menu.
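Here is a compact BK-Tree sketch to make the suggestion step concrete. It is an illustration rather than the Iyal implementation, and it assumes Levenshtein edit distance as the metric. Note that Python compares strings by code point, so a Tamil letter written as a consonant plus a vowel sign counts as two units here; a production version might measure distance over grapheme clusters instead.

```python
def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute), row by row."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost))
        previous = current
    return previous[-1]


class BKTree:
    """Each node is (word, children), where children maps distance -> child node."""

    def __init__(self, distance=levenshtein):
        self.distance = distance
        self.root = None

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = self.distance(word, node[0])
            if d == 0:
                return  # word already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_distance=1):
        """Return sorted (distance, word) pairs within max_distance of word."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            d = self.distance(word, node_word)
            if d <= max_distance:
                results.append((d, node_word))
            # Triangle inequality: only subtrees in this distance band can match.
            for child_d, child in children.items():
                if d - max_distance <= child_d <= d + max_distance:
                    stack.append(child)
        return sorted(results)
```

The pruning in `search` is what makes the tree useful on a large word bank: most subtrees are skipped without computing a single distance for the words inside them.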

    Handling huge content

    When huge content is pasted, it is divided into multiple chunks (200 words per chunk) and then processed chunk by chunk, so that the server stays responsive. A timer is added to show how long it will take to finish processing all the words.
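The chunking and the time estimate can be sketched as below. This is written in Python for consistency with the other examples, although the site may well do the splitting in the browser. The function names are hypothetical, and the estimate simply assumes that the remaining chunks take as long on average as the ones already done.

```python
def chunk_words(text, chunk_size=200):
    """Yield the text in pieces of at most chunk_size whitespace-separated words."""
    words = text.split()
    for start in range(0, len(words), chunk_size):
        yield " ".join(words[start:start + chunk_size])


def estimate_remaining_seconds(elapsed_seconds, chunks_done, chunks_total):
    """Estimate time left from the average time per finished chunk."""
    if chunks_done == 0:
        return None  # nothing measured yet
    average = elapsed_seconds / chunks_done
    return average * (chunks_total - chunks_done)
```

Splitting on whitespace loses the original line breaks, so a real front end would need to keep track of positions in the original text in order to highlight words in place.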

    Architecture

    [ USER UI (Vanilla JS) ]
              |
              |  (Batch Streaming POST)
              v
    [ FLASK BACKEND (app.py) ]
              |
              +--- 1. CUSTOM OVERRIDES (whitelist.txt, blacklist.txt, replacements.txt)
              |
              +--- 2. L1 CACHE (Bloom Filter: Instant Dictionary Check)
              |
              +--- 3. L2 ENGINE (Tamilinaiya Vaani: Morphological Rule-Check)
              |
              +--- 4. L3 LanguageTool Suggestions
              |
              +--- 5. L5 FALLBACK (BK-Tree: Fuzzy Similarity Search)
              |
              v
    [ JSON RESPONSE ] --> (UI Highlight / Suggestion Menu)
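The layered flow in the architecture can be approximated by trying each layer in order and falling back to fuzzy suggestions when no layer decides. The sketch below is a simplification under stated assumptions: the layer interface (a callable returning `None` or a verdict dict) and all names are hypothetical, and in the real design the LanguageTool layer contributes suggestions rather than a final verdict.

```python
def make_set_layer(words, ok):
    """Build a layer that decides only for the words it knows (hypothetical helper)."""
    def layer(word):
        if word in words:
            return {"ok": ok, "suggestions": []}
        return None  # no opinion, let the next layer try
    return layer


def check_word(word, layers, fallback_suggest):
    """Return the first decisive verdict, else a fallback with fuzzy suggestions."""
    for name, layer in layers:
        verdict = layer(word)
        if verdict is not None:
            return {"word": word, "layer": name, **verdict}
    return {
        "word": word,
        "layer": "fallback",
        "ok": False,
        "suggestions": fallback_suggest(word),
    }
```

Keeping each layer behind the same small interface makes it easy to add another open source spellchecker as one more entry in the `layers` list.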

     

    Live at iyal.kaniyam.ca

    With the current design, happy to release the Iyal Tamil Spellchecker at https://iyal.kaniyam.ca

    Code – https://github.com/KaniyamFoundation/iyal-tamil-spellchecker

    The current version is 0.0.3. It is still in beta.

    Tamil scholars may find this spellchecker an elementary one. Still, it is a good working version. Give it a try. Test it with the Tamil content you write or read. If you have any suggestions for improvement, raise an issue on GitHub. If you have code contributions, send a PR.

    Send a mail to [email protected] with your feedback.

    What next?

    • Keep adding more words to the word bank
    • Check and remove any wrong words from the word bank
    • Add more replacement words
    • Add any more available open source spellcheckers, as layers.
    • Make the site secure and robust
    • Make it fully HTML compatible. Currently it works only for plain text and for basic HTML formatting like bold, italic and headings.
    • Make extensions for browsers, word processors, editors across all operating systems.

    Will keep working on them. Please contribute to improve these.

    Read all the daily notes on building the Tamil spellchecker.

  • Study notes on open-tamil spellchecker – day 1
  • Building Tamil Spellchecker – Day 2 – Bloom Filter to quick query on dataset
  • Building Tamil Spellchecker – Day 3 – Collecting all Tamil Nouns
  • Building Tamil Spellchecker – Day 4 – Shall we collect ALL Tamil Words?
  • Building Tamil Spellchecker – Day 5 – started collecting ALL Tamil Words
  • Building Open Source Tamil Spellchecker – Day 6 – How fast is bloom filter for 24 lakh words?
  • Building Open Source Tamil Spellchecker – Day 7 – Scrapping websites to get more words
  • Building Open Source Tamil Spellchecker – Day 8 – Porting from C# to Python
  • Building Open Source Tamil Spellchecker – Day 9 – Ported from C# to Python
  • Building Open Source Tamil Spellchecker – Day 10 – Released Iyal Tamil Spellchecker

    #bkTree #bloomFilter #spellchecker #tamil #tamilinaiyaVaani
    Iyal Tamil Spellchecker (இயல் தமிழ் எழுத்துப் பிழைத்திருத்தி)

    Is there a particular reason #spellchecker is so obsessed with substituting “its” for “it’s”, and vice versa?

    I’m just too trusting with #SpellChecker|s / #TextReplacement. I should check the checked texts before I tap the send button.

    You should assume that at least 50% of my typos are due to this.

    Those fancy new local NPUs, like the one in my #AMD #Ryzen 7840HS, that get shipped with every new-enough box, SHOULD be capable of enhancing the #SpellChecker performance/hit-rate/semantic accuracy, shouldn't they?

    Someone, more knowledgeable (#39c3?) than myself, should reply - please boost this toot.

    This might be an ACTUALLY USEFUL (non-hallucinatory) application of #LLM #AI #DeepLearning #MachineLearning, one that wouldn't send the #RAM #price exponentially to the moon

    Oh, apparently, gritter isn't a word at all. It should be glitter, critter or fritter.
    #spellchecker
    Dear spellchecker, I can't believe that "fritter" is a more common word than "gritter". I stand to be, erm, corrected.
    #spellchecker

    have this conversation with students all the time - "I don't care if it's 'polished!' I need facts, examples, explanations! But still run #spellchecker..."

    Never use #Grammarly again — the reason every #writer should care https://www.makeuseof.com/ill-never-use-grammarly-again-reason-every-writer-should-care/

    #generativeAI #settings

    I’ll never use Grammarly again — and this is the reason every writer should care

    Once felt like a helpful grammar checker for writers, Grammarly has now turned into an aggressive AI tool always trying to erase your individuality.

    MakeUseOf

    Look, #Spellchecker, I can do without the weird words with spelling close to common ones. I can assure you that when I type "alow" I really mean "allow" and I'd like you to catch that. #spelling

    If I want an uncommon word in the dictionary, let me put it in. And let me remove any words I want, too. I'd rather be dinged every time I type "form" and mean it, than to have a mistyped "from" get by. #typo