Glad you asked. spaCy is one of the most advanced natural language processing libraries you can use with Python right now, competing with the likes of NLTK. It lets you customize its text processing pipeline as much as you want, or leave the defaults alone if you don’t. Simple.
In this piece, I plan to cover the absolute basics. We’ll get to the meat of it in another article. But for now, let’s get our basics right. …
Understand how it works and how to implement it.
I was going through a bunch of ideas for a personal website and decided, after a lot of deliberation, that I had to somehow throw in a parallax effect.
I had no clue how to do it, but I had a real thirst to learn. One of my many searches turned up this fantastic YouTube video, which taught me how to implement it. If you prefer videos to reading, it’s a really worthwhile watch.
If you stuck around, though, let’s dive right in.
I know there are plenty of tutorials on the web, but they work with varying degrees of success. I tried half a dozen to get mine up and running and, in the end, felt like I’d spent way too much time on something so simple. So here I am, saving you time.
Apart from being able to develop for the web, there’s another reason you might want to do something like this: app testing. A lot of people whose machines just can’t handle the load of running a full-fledged emulator can benefit a lot from…
Python, being Python, has some remarkable libraries at hand on top of its incredible readability. One of them is NLTK. NLTK, the Natural Language Toolkit, is one of the best Python NLP libraries out there. The functionality it puts at your fingertips, while staying easy to use and, again, readable, is just fantastic.
In fact, we’re going to complete this mini project in under 25 lines of code. And you’re most probably going to understand each line as you read through it. Crazy, I know.
Personally, whenever I’m doing anything even relatively fancy in Python, I use Jupyter…
Pytesseract is a Python wrapper for Tesseract, Google’s OCR engine.
That one line should most probably leave you extremely pleased. I mean, come on. Google? And OCR? That’s the point when you know it’s good.
Ok, time to start downloading stuff.
I’m writing this article assuming you’re using Anaconda. Trust me, setting things up with Anaconda is significantly easier than doing it manually with pip. There’s just so much that can go wrong.
So, first things first: let’s get our hands on the OCR engine itself!
Head over to https://github.com/UB-Mannheim/tesseract/wiki and get the 32-bit or 64-bit version depending…
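If you’re not sure which of the two installers applies to you, here is a minimal sketch that uses only the Python standard library. It’s a rough guide, not a definitive check: the helper names `interpreter_bits` and `suggested_installer` are made up for this example, and the pointer-size trick reports the bitness of the Python interpreter you’re running, which can be 32-bit even on a 64-bit operating system.

```python
import platform
import struct


def interpreter_bits() -> int:
    """Return the pointer width, in bits, of the running Python interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes (4 or 8).
    return struct.calcsize("P") * 8


def suggested_installer() -> str:
    """Hypothetical helper: map the interpreter's bitness to an installer label."""
    return "64-bit" if interpreter_bits() == 64 else "32-bit"


if __name__ == "__main__":
    # platform.machine() is what the OS reports, e.g. "AMD64" or "x86".
    print(f"Machine type reported by the OS: {platform.machine()}")
    print(f"Python interpreter: {interpreter_bits()}-bit")
    print(f"Installer to look for: {suggested_installer()}")
```

If the machine type and the interpreter disagree (say, a 64-bit machine running a 32-bit Python), go by what the operating system reports.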