TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
TextBlob is a wrapper around NLTK, a library for text processing and NLP. NLTK is very in-depth, while TextBlob offers a simple, convenient API over it for the most common tasks, which makes it well suited to beginners who want to learn and experiment. After learning TextBlob, you can move on to NLTK and study its concepts and inner workings in depth. NLTK is a good fit for learning intermediate and advanced NLP concepts.
Features
- Noun phrase extraction
- Part-of-speech tagging
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Tokenization (splitting text into words and sentences)
- Word and phrase frequencies
- Parsing
- n-grams
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
- WordNet integration
Installation
TextBlob can be installed using pip. If you need to install Python or pip first, see this post.
pip install textblob
Or, if you have Python 3 installed:
pip3 install textblob
Many TextBlob features rely on NLTK corpora. These can be downloaded once after installation by running python -m textblob.download_corpora.
Introduction
The main class in the textblob package is the TextBlob class:
from textblob import TextBlob
Let's create an instance of TextBlob and supply a paragraph of text taken from the example in the TextBlob documentation.
text = ''' The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact. Snide comparisons to gelatin be damned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant. '''
# creating TextBlob object
blob = TextBlob(text)
Tokenization
Like other operations in TextBlob, tokenization is easy. To retrieve the tokens, access the words and sentences properties of the blob object.
>>> blob.words
WordList(['The', 'titular', 'threat', 'of', 'The', 'Blob', 'has', 'always', 'struck', 'me', 'as', 'the', 'ultimate', 'movie', 'monster', 'an', 'insatiably', 'hungry', 'amoeba-like', 'mass', 'able', 'to', 'penetrate', 'virtually', 'any', 'safeguard', 'capable', 'of', 'as', 'a', 'doomed', 'doctor', 'chillingly', 'describes', 'it', 'assimilating', 'flesh', 'on', 'contact', 'Snide', 'comparisons', 'to', 'gelatin', 'be', 'damned', 'it', "'s", 'a', 'concept', 'with', 'the', 'most', 'devastating', 'of', 'potential', 'consequences', 'not', 'unlike', 'the', 'grey', 'goo', 'scenario', 'proposed', 'by', 'technological', 'theorists', 'fearful', 'of', 'artificial', 'intelligence', 'run', 'rampant'])
>>> blob.sentences
[Sentence(" The titular threat of The Blob has always struck me as the ultimate movie monster: an insatiably hungry, amoeba-like mass able to penetrate virtually any safeguard, capable of--as a doomed doctor chillingly describes it--"assimilating flesh on contact."),
 Sentence("Snide comparisons to gelatin be damned, it's a concept with the most devastating of potential consequences, not unlike the grey goo scenario proposed by technological theorists fearful of artificial intelligence run rampant.")]
Various properties and methods of WordList and Sentence are discussed in a later section of this post.
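Once the text is split into tokens, counting word frequencies is a short step. The following is a minimal sketch in plain Python, using only the standard library rather than TextBlob's own counting features. The helper name word_frequencies is hypothetical, and the token list is a short slice copied from the blob.words output above.

```python
from collections import Counter

# A short slice of the tokens shown in the blob.words output above.
tokens = ['The', 'titular', 'threat', 'of', 'The', 'Blob', 'has', 'always',
          'struck', 'me', 'as', 'the', 'ultimate', 'movie', 'monster']


def word_frequencies(words):
    """Count case-insensitive occurrences of each token (hypothetical helper)."""
    return Counter(word.lower() for word in words)


frequencies = word_frequencies(tokens)
print(frequencies['the'])           # 'The', 'The' and 'the' are counted together
print(frequencies.most_common(1))   # the single most frequent token
```

Lowercasing before counting is a design choice: it merges 'The' and 'the', which is usually what you want for frequency statistics but loses the distinction between proper nouns and ordinary words.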
Lemmatization
# lemmatize() treats a word as a noun by default, which is why 'has' becomes 'ha' and 'as' becomes 'a' below
for word in blob.words:
    print(word.lemmatize())
#OUTPUT (printed one word per line; joined here for brevity)
The titular threat of The Blob ha always struck me a the ultimate movie monster an insatiably hungry amoeba-like mass able to penetrate virtually any safeguard capable of a a doomed doctor chillingly describes it assimilating flesh on contact Snide comparison to gelatin be damned it 's a concept with the most devastating of potential consequence not unlike the grey goo scenario proposed by technological theorist fearful of artificial intelligence run rampant
P.O.S Tags
>>> blob.tags
[('The', 'DT'), ('titular', 'JJ'), ('threat', 'NN'), ('of', 'IN'), ('The', 'DT'), ('Blob', 'NNP'), ('has', 'VBZ'), ('always', 'RB'), ('struck', 'VBN'), ('me', 'PRP'), ('as', 'IN'), ('the', 'DT'), ('ultimate', 'JJ'), ('movie', 'NN'), ('monster', 'NN'), ('an', 'DT'), ('insatiably', 'RB'), ('hungry', 'JJ'), ('amoeba-like', 'JJ'), ('mass', 'NN'), ('able', 'JJ'), ('to', 'TO'), ('penetrate', 'VB'), ('virtually', 'RB'), ('any', 'DT'), ('safeguard', 'NN'), ('capable', 'JJ'), ('of', 'IN'), ('as', 'IN'), ('a', 'DT'), ('doomed', 'JJ'), ('doctor', 'NN'), ('chillingly', 'RB'), ('describes', 'VBZ'), ('it', 'PRP'), ('assimilating', 'VBG'), ('flesh', 'NN'), ('on', 'IN'), ('contact', 'NN'), ('Snide', 'JJ'), ('comparisons', 'NNS'), ('to', 'TO'), ('gelatin', 'VB'), ('be', 'VB'), ('damned', 'VBN'), ('it', 'PRP'), ("'s", 'VBZ'), ('a', 'DT'), ('concept', 'NN'), ('with', 'IN'), ('the', 'DT'), ('most', 'RBS'), ('devastating', 'JJ'), ('of', 'IN'), ('potential', 'JJ'), ('consequences', 'NNS'), ('not', 'RB'), ('unlike', 'IN'), ('the', 'DT'), ('grey', 'NN'), ('goo', 'NN'), ('scenario', 'NN'), ('proposed', 'VBN'), ('by', 'IN'), ('technological', 'JJ'), ('theorists', 'NNS'), ('fearful', 'NN'), ('of', 'IN'), ('artificial', 'JJ'), ('intelligence', 'NN'), ('run', 'NN'), ('rampant', 'NN')]
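The tags above follow the Penn Treebank convention, where noun tags begin with NN and verb tags begin with VB. Because blob.tags is just a list of (word, tag) pairs, it can be filtered with ordinary Python. Here is a small sketch of that idea; the helper words_with_tag_prefix is a hypothetical name, and the sample pairs are copied from the start of the output above.

```python
# The first few (word, tag) pairs copied from the blob.tags output above.
tags = [('The', 'DT'), ('titular', 'JJ'), ('threat', 'NN'), ('of', 'IN'),
        ('The', 'DT'), ('Blob', 'NNP'), ('has', 'VBZ'), ('always', 'RB'),
        ('struck', 'VBN'), ('me', 'PRP')]


def words_with_tag_prefix(tagged_words, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tags, 'NN')   # matches NN, NNS, NNP, NNPS
verbs = words_with_tag_prefix(tags, 'VB')   # matches VB, VBZ, VBN, VBG, ...
print(nouns)
print(verbs)
```

Matching on a prefix rather than an exact tag keeps singular, plural, and proper nouns together, which is convenient for quick exploration.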
Getting Noun Phrases
>>> blob.noun_phrases
WordList(['titular threat', 'blob', 'ultimate movie monster', 'amoeba-like mass', 'snide', 'potential consequences', 'grey goo scenario', 'technological theorists fearful', 'artificial intelligence run rampant'])
Sentiment
>>> blob.sentiment
Sentiment(polarity=-0.1590909090909091, subjectivity=0.6931818181818182)
The Sentiment object has polarity and subjectivity properties for retrieving each value individually. Polarity ranges from -1.0 (negative) to 1.0 (positive), and subjectivity ranges from 0.0 (objective) to 1.0 (subjective).
>>> blob.sentiment.polarity
-0.1590909090909091
>>> blob.sentiment.subjectivity
0.6931818181818182
We can also get the sentiment of individual sentences through the sentiment property of Sentence instances.
>>> for sentence in blob.sentences:
...     print(sentence.sentiment)
...
Sentiment(polarity=0.06000000000000001, subjectivity=0.605)
Sentiment(polarity=-0.34166666666666673, subjectivity=0.7666666666666666)
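Raw polarity scores are often easier to work with once they are bucketed into labels. The sketch below shows one way to do that in plain Python. The function label_polarity and its 0.05 threshold are illustrative assumptions, not part of TextBlob, and the scores are the polarity values copied from the output above.

```python
def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1.0, 1.0] to a coarse label.

    The threshold is an arbitrary illustrative cut-off, not a TextBlob default.
    """
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'


# Polarity values copied from the sentiment output above.
scores = {
    'whole text': -0.1590909090909091,
    'sentence 1': 0.06000000000000001,
    'sentence 2': -0.34166666666666673,
}
labels = {name: label_polarity(score) for name, score in scores.items()}
print(labels)
```

Note how the first sentence only just clears the threshold; with a wider neutral band it would be labelled neutral, so the cut-off should be tuned to your data.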
Synsets
>>> blob.words[1].synsets
[Synset('titular.a.01'), Synset('titular.a.02'), Synset('titular.a.03'), Synset('titular.a.04'), Synset('nominal.s.06')]
We can also query WordNet directly through the Word class, optionally restricting the results to one part of speech:
>>> from textblob.wordnet import VERB
>>> from textblob import Word
>>> Word('run').get_synsets(pos=VERB)
[Synset('run.v.01'), Synset('scat.v.01'), Synset('run.v.03'), Synset('operate.v.01'), Synset('run.v.05'), Synset('run.v.06'), Synset('function.v.01'), Synset('range.v.01'), Synset('campaign.v.01'), Synset('play.v.18'), Synset('run.v.11'), Synset('tend.v.01'), Synset('run.v.13'), Synset('run.v.14'), Synset('run.v.15'), Synset('run.v.16'), Synset('prevail.v.03'), Synset('run.v.18'), Synset('run.v.19'), Synset('carry.v.15'), Synset('run.v.21'), Synset('guide.v.05'), Synset('run.v.23'), Synset('run.v.24'), Synset('run.v.25'), Synset('run.v.26'), Synset('run.v.27'), Synset('run.v.28'), Synset('run.v.29'), Synset('run.v.30'), Synset('run.v.31'), Synset('run.v.32'), Synset('run.v.33'), Synset('run.v.34'), Synset('ply.v.03'), Synset('hunt.v.01'), Synset('race.v.02'), Synset('move.v.13'), Synset('melt.v.01'), Synset('ladder.v.01'), Synset('run.v.41')]
# getting definitions
>>> Word('car').definitions
['a motor vehicle with four wheels; usually propelled by an internal combustion engine', 'a wheeled vehicle adapted to the rails of railroad', 'the compartment that is suspended from an airship and that carries personnel and the cargo and the power plant', 'where passengers ride up and down', 'a conveyance for passengers or freight on a cable railway']
n-grams
>>> blob.ngrams(2)
[WordList(['The', 'titular']), WordList(['titular', 'threat']), WordList(['threat', 'of']), WordList(['of', 'The']), WordList(['The', 'Blob']), WordList(['Blob', 'has']), WordList(['has', 'always']), WordList(['always', 'struck']), WordList(['struck', 'me']), WordList(['me', 'as']), WordList(['as', 'the']), WordList(['the', 'ultimate']), WordList(['ultimate', 'movie']), WordList(['movie', 'monster']), WordList(['monster', 'an']), WordList(['an', 'insatiably']), WordList(['insatiably', 'hungry']), WordList(['hungry', 'amoeba-like']), WordList(['amoeba-like', 'mass']), WordList(['mass', 'able']), WordList(['able', 'to']), WordList(['to', 'penetrate']), WordList(['penetrate', 'virtually']), WordList(['virtually', 'any']), WordList(['any', 'safeguard']), WordList(['safeguard', 'capable']), WordList(['capable', 'of']), WordList(['of', 'as']), WordList(['as', 'a']), WordList(['a', 'doomed']), WordList(['doomed', 'doctor']), WordList(['doctor', 'chillingly']), WordList(['chillingly', 'describes']), WordList(['describes', 'it']), WordList(['it', 'assimilating']), WordList(['assimilating', 'flesh']), WordList(['flesh', 'on']), WordList(['on', 'contact']), WordList(['contact', 'Snide']), WordList(['Snide', 'comparisons']), WordList(['comparisons', 'to']), WordList(['to', 'gelatin']), WordList(['gelatin', 'be']), WordList(['be', 'damned']), WordList(['damned', 'it']), WordList(['it', "'s"]), WordList(["'s", 'a']), WordList(['a', 'concept']), WordList(['concept', 'with']), WordList(['with', 'the']), WordList(['the', 'most']), WordList(['most', 'devastating']), WordList(['devastating', 'of']), WordList(['of', 'potential']), WordList(['potential', 'consequences']), WordList(['consequences', 'not']), WordList(['not', 'unlike']), WordList(['unlike', 'the']), WordList(['the', 'grey']), WordList(['grey', 'goo']), WordList(['goo', 'scenario']), WordList(['scenario', 'proposed']), WordList(['proposed', 'by']), WordList(['by', 'technological']), WordList(['technological', 'theorists']), WordList(['theorists', 'fearful']), WordList(['fearful', 'of']), WordList(['of', 'artificial']), WordList(['artificial', 'intelligence']), WordList(['intelligence', 'run']), WordList(['run', 'rampant'])]
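To make the idea behind the output above concrete: an n-gram list is a sliding window of n consecutive tokens. The following plain-Python sketch illustrates the technique. It is not TextBlob's implementation, and the function name ngrams here is just a stand-alone helper written for this example.

```python
def ngrams(tokens, n=3):
    """Return every run of n consecutive tokens as a list (sliding window)."""
    if n <= 0:
        return []
    # A sequence of k tokens yields k - n + 1 windows (none if k < n).
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]


# The first six tokens from the blob.words output shown earlier.
tokens = ['The', 'titular', 'threat', 'of', 'The', 'Blob']
bigrams = ngrams(tokens, 2)
print(bigrams[:3])
print(len(bigrams))
```

The first three bigrams produced here match the first three entries of the blob.ngrams(2) output, and six tokens yield five bigrams because each window starts one token later than the last.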
Translations
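We can detect the language of a text with detect_language(). Both detect_language() and translate() rely on Google Translate, so they need network access. They were deprecated in later TextBlob releases, so they may not be available in the version you have installed.
>>> blob.detect_language()
'en'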
We can also perform translations
>>> blob.translate(to='fr')
TextBlob("La menace titulaire de The Blob m'a toujours été le film ultime monstre: une masse insatisfaisante affamée et amibe capable de pénétrer pratiquement n'importe quelle sauvegarde, capable de - en tant que docteur condamné avec calme le décrit - "assimilant la chair au contact. Les comparaisons de Snide à la gélatine seront damnées, c'est un concept avec le plus grand Dévastatrice de conséquences potentielles, contrairement au scénario gris proposé par les théoriciens technologiques craignant l'intelligence artificielle est courante.")
Note: the to parameter of translate() takes a two-letter ISO 639-1 language code, such as 'fr' in the example above. For the full list of language codes, see this link.