Introduction
In this part of the tutorial series, we will learn how to authenticate with and access the Gmail API to retrieve user emails and perform other operations using Google's official Python client library, “google-api-python-client”.
In this post I will provide the scaffolding and setup code to access Gmail. To understand the code completely, or to learn how to acquire OAuth credentials, see Using Google API in Python: Introduction and Setup.
Setup Code
Imports
```python
from __future__ import print_function
import httplib2
import os

from apiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage

try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None
```
First, acquire a Google OAuth client ID from the Google Developers Console, then use the code below to authenticate and connect to the API. For the full walkthrough, see Using Google API in Python: Part 1 – Introduction and Setup.
Code to get credentials
We use the SCOPES variable to define which scopes (permissions) our application requests when working with the API. The Gmail API offers several scopes. To request more than one, SCOPES must be a Python list of scope strings instead of a single string.
Some of the scopes available for the Gmail API are:
- 'https://www.googleapis.com/auth/gmail.readonly': read-only access to all resources and their metadata
- 'https://www.googleapis.com/auth/gmail.send': send messages only (no inbox read or modify)
- 'https://www.googleapis.com/auth/gmail.labels': create, read, update, and delete labels only
- 'https://www.googleapis.com/auth/gmail.insert': insert and import messages only
- 'https://www.googleapis.com/auth/gmail.compose': create, read, update, delete, and send email drafts and messages
- 'https://www.googleapis.com/auth/gmail.modify': all read/write operations except immediate, permanent deletion of threads and messages
- 'https://mail.google.com/': all read/write operations, including permanent deletion (use with caution)
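For example, an application that needs to both read and send mail could request two scopes as a list. This is a minimal sketch; request only the scopes your application actually needs, and delete the stored credentials file after changing SCOPES so the OAuth flow runs again.

```python
# Request several Gmail scopes at once by using a list instead of a string.
SCOPES = [
    'https://www.googleapis.com/auth/gmail.readonly',
    'https://www.googleapis.com/auth/gmail.send',
]
```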
```python
# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/gmail-python-quickstart.json
SCOPES = 'https://www.googleapis.com/auth/gmail.readonly'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Gmail API Python Quickstart'


def get_credentials():
    """Gets valid user credentials from storage.

    If nothing has been stored, or if the stored credentials are invalid,
    the OAuth2 flow is completed to obtain the new credentials.

    Returns:
        Credentials, the obtained credential.
    """
    home_dir = os.path.expanduser('~')
    credential_dir = os.path.join(home_dir, '.credentials')
    if not os.path.exists(credential_dir):
        os.makedirs(credential_dir)
    credential_path = os.path.join(credential_dir,
                                   'gmail-python-quickstart.json')

    store = Storage(credential_path)
    credentials = store.get()
    if not credentials or credentials.invalid:
        flow = client.flow_from_clientsecrets(CLIENT_SECRET_FILE, SCOPES)
        flow.user_agent = APPLICATION_NAME
        if flags:
            credentials = tools.run_flow(flow, store, flags)
        else:  # Needed only for compatibility with Python 2.6
            credentials = tools.run(flow, store)
        print('Storing credentials to ' + credential_path)
    return credentials
```
Connecting and Getting credentials
```python
# get credentials (runs the OAuth flow on first use)
credentials = get_credentials()
# authorize an httplib2.Http object with those credentials
http = credentials.authorize(httplib2.Http())
# service is the access point to the complete Gmail API
service = discovery.build('gmail', 'v1', http=http)
```
Basics
Retrieving Emails
Getting email IDs
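```python
# get the first page of message IDs (each item is a dict with 'id' and 'threadId')
response = service.users().messages().list(userId='me').execute()
# 'messages' is absent from the response when there are no matching messages
ids = response.get('messages', [])
```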
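messages.list returns results one page at a time (by default up to 100 messages per page) and includes a nextPageToken when more pages exist. Below is a minimal sketch that follows nextPageToken to collect every message stub, assuming service is the object built above; listAllMessageIds is a hypothetical helper name, not part of the client library.

```python
def listAllMessageIds(service, userId='me', **kwargs):
    """Collect message stubs from every page of users.messages.list.

    Hypothetical helper: extra keyword arguments (e.g. q, labelIds) are
    passed straight through to messages().list().
    """
    messages = []
    page_token = None
    while True:
        if page_token:
            # ask for the next page using the token from the previous response
            kwargs['pageToken'] = page_token
        response = service.users().messages().list(
            userId=userId, **kwargs).execute()
        # 'messages' is missing when a page has no results
        messages.extend(response.get('messages', []))
        page_token = response.get('nextPageToken')
        if not page_token:
            return messages
```

For example, listAllMessageIds(service, q='is:unread') returns the stubs of all unread messages rather than just the first page.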
Parsing email body as HTML
- First, we fetch the message in raw format.
- Then we base64url-decode it. This yields the full RFC 2822 message source: the headers plus the MIME parts, including any HTML body.
- Then we print the prettified HTML using BeautifulSoup.
```python
import base64
from bs4 import BeautifulSoup

for msg in ids:
    # get the message in raw format (base64url-encoded RFC 2822 source)
    body = service.users().messages().get(
        userId='me', id=msg['id'], format='raw').execute()
    # decode the raw message to bytes
    html = base64.urlsafe_b64decode(body['raw'].encode('ASCII'))
    # if you have lxml installed you can use that instead of html5lib
    soup = BeautifulSoup(html, 'html5lib')
    print(soup.prettify())
```
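Because the raw format decodes to the whole MIME message, the HTML handed to BeautifulSoup above also contains the headers and may contain quoted-printable or base64-encoded parts. One alternative, sketched here with Python's standard email package rather than BeautifulSoup, is to parse the decoded bytes and pull out just the text/html part. extractHtmlPart is a hypothetical helper name.

```python
import base64
import email
from email import policy


def extractHtmlPart(raw):
    """Return the decoded text/html body of a raw Gmail message, or None.

    raw is the base64url string from messages().get(..., format='raw')['raw'].
    """
    msg_bytes = base64.urlsafe_b64decode(raw.encode('ASCII'))
    # policy.default produces EmailMessage objects, which have get_body()
    msg = email.message_from_bytes(msg_bytes, policy=policy.default)
    # look only for an HTML body; returns None if the message has none
    part = msg.get_body(preferencelist=('html',))
    if part is None:
        return None
    # get_content() undoes the transfer encoding and returns a str
    return part.get_content()
```

The returned string can then be passed to BeautifulSoup (or to htmlToText below) without the headers and encoding artifacts.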
Extracting text data from email body
We cannot simply convert the BeautifulSoup object to text; first we have to clean it up:
- Remove the style, script, meta, [document], head, and title elements.
- Break the text into lines and remove leading/trailing whitespace.
- Remove the blank lines left behind.
```python
from bs4 import BeautifulSoup


def htmlToText(html):
    soup = BeautifulSoup(html, 'html5lib')
    # remove scripts, styles and other tags that carry no readable text
    for element in soup(['style', 'script', 'meta', '[document]', 'head', 'title']):
        element.extract()
    # get the text from the remaining HTML
    text = soup.get_text()
    # remove leading/trailing whitespace on each line
    lines = [line.strip() for line in text.splitlines()]
    # break multi-headlines (separated by double spaces) into a line each
    chunks = [phrase.strip() for line in lines for phrase in line.split('  ')]
    # drop blank chunks and join the rest with newlines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    return text
```
Sending Emails
There are two ways to send email using the Gmail API:
- You can send it directly using the messages.send method.
- You can send it from a draft, using the drafts.send method.
Emails are sent as base64url encoded strings within the raw property of a message resource. The high-level workflow to send an email is to:
- Create the email content in some convenient way and encode it as a base64url string.
- Create a new message resource and set its raw property to the base64url string you just created.
- Call messages.send or, if sending a draft, drafts.send to send the message.
Creating Email
The Gmail API requires the message to be base64url-encoded (URL-safe base64) and placed in the raw property of the message resource.
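```python
import base64
from email.mime.text import MIMEText


def createMessage(sender, to, subject, message_text):
    message = MIMEText(message_text)
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject
    # urlsafe_b64encode needs bytes; decode the result so the body is JSON-serializable
    return {'raw': base64.urlsafe_b64encode(message.as_bytes()).decode()}
```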
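createMessage above sends plain text only. To send an HTML email with a plain-text fallback, the same workflow works with a multipart/alternative message from Python's standard email package. This is a minimal sketch; createHtmlMessage is a hypothetical helper name, and the returned dict is a message resource ready for messages.send.

```python
import base64
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def createHtmlMessage(sender, to, subject, text_body, html_body):
    """Build a multipart/alternative email and wrap it as a Gmail message resource."""
    message = MIMEMultipart('alternative')
    message['to'] = to
    message['from'] = sender
    message['subject'] = subject
    # mail clients prefer the last alternative they can display, so HTML goes last
    message.attach(MIMEText(text_body, 'plain'))
    message.attach(MIMEText(html_body, 'html'))
    # base64url-encode the bytes and store them in the 'raw' property
    raw = base64.urlsafe_b64encode(message.as_bytes()).decode('ascii')
    return {'raw': raw}
```

The result can be passed as the message argument of sendMail below.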
Sending Email
```python
from googleapiclient import errors


def sendMail(service, userId, message):
    try:
        message = (service.users().messages()
                   .send(userId=userId, body=message).execute())
        return message
    except errors.HttpError as error:
        print(error)
```
Creating Drafts
Creating drafts is just as easy and very similar to sending email: instead of messages().send(), we call drafts().create() to create a draft.
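```python
from googleapiclient import errors


def createDraft(service, userId, message):
    try:
        # a draft resource wraps the message resource under the 'message' key
        draft = (service.users().drafts()
                 .create(userId=userId, body={'message': message}).execute())
        return draft
    except errors.HttpError as error:
        print(error)
```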
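A draft is sent later by passing its ID to drafts.send. This is a minimal sketch, assuming service is the authorized object built earlier and message is a resource such as the one returned by createMessage; createAndSendDraft is a hypothetical helper name, and error handling is left out.

```python
def createAndSendDraft(service, userId, message):
    """Create a draft from a message resource, then send that draft."""
    # drafts.create expects a draft resource: {'message': {'raw': ...}}
    draft = service.users().drafts().create(
        userId=userId, body={'message': message}).execute()
    # drafts.send identifies the draft to send by its ID
    sent = service.users().drafts().send(
        userId=userId, body={'id': draft['id']}).execute()
    return sent
```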
Filtering Messages
messages.list takes an optional parameter, q, whose value is a query string in the same format as the Gmail search box; only messages matching the query are returned.
```python
# only messages sent from this address
response = service.users().messages().list(
    userId='me', q='from:someuser@example.com').execute()
messages = response.get('messages', [])
```
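Because q uses Gmail search syntax, space-separated terms must all match, and operators such as from:, is:unread, has:attachment, and after: can be combined. A small sketch of composing such a query; buildQuery is a hypothetical helper name.

```python
def buildQuery(*terms):
    """Join Gmail search terms with spaces; Gmail treats them as an AND."""
    # skip empty terms so optional filters can be passed as ''
    return ' '.join(term for term in terms if term)


# unread messages with attachments from one sender, received after 1 Jan 2024
query = buildQuery('from:someuser@example.com', 'is:unread',
                   'has:attachment', 'after:2024/01/01')
```

The result is then passed as q, e.g. service.users().messages().list(userId='me', q=query).execute().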
Working with labels
You can use labels to tag, organize, and categorize messages and threads in Gmail. A label has a many-to-many relationship with messages and threads: a single message or thread may have multiple labels applied to it and a single label may be applied to multiple messages or threads.
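To see which labels exist in a mailbox (system labels such as INBOX plus any user-created ones), call labels().list(). This is a minimal sketch, assuming service is the authorized object built earlier; labelNames is a hypothetical helper name.

```python
def labelNames(service, userId='me'):
    """Return a {label id: label name} mapping for the user's mailbox."""
    response = service.users().labels().list(userId=userId).execute()
    # each label resource has at least an 'id' and a 'name'
    return {label['id']: label['name'] for label in response.get('labels', [])}
```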
Types of Labels
Gmail has many system labels; a few of them are:
- SPAM
- INBOX
- TRASH
- UNREAD
- IMPORTANT
- STARRED
and many more…
A few labels cannot be applied manually because Gmail applies them automatically:
- SENT
- DRAFT
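For example, to fetch spam messages, list the messages that carry the SPAM label. (labels().get() only returns the label's own metadata, not the messages it is applied to.)

```python
def getSpamMails(service, userId, labelId='SPAM'):
    # list messages carrying the given label; includeSpamTrash keeps
    # messages in SPAM/TRASH from being excluded from the results
    response = service.users().messages().list(
        userId=userId, labelIds=[labelId], includeSpamTrash=True).execute()
    # 'messages' is absent when nothing matches
    return response.get('messages', [])
```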
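Labels are applied and removed with messages.modify, which takes addLabelIds and removeLabelIds in the request body. Marking a message as read, for instance, means removing the UNREAD label. This is a minimal sketch; markAsRead is a hypothetical helper name, and modifying labels needs a scope such as gmail.modify rather than gmail.readonly.

```python
def markAsRead(service, userId, messageId):
    """Remove the UNREAD system label from one message."""
    return service.users().messages().modify(
        userId=userId, id=messageId,
        body={'removeLabelIds': ['UNREAD']}).execute()
```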