A simple way to fetch emails in Python

In today’s world, technology has made its way into almost everything. Countries across the globe are adapting to the change and bringing technology into every level of administration. As a result, it has become much easier to get the information we are looking for, quickly and securely. Whether it is the status of a service request with your Internet provider, your monthly SIP investment details or your credit card bill, almost everything you need to know arrives in your email inbox.

Wouldn’t it be nice if we could automatically fetch/download all the emails that come in the inbox and make use of them?

I say it would. Why? Because:

  1. We can use the dumped email text data in some text analytics projects later on.
  2. We can upgrade it to send an automatic reply to the sender.
  3. Business analytics teams could use it to process customer emails in bulk.

There can be hundreds of other possibilities. So, without further ado, let’s see how we can write a simple mail scraper (or mail downloader) snippet in Python.
I have used IMAP (Internet Message Access Protocol) in this module. However, POP (Post Office Protocol) can also be used to achieve the same.
Python supports IMAP through imaplib, which is part of the standard library, so there is nothing to install with pip.
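For comparison, here is roughly what the POP route could look like with the standard-library poplib module. It is a minimal sketch, assuming a POP3 server reachable over SSL; fetch_all_pop and its pop_cls parameter (there only so the function can be tried without a real server) are hypothetical names. POP3 has no folders or server-side search, so the sketch simply downloads every message in the mailbox.

```python
import email
import poplib


def fetch_all_pop(host, user, password, pop_cls=poplib.POP3_SSL):
    """Download every message in a POP3 mailbox and return parsed Message objects.

    Hypothetical helper; pop_cls is injectable so it can be tested offline.
    """
    conn = pop_cls(host)  # POP3_SSL connects to port 995 by default
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()  # (number of messages, mailbox size in octets)
        messages = []
        for i in range(1, count + 1):  # POP3 message numbers start at 1
            _resp, lines, _octets = conn.retr(i)  # lines is a list of bytes
            messages.append(email.message_from_bytes(b'\r\n'.join(lines)))
        return messages
    finally:
        conn.quit()  # ends the session
```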

The idea is simple: run an infinite polling loop that checks for new emails and, whenever one arrives, scrapes its content and stores it locally.
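Stripped of the email details, the idea is just a loop like the following. It is a sketch: poll and check_once are hypothetical names, and the optional max_polls limit is an addition so the loop can also be run a fixed number of times instead of forever.

```python
import time


def poll(check_once, interval_seconds, max_polls=None):
    """Call check_once() every interval_seconds and return the total it reported.

    check_once() is expected to fetch and store any new mail and return how
    many messages it handled. max_polls=None means poll forever.
    """
    total = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        total += check_once()
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_seconds)  # no sleep after the final poll
    return total
```

In the module, check_once would be something like the process_mailbox() method, adapted to return how many mails it stored.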

Let’s first look at the fragments; at the end we can easily combine them into the complete module. Don’t try to join the fragments as you go; that is done at the end.

First, get the imports done.

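import sys
import json
import os
import time
import logging
import email
import email.header  # make_header() and decode_header() live in this submodule
import getpass
import imaplib

# `cfg` (used below as cfg.dest, cfg.wait, cfg.login_attempt and cfg.user_mail_id)
# is a configuration object that the post does not show; you need to provide it.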

Next, the connect and login part. These lines live in the class constructor, which is why they use self:

self.mail_handle = imaplib.IMAP4_SSL('imap.gmail.com')  # IMAP over SSL, port 993
self.mail_id = username
logging.info('Please provide the password:')
rv, data = self.login()
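Since Python 3.5, imaplib’s connection objects can also be used in a with statement, which sends LOGOUT automatically when the block exits, even on an error. A minimal sketch; count_inbox is a hypothetical helper, and its imap_cls parameter exists only so it can be exercised without a real server:

```python
import imaplib


def count_inbox(host, user, password, imap_cls=imaplib.IMAP4_SSL):
    """Log in, select INBOX and return how many messages it holds.

    The with block logs out automatically when it exits.
    """
    with imap_cls(host) as conn:
        conn.login(user, password)
        typ, data = conn.select('INBOX')
        if typ != 'OK':
            raise RuntimeError(f'SELECT failed: {data!r}')
        return int(data[0])  # select() reports the message count as bytes, e.g. b'42'
```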

self.login() is an instance method of the class (not a class method), and getpass.getpass() prompts for your password, without echoing it, every time it is called:

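def login(self):
    # getpass.getpass() reads the password from the terminal without echoing it.
    try:
        rv, data = self.mail_handle.login(self.mail_id, getpass.getpass())
    except imaplib.IMAP4.error as e:
        # Tell the caller the credentials were rejected; the main block retries.
        raise PermissionError('IMAP login failed') from e
    return rv, data  # Could have been removed; the caller only checks rv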
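Because getpass() always prompts on the terminal, the poller cannot run unattended (say, as a background service). One option, sketched here under the assumption that you are comfortable keeping the password in an environment variable, is to read it from there first and fall back to the prompt. The variable name MAIL_PASSWORD and the function get_password are hypothetical:

```python
import getpass
import os


def get_password(env_var='MAIL_PASSWORD'):
    """Return the password from an environment variable, else prompt for it.

    MAIL_PASSWORD is a hypothetical variable name.
    """
    password = os.environ.get(env_var)
    if password:
        return password
    return getpass.getpass()  # interactive fallback, input is not echoed
```

login() would then pass get_password() instead of getpass.getpass() to self.mail_handle.login().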

Next comes the processing block, which runs inside an infinite loop and does the main job:

def process_mailbox(self):
    try:
        self.mail_handle.select('Inbox')
        # Sequence numbers from in_mail_start_id up to the last message.
        r, in_mail_ids = self.mail_handle.search(None, f'{self.in_mail_start_id}:*')
        in_mail_ids_int = list(map(int, in_mail_ids[0].decode('utf-8').split()))
        if in_mail_ids_int:
            last_read_mail_id = max(in_mail_ids_int)
            # "N:*" can still return the last message when N is past it, hence this check.
            if last_read_mail_id >= self.in_mail_start_id:
                logging.info('New mail…')
                for item in in_mail_ids[0].split():
                    body = None
                    charset = 'utf-8'
                    r, v = self.mail_handle.fetch(item, '(RFC822)')
                    msg = email.message_from_bytes(v[0][1])  # raw bytes, not always UTF-8
                    subject = email.header.make_header(email.header.decode_header(msg.get('Subject', '')))
                    if msg.is_multipart():
                        for part in msg.walk():
                            if part.get_content_type() == 'text/plain':
                                body = part.get_payload(decode=True)  # undoes base64/quoted-printable
                                charset = part.get_content_charset() or 'utf-8'
                    else:
                        body = msg.get_payload(decode=True)
                        charset = msg.get_content_charset() or 'utf-8'
                    text = body.decode(charset, errors='replace') if body else ''  # HTML-only mail has no text/plain part
                    save_as = {
                        'subject': str(subject),
                        'to': str(msg.get('To')),
                        'from': str(msg.get('From')),
                        'date': str(msg.get('Date')),
                        'body': '\n'.join(text.splitlines())  # normalise line endings
                    }
                    # time.ctime() has one-second resolution, so add the message number
                    # to stop mails handled in the same second from overwriting each other.
                    stamp = time.ctime().replace(' ', '_').replace(':', '_')
                    with open(os.path.join(cfg.dest, f'{stamp}_{item.decode()}.json'), 'w') as fp:
                        json.dump(save_as, fp)
                self.in_mail_start_id = last_read_mail_id + 1
            else:
                logging.info('No new mail')
        else:
            logging.info('Empty inbox')
            self.in_mail_start_id = 1  # If all the in messages are deleted
    except KeyboardInterrupt:
        logging.info('Keyboard Interrupt.')
        logging.info('Exiting. . .')
        self.mail_handle.logout()  # the class defines no logout() of its own
        sys.exit()
    except imaplib.IMAP4.error as e:
        logging.error(f'ERROR: {e}')
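The check last_read_mail_id >= self.in_mail_start_id is there because a search for N:* covers the range between N and the last message, so when no new mail has arrived the server can still return the last message (in a five-message inbox, 6:* is the range 5:6, which contains message 5). The same filtering can be pulled out into a small pure function that is easy to test. A sketch; new_ids_from_search is a hypothetical name:

```python
def new_ids_from_search(search_data, start_id):
    """Turn imaplib search() data such as [b'3 4 5'] into the ids that are really new.

    Ids below start_id (for example the last message returned for "N:*"
    when nothing new has arrived) are dropped.
    """
    ids = [int(x) for x in search_data[0].split()]
    return [i for i in ids if i >= start_id]
```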

Now, let’s clarify a few points.

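  1. ‘(RFC822)’ asks the IMAP server for the complete raw message in RFC 822 format, the standard format for Internet email (since superseded by RFC 2822 and then RFC 5322). Fetching a message this way also marks it as read.
  2. is_multipart() tells you whether an email consists of several parts (for example plain text, HTML and attachments), and walk() lets you iterate over all of them.
  3. Here we are only interested in plain-text content. Therefore I check each part with get_content_type(), and for every text/plain part I call get_payload(decode=True), which undoes any base64 or quoted-printable transfer encoding.
  4. Finally, I dump the subject, the To, From and Date headers and the body into a JSON file.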
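Points 2 to 4 can be tried offline with the standard email package. The sketch below, built around the hypothetical helper plain_text_body, parses raw message bytes (what fetch(..., '(RFC822)') returns in v[0][1]), walks the parts and decodes the first text/plain one using the charset the part declares instead of assuming UTF-8:

```python
import email
from email.message import EmailMessage


def plain_text_body(raw_bytes):
    """Return the first text/plain part of a raw message as str, or '' if none."""
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():  # for a non-multipart mail, walk() yields just msg itself
        if part.get_content_type() == 'text/plain':
            payload = part.get_payload(decode=True)  # bytes with transfer encoding undone
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    return ''


# Build a two-part (plain text + HTML) message to try it on.
demo = EmailMessage()
demo['Subject'] = 'Card statement'
demo.set_content('Your bill is ready.')
demo.add_alternative('<p>Your bill is ready.</p>', subtype='html')
print(plain_text_body(demo.as_bytes()))
```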

All these methods then go inside the MailHandler class, whose constructor handles the connection and login:

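class MailHandler:
    def __init__(self, username):
        self.mail_handle = imaplib.IMAP4_SSL('imap.gmail.com')
        self.mail_id = username
        logging.info('Please provide the password:')
        rv, data = self.login()  # raises PermissionError on bad credentials
        if rv != 'OK':
            logging.warning('Could not login')
        # Starting at 1 means every mail already in the inbox is saved on the first pass.
        self.in_mail_start_id = 1

    # login(), process_mailbox() and watch_inbox() go here, indented one level.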
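The class tracks its position in self.in_mail_start_id, which is a message sequence number. Sequence numbers are renumbered when earlier messages are deleted, so the stored position can drift. A hedged alternative, not the author’s code, is to track UIDs through imaplib’s uid() method (UIDs stay stable as long as the mailbox’s UIDVALIDITY does not change) and to fetch with BODY.PEEK[] so messages are not marked as read. fetch_new_by_uid is a hypothetical helper and assumes a mailbox has already been selected:

```python
import email


def fetch_new_by_uid(conn, last_uid):
    """Fetch messages with a UID greater than last_uid from the selected mailbox.

    Returns (messages, highest_uid_seen). BODY.PEEK[] retrieves the whole
    message without setting the Seen flag.
    """
    typ, data = conn.uid('SEARCH', None, f'UID {last_uid + 1}:*')
    if typ != 'OK':
        return [], last_uid
    messages = []
    for uid in (int(x) for x in data[0].split()):
        if uid <= last_uid:  # "N:*" can return the last UID even when it is below N
            continue
        typ, msg_data = conn.uid('FETCH', str(uid), '(BODY.PEEK[])')
        if typ == 'OK':
            messages.append(email.message_from_bytes(msg_data[0][1]))
            last_uid = max(last_uid, uid)
    return messages, last_uid
```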

Finally, run the processing in an infinite mailbox-polling loop:

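def watch_inbox(self):
    try:
        while True:
            self.process_mailbox()
            time.sleep(cfg.wait)  # seconds to wait between two checks
    except KeyboardInterrupt:  # Ctrl+C usually arrives during sleep(), outside process_mailbox()
        self.mail_handle.logout()
        sys.exit()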

Then call that from a loop in the main block, which takes care of the credentials by retrying the login up to cfg.login_attempt times.

if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)  # without this, logging.info() output is hidden
    current_attempt = 0
    while True:
        if current_attempt >= cfg.login_attempt:
            logging.warning(f'{cfg.login_attempt} incorrect attempts')
            break
        try:
            mail_handle = MailHandler(cfg.user_mail_id)
            mail_handle.watch_inbox()
        except PermissionError:
            print('Incorrect credentials.')
            current_attempt += 1
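The retry logic of the main block can also be factored into a small function, which makes the attempt counting easy to test. A sketch under the assumption that a login failure surfaces as PermissionError, as login() arranges; run_with_login_retries is a hypothetical name:

```python
import logging


def run_with_login_retries(start, max_attempts):
    """Call start() until it stops raising PermissionError or attempts run out.

    start is a callable such as lambda: MailHandler(cfg.user_mail_id).watch_inbox().
    Returns True if start() eventually returned, False if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            start()
            return True
        except PermissionError:
            logging.warning('Incorrect credentials (attempt %d of %d)', attempt, max_attempts)
    logging.warning('%d incorrect attempts, giving up', max_attempts)
    return False
```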

That’s it. Now you can put the snippets together and run them. This code wasn’t reviewed thoroughly, so a few redundant statements remain; however, it is easy to modify and adapt to your own needs.

Thank you!
Happy coding.
