In today’s world, technology has made its way into almost everything. Countries across the globe are adapting to the change and bringing technology into every level of administration. As a result, it has become much easier to get the information we are looking for quickly and securely. Whether it is the status of a service request with your Internet provider, your monthly SIP investment details, or your credit card bill, almost everything you need to know arrives in your inbox as an email.
Wouldn’t it be nice if we could automatically download every email that arrives in the inbox and put it to use?
I think it would. Why? Because:
We can use the dumped email text in text analytics projects later on.
We can extend it to send automatic replies to senders.
Business analytics teams could use it to process bulk customer emails.
There are hundreds of other possibilities. So, without further ado, let’s see how to write a simple mail scraper (or mail downloader) in Python. I have used IMAP (Internet Message Access Protocol) in this module, although POP (Post Office Protocol) can achieve the same thing. Python supports IMAP through imaplib, which is part of the standard library, so there is nothing to install with pip.
The idea is simple: poll the mailbox in an infinite loop, and whenever a new email arrives, scrape its content and store it locally.
Let’s first look at the individual fragments; at the end we can easily combine them into the complete module. Don’t worry yet about how the fragments fit together; that comes at the end.
First, get the imports done.
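The original import snippet is not reproduced here. As a sketch, assuming the modules that an IMAP scraper of this kind typically relies on, the imports would look something like this (all of them ship with the standard library):

```python
# Standard-library modules assumed for an IMAP mail scraper
import email      # parse raw RFC 822 bytes into message objects
import getpass    # prompt for the password without echoing it
import imaplib    # IMAP4 / IMAP4_SSL client
import json       # dump the extracted text to disk
import time       # sleep between mailbox polls
```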
The login() method is defined on the class (hence self.login()), and getpass() prompts for your password every time it is called.
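The original snippet isn’t available, so here is a minimal sketch of such a method, assuming an SSL connection on the standard IMAPS port 993. The class name MailScraper and its constructor arguments are hypothetical; only the imaplib and getpass calls are real library APIs.

```python
import getpass
import imaplib


class MailScraper:
    """Hypothetical class name; wraps a single IMAP-over-SSL connection."""

    def __init__(self, host: str, user: str, port: int = 993) -> None:
        self.host = host   # your provider's IMAP server
        self.user = user   # full email address
        self.port = port   # 993 is the standard IMAPS port
        self.conn = None   # opened by login(), not here

    def login(self) -> None:
        """Open the connection and log in, asking for the password on every call."""
        password = getpass.getpass(f"Password for {self.user}: ")
        self.conn = imaplib.IMAP4_SSL(self.host, self.port)
        # Raises imaplib.IMAP4.error if the server rejects the credentials
        self.conn.login(self.user, password)
```

Opening the connection inside login() rather than __init__() means a failed login can simply be retried by calling login() again.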
Next is the processing block, which runs inside an infinite loop and does the main work.
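A sketch of the fetch half of that block, assuming an already logged-in imaplib connection. The function name fetch_unseen and the choice of the UNSEEN search criterion are assumptions rather than the original code. Fetching ‘(RFC822)’ from a mailbox opened read-write also marks the message as seen, so the next poll won’t return it again.

```python
import imaplib
from typing import List


def fetch_unseen(conn: imaplib.IMAP4, mailbox: str = "INBOX") -> List[bytes]:
    """Return the raw RFC 822 bytes of every unseen message in `mailbox`."""
    conn.select(mailbox)                          # read-write by default
    status, data = conn.search(None, "UNSEEN")    # data looks like [b"3 7 9"]
    if status != "OK":
        return []
    messages = []
    for num in data[0].split():                   # b"".split() == [] when empty
        status, msg_data = conn.fetch(num, "(RFC822)")
        # msg_data[0] is an (envelope, raw_bytes) tuple on success
        if status == "OK" and msg_data and isinstance(msg_data[0], tuple):
            messages.append(msg_data[0][1])
    return messages
```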
‘(RFC822)’ asks the server for the whole message in RFC 822 format, the standard Internet message format.
is_multipart() tells you whether an email has several parts, and walk() lets you iterate over those parts.
Since we are only interested in text content, I check each part with get_content_type() and call get_payload() on every text part.
Finally, I dump the extracted text into a JSON file.
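The parsing steps above can be sketched as follows. This assumes “text content” means text/plain parts; the function names extract_text and dump_json and the JSON field names are hypothetical. Passing decode=True to get_payload() undoes base64 or quoted-printable transfer encoding, and the part’s declared charset turns the bytes into a string.

```python
import email
import json
from email import policy


def extract_text(raw: bytes) -> dict:
    """Parse raw RFC 822 bytes and collect the text/plain parts."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    parts = msg.walk() if msg.is_multipart() else [msg]
    texts = []
    for part in parts:
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)        # transfer-decoded bytes
            charset = part.get_content_charset() or "utf-8"
            texts.append(payload.decode(charset, errors="replace"))
    return {
        "from": str(msg.get("From", "")),
        "subject": str(msg.get("Subject", "")),
        "body": texts,
    }


def dump_json(record: dict, path: str) -> None:
    """Write one extracted email to `path` as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, ensure_ascii=False, indent=2)
```

An HTML-only email yields an empty "body" list here; checking for "text/html" as well would capture those.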
All these methods are then grouped under the class.
Finally, place it all inside an infinite mailbox-polling loop.
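A minimal sketch of such a polling loop, assuming a fetch callable that returns raw messages and a handler that processes each one. The names poll_forever, fetch, handle and the 30-second default interval are hypothetical; the max_polls parameter is an addition so the loop can be stopped in tests, and leaving it as None gives a true infinite loop.

```python
import time
from typing import Callable, Iterable, Optional


def poll_forever(
    fetch: Callable[[], Iterable[bytes]],
    handle: Callable[[bytes], None],
    interval: float = 30.0,
    max_polls: Optional[int] = None,
) -> int:
    """Call fetch() every `interval` seconds and pass each message to handle().

    Returns how many messages were handled (only reachable when max_polls is set).
    """
    polls = 0
    handled = 0
    while max_polls is None or polls < max_polls:
        for raw in fetch():
            handle(raw)
            handled += 1
        polls += 1
        time.sleep(interval)   # be polite to the server between polls
    return handled
```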
Then wrap that in another infinite loop in the main block, which takes care of the credentials.
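One way to sketch the credentials loop, assuming a wrong password should simply trigger a fresh prompt. The function name connect_with_retry and its prompt, connect and max_attempts parameters are hypothetical; the latter two exist so the logic can be exercised without a real server. With max_attempts=None it keeps asking forever, as the main block described above does.

```python
import getpass
import imaplib


def connect_with_retry(host, user, prompt=getpass.getpass,
                       connect=imaplib.IMAP4_SSL, max_attempts=None):
    """Keep asking for the password until the IMAP login succeeds."""
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        password = prompt(f"Password for {user}: ")
        conn = connect(host)
        try:
            conn.login(user, password)
            return conn                       # logged in; hand back the connection
        except imaplib.IMAP4.error as exc:    # raised when the server rejects the login
            print(f"Login failed ({exc}); try again.")
            conn.logout()
    raise RuntimeError("giving up after repeated login failures")


# Usage (hypothetical host and address):
#   conn = connect_with_retry("imap.example.com", "you@example.com")
```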
That’s it. You can now join the snippets together and run the module. The code hasn’t been thoroughly reviewed, so some redundant statements remain, but it is easy to modify for your own use.