Multiprocessing in Python

How to do multiple tasks in Python at the same time ...

    Table of contents
  1. Multiprocessing process
    1. Without multiprocessing process
    2. With multiprocessing process
  2. Multiprocessing pools
  3. __name__ == '__main__'
  4. Conclusion

I started writing a Python script some time ago for downloading files from customer servers, so that I can back up entire servers. However, I ran into the problem that Python executes code synchronously by default. This means that each file is downloaded one at a time, and nothing else can happen while a download is in progress.

Considering that we’re talking about more than a million files here, it quickly becomes clear that a full synchronous backup could take more than 24 hours. However, we would like to provide the customer with a daily backup. This calls for a whole new approach …

Multiprocessing process

With multiprocessing (Process) you can have a function called several times at the same time in Python. This means that Python spawns several processes and lets each of them run the same function with different parameters. If we take the FTP download as an example, it looks like this:

Without multiprocessing process

  • We change to the folder
  • List all files
  • File 1 is downloaded
  • We wait for the download to finish
  • File 2 is downloaded
  • We wait for the download to finish
  • File 3 is downloaded
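
The sequential steps above can be sketched roughly as follows. This is only an illustration, not the exact code from the backup script; the server settings and folder name are placeholders:

```python
import ftplib

def download_all(server, username, password, folder):
    """Download every file in `folder`, one after another."""
    ftp = ftplib.FTP(host=server, user=username, passwd=password)
    ftp.cwd(folder)        # change to the folder
    files = ftp.nlst()     # list all files
    for name in files:     # each download blocks until it is finished
        with open(name, 'wb') as fh:
            ftp.retrbinary('RETR ' + name, fh.write)
    ftp.quit()
    return files
```

Every call to retrbinary blocks, so the total runtime is the sum of all individual download times.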

With multiprocessing process

  • We change to the folder
  • List all files
  • We pass the file list to multiprocessing so that all files get downloaded.
  • Multiprocessing executes the downloads simultaneously, so while a large file is being downloaded, several smaller files finish at the same time.
import multiprocessing
import sys
import ftplib

project = {
    'slug'     : sys.argv[1],
    'server'   : sys.argv[2],
    'username' : sys.argv[3],
    'password' : sys.argv[4],
    'root'     : sys.argv[5],
}


def download(file):
    # Each process opens its own FTP connection and fetches a single file
    ftp_obj = ftplib.FTP(host=project['server'], user=project['username'], passwd=project['password'])
    with open(file, 'wb') as fh:
        ftp_obj.retrbinary('RETR ' + file, fh.write)
    ftp_obj.quit()


if __name__ == '__main__':
    # List all files in the backup folder
    ftp = ftplib.FTP(host=project['server'], user=project['username'], passwd=project['password'])
    ftp.cwd(project['root'])
    files = ftp.nlst()
    ftp.quit()

    # Start one process per file
    jobs = []
    for file in files:
        p = multiprocessing.Process(target=download, args=(file,))
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()

As we can see, multiprocessing increases our throughput enormously. However, I had a problem: if I hand all the files to multiprocessing at once, most web hosts will block my requests, because most of them allow a maximum of 10 connections at a time. What to do?

Multiprocessing pools

Multiprocessing provides a class called Pool that does exactly what you’re thinking of. It serves as a kind of throttle for processes.

Let’s get back to our FTP download problem. We have an array of 10,000 files, but we can open a maximum of 10 connections to the FTP server. With a pool you simply create a fixed number of worker processes, for example five:

from multiprocessing import Pool
pool = Pool(processes=5)

Then you can do essentially the same as above with the map method of the Pool class, except that a maximum of five processes run simultaneously. That would look something like this:

from multiprocessing import Pool
import sys
import ftplib

project = {
    'slug'     : sys.argv[1],
    'server'   : sys.argv[2],
    'username' : sys.argv[3],
    'password' : sys.argv[4],
    'root'     : sys.argv[5],
}


def download(file):
    # Each worker opens its own FTP connection and fetches a single file
    ftp_obj = ftplib.FTP(host=project['server'], user=project['username'], passwd=project['password'])
    with open(file, 'wb') as fh:
        ftp_obj.retrbinary('RETR ' + file, fh.write)
    ftp_obj.quit()


if __name__ == '__main__':
    # List all files in the backup folder
    ftp = ftplib.FTP(host=project['server'], user=project['username'], passwd=project['password'])
    ftp.cwd(project['root'])
    files = ftp.nlst()
    ftp.quit()

    # At most five downloads run at the same time
    pool = Pool(processes=5)
    pool.map_async(download, files)
    pool.close()
    pool.join()

__name__ == '__main__'

As you may have noticed in the code above, I always start the main part of my program with if __name__ == '__main__':. Everything within this if block is only executed in the main process.

Put more simply: when you use multiprocessing, this Python script can be imported again by every child process (depending on the platform’s start method). If you do not prevent the main code from running on each of these imports, you end up in a loop: every child restarts the whole program, which in turn spawns new children, so the program never ends.
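
A minimal, hypothetical illustration of the guard: the worker function square can safely be imported by the child processes, while the pool is only created when the file itself is executed directly.

```python
import multiprocessing

def square(n):
    return n * n

if __name__ == '__main__':
    # Only the main process creates the pool; child processes that
    # re-import this file skip this block and only see square().
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Without the guard, each child would try to create its own pool when it imports the file, and the program would recurse instead of finishing.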

Conclusion

Of course, multiprocessing offers many more classes and is very extensive. I cannot cover all of them here, but I have shown you the classes I used in my backup script. More information about multiprocessing can be found here.

In addition, you can also send me a message via email to [email protected]. I will help as much as I can, but you should know that I am also still learning Python.

