Working with Binary Files in Python

Binary Files: In Python, a binary file is a file that holds non-textual data, such as images, audio files, video files, and executable programs. Binary files are opened in binary mode so that no character encoding or decoding is applied, and they can be read and written using the built-in file handling functions.
1) It stores information as a stream of bytes.
2) It holds the information in the same format in which the information is held in memory.
3) There are no line delimiters, so no end-of-line (EOL) translation occurs. This makes reading and writing binary files faster than text files.
4) It is not in human-readable form.
5) It can have any extension.
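A minimal sketch of the first and third points, assuming a throwaway file (the name sample.bin is hypothetical) in a temporary folder: bytes written in binary mode come back unchanged, including the newline and carriage-return bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.bin")  # hypothetical file name

    # Write four raw bytes; no encoding or newline translation is applied.
    with open(path, "wb") as f:
        f.write(bytes([0, 10, 13, 255]))

    # Read them back: the result is a bytes object, identical to what was written.
    with open(path, "rb") as f:
        data = f.read()

print(type(data))   # <class 'bytes'>
print(list(data))   # [0, 10, 13, 255]
```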

File Operations : We can perform various operations on a file such as
1) Opening an already existing file
2) Creating a new file
3) Writing data into the file
4) Reading data from the file
5) Appending data to the file
6) Searching for data in the file
7) Updating data in the file
8) Closing the file

File Object: It is a built-in object (also called a file handle) that acts as a link to a file on disk or in memory. A file object is created using the open() function and provides methods and attributes for reading and writing data to the file. All operations on a file are performed through its file object.

Opening Files: We can open files using the built-in function open(). It is usually called with two arguments: the name of the file you want to open and the mode in which you want to open it. The mode is optional and defaults to reading in text mode. Syntax is
file_object_name = open("file name with complete path", "file mode")

File Mode: The file mode determines the purpose of opening the file i.e. whether the file is opened for reading, writing, appending, or some combination of these operations. The file mode is specified as the second argument to the open() function, and it is a string that contains one or more characters.
The most common file modes are:

  1. 'r' open for reading (default)
  2. 'w' open for writing, truncating the file first
  3. 'x' create a new file and open it for writing
  4. 'a' open for writing, appending to the end of the file if it exists
  5. 'b' binary mode
  6. 't' text mode (default)
  7. '+' open a disk file for updating (reading and writing)
  8. 'U' universal newlines mode (deprecated; removed in Python 3.11)

Note:
1) The default file opening mode is "rt", which opens a text file for reading if the file exists and gives an error if the file doesn't exist.
2) In write mode, the file is created if it doesn't exist and overwritten if it already exists. The "x" mode (exclusive creation) creates a new file and opens it for writing; if the file already exists, the operation fails and raises a FileExistsError. This mode is useful when you want to create a new file without accidentally overwriting an existing one.
3) In append mode, the file is created if it doesn't exist. If it already exists, it is opened without being overwritten and new contents are appended at the end of the file.
4) In read and write modes, the file pointer is placed at the beginning of the file, but in append mode it is placed at the end of the file.
5) Exactly one of the create/read/write/append modes, i.e. r/w/a/x, can be given at a time.
6) We can't have text and binary mode at the same time.
7) The 'U' mode is deprecated. It had no effect in Python 3 and was removed in Python 3.11, where using it raises an error.
8) The characters of a mode can be written in any order, so all of the following combinations are valid:
a) 'rb' , 'br' , 'rt' , 'tr' , 'r+b' , 'rb+' , '+rb' , 'b+r' , 'br+' , '+br'
b) 'wb' , 'bw' , 'wt' , 'tw' , 'w+b' , 'wb+' , '+wb' , 'b+w' , 'bw+' , '+bw'
c) 'ab' , 'ba' , 'at' , 'ta' , 'a+b' , 'ab+' , '+ab' , 'b+a' , 'ba+' , '+ba'
d) 'xb' , 'bx' , 'xt' , 'tx' , 'x+b' , 'xb+' , '+xb' , 'b+x' , 'bx+' , '+bx'

>>> # default mode is "rt" i.e. open a text file for reading
>>> f1=open("garg.txt")

>>> # opening a binary file for reading
>>> f2=open("garg.dat" ,"rb")

>>> # opening a binary file for writing
>>> f3=open("garg.dat" ,"wb")

>>> # opening a binary file for appending
>>> f4=open("garg.dat" ,"ab")

>>> # gives an error because the file already exists; "x" mode refuses to overwrite it
>>> f5=open("garg.dat" ,"xb")
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    f5=open("garg.dat" ,"xb")
FileExistsError: [Errno 17] File exists: 'garg.dat'

>>> # error because the file is opened for reading and it doesn't exist
>>> f6=open("gupta.dat","rb")
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    f6=open("gupta.dat","rb")
FileNotFoundError: [Errno 2] No such file or directory: 'gupta.dat'

>>> # Backslashes must be doubled in a normal string because a single
>>> # backslash starts an escape sequence
>>> f7=open("f:\\demo\\file1.dat","rb")

>>> # We can prefix the string with r to make it a raw string, in which single
>>> # backslashes are used and no character has a special meaning
>>> f8=open(r"f:\demo\file1.dat","rb")

>>> # no space is allowed between r and the file name
>>> f9=open(r "f:\demo\file1.dat","rb")
SyntaxError: invalid syntax

>>> # opens a binary file in write mode (truncating it), but both reading and writing can be performed
>>> f10=open("f:\\demo\\file1.dat", "+wb")
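The difference between the write, append and exclusive-creation modes can be checked with a short sketch. It assumes a throwaway file (the name modes.dat is hypothetical) in a temporary folder, so no real data is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes.dat")  # hypothetical file name

    with open(path, "wb") as f:      # "wb" creates the file
        f.write(b"first")
    with open(path, "ab") as f:      # "ab" keeps old contents, writes at the end
        f.write(b"+more")
    with open(path, "rb") as f:
        after_append = f.read()

    with open(path, "wb") as f:      # "wb" again truncates the existing file
        f.write(b"new")
    with open(path, "rb") as f:
        after_write = f.read()

    try:
        open(path, "xb")             # file exists, so exclusive creation fails
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(after_append)      # b'first+more'
print(after_write)       # b'new'
print(exclusive_failed)  # True
```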

Path: It is a sequence of directory names which gives the hierarchy to access a particular directory or file.
Path Name: It is the full name of a file or a directory.
Absolute Path: It is the complete path that starts from the root directory and includes all directories and subdirectories necessary to locate the file. Example: F:\demo\file1.txt
Relative Path: It is a path that is relative to the current working directory (CWD) and does not start from the root directory. It describes the path to a file or directory in relation to the current working directory, denoted by . (dot), and its parent directory, denoted by .. (two dots). It is used to locate files that are stored in the current working directory, in its subdirectories, or near it in the directory tree. Your CWD is the directory from which you are running the Python shell or your Python program. Suppose your CWD is C:\python, then
1) .\file1.dat will search in the CWD i.e. C:\python\file1.dat
2) ..\file1.dat will search in the parent folder of the CWD i.e. C:\file1.dat
3) .\demo\file1.dat will search in the CWD and then find the file file1.dat in the folder named demo i.e. C:\python\demo\file1.dat
4) ..\demo\file1.dat will search in the parent folder of the CWD and then find the file file1.dat in the folder named demo i.e. C:\demo\file1.dat
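The four cases can be reproduced without touching the disk. This sketch uses the standard ntpath module, which applies Windows path rules on any operating system, and takes C:\python as the assumed CWD from the text.

```python
import ntpath  # Windows path rules, usable on any operating system

cwd = r"C:\python"  # the CWD assumed in the text

resolved = {}
for relative in (r".\file1.dat", r"..\file1.dat",
                 r".\demo\file1.dat", r"..\demo\file1.dat"):
    # join() glues the relative path onto the CWD; normpath() collapses . and ..
    resolved[relative] = ntpath.normpath(ntpath.join(cwd, relative))
    print(relative, "->", resolved[relative])
```

In a real program, os.path.abspath() performs the same resolution against the actual CWD of the running process.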

Closing Files: Once you are done reading from the file or performing other operations on it, use the close() method to close the file. It is good practice to close files, as leaving them open can cause problems with file locking, resource usage, and potential data corruption. In Python, files are automatically closed when the program terminates, but it is still a good idea to close them explicitly. Syntax is:
file_object_name.close()
Example: f1.close()
It breaks the link between the file object and the file on the disk/memory. After that, no operation can be performed on the file through that file object.

Note: open() is a built-in function but close() is a method of the file object.
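As an alternative to calling close() by hand, a file can be opened in a with statement, which closes it automatically when the block ends, even if an error occurs inside it. A minimal sketch, assuming a throwaway file (demo.dat is a hypothetical name):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.dat")  # hypothetical file name

    with open(path, "wb") as f:
        f.write(b"data")
        open_inside = not f.closed   # still open inside the block

    closed_after = f.closed          # closed automatically on leaving the block

    try:
        f.write(b"more")             # any further use of a closed file fails
        write_failed = False
    except ValueError:
        write_failed = True

print(open_inside)    # True
print(closed_after)   # True
print(write_failed)   # True
```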

Reading/Writing Binary Files: To read and write structured objects like dictionaries, tuples, lists or nested lists, the objects are first serialized and then stored in binary files so that their structure is maintained.
Serialization / Marshalling / Pickling : It is a process that converts a Python object into a stream of bytes, which can be stored in a file or transferred over a network. The resulting byte stream can be used later to reconstruct the original object. Pickling in Python is done using the "pickle" module, which provides a set of functions for serializing and deserializing Python objects. To pickle an object, use the dump() function of the pickle module to serialize the object and write it to a file.
Deserialization / Unmarshalling / Unpickling : It is the process of reconstructing a Python object from a stream of bytes (usually stored in a file or received over a network) that was previously created by pickling. To unpickle an object, use the load() function of the pickle module to read the pickled object from the file and reconstruct the original object.
Library Needed: pickle
Importing the library: We can import any library in python using the import statement as
import library_name
To give alias name, we can write as
import library_name as alias_name
To import pickle library, we can write
import pickle
To give alias name, we can write as
import pickle as p
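Before working with files, the round trip can be tried in memory. This sketch uses pickle.dumps() and pickle.loads(), the in-memory counterparts of dump() and load(); the sample record is made up for illustration.

```python
import pickle

# A made-up nested object: a dictionary containing a list.
record = {"rollno": 1, "name": "sheetal", "marks": [500, 480]}

stream = pickle.dumps(record)      # serialize to a bytes object in memory
restored = pickle.loads(stream)    # rebuild an equal object from those bytes

print(type(stream))                # <class 'bytes'>
print(restored == record)          # True: same structure and values
print(restored is record)          # False: it is a new object
```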

Pickling / Dumping / Writing into Binary File : Python allows pickling of objects of various data types like Booleans, integers, floats, complex numbers, strings, lists, tuples, sets, dictionaries etc.
Syntax is:
pickle.dump(object, file_object)
We can write data into a binary file using the dump() function after importing the pickle library as

>>> import pickle
>>> f1=open("file3.dat", "wb")
>>> pickle.dump([1,"sheetal" ,500],f1)
>>> pickle.dump([2,"amit" ,400],f1)
>>> pickle.dump([3,"nidhi" ,300],f1)
>>> f1.close()

Note that binary files are not in human-readable form, so we can't read the contents directly from a binary file.
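To see what "not human readable" means in practice, the raw bytes of a pickled record can be printed. This is a sketch on a temporary copy of the first record; the exact bytes depend on the pickle protocol of your Python version, so none are shown here.

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file3.dat")

    with open(path, "wb") as f:
        pickle.dump([1, "sheetal", 500], f)

    # Read the file as plain bytes instead of unpickling it.
    with open(path, "rb") as f:
        raw = f.read()

print(raw)                  # opcodes and length prefixes mixed with the data
print(b"sheetal" in raw)    # True: the name is in there, wrapped in pickle bytes
print(pickle.loads(raw))    # [1, 'sheetal', 500]
```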

Unpickling / Loading / Reading a Binary File : We can load a binary file (say file3.dat, created above) after importing the pickle library as

[Figure: file3.dat; its contents are not in human-readable form]

>>> import pickle
>>> f1=open("file3.dat","rb")
>>> pickle.load(f1)
[1, 'sheetal', 500]
>>> pickle.load(f1)
[2, 'amit', 400]
>>> pickle.load(f1)
[3, 'nidhi', 300]
>>> # Error because no more data in file
>>> pickle.load(f1)
Traceback (most recent call last):
  File "<pyshell#102>", line 1, in <module>
    pickle.load(f1)
EOFError: Ran out of input
>>> f1.close()

If we try to load from a file after the end of the file has been reached and no more data is available (as in the above case), or load from an empty file (say file4.dat), we get an error as

[Figure: file4.dat, an empty file]
>>> import pickle
>>> f1=open("file4.dat","rb")
>>> # Error in reading as the file is empty
>>> ob=pickle.load(f1)
Traceback (most recent call last):
  File "<pyshell#46>", line 1, in <module>
    ob=pickle.load(f1)
EOFError: Ran out of input

To handle the EOFError exception (raised, as in the above case, when the file is empty or the end of file has been reached), we should always load from a file inside try and except blocks. The program is then not interrupted; instead it prints a proper message such as "File is empty" or "End of File has reached", or any other message specified by the user.

>>> import pickle
>>> f1=open("file4.dat","rb")
>>> try:
	ob=pickle.load(f1)
except EOFError:
	print("EOF has reached")
	f1.close()

	
EOF has reached
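The same try/except idea can be wrapped in a reusable helper. This is a sketch, not part of the pickle module: read_records is a hypothetical generator that yields every record in a file and stops quietly at the end, so it also works on an empty file.

```python
import os
import pickle
import tempfile


def read_records(path):
    """Yield every pickled object stored in the file, one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return  # normal end of file: stop quietly


with tempfile.TemporaryDirectory() as folder:
    full = os.path.join(folder, "file3.dat")
    empty = os.path.join(folder, "file4.dat")

    with open(full, "wb") as f:
        for rec in ([1, "sheetal", 500], [2, "amit", 400], [3, "nidhi", 300]):
            pickle.dump(rec, f)
    open(empty, "wb").close()   # create an empty file

    records = list(read_records(full))
    nothing = list(read_records(empty))

print(records)   # [[1, 'sheetal', 500], [2, 'amit', 400], [3, 'nidhi', 300]]
print(nothing)   # []
```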

Searching in a Binary File: We can search by comparing the data after reading it from a binary file and process it in any desired way.
Program to search a record with given roll no in a binary file (say file3.dat created above) :

import pickle
f1=open("file3.dat","rb")
rollno=int(input("enter rollno to find "))
found=False

try:
    while True:
        record=pickle.load(f1)
        if ( record[0]==rollno):
            print("Record found :")
            print(record)
            found=True
except EOFError:
    if found==False:
        print("Sorry, Record doesn't exist")
f1.close()     

Output 1:

enter rollno to find 2
Record found :
[2, 'amit', 400]

Output 2:

enter rollno to find 5
Sorry, Record doesn't exist
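The search can also be written as a function that returns the record instead of printing it. In this sketch, find_record is a hypothetical helper; unlike the program above, it stops at the first match and returns None when nothing is found.

```python
import os
import pickle
import tempfile


def find_record(path, rollno):
    """Return the first record whose roll number matches, or None."""
    with open(path, "rb") as f:
        try:
            while True:
                record = pickle.load(f)
                if record[0] == rollno:
                    return record   # stop at the first match
        except EOFError:
            return None             # reached the end without a match


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file3.dat")
    with open(path, "wb") as f:
        for rec in ([1, "sheetal", 500], [2, "amit", 400], [3, "nidhi", 300]):
            pickle.dump(rec, f)

    hit = find_record(path, 2)
    miss = find_record(path, 5)

print(hit)    # [2, 'amit', 400]
print(miss)   # None
```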

Processing Records of a Binary File: We can process the records after reading them from a binary file (say file3.dat created above) as desired.
Program to print the details of all the students having marks greater than the given target marks:

import pickle
f1=open("file3.dat","rb")
marks=int(input("enter target marks : "))
print("Required Records are : ")
found=False
try:
    while True:
        record=pickle.load(f1)
        if ( record[2] > marks):
            print(record)
            found=True
except EOFError:
    if found==False:
        print("Sorry, No record exists ")
f1.close()

Output 1:

enter target marks : 350
Required Records are :
[1, 'sheetal', 500]
[2, 'amit', 400]

Output 2:

enter target marks : 500
Required Records are :
Sorry, No record exists

Updating a Binary File : We can update a record by moving the file pointer to the location of the record and then overwriting it. To move the file pointer to the desired location, we must know its position. The two methods tell() and seek() are used for this.
1) tell() : It returns the current position of the file pointer, in bytes from the beginning of the file.
Syntax is file_object.tell()
Example:
f1.tell()
2) seek() : It places the file pointer at the desired location.
Syntax is file_object.seek(offset,mode)
offset : It is the number of bytes to move the file pointer. If offset is positive, the file pointer moves in the forward direction and if offset is negative, the file pointer moves in the backward direction.
mode : It is an integer which specifies the location with respect to which we want the file pointer to move (Python's documentation calls this argument whence). We can write 0 for beginning of file (BOF), 1 for current location of the file pointer and 2 for end of file (EOF).
Example:
f1.seek(+5,0) will move the file pointer 5 bytes in forward direction from beginning of file (BOF)
f1.seek(-5,1) will move the file pointer 5 bytes in backward direction from current position of file pointer
f1.seek(5,1) will move the file pointer 5 bytes in forward direction from current position of file pointer
f1.seek(-5,2) will move the file pointer 5 bytes in backward direction from end of file (EOF)
Note: Movement in the backward direction is not possible from the beginning of file (BOF); a negative offset there raises an error.
Movement in the forward direction beyond the end of file (EOF) is permitted in binary files, but a read from there returns no data.
For updating a record, we must open the file in rb+ mode to enable both reading and writing operations. (wb+ also allows both, but it truncates the file first, so the existing records would be lost.) We save the position of the file pointer before reading each record. After reading the record, if it matches the given criteria, we send the file pointer back to the beginning of that record, i.e. the position saved before reading it, and write the new record there.
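A minimal sketch of tell() and seek() on a ten-byte file makes the three reference points concrete. The file name digits.dat is hypothetical and the file lives in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "digits.dat")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        start = f.tell()     # 0: a freshly opened file starts at the beginning
        f.seek(5, 0)         # 5 bytes forward from the beginning
        a = f.read(1)        # reads b"5"; the pointer is now at 6
        f.seek(-3, 1)        # 3 bytes back from the current position, to 3
        b = f.read(2)        # reads b"34"; the pointer is now at 5
        f.seek(-2, 2)        # 2 bytes back from the end, to 8
        c = f.read()         # reads b"89"
        end = f.tell()       # 10: the pointer is at the end of the file

print(start, a, b, c, end)   # 0 b'5' b'34' b'89' 10
```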

Program to update a record in a binary file (say file3.dat created above) :

import pickle
f1=open("file3.dat","rb+")
rollno=int(input("enter rollno of the record to update : "))
found=False
try:
    while True:
        pos=f1.tell()              # remember where this record starts
        record=pickle.load(f1)
        if ( record[0] == rollno):
            found=True
            print("Record is ")
            print(record)
            newrollno=int(input("enter updated rollno "))
            newname=input("enter updated name ")
            newmarks=int(input("enter updated marks "))
            f1.seek(pos,0)         # go back to the start of this record
            pickle.dump([newrollno,newname,newmarks],f1)
            print("Record is updated successfully")
except EOFError:
    if found==False:
        print("Sorry, given rollno doesn't exist")
f1.close()

Output 1:

enter rollno of the record to update : 2
Record is 
[2, 'amit', 400]
enter updated rollno 20
enter updated name anil
enter updated marks 450
Record is updated successfully

Output 2:

enter rollno of the record to update : 6
Sorry, given rollno doesn't exist

Note: An in-place update is safe only when the new record pickles to exactly the same number of bytes as the record it replaces. Changing the data type of the record (for example, writing a tuple or a dictionary over a list), or even changing the length of a string in it, usually changes that size. Leftover or overwritten bytes then corrupt the file, and we get an unpickling error when we read the file the next time.

>>> # write 3 records as lists
>>> # in file file4.dat

>>> import pickle
>>> f1=open("file4.dat","wb")

>>> pickle.dump([1 , "sheetal" , 500] , f1)
>>> pickle.dump([2 , "amit" , 400] , f1)
>>> pickle.dump([3,"nidhi", 300], f1)

>>> f1.close()
>>> # update 2nd record 
>>> # as a tuple
>>> import pickle
>>> f1=open("file4.dat","rb+")
>>> pickle.load(f1)
[1, 'sheetal', 500]
>>> # after reading the 1st record, the file pointer is at
>>> # the 2nd record, so the 2nd record will be updated
>>> pickle.dump((20,"anil",450) , f1)
>>> f1.close()
>>> f1=open("file4.dat","rb+")
>>> pickle.load(f1)
[1, 'sheetal', 500]
>>> pickle.load(f1)
(20, 'anil', 450)

>>> pickle.load(f1)
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    pickle.load(f1)
_pickle.UnpicklingError: could not find MARK
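A likely explanation for this error is record size: the tuple pickles to fewer bytes than the list it overwrote, so a few bytes of the old record are left behind and are read as the start of the next record. The sketch below only compares pickled sizes under the default protocol; the exact sizes vary between Python versions, so only the comparisons are shown.

```python
import pickle

old_list = pickle.dumps([2, "amit", 400])
new_list = pickle.dumps([20, "anil", 450])        # same shape, same string length
new_tuple = pickle.dumps((20, "anil", 450))       # different type
longer_name = pickle.dumps([2, "amitabh", 400])   # same type, longer string

same_size_list = len(old_list) == len(new_list)
same_size_tuple = len(old_list) == len(new_tuple)
same_size_longer = len(old_list) == len(longer_name)

print(same_size_list)     # True: safe to overwrite in place
print(same_size_tuple)    # False: overwriting leaves stray bytes behind
print(same_size_longer)   # False: overwriting runs into the next record
```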

Standard Exceptions while using the pickle module:
pickle.PicklingError: It is raised when we try to pickle (write) an object that cannot be pickled.
pickle.UnpicklingError: It is raised when there is a problem while unpickling (reading) an object, for example when the data in the file is corrupted.

What if we need to change the data type of an object that already exists in the file:
1) Create a new file.
2) Write into the new file all the records that come before the record to be modified in the old file.
3) Write the modified record into the new file.
4) Write into the new file all the records that come after the record to be modified in the old file.
5) Delete the old file using the remove() function of the os module.
6) Rename the new file to the old file's name using the rename() function of the os module.

>>> # write 3 records as lists
>>> # in file file5.dat

>>> import pickle
>>> f1=open("file5.dat","wb")

>>> pickle.dump([1 , "sheetal" , 500] , f1)

>>> pickle.dump([2 , "amit" , 400] , f1)

>>> pickle.dump([3,"nidhi", 300], f1)

>>> f1.close()
>>> # open file 5 for reading
>>> f1=open("file5.dat","rb")
>>> # open file 6 for writing
>>> f2=open("file6.dat","wb")
>>> # Read 1st record from file 5 and write into file 6
>>> pickle.dump(pickle.load(f1) , f2)
>>> # read the 2nd record from file 5
>>> pickle.load(f1)
[2, 'amit', 400]
>>> # write the updated record
>>> # i.e. a tuple into file 6 
>>> pickle.dump((20,"anil",450) , f2)
>>> # Read 3rd record from file 5 and write into file 6
>>> pickle.dump(pickle.load(f1) , f2)
>>> f1.close()
>>> f2.close()
>>> import os

>>> # remove file5
>>> os.remove("file5.dat")

>>> # rename file6 as file5
>>> os.rename("file6.dat" , "file5.dat")
>>> f1=open("file5.dat","rb")
>>> pickle.load(f1)
[1, 'sheetal', 500]
>>> pickle.load(f1)
(20, 'anil', 450)
>>> pickle.load(f1)
[3, 'nidhi', 300]
>>> f1.close()
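The copy-and-rename steps can be collected into one function. In this sketch, replace_record and the ".tmp" suffix are hypothetical names, and os.replace() is used in place of the separate remove() and rename() calls because it swaps the new file in with a single call.

```python
import os
import pickle
import tempfile


def replace_record(path, rollno, new_record):
    """Copy every record to a new file, swapping in new_record for the
    record whose roll number matches, then put the new file in place of
    the old one. Returns True if a record was replaced."""
    temp_path = path + ".tmp"          # hypothetical name for the new file
    replaced = False
    with open(path, "rb") as old, open(temp_path, "wb") as new:
        while True:
            try:
                record = pickle.load(old)
            except EOFError:
                break                  # all records copied
            if record[0] == rollno:
                record = new_record    # any type or size is fine here
                replaced = True
            pickle.dump(record, new)
    os.replace(temp_path, path)        # the new file takes the old file's name
    return replaced


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "file5.dat")
    with open(path, "wb") as f:
        for rec in ([1, "sheetal", 500], [2, "amit", 400], [3, "nidhi", 300]):
            pickle.dump(rec, f)

    changed = replace_record(path, 2, (20, "anil", 450))

    # Read everything back to confirm the file is still valid.
    records = []
    with open(path, "rb") as f:
        while True:
            try:
                records.append(pickle.load(f))
            except EOFError:
                break

print(changed)   # True
print(records)   # [[1, 'sheetal', 500], (20, 'anil', 450), [3, 'nidhi', 300]]
```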