5.4. Reading data from file

The current version of PyBEST supports the following file formats for reading wave function and molecular information from disk:

File format	Description
.h5	PyBEST’s internal format. This format allows you to read all PyBEST objects from a binary file. All internal checkpoint files that are dumped to disk use this format.
.xyz	Read molecular coordinates from an xyz file. By default, all coordinates are converted from Angstrom to bohr (atomic units).
.molden	Read orbitals, coordinates, and basis set information from a molden file. This works for basis sets that include up to g functions.
.mkl	Read orbitals, coordinates, and basis set information from a molekel file.
.FCIDUMP	Read a Hamiltonian (including the external terms) in the Molpro FCIDUMP format. All one-electron integrals are contracted to a single term. PyBEST also returns the molecular orbitals and the overlap matrix, assuming that the molecular orbitals form an orthonormal set.

When reading data from any of the file formats listed above, PyBEST stores it in an IOData container. The wave function and molecular information are stored as attributes of this container, using the default attribute names defined in Naming conventions in PyBEST.

Note

If you use the internal format to store your own checkpoint files under attribute names of your choosing, PyBEST stores the corresponding objects under these user-defined attribute names. When such a checkpoint file is read in, the attributes are accessible under the same user-defined names. Note, however, that some operations might not be fully supported if you deviate from PyBEST’s naming convention.

Similar to the dumping procedure (see Dumping data to file), PyBEST automatically recognizes the (supported) file format from the file extension. The from_file() method stores the corresponding data in an instance of the IOData container:

from pybest.io import IOData

# Read data from an internal (checkpoint) file and store it in the IOData
# container `data`
# -----------------------------------------------------------------------
data = IOData.from_file("checkpoint.h5")

PyBEST selects the reader based on the file extension, so the extension should be one of the supported formats listed above.
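Conceptually, this amounts to mapping the file suffix to a dedicated reader. The following is a minimal sketch of such a lookup in plain Python, not PyBEST's actual implementation; the helper `detect_format` and the reader labels it returns are hypothetical. The sketch matches suffixes exactly as written in the table above (including the upper-case .FCIDUMP) and rejects anything else.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to a reader label; the suffixes are
# the ones listed in the table of supported formats.
SUPPORTED_FORMATS = {
    ".h5": "internal checkpoint (HDF5)",
    ".xyz": "xyz coordinates",
    ".molden": "molden orbitals and basis set",
    ".mkl": "molekel orbitals and basis set",
    ".FCIDUMP": "Molpro FCIDUMP Hamiltonian",
}


def detect_format(filename: str) -> str:
    """Return the reader label for ``filename`` based on its suffix.

    Hypothetical helper; raises ValueError for unsupported extensions.
    """
    suffix = Path(filename).suffix  # e.g. ".molden" for "water-scf.molden"
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


for name in ["checkpoint.h5", "mol.xyz", "water-scf.molden", "hamiltonian_mo.FCIDUMP"]:
    print(f"{name:25s} -> {detect_format(name)}")
```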

5.4.1. Accessing the `IOData` container

When you read data from disk using the from_file() method, an instance of the IOData container is created, and all data contained in the file is stored as attributes of this container. Once the container is constructed, you can access and modify its attributes on the fly, in the same way as explained in Dumping data to file: attributes can be assigned, updated, and deleted like those of any Python object. The code snippet below shows how to print all attributes of the container and how to delete one of them (all objects are defined in Naming conventions in PyBEST):

# Read internal checkpoint file
# -----------------------------
data = IOData.from_file("checkpoint.h5")

# Print all attributes that are contained in checkpoint file
# ----------------------------------------------------------
print("\ninternal file:")
print(data.__dict__)

# Modify data as you please
# -------------------------
del data.eri
print(data.__dict__)
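Assigning and updating attributes uses ordinary Python attribute syntax. Because PyBEST may not be installed everywhere, the sketch below uses `types.SimpleNamespace` as a hypothetical stand-in for an IOData instance; the attribute `my_label` and all values are placeholders for illustration only. It also shows `getattr` with a default, which can be handy because the available attributes depend on the file format that was read. Note that `vars(data)` returns the same dictionary as `data.__dict__`.

```python
from types import SimpleNamespace

# Hypothetical stand-in for an IOData instance; real containers are created
# by IOData.from_file() and hold PyBEST objects instead of plain values.
data = SimpleNamespace(e_core=9.0, eri="two-electron integrals (placeholder)")

# Assign a new, user-defined attribute
data.my_label = "water, equilibrium geometry"

# Update an existing attribute
data.e_core = 9.1

# Delete an attribute (same syntax as `del data.eri` above)
del data.eri

# Query attributes safely: the available names depend on the file format read
has_eri = hasattr(data, "eri")
olp = getattr(data, "olp", None)  # None if no overlap matrix is stored

print(sorted(vars(data)))  # ['e_core', 'my_label']
print(has_eri, olp)  # False None
```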

5.4.2. Reading the internal h5 format

The example below shows how to read an internal checkpoint file, which has the file extension .h5 (see also the previous section):

# Read internal checkpoint file
# -----------------------------
data = IOData.from_file("checkpoint.h5")

# Print all attributes that are contained in checkpoint file
# ----------------------------------------------------------
print("\ninternal file:")
print(data.__dict__)
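Printing data.__dict__ shows the full contents of every attribute, which can be long when large integral arrays are stored. A minimal sketch of a helper that lists only attribute names and their types is given below; `summarize_attributes` is hypothetical and not part of PyBEST. It works on any object that has a `__dict__`, so a `SimpleNamespace` stand-in is used here for demonstration.

```python
from types import SimpleNamespace


def summarize_attributes(container) -> dict:
    """Map each attribute name of ``container`` to the name of its type.

    Hypothetical helper; vars(container) is the same dict as container.__dict__.
    """
    return {name: type(value).__name__ for name, value in sorted(vars(container).items())}


# Stand-in for `data = IOData.from_file("checkpoint.h5")` (placeholder values)
data = SimpleNamespace(atom=["O", "H", "H"], e_core=9.1)

for name, type_name in summarize_attributes(data).items():
    print(f"{name:10s} {type_name}")
```

With PyBEST installed, the same helper could be called as summarize_attributes(IOData.from_file("checkpoint.h5")).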

5.4.3. Reading an xyz file

The example below summarizes all steps to read molecular coordinates from an xyz file with the file extension .xyz. The corresponding IOData container stores the coordinates under the attribute coordinates (an np.array), while the atoms are stored as a list (of either str or int) under the attribute atom:

# Read xyz file (atoms and coordinates only)
# ------------------------------------------
data = IOData.from_file("mol.xyz")

# Print all attributes that are read in from xyz file
# ---------------------------------------------------
print("\nxyz file:")
print(data.__dict__)
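To illustrate what the default unit conversion does, the sketch below parses a small xyz string in plain Python and converts the coordinates from Angstrom to bohr. It is not PyBEST's reader: the function `parse_xyz` is hypothetical, it returns nested lists instead of an np.array, and the conversion uses the CODATA 2018 Bohr radius (0.529177210903 Å), which may differ in the last digits from the constant PyBEST uses. The water geometry is illustrative input only.

```python
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in Angstrom


def parse_xyz(text: str):
    """Return (atoms, coordinates) from xyz-formatted text.

    Hypothetical sketch: coordinates are converted from Angstrom to bohr.
    """
    lines = text.strip().splitlines()
    natom = int(lines[0])  # line 0: number of atoms
    atoms, coordinates = [], []
    # line 1 is a free-form comment; atom lines follow as "symbol x y z"
    for line in lines[2 : 2 + natom]:
        symbol, x, y, z = line.split()[:4]
        atoms.append(symbol)
        coordinates.append([float(v) / BOHR_IN_ANGSTROM for v in (x, y, z)])
    return atoms, coordinates


water = """3
water molecule, illustrative geometry in Angstrom
O   0.000000   0.000000   0.117300
H   0.000000   0.757200  -0.469200
H   0.000000  -0.757200  -0.469200
"""

atoms, coords = parse_xyz(water)
print(atoms)  # ['O', 'H', 'H']
for atom, xyz in zip(atoms, coords):
    print(atom, [round(c, 4) for c in xyz])
```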

5.4.4. Reading a molden file

Detailed instructions on how to export orbitals to the molden format can be found in Generating molden files. The example below briefly summarizes how to read in a molden file and how to access its attributes:

# Read molden file
# ----------------
data = IOData.from_file("water-scf.molden")

# Print all attributes that are read in from molden file
# ------------------------------------------------------
print("\nmolden file:")
print(data.__dict__)

# Access attributes separately
coord = data.coordinates  # np.array
factory = data.gobasis  # Basis instance
atom = data.atom  # list of str
orb_a = data.orb_a  # orbitals
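The molden reader handles basis sets with up to g functions (see the table of supported formats). A minimal sketch for checking this before reading a file is shown below: it scans the [GTO] section of a molden file for shell labels and reports the highest angular momentum. The helper `max_shell_label` is hypothetical and assumes the common molden layout, in which each atom block starts with a numeric atom line, followed by shell lines such as `s 3 1.00` and their numeric primitive lines; a combined `sp` shell counts as p. The sample file content is a placeholder.

```python
SHELL_ORDER = "spdfghi"  # shell labels ordered by angular momentum l = 0, 1, 2, ...


def max_shell_label(molden_text: str) -> str:
    """Return the highest shell label found in the [GTO] section.

    Hypothetical sketch; only the first token of each line is inspected.
    """
    in_gto = False
    highest = -1
    for line in molden_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # A new section header; we are inside [GTO] only after its header
            in_gto = stripped.upper().startswith("[GTO]")
            continue
        if not in_gto or not stripped:
            continue
        token = stripped.split()[0].lower()
        # Shell lines start with letters (e.g. "s", "d", "sp"); atom and
        # primitive lines start with numbers and are skipped
        if token.isalpha() and all(ch in SHELL_ORDER for ch in token):
            highest = max(highest, max(SHELL_ORDER.index(ch) for ch in token))
    if highest < 0:
        raise ValueError("No shells found in a [GTO] section")
    return SHELL_ORDER[highest]


sample = """[Molden Format]
[Atoms] AU
O 1 8 0.0 0.0 0.0
[GTO]
1 0
s 1 1.00
  0.1E+03 1.0
d 1 1.00
  0.8E+00 1.0

[MO]
"""

label = max_shell_label(sample)
# Compare by angular momentum, not alphabetically ("s" > "g" as strings)
supported = SHELL_ORDER.index(label) <= SHELL_ORDER.index("g")
print(label, supported)  # d True
```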

Once a molden file has been read in, you can, for instance, use the gobasis attribute to calculate some Hamiltonian matrix elements (see Computing the matrix representation of the Hamiltonian).

5.4.5. Reading a Hamiltonian in the FCIDUMP format

Detailed instructions on how to export a Hamiltonian to the FCIDUMP format can be found in FCIDUMP format. The example below briefly summarizes how to read in an external Hamiltonian from an FCIDUMP file and how to access its attributes:

# Read FCIDUMP file
# -----------------
data = IOData.from_file("hamiltonian_mo.FCIDUMP")

# Print all attributes that are read in from FCIDUMP file
# -------------------------------------------------------
print("\nFCIDUMP file:")
print(data.__dict__)

# Access attributes separately
one = data.one  # one-electron integrals
two = data.two  # two-electron integrals
e_core = data.e_core  # core energy
orb_a = data.orb_a  # orbitals (assuming orthonormal orbitals)
olp = data.olp  # overlap matrix (assuming orthonormal orbitals)
lf = data.lf  # an instance of DenseLinalgFactory
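To make the layout of an FCIDUMP file concrete, the sketch below parses a tiny FCIDUMP string with placeholder numbers in plain Python. It is a hypothetical illustration, not PyBEST's reader: after the `&FCI ... &END` namelist header, each line holds a value followed by four 1-based orbital indices. All four indices nonzero denote a two-electron integral (ij|kl) in chemists' notation, `k = l = 0` a one-electron integral, and all indices zero the core energy. Only the entries listed in the file are stored (permutational symmetry is not expanded), and lines carrying orbital energies are skipped. The identity overlap matrix mirrors the assumption of orthonormal molecular orbitals described above.

```python
import re


def parse_fcidump(text: str):
    """Return (norb, one, two, e_core, olp) parsed from FCIDUMP text.

    Hypothetical sketch: integrals are kept in sparse dicts keyed by 0-based
    orbital indices; permutational symmetry is NOT expanded.
    """
    lines = text.splitlines()
    # The &FCI namelist header ends at a line containing &END (or a lone "/")
    end = next(
        (n for n, line in enumerate(lines) if "&END" in line.upper() or line.strip() == "/"),
        None,
    )
    if end is None:
        raise ValueError("FCIDUMP header (&FCI ... &END) not found")
    header = " ".join(lines[: end + 1])
    norb = int(re.search(r"NORB\s*=\s*(\d+)", header, re.IGNORECASE).group(1))

    one, two, e_core = {}, {}, 0.0
    for line in lines[end + 1 :]:
        fields = line.split()
        if len(fields) != 5:
            continue
        # Fortran-style exponents such as 1.0D+00 are converted to 1.0E+00
        value = float(fields[0].upper().replace("D", "E"))
        i, j, k, l = (int(f) for f in fields[1:])
        if i == j == k == l == 0:
            e_core = value  # core energy (e.g., nuclear repulsion)
        elif k == l == 0 and j > 0:
            one[(i - 1, j - 1)] = value  # one-electron integral h_ij
        elif k > 0 and l > 0:
            two[(i - 1, j - 1, k - 1, l - 1)] = value  # (ij|kl), chemists' notation
        # Lines with only i > 0 (orbital energies) are skipped in this sketch

    # Orthonormal molecular orbitals imply an identity overlap matrix
    olp = [[1.0 if p == q else 0.0 for q in range(norb)] for p in range(norb)]
    return norb, one, two, e_core, olp


sample = """&FCI NORB=2,NELEC=2,MS2=0,
 ORBSYM=1,1,
 ISYM=1,
&END
  0.6746  1  1  1  1
  0.1813  2  1  2  1
  0.6636  2  2  2  2
 -1.2528  1  1  0  0
 -0.4756  2  2  0  0
  0.7138  0  0  0  0
"""

norb, one, two, e_core, olp = parse_fcidump(sample)
print(norb, len(one), len(two), e_core)  # 2 2 3 0.7138
```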

5.4.6. Example Python scripts

Several complete examples can be found in the directory data/examples/iodata.

5.4.6.1. Summary of all supported reading options

This basic example summarizes all steps mentioned above, namely, how to read and access data from the internal .h5, the .xyz, the .molden, and the .FCIDUMP formats.

Note

This example only works if you run the dumping example (see Dumping data to file) first.

Listing 5.2 data/examples/iodata/reading.py

from pybest.io import IOData

# Read internal checkpoint file
# -----------------------------
data = IOData.from_file("checkpoint.h5")

# Print all attributes that are contained in checkpoint file
# ----------------------------------------------------------
print("\ninternal file:")
print(data.__dict__)

# Modify data as you please
# -------------------------
del data.eri
print(data.__dict__)

# Read xyz file (atoms and coordinates only)
# ------------------------------------------
data = IOData.from_file("mol.xyz")

# Print all attributes that are read in from xyz file
# ---------------------------------------------------
print("\nxyz file:")
print(data.__dict__)

# Read molden file
# ----------------
data = IOData.from_file("water-scf.molden")

# Print all attributes that are read in from molden file
# ------------------------------------------------------
print("\nmolden file:")
print(data.__dict__)

# Access attributes separately
coord = data.coordinates  # np.array
factory = data.gobasis  # Basis instance
atom = data.atom  # list of str
orb_a = data.orb_a  # orbitals

# Read FCIDUMP file
# -----------------
data = IOData.from_file("hamiltonian_mo.FCIDUMP")

# Print all attributes that are read in from FCIDUMP file
# -------------------------------------------------------
print("\nFCIDUMP file:")
print(data.__dict__)

# Access attributes separately
one = data.one  # one-electron integrals
two = data.two  # two-electron integrals
e_core = data.e_core  # core energy
orb_a = data.orb_a  # orbitals (assuming orthonormal orbitals)
olp = data.olp  # overlap matrix (assuming orthonormal orbitals)
lf = data.lf  # an instance of DenseLinalgFactory