Creating and working with structure databases #

Before the exercise, you should:

  • Run the 01_Introduction_Pyiron.ipynb notebook

The aim of this exercise is to make you familiar with:

  • Creating structure databases and working with them for potential fitting (day 2)

Importing necessary modules and creating a project#

This is done the same way as in the first exercise.

import numpy as np
%matplotlib inline
import matplotlib.pylab as plt
import os
from pyiron import Project
pr = Project("creating_datasets")

Creating a structure “container” from the data#

We now go over the jobs generated in the first notebook and store their structures, energies, and forces in a structure container, which will later be used for potential fitting.

Note: Usually these datasets are created using highly accurate DFT calculations. For practical reasons, we only demonstrate how to do this using data from LAMMPS calculations (the workflow remains the same).

Access the project created in exercise 1. As usual on Linux, ".." means going up one folder in the directory tree.

pr_fs = pr["../first_steps"]
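The relative path can be checked with plain path arithmetic. The sketch below is not pyiron code; it only shows where "../first_steps" lands when starting from the project path that appears in the job table later in this notebook.

```python
import posixpath

# Path of the current project, as listed in the job table of this notebook.
current_project = "/home/joyvan/potentials/introduction/creating_datasets"

# Joining with "../first_steps" and normalizing goes up one level and then
# into the sibling folder that holds the jobs from exercise 1.
sibling = posixpath.normpath(posixpath.join(current_project, "../first_steps"))
print(sibling)
```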

Create a TrainingContainer job (to store structures and databases).

container = pr.create.job.TrainingContainer('dataset_example')

Add structures from the E-V curves#

For starters, we append structures from the energy-volume curves we calculated earlier.

for job in pr_fs["E_V_curve"].iter_jobs(status="finished"):
    container.include_job(job)

We can obtain this data as a pandas table

container.to_pandas()
name atoms energy forces stress number_of_atoms
0 job_a_3_8 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.192897 [[1.6276043110675176e-16, 1.0529105848988852e-16, 5.1718187378489436e-17]] [25.037460604415703, 25.037460605376904, 25.037460604885503, 9.278150850360161e-10, 9.583505518476542e-10, -6.776540272768861e-10] 1
1 job_a_3_9 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.319542 [[7.639186604470381e-18, 1.2897999183801795e-17, 6.560662375038692e-17]] [11.7835809621756, 11.783580962894401, 11.783580964499402, -2.9210890660206204e-11, 4.5548187747551897e-10, -3.2208895273694305e-10] 1
2 job_a_4_0 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.367063 [[-3.5024524628727396e-17, -1.320930466294525e-17, 5.849496262865055e-18]] [2.1774865934278202, 2.17748659599896, 2.1774865953501803, -1.2204712117387501e-11, 8.04195020225017e-10, 1.13730200248357e-09] 1
3 job_a_4_1 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.360600 [[-2.2377622695133157e-17, -4.0689075283847526e-17, 2.1062919550300275e-17]] [-3.32656345373484, -3.32656345287999, -3.32656345199962, 1.44346086248294e-10, 2.68736816186093e-10, 3.8004542651085804e-10] 1
4 job_a_4_2 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.317017 [[2.1405562304448044e-17, 9.465137265930533e-17, -1.614674972511662e-17]] [-7.34400540416366, -7.344005404000251, -7.3440054020844014, 5.444172612856411e-10, 9.032537237010151e-10, -6.38697915161281e-10] 1
5 job_a_4_3 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.241535 [[-5.0018187940959333e-17, -7.753256254350387e-17, -7.947668332487412e-17]] [-10.206225127542101, -10.2062251265863, -10.2062251275071, 2.3284396085851003e-10, 2.69898641821963e-10, 3.8169349934623504e-10] 1
6 job_a_4_4 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.145751 [[-7.31320096256601e-17, 2.773206044106321e-16, -1.2031135854225408e-16]] [-11.0438299224574, -11.0438299224868, -11.0438299224679, -5.24346994489793e-11, -2.1255620053309103e-11, 1.50307693032171e-11] 1

Add structures from the MD#

We also add some structures obtained from the MD simulations.

First, reload the MD job. Indexing a project by job name loads that job.

job_md = pr_fs["lammps_job"]

We can now iterate over the structures within and add each of them to the container.

traj_length = job_md.number_of_structures
stride = 10

By default, include_job fetches the last computation step from the given job. For any other step, you have to pass the step you want explicitly.

for i in range(0, traj_length, stride):
    container.include_job(job_md, iteration_step=i)
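To see which frames such a strided loop picks, here is a minimal sketch in plain Python. The trajectory length of 101 is an assumption made for illustration; in the notebook the value comes from job_md.number_of_structures.

```python
# Hypothetical trajectory length; in the notebook this is
# job_md.number_of_structures.
traj_length = 101
stride = 10

# The same range() call as in the loop above, collected into a list.
selected_steps = list(range(0, traj_length, stride))
print(selected_steps)
print(len(selected_steps), "of", traj_length, "frames kept")
```

Note that the final frame is only included when traj_length - 1 is a multiple of stride. Taking every tenth frame rather than all of them keeps the dataset small and avoids adding many nearly identical consecutive snapshots.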

Add some defect structures (vacancies, surfaces, etc)#

It is also necessary to include some defect structures and surfaces in the training dataset.

Set up an MD calculation for a structure with a vacancy.

job_lammps = pr.create.job.Lammps("lammps_job_vac")
job_lammps.structure = pr.create.structure.bulk("Al", cubic=True, a=3.61).repeat([3, 3, 3])

Remove the first atom of the structure to create the vacancy.

del job_lammps.structure[0]
job_lammps.potential = "2005--Mendelev-M-I--Al-Fe--LAMMPS--ipr1"
job_lammps.calc_md(temperature=800, pressure=0, n_ionic_steps=10000)
job_lammps.run()
2023-03-21 16:54:04,186 - pyiron_log - WARNING - The job lammps_job_vac is being loaded instead of running. To re-run use the argument 'delete_existing_job=True in create_job'
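The atom count of the vacancy cell can be reproduced without pyiron. The sketch below builds the positions of a 3×3×3 fcc supercell by hand and drops the first atom. The atom ordering is an assumption and need not match the one pyiron uses, but the counts should agree with the Al107 formula reported for this job.

```python
from itertools import product

a = 3.61     # lattice constant passed to bulk() above
repeats = 3  # matches .repeat([3, 3, 3])

# Fractional positions of the four atoms in the conventional fcc cell.
fcc_basis = [(0.0, 0.0, 0.0), (0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0)]

# One copy of the basis for every cell of the 3x3x3 supercell.
positions = [
    (a * (i + bx), a * (j + by), a * (k + bz))
    for i, j, k in product(range(repeats), repeat=3)
    for bx, by, bz in fcc_basis
]
n_bulk = len(positions)

# Same idea as `del job_lammps.structure[0]`: drop the first atom.
vacancy_positions = positions[1:]
n_vacancy = len(vacancy_positions)

print(n_bulk, n_vacancy)
```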

Set up an MD calculation for a surface structure. This function follows the ASE interface for creating surfaces.

job_lammps = pr.create.job.Lammps("lammps_job_surf")
job_lammps.structure = pr.create.structure.surface("Al", surface_type="fcc111", size=(4, 4, 8), vacuum=12, orthogonal=True)
job_lammps.potential = "2005--Mendelev-M-I--Al-Fe--LAMMPS--ipr1"
job_lammps.calc_md(temperature=800, pressure=0, n_ionic_steps=10000)
job_lammps.run()
2023-03-21 16:54:05,603 - pyiron_log - WARNING - The job lammps_job_surf is being loaded instead of running. To re-run use the argument 'delete_existing_job=True in create_job'
pr
{'groups': [], 'nodes': ['lammps_job_vac', 'lammps_job_surf']}

We now add these structures to the dataset like we did before.

stride = 10
for job_md in pr.iter_jobs(status="finished", hamilton="Lammps"):
    # Use the trajectory length of the job being added, not that of a job from an earlier loop.
    for i in range(0, job_md.number_of_structures, stride):
        container.include_job(job_md, iteration_step=i)

We run the job to store this dataset in the pyiron database. Without running the training container “job”, the data will not be saved!

container.run()
The job dataset_example was saved and received the ID: 57
pr.job_table()
id status chemicalformula job subjob projectpath project timestart timestop totalcputime computer hamilton hamversion parentid masterid
0 51 finished Al107 lammps_job_vac /lammps_job_vac None /home/joyvan/potentials/introduction/creating_datasets/ 2023-03-17 12:05:20.160750 2023-03-17 12:05:23.753826 3.0 pyiron@jupyter-poul#1 Lammps 0.1 None None
1 52 finished Al128 lammps_job_surf /lammps_job_surf None /home/joyvan/potentials/introduction/creating_datasets/ 2023-03-17 12:05:24.161941 2023-03-17 12:05:27.544571 3.0 pyiron@jupyter-poul#1 Lammps 0.1 None None
2 57 finished None dataset_example /dataset_example None /home/joyvan/potentials/introduction/creating_datasets/ 2023-03-21 16:54:33.226455 NaT NaN pyiron@jupyter-poul#1 TrainingContainer 0.4 None None

Reloading the dataset#

This dataset can now be reloaded anywhere for use in the potential fitting procedures.

dataset = pr["dataset_example"]
dataset.to_pandas()
name atoms energy forces stress number_of_atoms
0 job_a_3_8 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.192897 [[1.6276043110675176e-16, 1.0529105848988852e-16, 5.1718187378489436e-17]] [25.037460604415703, 25.037460605376904, 25.037460604885503, 9.278150850360161e-10, 9.583505518476542e-10, -6.776540272768861e-10] 1
1 job_a_3_9 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.319542 [[7.639186604470381e-18, 1.2897999183801795e-17, 6.560662375038692e-17]] [11.7835809621756, 11.783580962894401, 11.783580964499402, -2.9210890660206204e-11, 4.5548187747551897e-10, -3.2208895273694305e-10] 1
2 job_a_4_0 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.367063 [[-3.5024524628727396e-17, -1.320930466294525e-17, 5.849496262865055e-18]] [2.1774865934278202, 2.17748659599896, 2.1774865953501803, -1.2204712117387501e-11, 8.04195020225017e-10, 1.13730200248357e-09] 1
3 job_a_4_1 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.360600 [[-2.2377622695133157e-17, -4.0689075283847526e-17, 2.1062919550300275e-17]] [-3.32656345373484, -3.32656345287999, -3.32656345199962, 1.44346086248294e-10, 2.68736816186093e-10, 3.8004542651085804e-10] 1
4 job_a_4_2 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.317017 [[2.1405562304448044e-17, 9.465137265930533e-17, -1.614674972511662e-17]] [-7.34400540416366, -7.344005404000251, -7.3440054020844014, 5.444172612856411e-10, 9.032537237010151e-10, -6.38697915161281e-10] 1
5 job_a_4_3 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.241535 [[-5.0018187940959333e-17, -7.753256254350387e-17, -7.947668332487412e-17]] [-10.206225127542101, -10.2062251265863, -10.2062251275071, 2.3284396085851003e-10, 2.69898641821963e-10, 3.8169349934623504e-10] 1
6 job_a_4_4 [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -3.145751 [[-7.31320096256601e-17, 2.773206044106321e-16, -1.2031135854225408e-16]] [-11.0438299224574, -11.0438299224868, -11.0438299224679, -5.24346994489793e-11, -2.1255620053309103e-11, 1.50307693032171e-11] 1
7 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -363.917370 [[5.8841820305133305e-15, 3.7990444123892135e-16, 2.740863092043364e-16], [5.02375918642883e-15, -1.5751289161869397e-15, -1.3274971399912499e-15], [1.02695629777827e-15, 8.812395257962181e-16, -8... [0.999556665124294, 0.9904736758167861, 0.824951894171107, -0.017955018197828142, 0.06363360519613635, -0.04256361660396504] 108
8 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -352.327898 [[-0.75036385499113, 0.4598380918639449, 0.725200216845603], [-0.271732166045788, 0.302073802280348, 0.257384300490495], [0.448407157614891, -0.296549448310268, 0.241166662468148], [-0.00933123696... [-0.20333593198685002, -0.0514950077978542, -0.380336649881006, -0.192008378726288, -0.0008491478026102756, -0.021247590027086812] 108
9 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -349.888834 [[-0.380822550162949, 0.795020858927261, 0.552227922138795], [-0.815959874301847, -0.02386117799239805, -0.47623155209815404], [-0.286795110662345, 0.30418979949872, 0.970569998348215], [-0.550088... [-0.636237386917572, -1.22215332293502, -0.718802458515107, 0.03341135510320849, -0.4677817561429891, -0.15083572558775005] 108
10 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -350.830057 [[-0.420012726158848, -0.266177748010431, -0.10061532349639205], [0.127384145208824, -0.248152628480852, -0.154576243850877], [-0.968537642185758, -0.39199433409687007, 0.11778481825176891], [-0.5... [-0.5131786603043991, 0.08578602954855179, -0.487658631946179, -0.009605776130282108, -0.19413518570039506, -0.21118720406901204] 108
11 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -350.702007 [[0.9153781717748, -0.5196757756775509, 0.498517073710415], [-0.667528324894004, -0.21672146416275304, -0.31211194811421505], [0.120312800491091, -0.043411302241849095, -0.406043538747074], [0.235... [-0.51351056300053, -0.20205565562462502, -0.3496115696193531, 0.06566750820152925, -0.549794514544474, -0.08985658356369164] 108
12 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -350.283347 [[0.235132393610786, -0.92320491047853, 0.23010090806109595], [0.4464059221284, 0.599380126332439, -0.47014243704993197], [-0.0750267922586507, 0.262245923203885, 0.479967633267096], [-0.694735048... [-0.578429638769351, -0.434046865104863, -0.168847000423391, 0.14404756870586297, -0.10672866413529905, -0.31830962918949407] 108
13 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -351.595189 [[-0.10082749580572, 0.523484145402097, -1.0350343625323], [0.354729759304219, 0.701653159364364, 0.7878115361205081], [-0.0367556462367837, -0.495902791299659, 1.00115978036795], [-0.010109163992... [-0.48245036720220597, -0.6429619358987481, -0.45615891646821305, 0.12156601043838997, 0.028722694543442163, -0.13886960361517803] 108
14 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -351.355429 [[-0.53033805263242, 0.30151077109769797, 0.48889617311781], [0.365332073308618, 0.257204756311451, 0.46421415718266706], [-0.55795462205606, -0.528920209655739, -1.01193560142206], [-0.8349912505... [0.13501098545634, 0.8436525885969681, 0.633953784861439, -0.09832009167866315, -0.0583454827760237, -0.148099371736581] 108
15 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -349.861266 [[-0.735834944299172, 0.16983472103263894, 0.30896574018502293], [-0.56617377829579, -0.06252047071209063, -0.18361928260349702], [-0.326718424112994, -0.721001479459434, 0.09705913082936173], [0.... [0.446586866327166, 0.13068019034171802, -0.06937589902723172, -0.263866851749946, 0.10925215678469705, 0.13607792544806202] 108
16 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -352.007585 [[0.301178602933076, 0.258002276944641, -0.27143156569769294], [-0.298930228924858, -0.03075712133705462, -0.364812391605405], [-0.797427142712701, -0.74286906074633, -0.4273785140338071], [0.0727... [-0.757934402614806, -0.614667264232027, -0.667927922027262, -0.20235825539995306, -0.18872796386125404, 0.387520484780866] 108
17 lammps_job [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -350.590328 [[0.224009094047, -0.839607604551706, 0.777637424807001], [0.694708728593008, -0.01978842244270086, -0.563089538819962], [-0.0726046276015651, 0.180208519203361, 0.157259401253289], [0.11284631102... [-0.57750724810661, -0.3307052919049551, -0.24897277922262204, 0.00354743839297119, 0.21765237954049194, -0.37280357166477507] 108
18 lammps_job_vac [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -290.793066 [[-3.1974423109204496e-14, -0.5904705706758219, -0.590470570675822], [-0.5904705706758229, -3.325957990662624e-14, -0.590470570675822], [-0.5904705706758219, -0.590470570675821, -3.642777903722078... [58.87969078759508, 58.65874845858828, 58.50796880315438, -0.047554178778302515, -0.33314286846391333, 0.07410497611298358] 107
19 lammps_job_surf [element: [None, AtomicNumber 13\nAtomicRadius 118.0\nAtomicMass 26.981539\nColor Silver\nCovalentRadius ... -428.609075 [[2.44249065417534e-15, 4.56905929757667e-10, 0.314474097336679], [-9.29811783123569e-16, 4.56905354696835e-10, 0.314474097336678], [2.7611647777231502e-15, 4.56905929757667e-10, 0.314474097336678... [-0.693731819636784, -0.575956660364361, -0.802656043307567, -0.06679561501062026, -0.14566337813487507, -0.020996350092703946] 128
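The energy column holds total energies, so entries with different numbers of atoms are not directly comparable. A minimal sketch of normalizing per atom, using four rows copied from the table above (plain Python rather than the pandas table, to keep it self-contained):

```python
# (name, total energy, number_of_atoms) copied from rows 2, 7, 18 and 19 above.
rows = [
    ("job_a_4_0", -3.367063, 1),
    ("lammps_job", -363.917370, 108),
    ("lammps_job_vac", -290.793066, 107),
    ("lammps_job_surf", -428.609075, 128),
]

# Divide each total energy by the number of atoms in that structure.
energy_per_atom = {name: energy / n_atoms for name, energy, n_atoms in rows}

for name, value in energy_per_atom.items():
    print(f"{name:16s} {value:9.4f}")
```

With the pandas table itself, the same quantity would be the energy column divided by the number_of_atoms column.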

We can now easily inspect the data in this dataset.

struct = dataset.get_structure(10)
struct.plot3d()
dataset.plot.energy_volume();
dataset.plot.forces()

The datasets used in the potential fitting procedure for day 2 (obtained from accurate DFT calculations) will be accessed in the same way.

Extra Credit#

  1. Add more interesting structures. Ideas:

    • Dimers, trimers

    • Cleaving of a bulk structure, i.e. create a supercell and separate the atoms along a chosen plane

    • High- or low-pressure MD

    • Different crystal structures
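As a possible starting point for the dimer idea, the sketch below generates coordinates for two atoms at a range of separations inside a large box. It is plain Python, not pyiron code; the separation range, number of points, and box size are hypothetical values chosen for illustration.

```python
# Hypothetical scan range for the dimer separation and number of points.
d_min, d_max, n_points = 2.0, 5.0, 7
box = 15.0  # hypothetical cubic box edge, large enough to isolate the dimer

dimers = []
for n in range(n_points):
    # Evenly spaced separations between d_min and d_max (inclusive).
    d = d_min + n * (d_max - d_min) / (n_points - 1)
    center = box / 2
    dimers.append({
        "distance": d,
        "cell": [box, box, box],
        # Two atoms placed symmetrically around the box center along x.
        "positions": [(center - d / 2, center, center),
                      (center + d / 2, center, center)],
    })

print([round(dimer["distance"], 2) for dimer in dimers])
```

Each entry would still have to be turned into a pyiron structure and evaluated in a job of its own before it can be added to the container with include_job.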