Part III: Customising your workflow

So far we have covered Python and simulation workflows with LAMMPS, and we have seen how pyiron can help with data analysis. Now we consider a workflow that you program in Python and then convert into a custom pyiron Job. Converting it into a Job lets you use the features that pyiron provides, such as data management and job management.

In this example, we start from a data file in CSV format. The file contains data from a tensile test of typical S355 (material number: 1.0577) structural steel (designation of steel according to DIN EN 10025-2:2019). The data were generated within the digitization project Innovationplatform MaterialDigital (PMD) which, amongst other activities, aims to store data in a semantically and machine-understandable way. The project therefore focuses on data structuring and data formats in addition to aspects of materials science and engineering (MSE).

First, we will use Python to extract the Young's modulus from the data. Then we will make a pyiron Job from it.

As usual, we start by importing some libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

We read the first CSV file using pandas. Note the additional keyword arguments: the file uses a semicolon as the delimiter, has two header rows, and uses a comma as the decimal mark.

df = pd.read_csv("tensile_test/dataset_1.csv", delimiter=";", header=[0,1], decimal=',')
df
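
To see what these keyword arguments do, here is a minimal sketch on a tiny in-memory file. The file contents, including the unit row in the second header line, are made up for illustration and are only an assumption about how the real dataset is laid out.

```python
import io
import pandas as pd

# Hypothetical miniature file: two header rows, ';' separators,
# and ',' as the decimal mark. The values are made up.
raw = (
    "Load;Extensometer elongation\n"
    "kN;%\n"
    "0,50;0,010\n"
    "1,25;0,025\n"
)

demo = pd.read_csv(io.StringIO(raw), delimiter=";", header=[0, 1], decimal=",")

# header=[0, 1] produces two-level column labels
print(demo.columns.tolist())
# decimal=',' lets pandas parse "0,50" as a float
print(demo["Load"].values.flatten())
```

Because of the two-level header, selecting demo["Load"] returns a one-column DataFrame rather than a Series, which is why the text later calls .values.flatten() to get a plain 1D array.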

As expected from a tensile test, the data contain the load and the elongation. First we need to convert the load to stress in MPa. The area of the sample is 120.636 \(mm^2\).

df['Stress'] = df['Load']*1000/120.636
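
The factor of 1000 and the division by the area can be checked with a single number. The sketch below uses a made-up load value (not taken from the dataset) and relies only on the facts that 1 kN = 1000 N and 1 N/mm^2 = 1 MPa.

```python
# Worked unit check with a hypothetical load value
load_kN = 12.0           # made-up load in kN
area_mm2 = 120.636       # sample area from the text, in mm^2

load_N = load_kN * 1000          # 1 kN = 1000 N
stress_MPa = load_N / area_mm2   # 1 N/mm^2 = 1 MPa

print(f"{stress_MPa:.2f} MPa")
```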

We calculated the stress and added it as a new column. Note that we converted kN to N. Now we can plot this data.

plt.plot(df['Extensometer elongation'], df['Stress'])
plt.xlabel("Strain [%]");
plt.ylabel("Stress [MPa]");

That looks great. We can find the Young's modulus from the elastic part. For this we will consider the data up to 0.2% strain.

plt.plot(df['Extensometer elongation'], df['Stress'])
plt.xlabel("Strain [%]");
plt.ylabel("Stress [MPa]");
plt.axvline(0.2, color="gray", ls='dashed')

Now we need to find the index of the strain value closest to 0.2. We extract the values and work with numpy arrays

strain = df['Extensometer elongation'].values.flatten()
stress = df['Stress'].values.flatten()
np.argsort(np.abs(np.array(strain)-0.2))[0]
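
The nearest-value lookup can be illustrated on a small synthetic array. This is a sketch with made-up strain values; it also shows that np.argmin gives the same index as taking the first element of np.argsort, without sorting the whole array.

```python
import numpy as np

# Synthetic strain values in percent (illustrative only)
strain_demo = np.array([0.00, 0.05, 0.11, 0.19, 0.26, 0.40])
cutoff = 0.2

# Distance of every strain value from the cutoff
distance = np.abs(strain_demo - cutoff)

idx_sorted = np.argsort(distance)[0]   # approach used in the text
idx_min = np.argmin(distance)          # same index without a full sort

print(idx_sorted, idx_min, strain_demo[idx_min])
```

Note that a slice such as strain_demo[:idx_min] stops just before the element at that index, so the closest point itself is not part of the fitted range.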

We need to consider stress and strain up to the index found above (the code below uses 305). We can plot and confirm this.

plt.plot(strain[:305], stress[:305])
plt.xlabel("Strain [%]");
plt.ylabel("Stress [MPa]");

Now all we need to do is to fit this data to a straight line and get the slope

fit = np.polyfit(strain[:305], stress[:305], 1)
fit

We do some unit conversions to change strain to a ratio, and convert to GPa

fit[0]*(1/0.01)/1000
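
The unit conversion can be verified on synthetic data where the answer is known in advance. The sketch below generates a perfectly linear stress-strain curve from an assumed modulus of 200 GPa (a hypothetical value, not the result for this dataset) and recovers it with the same fit and conversion.

```python
import numpy as np

# Synthetic elastic data: strain in percent, stress in MPa,
# generated from an assumed (hypothetical) modulus of 200 GPa
E_assumed_GPa = 200.0
strain_pct = np.linspace(0.0, 0.2, 50)
stress_MPa = E_assumed_GPa * 1000 * (strain_pct / 100)

# Linear fit: the slope is in MPa per percent strain
slope, intercept = np.polyfit(strain_pct, stress_MPa, 1)

# MPa per % -> MPa per unit strain (divide by 0.01) -> GPa (divide by 1000)
E_GPa = slope * (1 / 0.01) / 1000
print(round(E_GPa, 6))
```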

Great, we have successfully calculated the Young's modulus. But if we now have to repeat the calculation for a different set of data (a repetition of the experiment), we would have to rerun the whole code. Furthermore, we would have to take care of naming files and folders appropriately. We will tackle this problem using pyiron by creating a Job class, similar to the ones we saw before.

Since we use Python for our analysis, we will use a PythonTemplateJob from pyiron. It is an easy-to-use template provided for convenience.

from pyiron_base import PythonTemplateJob
from pyiron import Project

Note that we imported from pyiron_base in addition to pyiron. This module provides the core functionality of pyiron. Now we will make a class.

class YoungsModulusJob(PythonTemplateJob):
    def __init__(self, project, job_name):
        super().__init__(project, job_name)
    
    def run_static(self):
        print("run")

That is all the code needed to manage a Job. pyiron provides many features to manage your work. For example, all pyiron jobs are saved on disk in HDF5 format with all input and output parameters. This means that you can always reproduce your calculation without losing information. To facilitate this, pyiron provides job.input and job.output. Whatever you save in these fields is stored automatically. We will customise the class above to suit our needs.

We will:

  • Add necessary inputs

  • Add necessary outputs

  • Add a function to process the data

  • Add a function to calculate the Young's modulus

  • Add a function to plot the stress-strain curve

class YoungsModulusJob(PythonTemplateJob):
    def __init__(self, project, job_name):
        super().__init__(project, job_name)
        #now we define our input parameters
        #first one, input file
        self.input.filename = None
        #then the sample area
        self.input.area = None
        #we should also take a strain cutoff to identify the linear region
        self.input.strain_cutoff = 0.2 
    
    def read_input(self):
        """
        My custom function to read an input file and process it
        """
        df = pd.read_csv(self.input.filename, delimiter=";", header=[0,1], decimal=',')
        df['Stress'] = df['Load']*1000/self.input.area
        self.input.load = df['Load']*1000
        self.input.strain = df['Extensometer elongation'].values.flatten()
        self.input.stress = df['Stress'].values.flatten()
        #note that we prefixed some values with self.input; these will be stored
    
    def calculate_youngs_modulus(self):
        """
        My custom function to calculate Young's modulus
        """
        arg = np.argsort(np.abs(np.array(self.input.strain)-self.input.strain_cutoff))[0]
        fit = np.polyfit(self.input.strain[:arg], self.input.stress[:arg], 1)
        #slope is in MPa per % strain; convert it to GPa
        youngs_modulus = fit[0]*(1/0.01)/1000
        self.output.youngs_modulus = youngs_modulus
        #also write the value to the "output" group of the job's HDF5 file
        with self.project_hdf5.open("output") as h5out:
            h5out["youngs_modulus"] = youngs_modulus
    
    def plot(self):
        """
        Function to plot
        """
        plt.plot(self.input.strain, self.input.stress)
        plt.xlabel("Strain [%]");
        plt.ylabel("Stress [MPa]");
    
    def run_static(self):
        """
        And the last function, this tells pyiron what to execute
        """
        #first read input
        self.read_input()
        #then calculate 
        self.calculate_youngs_modulus()
        self.status.finished = True

The class is complete now. We can try it out.

pr = Project("custom_project_4")
job = pr.create_job(job_type=YoungsModulusJob, job_name='y1', delete_existing_job=True)
job.input.filename = "tensile_test/dataset_1.csv"
job.input.area = 120.636
job.run()

First we can plot and see the curves

job.plot()

Now we can check the results

job.output

Task

Use this class to calculate the Young's modulus for the other datasets.

We can do this easily with a loop. Uncomment the line below to load the solution.

# %load solution_3.py

The pyiron job table

pyiron offers a feature to check your jobs at a glance

pr.job_table()

You can see that all the jobs we ran are indexed there along with the associated metadata. This is a powerful tool with which we can do further analysis. We have now calculated the Young's modulus for the independent repetitions of the experiment. What if you want to calculate an average value over all the experiments?

First we create a pyiron table

table = pr.create.table("table_youngs", delete_existing_job=True)

Now we need to add some conditions that decide which data go into the table. First we will filter the jobs. We will only consider jobs whose hamilton entry in the job table is YoungsModulusJob.

def get_only_youngs(table):
    return (table.hamilton == "YoungsModulusJob")

We add this as a filter function

table.db_filter_function = get_only_youngs
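
The filter returns a boolean mask with one entry per row of the job table. The sketch below shows this on a mock pandas DataFrame that stands in for the job table; the rows and the reduced set of columns are made up, and it assumes the filter function receives a DataFrame-like object with a hamilton column.

```python
import pandas as pd

# Mock stand-in for the job table: only a few columns, with made-up rows
mock_table = pd.DataFrame({
    "job": ["y1", "y2", "table_youngs"],
    "status": ["finished", "finished", "created"],
    "hamilton": ["YoungsModulusJob", "YoungsModulusJob", "TableJob"],
})

def get_only_youngs(table):
    # Element-wise comparison returns a boolean Series, one entry per row
    return table.hamilton == "YoungsModulusJob"

mask = get_only_youngs(mock_table)

# Rows where the mask is True are the ones that would be kept
print(mock_table[mask]["job"].tolist())
```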

Now we create a function to extract the Young's modulus from the output

def get_youngs(job_path):
    return job_path["output/youngs_modulus"]

We add this function to the table

job["output"]
table.add["youngs_modulus"] = get_youngs

Now we can run the table

table.run()

The table has finished execution. Now we can look at the results.

tdf = table.get_dataframe()
tdf
tdf['youngs_modulus'].mean()

Further reading..