DigitalGlassware™: Your Chemistry, Your Data, Your Insights

Welcome to the first in a series of posts on how you can access and use the data captured by DigitalGlassware to further your own chemistry insights. These posts will cover how to get your chemistry data, what it contains, how to analyse it, and how to develop insights that serve your own business requirements. All this is made possible through Deepmatter’s hardware and software platform, which records real-time chemistry data and persists it in the cloud.

DigitalGlassware™:

DeepMatter believes better chemical outcomes can come from data collection and the resulting real-time and post-hoc analysis of the chemistry. More to the point, we understand that chemistry goes beyond lab work. As such, the person who performs the chemistry might want to do their own analysis of their aggregated data, or other members of the company may wish to investigate and review it at a micro or macro level.

This first blog post will introduce some of the things which can be achieved by parsing through the data captured by DigitalGlassware™, in its raw form. It assumes a basic knowledge of Python, but we’ll go through all the necessary steps to get you viewing the data nonetheless. If you are unfamiliar with some of the tools mentioned, we have included a list of these at the end of the article, along with the corresponding download locations.

However, before we begin, let's set up our environment for this and future blog articles. If you are comfortable using tools such as pip, pipenv or virtualenv, you can probably skim through the following section.

Environment Setup

In this and later blog posts we will be using Python to demonstrate use of the data. We’ll set up a Python environment using pipenv, as it will take care of all dependencies for us. Python 3.7 was used in the writing of this post, but anything above Python 3.5 will almost certainly work. You can check your Python version by opening a terminal/command prompt and entering the following.

python --version

If it says something like “Python 3.7” or higher, you should be good to go. If there is an error (something along the lines of “python not found” or “python is not a recognized command”), then you need to install Python on your local machine. You can get Python from the official webpage at https://www.python.org/downloads/. Be sure to get the 64-bit (x86-64) version, as we will be dealing with some big files in future posts, and the 32-bit (x86) version will not work. The 64-bit versions can be found further down the download page — the default download button will get the 32-bit version which we don’t want.

Download the appropriate version for your operating system and run the installer. If you are using Windows, be sure to tick the “Add Python 3.x to PATH” checkbox. Once installation is finished, open a new terminal/command prompt and run the above command again. If you still get an error, try looking at this page for some additional setup.
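Once Python runs, the same checks can also be made from inside the interpreter. The following is a minimal sketch using only the standard library; the helper name describe_interpreter is ours for illustration and is not part of the repository.

```python
import struct
import sys


def describe_interpreter():
    """Return the interpreter's (major, minor) version and its pointer width in bits."""
    # A pointer is 8 bytes on a 64-bit build and 4 bytes on a 32-bit build.
    bits = struct.calcsize("P") * 8
    return (sys.version_info.major, sys.version_info.minor), bits


version, bits = describe_interpreter()
print("Python {}.{} ({}-bit)".format(version[0], version[1], bits))

if version < (3, 5):
    print("This Python is older than the versions this post was written against.")
if bits != 64:
    print("This is not a 64-bit build; the large files in later posts may be a problem.")
```

If neither warning is printed, the interpreter should match what the rest of this post expects.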

We’ll use Git for accessing the code and data associated with these blogs. Git is a way of tracking projects and distributing them to others. Git repositories can be downloaded at the command line, or as a ZIP file directly from the repository’s webpage.

Check if you have Git installed by typing the following into the command prompt.

git --version

If you see something reporting the version of Git you have, then you can proceed. If you get an error, you can either install Git, or download the ZIP file of the Deepmatter GitHub repository and unzip it to an appropriate location on your hard drive. Once you have Git installed, clone the Deepmatter repository (https://github.com/deepmatterltd/dm_datascience) to your local machine using the following command. This will download all the data and code used in this blog post.

git clone https://github.com/deepmatterltd/dm_datascience.git

Regardless of how you downloaded the repository, you now need to enter the directory containing the code.

cd dm_datascience

If you cloned/extracted the ZIP file to a directory other than this, you will need to adjust the command appropriately.

Next, if you don’t already have it, install pipenv to your local Python environment.

pip install pipenv

… and if you don’t have pip installed already, then install it. Now, let pipenv take care of all that messy installation of packages for you. This may take a minute or two to complete.

pipenv install

Alternatively, we’ve included the more traditional requirements.txt file in the root of the project. We’ll assume if you know what this is, then you will know what to do with it.

That’s all that should be required to get your environment set up. Pipenv will have created a virtual environment for you, which you can now enter by running:

pipenv shell

This will place you inside a Python 3 environment all set up for our needs, and isolated from the rest of your wider Python environment. We can now have a look at some example data collected by DigitalGlassware™ from a real chemistry reaction. You can view this reaction on the public DigitalGlassware™ platform itself, at https://public.deepmatter.tech.

You can now go through the code in this post by copying and pasting it into a Python terminal. Start the terminal by typing…

python

… which will drop you into a Python 3 session. Alternatively, more advanced users may wish to use the Jupyter notebook available under dm_datascience/01_Setup_Intro/notebooks/introduction.ipynb (viewable online here).

Extracting Data From Chemistry Recipes

Now that we have our environment set up, we can start to look at the data collected by DigitalGlassware. Specifically, we’ll look at PCML (Practical Chemistry Markup Language), which represents the chemical “recipe” a chemist performs in the lab. The recipe we will use is the synthesis of N-(1-Naphthoyl)-4-methylbenzenesulfonohydrazide, which appeared in Organic Syntheses 2018, volume 95, 276-288 (DOI: 10.15227/orgsyn.095.0276).

PCML is an XML representation of a single recipe, containing the chemicals used, the operations performed and user metadata, along with many other pieces of relevant data. You can review the journal article and see how the operations have been extracted and, in some cases, how additional text has been embedded (tips, safety warnings, expected versus actual weights used, etc.). By encoding a recipe in this form we record an ever-widening view of the expected chemistry process, so that the recipe can be permanently recorded for retrieval and sharing, ultimately increasing reproducibility and confidence in the process.

PCML represents what is expected to happen during chemistry. In the next blog post, we will look at the data associated with what actually happened during chemistry. When the chemistry is executed, the PCML recipe is copied and annotated with additional data, such as when steps and operations were performed; who performed them and in which lab; responses to questions embedded in the recipe; textual and image/photo notes; and the associated outcomes.

<?xml version="1.0" encoding="UTF-8"?>
<pcml version="1.3.4" experiment="3a) Synthesis of N-(1-Naphthoyl)-4-methylbenzenesulfonohydrazide - 3a"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="https://schemas.deepmatter.tech/pcml-1.3.4.xsd">
    <description></description>
    <meta>
        <owner>Deepmatter</owner>
        <author>Deepmatter</author>
        <version>1</version>
        <version-author>Deepmatter</version-author>
        <version-owner>Deepmatter</version-owner>
        <publication>Org. Synth. 2018, 95, 276-288</publication>
        <custom-tag>3a</custom-tag>
        <creation-date>13/09/2018</creation-date>
    </meta>
    <synopsis>Synthesis of N-(1-Naphthoyl)-4-methylbenzenesulfonohdrazide</synopsis>
    <chemistry>
        <reaction-scheme>
            <inputs>
                <chemical vol="11.8" volunit="ml" step="synthesis" role="starting-material" moles="0.0788" molweight="190.63">
                    <name>1-Naphthoyl chloride</name>
                </chemical>
...

Future blog posts will look at PCML and other runtime data formats in much greater detail. However, for now let’s just highlight how this recipe encoding can be used in conjunction with Python to extract various interesting pieces of data.

First, we’ll use the lxml library to read the PCML into a Python object which we can manipulate. Copy the following code into your Python terminal to read the PCML file into memory. The relative path assumes you started Python from inside the dm_datascience directory. You may have to press Shift+Enter, or Enter twice, to get the commands to actually execute.

from lxml import etree

pcml_recipe_file = './01_Setup_Intro/data/3a_recipe.pcml'
pcml_obj = etree.parse(pcml_recipe_file)
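To get a feel for what a parsed recipe exposes, here is a minimal sketch that reads the root attributes and a couple of metadata fields. It makes two assumptions worth flagging: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and it parses a cut-down inline fragment modelled on the excerpt shown earlier, not the real recipe file.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Cut-down fragment modelled on the PCML excerpt; not the full recipe file.
PCML_FRAGMENT = """<pcml version="1.3.4" experiment="3a) Synthesis of N-(1-Naphthoyl)-4-methylbenzenesulfonohydrazide - 3a">
  <meta>
    <owner>Deepmatter</owner>
    <publication>Org. Synth. 2018, 95, 276-288</publication>
    <creation-date>13/09/2018</creation-date>
  </meta>
</pcml>"""

root = ET.fromstring(PCML_FRAGMENT)

# Attributes on the root element are exposed through .get().
pcml_version = root.get("version")
experiment = root.get("experiment")

# Child elements are reached with a path relative to the root.
publication = root.findtext("meta/publication")

# Assumption: dates are written day/month/year, as "13/09/2018" suggests.
created = datetime.strptime(root.findtext("meta/creation-date"), "%d/%m/%Y").date()

print(pcml_version, "|", publication, "|", created.isoformat())
```

With the lxml object loaded above, pcml_obj.getroot() should give the equivalent root element to work from.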

We can now search over the PCML, looking for certain tags and patterns. To demonstrate this, let’s see which chemicals were used in the recipe by using Python to trawl the XML for us.

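# Find the first <chemicals> element anywhere in the document
chem_elem = pcml_obj.find(".//chemicals")
for c in chem_elem:
    # The first child of each <chemical> holds its name
    print("Chemical: {}".format(c[0].text))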

Output:

Chemical: 1-Naphthoyl chloride
Chemical: N-(4-methylbenzenesulfonyl)naphthalene-1-carbohydrazide
Chemical: 4-dimethylaminopyridine
...
Chemical: dichloromethane
Chemical: hexane
Chemical: 1,3,5-trimethoxybenzene
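Names are only part of what each chemical element carries. The excerpt shown earlier also has moles and molweight attributes, which is enough to estimate the expected mass of an input. The sketch below does that for the one input chemical visible in the excerpt. It uses the standard library's xml.etree.ElementTree in place of lxml, and it assumes moles is in mol and molweight in g/mol, which the source does not state explicitly.

```python
import xml.etree.ElementTree as ET

# Single input chemical copied from the PCML excerpt, with its parent closed off.
INPUTS_FRAGMENT = """<inputs>
  <chemical vol="11.8" volunit="ml" step="synthesis" role="starting-material" moles="0.0788" molweight="190.63">
    <name>1-Naphthoyl chloride</name>
  </chemical>
</inputs>"""

inputs = ET.fromstring(INPUTS_FRAGMENT)

expected_masses = {}
for chem in inputs.findall("chemical"):
    moles = chem.get("moles")
    molweight = chem.get("molweight")
    if moles is None or molweight is None:
        # Skip chemicals that do not carry both attributes.
        continue
    # Assumed units: mass (g) = amount (mol) x molar mass (g/mol)
    expected_masses[chem.findtext("name")] = round(float(moles) * float(molweight), 2)

print(expected_masses)
```

Looking the name up with findtext("name") instead of taking the first child keeps the code working even if the child order differs between elements.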

What if we’re particularly concerned about certain safety codes being present? Let’s look for “H318”, which indicates there is a risk of serious eye damage if exposure occurs.

import itertools  # used in the next snippet

code_to_search = "H318"
# A non-empty XPath result means at least one <code> element has exactly this text
has_code = len(pcml_obj.xpath('.//safetycode/code[text()="{}"]'.format(code_to_search))) > 0
print("{} {} code associated with recipe chemicals".format("Found" if has_code else "Did not find", code_to_search))

Output:

Found H318 code associated with recipe chemicals
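One caveat with an exact text match: a later snippet in this post splits entries on " + ", which suggests that some code elements hold combined statements. An exact match would miss a code that only appears inside such a combination. The sketch below shows one way to handle that. The fragment is hypothetical: its element names follow the XPath used above, but the values and exact nesting are invented for illustration, and has_safety_code is our own helper name. It uses the standard library's xml.etree.ElementTree in place of lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names follow the XPath './/safetycode/code',
# but the values and the exact nesting are invented for illustration.
SAFETY_FRAGMENT = """<chemicals>
  <chemical role="reagent">
    <name>example reagent</name>
    <safetycode><code>H318</code></safetycode>
    <safetycode><code>P305 + P351 + P338</code></safetycode>
  </chemical>
</chemicals>"""


def has_safety_code(root, wanted):
    """Return True if `wanted` appears alone or inside a combined 'A + B' entry."""
    for code_elem in root.findall(".//safetycode/code"):
        parts = [p.strip() for p in (code_elem.text or "").split("+")]
        if wanted in parts:
            return True
    return False


root = ET.fromstring(SAFETY_FRAGMENT)
for code in ("H318", "P351", "H225"):
    print(code, has_safety_code(root, code))
```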

Now pull out all the unique safety codes present in the recipe.

safety_elem = pcml_obj.findall(".//safetycode/code")
all_s_codes = [s.text for s in safety_elem]

# Combined entries are joined with " + "; split them before de-duplicating
uniq_s_codes = set(itertools.chain.from_iterable([x.split(" + ") for x in all_s_codes]))
print("Found the following unique safety codes:", sorted(uniq_s_codes))

Output:

Found the following unique safety codes: ['H-N/A', 'H-Unknown', 'H225', 'H242', 'H301', 'H302', 'H304', 'H310', 'H311', 'H314', 'H315', 'H318', 'H319', 'H331', 'H335', 'H336', 'H351', 'H361d', 'H373', 'H411', 'H412', 'P-Unknown', 'P201', 'P210', 'P261', 'P264', 'P273', 'P280', 'P301', 'P302', 'P303', 'P304', 'P305', 'P308', 'P310', 'P312', 'P313', 'P330', 'P331', 'P337', 'P338', 'P340', 'P351', 'P352', 'P353', 'P361', 'P370', 'P378', 'R-N/A', 'R-Unknown', 'S-N/A', 'S-Unknown']
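A flat list like this is easier to read once it is grouped by statement family. The sketch below groups a handful of the codes printed above by their leading letter: H and P are the GHS hazard and precautionary statements, while R and S are the older risk and safety phrases. Treating entries such as 'H-N/A' and 'P-Unknown' as placeholders is our reading of the output, not something the recipe format documents.

```python
from collections import defaultdict

# A handful of the codes printed above, used here as sample input.
sample_codes = ["H-N/A", "H225", "H318", "P-Unknown", "P210", "P280", "R-N/A", "S-Unknown"]

# Leading letter -> statement family.
FAMILIES = {
    "H": "hazard statement",
    "P": "precautionary statement",
    "R": "risk phrase",
    "S": "safety phrase",
}

grouped = defaultdict(list)
placeholders = []
for code in sample_codes:
    # Assumption: entries containing "-" are placeholders, not real codes.
    if "-" in code:
        placeholders.append(code)
    else:
        grouped[FAMILIES.get(code[0], "other")].append(code)

for family, codes in sorted(grouped.items()):
    print("{}: {}".format(family, ", ".join(codes)))
print("placeholders:", ", ".join(placeholders))
```

To run it over the full recipe, pass sorted(uniq_s_codes) in place of sample_codes.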

The role of a chemical in a reaction is crucial. Let’s count how many chemicals in our PCML fill each role.

from collections import Counter
import pprint

role_elems = pcml_obj.xpath('.//chemicals/chemical')
# Tally the "role" attribute of every chemical (None if the attribute is absent)
role_counts = Counter([r.get("role", None) for r in role_elems])

pp = pprint.PrettyPrinter()
pp.pprint(role_counts)

Output:

Counter({'reagent': 4,
         'solvent': 3,
         'washing-solution': 3,
         'starting-material': 1,
         'product': 1,
         'quenching-solution': 1,
         'drying-agent': 1})
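Counts say how many chemicals fill each role, but not which ones. A small variation collects the names under each role instead. The sketch below uses the standard library's xml.etree.ElementTree in place of lxml and a hypothetical fragment: the starting material comes from the excerpt shown earlier, while the role assigned to the two solvents is assumed for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment: only the starting material's role is taken from the
# excerpt; the roles of the other two chemicals are assumed for illustration.
ROLES_FRAGMENT = """<chemicals>
  <chemical role="starting-material"><name>1-Naphthoyl chloride</name></chemical>
  <chemical role="solvent"><name>dichloromethane</name></chemical>
  <chemical role="solvent"><name>hexane</name></chemical>
</chemicals>"""

root = ET.fromstring(ROLES_FRAGMENT)

names_by_role = defaultdict(list)
for chem in root.findall("chemical"):
    # Fall back to a label when the attribute is missing, instead of keying on None.
    role = chem.get("role", "unspecified")
    names_by_role[role].append(chem.findtext("name"))

for role, names in sorted(names_by_role.items()):
    print("{}: {}".format(role, ", ".join(names)))
```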

To wrap things up, let’s count how many operations are in each step of the recipe and then enumerate all of the operations in the Synthesis step. An operation typically corresponds to a physical action performed by the chemist which could have an impact on the recorded sensor data. The time at which the operation was performed at runtime then provides context on what is observed in the sensor data (as can be seen on the public version of DigitalGlassware).

from collections import defaultdict

step_ops = defaultdict(list)
op_elems = pcml_obj.xpath('/pcml/step/group/operation')
for oe in op_elems:
    # operation -> group -> step: the grandparent's "type" names the step
    step_ops[oe.getparent().getparent().get("type")].append(oe)

for step, ops in step_ops.items():
    print("{} has {} operations".format(step, len(ops)))

Output:

synthesis has 30 operations
isolation has 20 operations
purification has 14 operations
analysis has 14 operations
characterisation has 6 operations

Finally, print out the operations of the Synthesis step, in order.

for i, op in enumerate(step_ops.get("synthesis"), 1):
    print("Operation {}: {}".format(i, op.find("text").text))

Output (abbreviated):

Operation 1: The NCU should be powered and connected to the internet.
Operation 2: All sensors used (i.e DeviceX, ESP, IKA hot plate) should be on.
Operation 3: Pick up a clean, oven-dried, 1 L three-neck round bottom flask
...
Operation 28: Transfer the solvent into the addition funnel.
Operation 29: Add the solution from the dropping funnel dropwise over a period of 10 minutes.
Operation 30: Allow the reaction to stir in an ice bath for 30 minutes.
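The grouping above climbs from each operation to its step with lxml's getparent(). The same result can be reached by walking down from each step, which also works with the standard library's xml.etree.ElementTree, where getparent() is not available. The sketch below shows that approach on a hypothetical fragment: the nesting follows the path /pcml/step/group/operation used above and two operation texts are taken from the abbreviated output, but how operations are split across groups is assumed.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

# Hypothetical fragment following the path /pcml/step/group/operation;
# the grouping and the isolation placeholder are invented for illustration.
STEPS_FRAGMENT = """<pcml>
  <step type="synthesis">
    <group>
      <operation><text>The NCU should be powered and connected to the internet.</text></operation>
      <operation><text>Transfer the solvent into the addition funnel.</text></operation>
    </group>
  </step>
  <step type="isolation">
    <group>
      <operation><text>(placeholder operation)</text></operation>
    </group>
  </step>
</pcml>"""

root = ET.fromstring(STEPS_FRAGMENT)

ops_by_step = OrderedDict()
for step in root.findall("step"):
    # Walking down from each step means no parent lookups are needed.
    texts = [op.findtext("text") for op in step.findall("group/operation")]
    ops_by_step.setdefault(step.get("type"), []).extend(texts)

for step_type, texts in ops_by_step.items():
    print("{} has {} operations".format(step_type, len(texts)))
```

Storing the operation text directly, instead of the element, keeps the result easy to print or export, at the cost of losing access to any other attributes on the operation.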

Conclusion

This has been a brief introduction to the data which is generated and collected by DigitalGlassware. By having access to this baseline data, chemistry companies and others can use it in a variety of functions. Chemists and data scientists can record and review their recipe and runtime data; lab managers can track experimentation, and data analysts can build dashboards to track chemical resources and costs.

The next post will look further at PCML and its runtime cousin PCRR (Practical Chemistry Runtime Record), highlighting how Deepmatter have adhered to open standards for data recording and security, in order that our customers can have full confidence in the DigitalGlassware platform. Until then, why not have a look over our other blog posts, try out DigitalGlassware for yourself, register for updates, or contact us for further information at enquiries@deepmatter.io.

Appendix: Software Used

The article assumes that you have prior knowledge of tools such as Python and Git, but we know there will be readers who want to look at their data and who may be unfamiliar with these. The following is a list of the tools used in the blog, where to get them, and links on how to install them. Once you have everything installed, you should be able to follow the steps outlined above.

Python
Website: https://www.python.org/downloads/
Installation Instructions: https://wiki.python.org/moin/BeginnersGuide/Download
Tutorial: https://www.programiz.com/python-programming/tutorial

Git
Website: https://git-scm.com/downloads
Installation Instructions: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git