Introduction to atomman: system_model conversions

Lucas M. Hale, lucas.hale@nist.gov, Materials Science and Engineering Division, NIST.

Disclaimers

1. Introduction

The system_model format provides a direct representation of an atomman.System object that can be equivalently saved as either JSON or XML. As it is specifically designed for the System class, it captures all information about the system.

1.1. Notes on the system model format

The system model format was updated starting with atomman version 1.2.7. This change was made to give the system a data model representation consistent with other atomman objects and to ensure that all data defining the System class is captured in the data model. Subsequent versions of atomman may add fields to the model if the System class adds representations for them, but will likely remain compatible with models generated by version 1.2.7 or later. Unfortunately, system models generated by atomman versions <= 1.2.6 are distinctly different and are therefore neither compatible nor supported.

The system model format is a tree-like structure with a single root element, “atomic-system”, and multiple subelements. For consistency with the other atomman objects, the System model contains the models for the system’s box and atoms. Expressing paths using periods to separate elements and subelements, the model consists of:

  • “atomic-system.box” is the model for the System’s Box containing values for the vects and the origin.

  • “atomic-system.periodic-boundary-condition” lists the three boolean pbc values.

  • “atomic-system.atom-type-symbol” lists the symbols associated with each atom type.

  • “atomic-system.atom-type-mass” (added in version 1.3.0) lists the masses associated with each atom type, if any were assigned.

  • “atomic-system.atoms” is the model for the System’s Atoms and contains all per-atom values.
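The period-separated path notation used in the list above can be illustrated with plain Python dictionaries. This is only a sketch: get_path is a hypothetical helper written for this illustration, not a function of atomman or DataModelDict, and the dictionary is an abbreviated stand-in for a real system model.

```python
def get_path(model, path):
    """Follow a period-separated path through nested dictionaries."""
    node = model
    for key in path.split('.'):
        node = node[key]
    return node

# Abbreviated stand-in for a system model (plain dicts, not a DataModelDict)
example_model = {
    'atomic-system': {
        'box': {'avect': {'value': [3.2, 0.0, 0.0]}},
        'periodic-boundary-condition': [True, True, True],
        'atom-type-symbol': ['Cs', 'Cl'],
    }
}

print(get_path(example_model, 'atomic-system.atom-type-symbol'))  # ['Cs', 'Cl']
print(get_path(example_model, 'atomic-system.box.avect.value'))   # [3.2, 0.0, 0.0]
```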

Consistent with atomman.unitconvert.model(), all multidimensional data is represented as a flattened array combined with shape parameters. This choice allows the data model to be equivalently represented as JSON or XML while remaining optimized for JSON/Python-based handling.
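To make the flattened-array-plus-shape idea concrete, the following pure-Python sketch flattens a nested list and rebuilds it from its shape. The flatten and unflatten helpers are hypothetical names used only here, and they assume non-empty, regularly shaped nested lists. atomman itself handles this internally, so this is not its implementation.

```python
def flatten(nested):
    """Return (flat_values, shape) for a non-empty, regular nested list."""
    shape = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0]
    flat = nested
    # Remove one level of nesting for every dimension beyond the first
    for _ in shape[1:]:
        flat = [item for sub in flat for item in sub]
    return flat, shape

def unflatten(flat, shape):
    """Rebuild the nested list from the flat values and the shape."""
    # Group values from the innermost dimension outward
    for size in reversed(shape[1:]):
        flat = [flat[i:i + size] for i in range(0, len(flat), size)]
    return flat

pos = [[0.0, 0.0, 0.0], [1.6, 1.6, 1.6]]
values, shape = flatten(pos)
print(values)  # [0.0, 0.0, 0.0, 1.6, 1.6, 1.6]
print(shape)   # [2, 3]
print(unflatten(values, shape) == pos)  # True
```

Because the values are stored as one flat list, the same content maps onto a JSON array or onto repeated XML elements without any nesting ambiguity.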

Library Imports

[1]:
# Standard Python libraries
import os
import datetime

# http://www.numpy.org/
import numpy as np

# https://github.com/usnistgov/DataModelDict
from DataModelDict import DataModelDict as DM

# https://github.com/usnistgov/atomman
import atomman as am
import atomman.unitconvert as uc

# Show atomman version
print('atomman version =', am.__version__)

# Show date of Notebook execution
print('Notebook executed on', datetime.date.today())
atomman version = 1.4.10
Notebook executed on 2023-07-28

Generate test system information (CsCl)

[2]:
# Generate box
alat = uc.set_in_units(3.2, 'angstrom')
box = am.Box(a=alat, b=alat, c=alat)

# Generate atoms with atype, pos, charge, and stress properties
atype = [1, 2]
pos = [[0,0,0], [0.5, 0.5, 0.5]]
charge = uc.set_in_units([1, -1], 'e')
stress = uc.set_in_units(np.zeros((2, 3, 3)), 'MPa')
atoms = am.Atoms(pos=pos, atype=atype, charge=charge, stress=stress)

# Build system from box and atoms, and scale atoms
system = am.System(atoms=atoms, box=box, scale=True, symbols=['Cs', 'Cl'])

# Print system information
print(system)
system.atoms_df()
avect =  [ 3.200,  0.000,  0.000]
bvect =  [ 0.000,  3.200,  0.000]
cvect =  [ 0.000,  0.000,  3.200]
origin = [ 0.000,  0.000,  0.000]
natoms = 2
natypes = 2
symbols = ('Cs', 'Cl')
pbc = [ True  True  True]
per-atom properties = ['atype', 'pos', 'charge', 'stress']
     id   atype  pos[0]  pos[1]  pos[2]
      0       1   0.000   0.000   0.000
      1       2   1.600   1.600   1.600
[2]:
atype pos[0] pos[1] pos[2] charge stress[0][0] stress[0][1] stress[0][2] stress[1][0] stress[1][1] stress[1][2] stress[2][0] stress[2][1] stress[2][2]
0 1 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 2 1.6 1.6 1.6 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

2. Dump

2.1. System.model()

Similar to other atomman classes, the System class has a model() method that generates a data model representation for the object. This allows for all content of the system to be saved as either JSON or XML, and reloaded later by initializing a new System object using the model.

Parameters

  • box_unit (str, optional) Length unit to use for the box. Default value is ‘angstrom’.

  • prop_name (list, optional) The Atoms properties to include. If neither prop_name nor prop_unit is given, all system properties will be included.

  • unit (list, optional) Lists the units for each prop_name as stored in the table. For a value of None, no conversion will be performed for that property. For a value of ‘scaled’, the corresponding table values will be taken in box-scaled units. If neither unit nor prop_unit is given, pos will be given in Angstroms and all other values will not be converted.

  • prop_unit (dict, optional) Dictionary where the keys are the property keys to include, and the values are the units to use. If neither unit nor prop_unit is given, pos will be given in Angstroms and all other values will not be converted.

Returns

  • (DataModelDict.DataModelDict) A JSON/XML data model for the current System object.

2.1.1. Simple example

[3]:
# Retrieve model as a DataModelDict using model
model = system.model()
print(model)
DataModelDict([('atomic-system', DataModelDict([('box', DataModelDict([('avect', DataModelDict([('value', [3.2, 0.0, 0.0])])), ('bvect', DataModelDict([('value', [0.0, 3.2, 0.0])])), ('cvect', DataModelDict([('value', [0.0, 0.0, 3.2])])), ('origin', DataModelDict([('value', [0.0, 0.0, 0.0])]))])), ('periodic-boundary-condition', [True, True, True]), ('atom-type-symbol', ['Cs', 'Cl']), ('atoms', DataModelDict([('natoms', 2), ('property', [DataModelDict([('name', 'atype'), ('data', DataModelDict([('value', [1, 2])]))]), DataModelDict([('name', 'pos'), ('data', DataModelDict([('value', [0.0, 0.0, 0.0, 1.6, 1.6, 1.6]), ('shape', [2, 3]), ('unit', 'angstrom')]))]), DataModelDict([('name', 'charge'), ('data', DataModelDict([('value', [1.0, -1.0])]))]), DataModelDict([('name', 'stress'), ('data', DataModelDict([('value', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), ('shape', [2, 3, 3])]))])])]))]))])
[4]:
# Convert model to JSON
print(model.json())
{"atomic-system": {"box": {"avect": {"value": [3.2, 0.0, 0.0]}, "bvect": {"value": [0.0, 3.2, 0.0]}, "cvect": {"value": [0.0, 0.0, 3.2]}, "origin": {"value": [0.0, 0.0, 0.0]}}, "periodic-boundary-condition": [true, true, true], "atom-type-symbol": ["Cs", "Cl"], "atoms": {"natoms": 2, "property": [{"name": "atype", "data": {"value": [1, 2]}}, {"name": "pos", "data": {"value": [0.0, 0.0, 0.0, 1.6, 1.6, 1.6], "shape": [2, 3], "unit": "angstrom"}}, {"name": "charge", "data": {"value": [1.0, -1.0]}}, {"name": "stress", "data": {"value": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], "shape": [2, 3, 3]}}]}}}
[5]:
# Convert model to XML
print(model.xml())
<?xml version="1.0" encoding="utf-8"?>
<atomic-system><box><avect><value>3.2</value><value>0.0</value><value>0.0</value></avect><bvect><value>0.0</value><value>3.2</value><value>0.0</value></bvect><cvect><value>0.0</value><value>0.0</value><value>3.2</value></cvect><origin><value>0.0</value><value>0.0</value><value>0.0</value></origin></box><periodic-boundary-condition>true</periodic-boundary-condition><periodic-boundary-condition>true</periodic-boundary-condition><periodic-boundary-condition>true</periodic-boundary-condition><atom-type-symbol>Cs</atom-type-symbol><atom-type-symbol>Cl</atom-type-symbol><atoms><natoms>2</natoms><property><name>atype</name><data><value>1</value><value>2</value></data></property><property><name>pos</name><data><value>0.0</value><value>0.0</value><value>0.0</value><value>1.6</value><value>1.6</value><value>1.6</value><shape>2</shape><shape>3</shape><unit>angstrom</unit></data></property><property><name>charge</name><data><value>1.0</value><value>-1.0</value></data></property><property><name>stress</name><data><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><value>0.0</value><shape>2</shape><shape>3</shape><shape>3</shape></data></property></atoms></atomic-system>
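As the XML output shows, list values appear as repeated elements with the same tag. The sketch below reads such content back using only Python's standard xml.etree.ElementTree module; the XML string is an abbreviated, hand-written excerpt in the style of the output above, not a complete system model. In normal use atomman reads the model itself, so this is only meant to show how the representation maps back to lists.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt in the style of the XML output above
xml_text = (
    '<atomic-system>'
    '<atom-type-symbol>Cs</atom-type-symbol>'
    '<atom-type-symbol>Cl</atom-type-symbol>'
    '<atoms><natoms>2</natoms>'
    '<property><name>pos</name><data>'
    '<value>0.0</value><value>0.0</value><value>0.0</value>'
    '<value>1.6</value><value>1.6</value><value>1.6</value>'
    '<shape>2</shape><shape>3</shape><unit>angstrom</unit>'
    '</data></property></atoms>'
    '</atomic-system>'
)

root = ET.fromstring(xml_text)

# Repeated elements with the same tag correspond to one list
symbols = [el.text for el in root.findall('atom-type-symbol')]

# XML stores everything as text, so numbers must be converted explicitly
data = root.find('atoms/property/data')
pos_values = [float(el.text) for el in data.findall('value')]
pos_shape = [int(el.text) for el in data.findall('shape')]
pos_unit = data.findtext('unit')

print(symbols)     # ['Cs', 'Cl']
print(pos_values)  # [0.0, 0.0, 0.0, 1.6, 1.6, 1.6]
print(pos_shape)   # [2, 3]
print(pos_unit)    # angstrom
```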

2.1.2. Specify units and/or limit included properties

By default, all per-atom properties will be saved to the data model. Since freely-assigned properties can theoretically be in any unit, the values will be saved in atomman’s working units if no unit information is provided. This implicitly assumes that atomman will be used to read the data back in, and that atomman’s working units during dumping and loading are the same.

[6]:
# Show (lack of) units as set in the model above
for prop in model.finds('property'):
    name = prop['name']
    unit = prop['data'].get('unit', None)
    print(f'{name} is in units {unit}')
atype is in units None
pos is in units angstrom
charge is in units None
stress is in units None

The units that the values are saved in can be set explicitly by providing a list that gives a unit for each assigned per-atom property, in the order that System.atoms_prop() lists them.

[7]:
model2 = system.model(unit=[None, 'nm', 'e', 'GPa'])

# Show units are now assigned
for prop in model2.finds('property'):
    name = prop['name']
    unit = prop['data'].get('unit', None)
    print(f'{name} is in units {unit}')
atype is in units None
pos is in units nm
charge is in units e
stress is in units GPa

Properties can also be excluded from the data model by using prop_name to list only the wanted properties. In that case, the unit list should give one unit for each prop_name entry, in the same order.

[8]:
model2 = system.model(prop_name=['atype', 'pos', 'stress'],
                      unit=[None, 'nm', 'GPa'])

# Show units are now assigned
for prop in model2.finds('property'):
    name = prop['name']
    unit = prop['data'].get('unit', None)
    print(f'{name} is in units {unit}')
atype is in units None
pos is in units nm
stress is in units GPa

For convenience, the unit and property choice can alternatively be represented in dictionary format and passed in using the prop_unit parameter.

[9]:
prop_unit = {}
prop_unit['atype'] = None
prop_unit['pos'] = 'nm'
prop_unit['charge'] = 'e'

model2 = system.model(prop_unit=prop_unit)

# Show units are now assigned
for prop in model2.finds('property'):
    name = prop['name']
    unit = prop['data'].get('unit', None)
    print(f'{name} is in units {unit}')
atype is in units None
pos is in units nm
charge is in units e
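The correspondence between the two calling styles can be sketched in plain Python: pairing a prop_name list with a unit list gives the same information as a prop_unit dictionary. This is only an illustration of how the two forms relate, not a description of atomman's internal handling.

```python
# The list form: names and units are matched by position
prop_name = ['atype', 'pos', 'stress']
unit = [None, 'nm', 'GPa']

# Positional matching only makes sense if the lists have equal length
if len(prop_name) != len(unit):
    raise ValueError('prop_name and unit must have the same length')

# The dictionary form: each property name maps directly to its unit
prop_unit = dict(zip(prop_name, unit))
print(prop_unit)  # {'atype': None, 'pos': 'nm', 'stress': 'GPa'}
```

The dictionary form avoids having to keep two lists in the same order, which is the main reason it is convenient.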

2.2. System.dump(‘system_model’)

Alternatively, a model of the system can be generated by calling the System.dump() method using the ‘system_model’ style. This allows for consistency with the other System-level conversions. There is no difference in the resulting models produced by the two methods as System.dump() calls System.model().

Parameters

  • f (str or file-like object, optional) File path or file-like object to write the content to. If not given, then the content is returned.

  • box_unit (str, optional) Length unit to use for the box. Default value is ‘angstrom’.

  • prop_name (list, optional) The Atoms properties to include. If neither prop_name nor prop_unit is given, all system properties will be included.

  • unit (list, optional) Lists the units for each prop_name as stored in the table. For a value of None, no conversion will be performed for that property. For a value of ‘scaled’, the corresponding table values will be taken in box-scaled units. If neither unit nor prop_unit is given, pos will be given in Angstroms and all other values will not be converted.

  • prop_unit (dict, optional) Dictionary where the keys are the property keys to include, and the values are the units to use. If neither unit nor prop_unit is given, pos will be given in Angstroms and all other values will not be converted.

  • format (str, optional) File format ‘xml’ or ‘json’ to save the content as if f is given. If f is a filename, then the format will be automatically inferred from f’s extension. If format is not given and cannot be inferred, then it will be set to ‘json’.

  • indent (int or None, optional) Indentation option to use for XML/JSON content if f is given. A value of None (default) will add no line separations or indentations.

Returns

  • model (DataModelDict.DataModelDict or str) The generated model representation of the system. Will be a DataModelDict if format is not specified, and a JSON- or XML-formatted string if format is specified. Returned if f is not given.
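The format-selection rule described for the format parameter can be sketched as a small function. Note that infer_format is a hypothetical helper written for this illustration, not part of atomman, and the parameter is named fmt to avoid shadowing Python's built-in format. It assumes that an explicitly given format takes precedence over the file extension, which the parameter description does not state outright.

```python
import os

def infer_format(f, fmt=None):
    """Pick 'json' or 'xml' for file output (hypothetical sketch)."""
    # Assumption: an explicitly given format wins
    if fmt is not None:
        return fmt
    # If f is a file name, try to infer the format from its extension
    if isinstance(f, str):
        ext = os.path.splitext(f)[1].lower()
        if ext in ('.json', '.xml'):
            return ext[1:]
    # Fall back to JSON when nothing can be inferred
    return 'json'

print(infer_format('model.xml'))         # xml
print(infer_format('model.json'))        # json
print(infer_format('model.dat'))         # json
print(infer_format('model.dat', 'xml'))  # xml
```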

2.2.1. Simple example

As System.dump(‘system_model’) calls System.model(), most parameters of the two functions are the same.

[10]:
model = system.dump('system_model', prop_unit={'atype':None, 'pos':'scaled', 'charge': 'e', 'stress': 'GPa'})
print(model)
DataModelDict([('atomic-system', DataModelDict([('box', DataModelDict([('avect', DataModelDict([('value', [3.2, 0.0, 0.0])])), ('bvect', DataModelDict([('value', [0.0, 3.2, 0.0])])), ('cvect', DataModelDict([('value', [0.0, 0.0, 3.2])])), ('origin', DataModelDict([('value', [0.0, 0.0, 0.0])]))])), ('periodic-boundary-condition', [True, True, True]), ('atom-type-symbol', ['Cs', 'Cl']), ('atoms', DataModelDict([('natoms', 2), ('property', [DataModelDict([('name', 'atype'), ('data', DataModelDict([('value', [1, 2])]))]), DataModelDict([('name', 'pos'), ('data', DataModelDict([('value', [0.0, 0.0, 0.0, 0.5, 0.5, 0.5]), ('shape', [2, 3]), ('unit', 'scaled')]))]), DataModelDict([('name', 'charge'), ('data', DataModelDict([('value', [1.0, -1.0]), ('unit', 'e')]))]), DataModelDict([('name', 'stress'), ('data', DataModelDict([('value', [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]), ('shape', [2, 3, 3]), ('unit', 'GPa')]))])])]))]))])

The primary difference is that System.dump(‘system_model’) can directly convert to JSON/XML and save to a file.

[11]:
model_json = system.dump('system_model', format='json', indent=2,
                         prop_unit={'atype':None, 'pos':'scaled'})
print(model_json)
{
  "atomic-system": {
    "box": {
      "avect": {
        "value": [
          3.2,
          0.0,
          0.0
        ]
      },
      "bvect": {
        "value": [
          0.0,
          3.2,
          0.0
        ]
      },
      "cvect": {
        "value": [
          0.0,
          0.0,
          3.2
        ]
      },
      "origin": {
        "value": [
          0.0,
          0.0,
          0.0
        ]
      }
    },
    "periodic-boundary-condition": [
      true,
      true,
      true
    ],
    "atom-type-symbol": [
      "Cs",
      "Cl"
    ],
    "atoms": {
      "natoms": 2,
      "property": [
        {
          "name": "atype",
          "data": {
            "value": [
              1,
              2
            ]
          }
        },
        {
          "name": "pos",
          "data": {
            "value": [
              0.0,
              0.0,
              0.0,
              0.5,
              0.5,
              0.5
            ],
            "shape": [
              2,
              3
            ],
            "unit": "scaled"
          }
        }
      ]
    }
  }
}
[12]:
# Save to file as XML
system.dump('system_model', f='model.xml', format='xml',
            prop_unit={'atype':None, 'pos':'scaled'})

with open('model.xml') as f:
    print(f.read())

os.remove('model.xml')
<?xml version="1.0" encoding="utf-8"?>
<atomic-system><box><avect><value>3.2</value><value>0.0</value><value>0.0</value></avect><bvect><value>0.0</value><value>3.2</value><value>0.0</value></bvect><cvect><value>0.0</value><value>0.0</value><value>3.2</value></cvect><origin><value>0.0</value><value>0.0</value><value>0.0</value></origin></box><periodic-boundary-condition>true</periodic-boundary-condition><periodic-boundary-condition>true</periodic-boundary-condition><periodic-boundary-condition>true</periodic-boundary-condition><atom-type-symbol>Cs</atom-type-symbol><atom-type-symbol>Cl</atom-type-symbol><atoms><natoms>2</natoms><property><name>atype</name><data><value>1</value><value>2</value></data></property><property><name>pos</name><data><value>0.0</value><value>0.0</value><value>0.0</value><value>0.5</value><value>0.5</value><value>0.5</value><shape>2</shape><shape>3</shape><unit>scaled</unit></data></property></atoms></atomic-system>
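Several of the examples above store pos with the ‘scaled’ unit, meaning the positions are expressed relative to the box vectors rather than in absolute length units. The following pure-Python sketch shows the underlying arithmetic using the CsCl values from this Notebook; scaled_to_absolute is a hypothetical helper for illustration, since atomman performs this conversion itself when loading the model.

```python
def scaled_to_absolute(spos, avect, bvect, cvect, origin):
    """Convert one box-scaled position to absolute Cartesian coordinates."""
    return [
        origin[i] + spos[0] * avect[i] + spos[1] * bvect[i] + spos[2] * cvect[i]
        for i in range(3)
    ]

# Box values from the CsCl example (in angstroms)
avect = [3.2, 0.0, 0.0]
bvect = [0.0, 3.2, 0.0]
cvect = [0.0, 0.0, 3.2]
origin = [0.0, 0.0, 0.0]

# Scaled positions as stored in the model
scaled_pos = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]

absolute_pos = [
    scaled_to_absolute(s, avect, bvect, cvect, origin) for s in scaled_pos
]
print(absolute_pos)
```

The second atom lands at the box center, matching the 1.6 angstrom coordinates shown when the system was first created.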

3. Load

3.1. System.__init__(model)

A model can be interpreted by passing it as a parameter when initializing a new System object. Note that the supplied model value can be a DataModelDict, a JSON or XML string, or the name of a JSON or XML file.

[13]:
# Initialize new system using model
system2 = am.System(model=model)

# Print system information
print(system2)
system2.atoms_df()
avect =  [ 3.200,  0.000,  0.000]
bvect =  [ 0.000,  3.200,  0.000]
cvect =  [ 0.000,  0.000,  3.200]
origin = [ 0.000,  0.000,  0.000]
natoms = 2
natypes = 2
symbols = ('Cs', 'Cl')
pbc = [ True  True  True]
per-atom properties = ['atype', 'pos', 'charge', 'stress']
     id   atype  pos[0]  pos[1]  pos[2]
      0       1   0.000   0.000   0.000
      1       2   1.600   1.600   1.600
[13]:
atype pos[0] pos[1] pos[2] charge stress[0][0] stress[0][1] stress[0][2] stress[1][0] stress[1][1] stress[1][2] stress[2][0] stress[2][1] stress[2][2]
0 1 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 2 1.6 1.6 1.6 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

3.2. atomman.load(‘system_model’)

The atomman.load() function with the ‘system_model’ style also supports reading System.model content.

Parameters

  • model (str, file-like object or DataModelDict) The data model to read.

  • symbols (tuple, optional) Allows the list of element symbols to be assigned during loading.

  • key (str, optional) The key identifying the root element for the system definition. Default value is ‘atomic-system’.

  • index (int, optional) If the full model has multiple key entries, the index specifies which to access. Default value is 0 (first, or only entry).

Returns

  • system (atomman.System) The system object associated with the data model.

3.2.1. Examples

The default behavior of atomman.load() is identical to initializing a new System object using the model.

[14]:
model_system = am.load('system_model', model)
print(model_system)
model_system.atoms_df()
avect =  [ 3.200,  0.000,  0.000]
bvect =  [ 0.000,  3.200,  0.000]
cvect =  [ 0.000,  0.000,  3.200]
origin = [ 0.000,  0.000,  0.000]
natoms = 2
natypes = 2
symbols = ('Cs', 'Cl')
pbc = [ True  True  True]
per-atom properties = ['atype', 'pos', 'charge', 'stress']
     id   atype  pos[0]  pos[1]  pos[2]
      0       1   0.000   0.000   0.000
      1       2   1.600   1.600   1.600
[14]:
atype pos[0] pos[1] pos[2] charge stress[0][0] stress[0][1] stress[0][2] stress[1][0] stress[1][1] stress[1][2] stress[2][0] stress[2][1] stress[2][2]
0 1 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 2 1.6 1.6 1.6 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

The advantage of using atomman.load(‘system_model’) is that it is designed to also handle larger data models that may contain embedded system model elements. Depending on what the larger data model represents, multiple system data models may be embedded as a list or differentiated by different root element keys. The key and index parameters of atomman.load(‘system_model’) therefore make it possible to uniquely select one system model from within a larger data model.

[15]:
# Define a larger data model
collection_model = DM()
collection_model['system-collection'] = DM()

# Add multiple system models under the test-atomic-system key
collection_model['system-collection'].append('test-atomic-system', model['atomic-system'])
collection_model['system-collection'].append('test-atomic-system', model['atomic-system'])

# Use atomman.load() to load the second 'test-atomic-system' model
system3 = am.load('system_model', collection_model, index=1, key='test-atomic-system')
print(system3)
system3.atoms_df()
avect =  [ 3.200,  0.000,  0.000]
bvect =  [ 0.000,  3.200,  0.000]
cvect =  [ 0.000,  0.000,  3.200]
origin = [ 0.000,  0.000,  0.000]
natoms = 2
natypes = 2
symbols = ('Cs', 'Cl')
pbc = [ True  True  True]
per-atom properties = ['atype', 'pos', 'charge', 'stress']
     id   atype  pos[0]  pos[1]  pos[2]
      0       1   0.000   0.000   0.000
      1       2   1.600   1.600   1.600
[15]:
atype pos[0] pos[1] pos[2] charge stress[0][0] stress[0][1] stress[0][2] stress[1][0] stress[1][1] stress[1][2] stress[2][0] stress[2][1] stress[2][2]
0 1 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 2 1.6 1.6 1.6 -1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
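The key and index selection used above can be mimicked with plain dictionaries and lists to show the idea. In this sketch, find_all is a hypothetical helper, not the DataModelDict API, and the nested content is an abbreviated stand-in for real system models.

```python
def find_all(node, key):
    """Yield every entry stored under key in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                # A list under the key counts as multiple entries
                if isinstance(v, list):
                    yield from v
                else:
                    yield v
            else:
                yield from find_all(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_all(item, key)

# Abbreviated stand-in for a larger data model with two embedded systems
collection = {
    'system-collection': {
        'test-atomic-system': [
            {'atoms': {'natoms': 2}},
            {'atoms': {'natoms': 4}},
        ]
    }
}

# key picks which element name to look for, index picks which match to use
matches = list(find_all(collection, 'test-atomic-system'))
selected = matches[1]
print(len(matches))                 # 2
print(selected['atoms']['natoms'])  # 4
```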