Accelerating Scientific Discovery Through Computation and Visualization III. Tight-Binding Wave Functions for Quantum Dots


1. Introduction

This is the third in a series of articles (1), (2) that describe, through examples, how the Scientific Applications and Visualization Group (SAVG) at the National Institute of Standards and Technology (NIST) has utilized high performance parallel computing, visualization, and machine learning to accelerate scientific discovery. In this article we focus on the use of high performance computing and visualization for simulations of nanotechnology.

Research and development of nanotechnology, with applications ranging from smart materials to quantum computation to biolabs on a chip, has the highest national priority. Semiconductor nanoparticles, also known as nanocrystals and quantum dots, are one of the most intensely studied nanotechnology paradigms. Nanoparticles are typically 1 nm to 10 nm in size with a thousand to a million atoms. Precise control of particle size, shape and composition allows one to tailor charge distributions and control quantum effects to tailor properties completely different from the bulk and from small clusters. As a result of enhanced quantum confinement effects, nanoparticles act as artificial, man-made atoms with discrete electronic spectra that can be exploited as light sources for novel enhanced lasers, discrete components in nanoelectronics, qubits for quantum information processing and enhanced ultra-stable fluorescent labels for biosensors to detect, for example, cancers, malaria or other pathogens, and to do cell biology.

Semiconductor nanoparticles come in two kinds. Nanocrystals are small crystallites, 1 nm to 6 nm in dimension, typically made with colloidal chemistry, and containing up to 100 000 atoms (3). Such nanoparticles are ideal for biosensor applications because they can be conjugated to be made soluble for in vivo and in vitro studies and can be functionalized to bind to specific biological structures (4). At the same time, these nanocrystals can be used as the building blocks, linked together by coordinating ligands, to form nanoarchitectures for nanoelectronics, quantum computing and spintronics (5). Larger semiconductor quantum dots can be formed by controlled epitaxial growth, typically using molecular beam epitaxy, to grow a low bandgap wetting layer and pyramidal or hemispherical dot, often InAs, embedded in a high bandgap material, usually GaAs. These dots are typically 10 nm to 20 nm wide and a few nm high with 100 000 to 1 000 000 atoms. These nanoparticles have compelling interest because they can be grown and integrated into current microelectronic devices for use in optoelectronics and electronics.

Initially, nanoparticles were modeled by continuum, effective mass theories, essentially treating the confined electrons and holes as "particles-in-a-box" with sophisticated models for the particle mass. Such approaches are easy to implement and work well enough for large systems where the atomic-scale details do not matter. More recently, atomic-scale theories, such as tight-binding theory and ab initio theory, have been used to model nanoparticles. These approaches are much more difficult to implement computationally, because all atoms must be included. However, in contrast to continuum models, these atomistic models can be used to develop precise models for quantum dot surfaces with defects, lattice relaxation, partial passivation, imperfect interfaces between core quantum dots and capping layers, dots passivated with molecular ligands, and molecularly linked dot arrays. All of these features must be accounted for in the precision modeling tools that are needed to design and implement nanoparticles in specific applications. Hence large-scale implementations of atomistic models for complex nanosystems are essential.

We study the electrical and optical properties of semiconductor nanocrystals and quantum dots such as the pyramidal dot shown in Fig. 1, as well as more complex nanocrystal structures with the nanocrystal coordinated with capping molecules and functionalized with linker molecules, and nanodevice architectures formed by linking together complex dot structures, also shown in Fig. 1.

[FIGURE 1 OMITTED]

We use an atomistic approach that allows us to explicitly account for all atoms in the nanoparticles as well as all atoms in any conjugating ligands, linkers and surrounding material. In the most complex structures this entails modeling structures with on the order of a million atoms. Highly parallel computational and visualization platforms are critical for obtaining the computational speeds necessary for systematic, comprehensive study of these structures.

2. The Tight-Binding Method and Electronic States of Quantum Dots

Two approaches are now being used to explicitly account for all atoms in a structure. The more ab initio theories directly calculate the full electronic states of the system, typically calculating the contribution of each atom to the electronic potential self-consistently within density functional or pseudopotential theory (see, for example, A. J. Williamson and A. Zunger (6)). This provides the most detailed picture of nanoparticle states but becomes computationally prohibitive for large structures with up to a million atoms. A simpler approach is the tight-binding method.

The tight-binding model (7) is based upon the Linear Combination of Atomic Orbitals (LCAO) method and is both accurate and easy to implement (8), (9). It finds the electronic states of a quantum dot by assuming that the electronic state can be represented near a given atom by a few atomic orbitals (typically the s and p orbitals, but in more complete calculations the s, p, and d orbitals) localized to that atom. This assumption makes the calculation significantly simpler than the more ab initio theories but still explicitly accounts for all atoms. The Hamiltonian matrix that describes the coupling between orbitals on neighboring atoms is usually found empirically by adjusting those matrix elements to ensure that bulk band structures are correctly reproduced. Thus there is no need to find an atomic-scale electronic single-particle potential self-consistently. This simplification also greatly speeds up the calculations. Even with these simplifications, the tight-binding theory provides a full atomic-scale theory of complex nanoparticles with monolayer variations in composition capable of accurately describing the structures (9).

Any structure can be studied once the atomic positions are defined. For a nanoparticle that retains the bulk crystal structure, we can start with a large cube of bulk material and throw away all atoms not inside the nanoparticle. Once the position of an atom is determined, its neighbors can be identified. In a tight-binding approach, the orbitals on an atom only couple to its nearest neighbor atoms and possibly to the next nearest neighbors. The short range of the coupling greatly simplifies the computations because the Hamiltonian is sparse. Atomic relaxation away from the bulk positions can also be included, usually with a valence force-field model. Once atomic positions and neighbors are determined for the relaxed lattice, the Hamiltonian coupling matrix can again be determined. Similarly, conjugating and linker molecules can be attached to the nanoparticle. Once the atomic positions in the external molecules are known, the coupling between the nanoparticle and the molecules can be determined. A typical atomic basis would include 10 orbitals. Each atom is described by its outer valence s orbital, the 3 outer p orbitals, the 5 outer d orbitals, and a fictitious excited s* orbital that is included to mimic the effects of higher lying states. When spin is included the basis is doubled. For a system with N atoms, there will be 10 N eigenstates (20 N if spin-orbit coupling is included). For any system except for the very smallest (less than about 1000 atoms), iterative techniques, such as the Arnoldi method (10), must be used to diagonalize the system Hamiltonian and only those eigenstates near the fundamental gap are found. This may mean that 100 to several thousand states are found, but this is still far less than 20 N (where N can be as high as a million or more).
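
The "carve from a bulk cube, then find neighbors" procedure described above can be sketched as follows. This is a minimal illustration, not the authors' lattice generator: it assumes a zinc-blende lattice, a spherical nanoparticle, and integer coordinates in units of a/4 (a being the lattice constant); the function names `carve_sphere` and `neighbor_table` are hypothetical.

```python
from itertools import product

# Face-centered-cubic sites of one conventional cell, in units of a/4 (assumed convention).
FCC = [(0, 0, 0), (0, 2, 2), (2, 0, 2), (2, 2, 0)]
# Bond vectors from a cation to its four anion neighbors (length sqrt(3) * a/4).
BONDS = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def carve_sphere(cells, radius):
    """Build a cube of (2*cells)^3 unit cells and keep atoms with |r| <= radius.

    Returns a dict mapping position -> sublattice (0 = cation, 1 = anion).
    """
    atoms = {}
    for i, j, k in product(range(-cells, cells), repeat=3):
        for fx, fy, fz in FCC:
            for sub in (0, 1):  # the anion sublattice is shifted by (1, 1, 1)
                pos = (4 * i + fx + sub, 4 * j + fy + sub, 4 * k + fz + sub)
                if sum(c * c for c in pos) <= radius * radius:
                    atoms[pos] = sub
    return atoms

def neighbor_table(atoms):
    """Map each atom to the nearest neighbors that survived the carving."""
    table = {}
    for pos, sub in atoms.items():
        sign = 1 if sub == 0 else -1  # anions see the reversed bond vectors
        candidates = [tuple(p + sign * b for p, b in zip(pos, bond)) for bond in BONDS]
        table[pos] = [c for c in candidates if c in atoms]
    return table

atoms = carve_sphere(cells=3, radius=8)
table = neighbor_table(atoms)
n_atoms = len(atoms)
dim_no_spin = 10 * n_atoms  # s + 3 p + 5 d + s* orbitals per atom
dim_spin = 20 * n_atoms     # basis doubled when spin is included
print(n_atoms, dim_no_spin, dim_spin)
```

Interior atoms end up with four neighbors and surface atoms with fewer, which is where passivation or ligand attachment would enter in a fuller model.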

In the tight-binding calculations that we typically do, we assume that the atoms in a nanoparticle occupy the sites of a regular zinc-blende or wurtzite lattice. We typically model spherical, hemispherical, tetrahedral, or pyramidal nanoparticles or nanocrystals linked together, e.g., a double quantum dot, quantum dot molecule or arrays of quantum dots.

3. The Calculation of Electron and Hole States

3.1 Solving the Hamiltonian Equation Sequentially and in Parallel for a Single Nanoparticle

Given that only nearest neighbor coupling is included in our models, the matrix we work with is a large, sparse, banded Hermitian matrix. The "large matrix" we are referring to is the tight-binding Hamiltonian (H) whose solutions are the single particle wave functions $\psi_n$ in

$H \psi_n = E_n \psi_n$, (1)

where the empirical tight-binding Hamiltonian matrix (H) is determined by adjusting the matrix elements to reproduce known band gaps and effective masses of the bulk bandstructures.

Solving this matrix equation for its eigenvalues and eigenvectors (the wave functions) reduces to a matrix diagonalization problem which can be stated as

$A z = \lambda z$, (2)

where A is a large, sparse, Hermitian matrix. The $\lambda$s are the energies, and for each $\lambda$, z is an eigenfunction (wave function) for the corresponding energy.

One of the best solvers for this type of matrix diagonalization is ARPACK (10). In ARPACK, the computational work is proportional to n (the size of the matrix is n by n, n = 10 N or 20 N) times ncv (which defines the size of the space to use in finding eigenvectors; we choose ncv to be 4 times the number of eigenvalues sought). The only other eigensolver for this class of problem is the Jacobi-Davidson technique (11), but it is $O(n^2)$, i.e., the computational work is proportional to $n^2$ rather than to n. There is another consideration, namely, the number of eigenvalues sought. But as long as the number of eigenvalues sought is much less than 20 N, ARPACK is much more efficient. The method employed in ARPACK when A is symmetric reduces to a variant of the Lanczos method called the Implicitly Restarted Lanczos Method (IRLM), which is a synthesis of Arnoldi/Lanczos with the Implicitly Shifted QR scheme (12). Implicitly restarted implies iterative, and the matrix is diagonalized iteratively by a series of actions of the matrix A on a vector,

$A V_k \rightarrow W$, (3)

where W is part of the workspace in the Arnoldi solver in ARPACK. $V_k$ is the vector representing the single-particle wave function calculated by ARPACK, changing on each iteration k (converging to the solution). On the other hand, the action of A on $V_k$, $A V_k$, is a user-supplied matrix-vector multiplication. In the tight-binding code, the matrix-vector multiplication is time-consuming, but the dominant time cost is still the vector reorthogonalization in the Arnoldi solver.
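
To make the roles of the user-supplied matrix-vector product and the reorthogonalization step concrete, here is a minimal sketch of a plain Lanczos iteration with full reorthogonalization. It is not ARPACK's implicitly restarted scheme and does not call ARPACK; the 1-D nearest-neighbor chain Hamiltonian (`chain_matvec`), the starting vector, and all function names are assumptions chosen only because the exact lowest eigenvalue of that chain, -2 cos(pi/(n+1)), is known.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def chain_matvec(v, t=1.0):
    """User-supplied product: 1-D chain with nearest-neighbor hopping -t (toy Hamiltonian)."""
    n = len(v)
    w = [0.0] * n
    for i in range(n):
        if i > 0:
            w[i] -= t * v[i - 1]
        if i < n - 1:
            w[i] -= t * v[i + 1]
    return w

def lanczos(matvec, v0, m):
    """Return the diagonal (alphas) and off-diagonal (betas) of the Lanczos tridiagonal matrix."""
    norm = math.sqrt(dot(v0, v0))
    basis = [[x / norm for x in v0]]
    alphas, betas = [], []
    for j in range(m):
        w = matvec(basis[j])              # the only place the matrix is touched
        alphas.append(dot(w, basis[j]))
        for q in basis:                   # full reorthogonalization against all earlier vectors
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        beta = math.sqrt(dot(w, w))
        if j == m - 1 or beta < 1e-12:
            break
        betas.append(beta)
        basis.append([wi / beta for wi in w])
    return alphas, betas

def lowest_eigenvalue(alphas, betas, iterations=200):
    """Smallest eigenvalue of the tridiagonal matrix by Sturm-count bisection."""
    def count_below(x):
        count, d = 0, 1.0
        for i, a in enumerate(alphas):
            off = betas[i - 1] ** 2 if i > 0 else 0.0
            d = a - x - off / d
            if d == 0.0:
                d = 1e-300                # avoid division by zero on the next step
            if d < 0:
                count += 1
        return count
    bound = max(abs(a) for a in alphas) + 2 * max(betas, default=0.0)
    lo, hi = -bound, bound
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if count_below(mid) >= 1:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n = 8
alphas, betas = lanczos(chain_matvec, [float(i + 1) for i in range(n)], m=n)
lowest = lowest_eigenvalue(alphas, betas)
exact = -2.0 * math.cos(math.pi / (n + 1))
print(lowest, exact)
```

The reorthogonalization loop costs work proportional to the number of stored vectors times n at every step, which is consistent with the text's observation that it, rather than the sparse matrix-vector product, dominates the run time.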

Partial, iterative diagonalization of the sparse Hamiltonian for a nanoparticle benefits greatly from parallelization for the sizes of nanoparticle structures that we deal with. Parallel ARPACK (PARPACK) (13) can be used to explicitly partition and distribute the matrix across nodes of a distributed memory machine (Linux cluster, for example) or processors of a shared memory parallel machine, thereby distributing the workload.

Since PARPACK's parallelization scheme distributes the Arnoldi vectors $V_k$ across a 1-D processor grid (blocked by rows), we decompose the nanoparticle into slabs such that there are approximately equal numbers of atoms on each processor. Atoms in one slab are coupled to atoms in another slab only via the neighboring atoms at the common interface. Communication between neighboring slabs is minimal while there is no communication between slabs that are further apart. This makes the tight-binding, sparse coupling approach ideal for this parallelization.

The parallel code looks essentially like the sequential code except that the local block of the Arnoldi vector, $V_{loc}$, is passed in place of V, and $n_{loc}$, the dimension of the local block, is passed instead of n. There is a user-supplied matrix-vector product subroutine to compute the local segment of the matrix-vector product AV that is consistent with the partition of V. This product requires communication in addition to the communication managed by PARPACK. Memory management is handled by Fortran 90 allocate and deallocate statements, so there are parallelization gains from the ability to spread the matrices across multiple processors as well as from the speedup in overall computational (turnaround) time.

To minimize communication between neighboring slabs, we assume the atoms are ordered in one of the coordinates, which we take to be the z-coordinate. We sort and order atoms by z, and then catalog $z_{min}$, $z_{max}$ for each processor. We distribute atoms to processors in layers, as depicted in Fig. 2, to more or less evenly distribute the atoms for load balancing. (1) During the communication phase, each atom on a processor P only needs information about atoms which are on P plus any atoms which are on neighboring slabs and are within the nearest-neighbor distance of the edge layers on P.
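
The sort-by-z slab assignment can be sketched as below. This is an illustration under simplifying assumptions: it balances atom counts exactly and may therefore split a physical layer between two processors, whereas the text distributes whole layers; the name `slab_decomposition` and the sample z values are hypothetical.

```python
def slab_decomposition(z_coords, n_proc):
    """Sort atom indices by z and cut them into n_proc contiguous slabs of nearly equal size.

    Assumes len(z_coords) >= n_proc so that no slab is empty.
    """
    order = sorted(range(len(z_coords)), key=lambda i: z_coords[i])
    base, extra = divmod(len(order), n_proc)
    slabs, start = [], 0
    for p in range(n_proc):
        size = base + (1 if p < extra else 0)  # spread the remainder over the first slabs
        members = order[start:start + size]
        slabs.append({
            "atoms": members,
            "z_min": z_coords[members[0]],     # cataloged per processor, as in the text
            "z_max": z_coords[members[-1]],
        })
        start += size
    return slabs

# 23 atoms with scrambled, distinct z values (made-up data).
z = [((7 * i) % 23) * 0.25 for i in range(23)]
slabs = slab_decomposition(z, n_proc=4)
sizes = [len(s["atoms"]) for s in slabs]
print(sizes, [(s["z_min"], s["z_max"]) for s in slabs])
```

Because each slab covers a contiguous z range, only adjacent slabs can share nearest-neighbor bonds, which is what keeps the communication pattern one-dimensional.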

[FIGURE 2 OMITTED]

For each P, we determine the layers in neighboring processors which contain atoms which are nearest neighbors to atoms in P. Only the V's for these layers need to be communicated by neighboring processors. So, in Fig. 3, $V_{2BOTg}$ is a layer used on processor 1 that is a "ghost" of the bottom layer on processor 2. Similarly $V_{0TOPg}$ is a layer used on processor 1 that is a "ghost" of the top layer on processor 0. During the communication phase, processor 1 has only to receive $V_{2BOTg}$ from processor 2 and $V_{0TOPg}$ from processor 0 for the

$A V_{loc} \rightarrow W_{loc}$ (4)

product to be computed. Exchanging only these ghost layers and allocating only the actual amount of memory to do this cuts down communication drastically and is the key insight to effective parallelization.
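
The ghost-layer exchange can be emulated in a single process to check that the block-local products reproduce the serial result. The sketch below is an assumption-laden toy: a 1-D nearest-neighbor chain stands in for the tight-binding Hamiltonian, each "layer" is a single site, Python lists stand in for processors, and no message-passing library is used; all function names are hypothetical.

```python
def serial_matvec(v, t=1.0):
    """Reference product for a 1-D chain with nearest-neighbor hopping -t."""
    n = len(v)
    return [-t * ((v[i - 1] if i > 0 else 0.0) + (v[i + 1] if i < n - 1 else 0.0))
            for i in range(n)]

def split_blocks(v, n_proc):
    """Cut v into contiguous local blocks, one per emulated processor."""
    base, extra = divmod(len(v), n_proc)
    blocks, start = [], 0
    for p in range(n_proc):
        size = base + (1 if p < extra else 0)
        blocks.append(v[start:start + size])
        start += size
    return blocks

def exchange_ghosts(blocks):
    """Emulated communication phase: each block receives one boundary value per neighbor."""
    ghosts = []
    for p in range(len(blocks)):
        below = blocks[p - 1][-1] if p > 0 else 0.0               # top layer of processor p-1
        above = blocks[p + 1][0] if p < len(blocks) - 1 else 0.0  # bottom layer of processor p+1
        ghosts.append((below, above))
    return ghosts

def local_matvec(v_loc, ghost, t=1.0):
    """Compute W_loc from V_loc plus the two ghost values only."""
    below, above = ghost
    padded = [below] + v_loc + [above]
    return [-t * (padded[i - 1] + padded[i + 1]) for i in range(1, len(padded) - 1)]

v = [float((3 * i) % 7) for i in range(10)]
blocks = split_blocks(v, 3)
ghosts = exchange_ghosts(blocks)
w_parallel = [x for v_loc, g in zip(blocks, ghosts) for x in local_matvec(v_loc, g)]
w_serial = serial_matvec(v)
print(w_parallel == w_serial)
```

Each block needs only two extra values regardless of its own size, so the communication volume stays fixed while the local work shrinks as processors are added.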

[FIGURE 3 OMITTED]

3.2 Results

To assess the speedup due to parallelization, we present an example of a diagonalization for an 18 nm diameter spherical HgS nanocrystal with 195393 atoms, each with 5 orbitals. In Table 1 we discuss results for the PARPACK parallelization for the corresponding 976 965 x 976 965 matrix running on the NIST cluster of Pentium, Athlon, and Intel processors running RedHat Linux (2). The run times are for computing twenty different eigenvalue-eigenvector pairs (run times depend on the number of eigenvalue-eigenvector pairs computed).

Notice that simply by sorting in z we significantly improved the run time. This speedup is due partly to faster convergence (a smaller number of iterations to convergence), presumably due to better location of the bands in the sparse matrix.

Table 1. Computation and communication times, in seconds, to find 20 eigenvalues of a 976 965 x 976 965 matrix as a function of the number of processors

Number of      Matrix-vector        Communication   CPU        Number of
processors     subroutine time (s)  time (s)        time (s)   iterations

1 (unsorted)   13525                                778420     1016
1 (sorted)     9710                                 228470     863
2              4616                 1957            111720     788
4              2464                 2065            64065      890
8              1405                 1075            44122      877
10             1253                 1194            46314      1154
16             592                  1365            17244      780
32             323                  605             14568      925
50             225                  117             9239       800


For all of the sorted runs, timings are consistent. (3) The matrix-vector multiplication is split across processors in an approximately even way. For up to 50 processors, communication is a minor part of the run and CPU time scales. Communication time is 10 to 50 times smaller than the total time. CPU time scales so that the 8 processor time is 5.2 times faster than the 1 processor time. The effects of parallelization are consistent with having a similar number of iterations to convergence for each number of processors. A factor of 25 speedup is achieved on 50 processors for a quantum dot with 195393 atoms, and a matrix size of 976 965 (almost a million). The original sequential job took 2.7 d; the 50 processor job completes in 2.5 h, a significant improvement in turnaround time. Neither communication time nor user-supplied subroutine time is the rate determining step. The bulk of the time and the rate-limiting step is the computation done in PARPACK.
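
The speedup figures quoted above follow directly from the sorted-run CPU times in Table 1. The short calculation below reproduces them; the parallel-efficiency column (speedup divided by processor count) is an added convenience, not a quantity reported in the text.

```python
# CPU times in seconds for the sorted runs, copied from Table 1.
cpu_time = {1: 228470, 2: 111720, 4: 64065, 8: 44122,
            10: 46314, 16: 17244, 32: 14568, 50: 9239}

# Matrix dimension: 195393 atoms with 5 orbitals each.
matrix_dim = 195393 * 5

speedup = {p: cpu_time[1] / t for p, t in cpu_time.items()}
efficiency = {p: speedup[p] / p for p in cpu_time}

for p in sorted(cpu_time):
    print(f"{p:3d} processors: speedup {speedup[p]:5.1f}, efficiency {efficiency[p]:4.2f}")
```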

4. Building Larger Structures by Stitching

Often it is easy to define the simple nanosubsystems that make up a complex, heterogeneous nanosystem. However, it may be difficult to explicitly define the entire structure. For example, a single quantum dot is easy to define and implement. An arbitrary sequence of stacked dots is more difficult to implement. However, with the simpler building blocks defined, it is easy to build arbitrary structures with them. A novel feature of our code is the ability to link together heterogeneous nanostructures (also referred to here as nanosubsystems). For example, when a nanoparticle nanostructure includes conjugating and linker molecules, these molecules can be assigned separately to different computational nodes to take advantage of the parallelization. If the nanosystem includes multiple smaller nanosubsystems linked together, then each smaller nanosubsystem can be parallelized on a different set of nodes with only minimal communication required between different nodes. In the tight-binding code, the matrix-vector multiplication is important, but the dominant part of the CPU time is the vector reorthogonalization in the Arnoldi solver, which is parallelized efficiently by PARPACK. Hence we are not constrained in how we partition atoms into nanosubsystems (groups in our code, each group having its own cluster of processors), as long as each nanosubsystem partition can be handled as a group. Thus we can choose a partition that conveniently follows the physical partitioning of the structure.

Once we can do multiprocessor runs routinely, we have the basic building blocks for making larger structures by "stitching" together disparate subsystems into composite structures, each separate subsystem to be stitched together being a smaller multiprocessor run. The basic idea is to consider each smaller nanosubsystem as its own cluster, using the same input data as before, but at each iteration in the computation, information about atoms in the cluster that are intercluster neighbors (see Fig. 4) has to be distributed to the appropriate processors for the neighboring clusters, thereby "stitching" the calculations on the nanosubsystems in the heterogeneous structure together.

[FIGURE 4 OMITTED]

4.1 Modifications to the Lattice Generating Code for Stitching

For each small nanosubsystem (group), we build a structure as before, starting with a large cube of bulk material and throwing away all atoms not inside the nanosubsystem. Once the position of an atom is determined, its neighbors in the nanosubsystem can be identified. In the stitching case, each processor needs to know the atom data for each atom in the group (cluster) attached to the processor, as well as all of the atom's nearest neighbors, whether in its own group or in a neighboring cluster. To reduce communication data between processors, a second pass is done during the lattice generation process to analyze, for each group, the data for all other groups and determine the intra-cluster and intercluster nearest neighbors for each atom. The nearest neighbor table for atoms in each group is augmented by intercluster neighbors (each atom has a global atom id), and data for the intercluster neighbor atoms from other groups is added to the group's atom data table. This eliminates the need to communicate nearest neighbor data between processors during the iteration phase of the diagonalization.

In the case of a single group, atoms are distributed between processors based on an atom's z position. For a structure with several groups stitched together, each atom is first distributed to a cluster of processors, based on the group that the atom belongs to. Then the atom is distributed to a particular processor in that cluster based on the atom's z position.
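
The two-level distribution (group to processor cluster, then z position to processor within the cluster) can be sketched as follows. The data layout, the group labels, and the function name `assign_processors` are hypothetical, and the within-cluster split balances atom counts rather than whole layers.

```python
def assign_processors(atoms, procs_per_group):
    """Return the owning processor rank for every atom.

    atoms: list of (group_id, z) tuples.
    procs_per_group: dict mapping group_id -> list of processor ranks in that cluster.
    """
    owner = [None] * len(atoms)
    for group, ranks in procs_per_group.items():
        # Level 1: pick out the atoms of this group. Level 2: order them by z.
        members = sorted((i for i, (g, _) in enumerate(atoms) if g == group),
                         key=lambda i: atoms[i][1])
        base, extra = divmod(len(members), len(ranks))
        start = 0
        for k, rank in enumerate(ranks):
            size = base + (1 if k < extra else 0)
            for i in members[start:start + size]:
                owner[i] = rank
            start += size
    return owner

# Two made-up groups: a small dot on 2 processors and a larger one on 3.
atoms = ([("dot_A", z) for z in (0.0, 0.5, 1.0, 1.5)]
         + [("dot_B", z) for z in (3.0, 3.5, 4.0, 4.5, 5.0, 5.5)])
owner = assign_processors(atoms, {"dot_A": [0, 1], "dot_B": [2, 3, 4]})
print(owner)
```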

4.2 Modifications to the Matrix-Vector Multiplier for Stitching

When computing the product in Eq. (4),

$A V_{loc} \rightarrow W_{loc}$, (5)

at each iteration, diagonal matrix elements present no problem. However, off-diagonal matrix elements require that the V's (eigenvector pieces that change in each iteration) for each atom's intercluster nearest neighbor atoms (see Fig. 4) be communicated to the processor that "owns" the atom, as in the non-stitching case. In contrast to the non-stitching, one-group case, where the processors are correlated to the atom z position, we now have an irregular grid (where the processor atom z position is not as well defined) because we stitch arbitrary structures together. With an irregular grid, the communications scheme has to be defined explicitly.

Basically, each processor has to exchange information for all nearest neighbor atoms not on the processor, both neighbor atoms within its group and those outside its group. Initially each processor knows (i.e., can figure out) which processors to send V's to, but not which processors it will be receiving data from or how much data it will be receiving. To solve this problem we use a set of irregular communication routines available from Steve Plimpton (14). These routines work by first setting up a communication pattern or descriptor and then invoking the communication operation with arbitrary types of data. The end result is that each processor learns how much data it will receive and from whom, so the subsequent communication operations can be performed efficiently. The irregular grid used for stitching heterogeneous nanostructures affects the CPU speedup of our parallel code. In benchmarking the stitching code, we find that the parallelization is similar to that of the non-stitching code illustrated in Table 1 for up to about 40 processors (see Table 2). Beyond that, there is no further speedup as additional processors are included in a run, indicating that for irregular grids communication eventually dominates CPU time and limits the size of a cluster that can be effectively employed in a stitching calculation.
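
The setup step described above, in which each processor starts out knowing only its own send list and ends up knowing who will send to it and how much, can be pictured as transposing the send pattern. The following is a minimal serial sketch of that idea in Python. It is not the irregular communication routines of reference (14); the function name `build_receive_plan`, the dictionary layout, and the four-rank example are all hypothetical, and in a real message-passing run the transpose would be carried out with collective communication rather than in one process.

```python
from collections import defaultdict


def build_receive_plan(send_counts):
    """Derive, for every rank, who it will receive from and how much.

    send_counts[p] maps a destination rank to the number of values that
    rank p intends to send there. This is all a rank knows locally.
    Every rank is assumed to appear as a key of send_counts.
    """
    receive_plan = defaultdict(dict)
    for source, destinations in send_counts.items():
        for dest, count in destinations.items():
            receive_plan[dest][source] = count
    # Ranks that receive nothing still get an (empty) entry.
    return {rank: dict(receive_plan.get(rank, {})) for rank in send_counts}


# Hypothetical 4-rank pattern: rank 0 sends 12 values to rank 1, and so on.
send_counts = {
    0: {1: 12, 3: 5},
    1: {0: 12, 2: 7},
    2: {1: 7},
    3: {0: 5},
}
receive_plan = build_receive_plan(send_counts)
for rank in sorted(receive_plan):
    total = sum(receive_plan[rank].values())
    sources = sorted(receive_plan[rank])
    print(f"rank {rank} receives {total} values from ranks {sources}")
```

Once such a plan exists, buffers can be sized and receives posted before any eigenvector data moves, which is why the pattern is built once and then reused at every iteration.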
Table 2. Computation times for two groups stitched
together as a function of the number of processors

Number of processors  CPU time (s)

          4             155402
          8              81910
         16              59905
         32              34826
         40              20869
         50              25697
         80              36600
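
The trend in Table 2 is easier to read as speedup and parallel efficiency. The short Python sketch below computes both from the tabulated CPU times. Because Table 2 has no single-processor entry, the 4-processor run is taken as the baseline; that choice, and the helper names, are assumptions made for illustration.

```python
# CPU times (s) copied from Table 2, keyed by number of processors.
cpu_time = {4: 155402, 8: 81910, 16: 59905, 32: 34826,
            40: 20869, 50: 25697, 80: 36600}

base_procs = min(cpu_time)          # smallest run (4 processors) is the baseline
base_time = cpu_time[base_procs]


def relative_speedup(procs):
    """How many times faster than the baseline run."""
    return base_time / cpu_time[procs]


def efficiency(procs):
    """Relative speedup divided by the ideal speedup for that processor count."""
    return relative_speedup(procs) / (procs / base_procs)


fastest = min(cpu_time, key=cpu_time.get)
for procs in sorted(cpu_time):
    print(f"{procs:3d} processors: speedup {relative_speedup(procs):5.2f}, "
          f"efficiency {efficiency(procs):4.2f}")
print("fastest run:", fastest, "processors")
```

On these numbers the 40-processor run is about 7.4 times faster than the 4-processor run, against an ideal factor of 10, and the runs with 50 and 80 processors are slower than the 40-processor run.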


4.3 Results

To appreciate the utility of the stitching approach, structures with multiple subsystems must be studied. Two vertically stacked InAs pyramids (quantum dots) on InAs wetting layers embedded in a large GaAs medium (computationally, a large box) have been investigated previously (15), (16) as a single nanostructure. As a test of our stitching code, we have calculated this system as two separate pyramids (each modeled as in Fig. 1) and obtained results for the energy bands and eigenfunctions that are identical to the previous single group calculation. This demonstrates the validity of the stitching code.

As another test, we considered a multilayer nanocrystal structure with a core of CdS, a middle shell of HgS and an outer shell of CdS. Previously these structures have been considered as a single structure (9). There are six possible combinations of how layers can be ordered for a stitched structure. When considered as a stitched structure, all constructions should give the same results, and they do, again validating our stitching code.

As a more complicated example, we consider three GaAs pyramidal dots (with their wetting layers) stacked on top of each other and embedded in an AlGaAs matrix. Such a structure is a prototypical quantum device with electrons and holes stored in the individual GaAs quantum dots. Charge (either the electrons or the holes) in the outer dots could be brought together in the middle dot to interact and transfer quantum bits (qubits) of information. To design and engineer such devices, it is critical to determine the device tolerance to imperfections in the fabrication. The stitching code greatly facilitates the analysis of such effects.

Figure 5 shows the three groups of atoms corresponding to three slabs of AlGaAs, each slab with one embedded GaAs quantum dot and wetting layer. Various alignments of the quantum dots in the structure can be immediately tested simply by shifting the alignment of the slabs, using the same slabs as building blocks for all of the calculations. This greatly reduces the work needed to build different structures, especially if several related structures are to be studied.

[FIGURE 5 OMITTED]

The electronic structure of a perfectly aligned structure can be directly determined. For example, the energy levels of the 6 lowest hole states are shown in Fig. 6. To study imperfect structures, we misalign the slabs, redetermine the intercluster nearest neighbor assignments and repeat the calculations. Figure 6 shows how sensitive the levels are to misalignment. The effects become significant for misalignments of 3 or more lattice constants between adjacent dots.
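
The shift-and-restitch step can be illustrated with a toy model. The Python sketch below reuses one slab as a building block, stacks a shifted copy on top of it, and redetermines which atoms in one group are nearest neighbors of atoms in the other. It assumes a simple cubic lattice with integer coordinates (in units of the lattice constant) as a stand-in for the real crystal structure, and the names `make_slab` and `intercluster_neighbors` are hypothetical; it is meant only to show the bookkeeping, not the production code.

```python
from itertools import product


def make_slab(nx, ny, nz, z0=0, shift_x=0):
    """Integer lattice sites of a slab, in units of the lattice constant."""
    return [(x + shift_x, y, z + z0)
            for x, y, z in product(range(nx), range(ny), range(nz))]


def intercluster_neighbors(group_a, group_b, bond_length_sq=1):
    """Pairs (i, j) with atom i of group_a bonded to atom j of group_b."""
    index_b = {site: j for j, site in enumerate(group_b)}
    # All lattice steps whose squared length equals the bond length squared.
    steps = [(dx, dy, dz) for dx, dy, dz in product((-1, 0, 1), repeat=3)
             if dx * dx + dy * dy + dz * dz == bond_length_sq]
    pairs = []
    for i, (x, y, z) in enumerate(group_a):
        for dx, dy, dz in steps:
            j = index_b.get((x + dx, y + dy, z + dz))
            if j is not None:
                pairs.append((i, j))
    return pairs


lower = make_slab(8, 8, 3)                           # one building block
for shift in (0, 3, 6):                              # misalignment in lattice constants
    upper = make_slab(8, 8, 3, z0=3, shift_x=shift)  # same block, stacked and shifted
    bonds = intercluster_neighbors(lower, upper)
    print(f"shift {shift}: {len(bonds)} intercluster bonds")
```

Only the list of intercluster pairs changes from one alignment to the next; the atoms inside each building block, and their intracluster neighbor lists, are reused unchanged.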

[FIGURE 6 OMITTED]

The effects of misalignment are more apparent if the charge densities for the corresponding states are visualized. Figure 7 shows the charge density from the s orbitals for the first electron state in the aligned structure, while Fig. 8 shows the corresponding charge density when the adjacent slabs are misaligned by 6 lattice constants, which is nearly the half-width of the dot base and is shown in Fig. 5. In this case the electron state is able to spread between the three dots. This delocalization persists even when the dots are substantially misaligned.

[FIGURE 7 OMITTED]

The [p.sub.x] orbital charge densities for the lowest hole state in the aligned structure and the strongly misaligned structure are shown in Fig. 9 and Fig. 10. In the aligned structure the hole is also delocalized. However, in the misaligned structure the hole becomes strongly localized. Here critical differences in the effects of misalignment on electron and hole states are apparent. Such differences in the consequences of imperfect alignment would be critical in making choices about how to use these structures in quantum devices, as mentioned above. Using these structures as optical devices depends on the electron and hole states having a large overlap. The results show that the overlap is severely impacted by misalignment. Our stitching approach, which uses building blocks to implement and parallelize calculations for large systems, makes such studies practical.

The calculations mentioned above did not include d orbitals. Including d orbitals doubles the number of states per atom. Similarly, including spin-orbit coupling doubles the number of states (spin-up and spin-down). In addition, the computational demands increase because the spin-orbit coupling requires complex arithmetic. However, in a recent study of the electronic structure of GaAs nanocrystals, inclusion of d orbitals and spin-orbit coupling proved to be critical to a proper description of the lowest electron states (17). Hence in these latest calculations, the use of a parallel approach is even more essential. The introduction of d orbitals and spin-orbit coupling increases the eigenvector size by a factor of 8 and increases the time for each arithmetic operation (complex arithmetic). Stitching becomes even more critical in this case, because it allows us to take full advantage of multiple processors.
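
One way to arrive at the factor of 8 is to multiply three doublings: the orbital count per atom, the spin states, and the storage per coefficient when real numbers become complex. That decomposition is an assumption, since the text states only the total. The Python sketch below works through the arithmetic for a round, purely illustrative atom count, taking 5 orbitals per atom as the starting point (the count quoted for the three-dot example) and assuming the simpler calculations use real coefficients.

```python
def eigenvector_reals(n_atoms, orbitals_per_atom, spin_states=1, complex_valued=False):
    """Number of real numbers needed to store one eigenvector."""
    components = n_atoms * orbitals_per_atom * spin_states
    return components * (2 if complex_valued else 1)


n_atoms = 1_000_000   # illustrative round number, not a value from the text

# Without d orbitals or spin-orbit coupling: 5 orbitals per atom, real values.
basic = eigenvector_reals(n_atoms, orbitals_per_atom=5)

# With d orbitals (orbital count doubles), spin-orbit coupling (two spin
# states per orbital) and complex coefficients (two reals each).
full = eigenvector_reals(n_atoms, orbitals_per_atom=10, spin_states=2,
                         complex_valued=True)

print(f"basic: {basic:,} reals, full: {full:,} reals, ratio: {full // basic}")
```

The ratio is independent of the atom count, so the same factor applies to every building block in a stitched structure.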

[FIGURE 8 OMITTED]

[FIGURE 9 OMITTED]

[FIGURE 10 OMITTED]

5. Visualization

Visual models of laboratory experiments and computational simulations to explore the nanoworld can be critical to comprehension. However, increasing amounts of data are being generated. For instance, in the three stacked dots example, the region considered has nearly 700 000 atoms. Each atom has 5 orbitals. Thus there are 3.5 million pieces of data to describe one state. Both high performance computing and experiment must be augmented by high performance visualization. At NIST our visual analysis capabilities include both coarse grain capabilities and finer grain capabilities (which are more demanding of CPU and visualization resources) as well as static graphical representations and fully three-dimensional immersive capabilities.

In our quantum dot simulations we visualize the atomic scale structure of the lattice and the charge density of the electrons and holes at both the fine grained and coarser grained levels. Figures in the previous section show one of the finer grained ways we visualize the charge density, i.e., by displaying the contribution of the s and p orbitals to the charge distribution of an eigenstate of the triple pyramid quantum dot. The orbitals are centered on the atoms in the structure, so these images also represent the atoms in the structure. This visualization is important because the presence of significant orbital charge density between the dots indicates that tunneling is probable between the structures, i.e., the visualization shows the tunneling created by coupling dots in the structure.

Finer detail than that visible in Figs. 7-10 can be represented in a visualization. Figure 11 shows the charge density of the lowest hole state in a CdS nanocrystal. In this case, much greater detail is apparent. The contributions from [p.sub.z] orbitals (green) and [p.sub.x] orbitals (blue) are shown. The contributions of [p.sub.y] and s orbitals are not visible in this example. The orbitals are centered on the corresponding atom. The shape, size, and color represent the orbital type and the magnitude of its contribution. The different colors of the orbital lobes (for example, lighter and darker blue for [p.sub.x]) indicate the phase of the orbital. In this way, complete information about electron and hole states can be obtained. For example, state symmetries can be discerned immediately from these visualizations. Such symmetries are more difficult to discern otherwise.
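
The mapping from eigenvector coefficients to what is drawn can be sketched as follows. This Python fragment is a hypothetical illustration, not the group's visualization software: it assumes real coefficients, uses the squared coefficient as the magnitude of an orbital's contribution and the sign of the coefficient as its phase, and drops contributions below an arbitrary threshold, which mimics orbitals that are "not visible" in a figure.

```python
ORBITALS = ("s", "px", "py", "pz")


def orbital_glyphs(coefficients, threshold=1e-3):
    """Turn one atom's orbital coefficients into drawable glyph attributes.

    coefficients maps an orbital label to its (real) amplitude on this atom.
    Returns one entry per orbital whose weight c*c reaches the threshold.
    """
    glyphs = []
    for label in ORBITALS:
        c = coefficients.get(label, 0.0)
        weight = c * c
        if weight < threshold:
            continue                            # too small to be visible
        glyphs.append({
            "orbital": label,                   # selects glyph shape and base color
            "weight": weight,                   # controls glyph size
            "phase": "+" if c >= 0 else "-",    # selects lighter or darker shade
        })
    return glyphs


# Hypothetical coefficients for a single atom.
atom = {"s": 0.01, "px": -0.6, "py": 0.0, "pz": 0.8}
for glyph in orbital_glyphs(atom):
    print(glyph)
```

Here only the [p.sub.x] and [p.sub.z] contributions survive the threshold, with opposite phases.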

Even for these examples, the amount of data to be visualized can be prohibitive. Coarser grained visualizations can avoid that problem. Figure 12 is an example of contours and transparent surfaces that show charge densities in a coarser grained way. The figure shows the atomic scale charge density of an electronic state trapped in the well region of a CdS/HgS/CdS core/well/clad nanoheterostructured nanocrystal.
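
One possible way to reduce per-atom data before drawing contours or surfaces is to sum the atomic charge densities into the cells of a coarser grid. The Python sketch below shows that binning step under stated assumptions; it is not necessarily how Fig. 12 was produced, and the function name `coarse_grain`, the cell size, and the sample values are all hypothetical.

```python
from collections import defaultdict


def coarse_grain(atom_positions, atom_densities, cell_size):
    """Sum per-atom charge densities into cubic cells of edge cell_size."""
    grid = defaultdict(float)
    for (x, y, z), rho in zip(atom_positions, atom_densities):
        cell = (int(x // cell_size), int(y // cell_size), int(z // cell_size))
        grid[cell] += rho
    return dict(grid)


# Hypothetical data: four atoms with made-up positions and densities.
positions = [(0.2, 0.1, 0.0), (0.9, 0.4, 0.3), (1.5, 0.2, 0.1), (3.1, 2.2, 0.4)]
densities = [0.1, 0.2, 0.3, 0.4]
grid = coarse_grain(positions, densities, cell_size=1.0)
for cell in sorted(grid):
    print(cell, round(grid[cell], 3))
```

The total charge is preserved by the binning, while the number of values handed to the renderer drops from one per atom to one per occupied cell.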

[FIGURE 11 OMITTED]

We can do much more with the output of our nanostructure calculations. Our visual analysis capabilities include an immersive environment that allows scientists to interact with their data by navigating through a three-dimensional virtual landscape of the data rather than by simply viewing pictures of the data. Our nanostructure calculations output detailed charge distributions which are transferred to the NIST immersive environment where they can be studied interactively. One can move through space, going inside the structure and moving around within it. In this way one can visualize the structure looking in from the outside, or looking out from the inside. One can visualize both the nanostructure (see, for example, Fig. 13) and the atomic scale variation of calculated nanostructure properties from any orientation and position in space. This is not possible with any static graphical representation. For an example of an interaction with a nanostructure in the immersive environment (which can be saved as a QuickTime movie), see (18).

[FIGURE 12 OMITTED]

[FIGURE 13 OMITTED]

Our representations are tremendously helpful. They encapsulate the physics and allow one to easily see features that might be missed by just perusing the voluminous output from a supercomputer-size calculation. Such insights greatly speed up the extraction of useful understanding as we explore the properties of new and unfamiliar systems.

6. Summary

In this article we discuss the use of high performance computing and visualization for simulations of the nanoscale systems that would be used in emerging nanotechnologies, biosensors and quantum devices. Paradoxically, the properties of individual nanostructures often depend on their atomic scale structure while the complex device structures used in these nanotechnologies integrate multiple nanosystems and contain a million or more atoms. One must have a multiscale computational approach that allows one to routinely study systems with a million atoms or more, including the atomic scale detail. We use the tight-binding approach to include atomic-scale detail. We use code parallelization to make million atom calculations feasible. We have implemented a stitching approach to the parallelization that allows us to implement and efficiently study complex nanosystems built from heterogeneous building blocks. To mine the voluminous amounts of data that are generated, we use a variety of fine-grained and coarse-grained approaches that span the range from static representations to immersive visualization. The latter allows us to move interactively to regions of high interest in complex structures to rapidly identify and isolate key features. However, immersive visualization still comes at the cost of tremendous hardware demands to run the immersions. Thus simpler representations still play a critical role in gaining insights into the physics and operations of these nanotechnologies and quantum devices.

Acknowledgments

We would like to thank Heidi Thornquist of Rice University for help extracting eigenvectors from PARPACK, Steve Plimpton for the irregular grid code, William Mitchell for his expertise with ARPACK and Fortran 90, Julien C. Franiatte for writing the first version of the parallel code used in this work, Robert Bohn for help with figures, Denis Lehane, Carl Spangler and Michael Strawbridge for Linux cluster support, Chris Schanzle for Linux support, and Donald Koss for SGI support. One of us (JSS) would like to thank Kraemer Sims Becker for introducing him to quantum dots.

7. References

(1) J. S. Sims, J. G. Hagedorn, P. M. Ketcham, S. G. Satterfield, T. J. Griffin, W. L. George, H. A. Fowler, B. A. am Ende, H. K. Hung, R. B. Bohn, J. E. Koontz, N. S. Martys, C. E. Bouldin, J. A. Warren, D. L. Feder, C. W. Clark, B. J. Filla, and J. E. Devaney, Accelerating Scientific Discovery Through Computation and Visualization, J. Res. Natl. Inst. Stand. Technol. 105 (6), 875-894 (2000).

(2) J. S. Sims, W. L. George, S. G. Satterfield, H. K. Hung, J. G. Hagedorn, P. M. Ketcham, T. J. Griffin, S. A. Hagstrom, J. C. Franiatte, G. W. Bryant, W. Jaskolski, N. S. Martys, C. E. Bouldin, V. Simmons, O. P. Nicolas, J. A. Warren, B. A. am Ende, J. E. Koontz, B. J. Filla, V. G. Pourprix, S. R. Copley, R. B. Bohn, A. P. Peskin, Y. M. Parker, and J. E. Devaney. Accelerating Scientific Discovery Through Computation and Visualization II. J. Res. Natl. Inst. Stand. Technol. 107 (3), 223-245 (2002).

(3) A. P. Alivisatos, Semiconductor clusters, nanocrystals, and quantum dots, Science 271, 933-937 (1996).

(4) B. Dubertret, P. Skourides, D. J. Norris, V. Noireaux, A. H. Brivanlou, and A. Libchaber, In Vivo Imaging of Quantum Dots Encapsulated in Phospholipid Micelles, Science 298, 1759-1762 (2002).

(5) M. Ouyang and David D. Awschalom, Coherent Spin Transfer Between Molecularly Bridged Quantum Dots, Science 301, 1074-1078 (2003).

(6) A. J. Williamson and A. Zunger, Pseudopotential study of electron-hole excitations in colloidal free-standing InAs quantum dots, Phys. Rev. B61, 1978-1991 (2000).

(7) P. Vogl, H. P. Hjalmarson, and J. D. Dow, A Semi-empirical Tight-binding Theory of the Electronic-structure of Semiconductors, J. Physics and Chemistry of Solids 44 (5), 365-378 (1983).

(8) G. W. Bryant and W. Jaskolski, Designing quantum dots and quantum-dot solids, Physica E 11 (2-3), 72-77 (Oct. 2001).

(9) Garnett W. Bryant and W. Jaskolski, Tight-binding theory of quantum-dot quantum wells: Single-particle effects and near-band-edge structure, Phys. Rev. B67, 205320 (2003).

(10) D. C. Sorensen, Implicit application of polynomial filters in a k-step Arnoldi method, SIAM Journal on Matrix Analysis and Applications 13 (1), 357-385 (Jan. 1992).

(11) G. L. G. Sleijpen and H. A. Van der Vorst, A Jacobi-Davidson Iteration Method for Linear Eigenvalue Problems, SIAM Review 42 (2), 267-293 (2001).

(12) Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst, Templates for the Solution of Algebraic Eigenvalue Problems, SIAM, Philadelphia (2000).

(13) K. Maschhoff and D. Sorensen, An Efficient Portable Large Scale Eigenvalue Package for Distributed Memory Parallel Architectures, 1996. Available from <http://www.caam.rice.edu/software/ARPACK>. Accessed 08 April 2007.

(14) Parallel Algorithms, Available from: <http://www.cs.sandia.gov/sjplimp/algorithms.html>. Accessed 06 January 2006.

(15) W. Jaskolski, M. Zieliński, and G. W. Bryant, Electronic properties of quantum-dot molecules, Physica E 17, 40-41 (2003).

(16) W. Jaskolski, M. Zieliński, G. W. Bryant, and J. Aizpurua, Strain effects on the electronic structure of strongly coupled self-assembled InAs/GaAs quantum dots: Tight-binding approach, Phys. Rev. B 74, 195359 (2006).

(17) J.G. Diaz and G. W. Bryant, Electronic and optical fine structure of GaAs nanocrystals: The role of d orbitals in a tight-binding approach, Phys. Rev. B 73, 75329 (2006).

(18) Visualization of Nano structures, Available from: <http://math.nist.gov/mcsd/savg/vis/nano/nano.sample.mov>. Accessed 17 April 2007.

About the authors: James S. Sims, William L. George, Terence J. Griffin, John G. Hagedorn, Howard K. Hung, John T. Kelso, Adele P. Peskin, and Steven G. Satterfield are computational scientists and Marc Olano is a Faculty Appointee in the Scientific Applications and Visualization Group, Mathematical and Computational Sciences Division, of the NIST Information Technology Laboratory. Marc Olano is also an Assistant Professor in the Computer Science and Electrical Engineering Department at the University of Maryland, Baltimore County. Judith Devaney Terrill is Group Leader of the Scientific Applications and Visualization Group. Garnett W. Bryant is Group Leader of the Quantum Processes and Metrology Group in the Physics Laboratory. Jose Diaz is a former guest researcher working with Garnett Bryant. The National Institute of Standards and Technology is an agency of the U.S. Department of Commerce.

(1) Because we put all atoms with the same z on the same processor, the number of atoms on different processors may be different. In practice the difference between the maximum and minimum number of atoms in a layer is about 10%.

(2) Certain commercial equipment, instruments, or materials are identified in this paper to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.

(3) There is an anomaly at 10 processors which is due to the number of iterations to convergence. This is because convergence depends on the choice of initial vector and ARPACK's choice of an initial vector is different for different numbers of processors. We find a 10% to 30% deviation in our runs coming from this factor.

James S. Sims, William L. George, Terence J. Griffin, John G. Hagedorn, Howard K. Hung, John T. Kelso, Marc Olano, Adele P. Peskin, Steven G. Satterfield, and Judith Devaney Terrill

Math and Computational Sciences Division (ITL), National Institute of Standards and Technology, Gaithersburg, MD 20899-8911 and

Garnett W. Bryant and Jose G. Diaz

Atomic Physics Division (PL),

National Institute of Standards and Technology,

Gaithersburg, MD 20899-8423

james.sims@nist.gov

william.george@nist.gov

terence.griffin@nist.gov

john.hagedorn@nist.gov

hhung@verizon.net

john.kelso@nist.gov

marc.olano@nist.gov

adele.peskin@nist.gov

steven.satterfield@nist.gov

judith.terrill@nist.gov

garnett.bryant@nist.gov

gabriel_fyq@hotmai1.com

This is the third in a series of articles that describe, through examples, how the Scientific Applications and Visualization Group (SAVG) at NIST has utilized high performance parallel computing, visualization, and machine learning to accelerate scientific discovery. In this article we focus on the use of high performance computing and visualization for simulations of nanotechnology.

Key words: high-performance computing; MPI; nanotechnology; parallel computing; quantum dots; RAVE; tight-binding; virtual measurements; visualization.

Accepted: May 19, 2008

Available online: http://www.nist.gov/jres
COPYRIGHT 2008 National Institute of Standards and Technology
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2008 Gale, Cengage Learning. All rights reserved.

Author:Sims, James S.; George, William L.; Griffin, Terence J.; Hagedorn, John G.; Hung, Howard K.; Kelso,
Publication:Journal of Research of the National Institute of Standards and Technology
Article Type:Report
Geographic Code:1USA
Date:May 1, 2008
Words:6439