How the Java Virtual Machine Works - An Insider's View | PaperHelp




Every Java developer benefits from understanding what the JVM is, how code gets into it, and how that code is executed. This article is aimed mainly at beginners, but experienced programmers may also find something new here. It briefly describes how a class file is organized and how the virtual machine processes and executes bytecode.


The main goal of Java's designers was to make applications portable. The JVM plays a central role in portability: it provides a layer of abstraction between the compiled program and the underlying hardware platform and operating system. Despite this extra "layer," applications run fast because both the bytecode that the JVM executes and the JVM itself are heavily optimized. Let's look at how the JVM works.


The structure of a class file


Let's write a simple application and compile it. The compiler creates a file with the .class extension that holds everything the JVM needs to know about our mini-application. What will we see inside? The file is divided into ten strictly defined sections that together make up the whole structure of the class file.


The file starts with a magic number: 0xCAFEBABE. This number is present in every class file and is a mandatory marker for the JVM: it tells the system that this is a class file.


The next four bytes of the class file contain the minor and major version numbers of the class-file format, in that order, two bytes each. They identify the format version of a particular class file and allow the JVM to check whether it can load it. Each JVM has an upper limit on the version it can load: a class file with a later version is rejected rather than loaded. For example, a class compiled for Java SE 8 has major version 0x34 (52), and one compiled for Java SE 11 has 0x37 (55).
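To make the header layout concrete, here is a minimal sketch that checks the magic number and reads the two version fields from a byte array. The class name ClassHeader and its methods are hypothetical names chosen for this illustration, and the bytes in main are written by hand rather than taken from a real compiled class.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: parses only the first eight bytes of a class file.
class ClassHeader {
    final int minor;
    final int major;

    ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader parse(byte[] bytes) {
        // ByteBuffer is big-endian by default, which matches the class-file format.
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("Not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // unsigned 16-bit value
        int major = buf.getShort() & 0xFFFF; // unsigned 16-bit value
        return new ClassHeader(minor, major);
    }

    // For Java 5 (major 49) and later, the release number is major - 44.
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x34};
        ClassHeader h = parse(header);
        System.out.println("major=" + h.major + " minor=" + h.minor
                + " Java release=" + javaRelease(h.major));
    }
}
```

The same parse method would work on the first eight bytes of any real .class file read from disk.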


Starting at the ninth byte comes the constant pool, which contains all the constants of our class. Since each class can have a different number of constants, the array is preceded by a two-byte count, so the constant pool is a variable-length array. Each constant normally occupies one element of the array (long and double constants take two), and each entry begins with a one-byte tag that identifies its type. Throughout the class file, constants are referred to by an integer index that denotes their position in the array. The first constant has index 1, the second has index 2, and so on.


If the tag has the value 1, the constant is a string: the JVM treats the two bytes following the tag as the length of a byte array and reads that many bytes to get the whole string.
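As a rough sketch of how such a string entry could be decoded: a tag byte of 1 (called CONSTANT_Utf8 in the JVM specification), a two-byte length, then the bytes of the string. The class name Utf8Constant is hypothetical, and the entry in main is hand-written for illustration. The sketch relies on DataInputStream.readUTF, which reads exactly this length-prefixed format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: decodes a single string entry of the constant pool.
class Utf8Constant {
    static final int CONSTANT_UTF8 = 1; // tag value for string data

    static String read(byte[] entry) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(entry))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_UTF8) {
                throw new IllegalArgumentException("Not a string constant, tag=" + tag);
            }
            // readUTF reads a 2-byte length and then that many bytes of modified UTF-8.
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // tag = 1, length = 4, followed by the bytes of "main"
        byte[] entry = {1, 0, 4, 'm', 'a', 'i', 'n'};
        System.out.println(read(entry));
    }
}
```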


After reading the constant pool, the JVM moves on to the next two bytes, the access flags, which determine whether the file describes a class or an interface, whether it is public or abstract, and whether the class is final.
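The access flags are a bit mask, so decoding them comes down to testing individual bits. The sketch below uses the flag values defined by the JVM specification for classes; the class name AccessFlags and the describe method are hypothetical and cover only a handful of flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a class-level access_flags value into readable names.
class AccessFlags {
    static final int ACC_PUBLIC    = 0x0001;
    static final int ACC_FINAL     = 0x0010;
    static final int ACC_SUPER     = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT  = 0x0400;

    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0)    names.add("public");
        if ((flags & ACC_FINAL) != 0)     names.add("final");
        if ((flags & ACC_SUPER) != 0)     names.add("super");
        if ((flags & ACC_INTERFACE) != 0) names.add("interface");
        if ((flags & ACC_ABSTRACT) != 0)  names.add("abstract");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0021)); // a typical public class
        System.out.println(describe(0x0601)); // a typical public interface
    }
}
```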


The names of the class and its parent class are stored in the constant pool; the next 4 bytes in the file are two 2-byte indices that point to them.


Things are a little different with interfaces. Since a class can implement several interfaces at once, the file has to store an array of references into the constant pool. The references to the class and its parent class are followed by a number giving the size of the interface array, and then by the array itself.


Next, we move on to the most important part of any class, its methods: this is where all the logic of the program, all the executable bytecode, is concentrated.


Methods are stored in the same way as fields, which come right before them in the file. A variable-length array contains structures, each holding the full description of one method: its access modifiers, its name, its descriptor (parameter and return types), and its attributes. The attributes are themselves structures, because a method can have several of them and they can be of different types.


The last block holds additional meta-information, for example, the name of the source file that was compiled. It may or may not be present, and the JVM ignores any attributes in this block that it does not recognize.


We have looked at the file structure and are ready to move on to the next part: loading the class file into the JVM and executing the bytecode from this class. To consolidate your knowledge of the class-file structure, you can run javap, the disassembler bundled with the JDK, with the -c and -verbose options (javap -c -verbose TestJava.class) and examine its output.
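The source of the TestJava class is not shown in this article, so the class below is only an assumed stand-in with a single static method. Any small class will do for trying out the command.

```java
// Hypothetical contents of TestJava.java; the article's original class is not shown.
// Compile and inspect it with:
//   javac TestJava.java
//   javap -c -verbose TestJava.class
class TestJava {
    // javac compiles this body to four instructions: iload_0, iload_1, iadd, ireturn
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

In the javap output, look for the version numbers, the constant pool listing, the flags line, and the Code section of each method.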


Loading classes


Now, having figured out the general structure of the file, let's see how the JVM handles it.


To get into the JVM, a class has to be loaded. Special class loaders exist for this:


  • Bootstrap, the base loader, loads platform classes. It is the parent of all other loaders and is part of the platform itself.
  • Extension ClassLoader, the extension loader, is a descendant of the Bootstrap loader. It loads extension classes, which by default are in the JRE/lib/ext directory. In Java 9 and later, this role is played by the platform class loader, and the lib/ext mechanism has been removed.
  • AppClassLoader, the system class loader, is a direct descendant of the Extension ClassLoader. It loads classes from the directories and jar files specified by the CLASSPATH environment variable, the java.class.path system property, or the -classpath command-line option.
  • Custom loaders: an application can define its own loaders.
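The hierarchy can be inspected from a running program. The following sketch walks the parent chain starting from a given loader; the class name LoaderChain is hypothetical. It assumes a HotSpot-style JVM, where the bootstrap loader is represented by null, and the loader class names it prints differ between Java versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists a class loader and all of its ancestors.
class LoaderChain {
    static List<String> chain(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            names.add(cl.getClass().getName());
        }
        // getParent() returns null once the bootstrap loader is reached.
        names.add("bootstrap (represented as null)");
        return names;
    }

    public static void main(String[] args) {
        // Loader of an application class, followed by its ancestors.
        chain(LoaderChain.class.getClassLoader()).forEach(System.out::println);
        // Platform classes such as String are loaded by the bootstrap loader.
        System.out.println("String loader: " + String.class.getClassLoader());
    }
}
```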


The system loader always loads the main class, while custom loaders can load other classes. It is worth mentioning that each loader creates its own namespace, which means that a program can contain several classes with the same fully qualified name if they are loaded by different loaders.

Each loader also delegates to its parent: before searching for a class itself, it checks whether the class has already been loaded and then asks its parent to load it. Only if the parent cannot find the class does the loader look for it on its own.
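Delegation can be observed with a custom loader that merely records which classes it is asked to find on its own. This is a sketch built on the default behaviour of ClassLoader.loadClass; RecordingLoader, tryLoad, and the class name no.such.Clazz are hypothetical names used only for this demonstration.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    // A loader that cannot find anything itself; it only records the requests.
    static class RecordingLoader extends ClassLoader {
        final List<String> findRequests = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // Reached only after the parent chain has failed to load the class.
            findRequests.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    // Returns the loaded class, or null if no loader in the chain could find it.
    static Class<?> tryLoad(ClassLoader loader, String name) {
        try {
            return loader.loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        RecordingLoader loader = new RecordingLoader(ClassLoader.getSystemClassLoader());
        System.out.println(tryLoad(loader, "java.lang.String")); // served by a parent
        System.out.println(tryLoad(loader, "no.such.Clazz"));    // nobody can find it
        System.out.println("findClass was asked for: " + loader.findRequests);
    }
}
```

The request for java.lang.String never reaches findClass, because a parent loader answers it first.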


After the class is loaded, the linking stage begins, which is divided into three parts.


  • Bytecode verification. This is a static code analysis performed once per class. The system checks the bytecode for errors: for example, that instructions are valid, that the operand stack cannot overflow or underflow, and that variable types are compatible.
  • Preparation. Memory is allocated for static fields, and they are set to their default values.
  • Resolution of symbolic references. The JVM replaces symbolic references to other classes, methods, and fields with direct ones. In most cases, this happens lazily, i.e., the first time a reference is actually used.


Finally, the class is initialized (its static initializers run), and the JVM can start executing the bytecode of its methods.
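Initialization is also lazy: a class's static initializer runs when the class is first actively used, not when the program starts. The sketch below records the order of events to show this; LazyInitDemo and Helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class LazyInitDemo {
    static final List<String> events = new ArrayList<>();

    static class Helper {
        // Deliberately not final: a compile-time constant would be inlined by javac
        // and reading it would not trigger initialization of Helper.
        static int value = 42;

        static {
            events.add("Helper initialized");
        }
    }

    static int useHelper() {
        events.add("before first use");
        int v = Helper.value; // the first active use triggers Helper's initialization
        events.add("after first use");
        return v;
    }

    public static void main(String[] args) {
        System.out.println(useHelper());
        System.out.println(events);
    }
}
```

The recorded events show that Helper is initialized between the two markers, at the moment its static field is first read.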


The JVM receives one bytecode stream for each method in the class. A method's bytecode is executed when that method is called in the course of the program. The bytecode stream is a sequence of instructions for the Java Virtual Machine. Each instruction consists of a one-byte operation code (opcode), which may be followed by operands. The opcode specifies the action to be taken. The JVM currently defines about 200 opcodes. Because opcodes were designed to be compact and each one occupies exactly one byte, there can never be more than 256 of them.


The JVM is a stack-based machine: most basic instructions take their operands from the stack and push their results onto it.
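To illustrate the idea, here is a toy interpreter for six real JVM opcodes. It is only a sketch of the execution model, not how an actual JVM is implemented; the class name TinyStackMachine is hypothetical, while the opcode values are the ones defined by the JVM specification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for a tiny subset of the JVM instruction set.
class TinyStackMachine {
    static final int BIPUSH  = 0x10; // push the next byte as an int (one operand)
    static final int ILOAD_0 = 0x1a; // push local variable 0
    static final int ILOAD_1 = 0x1b; // push local variable 1
    static final int IADD    = 0x60; // pop two ints, push their sum
    static final int IMUL    = 0x68; // pop two ints, push their product
    static final int IRETURN = 0xac; // pop an int and return it

    static int execute(byte[] code, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // index of the next byte to read
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF; // opcodes are unsigned bytes
            switch (opcode) {
                case BIPUSH:  stack.push((int) code[pc++]); break; // operand is a signed byte
                case ILOAD_0: stack.push(locals[0]); break;
                case ILOAD_1: stack.push(locals[1]); break;
                case IADD:    stack.push(stack.pop() + stack.pop()); break;
                case IMUL:    stack.push(stack.pop() * stack.pop()); break;
                case IRETURN: return stack.pop();
                default:
                    throw new IllegalArgumentException(
                            "Unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("Code ended without ireturn");
    }

    public static void main(String[] args) {
        // Bytecode of "return a + b": iload_0, iload_1, iadd, ireturn
        byte[] add = {0x1a, 0x1b, 0x60, (byte) 0xac};
        System.out.println(execute(add, new int[] {2, 3}));
    }
}
```

Note that bipush is the only instruction here with an operand, so it advances the program counter by two bytes instead of one.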


Method calls


Java provides two main types of methods: instance methods and class methods. Instance methods use dynamic (late) binding, while class methods use static (early) binding.


The Java virtual machine selects a class method based on the type of the reference, which is always known at compile time. When the virtual machine calls an instance method, on the other hand, it selects the method based on the actual class of the object, which may be known only at runtime. Therefore, different instructions are used to call methods: invokestatic for class methods and invokevirtual for instance methods. Each of these instructions references a constant pool entry that holds the full symbolic name of the required method. The virtual machine pops the required number of arguments from the stack and passes them to the method.
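The difference between the two kinds of binding shows up when a subclass both overrides an instance method and hides a static one. The classes Animal and Dog below are hypothetical examples. Calling a static method through an object reference is legal but discouraged; it is done here only to make the point.

```java
class BindingDemo {
    static class Animal {
        static String kind() { return "animal"; } // class method
        String sound() { return "..."; }          // instance method
    }

    static class Dog extends Animal {
        static String kind() { return "dog"; }    // hides Animal.kind()
        @Override
        String sound() { return "woof"; }         // overrides Animal.sound()
    }

    static String staticCall() {
        Animal a = new Dog();
        // Compiled to invokestatic Animal.kind(): chosen by the declared type of 'a'.
        return a.kind();
    }

    static String virtualCall() {
        Animal a = new Dog();
        // Compiled to invokevirtual: chosen at runtime by the actual class, Dog.
        return a.sound();
    }

    public static void main(String[] args) {
        System.out.println(staticCall());  // animal
        System.out.println(virtualCall()); // woof
    }
}
```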


Conclusion


In most JVMs, bytecode is initially interpreted, but as soon as the system notices that some code is executed very often, it invokes the built-in just-in-time (JIT) compiler, which compiles that bytecode to machine code and thus speeds up the application significantly.