
Wednesday, April 13, 2022

Data Structures and Algorithmic Design Techniques for Beginners

 


Data structures are ways of organizing and storing data on a computer so that programmers can access and work with it efficiently.  Data structures can be linear (linked lists, arrays, stacks, queues, etc.) or non-linear (trees, graphs, etc.).  Which data structure to use largely depends on what data you want to store and how you want to access and interact with its elements.

For example, if a requirement of a program is to access its data in FIFO or LIFO order, then a queue or a stack, respectively, will be the most appropriate data structure for the application.  Conversely, if you want to store data that is hierarchical in nature, then a tree will provide the best structure.
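As a quick sketch of both access patterns, Java's ArrayDeque can serve as either a stack (LIFO) or a queue (FIFO) depending on which methods you call:

```java
import java.util.ArrayDeque;

public class StackQueueDemo {
    public static void main(String[] args) {
        // LIFO: push and pop both operate on the head of the deque
        ArrayDeque<String> stack = new ArrayDeque<>();
        stack.push("first");
        stack.push("second");
        System.out.println(stack.pop()); // second (last in, first out)

        // FIFO: add at the tail, remove from the head
        ArrayDeque<String> queue = new ArrayDeque<>();
        queue.add("first");
        queue.add("second");
        System.out.println(queue.poll()); // first (first in, first out)
    }
}
```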

There isn't really a best or worst data structure in the abstract; once the context is known, however, some data structures will prove more efficient than others given their specific interfaces and the time it takes to traverse the data to perform a given task.

An important step in selecting the appropriate data structure is determining the algorithm that will be used to interact with the data.  An algorithm is a procedure for solving a particular problem that takes a finite number of steps to complete its task given an input.

Algorithms are often classified along a few main axes: recursive versus iterative, exact versus approximate, and serial versus parallel or distributed.  A recursive algorithm is one that calls itself repeatedly until a specific base condition is met.  An exact algorithm produces a precise answer (most sorting algorithms are exact), whereas an approximate algorithm trades some accuracy for speed.  A serial algorithm executes one instruction at a time, whereas a parallel algorithm divides the problem into subproblems and executes each of them in a different process or thread.
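To make the recursion idea concrete, here is a small Java example that computes a factorial by calling itself until the base case n <= 1 is met:

```java
public class Factorial {
    // Recursive: the method calls itself until the base case stops it
    static long factorial(int n) {
        if (n <= 1) return 1;        // base condition ends the recursion
        return n * factorial(n - 1); // recursive step on a smaller input
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
    }
}
```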

With all this taken into consideration, if I had to solve a problem on a large dataset that required multiple subproblems to be solved independently in the most expeditious manner, I would use a parallel algorithm so that I could take advantage of a multi-core, multithreaded processor to increase overall processing speed.  The data structure would depend on the data itself, but if it were, say, employee data organized hierarchically to reflect reporting lines, a tree would be used along with that parallel algorithm.
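As a minimal sketch of that idea (the dataset here is just placeholder numbers), Java's parallel streams split independent computations across the available cores:

```java
import java.util.List;

public class ParallelDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6, 7, 8);
        // Each element is processed independently, so the runtime can
        // divide the work across the cores of a multithreaded processor
        long sum = data.parallelStream()
                       .mapToLong(n -> (long) n * n) // independent subproblem
                       .sum();                        // combine partial results
        System.out.println(sum); // 204
    }
}
```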

Cheers!

Hugo

Wednesday, March 16, 2022

Getting Started with Java

Java is an object-oriented programming language developed by Sun Microsystems with the primary objective of providing a development framework that is platform-independent.  This meant that the same code could be executed on computers running different operating systems, something that previously required porting the code to each platform.  Oracle purchased Sun Microsystems in 2010 and is now the company that develops and supports the Java platform.

We will talk about object-oriented programming in a bit, but first, as a beginner Java coder, you will want to install the software required to write and execute your Java code.  This includes an IDE (Integrated Development Environment) and the JDK (Java Development Kit, sometimes called the Java SDK).  The JDK includes the Java Virtual Machine (JVM) runtime that actually executes your code.  There are JDKs for most platforms, including Windows, macOS, Linux, and several other UNIX variants.

There are many IDEs available for Java development, with many bells and whistles.  My recommendation, however, is to start with one that is not complicated to install or run.  NetBeans is popular and not too complex to configure, and it is also recommended by Oracle.  Before NetBeans or another Java IDE is installed, you should install the JDK.  Most IDEs will require version 8 or later.  This can be downloaded directly from Oracle here.
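Once everything is installed, you can verify your setup with the traditional first program; create a file named HelloWorld.java, and the IDE will compile and run it for you:

```java
public class HelloWorld {
    // Returns the greeting so the logic is easy to reuse and test
    static String greeting() {
        return "Hello, World!";
    }

    // main is the entry point the JVM looks for when the program runs
    public static void main(String[] args) {
        System.out.println(greeting()); // Hello, World!
    }
}
```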

Now back to Object-Oriented Programming (OOP).  The main objective behind OOP is to make programming easier and faster by reusing code.  Imagine a program that interacted with three balloons, each with different properties (color, size, shape, etc.).  Without OOP, a programmer would have to write separate code for each balloon to represent all of its properties and actions/functions.

OOP solves this issue by defining classes that can be used to create objects.  So in the example above, a balloon would be the class and the three balloons with different properties would each be created (instantiated) from the balloon class, thereby inheriting all of the properties and functions from the original balloon class.  Every time you create a new object based on a predefined class, you can set or change the default property values.
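Sticking with the balloon example, a minimal sketch of such a class in Java (the property and method names here are illustrative):

```java
public class Balloon {
    private String color;
    private double size;

    // The constructor sets each object's own property values
    public Balloon(String color, double size) {
        this.color = color;
        this.size = size;
    }

    // Behavior defined once in the class, inherited by every object
    public void inflate(double amount) {
        size += amount;
    }

    public String describe() {
        return color + " balloon of size " + size;
    }

    public static void main(String[] args) {
        // Three objects instantiated from the same class,
        // each carrying its own property values
        Balloon red = new Balloon("red", 1.0);
        Balloon blue = new Balloon("blue", 2.5);
        Balloon green = new Balloon("green", 0.5);
        red.inflate(0.5);
        System.out.println(red.describe()); // red balloon of size 1.5
        System.out.println(blue.describe());
        System.out.println(green.describe());
    }
}
```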

101Computing.net has a great article on their website going over the concepts of OOP.  The article can be found here:  https://www.101computing.net/object-oriented-programming-concepts/.  Once you've understood the concepts of OOP, I suggest you review TutorialsPoint's Java Tutorial.  It provides some of the basics and examples for beginners.

I hope this has been helpful, and wish you good luck as you embark on your Java development journey.

Cheers!

Hugo



Monday, March 14, 2022

Operating Systems Theory & Design

Modern operating systems have common major functions and services.  While their specific implementation may vary between the operating system and computing environment, they all have User and System services that provide the main functions of any OS.  These include Program Execution, I/O Operations, Resource Allocation, File Systems, Protection & Security, and Error Detection, among others.


Figure 1- Major Functions and Services of Operating Systems

 

Processes Control & Management

Operating systems enable processes to share, exchange, and manage information and resources.  This function is implemented via the Process Control Block (PCB), a data structure that contains all process-related information, including the process's current state.  From the process ID to registers, memory, and open files, the PCB is the quarterback of every process.

Figure 2- Process, Threads and Process Synchronization


Modern processors and computing needs have demanded a solution that allows multiple processes to be executed in parallel rather than sequentially.  To accomplish this, multi-core and multi-threaded processors were developed, allowing processes with multiple threads to be executed concurrently.  A thread is often described as a lightweight process, but it can only exist inside a process.  The benefit is that every thread has its own registers, program counter, and stack, but shares data, files, and code with the other threads of the same process.  This has vastly improved process execution.
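In Java, for example, each thread gets its own call stack while sharing the enclosing process's code and data; a minimal sketch:

```java
public class ThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // Two threads inside one process: each has its own stack and
        // program counter, but both share the process's code and data
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " running");
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join(); // wait for both threads to finish
        t2.join();
    }
}
```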

Critical Section Problem

The critical-section is a code segment that accesses shared variables and has to be executed as an atomic action. The critical section problem refers to the problem of how to ensure that at most one process is executing its critical section at a given time.

For example, if two processes are running and both need to access the same file, the software needs to be written so that the two processes do not execute their critical sections at the same time.  Otherwise, one process may be editing a file that the other process is trying to access at the same moment, which can corrupt data or cause the program to throw an exception.

The critical-section problem can be solved by satisfying three requirements: mutual exclusion, progress, and bounded waiting.  The code below is an example using Peterson's solution:

do {
    flag[i] = true;              // process i signals intent to enter
    turn = j;                    // yield priority to the other process
    while (flag[j] && turn == j)
        ;                        // busy-wait while process j has priority

    // critical section

    flag[i] = false;             // process i exits the critical section

    // remainder section

} while (true);

In this solution, when process i is executing its critical section, process j busy-waits in the while loop and can only proceed once flag[i] is false (or turn is handed back to it), and vice versa.  This guarantees that only a single process runs in the critical section at any given time.
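In higher-level languages these requirements are usually met with built-in primitives rather than hand-rolled Peterson's code.  A sketch in Java using a synchronized block for mutual exclusion: without the lock, the two threads' increments would interleave and updates would be lost.

```java
public class CriticalSectionDemo {
    private static int counter = 0;
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                synchronized (lock) {
                    counter++; // critical section: one thread at a time
                }
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counter); // 200000: no updates lost
    }
}
```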

Memory Management

Memory management is a core function of the operating system.  Its primary responsibility is to move processes and related process data back and forth between physical memory and virtual memory on disk, deciding which process gets what memory at what time based on which memory is free and available.

Memory in a computer is dynamically allocated depending on the needs of the application and processes being executed, and it is freed up when the process no longer requires the memory, thus allocating that slot of memory to another process if needed.

Ultimately, how well memory is managed depends on how effectively the hardware, operating system, and applications are designed and configured.

 

Figure 3- Memory Management

File Systems Management

File management is a key function of the operating system.  It is also one that most of us interact with frequently, even if we don't specifically interact with the OS in a terminal.  All applications that open and save files are calling on the OS to locate, open, edit, save or delete files.

A file is a collection of information saved on secondary storage and loaded into memory.  File management can be thought of as the process of manipulating files on a computer.  There are several logical structures of a directory as shown in Figure 4 below:

Figure 4- File Systems Management / Directory Structures

Each of these directory structures has advantages and disadvantages.  For example, the single-level directory structure is easy to implement, and manipulating files in it is easy and fast.  A disadvantage is that name collisions can occur, and searching becomes time-consuming as the directory grows.

The two-level directory is a bit more robust in that you can have different users, and each can contain directories and files of the same name.  Searching in this structure also becomes easier; however, a user cannot share files with other users.

Tree-structured directories are the most common; this is what we experience in the Windows OS environment.  The structure is scalable, easy to search, and supports both absolute and relative paths.  Despite its popularity, it does not allow files to be shared between directories, and searching can be inefficient.

The acyclic graph is a graph with no cycle and allows users to share subdirectories and files. The same file or subdirectories may be in two different directories. I think of it as a souped-up tree-structured directory.

In the general graph directory structure, cycles are allowed, so a directory can be reached from more than one parent directory.  The main problem with this structure is calculating the total size or space taken by files and directories, since cycles can cause the same file to be counted more than once or traversals to loop indefinitely.

OS Security & Protection

The goal of security is to protect the computer system from various threats, including malicious software such as worms, trojans, viruses, and unwanted remote intrusions.  Security typically involves the implementation of control techniques that can protect a computer from unauthorized changes to files, configurations, and even access to specific applications and resources attached to it.

The most common methods employed in protecting computer systems include the use of antivirus and antimalware software, as well as other administrative system policies requiring regular OS system updates, user access control, and even physical access control.

Figure 5- Security and Protection

Conclusion

Having a more in-depth understanding of all of these important operating system concepts will play a key role in my future as an IT professional. From deploying security systems to programming for specific environments, my understanding of resource management will help me deliver a more optimized product for my internal stakeholders.

 

Tuesday, June 1, 2021

Tech Topic: Programming Languages

Source: Hack Reactor

Programming languages are the fuel that makes information technology function.  While many advances in technology come from the semiconductor sector, just as many are attributable to advances in the way instructions are coded into machines, that is, via programming languages.

Thursday, May 27, 2021

Network Security

 Information and System Security

A basic tenet of the relationship between employees and employers is the trust they place in each other.  At its foundation, this trust begins with the security of both parties' information.  On the one hand, the employer should have security layers in place protecting company information, including that of its employees.  On the other hand, employees should follow company guidelines on handling and sharing company information.  Employers should also invest in training and educating their employees about the threats and risks to their systems and valuable information.

Computers in the Workplace: Midstream Oil & Gas

I have worked in the midstream oil & gas industry for nearly three decades; during this period, I have been involved in everything from design and construction to commissioning and operation of oil & gas pipelines and related facilities.  The technology used across the value chain is broad, has grown and evolved, and operators have become increasingly dependent on it as the operation and control of these systems has grown more complex.

Midstream oil & gas involves the gathering and transportation of oil, gas, and other related products from the production field to processing plants and other industrial users.  Computers used in this space are responsible for many aspects of these operations.

Traveling Through a Network

I used the ping and traceroute commands directly from the Terminal utility app on my Mac Pro.  The first thing I noticed between the two is that the ping command sends 56-byte packets, while the traceroute command sends 52-byte packets.  I also found that a few of the websites I pinged had multiple IP addresses returned from the DNS lookup that occurs when the ping command is executed.  It appears that the ping command selects only one of the returned IP addresses to send the packets to.  This is also true for the traceroute command.

I pinged google.com, gov.au, amazon.co.jp, and china.org.cn.  In all cases, the ping command sent 56-byte packets and received multiple 64-byte packets in return.  The fastest to return the packets was gov.au, with a TTL of 58 and an average time of 9 ms.  The slowest was amazon.co.jp, with a TTL of 226 and an average time of 48 ms.  None of the pings timed out, but I did interrupt the command after ten packets returned.
