The data collected in the last four years exceeds all the data collected up to 2010. With increasing competition, the management team of every firm feels the need for analytics.
Analytics
Analytics covers identifying marketing patterns, analysing demand, and adopting better technologies to grow economically and intellectually. Networking and communication are other important factors in improving a business. The analytics a firm needs depend on the size of the company, its competitors, its marketing strategy, consumer demand, and so on.
Information Technology
Information Technology is a vast area of study, and its analysis techniques can help a business grow. Databases are a major part of IT in business: they maintain the details of the people associated with a business, and operations on the data an organization holds can answer the questions it frames for future improvement. However, business people often lack IT knowledge.
Business Analysts
Business people can perform various financial and management operations and have ample marketing strategies, both functional and non-functional. However, it is difficult for business analysts to spot patterns such as correlations between product sales, which sale suits which period of time, or where a product can beat the competition. Extracting these patterns from such a large amount of data is impossible to do manually. This is where IT professionals and database technologies are needed.
Integration
Business Intelligence is the integration of management and IT. Business analysts study concepts such as data mining and data warehousing to understand and implement new techniques that improve their marketing strategies. Thus, analytics is an essential part of business.
Database at single node
Earlier, the entire database resided on a single node, the mainframe, where all the data was stored. Access to the data was very slow, and maintaining such a large amount of data on a single node was expensive, especially for search engines. Failure of that one node (the mainframe) made it difficult to provide reliable services, and the design was not scalable. These issues were resolved by the distributed systems pioneered by Google.
Distributed File System
In 1998, Google launched a search engine that proved to be a better source of information than all the others. However, Google did not reveal how it worked. By 2000, Google had become famous. In 2003 and 2004, Google released its white papers (the firm's research papers), which described the concepts and techniques it followed. Thereafter, in 2005, Hadoop was introduced, and in 2006 it became an Apache project.
Hadoop
Hadoop has two core constituents: MapReduce and HDFS (the Hadoop Distributed File System). Together these form the Hadoop distributed system. Around this core there are related projects that provide different tools for handling data, including Pig, Hive, HBase, and Mahout.
The basic concept of Hadoop is a single main (master) node running the name node and job tracker, with multiple slave nodes under it, each running a data node and a task tracker. A data file is divided into blocks of about 64 MB each, and multiple copies of each block are made, usually 3. These blocks are stored on data nodes chosen by HDFS, and the name node on the main node keeps track of which data nodes hold which blocks. Similarly, one computation is distributed across different nodes as several parallel tasks whose results are then united: the job tracker assigns tasks to the task trackers, and each task tracker reports its progress back to the job tracker.
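To make the MapReduce half of this concrete, here is a minimal sketch of the classic word-count job, essentially the standard example from the Hadoop documentation; the class name and input/output paths are illustrative. The mapper runs in parallel on each block of the file, emitting (word, 1) pairs, and the reducer unites the partial results into final counts.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: runs in parallel on each block of the input file,
  // emitting (word, 1) for every word it sees.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: receives all the counts for one word and sums them,
  // uniting the parallel partial results into a single answer.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged as a JAR, it would typically be launched with something like "hadoop jar wordcount.jar WordCount /input /output"; the job tracker then assigns map tasks to the task trackers on the nodes that already hold the input blocks.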
If a slave node fails for some unavoidable reason, copies of its data also reside on other nodes and can be served from there until the machine becomes available again. There is one more node, called the secondary node, which acts as a backstop for the main node: it keeps periodic backups (checkpoints) of the metadata maintained by the main node. The secondary node is used only to help recover the main node; it cannot simply take over as the main node. This design provides fault tolerance, is more reliable, permits parallel access to the data, and makes data access relatively fast.
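As an illustrative sketch of how a client sees this replication (assuming a standard Hadoop installation; the file path /data/sales.log is hypothetical), the HDFS Java API exposes the replication factor and block size of each file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileInfo {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml / hdfs-site.xml from the classpath, where the
    // replication factor (3 copies) and block size (64 MB) are configured.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical file path; replace with a real HDFS path.
    Path file = new Path("/data/sales.log");
    FileStatus status = fs.getFileStatus(file);
    System.out.println("Replication factor: " + status.getReplication());
    System.out.println("Block size (bytes): " + status.getBlockSize());

    // Raising the replication factor spreads more copies across the data
    // nodes, so more slave-node failures can be tolerated for this file.
    fs.setReplication(file, (short) 4);
  }
}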
Business Intelligence and Hadoop
Along with maintaining data, today there is also a need to convert unstructured data into structured data, and such operations are difficult to perform. It was expected that by 2015, 90% of data would be structured and that most MNCs would be working with Hadoop. Hadoop is one of the growing job areas in the IT sector these days; many IT professionals who started their careers in Hadoop two years ago have seen their packages nearly triple, from 3.5 LPA to 10 LPA.
Apart from being an interesting area to work in, Hadoop has many plans for the future.
Mishty...