Programming Hive

By Edward Capriolo, Dean Wampler

Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem.

This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.

  • Use Hive to create, alter, and drop databases, tables, views, functions, and indexes
  • Customize data formats and storage options, from files to external databases
  • Load and extract data from tables, and use queries, grouping, filtering, joining, and other conventional query methods
  • Gain best practices for creating user-defined functions (UDFs)
  • Learn Hive patterns you should use and anti-patterns you should avoid
  • Integrate Hive with other data processing programs
  • Use storage handlers for NoSQL databases and other datastores
  • Learn the pros and cons of running Hive on Amazon's Elastic MapReduce

Best Computers books

Networks: An Introduction

The scientific study of networks, including computer networks, social networks, and biological networks, has received an enormous amount of interest in the last few years. The rise of the Internet and the wide availability of inexpensive computers have made it possible to gather and analyze network data on a large scale, and the development of a variety of new theoretical tools has allowed us to extract new knowledge from many different kinds of networks.

LaTeX: A Document Preparation System (2nd Edition)

LaTeX is a software system for typesetting documents. Because it is especially good for technical documents and is available for almost any computer system, LaTeX has become a lingua franca of the scientific world. Researchers, educators, and students in universities, as well as scientists in industry, use LaTeX to produce professionally formatted papers, proposals, and books.

Building a WordPress Blog People Want to Read

Having your own blog isn't just for the nerdy anymore. Today, it seems everyone, from multinational corporations to a neighbor up the street, has a blog. They all have one, in part, because the folks at WordPress make it easy to get one. But to actually build a good blog, one that people want to read, takes thought, planning, and some effort.

AutoCAD 2008 For Dummies

A gentle, humorous introduction to this fearsomely complex software that helps new users start creating 2D and 3D technical drawings right away. Covers the new features and enhancements in the latest AutoCAD version and provides coverage of AutoCAD LT, AutoCAD's lower-cost sibling. Topics covered include creating a basic layout, using AutoCAD DesignCenter, drawing and editing, working with dimensions, plotting, using blocks, adding text to drawings, and drawing on the Internet. AutoCAD is the leading CAD software for architects, engineers, and draftspeople who need to create detailed 2D and 3D technical drawings; there are more than 5 million registered AutoCAD and AutoCAD LT users.

Sample text content

    ...getStandardStructObjectInspector(
        fieldNames, fieldOIs);
    }
    ...

The process method only returns a single row. However, each element in the object array will be bound to a specific variable:

    ...
    @Override
    public void process(Object[] os) throws HiveException {
        sent = new Text(((StringObjectInspector)args[0])
            .getPrimitiveJavaObject(os[0]));
        String parts = new String(this.sent.getBytes());
        String [] part = parts.split("\\|");
        forwardObj[0]=Integer.parseInt( part[0] );
        forwardObj[1]=part[1] ;
        forwardObj[2]=part[2].split(",");
        this.forward(forwardObj);
    }

    @Override
    public void close() throws HiveException {
    }
    }

We have followed the call to the book UDTF with AS, which allows the result columns to be named by the user. They can then be used in other sections of the query without having to parse information from the book again:

    client.execute(
        "create temporary function book as 'com.jointhegrid.udf.collect.UDTFBook'");
    client.execute("create table booktest (str string) ");
    client.execute(
        "load data local inpath '" + p.toString() + "' into table booktest");
    client.execute("select book(str) AS (book, title, authors) from booktest");
    [555 Programming Hive "Dean","Jason","Edward"]

Accessing the Distributed Cache from a UDF

UDFs may access files inside the distributed cache, the local filesystem, or even the distributed filesystem. This access should be used cautiously, as the overhead is significant.

A common usage of Hive is the analysis of web logs. A popular operation is determining the geolocation of web traffic based on the IP address. Maxmind makes a GeoIP database available, along with a Java API to search this database. By wrapping a UDF around this API, location information about an IP address can be looked up from within a Hive query.

The GeoIP API uses a small data file. This is ideal for showing the functionality of accessing a distributed cache file from a UDF. The complete code for this example is found at https://github.com/edwardcapriolo/hive-geoip/.

ADD FILE is used to cache the necessary data files with Hive. ADD JAR is used to add the required Java JAR files to the cache and the classpath. Finally, the temporary function must be defined as the final step before performing queries:

    hive> ADD FILE GeoIP.dat;
    hive> ADD JAR geo-ip-java.jar;
    hive> ADD JAR hive-udf-geo-ip-jtg.jar;
    hive> CREATE TEMPORARY FUNCTION geoip
        > AS 'com.jointhegrid.hive.udf.GenericUDFGeoIP';

    hive> SELECT ip, geoip(source_ip, 'COUNTRY_NAME', './GeoIP.dat') FROM weblogs;
    209.191.139.200 United States
    10.10.0.1       Unknown

The two examples returned include an IP address in the United States and a private IP address that has no fixed address.

The geoip() function takes three arguments: the IP address in either string or long format, a string that must match one of the constants COUNTRY_NAME or DMA_CODE, and a final argument that is the name of the data file that has already been placed in the distributed cache.

The first call to the UDF (which triggers the first call to the evaluate Java function in the implementation) will instantiate a LookupService object that uses the file located in the distributed cache.
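
The instantiate-on-first-call behavior described above can be sketched in plain Java. This is only a minimal illustration under stated assumptions, not the code from the hive-geoip project: FakeLookupService is a hypothetical stand-in for MaxMind's LookupService, its hard-coded table replaces the real GeoIP.dat lookup, the class does not extend Hive's GenericUDF, and only the COUNTRY_NAME constant is handled.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyGeoLookupSketch {

    /** Hypothetical stand-in for MaxMind's LookupService; NOT the real API. */
    static class FakeLookupService {
        private final Map<String, String> countryByIp = new HashMap<>();

        FakeLookupService(String dataFile) {
            // A real service would open dataFile (for example ./GeoIP.dat) here.
            // Assumption: this sketch hard-codes one entry instead of reading it.
            countryByIp.put("209.191.139.200", "United States");
        }

        String countryName(String ip) {
            // Addresses missing from the table are reported as "Unknown".
            return countryByIp.getOrDefault(ip, "Unknown");
        }
    }

    // Stays null until the first evaluate() call, then is reused.
    private FakeLookupService service;

    // Counts how many times the service was constructed (for illustration).
    int servicesCreated = 0;

    /** Mirrors the three arguments of geoip(): IP, attribute name, data file. */
    public String evaluate(String ip, String attribute, String dataFile) {
        if (service == null) {
            // First call only: build the (expensive) lookup service once.
            service = new FakeLookupService(dataFile);
            servicesCreated++;
        }
        if (!"COUNTRY_NAME".equals(attribute)) {
            // DMA_CODE is left out of this sketch.
            throw new IllegalArgumentException("Unsupported attribute: " + attribute);
        }
        return service.countryName(ip);
    }

    public static void main(String[] args) {
        LazyGeoLookupSketch udf = new LazyGeoLookupSketch();
        System.out.println(udf.evaluate("209.191.139.200", "COUNTRY_NAME", "./GeoIP.dat"));
        System.out.println(udf.evaluate("10.10.0.1", "COUNTRY_NAME", "./GeoIP.dat"));
        // The service was built once even though evaluate() ran twice.
        System.out.println(udf.servicesCreated);
    }
}
```

Deferring construction to the first evaluate() call matters because the data file is only guaranteed to be present where the query actually runs; building the service once and reusing it avoids repeating that setup cost for every row.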
