Friday, 13 November 2015

NVIDIA® Jetson™ TX1 Supercomputer-on-Module Drives Next Wave of Autonomous Machines | Parallel Forall

Today NVIDIA introduced Jetson TX1, a small form-factor Linux system-on-module, destined for demanding embedded applications in visual computing.  Designed for developers and makers everywhere, the miniature Jetson TX1 (figure 1) deploys teraflop-level supercomputing performance onboard platforms in the field.  Backed by the Jetson TX1 Developer Kit, a premier developer community, and a software ecosystem including Jetpack, Linux For Tegra R23.1, CUDA Toolkit 7, cuDNN, and VisionWorks, Jetson enables machines everywhere with the proverbial brains required to achieve advanced levels of autonomy in today’s world.
Aimed at developers interested in computer vision and on-the-fly sensing, Jetson TX1’s credit-card footprint and low power consumption mean that it’s geared for deployment onboard embedded systems with constrained size, weight, and power (SWaP).  Jetson TX1 exceeds the performance of Intel’s high-end Core i7-6700K Skylake in deep learning classification with Caffe while drawing only a fraction of the power, achieving more than ten times the performance per watt.
Jetson provides superior efficiency while maintaining a developer-friendly environment for agile prototyping and product development, removing extra legwork typically associated with deploying power-limited embedded systems. Jetson TX1’s small form-factor module enables developers everywhere to deploy Tegra into embedded applications ranging from autonomous navigation to deep learning-driven inference and analytics.

Jetson TX1 Module

Built around NVIDIA’s 20nm Tegra X1 SoC featuring the 1024-GFLOP Maxwell GPU, 64-bit quad-core ARM Cortex-A57, and hardware H.265 encoder/decoder, Jetson TX1 measures 50x87mm and is packed with performance and functionality. Onboard components include 4GB LPDDR4, 16GB eMMC flash, 802.11ac WiFi, Bluetooth 4.0, and Gigabit Ethernet, and the module accepts 5.5V-19.6VDC input (figure 2).  Peripheral interfaces include support for up to six MIPI CSI-2 cameras (on a dual ISP), 2x USB 3.0, 3x USB 2.0, PCIe gen2 x4 + x1, independent HDMI 2.0/DP 1.2 and DSI/eDP 1.4, 3x SPI, 4x I2C, 3x UART, SATA, GPIO, and others.  Needless to say, Jetson TX1 stands tall in the face of many an algorithmic and integration challenge.
Figure 2. Jetson TX1 block diagram. Blocks on the outside indicate typical routing on the carrier.
The Jetson module utilizes a 400-pin board-to-board connector (figure 3) for interfacing with the Developer Kit’s reference carrier board, or with a bespoke, customized board designed during your productization process.  Tegra’s chip-level capabilities and I/O are closely mapped to the module’s pin-out.  The pin-out will be backward-compatible with future versions of the Jetson module.  Jetson TX1 comes with an integrated thermal transfer plate (figure 3), rated between -25°C and 80°C, for interfacing with passive or active cooling solutions.  Consult NVIDIA’s Embedded Developer Zone for thorough documentation and detailed electromechanical specifications, in addition to visiting the active and open development community on Devtalk.
Figure 3. Left to right: Top of Jetson TX1 module, bottom (with connector), and complete assembly with TTP.
Jetson TX1 draws 1 watt or less while idle, around 8-10 watts under typical CUDA load, and up to 15 watts TDP when the module is fully utilized, for example during gameplay and the most demanding vision routines.  Jetson TX1 provides exceptional dynamic power scaling, either based on workload via its automated governor or by explicit user commands to gate cores and specify clock frequencies. The four ARM A57 cores automatically scale between 102 MHz and 1.9 GHz, the memory controller between 40 MHz and 1.6 GHz, and the Maxwell GPU between 76 MHz and 998 MHz.  Touting 256 CUDA cores with Compute Capability 5.3 and Dynamic Parallelism, Jetson TX1’s Maxwell GPU is rated for up to 1024 GFLOPS of FP16.  Combined with support for up to 1200 megapixels/sec from either three MIPI CSI x4 cameras or six CSI x2 cameras, a hardware H.265 encoder and decoder, integrated WiFi, and HDMI 2.0, Jetson TX1 is primed for all-4K video processing. The Jetson TX1 module retails for $299 and has 5-year availability. In addition to releasing the ecosystem tools, NVIDIA has made available the Jetson TX1 Developer Kit to help users get started today.
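The FP16 rating and the 4K claim quoted above can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope calculation, not an NVIDIA formula: it assumes each CUDA core retires one fused multiply-add per clock (counted as two floating-point operations), that FP16 values are processed two at a time, and that a 4K frame is 3840x2160 pixels. None of these assumptions are stated in the text.

```python
# Back-of-envelope checks using figures quoted in the text above.
# Assumptions (not stated in the source): one fused multiply-add per core
# per clock, counted as 2 FLOPs, and two FP16 values packed per operation.

CUDA_CORES = 256
GPU_MAX_CLOCK_MHZ = 998
FLOPS_PER_FMA = 2
FP16_PER_LANE = 2


def peak_gflops(cores, clock_mhz, fp_values_per_lane):
    """Theoretical peak throughput in GFLOPS."""
    return cores * clock_mhz * FLOPS_PER_FMA * fp_values_per_lane / 1000.0


def streams_supported(camera_mpix_per_s, width, height, fps):
    """How many streams of the given size fit in the camera pixel budget."""
    stream_mpix_per_s = width * height * fps / 1e6
    return camera_mpix_per_s / stream_mpix_per_s


fp16 = peak_gflops(CUDA_CORES, GPU_MAX_CLOCK_MHZ, FP16_PER_LANE)
fp32 = peak_gflops(CUDA_CORES, GPU_MAX_CLOCK_MHZ, 1)
print(f"FP16 peak: {fp16:.0f} GFLOPS")
print(f"FP32 peak: {fp32:.0f} GFLOPS")
print(f"4K30 streams within 1200 MP/s: {streams_supported(1200, 3840, 2160, 30):.1f}")
```

Under these assumptions the FP16 figure comes out at about 1022 GFLOPS, in line with the quoted rating of up to 1024 GFLOPS, and the camera pixel budget covers several 4K streams at 30 frames per second.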

Jetson TX1 Developer Kit

NVIDIA’s Jetson TX1 Developer Kit includes everything you need to get started developing on Jetson. Including the pre-mounted module, the Jetson TX1 Developer Kit (figure 4) contains a reference mini-ITX carrier board, 5MP MIPI CSI-2 camera module, two 2.4/5GHz antennas, an active heatsink & fan, an acrylic base plate, and a 19VDC power supply brick.
Figure 4. Jetson TX1 Developer Kit, including module, reference carrier and camera board.
The PCIe lanes on the Jetson TX1 Developer Kit are routed from the module to a PCIe x4 desktop slot on the carrier for easy prototyping, in addition to an M.2-E mezzanine with PCIe x1 for wireless radios.  Available on the Embedded Developer Zone, NVIDIA shares the schematics and design files for the reference carrier along with the 5MP CSI-2 camera module, including routing and signal integrity guidelines.  Board software support bundled by Jetpack provides easy flashing and device configuration.  Out of the box, the Jetson TX1 Developer Kit provides the experience of a desktop PC, but in a small embedded form factor that only draws a fraction of the power.  The Jetson TX1 Developer Kit is available for pre-order immediately for $599, with shipments beginning November 16 in the US and December 20 in Europe and APAC.
Select researchers had the chance to review the Jetson TX1 Developer Kit in the lead-up to launch.   MIT professor Dr. Sertac Karaman and his autonomous robotics lab worked hands-on with the new kit, upgrading their self-driving RACECAR from their previous Jetson TK1 setup.  Figure 5 shows their autonomous vehicle in action.
In addition to their autonomous RACECAR powered by Jetson TX1, Dr. Karaman’s lab at MIT is behind other projects that utilize Jetson for autonomy.  In collaboration with MIT Media Lab’s Changing Places group on the Persuasive Electric Vehicle (PEV), their self-driving tricycle provides autonomous transport of pedestrians and packages in urban environments, and is also powered by Jetson.   Leveraging the ecosystem, the students at MIT quickly prototyped their projects and benefited from the flexible development environment and performance afforded by Jetson TX1.

Jetpack and Linux For Tegra R23.1

The software ecosystem for Jetson is extensive, and Jetpack simplifies software configuration and deployment.  Jetpack automates the installation process on Jetson to include all the tools and drivers for development.   Jetpack 2.0 is provided for Jetson TX1.  This version of Jetpack bundles Linux For Tegra (L4T) R23.1, Tegra System Profiler 2.4 and Graphics Debugger 2.1, PerfKit 4.5.0, and OpenCV4Tegra.  L4T R23.1 ships with U-Boot and Linux 3.10.64 aarch64 kernel, alongside the Ubuntu 14.04 armhf filesystem.  Recent improvements in L4T include gstreamer 1.6 extensions with hardware support for H.265, an improved nvgstcapture sample for testing the camera module, and integrated support for WiFi & Bluetooth.
L4T R23.1 includes support for full desktop OpenGL 4.5, allowing full-on Linux gaming/VR experience in addition to simulation.  OpenGL ES 3.1 is also provided.  This release includes OpenCV4Tegra, enabling users to transparently utilize NEON SIMD extensions from the standard OpenCV interface.  A video tutorial series on OpenCV is available through the Embedded Developer Zone.

CUDA 7 and cuDNN/Caffe

Jetpack 2.0 includes the CUDA Toolkit version 7.0, with CUDA 7.5 coming in a future release. CUDA 7.0 unleashes Jetson TX1’s integrated Maxwell GPU.  Maxwell, with Compute Capability 5.3, supports Dynamic Parallelism and higher performance FP16.  The many uses for Dynamic Parallelism in embedded applications include point cloud processing & tree partitioning, parallel path planning & cost estimation, particle filtering, RANSAC, solvers, and many others.
One of the highlights of the Jetson software ecosystem is an incredible deep learning toolkit built on CUDA, providing Jetson with onboard inference and the ability to apply reasoning in the field. Included is NVIDIA’s cuDNN library, adopted by multiple deep learning frameworks including Caffe.
We ran a power benchmark using the Caffe AlexNet image classifier, comparing Jetson TX1 to an Intel Core i7-6700K Skylake CPU. The table below shows the results. Read more about these results in the post “Inference: The Next Step in GPU-Accelerated Deep Learning”.
Platform         img/s   Power (AP+DRAM)   Perf/watt   Efficiency versus i7-6700K
Intel i7-6700K   242     62.5W             3.88        1x
Jetson TX1       258     5.7W              45          11.5x
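The Perf/watt column is throughput divided by power, and the efficiency column is the ratio of each platform's perf-per-watt to that of the i7-6700K. A minimal sketch that recomputes both from the img/s and power figures in the table follows; the dictionary layout and function name are illustrative choices, not part of the benchmark.

```python
# Recompute perf-per-watt from the throughput and power columns of the table.
# The published Perf/watt and efficiency values are rounded, so small
# differences from the recomputed numbers are expected.

results = {
    "Intel i7-6700K": {"img_per_s": 242, "watts": 62.5},
    "Jetson TX1": {"img_per_s": 258, "watts": 5.7},
}


def perf_per_watt(entry):
    """Images classified per second for each watt drawn."""
    return entry["img_per_s"] / entry["watts"]


baseline = perf_per_watt(results["Intel i7-6700K"])
for name, entry in results.items():
    ppw = perf_per_watt(entry)
    print(f"{name}: {ppw:.2f} img/s/W, {ppw / baseline:.1f}x vs i7-6700K")
```

Recomputed this way the ratio comes out at roughly 11.7x, slightly above the published 11.5x, presumably because the published inputs and results were rounded.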
Kespry Designs, a Silicon Valley industrial drone developer, is using deep learning on Jetson TX1 to provide inference on construction sites for asset tracking of equipment and materials. This takes the tiresome, human-intensive work out of looking after assets and on-site logistical planning.  Due to the low SWaP and computational capability of Jetson TX1, Kespry plans to migrate processing onboard Unmanned Aerial Vehicles instead of offline in the datacenter, shortening response times for tasks like inspection and triage.  See a short video about them in Figure 6.
Kespry developed their proof-of-concept on the Jetson TX1 Developer Kit in just a few weeks.  The prototype uses a Caffe model trained to recognize and count different classes of construction equipment.  Using Jetson TX1, Kespry is now deploying this previously offline process in real-time onboard their drone.  Jetson can move resource-intensive tasks once performed in a datacenter onboard mobile platforms, thereby closing the loop on response and improving quick-reaction capabilities, creating new opportunities for companies like Kespry.


VisionWorks

Jetson TX1 marks the first release of VisionWorks available to developers through Jetpack 2.0 and the Embedded Developer Zone. Built on Khronos Group’s OpenVX standard for power-efficient vision processing, VisionWorks provides primitives and building blocks that are highly optimized for Tegra using tuned CUDA kernels. Figure 7 shows the results of benchmarks that we ran on Jetson TX1, profiling the differences between VisionWorks and OpenCV.
Figure 7. Benchmarks demonstrate the large speedup of VisionWorks vs. OpenCV running on the Jetson TX1 CPU and GPU.
VisionWorks is more than 10x faster than upstream CPU-only OpenCV, is 4.5x faster than OpenCV4Tegra with NEON extensions, and is 1.6x faster than OpenCV’s GPU module.   The Overall Computer Vision Score was collected from the geometric mean performance of all the overlapping primitives between OpenCV and VisionWorks.  Each primitive was measured across image sizes 720p and larger, and across all permutations of argument parameters.
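A geometric mean is a common way to combine per-primitive speedups into one score, because a single very large speedup cannot dominate the result the way it would in an arithmetic mean. The sketch below shows the calculation only; the primitive names and speedup values are made up for illustration and are not measurements from the benchmark.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-primitive speedups relative to some baseline library.
# These are illustration values only, not results from the benchmark.
hypothetical_speedups = {
    "gaussian_blur": 2.0,
    "warp_affine": 8.0,
    "harris_corners": 4.0,
}

score = geometric_mean(list(hypothetical_speedups.values()))
print(f"overall score: {score:.2f}x")
```

With these example values the arithmetic mean would be about 4.67x, while the geometric mean is 4x, which is the cube root of 2 x 8 x 4.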
In addition to more than 50 filtering, warping, and image-enhancement primitives, VisionWorks offers numerous higher-level building blocks, such as LK optical flow, stereo block-matching (SBM), Hough lines & circles, and Harris (Corner) feature-detection & tracking.   VisionWorks provides a full implementation of OpenVX 1.1.  Developers can leverage VisionWorks to deploy camera-ready algorithms and vision pipelines, already tuned for Jetson.
Get VisionWorks today on NVIDIA’s Embedded Developer Zone.

Jetson TX1: A Rich Development Platform

The NVIDIA Jetson ecosystem is rich with tools and support for enabling your research and development of applications and products with Jetson TX1. In the larger scheme, NVIDIA software toolkits for accelerated computing, deep learning, computer vision, and graphics are portable from the datacenter to the workstation to embedded SoC (figure 8), allowing enterprise users to seamlessly scale and deploy their applications to devices in the field.   Using Jetson, developers can leverage NVIDIA’s shared architecture and power-efficient technology to roll out high-performance embedded systems with ease and flexibility.
Figure 8. Jetson taps into the NVIDIA ecosystem to deliver unprecedented scalability and developer-friendly support.
Adept at hosting core processing capabilities alongside learning-driven inference and reasoning, Jetson TX1 represents the ultimate in performance and efficiency for powering your device with the next wave of autonomy. With shipments of Jetson TX1 Developer Kit beginning November 16, secure your pre-order today.  And let us know about the amazing things you create using Jetson!
