Tuesday, April 3, 2012

Machine Data Analytics: Splunk

Machine data, or “data exhaust”, analysis is one of the fastest growing and most pervasive segments of “big data”. The data is generated by websites, applications, servers, networks, mobile devices and other sources. The goal is to parse and visualize this data (log files, scripts, messages, alerts, changes, IT configurations, tickets, user profiles and so on) to spot trends and act.

By monitoring and analyzing everything from customer clickstreams, transactions and log files to network activity and call records, a new breed of startups is racing to convert “invisible” machine data into useful performance insights. This type of analytics is labeled operational intelligence or application performance intelligence.

In this post we cover an interesting, low-profile big data company: Splunk, which recently filed for an IPO and already has over 3,500 customers. Splunk’s search, analysis and visualization capabilities are used by companies from Comcast to Zynga to make sense of the reams of log data they generate every second.

Some real-world examples include:

• E-commerce… Expedia uses Splunk to avoid website outages by monitoring server and application health and performance. Today, ~3,000 users at Expedia use Splunk to gain real-time visibility on tens of terabytes of unstructured, time-sensitive machine data (from not only their IT infrastructure, but also online bookings, deal analysis and coupon use).

• SaaS… Salesforce.com uses Splunk to mine the large quantities of data generated from across its entire technology stack. Salesforce.com has over 500 users of Splunk dashboards, from IT users monitoring customer experience to product managers performing analytics on new services like ‘Chatter.’ With Splunk, SFDC claims to have taken application troubleshooting for 97,000 customers to the next level.

• Digital publishing… NPR uses Splunk to gain insight into its digital asset infrastructure and to monitor and troubleshoot its end-to-end asset delivery infrastructure. NPR also uses Splunk to measure program popularity and views by device, reconcile royalty payments for digital rights, measure abandonment rates and more.

Machine Data Basics

Data can be categorized into:

• Business application data,
• Human-generated content and
• Machine data.

Business application data is the digital information used by organizations to conduct their daily operations, such as payroll, supply chain and financial data. Most business applications rely on traditional relational database technology and software with pre-defined data structures, or schemas, for organizing, storing, accessing and reporting on structured data.

Human-generated content is the digital information derived from human-to-human (H2H) interactions, including email communications, spreadsheets and documents, mobile text messages, video, photos, recorded audio and social media messaging. Human-generated content typically comes in the form of unstructured data, which means that it is not optimized for storage in a relational database.

Machine data is produced by nearly every software application and electronic device. The applications, servers, network devices, sensors, browsers, desktop and laptop computers, mobile devices and various other systems that organizations have deployed to support their operations are continuously generating information relating to their status and activities.

Machine data can be found in a variety of formats such as application log files, call detail records, clickstream data associated with user web interactions, data files, system configuration files, alerts and tickets.
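
As a rough illustration of what "parsing" machine data means in practice, here is a minimal Python sketch that turns raw web server log lines into fields and counts responses by status code. The sample lines, the regular expression and the function names are assumptions made up for this example (the layout follows the common Apache access-log format); this is not how Splunk itself is implemented.

```python
import re
from collections import Counter

# Hypothetical sample lines in the common Apache access-log layout.
SAMPLE_LOG = """\
10.0.0.1 - - [03/Apr/2012:10:15:01 +0000] "GET /index.html HTTP/1.1" 200 5120
10.0.0.2 - - [03/Apr/2012:10:15:02 +0000] "GET /checkout HTTP/1.1" 500 312
10.0.0.1 - - [03/Apr/2012:10:15:04 +0000] "POST /checkout HTTP/1.1" 200 1024
"""

# One named group per field we want to extract from a line.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


def status_counts(text):
    """Count events per HTTP status code, skipping lines that fail to parse."""
    events = (parse_line(line) for line in text.splitlines())
    return Counter(event["status"] for event in events if event)


counts = status_counts(SAMPLE_LOG)
print(counts)
```

Once the lines are structured like this, spotting a trend (for example, a rise in 500 responses on /checkout) becomes a simple aggregation.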

Machine data is generated by both machine-to-machine (M2M) and human-to-machine (H2M) interactions. Outside of an organization’s traditional IT infrastructure, every processor-based system, including HVAC controllers, smart electrical meters, GPS devices and RFID tags, and many consumer-oriented systems, such as mobile devices, automobiles and medical devices that contain embedded electronic devices, are also continuously generating machine data.

Machine data can be structured or unstructured. The growth of machine data has accelerated in recent years. The increasing complexity of IT infrastructures, driven by the adoption of mobile devices, virtual servers and desktops, as well as cloud-based services and RFID technologies, is contributing to that growth.

Will continue this some time later ....

What is Hadoop ?

Pioneered by sensors, smart devices, and social collaboration technologies, a new information world has arrived. As these change drivers take hold, a huge mass of transactional data is starting to emerge—structured, semistructured, and unstructured—capturing trillions of bytes of information about customers, suppliers, and operations.

In itself, this sheer volume, velocity, and variety of data, also called “big data,” is a global phenomenon. But what does it imply? Though many organizations around the world regard this collection of data and its value with skepticism, the use of big data is becoming a key way for leading organizations to outperform their peers. McKinsey estimates that retailers embracing big data have the potential to increase their operating margin by more than 60 percent.

To capitalize on the value of the data available to them, organizations must go through a paradigm shift: they must rethink how they manage the sheer scale of this data, their infrastructure, and their current management frameworks and IT processes.

To capture value from big data, organizations need to deploy new technologies. The integration of sophisticated analytics and unstructured data to create new business value is often inhibited by legacy or proprietary systems with incompatible

Apache Hadoop, an open-source nonproprietary technology and a new way for enterprises to store and analyze data, opens the door to a world of possibilities.

But the decision to deploy and start developing on a Hadoop cluster is not one you can make overnight. It needs planning, strategic thinking, a vision of where the change could take you, and an estimate of how long it will take for the returns on your investment to have a positive impact on your business growth.

Hadoop is built on the underlying principles of scalability, performance, and low cost. Hadoop is a Java-based, open-source framework that enables users to run massive, parallel data processing projects on a low-cost, scale-out architecture.
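
To give a feel for the style of parallel processing Hadoop is known for, here is a small Python sketch of the MapReduce programming model (map, group by key, reduce) applied to a word count. It is only an in-process illustration running on one machine: it does not use Hadoop's Java API, and the function names and sample records are invented for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over the input records."""
    mapped = chain.from_iterable(map_phase(record) for record in records)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


result = word_count(["error disk full", "error network down", "disk ok"])
print(result)
```

Because each map call looks at one record and each reduce call looks at one key, both steps can in principle be spread over many inexpensive machines, which is the scale-out idea described above.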

Monday, February 28, 2011

Loop Guard versus UDLD

Loop guard and Unidirectional Link Detection (UDLD) functionality overlap, partly in the sense that both protect against STP failures caused by unidirectional links. However, these two features differ in functionality and how they approach the problem. This table describes loop guard and UDLD functionality:

Based on the various design considerations, you can choose either UDLD or the loop guard feature. With regard to STP, the most noticeable difference between the two features is that UDLD offers no protection against STP failures caused by software problems, in which the designated switch stops sending BPDUs. However, this type of failure is an order of magnitude rarer than failures caused by unidirectional links. In return, UDLD might be more flexible in the case of unidirectional links on an EtherChannel. In this case, UDLD disables only the failed links, and the channel should remain functional with the links that remain. In the same failure, loop guard puts the channel into the loop-inconsistent state, which blocks the whole channel.

Additionally, loop guard does not work on shared links or in situations where the link has been unidirectional since link-up. In the latter case, the port never receives a BPDU and becomes designated. Because this behaviour could be normal, this particular case is not covered by loop guard. UDLD provides protection against such a scenario.

As described, the highest level of protection is provided when you enable both UDLD and loop guard.
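
The comparison above can be summarized as a small lookup, sketched here in Python. The scenario names and helper functions are made up for this illustration, and the table only encodes the three failure cases discussed in the text (it leaves out shared links and the EtherChannel behaviour), so treat it as a memory aid rather than a complete model of either feature.

```python
# Which feature protects against which failure, as described in the text.
COVERAGE = {
    "unidirectional_link": {"loop_guard", "udld"},
    "software_failure_no_bpdus": {"loop_guard"},
    "unidirectional_since_link_up": {"udld"},
}


def protected(scenario, enabled):
    """True if at least one enabled feature covers the given scenario."""
    return bool(COVERAGE[scenario] & set(enabled))


def uncovered(enabled):
    """List the scenarios left unprotected by the enabled features."""
    return sorted(s for s in COVERAGE if not protected(s, enabled))


print(uncovered({"udld"}))
print(uncovered({"loop_guard"}))
print(uncovered({"udld", "loop_guard"}))
```

Enabling only one feature always leaves one scenario exposed in this table, which is why the two are usually enabled together.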

Saturday, February 26, 2011

What is MSFC Card ?

A Multilayer Switch Feature Card (MSFC) is a routing daughter card that sits on the supervisor module of a 6000 or 6500 series switch and works with a piece called the PFC. The PFC is the Policy Feature Card. The cool thing about the PFC is packet filtering.
If you've gone through the CLSC or BCMSN courses or exams, then you're probably familiar with a Route Switch Module (RSM). The RSM from a 5000-series Catalyst switch isn't much smaller than the blade off a guillotine, which is why those modules are often called blades. These do the same work as a regular router (i.e., they route). But because an RSM is a part of the switch, it doesn't have physical ports. Well, actually, it does--but not at the level you'd normally think. Instead, it uses VLAN ports that are then matched up with the VLANs created on the switch, allowing the RSM to route VLAN traffic.

Still with me? Good. Now, if you shrink an RSM down to a card the size of a NIC and then attach it directly to the supervisor card, you get an MSFC.

You may see a potential problem here. To get redundancy in a Catalyst 5000 with an RSM, you need two Supervisor modules, both of which can work with the single RSM. You could also have RSM redundancy, so that you don't have a single point of failure. With the router now sitting on the Supervisor, if the Supervisor card goes, so does your router. This means you need to have two Supervisor cards, each with an MSFC on board. As anyone who has had to outfit a 6509 for redundancy can tell you, this gets expensive fast!

Benefits of MSFC
What does an MSFC give you that the old RSM didn't? The first thing that most people latch on to is up to 15 million packets per second of forwarding while attached to a 32-gigabit backplane! The MSFC can also do regular routing and packet filtering with Access Control Lists (ACLs).
But beyond the basic access lists, you can also configure dynamic and reflexive lists. The most interesting list is called a VLAN Access Control List (VACL), which requires the PFC.

As more and more people think they're qualified to make changes willy-nilly on their work PCs, the frequency of rogue DHCP servers is increasing. (I can see several heads nodding at this last statement.) There are a couple of ways of dealing with this, and one is using tried and true basic Extended Access Lists. This method works fine, except it's rather processor-intensive and won't filter any packets that stay on the same VLAN they originated on.

If you're using an Extended Access List, how do you filter a packet that doesn't touch the router? You don't, so you need to configure a list off the router. With regard to the rogue DHCP problem, you'd be able to specify that only a certain device is able to forward a response to a DHCP client request through the switch.
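
The filtering decision described above can be sketched in Python as follows. This is not Cisco VACL syntax; the Packet class, the permit function and the trusted server address are hypothetical names chosen for the example. The one protocol fact it relies on is that DHCP servers send their replies from UDP port 67.

```python
from dataclasses import dataclass

DHCP_SERVER_PORT = 67                    # UDP port DHCP servers reply from
TRUSTED_DHCP_SERVERS = {"10.1.1.10"}     # hypothetical legitimate server


@dataclass(frozen=True)
class Packet:
    src_ip: str
    protocol: str
    src_port: int
    dst_port: int


def permit(pkt):
    """Drop DHCP server replies unless they come from a trusted server."""
    is_dhcp_reply = pkt.protocol == "udp" and pkt.src_port == DHCP_SERVER_PORT
    if is_dhcp_reply and pkt.src_ip not in TRUSTED_DHCP_SERVERS:
        return False
    return True


legit = Packet("10.1.1.10", "udp", 67, 68)
rogue = Packet("10.1.1.99", "udp", 67, 68)
web = Packet("10.1.1.99", "tcp", 51000, 80)
print(permit(legit), permit(rogue), permit(web))
```

The point of applying this check on the switch rather than the router is that it sees frames that never leave their VLAN, which is exactly where a rogue DHCP server does its damage.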

Sunday, February 20, 2011

Quick Reference Guides


GNS 3 Download


Virtual Circuits: Permanent Virtual Circuits (PVCs) and Switched Virtual Circuits (SVCs)

A virtual circuit is a connection between two network devices that appears to be a direct, dedicated connection but is actually drawn from a pool of logical circuit resources, from which specific circuits are allocated as needed to meet traffic requirements in a packet-switched network. The two network devices can therefore communicate as though they had a dedicated physical connection. Examples of networks with virtual circuit capabilities include X.25, Frame Relay and ATM networks.

Virtual circuits can be either permanent, called Permanent Virtual Circuits (PVCs), or temporary, called Switched Virtual Circuits (SVCs).

A Permanent Virtual Circuit (PVC) is a virtual circuit that is permanently available to the user. A PVC is defined in advance by a network manager. A PVC is used on a circuit that includes routers that must maintain a constant connection in order to transfer routing information in a dynamic network environment. Carriers assign PVCs to customers to reduce overhead and improve performance on their networks.

A Switched Virtual Circuit (SVC) is a virtual circuit that is set up dynamically between individual nodes and exists only for the duration of a communication session. Once the session is complete, the virtual circuit is torn down.

What is VRF ?

Virtual routing and forwarding (VRF) is a technology included in IP (Internet Protocol) network routers that allows multiple instances of a routing table to exist in a router and work simultaneously. This increases functionality by allowing network paths to be segmented without using multiple devices. Because traffic is automatically segregated, VRF also increases network security and can eliminate the need for encryption and authentication. Internet service providers (ISPs) often take advantage of VRF to create separate virtual private networks (VPNs) for customers; thus the technology is also referred to as VPN routing and forwarding.
VRF acts like a logical router, but while a logical router may include many routing tables, a VRF instance uses only a single routing table. In addition, VRF requires a forwarding table that designates the next hop for each data packet, a list of devices that may be called upon to forward the packet, and a set of rules and routing protocols that govern how the packet is forwarded. These tables prevent traffic from being forwarded outside a specific VRF path and also keep out traffic that should remain outside the VRF path.
A VRF is a VPN routing and forwarding instance that holds the set of routes and policies required by each organization.

Each VRF has the following tables:
1- A set of routes and policies for that VRF.
2- A CEF table associated with it.

In short: VRF is used to separate and isolate networks, making each VRF instance a separate entity.

VRF allows you to have separate route tables, so an ISP can keep its customers separate on a common infrastructure.

VRF-Lite allows a router to have different routes for a group of interfaces, so department X cannot route to department Y on the same router, but both departments can share the connection to the cloud.
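
The department X / department Y case can be modeled with a short Python sketch in which each VRF keeps its own route table and every interface is bound to exactly one VRF. The class names, interface names, prefixes and next-hop labels are all hypothetical, and the lookup is a plain longest-prefix match rather than anything resembling a real router's CEF implementation.

```python
import ipaddress


class Vrf:
    """One routing instance with its own private route table."""

    def __init__(self, name):
        self.name = name
        self.routes = []  # list of (network, next_hop) pairs

    def add_route(self, prefix, next_hop):
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, destination):
        """Longest-prefix match within this VRF only; None if no route."""
        addr = ipaddress.ip_address(destination)
        matches = [(net, hop) for net, hop in self.routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


class Router:
    """A router whose interfaces are each bound to one VRF."""

    def __init__(self):
        self.vrfs = {}
        self.interface_vrf = {}

    def bind(self, interface, vrf_name):
        self.vrfs.setdefault(vrf_name, Vrf(vrf_name))
        self.interface_vrf[interface] = vrf_name

    def forward(self, in_interface, destination):
        """Route using only the table of the VRF the packet arrived in."""
        vrf = self.vrfs[self.interface_vrf[in_interface]]
        return vrf.lookup(destination)


router = Router()
router.bind("Gi0/1", "dept-x")
router.bind("Gi0/2", "dept-y")
router.vrfs["dept-x"].add_route("10.1.0.0/16", "local-x")
router.vrfs["dept-y"].add_route("10.2.0.0/16", "local-y")
# Both departments get a route to the same (hypothetical) cloud prefix.
for vrf in router.vrfs.values():
    vrf.add_route("203.0.113.0/24", "uplink")

print(router.forward("Gi0/1", "10.2.0.5"))     # dept X has no route to dept Y
print(router.forward("Gi0/1", "203.0.113.7"))  # but it can reach the cloud
```

Because a lookup never leaves the table of the incoming interface's VRF, department X simply has no route to department Y, while both still reach the shared uplink.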