cflowd Frequently Asked Questions

cflowd Basics

Q: What is a flow?

A: A flow is a uni-directional traffic stream with a unique [source-IP-address, source-port, destination-IP-address, destination-port, IP-protocol] tuple. When a router receives a packet for which it currently does not have a flow entry, a flow structure is initialized which will contain all the information regarding that flow, i.e. number of bytes exchanged, IP addresses, port numbers, AS numbers etc. Each packet of the flow will contribute to the byte count and packet count of the flow structure until the flow is terminated and exported.

A flow may be terminated and exported by an inactive timeout (no packets seen for the flow for N seconds, configured via ip flow-cache inactive-timeout on a Cisco), an active timeout (flow terminated by time duration regardless of whether or not there are packets coming in for the flow, configurable via ip flow-cache active-timeout on a Cisco), or by other criteria (a FIN seen for a TCP flow, for example).
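The flow lifecycle described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a router implements its flow cache: the names Flow, FlowCache, observe and expire, and the default timeout values, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    first_seen: float
    last_seen: float
    packets: int = 0
    octets: int = 0


class FlowCache:
    """Toy flow cache keyed by the 5-tuple (hypothetical, for illustration)."""

    def __init__(self, inactive_timeout=15.0, active_timeout=1800.0):
        self.inactive_timeout = inactive_timeout  # seconds without packets
        self.active_timeout = active_timeout      # maximum flow lifetime in seconds
        self.flows = {}      # 5-tuple -> Flow currently in the cache
        self.exported = []   # (5-tuple, Flow) records that were terminated

    def observe(self, key, length, now, tcp_fin=False):
        """Account one packet of `length` bytes seen at time `now`."""
        self.expire(now)
        flow = self.flows.get(key)
        if flow is None:
            # No entry for this tuple yet: initialize a new flow structure.
            flow = self.flows[key] = Flow(first_seen=now, last_seen=now)
        flow.packets += 1
        flow.octets += length
        flow.last_seen = now
        if tcp_fin:
            # "Other criteria": a FIN terminates a TCP flow immediately.
            self._export(key)

    def expire(self, now):
        """Export flows that reached the inactive or the active timeout."""
        for key, flow in list(self.flows.items()):
            idle = now - flow.last_seen
            age = now - flow.first_seen
            if idle >= self.inactive_timeout or age >= self.active_timeout:
                self._export(key)

    def _export(self, key):
        self.exported.append((key, self.flows.pop(key)))
```

For example, two packets for the key ("10.0.0.1", 1025, "10.0.0.2", 80, 6) followed by a call to expire() after the inactive timeout yield one exported record holding both packets.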

Q: What is NetFlow?

A: NetFlow was initially developed by Cisco in its Quality of Service (QoS) program. It is a switching method that allows more efficient switching of packets according to the type of packet.

Q: What is cflowd?

A: cflowd was developed to collect and analyze the information available from NetFlow flow-export. It allows the user to store the information and enables several views on the data. It produces port matrices, AS matrices, network matrices and pure flow structures. The amount of data stored depends on the configuration of cflowd and varies from a few hundred Kbytes to hundreds of Mbytes in one day per router.

A user can store flow information and view the data in different ways. cflowd can produce matrices by autonomous system and network, and tables by port number and Internet protocol. With this information, engineers can evaluate traffic flow patterns between nodes on their networks and other networks. Engineers also can analyze traffic by application (for example, Web vs. e-mail vs. streaming audio vs. FTP) as well as by protocol (TCP vs. ICMP vs. DNS, for example). Insights from these types of analyses can help ISPs manage current networks and plan future network upgrades.

Q: Who is currently using cflowd?

A: Version 1.32 is in limited use by backbone networks in the U.S. and Europe. Version 2.0 is still alpha, but is undergoing testing by several networks (including ANS, Frontier GlobalCenter, Merit, Verio). The inclusion of the arts++ file storage features should significantly enhance cflowd's usefulness to networks of all sizes.

Q: What is an AS matrix?

A: An AS matrix contains packet and byte counters for traffic from source autonomous systems to destination autonomous systems.

Q: What is a net matrix?

A: A net matrix contains packet and byte counters for traffic from source networks to destination networks. A network is identified by the network address and netmask (CIDR).

Q: What is a port matrix?

A: A port matrix contains packet and byte counters for traffic from source ports to destination ports. Obviously, this data is only applicable to UDP and TCP traffic.
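The AS, net and port matrices share one shape: a table of packet and byte counters keyed by a (source, destination) pair. The Python sketch below illustrates that idea only and is not cflowd's implementation. The flow dictionaries and their field names (src_as, src_mask and so on) are assumptions for the example, and the net matrix key assumes each flow record carries a prefix length for both addresses.

```python
import ipaddress
from collections import defaultdict


def as_key(flow):
    """Key for an AS matrix: (source AS, destination AS)."""
    return (flow["src_as"], flow["dst_as"])


def net_key(flow):
    """Key for a net matrix: (source network, destination network) in CIDR form."""
    src = ipaddress.ip_network(f"{flow['src_addr']}/{flow['src_mask']}", strict=False)
    dst = ipaddress.ip_network(f"{flow['dst_addr']}/{flow['dst_mask']}", strict=False)
    return (src, dst)


def port_key(flow):
    """Key for a port matrix; only TCP (6) and UDP (17) flows have ports."""
    if flow["proto"] not in (6, 17):
        return None
    return (flow["src_port"], flow["dst_port"])


def build_matrix(flows, key_fn):
    """Sum packets and bytes per key; flows whose key is None are skipped."""
    matrix = defaultdict(lambda: [0, 0])   # key -> [packets, bytes]
    for flow in flows:
        key = key_fn(flow)
        if key is None:
            continue
        cell = matrix[key]
        cell[0] += flow["packets"]
        cell[1] += flow["octets"]
    return dict(matrix)
```

Calling build_matrix(flows, as_key), build_matrix(flows, net_key) and build_matrix(flows, port_key) on the same list of flows gives the three views.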

Q: What is ARTS?

A: ARTS is a binary file format specification for storing network data. Initially developed at ANS by David Bolen in 1992, ARTS was licensed to CAIDA in March of 1998. CAIDA has developed a C++ class library for ARTS. This work is separate from the software licensed from ANS. The C++ class library is used by the CAIDA packages cflowd and skitter. In addition to the class library, CAIDA distributes some simple applications for viewing and manipulating ARTS data. The entire package is called arts++.

Running cflowd

Q: Do cflowd workstations have to be directly connected to the router or can they be several hops away?

A: They don't have to be directly connected, but it's generally a good idea to have them very close to the exporting routers. A hop or two is generally harmless. The real concern is packet loss. Flow-export uses unacknowledged UDP; if packets are dropped, you lose data.
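Because the transport is unacknowledged, a collector can only notice loss after the fact. The Python sketch below is one hedged way to do that for version 5 flow-export: it decodes the 24-byte v5 header and counts flows skipped between consecutive datagrams. It assumes in-order delivery from a single export engine, and that each header's flow sequence number equals the previous header's sequence number plus its flow count. The names parse_v5_header and LossCounter are hypothetical and are not part of cflowd.

```python
import struct

# NetFlow v5 header: version, count, sys_uptime, unix_secs, unix_nsecs,
# flow_sequence, engine_type, engine_id, sampling_interval (24 bytes).
V5_HEADER = struct.Struct("!HHIIIIBBH")


def parse_v5_header(datagram):
    """Return (count, flow_sequence) from the start of a v5 export datagram."""
    (version, count, _uptime, _secs, _nsecs,
     flow_sequence, _etype, _eid, _sampling) = V5_HEADER.unpack_from(datagram)
    if version != 5:
        raise ValueError(f"not a v5 flow-export datagram (version {version})")
    return count, flow_sequence


class LossCounter:
    """Track received and missed flows for one exporter (illustrative only)."""

    def __init__(self):
        self.expected = None   # sequence number we expect in the next header
        self.received = 0      # flows actually received
        self.missed = 0        # flows skipped according to the sequence gap

    def update(self, count, flow_sequence):
        if self.expected is not None:
            # Modulo arithmetic handles wrap-around of the 32-bit counter.
            self.missed += (flow_sequence - self.expected) % 2**32
        self.expected = (flow_sequence + count) % 2**32
        self.received += count
```

Feeding each datagram's parse_v5_header() result into update() gives a running count of missed flows, which is a direct measure of how much data the path between router and collector is dropping.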

Q: On Solaris 2.x, why does cflowdmux report a shmget() error at startup (and then fail to work)?

A: Solaris' default shared memory segment size maximum is very low (1MB). cflowdmux wants a shared memory segment for packet buffers that's just over 1MB, which is beyond the default maximum limit imposed by the operating system. To permit a larger shared memory packet buffer, you should add this to /etc/system:

set shmsys:shminfo_shmmax = 4194304

and reboot the host. The setting above is 4MB; a higher setting is acceptable as well (no resources are allocated by this change, since shared memory is allocated on demand).

Note that the packet buffer size may be configured using the PKTBUFSIZE setting in the OPTIONS stanza in cflowd.conf.

Displaying cflowd data

Q: What visualization applications are available for CAIDA members?

A: Visualization tools are being created to display cflowd data, based on the ARTS data format. For example, the program "xartsprotos" will display a stacked bar chart of traffic (in bits/sec) per IP protocol versus time. It uses ARTS files as input, and permits the user to cycle through all of the datasets present in the input files (typically, there would be a dataset per interface per router). You may also initiate time-domain aggregation from within xartsprotos, and you can also print a plot to a file (in PostScript).

Sample screen shots of these kinds of visualization outputs are available on the Sample Screen Shots page.

Q: What if I want a different time granularity, e.g. daily instead of 5-minute?

A: You aggregate using one or more of the aggregation utilities (artsasagg, artsnetagg, artsprotoagg and artsportmagg). In the typical case, you might have a cron job that aggregates for you on a regular basis (for example, once a night you could create hourly and daily aggregates). This is how you get macroscopic views of traffic data for trend analysis.
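The idea behind time-domain aggregation can be shown with a short Python sketch. It does not reproduce the ARTS file format or the behavior of the aggregation utilities named above. It only illustrates summing fine-grained counters into coarser, aligned time buckets. The tuple layout (start time, key, packets, bytes) and the function name aggregate are assumptions for the example.

```python
from collections import defaultdict

HOUR = 3600
DAY = 24 * HOUR


def aggregate(samples, bucket_seconds):
    """Sum (start_epoch, key, packets, octets) samples into larger time buckets.

    Buckets are aligned to multiples of bucket_seconds since the epoch (UTC).
    """
    out = defaultdict(lambda: [0, 0])   # (bucket_start, key) -> [packets, bytes]
    for start, key, packets, octets in samples:
        bucket = start - (start % bucket_seconds)
        cell = out[(bucket, key)]
        cell[0] += packets
        cell[1] += octets
    return dict(out)
```

Twelve 5-minute samples for the same key that start within one hour collapse into a single hourly entry with aggregate(samples, HOUR); running the same function with DAY gives the daily view.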

Q: Is there any way I could choose to group routers, e.g., all the routers associated with a region, network, customer, etc.?

A: It's possible to do this with aggregation, but as yet this hasn't been implemented. It's actually pretty simple to add. The current aggregation utilities key by router IP address and interface index (ifIndex) when aggregating; if you set the IP address and the interface index fields to a single phony value for all of the ARTS data as it's read, the aggregate produced will be for all data in the input. The limitation here is that you have to think about data duplication whenever you do inter-router aggregation. If this is done at the data collection level, you never have duplicate data (for example, the best thing to do is to only run flow-switching and flow-export on the border interfaces in your network). If you don't do it this way, inter-router aggregation can be difficult (a lot of traffic may be counted twice, but not all of it, which can lead to meaningless aggregates).
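The phony-key approach described above might look like the following Python sketch. It is an illustration of the idea only, since the feature is not implemented in the aggregation utilities. The record layout and the group_of mapping (router address to a phony group address) are assumptions, and the caveat about double counting still applies: the input should come from border interfaces only.

```python
from collections import defaultdict

PHONY_IFINDEX = 0   # single placeholder interface index for every record


def aggregate_group(records, group_of):
    """Aggregate matrix cells across routers by rewriting the aggregation key.

    records:  iterable of (router_ip, if_index, cell, packets, octets), where
              cell is the matrix key, e.g. (source AS, destination AS).
    group_of: dict mapping each router_ip to the phony address of its group.
    """
    totals = defaultdict(lambda: [0, 0])
    for router_ip, _if_index, cell, packets, octets in records:
        # Replace the real router and interface with the group's phony values,
        # so all records of a group fall into the same aggregate.
        key = (group_of[router_ip], PHONY_IFINDEX, cell)
        totals[key][0] += packets
        totals[key][1] += octets
    return dict(totals)
```

Mapping every router to the same phony address produces one aggregate for all of the input; mapping routers to one phony address per region or customer produces one aggregate per group.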

Future Plans and Uses for cflowd

Q: What other uses can cflowd be applied to?

A: Other areas where cflowd may prove useful include usage tracking for Web hosting, accounting and billing, developing user profiles, and data warehousing and mining. The San Diego Supercomputer Center's Pacific Institute for Computer Security (PICS) is also collaborating with CAIDA on the development of scripts using cflowd to assist in monitoring network activity throughout an enclave (e.g. identifying hosts running httpd) and in detecting low-bandwidth scanning activities.

Q: Can confidence intervals be applied to flow-export data (and cflowd analyses) for billing applications?

A: You can calculate a rough confidence interval from flow-export data, since it contains sequence numbers. The root of the problem with the sequence numbers today is that they don't tell you the significance of the flow data you missed. The missing flows could have been big ones (a lot of packets and bytes); the flow-export sequence number doesn't tell you. In most cases, you should just make sure you're not missing a significant number of flows. Off the cuff, I'd guess that there is probably an inverse relationship between the number of dropped flows and the amount of traffic per flow (many small flows cause higher flow-export rates, and congestion is the leading cause of dropped flow-export packets). A more conservative approach would assume a Gaussian distribution about the average of the flows you have received. In either case, it's probably a manageable error if (and only if) you're not missing a lot of the data.
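One possible reading of the Gaussian approach is sketched below in Python. It is an assumption about how such an estimate could be made, not a method taken from cflowd. Each missed flow is treated as an independent draw with the mean and standard deviation of the received flows, so the missing bytes are estimated as n times the mean, with a half-width of z times the standard deviation times the square root of n. The function name and the default z of 1.96 (roughly a 95% interval) are choices made for the example.

```python
import math
import statistics


def estimate_total_bytes(received_bytes, missed_flows, z=1.96):
    """Return (low, estimate, high) for total bytes, including missed flows.

    received_bytes: byte counts of the flows that were received.
    missed_flows:   number of flows lost, e.g. from sequence number gaps.
    """
    total = sum(received_bytes)
    if missed_flows == 0:
        return float(total), float(total), float(total)
    if not received_bytes:
        raise ValueError("cannot estimate missed flows without received flows")
    mean = statistics.mean(received_bytes)
    # Sample standard deviation needs at least two observations.
    sd = statistics.stdev(received_bytes) if len(received_bytes) > 1 else 0.0
    estimate = total + missed_flows * mean
    half_width = z * sd * math.sqrt(missed_flows)
    # Missed flows cannot carry negative bytes, so clamp the lower bound.
    low = max(float(total), estimate - half_width)
    return low, float(estimate), estimate + half_width
```

Flow sizes are typically far from Gaussian (a few large flows carry much of the traffic), so the interval is only meaningful when the number of missed flows is small relative to the number received, which matches the caveat above.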

The bigger problem is the granularity of data necessary to do very granular billing (say down to the host and application level). It's significant, and flow-export version 5 will not fare well on OC-12 or higher transit links. More than just an issue with the router, it's an immense amount of data to process from a host running a real operating system (i.e. one on which a commercial software company can build a product quickly).

Future enhancements to NetFlow flow-export may deal with some of these issues, but it's a straightforward tradeoff between granularity and scalability (to scale to OC-12 and beyond, you have to give up some granularity or throw a lot more hardware and software at the problem). Today (and probably for the foreseeable future), 'call record' kinds of information will not be available from flow-export at OC-12 and beyond, at least for the usual notion of a call record (srcIP:srcport:dstIP:dstport:starttime:endtime granularity).

That is not to say that you can't arrive at reasonable billing schemes based on less granular information (net matrix or AS matrix, for example).

Q: Is/will NetFlow be supported on the Cisco GSRs?

A: Yes, though v5 flow-export won't scale to OC-12 speeds and beyond. Eventually there will be a new version of flow-export, directed at scaling issues for the GSR. You'll be able to configure the router to keep aggregate tables (like the AS matrix, net matrix, etc.) instead of the very granular v5 flow-export info, and the router will be able to export just the aggregate data (via reliable transport).

Q: If the cflowd machine is going to be collecting data from OC-48 router interfaces, how much CPU power, memory, hard disk, etc. do I need?

A: You can't currently get flow-export data for OC-48 interfaces. The main reason is that v5 flow-export won't handle it; it's too granular, and will hit several bottlenecks at OC-48 rates (on the router, on the network, and on the collector host).

This isn't as bad as it sounds in most scenarios, because what you generally really want to do is collect data at the edges of your network, on border interfaces only. OC-48 transit links will eventually become common, but the best advice for now is to collect data at all border interfaces where possible.

In terms of cflowd host CPU... you don't generally need a killer machine, but it depends on your configuration (most importantly, how many routers are sending data to it and how much data). A general rule of thumb is one cflowd host per 15-20 DS-3 interfaces running flow-switching. Testing has been done using FreeBSD on Pentium II 233MHz (66MHz bus, not one of the new 100MHz chipsets), 128M RAM, 10/100 ethernet. cflowd has been tested with no data loss at 700 pps (21,000 flows/sec) on this configuration, saving all tabular data but not raw flows (which incur disk I/O and can hence be a significant bottleneck). A very rough estimate of flow-export traffic for a busy DS-3 interface is 30 packets/sec (900 flows/sec).
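The figures above can be checked with a little arithmetic, shown here as a Python sketch. Both pairs of numbers imply 30 flows per export packet (21,000 / 700 and 900 / 30), which is the maximum a v5 datagram carries. The headroom parameter is an assumption introduced for the example: it is the fraction of the tested packet rate you are willing to plan for.

```python
FLOWS_PER_PACKET = 30   # v5 flow records per export datagram (21,000 / 700)
TESTED_PPS = 700        # export packets/sec handled without loss in the test
BUSY_DS3_PPS = 30       # rough export packets/sec for one busy DS-3 interface


def flows_per_second(packets_per_second, flows_per_packet=FLOWS_PER_PACKET):
    """Convert an export packet rate into a flow rate."""
    return packets_per_second * flows_per_packet


def interfaces_per_host(headroom=0.75, tested_pps=TESTED_PPS,
                        per_interface_pps=BUSY_DS3_PPS):
    """Busy DS-3 interfaces one collector can take at the given headroom."""
    return int(tested_pps * headroom // per_interface_pps)
```

With no headroom (1.0) the tested rate covers 23 busy DS-3 interfaces; planning for 75% of the tested rate gives 17, which falls inside the 15-20 rule of thumb.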

Q: What configuration do I need for the central processing machine?

A: On the central host, it's really up to you. There are no data integrity issues, since the transport from cflowd to the central collector is TCP. A faster central host will just speed up report generation and the like. You generally want a lot of disk space on the central collector (so you can keep granular data and/or lots of archived data for trend analysis), and a lot of memory can be useful as well (256M or more). But again, it really just speeds up report generation and the like; it doesn't affect your data integrity.